This article was originally published by Tony DeYoung in the e-zine WEBREFERENCE UPDATE NEWSLETTER, June 7, 2001. This material may be republished only if the sources are cited and it is used on a site that is not closed to the general public. Francisco Panizo, Web Master of the tips portals:
Special Article #1: Visual Search Engines

Search technology may be the foundation of the Internet, but if you're looking for rich media content, today's text and keyword-based searches are woefully inadequate. For photography, graphics, logos, audio or video, text descriptions rarely convey the most valuable information. For example, if you're looking for a certain shade of blue sky, it's nearly impossible to find a match in a stock photo library using only textual descriptions. Or if you are using a search engine to look for photographs of tigers on the Web, what keywords do you use to get fewer than 10,000 search results? And how do you ensure that those search results are relevant and worth filtering through?

E-commerce sites face the same difficulty. Ever try to find something on eBay? Even after you locate one item that looks like it might be interesting, how do you find other similar items to comparison shop? Invariably you have to manually search through scores of mostly irrelevant or undecipherable photographs. Shoppers like me have a low tolerance for this and quickly leave. Maybe this is one of the reasons why only 3% of today's online lookers become buyers.

Even with the most meticulously keyworded content, searching with text inevitably means that a Web site visitor must know the keywords used by that site or master a complex syntax for specifying non-trivial searches. (For example, I wrote to a stock photo site looking for help on finding an image of a cowboy. They responded that I should type "cowboy =man cowboy =male cowboy =portrait cowboy =close." Now that is intuitive...NOT.) And of course text-based search engines only work if you speak fluent English. As Web site content grows exponentially in non-textual types of information, it is apparent that text-based search engines are becoming less equipped to provide good results.
In response to this deficit, several university labs and commercial software companies have developed tools that allow a visual search of images and products. With visual search, a user can make selections based on images rather than text. Most of these systems operate in a similar way: the user performs a query by choosing an image that is somewhat similar to the desired images, and the engine then does a pattern-recognition search using global or local comparisons of color, shape or texture. So for example, you find a sample of a sunset and then ask the search engine to find images with similar red and gold colors. This approach works when the entire image scene is distinctive and relevant, but it gives a lot of obviously wrong results for complex images or large databases.

I recently began experimenting with a beta release of a visual search engine (VSE) toolkit that implements an object-based approach to improve the accuracy of visual search results. The Java-based toolkit is by eVision (http://www.evisionglobal.com) and is available as a free download. Even if you are not a top-notch Java programmer, you can produce basic visual search applications using just the high-level API. The approach that eVision takes is to 1) treat photographs as a collection of objects, rather than as one big undifferentiated image and

Object-based searches

When we look at photographs, we look for patterns and objects. We identify a photograph that is 10% brown and 90% green as a brown horse in a grassy field. So when searching for similar images, we would not be confused by a photograph of a green river dotted with 10% brown fallen tree branches. But general-purpose VSEs could identify a horse in the field and tree branches in a green river as very similar. They look at the image as one big undifferentiated group of RGB values.

An object-based VSE like eVision tries to first identify the objects in an image before doing a comparative search. While it can't attribute the meaning of horse to the brown object, it can say that the photo is composed of two distinct objects - a brown one with a particular shape and a green background. Then it runs visual comparisons to other images based on these regions.

For example, with the photo of a horse in a green field and a color similarity search, a general-purpose VSE would say "This photo has 90% green in it and 10% brown so find photos that have this same proportion of colors." eVision would say "This photo has two objects in it, one object is 100% green and the other is 100% brown so find photos that contain a 100% green object and a 100% brown object." For the non-object way, you would get horses in fields, a forest (trunks are brown), brown scum in a green river, a green lawn covered in 10% dog droppings, etc. With eVision you would get horses in fields, a horse-sized dog on a grassy background, etc. The latter matches are certainly closer to the sample image and much more like the way humans see things. We see objects, not distributions of color.
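To make the horse-versus-river contrast concrete, here is a minimal Python sketch. It is not eVision's algorithm or API. The toy 10x10 "images", the function names, and the scoring rules are all assumptions made for illustration: the global score is a plain color-proportion overlap, and the object score first groups same-colored neighboring pixels into regions and then compares regions by color and area only (shape is ignored).

```python
from collections import Counter

def make_grid(size, brown_cells):
    """Toy image: a size x size grid of color names, green unless listed as brown."""
    brown = set(brown_cells)
    return [["brown" if (r, c) in brown else "green" for c in range(size)]
            for r in range(size)]

def color_proportions(grid):
    """Fraction of the image covered by each color, ignoring where pixels sit."""
    total = sum(len(row) for row in grid)
    counts = Counter(color for row in grid for color in row)
    return {color: n / total for color, n in counts.items()}

def global_similarity(a, b):
    """Overlap of the two color distributions (1.0 means identical proportions)."""
    pa, pb = color_proportions(a), color_proportions(b)
    return sum(min(pa.get(c, 0.0), pb.get(c, 0.0)) for c in set(pa) | set(pb))

def find_objects(grid):
    """Group 4-connected pixels of equal color; return (color, area fraction) pairs."""
    rows, cols = len(grid), len(grid[0])
    seen, objects = set(), []
    for r in range(rows):
        for c in range(cols):
            if (r, c) in seen:
                continue
            color, area, stack = grid[r][c], 0, [(r, c)]
            seen.add((r, c))
            while stack:  # iterative flood fill over one region
                y, x = stack.pop()
                area += 1
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and (ny, nx) not in seen and grid[ny][nx] == color):
                        seen.add((ny, nx))
                        stack.append((ny, nx))
            objects.append((color, area / (rows * cols)))
    return objects

def object_similarity(a, b):
    """Pair same-colored objects largest-first and sum the area they share."""
    objs_a, objs_b = find_objects(a), find_objects(b)
    score = 0.0
    for color in {c for c, _ in objs_a} & {c for c, _ in objs_b}:
        areas_a = sorted((s for c, s in objs_a if c == color), reverse=True)
        areas_b = sorted((s for c, s in objs_b if c == color), reverse=True)
        score += sum(min(x, y) for x, y in zip(areas_a, areas_b))
    return score

# Horse in a field: one solid brown block covering 10% of a green image.
horse = make_grid(10, [(r, c) for r in (4, 5) for c in range(3, 8)])
# Horse-sized dog on grass: the same kind of block in a different place.
dog = make_grid(10, [(r, c) for r in (6, 7) for c in range(1, 6)])
# River with branches: ten isolated brown pixels, still 10% brown overall.
river = make_grid(10, [(1, 1), (1, 5), (1, 8), (3, 3), (3, 7),
                       (5, 1), (5, 5), (7, 3), (7, 8), (9, 5)])

print("global  horse vs river:", round(global_similarity(horse, river), 2))
print("objects horse vs river:", round(object_similarity(horse, river), 2))
print("objects horse vs dog:  ", round(object_similarity(horse, dog), 2))
```

Under these assumptions the global comparison cannot tell the horse from the river, because both are 90% green and 10% brown. The object comparison scores the dog as a perfect match and the river at about 0.91, because the river's brown is split into ten tiny regions instead of one horse-sized one. A real engine would also have to segment noisy photographs and compare shape and texture, which this sketch leaves out.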