When searching images for a certain query, ambiguity problems may arise. For instance, when searching for the query “colored Volkswagen beetle” the most common search engines would retrieve some images with the famous car, but also the bug and even a bird that also recalls for that name. Query expansion would avoid this problem by using a context to the query and adding related concepts that may additionally enrich the results or increase its precision. For example and going back to the former example, it would be maybe more useful that the results would contain images of other cars similar to the Volkswagen beetle rather than the fauna images.
Figure 1: Image retrieval with query expansion process diagram
In Figure 1 the process for image retrieval including Query expansion is shown. It starts with the original query Qo which will be the entry point for the Query Expansion module which will then search for several complimenting terms and phrases that will be combined to deliver a Qe enriched query. Qe will be introduced in the regular search engine. The big question here is which are those terms to enrich the query and how we do find them? Accordingly to the research “Massive Query Expansion by Exploiting Graph Knowledge Bases for Image Retrieval” by authors Joan Guisado-Gámez, David Dominguez-Sal and Josep L. Larriba-Pey the answer may be graph techniques.
Their method is based on the topological analysis of the graph built out of a knowledge base. They differ from previous works because they are not considering the links of each article individually, instead they are mining the global link structure of the knowledge base to find related terms using graph mining techniques. The technique consist in, first, locating the most relevant terms in the knowledge base, and connecting them by a path using the knowledge base relations, and second, extracting communities of concepts around the detected path. With this technique they are able to identify: (i) concepts that match with the user need, and (ii) a set of semantically related concepts. Matching concepts provide equivalent reformulations of the query that reduce the vocabulary mismatch. Semantically related concepts introduce a significant set of different terms that are likely to appear in a relevant document.
While in the first phase of this innovative Query Expansion technique it uses the shortest path to find the most direct way to connect a particular word to another related, on the second they are using community search on a graph to enrich the previously computed paths with articles that are closely related.
For the example query, the system finds 182 shortest paths using the English Wikipedia. Among them, nine score 3/2 that is the top score:
volkswagen fox→volkswagen beetle
volkswagen passat→volkswagen beetle
volkswagen type 2→volkswagen beetle
volkswagen golf→volkswagen beetle
volkswagen jetta →volkswagen beetle
volkswagen touareg→volkswagen beetle
volkswagen golf mk4 →volkswagen beetle
volkswagen beetle→volkswagen transporter
The first path in the list is specially relevant because it connects the generic concept Volkswagen to the most specific context Volkswagen Beetle. The rest of the paths are also valuable because they refer to the real intent of the user and discard any path that contains articles about other interpretations of the term beetle.
For the second part of the technique, the most direct solution would be to enrich the path with all the neighbors of the Wikipedia articles. However, when they tested this naive solution, it did not work because it introduced articles that were loosely related to the path. Wikipedia articles have typically many links, and many of them refer to topics that have some type of relation but semantically are very distant. They implemented a community search algorithm to distinguish the semantically strong and weak links. A community in a graph is a set of closely linked nodes which are similar among them but are different from other nodes in the rest of the graph. On our example, on this second part of the query expansion process related terms such as German cars, Volkswagen group, vw bug, VW Type 1, Wolfsburg (which is the city where they were manufactured), Baja bugs (which refers to an original Volkswagen Beetle modified to operate off-road) or Cal Look (name used to refer to customized version of Volkswagen Beetle cars that follow a style coined in California in 1969) are added to enrich the results.
Check the complete article for the experimental results and precision measures!
Read more articles about research using graph databases in our blog. If you think your research will benefit from using a graph database consider using Sparksee, available for Phyton, Java, .Net, C++ and Objective-C. Request one of our FREE licenses under our Research Program.