Mobile Semantic Search

Ricardo Baeza-Yates

Sala Philippe Flajolet (3er. piso edificio poniente), DCC, Universidad de Chile.

07:00 05/01/2018

Semantic search lies at the crossroads of information retrieval and natural language processing and is the current frontier of search technology. The first part consists of building a semantically annotated index with the help of a knowledge base. For this, we first need to predict the language of each document and parse it according to that language. Second, we need to extract all entities and concepts mentioned in the document with the help of the knowledge base. The whole knowledge base infrastructure needs to be language independent, and we instantiate each language in the lexicon of the knowledge base.
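The indexing step can be illustrated with a minimal sketch, assuming a toy setup that is not the method described in the talk: language-independent entity IDs, a per-language lexicon that maps surface forms to those IDs, a hypothetical stopword-overlap heuristic standing in for language prediction, and greedy longest-match lookup standing in for entity extraction. All entity IDs and stopword lists are made up for illustration.

```python
import re
from dataclasses import dataclass, field

# Hypothetical toy stopword lists, used only to guess a document's language.
STOPWORDS = {
    "en": {"the", "and", "of", "is", "in"},
    "es": {"el", "la", "de", "y", "en", "es"},
}


@dataclass
class KnowledgeBase:
    # Language-independent entity IDs mapped to an entity type.
    entities: dict[str, str] = field(default_factory=dict)
    # Per-language lexicon: language -> lowercased surface form -> entity ID.
    lexicon: dict[str, dict[str, str]] = field(default_factory=dict)

    def add_entity(self, entity_id, entity_type, names):
        """Register one entity and its surface forms in each language."""
        self.entities[entity_id] = entity_type
        for lang, forms in names.items():
            for form in forms:
                key = " ".join(re.findall(r"\w+", form.lower()))
                self.lexicon.setdefault(lang, {})[key] = entity_id


def detect_language(tokens):
    """Pick the language whose stopwords occur most often (toy heuristic)."""
    return max(STOPWORDS, key=lambda lang: sum(t in STOPWORDS[lang] for t in tokens))


def annotate(text, kb, max_ngram=3):
    """Return (language, entity IDs) using greedy longest-match lookup."""
    tokens = re.findall(r"\w+", text.lower())
    lang = detect_language(tokens)
    lex = kb.lexicon.get(lang, {})
    found, i = [], 0
    while i < len(tokens):
        # Try the longest n-gram first so "universidad de chile" beats "chile".
        for n in range(min(max_ngram, len(tokens) - i), 0, -1):
            span = " ".join(tokens[i:i + n])
            if span in lex:
                found.append(lex[span])
                i += n
                break
        else:
            i += 1  # no entity starts at this token
    return lang, found


kb = KnowledgeBase()
kb.add_entity("uchile", "Organization",
              {"en": ["University of Chile"], "es": ["Universidad de Chile"]})
kb.add_entity("santiago", "City", {"en": ["Santiago"], "es": ["Santiago"]})
kb.add_entity("chile", "Country", {"en": ["Chile"], "es": ["Chile"]})
print(annotate("La Universidad de Chile está en Santiago", kb))
# ('es', ['uchile', 'santiago'])
```

Because the entity IDs are shared, the Spanish and English phrasings of the same sentence map to the same annotations; only the lexicon is language specific.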
The second part is predicting the intention behind the query, which requires semantic query understanding. This process involves the same semantic processing as for documents. Then, based on all this information, we have to predict one or more possible intentions, each with a certain probability, which is particularly important for ambiguous queries. These scores will be one of the inputs for the final semantic ranking. For example, given the query "bond", possible results of query understanding are a financial instrument, the movie character, a chemical bond, or a term of endearment.
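As a rough illustration of assigning probabilities to intentions, here is a minimal sketch, assuming each candidate intention for an ambiguous term has a hypothetical prior score and a few hypothetical cue words; the scores are normalized with a softmax. The priors, cue words, and `cue_weight` are invented for the example and do not come from the talk.

```python
import math

# Hypothetical candidate intentions for the ambiguous term "bond":
# (label, prior score, cue words). All numbers are illustrative only.
INTENTS = {
    "bond": [
        ("financial_instrument", 1.2, {"yield", "treasury", "interest"}),
        ("movie_character", 1.0, {"james", "007", "movie", "film"}),
        ("chemical_bond", 0.6, {"covalent", "ionic", "molecule"}),
        ("term_of_endearment", 0.2, {"love", "friendship", "family"}),
    ]
}


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def predict_intents(query, cue_weight=2.0):
    """Score each candidate intention (prior + context cues), then normalize."""
    tokens = set(query.lower().split())
    candidates = []
    for term in tokens & set(INTENTS):
        for label, prior, cues in INTENTS[term]:
            candidates.append((label, prior + cue_weight * len(tokens & cues)))
    if not candidates:
        return []
    probs = softmax([score for _, score in candidates])
    labels = [label for label, _ in candidates]
    return sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)


for q in ["bond", "james bond movie"]:
    print(q, [(label, round(p, 2)) for label, p in predict_intents(q)])
```

For the bare query "bond", all four intentions keep a nonzero probability, which is what lets the later ranking stage show several of them; context words such as "james" and "movie" shift most of the probability mass to a single intention.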
Semantic ranking refers to ranking search results using semantic information. In a standard search engine, the ranking is computed using signals or features coming from the search query, from the documents in the collection being searched, and from the search context, such as the language and device being used. In our case, we add semantic relations between the entities and concepts found in the query and the same objects in the documents, which come from different data sources. For this, we use machine learning in several stages. The first stage selects the data sources that we should use to answer the query. In the second stage, each data source generates a set of answers using learning to rank. The third and final stage ranks these data sources, selecting and ordering the intentions as well as the answers inside each intention (e.g., news) that will appear in the final composite answer. All these stages are language independent, but may use language-dependent features.
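The three stages can be sketched as follows. This is a simplified sketch under loud assumptions: there is one hypothetical data source per intention, stage 1 keeps a source when its intention probability passes a threshold, and a fixed linear scoring function stands in for the learned ranking models. All sources, answers, features, and weights are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Answer:
    title: str
    features: dict[str, float]  # hypothetical features, e.g. text/entity match


# Hypothetical data sources, one per intention, with candidate answers.
SOURCES = {
    "financial_instrument": [
        Answer("What is a bond?", {"text": 0.9, "entity": 0.7}),
        Answer("Treasury bond yields today", {"text": 0.6, "entity": 0.9, "freshness": 1.0}),
    ],
    "movie_character": [
        Answer("Bond actors", {"text": 0.7, "entity": 0.6}),
        Answer("James Bond films", {"text": 0.8, "entity": 1.0}),
    ],
    "chemical_bond": [
        Answer("Covalent and ionic bonds", {"text": 0.7, "entity": 0.9}),
    ],
}

# Stand-in for a learned ranking model: fixed linear weights (illustrative).
WEIGHTS = {"text": 0.5, "entity": 1.0, "freshness": 0.3}


def score(answer):
    """Linear score over the answer's features."""
    return sum(WEIGHTS.get(name, 0.0) * value for name, value in answer.features.items())


def select_sources(intents, min_prob=0.1):
    """Stage 1: keep the sources whose intention is probable enough."""
    return {label: p for label, p in intents if p >= min_prob and label in SOURCES}


def rank_within_source(label, k=2):
    """Stage 2: each source ranks its own answers and returns the top k."""
    return sorted(SOURCES[label], key=score, reverse=True)[:k]


def compose(intents, min_prob=0.1, k=2):
    """Stage 3: order intentions by probability times best answer score."""
    blocks = []
    for label, p in select_sources(intents, min_prob).items():
        answers = rank_within_source(label, k)
        if answers:
            blocks.append((p * score(answers[0]), label, [a.title for a in answers]))
    blocks.sort(key=lambda block: block[0], reverse=True)
    return [(label, titles) for _, label, titles in blocks]


intents = [("financial_instrument", 0.5), ("movie_character", 0.4), ("chemical_bond", 0.05)]
for label, titles in compose(intents):
    print(label, titles)
```

In this toy run the low-probability chemistry intention is dropped in stage 1, each remaining source reorders its own answers in stage 2, and stage 3 places the financial block before the movie block. Multiplying the intention probability by the best answer score is only one possible choice for combining the two signals.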