Since its creation, the Web has been a main object of research for information management, which has been primarily studied using classical paradigms. However, since the early 2000s, we have been witnessing drastic changes in the area of Web data management. If we had to summarize them in one sentence, it would be: real distribution of big data.
In this new scenario, capturing the meaning of heterogeneous data and developing tools for processing it play a crucial role. The Semantic Web is an enormous initiative led by the World Wide Web Consortium whose main objective is to achieve these goals, thus transforming the current Web of documents into a Web of data, where human users and computer applications can take better advantage of the massive amount of information stored on it. Some key steps have been made toward these goals. However, we are still far from having techniques that take full advantage of the semantics and the logic behind Web data once its structure, scale and distribution, taken together, are considered as a full-fledged phenomenon.
The main goal of the Center for Semantic Web Research is to study how to effectively extract semantic data from the Web, and to develop the basic tools for such extraction. The initiative brings together professors, researchers and students from the Pontifical Catholic University of Chile, the University of Chile and the University of Talca, and is funded by the Iniciativa Científica Milenio.
Reverse engineering problems for conjunctive queries (CQs), such as query by example (QBE) or definability, take a set of user examples and convert them into an explanatory CQ. Despite their importance, the complexity of these problems is prohibitively high (coNEXPTIME-complete). We isolate their two main sources of complexity and propose relaxations of them that reduce the complexity while having meaningful theoretical interpretations. The first relaxation is based on the idea of using existential pebble games for approximating homomorphism tests. We show that this characterizes QBE/definability for CQs up to treewidth k while reducing the complexity to EXPTIME. As a side result, we obtain that the complexity of the QBE/definability problems for CQs of treewidth k is EXPTIME-complete for each k≥1. The second relaxation is based on the idea of "desynchronizing" direct products, which characterizes QBE/definability for unions of CQs and reduces the complexity to coNP. The combination of these two relaxations yields tractability for QBE and characterizes it in terms of unions of CQs of treewidth at most k. We also study the complexity of these problems for conjunctive regular path queries over graph databases, showing them to be no more difficult than for CQs.
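The two ingredients named in this abstract, direct products and homomorphism tests, combine in the classical product-based test for QBE: a CQ explaining the examples exists exactly when the direct product of the positive examples does not map homomorphically onto any negative example. The following is a minimal sketch of that test, not the relaxations proposed in the paper. It assumes a database with a single binary relation (a directed graph), unary examples, and CQs whose free variable must occur in some atom; all function names are hypothetical, and the backtracking search is only meant for tiny inputs.

```python
from itertools import product


def direct_product(edges, points):
    """Direct product of the pointed graphs (edges, a) for each a in points.

    Product nodes are tuples; an edge holds iff it holds in every component.
    """
    prod_edges = {
        (tuple(u for u, _ in combo), tuple(v for _, v in combo))
        for combo in product(edges, repeat=len(points))
    }
    return prod_edges, tuple(points)


def homomorphism_exists(src_edges, src_point, dst_edges, dst_point):
    """Backtracking test for an edge-preserving map sending src_point to dst_point."""
    dst_nodes = {v for e in dst_edges for v in e} | {dst_point}
    todo = [v for v in {v for e in src_edges for v in e} if v != src_point]

    def consistent(h):
        # Every source edge whose endpoints are both mapped must land on an edge.
        return all(
            (h[u], h[v]) in dst_edges
            for (u, v) in src_edges
            if u in h and v in h
        )

    def extend(h, i):
        if i == len(todo):
            return True
        for target in dst_nodes:
            h[todo[i]] = target
            if consistent(h) and extend(h, i + 1):
                return True
            del h[todo[i]]
        return False

    h = {src_point: dst_point}
    return consistent(h) and extend(h, 0)


def cq_explanation_exists(edges, positives, negatives):
    """True iff some CQ selects every positive example and no negative one."""
    prod_edges, prod_point = direct_product(edges, positives)
    # Assumption: the free variable must occur in an atom, so the product
    # point has to appear in at least one product edge.
    if not any(prod_point in e for e in prod_edges):
        return False
    return not any(
        homomorphism_exists(prod_edges, prod_point, edges, b) for b in negatives
    )


# Node 1 starts a path of length 2, node 4 does not: a CQ can tell them apart.
print(cq_explanation_exists({(1, 2), (2, 3), (4, 5)}, [1], [4]))  # True
# Everything a CQ can say about node 4 also holds for node 1.
print(cq_explanation_exists({(1, 2), (2, 3), (4, 5)}, [4], [1]))  # False
```

The product has one node per tuple of database elements, so its size grows exponentially with the number of positive examples, and each homomorphism test is itself a search problem. These are the two sources of complexity that the abstract proposes to relax.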
Computing certain answers is the standard way of answering queries over incomplete data; it is also used in many applications such as data integration, data exchange, consistent query answering, ontology-based data access, etc. Unfortunately certain answers are often computationally expensive, and in most applications their complexity is intolerable if one goes beyond the class of conjunctive queries (CQs), or a slight extension thereof. However, high computational complexity does not yet mean one cannot approximate certain answers efficiently. In this talk we survey several recent results on finding such efficient and correct approximations, going significantly beyond CQs. We do so in a setting of databases with missing values, and first-order (relational calculus/algebra) queries. Even the class of queries where the standard database evaluation produces correct answers is larger than previously thought. When it comes to approximations, we present two schemes with good theoretical complexity. One of them also performs very well in practice, and restores correctness of SQL query evaluation on databases with nulls.
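To illustrate why SQL evaluation can be incorrect on databases with nulls, the sketch below compares an SQL-style evaluation of a difference query (written with NOT EXISTS) against certain answers computed by brute force. It is a toy illustration, not one of the approximation schemes surveyed in the talk. It assumes unary relations, closed-world semantics, and the standard genericity argument that it suffices to try the constants of the database plus one fresh constant per null; the class and function names are hypothetical.

```python
from itertools import product


class Null:
    """A marked null; each instance stands for one unknown value."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return f"NULL_{self.name}"


def sql_equals(a, b):
    """Three-valued equality: comparing with a null yields unknown (None)."""
    if isinstance(a, Null) or isinstance(b, Null):
        return None
    return a == b


def sql_style_difference(R, S):
    """Mimics: SELECT A FROM R WHERE NOT EXISTS (SELECT * FROM S WHERE R.A = S.A).

    The subquery keeps a row only when the comparison is true, so an
    unknown comparison never blocks a value of R.
    """
    return {a for a in R if not any(sql_equals(a, b) is True for b in S)}


def certain_difference(R, S):
    """Constants that belong to R - S under every valuation of the nulls."""
    nulls = [x for x in R | S if isinstance(x, Null)]
    constants = {x for x in R | S if not isinstance(x, Null)}
    # Assumption: these fresh names do not clash with constants in the data.
    fresh = {f"fresh_{i}" for i in range(len(nulls))}
    answers = None
    for values in product(constants | fresh, repeat=len(nulls)):
        v = dict(zip(nulls, values))
        world_R = {v.get(x, x) for x in R}
        world_S = {v.get(x, x) for x in S}
        result = world_R - world_S
        answers = result if answers is None else answers & result
    return answers & constants


a = Null("a")
print(sql_style_difference({1, 2}, {a}))  # {1, 2}
print(certain_difference({1, 2}, {a}))    # set()
```

Here the SQL-style evaluation returns both 1 and 2, yet neither is a certain answer, because the null in S may stand for either of them. The brute-force check enumerates exponentially many valuations, which is why efficient approximations with correctness guarantees are of interest.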
Jorge Pérez, Marcelo Arenas and Claudio Gutierrez were awarded the Semantic Web Science Association (SWSA) Ten-Year Award at the International Semantic Web Conference (ISWC 2016) in recognition of the considerable impact that their paper "Semantics and Complexity of SPARQL" has had on the community since its original publication at ISWC 2006. The paper outlined a formal framework for SPARQL that has been cited and reused in hundreds of subsequent works studying the query language.
Renzo Angles and Claudio Gutierrez were awarded "Best Research Paper" at the International Semantic Web Conference (ISWC 2016) for their paper entitled "The multiset semantics of SPARQL patterns". The paper provides a formal comparison of the various ways in which negation can be expressed in SPARQL and possible semantics to characterise negation when duplicate results are allowed.
Researcher Jorge Pérez took part in the Social Innovation Festival "Más mujeres en Tecnología" (More Women in Technology) as part of the activities of the Núcleo Milenio Centro de Investigación de la Web Semántica (Millennium Nucleus Center for Semantic Web Research). In this panel, Academia Ada Lovelace of Girls in Tech Chile brought together leaders from government, Comunidad Mujer, Mujeres del Pacífico, Laboratoria, C100, InnovaCien and Kodea to join forces and review different practices and strategies for building a technology-creation ecosystem with more women.
As part of its outreach activities, the researcher from the Millennium Nucleus for the Semantic Web took part in the LEARN Workshop on Research Data: Implementation of Policies and Strategies in Latin America and the Caribbean, giving the talk "Datos abiertos en el mundo de la Ciencia" (Open Data in the World of Science). More information at this link.