Since its creation, the Web has been a central object of research in information management, and it has primarily been studied using classical paradigms. Since the early 2000s, however, we have been witnessing drastic changes in the area of Web data management. If we had to summarize them in one sentence, it would be: real distribution of big data.
In this new scenario, capturing the meaning of heterogeneous data and developing tools for processing it play a crucial role. The Semantic Web is an enormous initiative led by the World Wide Web Consortium whose main objective is to achieve these goals, thus transforming the current Web of documents into a Web of data, where human users and computer applications can take better advantage of the massive amount of information it stores. Some key steps have been taken toward these goals. However, we are still far from having techniques that take full advantage of the semantics and the logic behind Web data once its structure, scale and distribution are considered together as a full-fledged phenomenon.
The main goal of the Center for Semantic Web Research is to study how to effectively extract semantic data from the Web, and to develop the basic tools that make such extraction possible. The Center is an initiative that brings together professors, researchers and students from the Pontifical Catholic University of Chile, the University of Chile and the University of Talca, and it is funded by the Iniciativa Científica Milenio.
Computing certain answers is the standard way of answering queries over incomplete data; it is also used in many applications such as data integration, data exchange, consistent query answering, ontology-based data access, etc. Unfortunately, computing certain answers is often expensive, and in most applications their complexity is intolerable if one goes beyond the class of conjunctive queries (CQs), or a slight extension thereof. However, high computational complexity does not mean one cannot approximate certain answers efficiently. In this talk we survey several recent results on finding such efficient and correct approximations, going significantly beyond CQs. We do so in the setting of databases with missing values and first-order (relational calculus/algebra) queries. Even the class of queries for which the standard database evaluation produces correct answers is larger than previously thought. When it comes to approximations, we present two schemes with good theoretical complexity. One of them also performs very well in practice, and restores correctness of SQL query evaluation on databases with nulls.
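To make the gap between SQL evaluation and certain answers concrete, the following sketch (a minimal illustration, not from the talk; relation names, the difference query and the finite stand-in domain are our own choices) brute-forces certain answers for the query R − S over a database where S contains a null, and contrasts them with SQL-style evaluation:

```python
from itertools import product

# R and S are unary relations; None marks a missing (null) value.
R = [1]
S = [None]

DOMAIN = [1, 2]  # finite stand-in for the set of all possible values

def difference(r, s):
    """Q = R - S, evaluated on a database without nulls."""
    return {x for x in r if x not in s}

def completions(rel):
    """All relations obtained by replacing each null with a domain value."""
    slots = [i for i, v in enumerate(rel) if v is None]
    for vals in product(DOMAIN, repeat=len(slots)):
        filled = list(rel)
        for i, v in zip(slots, vals):
            filled[i] = v
        yield filled

# Certain answers: tuples in the query answer on *every* completion of S.
certain = set.intersection(*(difference(R, s) for s in completions(S)))

# SQL-style EXCEPT: NULL compares as distinct from 1, so 1 survives.
sql_answer = {x for x in R if not any(x == y for y in S if y is not None)}

print(sql_answer)  # {1}: SQL keeps the tuple
print(certain)     # set(): 1 is not certain, since the null might equal 1
```

Here SQL returns a false positive: the tuple 1 is in the SQL answer but not a certain answer, since in the completion where the null equals 1 the difference is empty. Approximation schemes of the kind surveyed in the talk aim to avoid such unsound answers without paying the full cost of certain-answer computation.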
Modern computing tasks such as real-time analytics require constant refresh of query results under high update rates. Incremental View Maintenance (IVM) approaches this problem by materializing results in order to avoid recomputations. IVM naturally induces a trade-off between the space needed to maintain the materialized results and the time used to process updates. In this talk we will discuss a new approach for evaluating queries under updates. Instead of the materialization of results, we aim to maintain a data structure featuring (1) efficient maintenance under updates, (2) constant-delay enumeration of the output, (3) constant-time lookups in the output, while (4) using only linear space in the size of the database. We will describe DYN, a dynamic version of the Yannakakis algorithm, and show that it yields such data structures for the class of free-connex acyclic marginalize-join queries. We show that this is optimal in the sense that such a data structure cannot exist for marginalize-join queries that are not free-connex acyclic. In addition, we identify a sub-class of queries for which DYN features constant-time update per tuple. We introduce a cost model for DYN under different join trees, and describe how to evaluate under updates the more general class of acyclic aggregate join queries. Finally, using the industry-standard benchmarks TPC-H and TPC-DS, we experimentally compare DYN and a higher-order IVM (HIVM) engine.
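The basic trade-off behind IVM can be sketched with a toy delta-based maintainer for a single binary join (an illustration only; DYN itself maintains a more compact data structure with constant-delay enumeration rather than the materialized result shown here, and the relation names are our own):

```python
from collections import defaultdict

# Maintain the view V = R(a, b) JOIN S(b, c) under single-tuple inserts,
# propagating only the delta instead of recomputing V from scratch.

R_index = defaultdict(list)   # b -> list of a values seen in R
S_index = defaultdict(list)   # b -> list of c values seen in S
view = []                     # materialized join result

def insert_R(a, b):
    R_index[b].append(a)
    for c in S_index[b]:      # delta: the new R-tuple joins with matching S-tuples
        view.append((a, b, c))

def insert_S(b, c):
    S_index[b].append(c)
    for a in R_index[b]:      # delta: the new S-tuple joins with matching R-tuples
        view.append((a, b, c))

insert_R(1, "x")
insert_S("x", 10)
insert_R(2, "x")
print(view)  # [(1, 'x', 10), (2, 'x', 10)]
```

Each insert touches only the matching tuples of the other relation, but the materialized view can grow super-linearly in the database size; approaches like DYN avoid that blow-up by keeping a linear-size structure from which the output can be enumerated on demand.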
Jorge Pérez, Marcelo Arenas and Claudio Gutierrez were awarded the Semantic Web Science Association (SWSA) Ten-Year Award at the International Semantic Web Conference (ISWC 2016), in recognition of the considerable impact that their paper "Semantics and Complexity of SPARQL" has had on the community since its original publication at ISWC 2006. The paper outlined a formal framework for SPARQL that has been cited and reused in hundreds of subsequent works studying the query language.
Renzo Angles and Claudio Gutierrez were awarded "Best Research Paper" at the International Semantic Web Conference (ISWC 2016) for their paper entitled "The multiset semantics of SPARQL patterns". The paper provides a formal study of the multiset (bag) semantics of SPARQL patterns, characterising how duplicate solutions behave under the operators of the query language.
Researcher Jorge Pérez took part in the Social Innovation Festival "Más mujeres en Tecnología" ("More Women in Technology") as part of the activities of the Millennium Nucleus Center for Semantic Web Research. In this panel, the Ada Lovelace Academy of Girls in Tech Chile brought together leaders from the Government, Comunidad Mujer, Mujeres del Pacífico, Laboratoria, C100, InnovaCien and Kodea to join efforts and review practices and strategies for building a technology-creation ecosystem with more women.
As part of its outreach activities, a researcher of the Millennium Nucleus Center for Semantic Web Research took part in the LEARN Workshop on Research Data: Policy and Strategy Implementation in Latin America and the Caribbean, giving the talk "Datos abiertos en el mundo de la Ciencia" ("Open Data in the World of Science"). More information at this link.