Since its creation, the Web has been a major object of research in information management, where it has been studied primarily using classical paradigms. However, since the early 2000s we have been witnessing drastic changes in the area of Web data management. If we had to summarize them in one phrase, it would be: real distribution of big data.
In this new scenario, capturing the meaning of heterogeneous data and developing tools for processing it play a crucial role. The Semantic Web is a large-scale initiative led by the World Wide Web Consortium whose main objective is to achieve these goals, thus transforming the current Web of documents into a Web of data, where human users and computer applications can take better advantage of the massive amount of information stored in it. Some key steps have been taken toward these goals. However, we are still far from having techniques that take full advantage of the semantics and the logic behind Web data once its structure, scale and distribution are considered together as a full-fledged phenomenon.
The main goal of the Center for Semantic Web Research is to study how to effectively extract semantic data from the Web, and to develop the basic tools for such extraction. The Center brings together professors, researchers and students from the Pontifical Catholic University of Chile, the University of Chile and the University of Talca, and is funded by the Iniciativa Científica Milenio.
We will present an algebraic and a logical semantics for the core of SPARQL, a fragment that corresponds to Multiset Datalog with safe negation and to Multiset Relational Algebra. We will show that the classic correspondence between logic, algebra and query languages that holds under set semantics can be carried over to multisets.
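To make the difference between set and multiset semantics concrete, the following is a minimal sketch of two multiset (bag) operators over solution mappings. It is not the formalization presented in the talk: the helper names `mapping`, `compatible`, `join` and `project` are hypothetical, and bags are represented with Python's `collections.Counter`, where each count is the multiplicity of a mapping.

```python
from collections import Counter

def mapping(**bindings):
    # A solution mapping as a hashable set of (variable, value) pairs.
    return frozenset(bindings.items())

def compatible(m1, m2):
    # Two mappings are compatible if they agree on every shared variable.
    d1, d2 = dict(m1), dict(m2)
    return all(d1[v] == d2[v] for v in d1.keys() & d2.keys())

def join(left, right):
    # Bag join: the multiplicity of a result is the sum, over all pairs of
    # compatible mappings producing it, of the product of their multiplicities.
    out = Counter()
    for m1, c1 in left.items():
        for m2, c2 in right.items():
            if compatible(m1, m2):
                out[m1 | m2] += c1 * c2
    return out

def project(bag, variables):
    # Bag projection: duplicates are kept, so multiplicities add up.
    out = Counter()
    for m, c in bag.items():
        out[frozenset((v, x) for v, x in m if v in variables)] += c
    return out

left = Counter({mapping(x="a", y="b"): 2, mapping(x="c", y="d"): 1})
right = Counter({mapping(y="b", z="e"): 3})

joined = join(left, right)
projected = project(joined, {"x"})
```

Here the only joined mapping binds x, y and z and has multiplicity 2 × 3 = 6, and projecting onto x keeps that multiplicity. Under set semantics the same query would return that mapping exactly once.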
Statistical query (SQ) algorithms are the class of learning algorithms that can be implemented using approximate expectations of any given function of the input distribution, as opposed to direct access to i.i.d. samples. This computational model has a number of applications, ranging from noise-tolerant learning to differential privacy, and it has been used to obtain unconditional lower bounds on conjectured hard problems over distributions. In this talk I will give an introduction to the theory of SQ algorithms, and we will see some recent developments in the case of stochastic convex optimization. Our main contribution is establishing nearly optimal SQ algorithms for mean vector estimation (this includes stochastic linear programming), which serves as a basis to obtain SQ versions of various gradient-type and polynomial time algorithms for stochastic convex optimization. Time permitting, I will show some consequences of our results for learning of halfspaces, differential privacy, and proving unconditional lower bounds on the power of convex relaxations for random constraint satisfaction problems. This talk is based on joint work with Vitaly Feldman and Santosh Vempala, to appear in SODA 2017.
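As an illustration of the SQ access model described above, here is a minimal sketch that simulates a statistical query oracle over a small finite distribution and uses it for a naive coordinate-wise mean estimate. This is not the nearly optimal algorithm from the talk. The names `make_stat_oracle` and `estimate_mean`, the tolerance `tau`, and the example distribution are all assumptions made for this sketch, and the oracle's perturbation is simulated with seeded random noise rather than chosen adversarially.

```python
import random

def make_stat_oracle(points, probs, tau, seed=0):
    # Simulated oracle: for a query function q, return a value within tau
    # of the true expectation E[q(x)]. The algorithm never sees samples.
    rng = random.Random(seed)

    def stat(query):
        exact = sum(p * query(x) for x, p in zip(points, probs))
        return exact + rng.uniform(-tau, tau)

    return stat

def estimate_mean(stat, dim):
    # Naive approach: one statistical query per coordinate, so each
    # coordinate of the estimate is within tau of the true mean.
    return [stat(lambda x, i=i: x[i]) for i in range(dim)]

# Hypothetical distribution over three points in [-1, 1]^2.
points = [(1.0, -1.0), (-1.0, 1.0), (1.0, 1.0)]
probs = [0.5, 0.25, 0.25]
true_mean = [0.5, 0.0]
tau = 0.05

stat = make_stat_oracle(points, probs, tau)
estimate = estimate_mean(stat, dim=2)
max_error = max(abs(e - t) for e, t in zip(estimate, true_mean))
```

By construction `max_error` is at most `tau`. This per-coordinate guarantee is the simplest one available in the model; it says nothing about the query counts or error norms that a nearly optimal algorithm would achieve.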
Jorge Pérez, Marcelo Arenas and Claudio Gutierrez were awarded the Semantic Web Science Association (SWSA) Ten-Year Award at the International Semantic Web Conference (ISWC 2016) in recognition of the considerable impact that their paper "Semantics and Complexity of SPARQL" has had on the community since its original publication at ISWC 2006. The paper outlined a formal framework for SPARQL that has been cited and reused in hundreds of subsequent works studying the query language.
Renzo Angles and Claudio Gutierrez were awarded "Best Research Paper" at the International Semantic Web Conference (ISWC 2016) for their paper entitled "The multiset semantics of SPARQL patterns". The paper provides a formal comparison of the various ways in which negation can be expressed in SPARQL and possible semantics to characterise negation when duplicate results are allowed.
Researcher Jorge Pérez took part in the Social Innovation Festival "Más mujeres en Tecnología" (More Women in Technology) as part of the activities of the Núcleo Milenio Centro de Investigación de la Web Semántica (Millennium Nucleus Center for Semantic Web Research). In this panel, the Academia Ada Lovelace of Girls in Tech Chile brought together leaders from Government, Comunidad Mujer, Mujeres del Pacífico, Laboratoria, C100, InnovaCien and Kodea to join forces and review different practices and strategies for building a technology-creation ecosystem with more women.
As part of its outreach activities, the researcher from the Núcleo Milenio de la Web Semántica took part in the LEARN Workshop on Research Data: Implementation of Policies and Strategies in Latin America and the Caribbean, giving the talk "Datos abiertos en el mundo de la Ciencia" (Open Data in the World of Science).