Since its creation, the Web has been a central object of research in information management, studied primarily through classical paradigms. Since the early 2000s, however, we have been witnessing drastic changes in the area of Web data management. If we had to summarize them in one sentence, it would be: real distribution of big data.
In this new scenario, capturing the meaning of heterogeneous data and developing tools for its processing play a crucial role. The Semantic Web is an enormous initiative led by the World Wide Web Consortium whose main objective is to achieve these goals, thus transforming the current Web of documents into a Web of data, where human users and computer applications can take better advantage of the massive amount of information it stores. Some key steps have been taken toward these goals. However, we are still far from having techniques that take full advantage of the semantics and the logic behind Web data once its structure, scale, and distribution, taken together, are considered as a full-fledged phenomenon.
The main goal of the Center for Semantic Web Research is to study how to effectively extract semantic data from the Web, and to develop the basic tools for such extraction. This initiative brings together professors, researchers, and students from the Pontifical Catholic University of Chile, the University of Chile, and the University of Talca, and is funded by the Iniciativa Científica Milenio.
This talk addresses the label sparsity problem for Twitter polarity classification by automatically building two types of resources that can be exploited when labelled data is scarce: opinion lexicons, which are lists of words labelled by sentiment, and synthetically labelled tweets. We build Twitter-specific opinion lexicons by training word-level classifiers using representations that exploit different sources of information, such as (a) the morphological information conveyed by part-of-speech (POS) tags, (b) associations between words and the sentiment expressed in the tweets that contain them, and (c) distributional representations calculated from unlabelled tweets. Experimental results show that the generated lexicons produce significant improvements over existing manually annotated lexicons for tweet-level polarity classification. In the second part, we develop distant supervision methods for generating synthetic training data for Twitter polarity classification by exploiting unlabelled tweets and prior lexical knowledge. Positive and negative training instances are generated by averaging unlabelled tweets annotated according to a given polarity lexicon. We study different mechanisms for selecting the candidate tweets to be averaged. Our experimental results show that the training data generated by the proposed models produce classifiers that perform significantly better than classifiers trained from tweets annotated with emoticons, a popular distant supervision approach for Twitter sentiment analysis.
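As a rough illustration of the lexicon-based distant-supervision idea described above, the sketch below labels tweets with a seed polarity lexicon and averages same-polarity tweet vectors into synthetic training instances. The toy lexicon, the `vectorize` callback, and the grouping strategy are all illustrative assumptions, not the talk's actual implementation.

```python
import numpy as np

# Hypothetical seed polarity lexicon (real lexicons contain thousands of words).
LEXICON = {"love": "positive", "great": "positive",
           "hate": "negative", "awful": "negative"}

def tweet_polarity(tweet):
    """Label a tweet by its lexicon words; None if mixed or no lexicon word occurs."""
    votes = [LEXICON[w] for w in tweet.lower().split() if w in LEXICON]
    if votes and all(v == votes[0] for v in votes):
        return votes[0]
    return None

def synthetic_instances(tweets, vectorize, group_size=2):
    """Average vectors of same-polarity tweets into synthetic training instances."""
    buckets = {"positive": [], "negative": []}
    for t in tweets:
        label = tweet_polarity(t)
        if label:
            buckets[label].append(vectorize(t))
    instances = []
    for label, vecs in buckets.items():
        # Average consecutive groups of `group_size` tweet vectors.
        for i in range(0, len(vecs) - group_size + 1, group_size):
            instances.append((np.mean(vecs[i:i + group_size], axis=0), label))
    return instances
```

The averaged vectors can then be fed, together with their lexicon-derived labels, to any standard classifier; the talk studies smarter mechanisms for selecting which tweets to average.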
Weakly-sticky (WS) Datalog is an expressive member of the family of Datalog programs that is based on the syntactic notions of stickiness and weak-acyclicity. Query answering over WS programs has been investigated, but there is still much work to do on the design and implementation of practical query answering (QA) algorithms and their optimizations. Here, we study sticky and WS programs from the point of view of the behavior of the chase procedure, extending the stickiness property of the chase to that of generalized stickiness of the chase (gsch-property). With this property we specify the semantic class of GSCh programs, which includes sticky and WS programs, as well as other syntactic subclasses that we identify. In particular, we introduce joint-weakly-sticky (JWS) programs, which include WS programs. We also propose a bottom-up QA algorithm for a range of subclasses of GSCh. The algorithm runs in polynomial time (in data) for JWS programs. Unlike the WS class, JWS is closed under a general magic-sets rewriting procedure for the optimization of programs with existential rules. We apply the magic-sets rewriting in combination with the proposed QA algorithm for the optimization of QA over JWS programs.
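As a minimal illustration of the stickiness condition the abstract builds on (a standard textbook-style example, not taken from the paper itself): stickiness requires, roughly, that every variable occurring in a join in a rule body be propagated to the head, so that joined values are never lost during the chase.

```latex
% A rule satisfying stickiness: the join variable Y appears in the head.
r(X,Y),\; p(Y,Z) \;\rightarrow\; \exists W \, s(Y,W)
% The classic violation: transitivity loses the join variable Y.
r(X,Y),\; r(Y,Z) \;\rightarrow\; r(X,Z)
```

Weak notions such as WS relax this requirement for variables whose values are guaranteed (via weak-acyclicity arguments) to range over finitely many chase values.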
Former undergraduate student and current PhD student Pablo Muñoz, under the supervision of Pablo Barceló, has received the "Vienna Center for Logic and Algorithms Outstanding Undergraduate Research Award". This award is given by one of the most important computer science institutions in Europe, and Pablo has been invited to present his research work at the center this year.
The Council of Professors and Heads of Computing (CPHC), in conjunction with the British Computer Society (BCS) and the BCS Academy of Computing, has selected Dr. Juan Reutter's dissertation as the winner of the BCS Distinguished Dissertation Award, which annually selects for publication the best British PhD/DPhil dissertation in computer science.
The website http://constitucionabierta.cl/, where citizens can upload and browse the minutes of the Encuentros Locales Autoconvocados that they themselves decide to make public, has attracted the interest of the national press. Below you can read the articles that have been published in various national media outlets:
Our researcher Jorge Pérez gave a talk in Arica titled "Sharing Economy: Economía colaborativa, datos y algoritmos" (Sharing economy: collaborative economy, data and algorithms). "I have a broader educational message, which is to try to get these topics understood, beyond any particular dispute over whether or not a given service is better; the idea is that people understand the implications behind all of this, especially the use of data, which is tied to computing technology." http://bit.ly/24P5IDz