Querying Wikidata: Comparing SPARQL, Relational and Graph Databases

Wikidata is a new knowledge base overseen by the Wikimedia Foundation and collaboratively edited by a community of thousands of users. The goal of Wikidata is to provide a common, interoperable source of factual information for Wikimedia projects, foremost of which is Wikipedia. In this talk, we present the results of experiments that compare the efficiency of various database engines for querying the Wikidata knowledge base, which can be conceptualized as a directed edge-labelled graph whose edges can be annotated with meta-information called qualifiers. We take two popular SPARQL databases (Virtuoso, Blazegraph), a popular relational database (PostgreSQL), and a popular graph database (Neo4j) for comparison, and discuss various options for representing Wikidata in the data model of each engine. We design a set of experiments to test the relative query performance of these representations in the context of their respective engines. We first execute a large set of atomic lookups to establish a baseline performance for each test setting, and subsequently perform experiments on instances of more complex graph patterns based on real-world examples. We conclude with a summary of the observed strengths and limitations of the engines. The talk is based on a paper written with Aidan Hogan, Christian Riveros, Carlos Rojas and Enzo Zerega, which will be presented at ISWC 2016.
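
To make the data model concrete, the following is a minimal Python sketch of a directed edge-labelled graph whose edges carry qualifiers, together with one possible way of flattening it into plain triples by introducing a node per statement (reification). It is an illustration only and not necessarily one of the representations evaluated in the paper; the identifiers (Q_person, P_position, and so on) and the claim:, value: and qualifier: prefixes are hypothetical placeholders, not real Wikidata IDs or vocabulary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    """One edge of the graph: subject --prop--> value, plus optional qualifiers."""
    subject: str
    prop: str
    value: str
    qualifiers: tuple = ()  # pairs of (qualifier property, qualifier value)


def reify(statements):
    """Flatten qualified edges into plain triples using one node per statement.

    Assumed scheme (hypothetical prefixes): the subject points to a statement
    node, and the statement node points to the value and to each qualifier.
    """
    triples = []
    for i, st in enumerate(statements):
        node = f"stmt{i}"
        triples.append((st.subject, f"claim:{st.prop}", node))
        triples.append((node, f"value:{st.prop}", st.value))
        for q_prop, q_value in st.qualifiers:
            triples.append((node, f"qualifier:{q_prop}", q_value))
    return triples


# Hypothetical data: placeholder identifiers, not real Wikidata entities.
example = [
    Statement("Q_person", "P_position", "Q_office",
              qualifiers=(("P_start", "2009"), ("P_end", "2017"))),
    Statement("Q_person", "P_birthplace", "Q_city"),
]

triples = reify(example)
for triple in triples:
    print(triple)
```

A triple list of this kind could be loaded into an RDF store as is, into a single three-column relational table, or into a property graph; attaching the qualifiers directly to edges as properties, where the engine supports it, would be an alternative design.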