Efficient Computation of Certain Answers: Breaking the CQ Barrier

Computing certain answers is the standard way of answering queries over incomplete data; it is also used in many applications such as data integration, data exchange, consistent query answering, ontology-based data access, etc. Unfortunately certain answers are often computationally expensive, and in most applications their complexity is intolerable if one goes beyond the class of conjunctive queries (CQs), or a slight extension thereof. However, high computational complexity does not yet mean one cannot approximate certain answers efficiently. In this talk we survey several recent results on finding such efficient and correct approximations, going significantly beyond CQs. We do so in a setting of databases with missing values, and first-order (relational calculus/algebra) queries. Even the class of queries where the standard database evaluation produces correct answers is larger than previously thought. When it comes to approximations, we present two schemes with good theoretical complexity. One of them also performs very well in practice, and restores correctness of SQL query evaluation on databases with nulls.