subscribe to arXiv mailings

Practical Reasoning in DatalogMTL

Authors: Dingmin Wang, Przemysław A. Wałęga, Pan Hu, Bernardo Cuenca Grau

Abstract: DatalogMTL is an extension of Datalog with metric temporal operators that has found an increasing number of applications in recent years. Reasoning in DatalogMTL is, however, of high computational complexity, which makes reasoning in modern data-intensive applications challenging. In this paper we present a practical reasoning algorithm for the full DatalogMTL language, which we have implemented i… ▽ More DatalogMTL is an extension of Datalog with metric temporal operators that has found an increasing number of applications in recent years. Reasoning in DatalogMTL is, however, of high computational complexity, which makes reasoning in modern data-intensive applications challenging. In this paper we present a practical reasoning algorithm for the full DatalogMTL language, which we have implemented in a system called MeTeoR. Our approach effectively combines an optimised (but generally non-terminating) materialisation (a.k.a. forward chaining) procedure, which provides scalable behaviour, with an automata-based component that guarantees termination and completeness. To ensure favourable scalability of the materialisation component, we propose a novel seminaïve materialisation procedure for DatalogMTL enjoying the non-repetition property, which ensures that each specific rule application will be considered at most once throughout the entire execution of the algorithm. Moreover, our materialisation procedure is enhanced with additional optimisations which further reduce the number of redundant computations performed during materialisation by disregarding rules as soon as it is certain that they cannot derive new facts in subsequent materialisation steps. Our extensive evaluation supports the practicality of our approach. △ Less

Submitted 5 January, 2024; originally announced January 2024.

Comments: Under consideration in Theory and Practice of Logic Programming (TPLP). arXiv admin note: text overlap with arXiv:2208.07100

arXiv:2306.07625 [pdf, ps, other]

The Stable Model Semantics of Datalog with Metric Temporal Operators

Authors: Przemysław A. Wałęga, David J. Tena Cucala, Bernardo Cuenca Grau, Egor V. Kostylev

Abstract: We introduce negation under the stable model semantics in DatalogMTL - a temporal extension of Datalog with metric temporal operators. As a result, we obtain a rule language which combines the power of answer set programming with the temporal dimension provided by metric operators. We show that, in this setting, reasoning becomes undecidable over the rational timeline, and decidable in EXPSPACE in… ▽ More We introduce negation under the stable model semantics in DatalogMTL - a temporal extension of Datalog with metric temporal operators. As a result, we obtain a rule language which combines the power of answer set programming with the temporal dimension provided by metric operators. We show that, in this setting, reasoning becomes undecidable over the rational timeline, and decidable in EXPSPACE in data complexity over the integer timeline. We also show that, if we restrict our attention to forward-propagating programs, reasoning over the integer timeline becomes PSPACE-complete in data complexity, and hence, no harder than over positive programs; however, reasoning over the rational timeline in this fragment remains undecidable. Under consideration in Theory and Practice of Logic Programming (TPLP). △ Less

Submitted 13 June, 2023; originally announced June 2023.

Comments: Under consideration in Theory and Practice of Logic Programming (TPLP)

arXiv:2306.04814 [pdf, ps, other]

Revisiting Inferential Benchmarks for Knowledge Graph Completion

Authors: Shuwen Liu, Bernardo Cuenca Grau, Ian Horrocks, Egor V. Kostylev

Abstract: Knowledge Graph (KG) completion is the problem of extending an incomplete KG with missing facts. A key feature of Machine Learning approaches for KG completion is their ability to learn inference patterns, so that the predicted facts are the results of applying these patterns to the KG. Standard completion benchmarks, however, are not well-suited for evaluating models' abilities to learn patterns,… ▽ More Knowledge Graph (KG) completion is the problem of extending an incomplete KG with missing facts. A key feature of Machine Learning approaches for KG completion is their ability to learn inference patterns, so that the predicted facts are the results of applying these patterns to the KG. Standard completion benchmarks, however, are not well-suited for evaluating models' abilities to learn patterns, because the training and test sets of these benchmarks are a random split of a given KG and hence do not capture the causality of inference patterns. We propose a novel approach for designing KG completion benchmarks based on the following principles: there is a set of logical rules so that the missing facts are the results of the rules' application; the training set includes both premises matching rule antecedents and the corresponding conclusions; the test set consists of the results of applying the rules to the training set; the negative examples are designed to discourage the models from learning rules not entailed by the rule set. We use our methodology to generate several benchmarks and evaluate a wide range of existing KG completion systems. Our results provide novel insights on the ability of existing models to induce inference patterns from incomplete KGs. △ Less

Submitted 7 June, 2023; originally announced June 2023.

Comments: Accepted by the 20th International Conference on Principles of Knowledge Representation and Reasoning (KR 2023)

arXiv:2305.18015 [pdf, ps, other]

On the Correspondence Between Monotonic Max-Sum GNNs and Datalog

Authors: David Tena Cucala, Bernardo Cuenca Grau, Boris Motik, Egor V. Kostylev

Abstract: Although there has been significant interest in applying machine learning techniques to structured data, the expressivity (i.e., a description of what can be learned) of such techniques is still poorly understood. In this paper, we study data transformations based on graph neural networks (GNNs). First, we note that the choice of how a dataset is encoded into a numeric form processable by a GNN ca… ▽ More Although there has been significant interest in applying machine learning techniques to structured data, the expressivity (i.e., a description of what can be learned) of such techniques is still poorly understood. In this paper, we study data transformations based on graph neural networks (GNNs). First, we note that the choice of how a dataset is encoded into a numeric form processable by a GNN can obscure the characterisation of a model's expressivity, and we argue that a canonical encoding provides an appropriate basis. Second, we study the expressivity of monotonic max-sum GNNs, which cover a subclass of GNNs with max and sum aggregation functions. We show that, for each such GNN, one can compute a Datalog program such that applying the GNN to any dataset produces the same facts as a single round of application of the program's rules to the dataset. Monotonic max-sum GNNs can sum an unbounded number of feature vectors which can result in arbitrarily large feature values, whereas rule application requires only a bounded number of constants. Hence, our result shows that the unbounded summation of monotonic max-sum GNNs does not increase their expressive power. Third, we sharpen our result to the subclass of monotonic max GNNs, which use only the max aggregation function, and identify a corresponding class of Datalog programs. △ Less

Submitted 15 June, 2023; v1 submitted 29 May, 2023; originally announced May 2023.

arXiv:2208.07100 [pdf, ps, other]

Seminaive Materialisation in DatalogMTL

Authors: Dingmin Wang, Przemysław Andrzej Wałęga, Bernardo Cuenca Grau

Abstract: DatalogMTL is an extension of Datalog with metric temporal operators that has found applications in temporal ontology-based data access and query answering, as well as in stream reasoning. Practical algorithms for DatalogMTL are reliant on materialisation-based reasoning, where temporal facts are derived in a forward chaining manner in successive rounds of rule applications. Current materialisatio… ▽ More DatalogMTL is an extension of Datalog with metric temporal operators that has found applications in temporal ontology-based data access and query answering, as well as in stream reasoning. Practical algorithms for DatalogMTL are reliant on materialisation-based reasoning, where temporal facts are derived in a forward chaining manner in successive rounds of rule applications. Current materialisation-based procedures are, however, based on a naive evaluation strategy, where the main source of inefficiency stems from redundant computations. In this paper, we propose a materialisation-based procedure which, analogously to the classical seminaive algorithm in Datalog, aims at minimising redundant computation by ensuring that each temporal rule instance is considered at most once during the execution of the algorithm. Our experiments show that our optimised seminaive strategy for DatalogMTL is able to significantly reduce materialisation times. △ Less

Submitted 27 September, 2022; v1 submitted 15 August, 2022; originally announced August 2022.

Comments: Accepted by Declarative AI 2022 (RuleML+RR 2022)

Report number: 19

arXiv:2206.00362 [pdf, other]

An Empirical Study of Retrieval-enhanced Graph Neural Networks

Authors: Dingmin Wang, Shengchao Liu, Hanchen Wang, Bernardo Cuenca Grau, Linfeng Song, Jian Tang, Song Le, Qi Liu

Abstract: Graph Neural Networks (GNNs) are effective tools for graph representation learning. Most GNNs rely on a recursive neighborhood aggregation scheme, named message passing, thereby their theoretical expressive power is limited to the first-order Weisfeiler-Lehman test (1-WL). An effective approach to this challenge is to explicitly retrieve some annotated examples used to enhance GNN models. While re… ▽ More Graph Neural Networks (GNNs) are effective tools for graph representation learning. Most GNNs rely on a recursive neighborhood aggregation scheme, named message passing, thereby their theoretical expressive power is limited to the first-order Weisfeiler-Lehman test (1-WL). An effective approach to this challenge is to explicitly retrieve some annotated examples used to enhance GNN models. While retrieval-enhanced models have been proved to be effective in many language and vision domains, it remains an open question how effective retrieval-enhanced GNNs are when applied to graph datasets. Motivated by this, we want to explore how the retrieval idea can help augment the useful information learned in the graph neural networks, and we design a retrieval-enhanced scheme called GRAPHRETRIEVAL, which is agnostic to the choice of graph neural network models. In GRAPHRETRIEVAL, for each input graph, similar graphs together with their ground-true labels are retrieved from an existing database. Thus they can act as a potential enhancement to complete various graph property predictive tasks. We conduct comprehensive experiments over 13 datasets, and we observe that GRAPHRETRIEVAL is able to reach substantial improvements over existing GNNs. Moreover, our empirical study also illustrates that retrieval enhancement is a promising remedy for alleviating the long-tailed label distribution problem. △ Less

Submitted 17 September, 2023; v1 submitted 1 June, 2022; originally announced June 2022.

Comments: Accepted by ECAI 2023

arXiv:2205.09901 [pdf, other]

Cardinality-Minimal Explanations for Monotonic Neural Networks

Authors: Ouns El Harzli, Bernardo Cuenca Grau, Ian Horrocks

Abstract: In recent years, there has been increasing interest in explanation methods for neural model predictions that offer precise formal guarantees. These include abductive (respectively, contrastive) methods, which aim to compute minimal subsets of input features that are sufficient for a given prediction to hold (respectively, to change a given prediction). The corresponding decision problems are, howe… ▽ More In recent years, there has been increasing interest in explanation methods for neural model predictions that offer precise formal guarantees. These include abductive (respectively, contrastive) methods, which aim to compute minimal subsets of input features that are sufficient for a given prediction to hold (respectively, to change a given prediction). The corresponding decision problems are, however, known to be intractable. In this paper, we investigate whether tractability can be regained by focusing on neural models implementing a monotonic function. Although the relevant decision problems remain intractable, we can show that they become solvable in polynomial time by means of greedy algorithms if we additionally assume that the activation functions are continuous everywhere and differentiable almost everywhere. Our experiments suggest favourable performance of our algorithms. △ Less

Submitted 2 May, 2023; v1 submitted 19 May, 2022; originally announced May 2022.

arXiv:2201.04596 [pdf, ps, other]

MeTeoR: Practical Reasoning in Datalog with Metric Temporal Operators

Authors: Dingmin Wang, Pan Hu, Przemysław Andrzej Wałęga, Bernardo Cuenca Grau

Abstract: DatalogMTL is an extension of Datalog with operators from metric temporal logic which has received significant attention in recent years. It is a highly expressive knowledge representation language that is well-suited for applications in temporal ontology-based query answering and stream processing. Reasoning in DatalogMTL is, however, of high computational complexity, making implementation challe… ▽ More DatalogMTL is an extension of Datalog with operators from metric temporal logic which has received significant attention in recent years. It is a highly expressive knowledge representation language that is well-suited for applications in temporal ontology-based query answering and stream processing. Reasoning in DatalogMTL is, however, of high computational complexity, making implementation challenging and hindering its adoption in applications. In this paper, we present a novel approach for practical reasoning in DatalogMTL which combines materialisation (a.k.a. forward chaining) with automata-based techniques. We have implemented this approach in a reasoner called MeTeoR and evaluated its performance using a temporal extension of the Lehigh University Benchmark and a benchmark based on real-world meteorological data. Our experiments show that MeTeoR is a scalable system which enables reasoning over complex temporal rules and datasets involving tens of millions of temporal facts. △ Less

Submitted 12 January, 2022; originally announced January 2022.

Comments: Accepted To AAAI 2022

arXiv:2102.07238 [pdf, other]

Double-descent curves in neural networks: a new perspective using Gaussian processes

Authors: Ouns El Harzli, Bernardo Cuenca Grau, Guillermo Valle-Pérez, Ard A. Louis

Abstract: Double-descent curves in neural networks describe the phenomenon that the generalisation error initially descends with increasing parameters, then grows after reaching an optimal number of parameters which is less than the number of data points, but then descends again in the overparameterized regime. In this paper, we use techniques from random matrix theory to characterize the spectral distribut… ▽ More Double-descent curves in neural networks describe the phenomenon that the generalisation error initially descends with increasing parameters, then grows after reaching an optimal number of parameters which is less than the number of data points, but then descends again in the overparameterized regime. In this paper, we use techniques from random matrix theory to characterize the spectral distribution of the empirical feature covariance matrix as a width-dependent perturbation of the spectrum of the neural network Gaussian process (NNGP) kernel, thus establishing a novel connection between the NNGP literature and the random matrix theory literature in the context of neural networks. Our analytical expression allows us to study the generalisation behavior of the corresponding kernel and GP regression, and provides a new interpretation of the double-descent phenomenon, namely as governed by the discrepancy between the width-dependent empirical kernel and the width-independent NNGP kernel. △ Less

Submitted 25 May, 2023; v1 submitted 14 February, 2021; originally announced February 2021.

arXiv:1808.02291 [pdf, other]

The Window Validity Problem in Rule-Based Stream Reasoning

Authors: Alessandro Ronca, Mark Kaminski, Bernardo Cuenca Grau, Ian Horrocks

Abstract: Rule-based temporal query languages provide the expressive power and flexibility required to capture in a natural way complex analysis tasks over streaming data. Stream processing applications, however, typically require near real-time response using limited resources. In particular, it becomes essential that the underpinning query language has favourable computational properties and that stream p… ▽ More Rule-based temporal query languages provide the expressive power and flexibility required to capture in a natural way complex analysis tasks over streaming data. Stream processing applications, however, typically require near real-time response using limited resources. In particular, it becomes essential that the underpinning query language has favourable computational properties and that stream processing algorithms are able to keep only a small number of previously received facts in memory at any point in time without sacrificing correctness. In this paper, we propose a recursive fragment of temporal Datalog with tractable data complexity and study the properties of a generic stream reasoning algorithm for this fragment. We focus on the window validity problem as a way to minimise the number of time points for which the stream reasoning algorithm needs to keep data in memory at any point in time. △ Less

Submitted 15 November, 2018; v1 submitted 7 August, 2018; originally announced August 2018.

arXiv:1805.01396 [pdf, ps, other]

Consequence-based Reasoning for Description Logics with Disjunction, Inverse Roles, Number Restrictions, and Nominals

Authors: David Tena Cucala, Bernardo Cuenca Grau, Ian Horrocks

Abstract: We present a consequence-based calculus for concept subsumption and classification in the description logic ALCHOIQ, which extends ALC with role hierarchies, inverse roles, number restrictions, and nominals. By using standard transformations, our calculus extends to SROIQ, which covers all of OWL 2 DL except for datatypes. A key feature of our calculus is its pay-as-you-go behaviour: unlike existi… ▽ More We present a consequence-based calculus for concept subsumption and classification in the description logic ALCHOIQ, which extends ALC with role hierarchies, inverse roles, number restrictions, and nominals. By using standard transformations, our calculus extends to SROIQ, which covers all of OWL 2 DL except for datatypes. A key feature of our calculus is its pay-as-you-go behaviour: unlike existing algorithms, our calculus is worst-case optimal for all the well-known proper fragments of ALCHOIQ, albeit not for the full logic. △ Less

Submitted 3 May, 2018; originally announced May 2018.

arXiv:1804.09473 [pdf, other]

Stratified Negation in Limit Datalog Programs

Authors: Mark Kaminski, Bernardo Cuenca Grau, Egor V. Kostylev, Boris Motik, Ian Horrocks

Abstract: There has recently been an increasing interest in declarative data analysis, where analytic tasks are specified using a logical language, and their implementation and optimisation are delegated to a general-purpose query engine. Existing declarative languages for data analysis can be formalised as variants of logic programming equipped with arithmetic function symbols and/or aggregation, and are t… ▽ More There has recently been an increasing interest in declarative data analysis, where analytic tasks are specified using a logical language, and their implementation and optimisation are delegated to a general-purpose query engine. Existing declarative languages for data analysis can be formalised as variants of logic programming equipped with arithmetic function symbols and/or aggregation, and are typically undecidable. In prior work, the language of $\mathit{limit\ programs}$ was proposed, which is sufficiently powerful to capture many analysis tasks and has decidable entailment problem. Rules in this language, however, do not allow for negation. In this paper, we study an extension of limit programs with stratified negation-as-failure. We show that the additional expressive power makes reasoning computationally more demanding, and provide tight data complexity bounds. We also identify a fragment with tractable data complexity and sufficient expressivity to capture many relevant tasks. △ Less

Submitted 25 April, 2018; originally announced April 2018.

Comments: 14 pages; full version of a paper accepted at IJCAI-18

arXiv:1711.04013 [pdf, other]

Stream Reasoning in Temporal Datalog

Authors: Alessandro Ronca, Mark Kaminski, Bernardo Cuenca Grau, Boris Motik, Ian Horrocks

Abstract: In recent years, there has been an increasing interest in extending traditional stream processing engines with logical, rule-based, reasoning capabilities. This poses significant theoretical and practical challenges since rules can derive new information and propagate it both towards past and future time points; as a result, streamed query answers can depend on data that has not yet been received,… ▽ More In recent years, there has been an increasing interest in extending traditional stream processing engines with logical, rule-based, reasoning capabilities. This poses significant theoretical and practical challenges since rules can derive new information and propagate it both towards past and future time points; as a result, streamed query answers can depend on data that has not yet been received, as well as on data that arrived far in the past. Stream reasoning algorithms, however, must be able to stream out query answers as soon as possible, and can only keep a limited number of previous input facts in memory. In this paper, we propose novel reasoning problems to deal with these challenges, and study their computational properties on Datalog extended with a temporal sort and the successor function (a core rule-based language for stream reasoning applications). △ Less

Submitted 15 November, 2018; v1 submitted 10 November, 2017; originally announced November 2017.

arXiv:1705.07105 [pdf, other]

The Bag Semantics of Ontology-Based Data Access

Authors: Charalampos Nikolaou, Egor V. Kostylev, George Konstantinidis, Mark Kaminski, Bernardo Cuenca Grau, Ian Horrocks

Abstract: Ontology-based data access (OBDA) is a popular approach for integrating and querying multiple data sources by means of a shared ontology. The ontology is linked to the sources using mappings, which assign views over the data to ontology predicates. Motivated by the need for OBDA systems supporting database-style aggregate queries, we propose a bag semantics for OBDA, where duplicate tuples in the… ▽ More Ontology-based data access (OBDA) is a popular approach for integrating and querying multiple data sources by means of a shared ontology. The ontology is linked to the sources using mappings, which assign views over the data to ontology predicates. Motivated by the need for OBDA systems supporting database-style aggregate queries, we propose a bag semantics for OBDA, where duplicate tuples in the views defined by the mappings are retained, as is the case in standard databases. We show that bag semantics makes conjunctive query answering in OBDA coNP-hard in data complexity. To regain tractability, we consider a rather general class of queries and show its rewritability to a generalisation of the relational calculus to bags. △ Less

Submitted 19 May, 2017; originally announced May 2017.

arXiv:1705.06927 [pdf, other]

Foundations of Declarative Data Analysis Using Limit Datalog Programs

Authors: Mark Kaminski, Bernardo Cuenca Grau, Egor V. Kostylev, Boris Motik, Ian Horrocks

Abstract: Motivated by applications in declarative data analysis, we study $\mathit{Datalog}_{\mathbb{Z}}$---an extension of positive Datalog with arithmetic functions over integers. This language is known to be undecidable, so we propose two fragments. In $\mathit{limit}~\mathit{Datalog}_{\mathbb{Z}}$ predicates are axiomatised to keep minimal/maximal numeric values, allowing us to show that fact entailmen… ▽ More Motivated by applications in declarative data analysis, we study $\mathit{Datalog}_{\mathbb{Z}}$---an extension of positive Datalog with arithmetic functions over integers. This language is known to be undecidable, so we propose two fragments. In $\mathit{limit}~\mathit{Datalog}_{\mathbb{Z}}$ predicates are axiomatised to keep minimal/maximal numeric values, allowing us to show that fact entailment is coNExpTime-complete in combined, and coNP-complete in data complexity. Moreover, an additional $\mathit{stability}$ requirement causes the complexity to drop to ExpTime and PTime, respectively. Finally, we show that stable $\mathit{Datalog}_{\mathbb{Z}}$ can express many useful data analysis tasks, and so our results provide a sound foundation for the development of advanced information systems. △ Less

Submitted 12 November, 2017; v1 submitted 19 May, 2017; originally announced May 2017.

Comments: 23 pages; full version of a paper accepted at IJCAI-17; v2 fixes some typos and improves the acknowledgments

arXiv:1602.04498 [pdf, other]

Extending Consequence-Based Reasoning to SRIQ

Authors: Andrew Bate, Boris Motik, Bernardo Cuenca Grau, František Simančík, Ian Horrocks

Abstract: Consequence-based calculi are a family of reasoning algorithms for description logics (DLs), and they combine hypertableau and resolution in a way that often achieves excellent performance in practice. Up to now, however, they were proposed for either Horn DLs (which do not support disjunction), or for DLs without counting quantifiers. In this paper we present a novel consequence-based calculus fo… ▽ More Consequence-based calculi are a family of reasoning algorithms for description logics (DLs), and they combine hypertableau and resolution in a way that often achieves excellent performance in practice. Up to now, however, they were proposed for either Horn DLs (which do not support disjunction), or for DLs without counting quantifiers. In this paper we present a novel consequence-based calculus for SRIQ---a rich DL that supports both features. This extension is non-trivial since the intermediate consequences that need to be derived during reasoning cannot be captured using DLs themselves. The results of our preliminary performance evaluation suggest the feasibility of our approach in practice. △ Less

Submitted 23 February, 2016; v1 submitted 14 February, 2016; originally announced February 2016.

arXiv:1504.06529 [pdf, other]

Controlled Query Evaluation for Datalog and OWL 2 Profile Ontologies

Authors: Bernardo Cuenca Grau, Evgeny Kharlamov, Egor V. Kostylev, Dmitriy Zheleznyakov

Abstract: We study confidentiality enforcement in ontologies under the Controlled Query Evaluation framework, where a policy specifies the sensitive information and a censor ensures that query answers that may compromise the policy are not returned. We focus on censors that ensure confidentiality while maximising information access, and consider both Datalog and the OWL 2 profiles as ontology languages. We study confidentiality enforcement in ontologies under the Controlled Query Evaluation framework, where a policy specifies the sensitive information and a censor ensures that query answers that may compromise the policy are not returned. We focus on censors that ensure confidentiality while maximising information access, and consider both Datalog and the OWL 2 profiles as ontology languages. △ Less

Submitted 24 April, 2015; originally announced April 2015.

arXiv:1504.05150 [pdf, other]

Computing Horn Rewritings of Description Logics Ontologies

Authors: Mark Kaminski, Bernardo Cuenca Grau

Abstract: We study the problem of rewriting an ontology O1 expressed in a DL L1 into an ontology O2 in a Horn DL L2 such that O1 and O2 are equisatisfiable when extended with an arbitrary dataset. Ontologies that admit such rewritings are amenable to reasoning techniques ensuring tractability in data complexity. After showing undecidability whenever L1 extends ALCF, we focus on devising efficiently checkabl… ▽ More We study the problem of rewriting an ontology O1 expressed in a DL L1 into an ontology O2 in a Horn DL L2 such that O1 and O2 are equisatisfiable when extended with an arbitrary dataset. Ontologies that admit such rewritings are amenable to reasoning techniques ensuring tractability in data complexity. After showing undecidability whenever L1 extends ALCF, we focus on devising efficiently checkable conditions that ensure existence of a Horn rewriting. By lifting existing techniques for rewriting Disjunctive Datalog programs into plain Datalog to the case of arbitrary first-order programs with function symbols, we identify a class of ontologies that admit Horn rewritings of polynomial size. Our experiments indicate that many real-world ontologies satisfy our sufficient conditions and thus admit polynomial Horn rewritings. △ Less

Submitted 21 April, 2015; v1 submitted 20 April, 2015; originally announced April 2015.

Comments: 15 pages. To appear in IJCAI-15

arXiv:1411.5313 [pdf, other]

Ontology Module Extraction via Datalog Reasoning

Authors: Ana Armas Romero, Mark Kaminski, Bernardo Cuenca Grau, Ian Horrocks

Abstract: Module extraction - the task of computing a (preferably small) fragment M of an ontology T that preserves entailments over a signature S - has found many applications in recent years. Extracting modules of minimal size is, however, computationally hard, and often algorithmically infeasible. Thus, practical techniques are based on approximations, where M provably captures the relevant entailments,… ▽ More Module extraction - the task of computing a (preferably small) fragment M of an ontology T that preserves entailments over a signature S - has found many applications in recent years. Extracting modules of minimal size is, however, computationally hard, and often algorithmically infeasible. Thus, practical techniques are based on approximations, where M provably captures the relevant entailments, but is not guaranteed to be minimal. Existing approximations, however, ensure that M preserves all second-order entailments of T w.r.t. S, which is stronger than is required in many applications, and may lead to large modules in practice. In this paper we propose a novel approach in which module extraction is reduced to a reasoning problem in datalog. Our approach not only generalises existing approximations in an elegant way, but it can also be tailored to preserve only specific kinds of entailments, which allows us to extract significantly smaller modules. An evaluation on widely-used ontologies has shown very encouraging results. △ Less

Submitted 20 November, 2014; v1 submitted 19 November, 2014; originally announced November 2014.

Comments: 13 pages. To appear in AAAI-15

arXiv:1406.4110 [pdf]

doi 10.1613/jair.3949

Acyclicity Notions for Existential Rules and Their Application to Query Answering in Ontologies

Authors: Bernardo Cuenca Grau, Ian Horrocks, Markus Krötzsch, Clemens Kupke, Despoina Magka, Boris Motik, Zhe Wang

Abstract: Answering conjunctive queries (CQs) over a set of facts extended with existential rules is a prominent problem in knowledge representation and databases. This problem can be solved using the chase algorithm, which extends the given set of facts with fresh facts in order to satisfy the rules. If the chase terminates, then CQs can be evaluated directly in the resulting set of facts. The chase, howev… ▽ More Answering conjunctive queries (CQs) over a set of facts extended with existential rules is a prominent problem in knowledge representation and databases. This problem can be solved using the chase algorithm, which extends the given set of facts with fresh facts in order to satisfy the rules. If the chase terminates, then CQs can be evaluated directly in the resulting set of facts. The chase, however, does not terminate necessarily, and checking whether the chase terminates on a given set of rules and facts is undecidable. Numerous acyclicity notions were proposed as sufficient conditions for chase termination. In this paper, we present two new acyclicity notions called model-faithful acyclicity (MFA) and model-summarising acyclicity (MSA). Furthermore, we investigate the landscape of the known acyclicity notions and establish a complete taxonomy of all notions known to us. Finally, we show that MFA and MSA generalise most of these notions. Existential rules are closely related to the Horn fragments of the OWL 2 ontology language; furthermore, several prominent OWL 2 reasoners implement CQ answering by using the chase to materialise all relevant facts. In order to avoid termination problems, many of these systems handle only the OWL 2 RL profile of OWL 2; furthermore, some systems go beyond OWL 2 RL, but without any termination guarantees. In this paper we also investigate whether various acyclicity notions can provide a principled and practical solution to these problems. On the theoretical side, we show that query answering for acyclic ontologies is of lower complexity than for general ontologies. On the practical side, we show that many of the commonly used OWL 2 ontologies are MSA, and that the number of facts obtained by materialisation is not too large. Our results thus suggest that principled development of materialisation-based OWL 2 reasoners is practically feasible. △ Less

Submitted 3 February, 2014; originally announced June 2014.

Journal ref: Journal Of Artificial Intelligence Research, Volume 47, pages 741-808, 2013

arXiv:1404.3141 [pdf, other]

Datalog Rewritability of Disjunctive Datalog Programs and its Applications to Ontology Reasoning

Authors: Mark Kaminski, Yavor Nenov, Bernardo Cuenca Grau

Abstract: We study the problem of rewriting a disjunctive datalog program into plain datalog. We show that a disjunctive program is rewritable if and only if it is equivalent to a linear disjunctive program, thus providing a novel characterisation of datalog rewritability. Motivated by this result, we propose weakly linear disjunctive datalog---a novel rule-based KR language that extends both datalog and li… ▽ More We study the problem of rewriting a disjunctive datalog program into plain datalog. We show that a disjunctive program is rewritable if and only if it is equivalent to a linear disjunctive program, thus providing a novel characterisation of datalog rewritability. Motivated by this result, we propose weakly linear disjunctive datalog---a novel rule-based KR language that extends both datalog and linear disjunctive datalog and for which reasoning is tractable in data complexity. We then explore applications of weakly linear programs to ontology reasoning and propose a tractable extension of OWL 2 RL with disjunctive axioms. Our empirical results suggest that many non-Horn ontologies can be reduced to weakly linear programs and that query answering over such ontologies using a datalog engine is feasible in practice. △ Less

Submitted 11 April, 2014; originally announced April 2014.

Comments: 14 pages. To appear at AAAI-14

arXiv:1401.5853 [pdf]

doi 10.1613/jair.3579

Reasoning over Ontologies with Hidden Content: The Import-by-Query Approach

Authors: Bernardo Cuenca Grau, Boris Motik

Abstract: There is currently a growing interest in techniques for hiding parts of the signature of an ontology Kh that is being reused by another ontology Kv. Towards this goal, in this paper we propose the import-by-query framework, which makes the content of Kh accessible through a limited query interface. If Kv reuses the symbols from Kh in a certain restricted way, one can reason over Kv U Kh by accessi… ▽ More There is currently a growing interest in techniques for hiding parts of the signature of an ontology Kh that is being reused by another ontology Kv. Towards this goal, in this paper we propose the import-by-query framework, which makes the content of Kh accessible through a limited query interface. If Kv reuses the symbols from Kh in a certain restricted way, one can reason over Kv U Kh by accessing only Kv and the query interface. We map out the landscape of the import-by-query problem. In particular, we outline the limitations of our framework and prove that certain restrictions on the expressivity of Kh and the way in which Kv reuses symbols from Kh are strictly necessary to enable reasoning in our setting. We also identify cases in which reasoning is possible and we present suitable import-by-query reasoning algorithms. △ Less

Submitted 22 January, 2014; originally announced January 2014.

Journal ref: Journal Of Artificial Intelligence Research, Volume 45, pages 197-255, 2012

arXiv:1401.4604 [pdf]

doi 10.1613/jair.3470

Completeness Guarantees for Incomplete Ontology Reasoners: Theory and Practice

Authors: Bernardo Cuenca Grau, Boris Motik, Giorgos Stoilos, Ian Horrocks

Abstract: To achieve scalability of query answering, the developers of Semantic Web applications are often forced to use incomplete OWL 2 reasoners, which fail to derive all answers for at least one query, ontology, and data set. The lack of completeness guarantees, however, may be unacceptable for applications in areas such as health care and defence, where missing answers can adversely affect the applicat… ▽ More To achieve scalability of query answering, the developers of Semantic Web applications are often forced to use incomplete OWL 2 reasoners, which fail to derive all answers for at least one query, ontology, and data set. The lack of completeness guarantees, however, may be unacceptable for applications in areas such as health care and defence, where missing answers can adversely affect the applications functionality. Furthermore, even if an application can tolerate some level of incompleteness, it is often advantageous to estimate how many and what kind of answers are being lost. In this paper, we present a novel logic-based framework that allows one to check whether a reasoner is complete for a given query Q and ontology T---that is, whether the reasoner is guaranteed to compute all answers to Q w.r.t. T and an arbitrary data set A. Since ontologies and typical queries are often fixed at application design time, our approach allows application developers to check whether a reasoner known to be incomplete in general is actually complete for the kinds of input relevant for the application. We also present a technique that, given a query Q, an ontology T, and reasoners R_1 and R_2 that satisfy certain assumptions, can be used to determine whether, for each data set A, reasoner R_1 computes more answers to Q w.r.t. T and A than reasoner R_2. This allows application developers to select the reasoner that provides the highest degree of completeness for Q and T that is compatible with the applications scalability requirements. Our results thus provide a theoretical and practical foundation for the design of future ontology-based information systems that maximise scalability while minimising or even eliminating incompleteness of query answers. △ Less

Submitted 18 January, 2014; originally announced January 2014.

Journal ref: Journal Of Artificial Intelligence Research, Volume 43, pages 419-476, 2012

arXiv:1304.1402 [pdf, ps, other]

Computing Datalog Rewritings beyond Horn Ontologies

Authors: Bernardo Cuenca Grau, Boris Motik, Giorgos Stoilos, Ian Horrocks

Abstract: Rewriting-based approaches for answering queries over an OWL 2 DL ontology have so far been developed mainly for Horn fragments of OWL 2 DL. In this paper, we study the possibilities of answering queries over non-Horn ontologies using datalog rewritings. We prove that this is impossible in general even for very simple ontology languages, and even if PTIME = NP. Furthermore, we present a resolution… ▽ More Rewriting-based approaches for answering queries over an OWL 2 DL ontology have so far been developed mainly for Horn fragments of OWL 2 DL. In this paper, we study the possibilities of answering queries over non-Horn ontologies using datalog rewritings. We prove that this is impossible in general even for very simple ontology languages, and even if PTIME = NP. Furthermore, we present a resolution-based procedure for $\SHI$ ontologies that, in case it terminates, produces a datalog rewriting of the ontology. Our procedure necessarily terminates on DL-Lite_{bool}^H ontologies---an extension of OWL 2 QL with transitive roles and Boolean connectives. △ Less

Submitted 8 April, 2013; v1 submitted 4 April, 2013; originally announced April 2013.

Comments: 14 pages. To appear at IJCAI 2013

arXiv:1208.3148 [pdf, other]

Evaluating Ontology Matching Systems on Large, Multilingual and Real-world Test Cases

Authors: Christian Meilicke, Ondrej Sváb-Zamazal, Cássia Trojahn, Ernesto Jiménez-Ruiz, José-Luis Aguirre, Heiner Stuckenschmidt, Bernardo Cuenca Grau

Abstract: In the field of ontology matching, the most systematic evaluation of matching systems is established by the Ontology Alignment Evaluation Initiative (OAEI), which is an annual campaign for evaluating ontology matching systems organized by different groups of researchers. In this paper, we report on the results of an intermediary OAEI campaign called OAEI 2011.5. The evaluations of this campaign ar… ▽ More In the field of ontology matching, the most systematic evaluation of matching systems is established by the Ontology Alignment Evaluation Initiative (OAEI), which is an annual campaign for evaluating ontology matching systems organized by different groups of researchers. In this paper, we report on the results of an intermediary OAEI campaign called OAEI 2011.5. The evaluations of this campaign are divided in five tracks. Three of these tracks are new or have been improved compared to previous OAEI campaigns. Overall, we evaluated 18 matching systems. We discuss lessons learned, in terms of scalability, multilingual issues and the ability do deal with real world cases from different domains. △ Less

Submitted 15 August, 2012; originally announced August 2012.

Comments: Technical Report of the OAEI 2011.5 Evaluation Campaign

arXiv:1012.1659 [pdf, other]

First steps in the logic-based assessment of post-composed phenotypic descriptions

Authors: Ernesto Jimenez-Ruiz, Bernardo Cuenca Grau, Rafael Berlanga, Dietrich Rebholz-Schuhmann

Abstract: In this paper we present a preliminary logic-based evaluation of the integration of post-composed phenotypic descriptions with domain ontologies. The evaluation has been performed using a description logic reasoner together with scalable techniques: ontology modularization and approximations of the logical difference between ontologies. In this paper we present a preliminary logic-based evaluation of the integration of post-composed phenotypic descriptions with domain ontologies. The evaluation has been performed using a description logic reasoner together with scalable techniques: ontology modularization and approximations of the logical difference between ontologies. △ Less

Submitted 7 December, 2010; originally announced December 2010.

Comments: in Adrian Paschke, Albert Burger, Andrea Splendiani, M. Scott Marshall, Paolo Romano: Proceedings of the 3rd International Workshop on Semantic Web Applications and Tools for the Life Sciences, Berlin,Germany, December 8-10, 2010

Report number: SWAT4LS 2010 ACM Class: J.3

Showing 1–26 of 26 results for author: Grau, B C