subscribe to arXiv mailings

Ontology Embedding: A Survey of Methods, Applications and Resources

Authors: Jiaoyan Chen, Olga Mashkova, Fernando Zhapa-Camacho, Robert Hoehndorf, Yuan He, Ian Horrocks

Abstract: Ontologies are widely used for representing domain knowledge and meta data, playing an increasingly important role in Information Systems, the Semantic Web, Bioinformatics and many other domains. However, logical reasoning that ontologies can directly support are quite limited in learning, approximation and prediction. One straightforward solution is to integrate statistical analysis and machine l… ▽ More Ontologies are widely used for representing domain knowledge and meta data, playing an increasingly important role in Information Systems, the Semantic Web, Bioinformatics and many other domains. However, logical reasoning that ontologies can directly support are quite limited in learning, approximation and prediction. One straightforward solution is to integrate statistical analysis and machine learning. To this end, automatically learning vector representation for knowledge of an ontology i.e., ontology embedding has been widely investigated in recent years. Numerous papers have been published on ontology embedding, but a lack of systematic reviews hinders researchers from gaining a comprehensive understanding of this field. To bridge this gap, we write this survey paper, which first introduces different kinds of semantics of ontologies, and formally defines ontology embedding from the perspectives of both mathematics and machine learning, as well as its property of faithfulness. Based on this, it systematically categorises and analyses a relatively complete set of over 80 papers, according to the ontologies and semantics that they aim at, and their technical solutions including geometric modeling, sequence modeling and graph propagation. This survey also introduces the applications of ontology embedding in ontology engineering, machine learning augmentation and life sciences, presents a new library mOWL, and discusses the challenges and future directions. △ Less

Submitted 16 June, 2024; originally announced June 2024.

Comments: A comprehensive survey on ontology embedding

arXiv:2402.17897 [pdf, other]

A Language Model based Framework for New Concept Placement in Ontologies

Authors: Hang Dong, Jiaoyan Chen, Yuan He, Yongsheng Gao, Ian Horrocks

Abstract: We investigate the task of inserting new concepts extracted from texts into an ontology using language models. We explore an approach with three steps: edge search which is to find a set of candidate locations to insert (i.e., subsumptions between concepts), edge formation and enrichment which leverages the ontological structure to produce and enhance the edge candidates, and edge selection which… ▽ More We investigate the task of inserting new concepts extracted from texts into an ontology using language models. We explore an approach with three steps: edge search which is to find a set of candidate locations to insert (i.e., subsumptions between concepts), edge formation and enrichment which leverages the ontological structure to produce and enhance the edge candidates, and edge selection which eventually locates the edge to be placed into. In all steps, we propose to leverage neural methods, where we apply embedding-based methods and contrastive learning with Pre-trained Language Models (PLMs) such as BERT for edge search, and adapt a BERT fine-tuning-based multi-label Edge-Cross-encoder, and Large Language Models (LLMs) such as GPT series, FLAN-T5, and Llama 2, for edge selection. We evaluate the methods on recent datasets created using the SNOMED CT ontology and the MedMentions entity linking benchmark. The best settings in our framework use fine-tuned PLM for search and a multi-label Cross-encoder for selection. Zero-shot prompting of LLMs is still not adequate for the task, and we propose explainable instruction tuning of LLMs for improved performance. Our study shows the advantages of PLMs and highlights the encouraging performance of LLMs that motivates future studies. △ Less

Submitted 4 March, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

Comments: 20 pages, 3 figures, accepted for ESWC 2024

ACM Class: I.2.7; I.2.4

arXiv:2401.11374 [pdf, other]

Language Models as Hierarchy Encoders

Authors: Yuan He, Zhangdie Yuan, Jiaoyan Chen, Ian Horrocks

Abstract: Interpreting hierarchical structures latent in language is a key limitation of current language models (LMs). While previous research has implicitly leveraged these hierarchies to enhance LMs, approaches for their explicit encoding are yet to be explored. To address this, we introduce a novel approach to re-train transformer encoder-based LMs as Hierarchy Transformer encoders (HiTs), harnessing th… ▽ More Interpreting hierarchical structures latent in language is a key limitation of current language models (LMs). While previous research has implicitly leveraged these hierarchies to enhance LMs, approaches for their explicit encoding are yet to be explored. To address this, we introduce a novel approach to re-train transformer encoder-based LMs as Hierarchy Transformer encoders (HiTs), harnessing the expansive nature of hyperbolic space. Our method situates the output embedding space of pre-trained LMs within a Poincaré ball with a curvature that adapts to the embedding dimension, followed by re-training on hyperbolic cluster and centripetal losses. These losses are designed to effectively cluster related entities (input as texts) and organise them hierarchically. We evaluate HiTs against pre-trained and fine-tuned LMs, focusing on their capabilities in simulating transitive inference, predicting subsumptions, and transferring knowledge across hierarchies. The results demonstrate that HiTs consistently outperform both pre-trained and fine-tuned LMs in these tasks, underscoring the effectiveness and transferability of our re-trained hierarchy encoders. △ Less

Submitted 20 January, 2024; originally announced January 2024.

arXiv:2312.11297 [pdf, other]

Optimised Storage for Datalog Reasoning

Authors: Xinyue Zhang, Pan Hu, Yavor Nenov, Ian Horrocks

Abstract: Materialisation facilitates Datalog reasoning by precomputing all consequences of the facts and the rules so that queries can be directly answered over the materialised facts. However, storing all materialised facts may be infeasible in practice, especially when the rules are complex and the given set of facts is large. We observe that for certain combinations of rules, there exist data structures… ▽ More Materialisation facilitates Datalog reasoning by precomputing all consequences of the facts and the rules so that queries can be directly answered over the materialised facts. However, storing all materialised facts may be infeasible in practice, especially when the rules are complex and the given set of facts is large. We observe that for certain combinations of rules, there exist data structures that compactly represent the reasoning result and can be efficiently queried when necessary. In this paper, we present a general framework that allows for the integration of such optimised storage schemes with standard materialisation algorithms. Moreover, we devise optimised storage schemes targeting at transitive rules and union rules, two types of (combination of) rules that commonly occur in practice. Our experimental evaluation shows that our approach significantly improves memory consumption, sometimes by orders of magnitude, while remaining competitive in terms of query answering time. △ Less

Submitted 19 December, 2023; v1 submitted 18 December, 2023; originally announced December 2023.

Comments: 19 pages

arXiv:2309.07172 [pdf, other]

Exploring Large Language Models for Ontology Alignment

Authors: Yuan He, Jiaoyan Chen, Hang Dong, Ian Horrocks

Abstract: This work investigates the applicability of recent generative Large Language Models (LLMs), such as the GPT series and Flan-T5, to ontology alignment for identifying concept equivalence mappings across ontologies. To test the zero-shot performance of Flan-T5-XXL and GPT-3.5-turbo, we leverage challenging subsets from two equivalence matching datasets of the OAEI Bio-ML track, taking into account c… ▽ More This work investigates the applicability of recent generative Large Language Models (LLMs), such as the GPT series and Flan-T5, to ontology alignment for identifying concept equivalence mappings across ontologies. To test the zero-shot performance of Flan-T5-XXL and GPT-3.5-turbo, we leverage challenging subsets from two equivalence matching datasets of the OAEI Bio-ML track, taking into account concept labels and structural contexts. Preliminary findings suggest that LLMs have the potential to outperform existing ontology alignment systems like BERTMap, given careful framework and prompt design. △ Less

Submitted 12 September, 2023; originally announced September 2023.

Comments: Accepted at ISWC 2023 (Posters and Demos)

arXiv:2307.03067 [pdf, other]

DeepOnto: A Python Package for Ontology Engineering with Deep Learning

Authors: Yuan He, Jiaoyan Chen, Hang Dong, Ian Horrocks, Carlo Allocca, Taehun Kim, Brahmananda Sapkota

Abstract: Integrating deep learning techniques, particularly language models (LMs), with knowledge representation techniques like ontologies has raised widespread attention, urging the need of a platform that supports both paradigms. Although packages such as OWL API and Jena offer robust support for basic ontology processing features, they lack the capability to transform various types of information withi… ▽ More Integrating deep learning techniques, particularly language models (LMs), with knowledge representation techniques like ontologies has raised widespread attention, urging the need of a platform that supports both paradigms. Although packages such as OWL API and Jena offer robust support for basic ontology processing features, they lack the capability to transform various types of information within ontologies into formats suitable for downstream deep learning-based applications. Moreover, widely-used ontology APIs are primarily Java-based while deep learning frameworks like PyTorch and Tensorflow are mainly for Python programming. To address the needs, we present DeepOnto, a Python package designed for ontology engineering with deep learning. The package encompasses a core ontology processing module founded on the widely-recognised and reliable OWL API, encapsulating its fundamental features in a more "Pythonic" manner and extending its capabilities to incorporate other essential components including reasoning, verbalisation, normalisation, taxonomy, projection, and more. Building on this module, DeepOnto offers a suite of tools, resources, and algorithms that support various ontology engineering tasks, such as ontology alignment and completion, by harnessing deep learning methods, primarily pre-trained LMs. In this paper, we also demonstrate the practical utility of DeepOnto through two use-cases: the Digital Health Coaching in Samsung Research UK and the Bio-ML track of the Ontology Alignment Evaluation Initiative (OAEI). △ Less

Submitted 8 March, 2024; v1 submitted 6 July, 2023; originally announced July 2023.

Comments: Accepted by the Semantic Web Journal

arXiv:2306.14704 [pdf, other]

doi 10.1145/3583780.3615126

Ontology Enrichment from Texts: A Biomedical Dataset for Concept Discovery and Placement

Authors: Hang Dong, Jiaoyan Chen, Yuan He, Ian Horrocks

Abstract: Mentions of new concepts appear regularly in texts and require automated approaches to harvest and place them into Knowledge Bases (KB), e.g., ontologies and taxonomies. Existing datasets suffer from three issues, (i) mostly assuming that a new concept is pre-discovered and cannot support out-of-KB mention discovery; (ii) only using the concept label as the input along with the KB and thus lacking… ▽ More Mentions of new concepts appear regularly in texts and require automated approaches to harvest and place them into Knowledge Bases (KB), e.g., ontologies and taxonomies. Existing datasets suffer from three issues, (i) mostly assuming that a new concept is pre-discovered and cannot support out-of-KB mention discovery; (ii) only using the concept label as the input along with the KB and thus lacking the contexts of a concept label; and (iii) mostly focusing on concept placement w.r.t a taxonomy of atomic concepts, instead of complex concepts, i.e., with logical operators. To address these issues, we propose a new benchmark, adapting MedMentions dataset (PubMed abstracts) with SNOMED CT versions in 2014 and 2017 under the Diseases sub-category and the broader categories of Clinical finding, Procedure, and Pharmaceutical / biologic product. We provide usage on the evaluation with the dataset for out-of-KB mention discovery and concept placement, adapting recent Large Language Model based methods. △ Less

Submitted 1 September, 2023; v1 submitted 26 June, 2023; originally announced June 2023.

Comments: 5 pages, 1 figure, accepted for CIKM 2023. The dataset, data construction scripts, and baseline implementation are available at https://zenodo.org/record/8228005 (Zenodo) and https://github.com/KRR-Oxford/OET (GitHub)

ACM Class: I.2.7

arXiv:2306.04814 [pdf, ps, other]

Revisiting Inferential Benchmarks for Knowledge Graph Completion

Authors: Shuwen Liu, Bernardo Cuenca Grau, Ian Horrocks, Egor V. Kostylev

Abstract: Knowledge Graph (KG) completion is the problem of extending an incomplete KG with missing facts. A key feature of Machine Learning approaches for KG completion is their ability to learn inference patterns, so that the predicted facts are the results of applying these patterns to the KG. Standard completion benchmarks, however, are not well-suited for evaluating models' abilities to learn patterns,… ▽ More Knowledge Graph (KG) completion is the problem of extending an incomplete KG with missing facts. A key feature of Machine Learning approaches for KG completion is their ability to learn inference patterns, so that the predicted facts are the results of applying these patterns to the KG. Standard completion benchmarks, however, are not well-suited for evaluating models' abilities to learn patterns, because the training and test sets of these benchmarks are a random split of a given KG and hence do not capture the causality of inference patterns. We propose a novel approach for designing KG completion benchmarks based on the following principles: there is a set of logical rules so that the missing facts are the results of the rules' application; the training set includes both premises matching rule antecedents and the corresponding conclusions; the test set consists of the results of applying the rules to the training set; the negative examples are designed to discourage the models from learning rules not entailed by the rule set. We use our methodology to generate several benchmarks and evaluate a wide range of existing KG completion systems. Our results provide novel insights on the ability of existing models to induce inference patterns from incomplete KGs. △ Less

Submitted 7 June, 2023; originally announced June 2023.

Comments: Accepted by the 20th International Conference on Principles of Knowledge Representation and Reasoning (KR 2023)

arXiv:2305.06854 [pdf, other]

Enhancing Datalog Reasoning with Hypertree Decompositions

Authors: Xinyue Zhang, Pan Hu, Yavor Nenov, Ian Horrocks

Abstract: Datalog reasoning based on the seminaïve evaluation strategy evaluates rules using traditional join plans, which often leads to redundancy and inefficiency in practice, especially when the rules are complex. Hypertree decompositions help identify efficient query plans and reduce similar redundancy in query answering. However, it is unclear how this can be applied to materialisation and incremental… ▽ More Datalog reasoning based on the seminaïve evaluation strategy evaluates rules using traditional join plans, which often leads to redundancy and inefficiency in practice, especially when the rules are complex. Hypertree decompositions help identify efficient query plans and reduce similar redundancy in query answering. However, it is unclear how this can be applied to materialisation and incremental reasoning with recursive Datalog programs. Moreover, hypertree decompositions require additional data structures and thus introduce nonnegligible overhead in both runtime and memory consumption. In this paper, we provide algorithms that exploit hypertree decompositions for the materialisation and incremental evaluation of Datalog programs. Furthermore, we combine this approach with standard Datalog reasoning algorithms in a modular fashion so that the overhead caused by the decompositions is reduced. Our empirical evaluation shows that, when the program contains complex rules, the combined approach is usually significantly faster than the baseline approach, sometimes by orders of magnitude. △ Less

Submitted 15 May, 2023; v1 submitted 11 May, 2023; originally announced May 2023.

arXiv:2302.07189 [pdf, other]

doi 10.1145/3583780.3615036

Reveal the Unknown: Out-of-Knowledge-Base Mention Discovery with Entity Linking

Authors: Hang Dong, Jiaoyan Chen, Yuan He, Yinan Liu, Ian Horrocks

Abstract: Discovering entity mentions that are out of a Knowledge Base (KB) from texts plays a critical role in KB maintenance, but has not yet been fully explored. The current methods are mostly limited to the simple threshold-based approach and feature-based classification, and the datasets for evaluation are relatively rare. We propose BLINKout, a new BERT-based Entity Linking (EL) method which can ident… ▽ More Discovering entity mentions that are out of a Knowledge Base (KB) from texts plays a critical role in KB maintenance, but has not yet been fully explored. The current methods are mostly limited to the simple threshold-based approach and feature-based classification, and the datasets for evaluation are relatively rare. We propose BLINKout, a new BERT-based Entity Linking (EL) method which can identify mentions that do not have corresponding KB entities by matching them to a special NIL entity. To better utilize BERT, we propose new techniques including NIL entity representation and classification, with synonym enhancement. We also apply KB Pruning and Versioning strategies to automatically construct out-of-KB datasets from common in-KB EL datasets. Results on five datasets of clinical notes, biomedical publications, and Wikipedia articles in various domains show the advantages of BLINKout over existing methods to identify out-of-KB mentions for the medical ontologies, UMLS, SNOMED CT, and the general KB, WikiData. △ Less

Submitted 1 September, 2023; v1 submitted 14 February, 2023; originally announced February 2023.

Comments: 11 pages, 3 figures, accepted for CIKM 2023

ACM Class: I.2.7

arXiv:2302.06761 [pdf, other]

Language Model Analysis for Ontology Subsumption Inference

Authors: Yuan He, Jiaoyan Chen, Ernesto Jiménez-Ruiz, Hang Dong, Ian Horrocks

Abstract: Investigating whether pre-trained language models (LMs) can function as knowledge bases (KBs) has raised wide research interests recently. However, existing works focus on simple, triple-based, relational KBs, but omit more sophisticated, logic-based, conceptualised KBs such as OWL ontologies. To investigate an LM's knowledge of ontologies, we propose OntoLAMA, a set of inference-based probing tas… ▽ More Investigating whether pre-trained language models (LMs) can function as knowledge bases (KBs) has raised wide research interests recently. However, existing works focus on simple, triple-based, relational KBs, but omit more sophisticated, logic-based, conceptualised KBs such as OWL ontologies. To investigate an LM's knowledge of ontologies, we propose OntoLAMA, a set of inference-based probing tasks and datasets from ontology subsumption axioms involving both atomic and complex concepts. We conduct extensive experiments on ontologies of different domains and scales, and our results demonstrate that LMs encode relatively less background knowledge of Subsumption Inference (SI) than traditional Natural Language Inference (NLI) but can improve on SI significantly when a small number of samples are given. We will open-source our code and datasets. △ Less

Submitted 8 May, 2023; v1 submitted 13 February, 2023; originally announced February 2023.

Comments: Accepted at Findings of ACL 2023; OntoLAMA Datasets are available at: https://huggingface.co/datasets/krr-oxford/OntoLAMA (Huggingface) or https://doi.org/10.5281/zenodo.6480540 (Zenodo)

arXiv:2301.11118 [pdf, other]

Dual Box Embeddings for the Description Logic EL++

Authors: Mathias Jackermeier, Jiaoyan Chen, Ian Horrocks

Abstract: OWL ontologies, whose formal semantics are rooted in Description Logic (DL), have been widely used for knowledge representation. Similar to Knowledge Graphs (KGs), ontologies are often incomplete, and maintaining and constructing them has proved challenging. While classical deductive reasoning algorithms use the precise formal semantics of an ontology to predict missing facts, recent years have wi… ▽ More OWL ontologies, whose formal semantics are rooted in Description Logic (DL), have been widely used for knowledge representation. Similar to Knowledge Graphs (KGs), ontologies are often incomplete, and maintaining and constructing them has proved challenging. While classical deductive reasoning algorithms use the precise formal semantics of an ontology to predict missing facts, recent years have witnessed growing interest in inductive reasoning techniques that can derive probable facts from an ontology. Similar to KGs, a promising approach is to learn ontology embeddings in a latent vector space, while additionally ensuring they adhere to the semantics of the underlying DL. While a variety of approaches have been proposed, current ontology embedding methods suffer from several shortcomings, especially that they all fail to faithfully model one-to-many, many-to-one, and many-to-many relations and role inclusion axioms. To address this problem and improve ontology completion performance, we propose a novel ontology embedding method named Box$^2$EL for the DL EL++, which represents both concepts and roles as boxes (i.e., axis-aligned hyperrectangles), and models inter-concept relationships using a bumping mechanism. We theoretically prove the soundness of Box$^2$EL and conduct an extensive experimental evaluation, achieving state-of-the-art results across a variety of datasets on the tasks of subsumption prediction, role assertion prediction, and approximating deductive reasoning. △ Less

Submitted 25 March, 2024; v1 submitted 26 January, 2023; originally announced January 2023.

Comments: Updated license information

arXiv:2205.09901 [pdf, other]

Cardinality-Minimal Explanations for Monotonic Neural Networks

Authors: Ouns El Harzli, Bernardo Cuenca Grau, Ian Horrocks

Abstract: In recent years, there has been increasing interest in explanation methods for neural model predictions that offer precise formal guarantees. These include abductive (respectively, contrastive) methods, which aim to compute minimal subsets of input features that are sufficient for a given prediction to hold (respectively, to change a given prediction). The corresponding decision problems are, howe… ▽ More In recent years, there has been increasing interest in explanation methods for neural model predictions that offer precise formal guarantees. These include abductive (respectively, contrastive) methods, which aim to compute minimal subsets of input features that are sufficient for a given prediction to hold (respectively, to change a given prediction). The corresponding decision problems are, however, known to be intractable. In this paper, we investigate whether tractability can be regained by focusing on neural models implementing a monotonic function. Although the relevant decision problems remain intractable, we can show that they become solvable in polynomial time by means of greedy algorithms if we additionally assume that the activation functions are continuous everywhere and differentiable almost everywhere. Our experiments suggest favourable performance of our algorithms. △ Less

Submitted 2 May, 2023; v1 submitted 19 May, 2022; originally announced May 2022.

arXiv:2205.03447 [pdf, ps, other]

Machine Learning-Friendly Biomedical Datasets for Equivalence and Subsumption Ontology Matching

Authors: Yuan He, Jiaoyan Chen, Hang Dong, Ernesto Jiménez-Ruiz, Ali Hadian, Ian Horrocks

Abstract: Ontology Matching (OM) plays an important role in many domains such as bioinformatics and the Semantic Web, and its research is becoming increasingly popular, especially with the application of machine learning (ML) techniques. Although the Ontology Alignment Evaluation Initiative (OAEI) represents an impressive effort for the systematic evaluation of OM systems, it still suffers from several limi… ▽ More Ontology Matching (OM) plays an important role in many domains such as bioinformatics and the Semantic Web, and its research is becoming increasingly popular, especially with the application of machine learning (ML) techniques. Although the Ontology Alignment Evaluation Initiative (OAEI) represents an impressive effort for the systematic evaluation of OM systems, it still suffers from several limitations including limited evaluation of subsumption mappings, suboptimal reference mappings, and limited support for the evaluation of ML-based systems. To tackle these limitations, we introduce five new biomedical OM tasks involving ontologies extracted from Mondo and UMLS. Each task includes both equivalence and subsumption matching; the quality of reference mappings is ensured by human curation, ontology pruning, etc.; and a comprehensive evaluation framework is proposed to measure OM performance from various perspectives for both ML-based and non-ML-based OM systems. We report evaluation results for OM systems of different types to demonstrate the usage of these resources, all of which are publicly available as part of the new BioML track at OAEI 2022. △ Less

Submitted 22 July, 2023; v1 submitted 6 May, 2022; originally announced May 2022.

Comments: Accepted paper (Best Resource Paper Candidate) in the 21st International Semantic Web Conference (ISWC-2022); Bio-ML Dataset: https://doi.org/10.5281/zenodo.6510086

arXiv:2202.09791 [pdf, other]

Contextual Semantic Embeddings for Ontology Subsumption Prediction

Authors: Jiaoyan Chen, Yuan He, Yuxia Geng, Ernesto Jimenez-Ruiz, Hang Dong, Ian Horrocks

Abstract: Automating ontology construction and curation is an important but challenging task in knowledge engineering and artificial intelligence. Prediction by machine learning techniques such as contextual semantic embedding is a promising direction, but the relevant research is still preliminary especially for expressive ontologies in Web Ontology Language (OWL). In this paper, we present a new subsumpti… ▽ More Automating ontology construction and curation is an important but challenging task in knowledge engineering and artificial intelligence. Prediction by machine learning techniques such as contextual semantic embedding is a promising direction, but the relevant research is still preliminary especially for expressive ontologies in Web Ontology Language (OWL). In this paper, we present a new subsumption prediction method named BERTSubs for classes of OWL ontology. It exploits the pre-trained language model BERT to compute contextual embeddings of a class, where customized templates are proposed to incorporate the class context (e.g., neighbouring classes) and the logical existential restriction. BERTSubs is able to predict multiple kinds of subsumers including named classes from the same ontology or another ontology, and existential restrictions from the same ontology. Extensive evaluation on five real-world ontologies for three different subsumption tasks has shown the effectiveness of the templates and that BERTSubs can dramatically outperform the baselines that use (literal-aware) knowledge graph embeddings, non-contextual word embeddings and the state-of-the-art OWL ontology embeddings. △ Less

Submitted 18 March, 2023; v1 submitted 20 February, 2022; originally announced February 2022.

Comments: Accepted by World Wide Web Journal

arXiv:2112.10006 [pdf, other]

Zero-shot and Few-shot Learning with Knowledge Graphs: A Comprehensive Survey

Authors: Jiaoyan Chen, Yuxia Geng, Zhuo Chen, Jeff Z. Pan, Yuan He, Wen Zhang, Ian Horrocks, Huajun Chen

Abstract: Machine learning especially deep neural networks have achieved great success but many of them often rely on a number of labeled samples for supervision. As sufficient labeled training data are not always ready due to e.g., continuously emerging prediction targets and costly sample annotation in real world applications, machine learning with sample shortage is now being widely investigated. Among a… ▽ More Machine learning especially deep neural networks have achieved great success but many of them often rely on a number of labeled samples for supervision. As sufficient labeled training data are not always ready due to e.g., continuously emerging prediction targets and costly sample annotation in real world applications, machine learning with sample shortage is now being widely investigated. Among all these studies, many prefer to utilize auxiliary information including those in the form of Knowledge Graph (KG) to reduce the reliance on labeled samples. In this survey, we have comprehensively reviewed over 90 papers about KG-aware research for two major sample shortage settings -- zero-shot learning (ZSL) where some classes to be predicted have no labeled samples, and few-shot learning (FSL) where some classes to be predicted have only a small number of labeled samples that are available. We first introduce KGs used in ZSL and FSL as well as their construction methods, and then systematically categorize and summarize KG-aware ZSL and FSL methods, dividing them into different paradigms such as the mapping-based, the data augmentation, the propagation-based and the optimization-based. We next present different applications, including not only KG augmented prediction tasks such as image classification, question answering, text classification and knowledge extraction, but also KG completion tasks, and some typical evaluation resources for each task. We eventually discuss some challenges and open problems from different perspectives. △ Less

Submitted 3 December, 2022; v1 submitted 18 December, 2021; originally announced December 2021.

Comments: A survey on Zero-shot and Few-shot Learning with Knowledge Graph. It has collected 96 ZSL and FSL papers on this topic, with 11 figures and 4 tables

arXiv:2112.02682 [pdf, other]

BERTMap: A BERT-based Ontology Alignment System

Authors: Yuan He, Jiaoyan Chen, Denvar Antonyrajah, Ian Horrocks

Abstract: Ontology alignment (a.k.a ontology matching (OM)) plays a critical role in knowledge integration. Owing to the success of machine learning in many domains, it has been applied in OM. However, the existing methods, which often adopt ad-hoc feature engineering or non-contextual word embeddings, have not yet outperformed rule-based systems especially in an unsupervised setting. In this paper, we prop… ▽ More Ontology alignment (a.k.a ontology matching (OM)) plays a critical role in knowledge integration. Owing to the success of machine learning in many domains, it has been applied in OM. However, the existing methods, which often adopt ad-hoc feature engineering or non-contextual word embeddings, have not yet outperformed rule-based systems especially in an unsupervised setting. In this paper, we propose a novel OM system named BERTMap which can support both unsupervised and semi-supervised settings. It first predicts mappings using a classifier based on fine-tuning the contextual embedding model BERT on text semantics corpora extracted from ontologies, and then refines the mappings through extension and repair by utilizing the ontology structure and logic. Our evaluation with three alignment tasks on biomedical ontologies demonstrates that BERTMap can often perform better than the leading OM systems LogMap and AML. △ Less

Submitted 3 May, 2022; v1 submitted 5 December, 2021; originally announced December 2021.

Comments: Full version (with appendix) of the accepted paper in 36th AAAI Conference on Artificial Intelligence 2022

arXiv:2109.03655 [pdf, other]

doi 10.1109/BigData.2017.8258105

On Event-Driven Knowledge Graph Completion in Digital Factories

Authors: Martin Ringsquandl, Evgeny Kharlamov, Daria Stepanova, Steffen Lamparter, Raffaello Lepratti, Ian Horrocks, Peer Kröger

Abstract: Smart factories are equipped with machines that can sense their manufacturing environments, interact with each other, and control production processes. Smooth operation of such factories requires that the machines and engineering personnel that conduct their monitoring and diagnostics share a detailed common industrial knowledge about the factory, e.g., in the form of knowledge graphs. Creation an… ▽ More Smart factories are equipped with machines that can sense their manufacturing environments, interact with each other, and control production processes. Smooth operation of such factories requires that the machines and engineering personnel that conduct their monitoring and diagnostics share a detailed common industrial knowledge about the factory, e.g., in the form of knowledge graphs. Creation and maintenance of such knowledge is expensive and requires automation. In this work we show how machine learning that is specifically tailored towards industrial applications can help in knowledge graph completion. In particular, we show how knowledge completion can benefit from event logs that are common in smart factories. We evaluate this on the knowledge graph from a real world-inspired smart factory with encouraging results. △ Less

Submitted 8 September, 2021; originally announced September 2021.

Journal ref: 2017 IEEE International Conference on Big Data (BIGDATA)

arXiv:2107.00369 [pdf, ps, other]

Computing CQ lower-bounds over OWL 2 through approximation to RSA

Authors: Federico Igne, Stefano Germano, Ian Horrocks

Abstract: Conjunctive query (CQ) answering over knowledge bases is an important reasoning task. However, with expressive ontology languages such as OWL, query answering is computationally very expensive. The PAGOdA system addresses this issue by using a tractable reasoner to compute lower and upper-bound approximations, falling back to a fully-fledged OWL reasoner only when these bounds don't coincide. The… ▽ More Conjunctive query (CQ) answering over knowledge bases is an important reasoning task. However, with expressive ontology languages such as OWL, query answering is computationally very expensive. The PAGOdA system addresses this issue by using a tractable reasoner to compute lower and upper-bound approximations, falling back to a fully-fledged OWL reasoner only when these bounds don't coincide. The effectiveness of this approach critically depends on the quality of the approximations, and in this paper we explore a technique for computing closer approximations via RSA, an ontology language that subsumes all the OWL 2 profiles while still maintaining tractability. We present a novel approximation of OWL 2 ontologies into RSA, and an algorithm to compute a closer (than PAGOdA) lower bound approximation using the RSA combined approach. We have implemented these algorithms in a prototypical CQ answering system, and we present a preliminary evaluation of our system that shows significant performance improvements w.r.t. PAGOdA. △ Less

Submitted 1 July, 2021; originally announced July 2021.

Comments: 26 pages, 1 figure

arXiv:2103.00070 [pdf, ps, other]

Knowledge-aware Zero-Shot Learning: Survey and Perspective

Authors: Jiaoyan Chen, Yuxia Geng, Zhuo Chen, Ian Horrocks, Jeff Z. Pan, Huajun Chen

Abstract: Zero-shot learning (ZSL) which aims at predicting classes that have never appeared during the training using external knowledge (a.k.a. side information) has been widely investigated. In this paper we present a literature review towards ZSL in the perspective of external knowledge, where we categorize the external knowledge, review their methods and compare different external knowledge. With the l… ▽ More Zero-shot learning (ZSL) which aims at predicting classes that have never appeared during the training using external knowledge (a.k.a. side information) has been widely investigated. In this paper we present a literature review towards ZSL in the perspective of external knowledge, where we categorize the external knowledge, review their methods and compare different external knowledge. With the literature review, we further discuss and outlook the role of symbolic knowledge in addressing ZSL and other machine learning sample shortage issues. △ Less

Submitted 10 May, 2021; v1 submitted 26 February, 2021; originally announced March 2021.

Comments: Accepted by IJCAI'21 Survey Track

arXiv:2009.14654 [pdf, other]

OWL2Vec*: Embedding of OWL Ontologies

Authors: Jiaoyan Chen, Pan Hu, Ernesto Jimenez-Ruiz, Ole Magnus Holter, Denvar Antonyrajah, Ian Horrocks

Abstract: Semantic embedding of knowledge graphs has been widely studied and used for prediction and statistical analysis tasks across various domains such as Natural Language Processing and the Semantic Web. However, less attention has been paid to developing robust methods for embedding OWL (Web Ontology Language) ontologies which can express a much wider range of semantics than knowledge graphs and have… ▽ More Semantic embedding of knowledge graphs has been widely studied and used for prediction and statistical analysis tasks across various domains such as Natural Language Processing and the Semantic Web. However, less attention has been paid to developing robust methods for embedding OWL (Web Ontology Language) ontologies which can express a much wider range of semantics than knowledge graphs and have been widely adopted in domains such as bioinformatics. In this paper, we propose a random walk and word embedding based ontology embedding method named OWL2Vec*, which encodes the semantics of an OWL ontology by taking into account its graph structure, lexical information and logical constructors. Our empirical evaluation with three real world datasets suggests that OWL2Vec* benefits from these three different aspects of an ontology in class membership prediction and class subsumption prediction tasks. Furthermore, OWL2Vec* often significantly outperforms the state-of-the-art methods in our experiments. △ Less

Submitted 25 January, 2021; v1 submitted 30 September, 2020; originally announced September 2020.

arXiv:2001.06917 [pdf, other]

doi 10.1145/3366423.3380226

Correcting Knowledge Base Assertions

Authors: Jiaoyan Chen, Xi Chen, Ian Horrocks, Ernesto Jimenez-Ruiz, Erik B. Myklebus

Abstract: The usefulness and usability of knowledge bases (KBs) is often limited by quality issues. One common issue is the presence of erroneous assertions, often caused by lexical or semantic confusion. We study the problem of correcting such assertions, and present a general correction framework which combines lexical matching, semantic embedding, soft constraint mining and semantic consistency checking.… ▽ More The usefulness and usability of knowledge bases (KBs) is often limited by quality issues. One common issue is the presence of erroneous assertions, often caused by lexical or semantic confusion. We study the problem of correcting such assertions, and present a general correction framework which combines lexical matching, semantic embedding, soft constraint mining and semantic consistency checking. The framework is evaluated using DBpedia and an enterprise medical KB. △ Less

Submitted 19 January, 2020; originally announced January 2020.

Comments: Accepted by The Web Conference (WWW) 2020

ACM Class: I.2

arXiv:1908.10177 [pdf, ps, other]

doi 10.1145/3357384.3358147

Datalog Reasoning over Compressed RDF Knowledge Bases

Authors: Pan Hu, Jacopo Urbani, Boris Motik, Ian Horrocks

Abstract: Materialisation is often used in RDF systems as a preprocessing step to derive all facts implied by given RDF triples and rules. Although widely used, materialisation considers all possible rule applications and can use a lot of memory for storing the derived facts, which can hinder performance. We present a novel materialisation technique that compresses the RDF triples so that the rules can some… ▽ More Materialisation is often used in RDF systems as a preprocessing step to derive all facts implied by given RDF triples and rules. Although widely used, materialisation considers all possible rule applications and can use a lot of memory for storing the derived facts, which can hinder performance. We present a novel materialisation technique that compresses the RDF triples so that the rules can sometimes be applied to multiple facts at once, and the derived facts can be represented using structure sharing. Our technique can thus require less space, as well as skip certain rule applications. Our experiments show that our technique can be very effective: when the rules are relatively simple, our system is both faster and requires less memory than prominent state-of-the-art RDF systems. △ Less

Submitted 29 August, 2019; v1 submitted 27 August, 2019; originally announced August 2019.

Comments: CIKM 2019

arXiv:1906.11180 [pdf, other]

Canonicalizing Knowledge Base Literals

Authors: Jiaoyan Chen, Ernesto Jimenez-Ruiz, Ian Horrocks

Abstract: Ontology-based knowledge bases (KBs) like DBpedia are very valuable resources, but their usefulness and usability is limited by various quality issues. One such issue is the use of string literals instead of semantically typed entities. In this paper we study the automated canonicalization of such literals, i.e., replacing the literal with an existing entity from the KB or with a new entity that i… ▽ More Ontology-based knowledge bases (KBs) like DBpedia are very valuable resources, but their usefulness and usability is limited by various quality issues. One such issue is the use of string literals instead of semantically typed entities. In this paper we study the automated canonicalization of such literals, i.e., replacing the literal with an existing entity from the KB or with a new entity that is typed using classes from the KB. We propose a framework that combines both reasoning and machine learning in order to predict the relevant entities and types, and we evaluate this framework against state-of-the-art baselines for both semantic typing and entity matching. △ Less

Submitted 26 June, 2019; originally announced June 2019.

Journal ref: International Semantic Web Conference (ISWC) 2019

arXiv:1906.10261 [pdf, ps, other]

Datalog Materialisation in Distributed RDF Stores with Dynamic Data Exchange

Authors: Temitope Ajileye, Boris Motik, Ian Horrocks

Abstract: Several centralised RDF systems support datalog reasoning by precomputing and storing all logically implied triples using the wellknown seminaive algorithm. Large RDF datasets often exceed the capacity of centralised RDF systems, and a common solution is to distribute the datasets in a cluster of shared-nothing servers. While numerous distributed query answering techniques are known, distributed s… ▽ More Several centralised RDF systems support datalog reasoning by precomputing and storing all logically implied triples using the wellknown seminaive algorithm. Large RDF datasets often exceed the capacity of centralised RDF systems, and a common solution is to distribute the datasets in a cluster of shared-nothing servers. While numerous distributed query answering techniques are known, distributed seminaive evaluation of arbitrary datalog rules is less understood. In fact, most distributed RDF stores either support no reasoning or can handle only limited datalog fragments. In this paper we extend the dynamic data exchange approach for distributed query answering by Potter et al. [12] to a reasoning algorithm that can handle arbitrary rules while preserving important properties such as nonrepetition of inferences. We also show empirically that our algorithm scales well to very large RDF datasets △ Less

Submitted 24 June, 2019; originally announced June 2019.

Comments: 16 pages, ISWC conference

arXiv:1906.00781 [pdf, other]

Learning Semantic Annotations for Tabular Data

Authors: Jiaoyan Chen, Ernesto Jimenez-Ruiz, Ian Horrocks, Charles Sutton

Abstract: The usefulness of tabular data such as web tables critically depends on understanding their semantics. This study focuses on column type prediction for tables without any meta data. Unlike traditional lexical matching-based methods, we propose a deep prediction model that can fully exploit a table's contextual semantics, including table locality features learned by a Hybrid Neural Network (HNN), a… ▽ More The usefulness of tabular data such as web tables critically depends on understanding their semantics. This study focuses on column type prediction for tables without any meta data. Unlike traditional lexical matching-based methods, we propose a deep prediction model that can fully exploit a table's contextual semantics, including table locality features learned by a Hybrid Neural Network (HNN), and inter-column semantics features learned by a knowledge base (KB) lookup and query answering algorithm.It exhibits good performance not only on individual table sets, but also when transferring from one table set to another. △ Less

Submitted 30 May, 2019; originally announced June 2019.

Comments: 7 pages

Journal ref: IJCAI 2019

arXiv:1811.02304 [pdf, ps, other]

Modular Materialisation of Datalog Programs

Authors: Pan Hu, Boris Motik, Ian Horrocks

Abstract: The seminaïve algorithm can materialise all consequences of arbitrary datalog rules, and it also forms the basis for incremental algorithms that update a materialisation as the input facts change. Certain (combinations of) rules, however, can be handled much more efficiently using custom algorithms. To integrate such algorithms into a general reasoning approach that can handle arbitrary rules, we… ▽ More The seminaïve algorithm can materialise all consequences of arbitrary datalog rules, and it also forms the basis for incremental algorithms that update a materialisation as the input facts change. Certain (combinations of) rules, however, can be handled much more efficiently using custom algorithms. To integrate such algorithms into a general reasoning approach that can handle arbitrary rules, we propose a modular framework for materialisation computation and its maintenance. We split a datalog program into modules that can be handled using specialised algorithms, and handle the remaining rules using the seminaïve algorithm. We also present two algorithms for computing the transitive and the symmetric-transitive closure of a relation that can be used within our framework. Finally, we show empirically that our framework can handle arbitrary datalog programs while outperforming existing approaches, often by orders of magnitude. △ Less

Submitted 13 November, 2018; v1 submitted 6 November, 2018; originally announced November 2018.

Comments: Accepted at AAAI 2019

arXiv:1811.01304 [pdf, other]

ColNet: Embedding the Semantics of Web Tables for Column Type Prediction

Authors: Jiaoyan Chen, Ernesto Jimenez-Ruiz, Ian Horrocks, Charles Sutton

Abstract: Automatically annotating column types with knowledge base (KB) concepts is a critical task to gain a basic understanding of web tables. Current methods rely on either table metadata like column name or entity correspondences of cells in the KB, and may fail to deal with growing web tables with incomplete meta information. In this paper we propose a neural network based column type annotation frame… ▽ More Automatically annotating column types with knowledge base (KB) concepts is a critical task to gain a basic understanding of web tables. Current methods rely on either table metadata like column name or entity correspondences of cells in the KB, and may fail to deal with growing web tables with incomplete meta information. In this paper we propose a neural network based column type annotation framework named ColNet which is able to integrate KB reasoning and lookup with machine learning and can automatically train Convolutional Neural Networks for prediction. The prediction model not only considers the contextual semantics within a cell using word representation, but also embeds the semantics of a column by learning locality features from multiple cells. The method is evaluated with DBPedia and two different web table datasets, T2Dv2 from the general Web and Limaye from Wikipedia pages, and achieves higher performance than the state-of-the-art approaches. △ Less

Submitted 14 November, 2018; v1 submitted 3 November, 2018; originally announced November 2018.

Comments: AAAI 2019

arXiv:1808.02291 [pdf, other]

The Window Validity Problem in Rule-Based Stream Reasoning

Authors: Alessandro Ronca, Mark Kaminski, Bernardo Cuenca Grau, Ian Horrocks

Abstract: Rule-based temporal query languages provide the expressive power and flexibility required to capture in a natural way complex analysis tasks over streaming data. Stream processing applications, however, typically require near real-time response using limited resources. In particular, it becomes essential that the underpinning query language has favourable computational properties and that stream p… ▽ More Rule-based temporal query languages provide the expressive power and flexibility required to capture in a natural way complex analysis tasks over streaming data. Stream processing applications, however, typically require near real-time response using limited resources. In particular, it becomes essential that the underpinning query language has favourable computational properties and that stream processing algorithms are able to keep only a small number of previously received facts in memory at any point in time without sacrificing correctness. In this paper, we propose a recursive fragment of temporal Datalog with tractable data complexity and study the properties of a generic stream reasoning algorithm for this fragment. We focus on the window validity problem as a way to minimise the number of time points for which the stream reasoning algorithm needs to keep data in memory at any point in time. △ Less

Submitted 15 November, 2018; v1 submitted 7 August, 2018; originally announced August 2018.

arXiv:1807.08372 [pdf, other]

Knowledge-based Transfer Learning Explanation

Authors: Jiaoyan Chen, Freddy Lecue, Jeff Z. Pan, Ian Horrocks, Huajun Chen

Abstract: Machine learning explanation can significantly boost machine learning's application in decision making, but the usability of current methods is limited in human-centric explanation, especially for transfer learning, an important machine learning branch that aims at utilizing knowledge from one learning domain (i.e., a pair of dataset and prediction task) to enhance prediction model training in ano… ▽ More Machine learning explanation can significantly boost machine learning's application in decision making, but the usability of current methods is limited in human-centric explanation, especially for transfer learning, an important machine learning branch that aims at utilizing knowledge from one learning domain (i.e., a pair of dataset and prediction task) to enhance prediction model training in another learning domain. In this paper, we propose an ontology-based approach for human-centric explanation of transfer learning. Three kinds of knowledge-based explanatory evidence, with different granularities, including general factors, particular narrators and core contexts are first proposed and then inferred with both local ontologies and external knowledge bases. The evaluation with US flight data and DBpedia has presented their confidence and availability in explaining the transferability of feature representation in flight departure delay forecasting. △ Less

Submitted 22 July, 2018; originally announced July 2018.

Comments: Accepted by International Conference on Principles of Knowledge Representation and Reasoning, 2018

arXiv:1805.01396 [pdf, ps, other]

Consequence-based Reasoning for Description Logics with Disjunction, Inverse Roles, Number Restrictions, and Nominals

Authors: David Tena Cucala, Bernardo Cuenca Grau, Ian Horrocks

Abstract: We present a consequence-based calculus for concept subsumption and classification in the description logic ALCHOIQ, which extends ALC with role hierarchies, inverse roles, number restrictions, and nominals. By using standard transformations, our calculus extends to SROIQ, which covers all of OWL 2 DL except for datatypes. A key feature of our calculus is its pay-as-you-go behaviour: unlike existi… ▽ More We present a consequence-based calculus for concept subsumption and classification in the description logic ALCHOIQ, which extends ALC with role hierarchies, inverse roles, number restrictions, and nominals. By using standard transformations, our calculus extends to SROIQ, which covers all of OWL 2 DL except for datatypes. A key feature of our calculus is its pay-as-you-go behaviour: unlike existing algorithms, our calculus is worst-case optimal for all the well-known proper fragments of ALCHOIQ, albeit not for the full logic. △ Less

Submitted 3 May, 2018; originally announced May 2018.

arXiv:1804.09473 [pdf, other]

Stratified Negation in Limit Datalog Programs

Authors: Mark Kaminski, Bernardo Cuenca Grau, Egor V. Kostylev, Boris Motik, Ian Horrocks

Abstract: There has recently been an increasing interest in declarative data analysis, where analytic tasks are specified using a logical language, and their implementation and optimisation are delegated to a general-purpose query engine. Existing declarative languages for data analysis can be formalised as variants of logic programming equipped with arithmetic function symbols and/or aggregation, and are t… ▽ More There has recently been an increasing interest in declarative data analysis, where analytic tasks are specified using a logical language, and their implementation and optimisation are delegated to a general-purpose query engine. Existing declarative languages for data analysis can be formalised as variants of logic programming equipped with arithmetic function symbols and/or aggregation, and are typically undecidable. In prior work, the language of $\mathit{limit\ programs}$ was proposed, which is sufficiently powerful to capture many analysis tasks and has decidable entailment problem. Rules in this language, however, do not allow for negation. In this paper, we study an extension of limit programs with stratified negation-as-failure. We show that the additional expressive power makes reasoning computationally more demanding, and provide tight data complexity bounds. We also identify a fragment with tractable data complexity and sufficient expressivity to capture many relevant tasks. △ Less

Submitted 25 April, 2018; originally announced April 2018.

Comments: 14 pages; full version of a paper accepted at IJCAI-18

arXiv:1711.04013 [pdf, other]

Stream Reasoning in Temporal Datalog

Authors: Alessandro Ronca, Mark Kaminski, Bernardo Cuenca Grau, Boris Motik, Ian Horrocks

Abstract: In recent years, there has been an increasing interest in extending traditional stream processing engines with logical, rule-based, reasoning capabilities. This poses significant theoretical and practical challenges since rules can derive new information and propagate it both towards past and future time points; as a result, streamed query answers can depend on data that has not yet been received,… ▽ More In recent years, there has been an increasing interest in extending traditional stream processing engines with logical, rule-based, reasoning capabilities. This poses significant theoretical and practical challenges since rules can derive new information and propagate it both towards past and future time points; as a result, streamed query answers can depend on data that has not yet been received, as well as on data that arrived far in the past. Stream reasoning algorithms, however, must be able to stream out query answers as soon as possible, and can only keep a limited number of previous input facts in memory. In this paper, we propose novel reasoning problems to deal with these challenges, and study their computational properties on Datalog extended with a temporal sort and the successor function (a core rule-based language for stream reasoning applications). △ Less

Submitted 15 November, 2018; v1 submitted 10 November, 2017; originally announced November 2017.

arXiv:1711.03987 [pdf, ps, other]

Optimised Maintenance of Datalog Materialisations

Authors: Pan Hu, Boris Motik, Ian Horrocks

Abstract: To efficiently answer queries, datalog systems often materialise all consequences of a datalog program, so the materialisation must be updated whenever the input facts change. Several solutions to the materialisation update problem have been proposed. The Delete/Rederive (DRed) and the Backward/Forward (B/F) algorithms solve this problem for general datalog, but both contain steps that evaluate ru… ▽ More To efficiently answer queries, datalog systems often materialise all consequences of a datalog program, so the materialisation must be updated whenever the input facts change. Several solutions to the materialisation update problem have been proposed. The Delete/Rederive (DRed) and the Backward/Forward (B/F) algorithms solve this problem for general datalog, but both contain steps that evaluate rules 'backwards' by matching their heads to a fact and evaluating the partially instantiated rule bodies as queries. We show that this can be a considerable source of overhead even on very small updates. In contrast, the Counting algorithm does not evaluate the rules 'backwards', but it can handle only nonrecursive rules. We present two hybrid approaches that combine DRed and B/F with Counting so as to reduce or even eliminate 'backward' rule evaluation while still handling arbitrary datalog programs. We show empirically that our hybrid algorithms are usually significantly faster than existing approaches, sometimes by orders of magnitude. △ Less

Submitted 20 November, 2017; v1 submitted 10 November, 2017; originally announced November 2017.

Comments: AAAI 2018

arXiv:1705.07105 [pdf, other]

The Bag Semantics of Ontology-Based Data Access

Authors: Charalampos Nikolaou, Egor V. Kostylev, George Konstantinidis, Mark Kaminski, Bernardo Cuenca Grau, Ian Horrocks

Abstract: Ontology-based data access (OBDA) is a popular approach for integrating and querying multiple data sources by means of a shared ontology. The ontology is linked to the sources using mappings, which assign views over the data to ontology predicates. Motivated by the need for OBDA systems supporting database-style aggregate queries, we propose a bag semantics for OBDA, where duplicate tuples in the… ▽ More Ontology-based data access (OBDA) is a popular approach for integrating and querying multiple data sources by means of a shared ontology. The ontology is linked to the sources using mappings, which assign views over the data to ontology predicates. Motivated by the need for OBDA systems supporting database-style aggregate queries, we propose a bag semantics for OBDA, where duplicate tuples in the views defined by the mappings are retained, as is the case in standard databases. We show that bag semantics makes conjunctive query answering in OBDA coNP-hard in data complexity. To regain tractability, we consider a rather general class of queries and show its rewritability to a generalisation of the relational calculus to bags. △ Less

Submitted 19 May, 2017; originally announced May 2017.

arXiv:1705.06927 [pdf, other]

Foundations of Declarative Data Analysis Using Limit Datalog Programs

Authors: Mark Kaminski, Bernardo Cuenca Grau, Egor V. Kostylev, Boris Motik, Ian Horrocks

Abstract: Motivated by applications in declarative data analysis, we study $\mathit{Datalog}_{\mathbb{Z}}$---an extension of positive Datalog with arithmetic functions over integers. This language is known to be undecidable, so we propose two fragments. In $\mathit{limit}~\mathit{Datalog}_{\mathbb{Z}}$ predicates are axiomatised to keep minimal/maximal numeric values, allowing us to show that fact entailmen… ▽ More Motivated by applications in declarative data analysis, we study $\mathit{Datalog}_{\mathbb{Z}}$---an extension of positive Datalog with arithmetic functions over integers. This language is known to be undecidable, so we propose two fragments. In $\mathit{limit}~\mathit{Datalog}_{\mathbb{Z}}$ predicates are axiomatised to keep minimal/maximal numeric values, allowing us to show that fact entailment is coNExpTime-complete in combined, and coNP-complete in data complexity. Moreover, an additional $\mathit{stability}$ requirement causes the complexity to drop to ExpTime and PTime, respectively. Finally, we show that stable $\mathit{Datalog}_{\mathbb{Z}}$ can express many useful data analysis tasks, and so our results provide a sound foundation for the development of advanced information systems. △ Less

Submitted 12 November, 2017; v1 submitted 19 May, 2017; originally announced May 2017.

Comments: 23 pages; full version of a paper accepted at IJCAI-17; v2 fixes some typos and improves the acknowledgments

arXiv:1607.05351 [pdf, other]

Towards Analytics Aware Ontology Based Access to Static and Streaming Data (Extended Version)

Authors: Evgeny Kharlamov, Yannis Kotidis, Theofilos Mailis, Christian Neuenstadt, Charalampos Nikolaou, Özgür Özcep, Christoforos Svingos, Dmitriy Zheleznyakov, Sebastian Brandt, Ian Horrocks, Yannis Ioannidis, Steffen Lamparter, Ralf Möller

Abstract: Real-time analytics that requires integration and aggregation of heterogeneous and distributed streaming and static data is a typical task in many industrial scenarios such as diagnostics of turbines in Siemens. OBDA approach has a great potential to facilitate such tasks; however, it has a number of limitations in dealing with analytics that restrict its use in important industrial applications.… ▽ More Real-time analytics that requires integration and aggregation of heterogeneous and distributed streaming and static data is a typical task in many industrial scenarios such as diagnostics of turbines in Siemens. OBDA approach has a great potential to facilitate such tasks; however, it has a number of limitations in dealing with analytics that restrict its use in important industrial applications. Based on our experience with Siemens, we argue that in order to overcome those limitations OBDA should be extended and become analytics, source, and cost aware. In this work we propose such an extension. In particular, we propose an ontology, mapping, and query language for OBDA, where aggregate and other analytical functions are first class citizens. Moreover, we develop query optimisation techniques that allow to efficiently process analytical tasks over static and streaming data. We implement our approach in a system and evaluate our system with Siemens turbine data. △ Less

Submitted 15 August, 2016; v1 submitted 18 July, 2016; originally announced July 2016.

arXiv:1602.04498 [pdf, other]

Extending Consequence-Based Reasoning to SRIQ

Authors: Andrew Bate, Boris Motik, Bernardo Cuenca Grau, František Simančík, Ian Horrocks

Abstract: Consequence-based calculi are a family of reasoning algorithms for description logics (DLs), and they combine hypertableau and resolution in a way that often achieves excellent performance in practice. Up to now, however, they were proposed for either Horn DLs (which do not support disjunction), or for DLs without counting quantifiers. In this paper we present a novel consequence-based calculus fo… ▽ More Consequence-based calculi are a family of reasoning algorithms for description logics (DLs), and they combine hypertableau and resolution in a way that often achieves excellent performance in practice. Up to now, however, they were proposed for either Horn DLs (which do not support disjunction), or for DLs without counting quantifiers. In this paper we present a novel consequence-based calculus for SRIQ---a rich DL that supports both features. This extension is non-trivial since the intermediate consequences that need to be derived during reasoning cannot be captured using DLs themselves. The results of our preliminary performance evaluation suggest the feasibility of our approach in practice. △ Less

Submitted 23 February, 2016; v1 submitted 14 February, 2016; originally announced February 2016.

arXiv:1505.00212 [pdf, other]

Combining Rewriting and Incremental Materialisation Maintenance for Datalog Programs with Equality

Authors: Boris Motik, Yavor Nenov, Robert Piro, Ian Horrocks

Abstract: Materialisation precomputes all consequences of a set of facts and a datalog program so that queries can be evaluated directly (i.e., independently from the program). Rewriting optimises materialisation for datalog programs with equality by replacing all equal constants with a single representative; and incremental maintenance algorithms can efficiently update a materialisation for small changes i… ▽ More Materialisation precomputes all consequences of a set of facts and a datalog program so that queries can be evaluated directly (i.e., independently from the program). Rewriting optimises materialisation for datalog programs with equality by replacing all equal constants with a single representative; and incremental maintenance algorithms can efficiently update a materialisation for small changes in the input facts. Both techniques are critical to practical applicability of datalog systems; however, we are unaware of an approach that combines rewriting and incremental maintenance. In this paper we present the first such combination, and we show empirically that it can speed up updates by several orders of magnitude compared to using either rewriting or incremental maintenance in isolation. △ Less

Submitted 1 May, 2015; originally announced May 2015.

Comments: All proofs contained in the appendix. 7 pages + 4 pages appendix. 7 algorithms and one table with evaluation results

arXiv:1411.5313 [pdf, other]

Ontology Module Extraction via Datalog Reasoning

Authors: Ana Armas Romero, Mark Kaminski, Bernardo Cuenca Grau, Ian Horrocks

Abstract: Module extraction - the task of computing a (preferably small) fragment M of an ontology T that preserves entailments over a signature S - has found many applications in recent years. Extracting modules of minimal size is, however, computationally hard, and often algorithmically infeasible. Thus, practical techniques are based on approximations, where M provably captures the relevant entailments,… ▽ More Module extraction - the task of computing a (preferably small) fragment M of an ontology T that preserves entailments over a signature S - has found many applications in recent years. Extracting modules of minimal size is, however, computationally hard, and often algorithmically infeasible. Thus, practical techniques are based on approximations, where M provably captures the relevant entailments, but is not guaranteed to be minimal. Existing approximations, however, ensure that M preserves all second-order entailments of T w.r.t. S, which is stronger than is required in many applications, and may lead to large modules in practice. In this paper we propose a novel approach in which module extraction is reduced to a reasoning problem in datalog. Our approach not only generalises existing approximations in an elegant way, but it can also be tailored to preserve only specific kinds of entailments, which allows us to extract significantly smaller modules. An evaluation on widely-used ontologies has shown very encouraging results. △ Less

Submitted 20 November, 2014; v1 submitted 19 November, 2014; originally announced November 2014.

Comments: 13 pages. To appear in AAAI-15

arXiv:1411.3622 [pdf, ps, other]

Handling owl:sameAs via Rewriting

Authors: Boris Motik, Yavor Nenov, Robert Piro, Ian Horrocks

Abstract: Rewriting is widely used to optimise owl:sameAs reasoning in materialisation based OWL 2 RL systems. We investigate issues related to both the correctness and efficiency of rewriting, and present an algorithm that guarantees correctness, improves efficiency, and can be effectively parallelised. Our evaluation shows that our approach can reduce reasoning times on practical data sets by orders of ma… ▽ More Rewriting is widely used to optimise owl:sameAs reasoning in materialisation based OWL 2 RL systems. We investigate issues related to both the correctness and efficiency of rewriting, and present an algorithm that guarantees correctness, improves efficiency, and can be effectively parallelised. Our evaluation shows that our approach can reduce reasoning times on practical data sets by orders of magnitude. △ Less

Submitted 13 November, 2014; originally announced November 2014.

Comments: This is the technical report supporting the AAAI 2015 Conference submission with the same title

arXiv:1406.4110 [pdf]

doi 10.1613/jair.3949

Acyclicity Notions for Existential Rules and Their Application to Query Answering in Ontologies

Authors: Bernardo Cuenca Grau, Ian Horrocks, Markus Krötzsch, Clemens Kupke, Despoina Magka, Boris Motik, Zhe Wang

Abstract: Answering conjunctive queries (CQs) over a set of facts extended with existential rules is a prominent problem in knowledge representation and databases. This problem can be solved using the chase algorithm, which extends the given set of facts with fresh facts in order to satisfy the rules. If the chase terminates, then CQs can be evaluated directly in the resulting set of facts. The chase, howev… ▽ More Answering conjunctive queries (CQs) over a set of facts extended with existential rules is a prominent problem in knowledge representation and databases. This problem can be solved using the chase algorithm, which extends the given set of facts with fresh facts in order to satisfy the rules. If the chase terminates, then CQs can be evaluated directly in the resulting set of facts. The chase, however, does not terminate necessarily, and checking whether the chase terminates on a given set of rules and facts is undecidable. Numerous acyclicity notions were proposed as sufficient conditions for chase termination. In this paper, we present two new acyclicity notions called model-faithful acyclicity (MFA) and model-summarising acyclicity (MSA). Furthermore, we investigate the landscape of the known acyclicity notions and establish a complete taxonomy of all notions known to us. Finally, we show that MFA and MSA generalise most of these notions. Existential rules are closely related to the Horn fragments of the OWL 2 ontology language; furthermore, several prominent OWL 2 reasoners implement CQ answering by using the chase to materialise all relevant facts. In order to avoid termination problems, many of these systems handle only the OWL 2 RL profile of OWL 2; furthermore, some systems go beyond OWL 2 RL, but without any termination guarantees. In this paper we also investigate whether various acyclicity notions can provide a principled and practical solution to these problems. On the theoretical side, we show that query answering for acyclic ontologies is of lower complexity than for general ontologies. On the practical side, we show that many of the commonly used OWL 2 ontologies are MSA, and that the number of facts obtained by materialisation is not too large. Our results thus suggest that principled development of materialisation-based OWL 2 reasoners is practically feasible. △ Less

Submitted 3 February, 2014; originally announced June 2014.

Journal ref: Journal Of Artificial Intelligence Research, Volume 47, pages 741-808, 2013

arXiv:1401.4604 [pdf]

doi 10.1613/jair.3470

Completeness Guarantees for Incomplete Ontology Reasoners: Theory and Practice

Authors: Bernardo Cuenca Grau, Boris Motik, Giorgos Stoilos, Ian Horrocks

Abstract: To achieve scalability of query answering, the developers of Semantic Web applications are often forced to use incomplete OWL 2 reasoners, which fail to derive all answers for at least one query, ontology, and data set. The lack of completeness guarantees, however, may be unacceptable for applications in areas such as health care and defence, where missing answers can adversely affect the applicat… ▽ More To achieve scalability of query answering, the developers of Semantic Web applications are often forced to use incomplete OWL 2 reasoners, which fail to derive all answers for at least one query, ontology, and data set. The lack of completeness guarantees, however, may be unacceptable for applications in areas such as health care and defence, where missing answers can adversely affect the applications functionality. Furthermore, even if an application can tolerate some level of incompleteness, it is often advantageous to estimate how many and what kind of answers are being lost. In this paper, we present a novel logic-based framework that allows one to check whether a reasoner is complete for a given query Q and ontology T---that is, whether the reasoner is guaranteed to compute all answers to Q w.r.t. T and an arbitrary data set A. Since ontologies and typical queries are often fixed at application design time, our approach allows application developers to check whether a reasoner known to be incomplete in general is actually complete for the kinds of input relevant for the application. We also present a technique that, given a query Q, an ontology T, and reasoners R_1 and R_2 that satisfy certain assumptions, can be used to determine whether, for each data set A, reasoner R_1 computes more answers to Q w.r.t. T and A than reasoner R_2. This allows application developers to select the reasoner that provides the highest degree of completeness for Q and T that is compatible with the applications scalability requirements. Our results thus provide a theoretical and practical foundation for the design of future ontology-based information systems that maximise scalability while minimising or even eliminating incompleteness of query answers. △ Less

Submitted 18 January, 2014; originally announced January 2014.

Journal ref: Journal Of Artificial Intelligence Research, Volume 43, pages 419-476, 2012

arXiv:1401.3485 [pdf]

doi 10.1613/jair.2811

Hypertableau Reasoning for Description Logics

Authors: Boris Motik, Rob Shearer, Ian Horrocks

Abstract: We present a novel reasoning calculus for the description logic SHOIQ^+---a knowledge representation formalism with applications in areas such as the Semantic Web. Unnecessary nondeterminism and the construction of large models are two primary sources of inefficiency in the tableau-based reasoning calculi used in state-of-the-art reasoners. In order to reduce nondeterminism, we base our calculus o… ▽ More We present a novel reasoning calculus for the description logic SHOIQ^+---a knowledge representation formalism with applications in areas such as the Semantic Web. Unnecessary nondeterminism and the construction of large models are two primary sources of inefficiency in the tableau-based reasoning calculi used in state-of-the-art reasoners. In order to reduce nondeterminism, we base our calculus on hypertableau and hyperresolution calculi, which we extend with a blocking condition to ensure termination. In order to reduce the size of the constructed models, we introduce anywhere pairwise blocking. We also present an improved nominal introduction rule that ensures termination in the presence of nominals, inverse roles, and number restrictions---a combination of DL constructs that has proven notoriously difficult to handle. Our implementation shows significant performance improvements over state-of-the-art reasoners on several well-known ontologies. △ Less

Submitted 15 January, 2014; originally announced January 2014.

Journal ref: Journal Of Artificial Intelligence Research, Volume 36, pages 165-228, 2009

arXiv:1304.1402 [pdf, ps, other]

Computing Datalog Rewritings beyond Horn Ontologies

Authors: Bernardo Cuenca Grau, Boris Motik, Giorgos Stoilos, Ian Horrocks

Abstract: Rewriting-based approaches for answering queries over an OWL 2 DL ontology have so far been developed mainly for Horn fragments of OWL 2 DL. In this paper, we study the possibilities of answering queries over non-Horn ontologies using datalog rewritings. We prove that this is impossible in general even for very simple ontology languages, and even if PTIME = NP. Furthermore, we present a resolution… ▽ More Rewriting-based approaches for answering queries over an OWL 2 DL ontology have so far been developed mainly for Horn fragments of OWL 2 DL. In this paper, we study the possibilities of answering queries over non-Horn ontologies using datalog rewritings. We prove that this is impossible in general even for very simple ontology languages, and even if PTIME = NP. Furthermore, we present a resolution-based procedure for $\SHI$ ontologies that, in case it terminates, produces a datalog rewriting of the ontology. Our procedure necessarily terminates on DL-Lite_{bool}^H ontologies---an extension of OWL 2 QL with transitive roles and Boolean connectives. △ Less

Submitted 8 April, 2013; v1 submitted 4 April, 2013; originally announced April 2013.

Comments: 14 pages. To appear at IJCAI 2013

arXiv:1303.7430 [pdf, other]

Introducing Nominals to the Combined Query Answering Approaches for EL

Authors: Giorgio Stefanoni, Boris Motik, Ian Horrocks

Abstract: So-called combined approaches answer a conjunctive query over a description logic ontology in three steps: first, they materialise certain consequences of the ontology and the data; second, they evaluate the query over the data; and third, they filter the result of the second phase to eliminate unsound answers. Such approaches were developed for various members of the DL-Lite and the EL families o… ▽ More So-called combined approaches answer a conjunctive query over a description logic ontology in three steps: first, they materialise certain consequences of the ontology and the data; second, they evaluate the query over the data; and third, they filter the result of the second phase to eliminate unsound answers. Such approaches were developed for various members of the DL-Lite and the EL families of languages, but none of them can handle ontologies containing nominals. In our work, we bridge this gap and present a combined query answering approach for ELHO---a logic that contains all features of the OWL 2 EL standard apart from transitive roles and complex role inclusions. This extension is nontrivial because nominals require equality reasoning, which introduces complexity into the first and the third step. Our empirical evaluation suggests that our technique is suitable for practical application, and so it provides a practical basis for conjunctive query answering in a large fragment of OWL 2 EL. △ Less

Submitted 1 April, 2013; v1 submitted 29 March, 2013; originally announced March 2013.

Comments: Extended version of a paper to appear on AAAI-13

arXiv:1201.4089 [pdf, ps, other]

A Description Logic Primer

Authors: Markus Krötzsch, Frantisek Simancik, Ian Horrocks

Abstract: This paper provides a self-contained first introduction to description logics (DLs). The main concepts and features are explained with examples before syntax and semantics of the DL SROIQ are defined in detail. Additional sections review light-weight DL languages, discuss the relationship to the Web Ontology Language OWL and give pointers to further reading. This paper provides a self-contained first introduction to description logics (DLs). The main concepts and features are explained with examples before syntax and semantics of the DL SROIQ are defined in detail. Additional sections review light-weight DL languages, discuss the relationship to the Web Ontology Language OWL and give pointers to further reading. △ Less

Submitted 3 June, 2013; v1 submitted 19 January, 2012; originally announced January 2012.

ACM Class: I.2.4; F.4.1

arXiv:1111.0049 [pdf, ps, other]

doi 10.1613/jair.2372

Conjunctive Query Answering for the Description Logic SHIQ

Authors: Birte Glimm, Ian Horrocks, Carsten Lutz, Ulrike Sattler

Abstract: Conjunctive queries play an important role as an expressive query language for Description Logics (DLs). Although modern DLs usually provide for transitive roles, conjunctive query answering over DL knowledge bases is only poorly understood if transitive roles are admitted in the query. In this paper, we consider unions of conjunctive queries over knowledge bases formulated in the prominent DL SHI… ▽ More Conjunctive queries play an important role as an expressive query language for Description Logics (DLs). Although modern DLs usually provide for transitive roles, conjunctive query answering over DL knowledge bases is only poorly understood if transitive roles are admitted in the query. In this paper, we consider unions of conjunctive queries over knowledge bases formulated in the prominent DL SHIQ and allow transitive roles in both the query and the knowledge base. We show decidability of query answering in this setting and establish two tight complexity bounds: regarding combined complexity, we prove that there is a deterministic algorithm for query answering that needs time single exponential in the size of the KB and double exponential in the size of the query, which is optimal. Regarding data complexity, we prove containment in co-NP. △ Less

Submitted 31 October, 2011; originally announced November 2011.

Journal ref: Journal Of Artificial Intelligence Research, Volume 31, pages 157-204, 2008

arXiv:1111.0039 [pdf, ps, other]

doi 10.1613/jair.2279

Reasoning with Very Expressive Fuzzy Description Logics

Authors: I. Horrocks, J. Z. Pan, G. Stamou, G. Stoilos, V. Tzouvaras

Abstract: It is widely recognized today that the management of imprecision and vagueness will yield more intelligent and realistic knowledge-based applications. Description Logics (DLs) are a family of knowledge representation languages that have gained considerable attention the last decade, mainly due to their decidability and the existence of empirically high performance of reasoning algorithms. In this… ▽ More It is widely recognized today that the management of imprecision and vagueness will yield more intelligent and realistic knowledge-based applications. Description Logics (DLs) are a family of knowledge representation languages that have gained considerable attention the last decade, mainly due to their decidability and the existence of empirically high performance of reasoning algorithms. In this paper, we extend the well known fuzzy ALC DL to the fuzzy SHIN DL, which extends the fuzzy ALC DL with transitive role axioms (S), inverse roles (I), role hierarchies (H) and number restrictions (N). We illustrate why transitive role axioms are difficult to handle in the presence of fuzzy interpretations and how to handle them properly. Then we extend these results by adding role hierarchies and finally number restrictions. The main contributions of the paper are the decidability proof of the fuzzy DL languages fuzzy-SI and fuzzy-SHIN, as well as decision procedures for the knowledge base satisfiability problem of the fuzzy-SI and fuzzy-SHIN. △ Less

Submitted 31 October, 2011; originally announced November 2011.

Journal ref: Journal Of Artificial Intelligence Research, Volume 30, pages 273-320, 2007

arXiv:cs/0005014 [pdf, ps, other]

Practical Reasoning for Expressive Description Logics

Authors: Ian Horrocks, Ulrike Sattler, Stephan Tobies

Abstract: Description Logics (DLs) are a family of knowledge representation formalisms mainly characterised by constructors to build complex concepts and roles from atomic ones. Expressive role constructors are important in many applications, but can be computationally problematical. We present an algorithm that decides satisfiability of the DL ALC extended with transitive and inverse roles, role hierarch… ▽ More Description Logics (DLs) are a family of knowledge representation formalisms mainly characterised by constructors to build complex concepts and roles from atomic ones. Expressive role constructors are important in many applications, but can be computationally problematical. We present an algorithm that decides satisfiability of the DL ALC extended with transitive and inverse roles, role hierarchies, and qualifying number restrictions. Early experiments indicate that this algorithm is well-suited for implementation. Additionally, we show that ALC extended with just transitive and inverse roles is still in PSPACE. Finally, we investigate the limits of decidability for this family of DLs. △ Less

Submitted 10 May, 2000; originally announced May 2000.

Comments: This paper appeared in the Proceedings of LPAR'99

ACM Class: F.4.1, I.2.3, I.2.4

Showing 1–50 of 52 results for author: Horrocks, I