What can knowledge graph alignment gain with Neuro-Symbolic learning approaches?

Authors: Pedro Giesteira Cotovio, Ernesto Jimenez-Ruiz, Catia Pesquita

Abstract: Knowledge Graphs (KG) are the backbone of many data-intensive applications since they can represent data coupled with its meaning and context. Aligning KGs across different domains and providers is necessary to afford a fuller and integrated representation. A severe limitation of current KG alignment (KGA) algorithms is that they fail to articulate logical thinking and reasoning with lexical, structural, and semantic data learning. Deep learning models are increasingly popular for KGA inspired by their good performance in other tasks, but they suffer from limitations in explainability, reasoning, and data efficiency. Hybrid neurosymbolic learning models hold the promise of integrating logical and data perspectives to produce high-quality alignments that are explainable and support validation through human-centric approaches. This paper examines the current state of the art in KGA and explores the potential for neurosymbolic integration, highlighting promising research directions for combining these fields.

Submitted 11 October, 2023; originally announced October 2023.

arXiv:2309.17255 [pdf, other]

doi 10.4230/TGDK.1.1.5

Knowledge Graphs for the Life Sciences: Recent Developments, Challenges and Opportunities

Authors: Jiaoyan Chen, Hang Dong, Janna Hastings, Ernesto Jiménez-Ruiz, Vanessa López, Pierre Monnin, Catia Pesquita, Petr Škoda, Valentina Tamma

Abstract: The term life sciences refers to the disciplines that study living organisms and life processes, and include chemistry, biology, medicine, and a range of other related disciplines. Research efforts in life sciences are heavily data-driven, as they produce and consume vast amounts of scientific data, much of which is intrinsically relational and graph-structured. The volume of data and the complexity of scientific concepts and relations referred to therein promote the application of advanced knowledge-driven technologies for managing and interpreting data, with the ultimate aim to advance scientific discovery. In this survey and position paper, we discuss recent developments and advances in the use of graph-based technologies in life sciences and set out a vision for how these technologies will impact these fields into the future. We focus on three broad topics: the construction and management of Knowledge Graphs (KGs), the use of KGs and associated technologies in the discovery of new knowledge, and the use of KGs in artificial intelligence applications to support explanations (explainable AI). We select a few exemplary use cases for each topic, discuss the challenges and open research questions within these topics, and conclude with a perspective and outlook that summarizes the overarching challenges and their potential solutions as a guide for future research.

Submitted 20 December, 2023; v1 submitted 29 September, 2023; originally announced September 2023.

Comments: 33 pages, 1 figure, camera-ready version, accepted for Transactions on Graph Data and Knowledge (TGDK)

ACM Class: I.2.4; J.3

arXiv:2305.13258 [pdf, other]

NeSy4VRD: A Multifaceted Resource for Neurosymbolic AI Research using Knowledge Graphs in Visual Relationship Detection

Authors: David Herron, Ernesto Jiménez-Ruiz, Giacomo Tarroni, Tillman Weyde

Abstract: NeSy4VRD is a multifaceted resource designed to support the development of neurosymbolic AI (NeSy) research. NeSy4VRD re-establishes public access to the images of the VRD dataset and couples them with an extensively revised, quality-improved version of the VRD visual relationship annotations. Crucially, NeSy4VRD provides a well-aligned, companion OWL ontology that describes the dataset domain. It comes with open source infrastructure that provides comprehensive support for extensibility of the annotations (which, in turn, facilitates extensibility of the ontology), and open source code for loading the annotations to/from a knowledge graph. We are contributing NeSy4VRD to the computer vision, NeSy and Semantic Web communities to help foster more NeSy research using OWL-based knowledge graphs.

Submitted 22 May, 2023; originally announced May 2023.

arXiv:2302.06761 [pdf, other]

Language Model Analysis for Ontology Subsumption Inference

Authors: Yuan He, Jiaoyan Chen, Ernesto Jiménez-Ruiz, Hang Dong, Ian Horrocks

Abstract: Investigating whether pre-trained language models (LMs) can function as knowledge bases (KBs) has raised wide research interests recently. However, existing works focus on simple, triple-based, relational KBs, but omit more sophisticated, logic-based, conceptualised KBs such as OWL ontologies. To investigate an LM's knowledge of ontologies, we propose OntoLAMA, a set of inference-based probing tasks and datasets from ontology subsumption axioms involving both atomic and complex concepts. We conduct extensive experiments on ontologies of different domains and scales, and our results demonstrate that LMs encode relatively less background knowledge of Subsumption Inference (SI) than traditional Natural Language Inference (NLI) but can improve on SI significantly when a small number of samples are given. We will open-source our code and datasets.

Submitted 8 May, 2023; v1 submitted 13 February, 2023; originally announced February 2023.

Comments: Accepted at Findings of ACL 2023; OntoLAMA Datasets are available at: https://huggingface.co/datasets/krr-oxford/OntoLAMA (Huggingface) or https://doi.org/10.5281/zenodo.6480540 (Zenodo)

arXiv:2211.00192 [pdf, other]

AI Assistants: A Framework for Semi-Automated Data Wrangling

Authors: Tomas Petricek, Gerrit J. J. van den Burg, Alfredo Nazábal, Taha Ceritli, Ernesto Jiménez-Ruiz, Christopher K. I. Williams

Abstract: Data wrangling tasks such as obtaining and linking data from various sources, transforming data formats, and correcting erroneous records, can constitute up to 80% of typical data engineering work. Despite the rise of machine learning and artificial intelligence, data wrangling remains a tedious and manual task. We introduce AI assistants, a class of semi-automatic interactive tools to streamline data wrangling. An AI assistant guides the analyst through a specific data wrangling task by recommending a suitable data transformation that respects the constraints obtained through interaction with the analyst. We formally define the structure of AI assistants and describe how existing tools that treat data cleaning as an optimization problem fit the definition. We implement AI assistants for four common data wrangling tasks and make AI assistants easily accessible to data analysts in an open-source notebook environment for data science, by leveraging the common structure they follow. We evaluate our AI assistants both quantitatively and qualitatively through three example scenarios. We show that the unified and interactive design makes it easy to perform tasks that would be difficult to do manually or with a fully automatic tool.

Submitted 31 October, 2022; originally announced November 2022.

Comments: Accepted for publication in IEEE Transactions on Knowledge and Data Engineering

arXiv:2210.15985 [pdf, other]

Understanding Adverse Biological Effect Predictions Using Knowledge Graphs

Authors: Erik Bryhn Myklebust, Ernesto Jimenez-Ruiz, Jiaoyan Chen, Raoul Wolf, Knut Erik Tollefsen

Abstract: Extrapolation of adverse biological (toxic) effects of chemicals is an important contribution to expand available hazard data in (eco)toxicology without the use of animals in laboratory experiments. In this work, we extrapolate effects based on a knowledge graph (KG) consisting of the most relevant effect data as domain-specific background knowledge. An effect prediction model, with and without background knowledge, was used to predict mean adverse biological effect concentration of chemicals as a prototypical type of stressors. The background knowledge improves the model prediction performance by up to 40% in terms of $R^2$ (i.e., coefficient of determination). We use the KG and KG embeddings to provide quantitative and qualitative insights into the predictions. These insights are expected to improve the confidence in effect prediction. Larger scale implementation of such extrapolation models should be expected to support hazard and risk assessment, by simplifying and reducing testing needs.

Submitted 28 October, 2022; originally announced October 2022.

Comments: Under review. 29 pages

arXiv:2209.11089 [pdf, other]

doi 10.1007/978-3-031-11609-4_23

Query-based Industrial Analytics over Knowledge Graphs with Ontology Reshaping

Authors: Zhuoxun Zheng, Baifan Zhou, Dongzhuoran Zhou, Gong Cheng, Ernesto Jiménez-Ruiz, Ahmet Soylu, Evgeny Kharlamo

Abstract: Industrial analytics that includes among others equipment diagnosis and anomaly detection heavily relies on integration of heterogeneous production data. Knowledge Graphs (KGs) as the data format and ontologies as the unified data schemata are a prominent solution that offers high quality data integration and a convenient and standardised way to exchange data and to layer analytical applications over it. However, poor design of ontologies or a high degree of mismatch between them and industrial data naturally leads to KGs of low quality that impede the adoption and scalability of industrial analytics. Indeed, such KGs substantially increase the training time of writing queries for users, consume high volume of storage for redundant information, and are hard to maintain and update. To address this problem we propose an ontology reshaping approach to transform ontologies into KG schemata that better reflect the underlying data and thus help to construct better KGs. In this poster we present a preliminary discussion of our on-going research, evaluate our approach with a rich set of SPARQL queries on real-world industry data at Bosch and discuss our findings.

Submitted 22 September, 2022; originally announced September 2022.

arXiv:2205.03447 [pdf, ps, other]

Machine Learning-Friendly Biomedical Datasets for Equivalence and Subsumption Ontology Matching

Authors: Yuan He, Jiaoyan Chen, Hang Dong, Ernesto Jiménez-Ruiz, Ali Hadian, Ian Horrocks

Abstract: Ontology Matching (OM) plays an important role in many domains such as bioinformatics and the Semantic Web, and its research is becoming increasingly popular, especially with the application of machine learning (ML) techniques. Although the Ontology Alignment Evaluation Initiative (OAEI) represents an impressive effort for the systematic evaluation of OM systems, it still suffers from several limitations including limited evaluation of subsumption mappings, suboptimal reference mappings, and limited support for the evaluation of ML-based systems. To tackle these limitations, we introduce five new biomedical OM tasks involving ontologies extracted from Mondo and UMLS. Each task includes both equivalence and subsumption matching; the quality of reference mappings is ensured by human curation, ontology pruning, etc.; and a comprehensive evaluation framework is proposed to measure OM performance from various perspectives for both ML-based and non-ML-based OM systems. We report evaluation results for OM systems of different types to demonstrate the usage of these resources, all of which are publicly available as part of the new BioML track at OAEI 2022.

Submitted 22 July, 2023; v1 submitted 6 May, 2022; originally announced May 2022.

Comments: Accepted paper (Best Resource Paper Candidate) in the 21st International Semantic Web Conference (ISWC-2022); Bio-ML Dataset: https://doi.org/10.5281/zenodo.6510086

arXiv:2202.09791 [pdf, other]

Contextual Semantic Embeddings for Ontology Subsumption Prediction

Authors: Jiaoyan Chen, Yuan He, Yuxia Geng, Ernesto Jimenez-Ruiz, Hang Dong, Ian Horrocks

Abstract: Automating ontology construction and curation is an important but challenging task in knowledge engineering and artificial intelligence. Prediction by machine learning techniques such as contextual semantic embedding is a promising direction, but the relevant research is still preliminary especially for expressive ontologies in Web Ontology Language (OWL). In this paper, we present a new subsumption prediction method named BERTSubs for classes of OWL ontology. It exploits the pre-trained language model BERT to compute contextual embeddings of a class, where customized templates are proposed to incorporate the class context (e.g., neighbouring classes) and the logical existential restriction. BERTSubs is able to predict multiple kinds of subsumers including named classes from the same ontology or another ontology, and existential restrictions from the same ontology. Extensive evaluation on five real-world ontologies for three different subsumption tasks has shown the effectiveness of the templates and that BERTSubs can dramatically outperform the baselines that use (literal-aware) knowledge graph embeddings, non-contextual word embeddings and the state-of-the-art OWL ontology embeddings.

Submitted 18 March, 2023; v1 submitted 20 February, 2022; originally announced February 2022.

Comments: Accepted by World Wide Web Journal

arXiv:2112.07051 [pdf]

doi 10.1093/database/baac035

A Simple Standard for Sharing Ontological Mappings (SSSOM)

Authors: Nicolas Matentzoglu, James P. Balhoff, Susan M. Bello, Chris Bizon, Matthew Brush, Tiffany J. Callahan, Christopher G Chute, William D. Duncan, Chris T. Evelo, Davera Gabriel, John Graybeal, Alasdair Gray, Benjamin M. Gyori, Melissa Haendel, Henriette Harmse, Nomi L. Harris, Ian Harrow, Harshad Hegde, Amelia L. Hoyt, Charles T. Hoyt, Dazhi Jiao, Ernesto Jiménez-Ruiz, Simon Jupp, Hyeongsik Kim, Sebastian Koehler, et al. (19 additional authors not shown)

Abstract: Despite progress in the development of standards for describing and exchanging scientific information, the lack of easy-to-use standards for mapping between different representations of the same or similar objects in different databases poses a major impediment to data integration and interoperability. Mappings often lack the metadata needed to be correctly interpreted and applied. For example, are two terms equivalent or merely related? Are they narrow or broad matches? Are they associated in some other way? Such relationships between the mapped terms are often not documented, leading to incorrect assumptions and making them hard to use in scenarios that require a high degree of precision (such as diagnostics or risk prediction). Also, the lack of descriptions of how mappings were done makes it hard to combine and reconcile mappings, particularly curated and automated ones. The Simple Standard for Sharing Ontological Mappings (SSSOM) addresses these problems by: 1. Introducing a machine-readable and extensible vocabulary to describe metadata that makes imprecision, inaccuracy and incompleteness in mappings explicit. 2. Defining an easy to use table-based format that can be integrated into existing data science pipelines without the need to parse or query ontologies, and that integrates seamlessly with Linked Data standards. 3. Implementing open and community-driven collaborative workflows designed to evolve the standard continuously to address changing requirements and mapping practices. 4. Providing reference tools and software libraries for working with the standard. In this paper, we present the SSSOM standard, describe several use cases, and survey some existing work on standardizing the exchange of mappings, with the goal of making mappings Findable, Accessible, Interoperable, and Reusable (FAIR). The SSSOM specification is at http://w3id.org/sssom/spec.

Submitted 13 December, 2021; originally announced December 2021.

Comments: Corresponding author: Christopher J. Mungall <cjmungall@lbl.gov>

arXiv:2112.04605 [pdf, other]

doi 10.3233/SW-222804

Prediction of Adverse Biological Effects of Chemicals Using Knowledge Graph Embeddings

Authors: Erik B. Myklebust, Ernesto Jiménez-Ruiz, Jiaoyan Chen, Raoul Wolf, Knut Erik Tollefsen

Abstract: We have created a knowledge graph based on major data sources used in ecotoxicological risk assessment. We have applied this knowledge graph to an important task in risk assessment, namely chemical effect prediction. We have evaluated nine knowledge graph embedding models from a selection of geometric, decomposition, and convolutional models on this prediction task. We show that using knowledge graph embeddings can increase the accuracy of effect prediction with neural networks. Furthermore, we have implemented a fine-tuning architecture that adapts the knowledge graph embeddings to the effect prediction task and leads to better performance. Finally, we evaluate certain characteristics of the knowledge graph embedding models to shed light on the individual model performance.

Submitted 30 March, 2022; v1 submitted 8 December, 2021; originally announced December 2021.

Comments: Semantic Web, vol. Pre-press, no. Pre-press, pp. 1-40, 2022

ACM Class: I.2

arXiv:2009.14654 [pdf, other]

OWL2Vec*: Embedding of OWL Ontologies

Authors: Jiaoyan Chen, Pan Hu, Ernesto Jimenez-Ruiz, Ole Magnus Holter, Denvar Antonyrajah, Ian Horrocks

Abstract: Semantic embedding of knowledge graphs has been widely studied and used for prediction and statistical analysis tasks across various domains such as Natural Language Processing and the Semantic Web. However, less attention has been paid to developing robust methods for embedding OWL (Web Ontology Language) ontologies which can express a much wider range of semantics than knowledge graphs and have been widely adopted in domains such as bioinformatics. In this paper, we propose a random walk and word embedding based ontology embedding method named OWL2Vec*, which encodes the semantics of an OWL ontology by taking into account its graph structure, lexical information and logical constructors. Our empirical evaluation with three real world datasets suggests that OWL2Vec* benefits from these three different aspects of an ontology in class membership prediction and class subsumption prediction tasks. Furthermore, OWL2Vec* often significantly outperforms the state-of-the-art methods in our experiments.

Submitted 25 January, 2021; v1 submitted 30 September, 2020; originally announced September 2020.

arXiv:2003.05370 [pdf, other]

Dividing the Ontology Alignment Task with Semantic Embeddings and Logic-based Modules

Authors: Ernesto Jiménez-Ruiz, Asan Agibetov, Jiaoyan Chen, Matthias Samwald, Valerie Cross

Abstract: Large ontologies still pose serious challenges to state-of-the-art ontology alignment systems. In this paper we present an approach that combines a neural embedding model and logic-based modules to accurately divide an input ontology matching task into smaller and more tractable matching (sub)tasks. We have conducted a comprehensive evaluation using the datasets of the Ontology Alignment Evaluation Initiative. The results are encouraging and suggest that the proposed method is adequate in practice and can be integrated within the workflow of systems unable to cope with very large ontologies.

Submitted 25 February, 2020; originally announced March 2020.

Comments: Accepted to the 24th European Conference on Artificial Intelligence (ECAI 2020). arXiv admin note: text overlap with arXiv:1805.12402

ACM Class: I.2

arXiv:2001.06917 [pdf, other]

doi 10.1145/3366423.3380226

Correcting Knowledge Base Assertions

Authors: Jiaoyan Chen, Xi Chen, Ian Horrocks, Ernesto Jimenez-Ruiz, Erik B. Myklebust

Abstract: The usefulness and usability of knowledge bases (KBs) is often limited by quality issues. One common issue is the presence of erroneous assertions, often caused by lexical or semantic confusion. We study the problem of correcting such assertions, and present a general correction framework which combines lexical matching, semantic embedding, soft constraint mining and semantic consistency checking. The framework is evaluated using DBpedia and an enterprise medical KB.

Submitted 19 January, 2020; originally announced January 2020.

Comments: Accepted by The Web Conference (WWW) 2020

ACM Class: I.2

arXiv:1908.10128 [pdf, other]

TERA: the Toxicological Effect and Risk Assessment Knowledge Graph

Authors: Erik Bryhn Myklebust, Ernesto Jimenez-Ruiz, Jiaoyan Chen, Raoul Wolf, Knut Erik Tollefsen

Abstract: Ecological risk assessment requires large amounts of chemical effect data from laboratory experiments. Due to experimental effort and animal welfare concerns it is desired to extrapolate data from existing sources. To cover the required chemical effect data several data sources need to be integrated to enable their interoperability. In this paper we introduce the Toxicological Effect and Risk Assessment (TERA) knowledge graph, which aims at providing such integrated view, and the data preparation and steps followed to construct this knowledge graph. We also present the applications of TERA for chemical effect prediction and the potential applications within the Semantic Web community.

Submitted 12 December, 2019; v1 submitted 27 August, 2019; originally announced August 2019.

Comments: Submitted to a conference

arXiv:1907.01328 [pdf, other]

doi 10.1007/978-3-030-30796-7_30

Knowledge Graph Embedding for Ecotoxicological Effect Prediction

Authors: Erik Bryhn Myklebust, Ernesto Jimenez-Ruiz, Jiaoyan Chen, Raoul Wolf, Knut Erik Tollefsen

Abstract: Exploring the effects a chemical compound has on a species takes a considerable experimental effort. Appropriate methods for estimating and suggesting new effects can dramatically reduce the work needed to be done by a laboratory. In this paper we explore the suitability of using a knowledge graph embedding approach for ecotoxicological effect prediction. A knowledge graph has been constructed from publicly available data sets, including a species taxonomy and chemical classification and similarity. The publicly available effect data is integrated to the knowledge graph using ontology alignment techniques. Our experimental results show that the knowledge graph based approach improves the selected baselines.

Submitted 11 November, 2019; v1 submitted 2 July, 2019; originally announced July 2019.

Journal ref: In: Ghidini C. et al. (eds) The Semantic Web - ISWC 2019. ISWC 2019. Lecture Notes in Computer Science, vol 11779. Springer, Cham

arXiv:1906.11180 [pdf, other]

Canonicalizing Knowledge Base Literals

Authors: Jiaoyan Chen, Ernesto Jimenez-Ruiz, Ian Horrocks

Abstract: Ontology-based knowledge bases (KBs) like DBpedia are very valuable resources, but their usefulness and usability is limited by various quality issues. One such issue is the use of string literals instead of semantically typed entities. In this paper we study the automated canonicalization of such literals, i.e., replacing the literal with an existing entity from the KB or with a new entity that is typed using classes from the KB. We propose a framework that combines both reasoning and machine learning in order to predict the relevant entities and types, and we evaluate this framework against state-of-the-art baselines for both semantic typing and entity matching.

Submitted 26 June, 2019; originally announced June 2019.

Journal ref: International Semantic Web Conference (ISWC) 2019

arXiv:1906.00781 [pdf, other]

Learning Semantic Annotations for Tabular Data

Authors: Jiaoyan Chen, Ernesto Jimenez-Ruiz, Ian Horrocks, Charles Sutton

Abstract: The usefulness of tabular data such as web tables critically depends on understanding their semantics. This study focuses on column type prediction for tables without any meta data. Unlike traditional lexical matching-based methods, we propose a deep prediction model that can fully exploit a table's contextual semantics, including table locality features learned by a Hybrid Neural Network (HNN), and inter-column semantics features learned by a knowledge base (KB) lookup and query answering algorithm. It exhibits good performance not only on individual table sets, but also when transferring from one table set to another.

Submitted 30 May, 2019; originally announced June 2019.

Comments: 7 pages

Journal ref: IJCAI 2019

arXiv:1901.08547 [pdf]

Human-centric Transfer Learning Explanation via Knowledge Graph [Extended Abstract]

Authors: Yuxia Geng, Jiaoyan Chen, Ernesto Jimenez-Ruiz, Huajun Chen

Abstract: Transfer learning, which aims at utilizing knowledge learned from one problem (source domain) to solve another different but related problem (target domain), has attracted wide research attention. However, the current transfer learning methods are mostly uninterpretable, especially to people without ML expertise. In this extended abstract, we briefly introduce two knowledge graph (KG) based frameworks towards human-understandable transfer learning explanation. The first one explains the transferability of features learned by Convolutional Neural Network (CNN) from one domain to another through pre-training and fine-tuning, while the second justifies the model of a target domain predicted by models from multiple source domains in zero-shot learning (ZSL). Both methods utilize KG and its reasoning capability to provide rich and human-understandable explanations to the transfer procedure.

Submitted 20 January, 2019; originally announced January 2019.

Comments: In AAAI-19 Workshop on Network Interpretability for Deep Learning

arXiv:1811.01304 [pdf, other]

ColNet: Embedding the Semantics of Web Tables for Column Type Prediction

Authors: Jiaoyan Chen, Ernesto Jimenez-Ruiz, Ian Horrocks, Charles Sutton

Abstract: Automatically annotating column types with knowledge base (KB) concepts is a critical task to gain a basic understanding of web tables. Current methods rely on either table metadata like column name or entity correspondences of cells in the KB, and may fail to deal with growing web tables with incomplete meta information. In this paper we propose a neural network based column type annotation framework named ColNet which is able to integrate KB reasoning and lookup with machine learning and can automatically train Convolutional Neural Networks for prediction. The prediction model not only considers the contextual semantics within a cell using word representation, but also embeds the semantics of a column by learning locality features from multiple cells. The method is evaluated with DBpedia and two different web table datasets, T2Dv2 from the general Web and Limaye from Wikipedia pages, and achieves higher performance than the state-of-the-art approaches.

Submitted 14 November, 2018; v1 submitted 3 November, 2018; originally announced November 2018.

Comments: AAAI 2019

arXiv:1805.12402 [pdf, ps, other]

Breaking-down the Ontology Alignment Task with a Lexical Index and Neural Embeddings

Authors: Ernesto Jimenez-Ruiz, Asan Agibetov, Matthias Samwald, Valerie Cross

Abstract: Large ontologies still pose serious challenges to state-of-the-art ontology alignment systems. In the paper we present an approach that combines a lexical index, a neural embedding model and locality modules to effectively divide an input ontology matching task into smaller and more tractable matching (sub)tasks. We have conducted a comprehensive evaluation using the datasets of the Ontology Alignment Evaluation Initiative. The results are encouraging and suggest that the proposed methods are adequate in practice and can be integrated within the workflow of state-of-the-art systems.

Submitted 31 May, 2018; originally announced May 2018.

arXiv:1208.3148 [pdf, other]

Evaluating Ontology Matching Systems on Large, Multilingual and Real-world Test Cases

Authors: Christian Meilicke, Ondrej Sváb-Zamazal, Cássia Trojahn, Ernesto Jiménez-Ruiz, José-Luis Aguirre, Heiner Stuckenschmidt, Bernardo Cuenca Grau

Abstract: In the field of ontology matching, the most systematic evaluation of matching systems is established by the Ontology Alignment Evaluation Initiative (OAEI), which is an annual campaign for evaluating ontology matching systems organized by different groups of researchers. In this paper, we report on the results of an intermediary OAEI campaign called OAEI 2011.5. The evaluations of this campaign are divided into five tracks. Three of these tracks are new or have been improved compared to previous OAEI campaigns. Overall, we evaluated 18 matching systems. We discuss lessons learned, in terms of scalability, multilingual issues and the ability to deal with real-world cases from different domains.

Submitted 15 August, 2012; originally announced August 2012.

Comments: Technical Report of the OAEI 2011.5 Evaluation Campaign

arXiv:1012.1659 [pdf, other]

First steps in the logic-based assessment of post-composed phenotypic descriptions

Authors: Ernesto Jimenez-Ruiz, Bernardo Cuenca Grau, Rafael Berlanga, Dietrich Rebholz-Schuhmann

Abstract: In this paper we present a preliminary logic-based evaluation of the integration of post-composed phenotypic descriptions with domain ontologies. The evaluation has been performed using a description logic reasoner together with scalable techniques: ontology modularization and approximations of the logical difference between ontologies.

Submitted 7 December, 2010; originally announced December 2010.

Comments: in Adrian Paschke, Albert Burger, Andrea Splendiani, M. Scott Marshall, Paolo Romano: Proceedings of the 3rd International Workshop on Semantic Web Applications and Tools for the Life Sciences, Berlin, Germany, December 8-10, 2010

Report number: SWAT4LS 2010

ACM Class: J.3

arXiv:1012.1609 [pdf, other]

Building conceptual spaces for exploring and linking biomedical resources

Authors: R. Berlanga, E. Jimenez-Ruiz, V. Nebot

Abstract: The establishment of links between data (e.g., patient records) and Web resources (e.g., literature) and the proper visualization of such discovered knowledge is still a challenge in most Life Science domains (e.g., biomedicine). In this paper we present our contribution to the community in the form of an infrastructure to annotate information resources, to discover relationships among them, and to represent and visualize the newly discovered knowledge. Furthermore, we have also implemented a Web-based prototype tool which integrates the proposed infrastructure.

Submitted 7 December, 2010; originally announced December 2010.

Comments: in Adrian Paschke, Albert Burger, Andrea Splendiani, M. Scott Marshall, Paolo Romano: Proceedings of the 3rd International Workshop on Semantic Web Applications and Tools for the Life Sciences, Berlin, Germany, December 8-10, 2010

Report number: SWAT4LS 2010

ACM Class: J.3

arXiv:cs/0609144 [pdf]

The Management and Integration of Biomedical Knowledge: Application in the Health-e-Child Project (Position Paper)

Authors: E. Jimenez-Ruiz, R. Berlanga, I. Sanz, R. McClatchey, R. Danger, D. Manset, J. Paraire, A. Rios

Abstract: The Health-e-Child project aims to develop an integrated healthcare platform for European paediatrics. In order to achieve a comprehensive view of children's health, a complex integration of biomedical data, information, and knowledge is necessary. Ontologies will be used to formally define this domain knowledge and will form the basis for the medical knowledge management system. This paper introduces an innovative methodology for the vertical integration of biomedical knowledge. This approach will be largely clinician-centered and will enable the definition of ontology fragments, connections between them (semantic bridges) and enriched ontology fragments (views). The strategy for the specification and capture of fragments, bridges and views is outlined with preliminary examples demonstrated in the collection of biomedical information from hospital databases, biomedical ontologies, and biomedical public databases.

Submitted 26 September, 2006; originally announced September 2006.

Comments: 6 pages; 2 figures. Proceedings of the 1st International Workshop on Ontology content and evaluation in Enterprise

ACM Class: H.2.4; J.3
