-
Research Trends for the Interplay between Large Language Models and Knowledge Graphs
Authors:
Hanieh Khorashadizadeh,
Fatima Zahra Amara,
Morteza Ezzabady,
Frédéric Ieng,
Sanju Tiwari,
Nandana Mihindukulasooriya,
Jinghua Groppe,
Soror Sahri,
Farah Benamara,
Sven Groppe
Abstract:
This survey investigates the synergistic relationship between Large Language Models (LLMs) and Knowledge Graphs (KGs), which is crucial for advancing AI's capabilities in understanding, reasoning, and language processing. It aims to address gaps in current research by exploring areas such as KG Question Answering, ontology generation, KG validation, and the enhancement of KG accuracy and consistency through LLMs. The paper further examines the roles of LLMs in generating descriptive texts and natural language queries for KGs. Through a structured analysis that includes categorizing LLM-KG interactions, examining methodologies, and investigating collaborative uses and potential biases, this study seeks to provide new insights into the combined potential of LLMs and KGs. It highlights the importance of their interaction for improving AI applications and outlines future research directions.
Submitted 12 June, 2024;
originally announced June 2024.
-
Matching Table Metadata with Business Glossaries Using Large Language Models
Authors:
Elita Lobo,
Oktie Hassanzadeh,
Nhan Pham,
Nandana Mihindukulasooriya,
Dharmashankar Subramanian,
Horst Samulowitz
Abstract:
Enterprises often own large collections of structured data in the form of large databases or an enterprise data lake. Such data collections come with limited metadata and strict access policies that could limit access to the data contents and, therefore, limit the application of classic retrieval and analysis solutions. As a result, there is a need for solutions that can effectively utilize the available metadata. In this paper, we study the problem of matching table metadata to a business glossary containing data labels and descriptions. The resulting matching enables the use of an available or curated business glossary for retrieval and analysis without or before requesting access to the data contents. One solution to this problem is to use manually defined rules or similarity measures on column names and glossary descriptions (or their vector embeddings) to find the closest match. However, such approaches need to be tuned through manual labeling and cannot handle many business glossaries that contain a combination of simple as well as complex and long descriptions. In this work, we leverage the power of large language models (LLMs) to design generic matching methods that do not require manual tuning and can identify complex relations between column names and glossaries. We propose methods that utilize LLMs in two ways: (a) by generating additional context for column names that can aid with matching, and (b) by using LLMs to directly infer whether there is a relation between column names and glossary descriptions. Our preliminary experimental results show the effectiveness of our proposed methods.
Submitted 7 September, 2023;
originally announced September 2023.
-
Text2KGBench: A Benchmark for Ontology-Driven Knowledge Graph Generation from Text
Authors:
Nandana Mihindukulasooriya,
Sanju Tiwari,
Carlos F. Enguix,
Kusum Lata
Abstract:
The recent advances in large language models (LLMs) and foundation models with emergent capabilities have been shown to improve the performance of many NLP tasks. LLMs and Knowledge Graphs (KGs) can complement each other such that LLMs can be used for KG construction or completion while existing KGs can be used for different tasks such as making LLM outputs explainable or fact-checking in a neuro-symbolic manner. In this paper, we present Text2KGBench, a benchmark to evaluate the capabilities of language models to generate KGs from natural language text guided by an ontology. Given an input ontology and a set of sentences, the task is to extract facts from the text while complying with the given ontology (concepts, relations, domain/range constraints) and being faithful to the input sentences. We provide two datasets: (i) Wikidata-TekGen with 10 ontologies and 13,474 sentences and (ii) DBpedia-WebNLG with 19 ontologies and 4,860 sentences. We define seven evaluation metrics to measure fact extraction performance, ontology conformance, and hallucinations by LLMs. Furthermore, we provide results for two baseline models, Vicuna-13B and Alpaca-LoRA-13B, using automatic prompt generation from test cases. The baseline results show that there is room for improvement using both Semantic Web and Natural Language Processing techniques.
Submitted 4 August, 2023;
originally announced August 2023.
-
Finspector: A Human-Centered Visual Inspection Tool for Exploring and Comparing Biases among Foundation Models
Authors:
Bum Chul Kwon,
Nandana Mihindukulasooriya
Abstract:
Pre-trained transformer-based language models are becoming increasingly popular due to their exceptional performance on various benchmarks. However, concerns persist regarding the presence of hidden biases within these models, which can lead to discriminatory outcomes and reinforce harmful stereotypes. To address this issue, we propose Finspector, a human-centered visual inspection tool designed to detect biases in different categories through log-likelihood scores generated by language models. The goal of the tool is to enable researchers to easily identify potential biases using visual analytics, ultimately contributing to a fairer and more just deployment of these models in both academic and industrial settings. Finspector is available at https://github.com/IBM/finspector.
Submitted 26 May, 2023;
originally announced May 2023.
-
Exploring In-Context Learning Capabilities of Foundation Models for Generating Knowledge Graphs from Text
Authors:
Hanieh Khorashadizadeh,
Nandana Mihindukulasooriya,
Sanju Tiwari,
Jinghua Groppe,
Sven Groppe
Abstract:
Knowledge graphs can represent information about the real world using entities and their relations in a structured and semantically rich manner, and they enable a variety of downstream applications such as question answering, recommendation systems, semantic search, and advanced analytics. However, at the moment, building a knowledge graph involves a lot of manual effort, which hinders its application in some situations; automating this process might especially benefit small organizations. Automatically generating structured knowledge graphs from a large volume of natural language is still a challenging task, and the research on sub-tasks such as named entity extraction, relation extraction, entity and relation linking, and knowledge graph construction aims to improve the state of the art of automatic construction and completion of knowledge graphs from text. The recent advancement of foundation models with billions of parameters, trained in a self-supervised manner with large volumes of training data and adaptable to a variety of downstream tasks, has helped to demonstrate high performance on a large range of Natural Language Processing (NLP) tasks. In this context, one emerging paradigm is in-context learning, where a language model is used as it is with a prompt that provides instructions and some examples to perform a task, without changing the parameters of the model using traditional approaches such as fine-tuning. This way, no computing resources are needed for re-training/fine-tuning the models and the engineering effort is minimal. Thus, it would be beneficial to utilize such capabilities for generating knowledge graphs from text.
Submitted 15 May, 2023;
originally announced May 2023.
-
Scaling Knowledge Graphs for Automating AI of Digital Twins
Authors:
Joern Ploennigs,
Konstantinos Semertzidis,
Fabio Lorenzi,
Nandana Mihindukulasooriya
Abstract:
Digital Twins are digital representations of systems in the Internet of Things (IoT) that are often based on AI models that are trained on data from those systems. Semantic models are used increasingly to link these datasets from different stages of the IoT systems life-cycle together and to automatically configure the AI modelling pipelines. This combination of semantic models with AI pipelines running on external datasets raises unique challenges, particularly if rolled out at scale. Within this paper we will discuss the unique requirements of applying semantic graphs to automate Digital Twins in different practical use cases. We will introduce the benchmark dataset DTBM that reflects these characteristics and look into the scaling challenges of different knowledge graph technologies. Based on these insights we will propose a reference architecture that is in use in multiple products at IBM and derive lessons learned for scaling knowledge graphs for configuring AI models for Digital Twins.
Submitted 26 October, 2022;
originally announced October 2022.
-
KnowGL: Knowledge Generation and Linking from Text
Authors:
Gaetano Rossiello,
Md Faisal Mahbub Chowdhury,
Nandana Mihindukulasooriya,
Owen Cornec,
Alfio Massimiliano Gliozzo
Abstract:
We propose KnowGL, a tool that allows converting text into structured relational data represented as a set of ABox assertions compliant with the TBox of a given Knowledge Graph (KG), such as Wikidata. We address this problem as a sequence generation task by leveraging pre-trained sequence-to-sequence language models, e.g. BART. Given a sentence, we fine-tune such models to detect pairs of entity mentions and jointly generate a set of facts consisting of the full set of semantic annotations for a KG, such as entity labels, entity types, and their relationships. To showcase the capabilities of our tool, we build a web application consisting of a set of UI widgets that help users to navigate through the semantic data extracted from a given input text. We make the KnowGL model available at https://huggingface.co/ibm/knowgl-large.
Submitted 22 November, 2022; v1 submitted 25 October, 2022;
originally announced October 2022.
-
Knowledge Graph Induction enabling Recommending and Trend Analysis: A Corporate Research Community Use Case
Authors:
Nandana Mihindukulasooriya,
Mike Sava,
Gaetano Rossiello,
Md Faisal Mahbub Chowdhury,
Irene Yachbes,
Aditya Gidh,
Jillian Duckwitz,
Kovit Nisar,
Michael Santos,
Alfio Gliozzo
Abstract:
A research division plays an important role in driving innovation in an organization. Drawing insights, following trends, keeping abreast of new research, and formulating strategies are becoming increasingly challenging for both researchers and executives as the amount of information grows in both velocity and volume. In this paper we present a use case of how a corporate research community, IBM Research, utilizes Semantic Web technologies to induce a unified Knowledge Graph from both structured and textual data obtained by integrating various applications used by the community related to research projects, academic papers, datasets, achievements and recognition. In order to make the Knowledge Graph more accessible to application developers, we identified a set of common patterns for exploiting the induced knowledge and exposed them as APIs. Those patterns were born out of user research which identified the most valuable use cases or user pain points to be alleviated. We outline two distinct scenarios: recommendation and analytics for business use. We will discuss these scenarios in detail and provide an empirical evaluation on entity recommendation specifically. The methodology used and the lessons learned from this work can be applied to other organizations facing similar challenges.
Submitted 15 September, 2022; v1 submitted 11 July, 2022;
originally announced July 2022.
-
CBR-iKB: A Case-Based Reasoning Approach for Question Answering over Incomplete Knowledge Bases
Authors:
Dung Thai,
Srinivas Ravishankar,
Ibrahim Abdelaziz,
Mudit Chaudhary,
Nandana Mihindukulasooriya,
Tahira Naseem,
Rajarshi Das,
Pavan Kapanipathi,
Achille Fokoue,
Andrew McCallum
Abstract:
Knowledge bases (KBs) are often incomplete and constantly changing in practice. Yet, in many question answering applications coupled with knowledge bases, the sparse nature of KBs is often overlooked. To this end, we propose a case-based reasoning approach, CBR-iKB, for knowledge base question answering (KBQA) with incomplete KBs as our main focus. Our method ensembles decisions from multiple reasoning chains with a novel nonparametric reasoning algorithm. By design, CBR-iKB can seamlessly adapt to changes in KBs without any task-specific training or fine-tuning. Our method achieves 100% accuracy on MetaQA and establishes a new state of the art on multiple benchmarks. For instance, CBR-iKB achieves an accuracy of 70% on WebQSP under the incomplete-KB setting, outperforming the existing state-of-the-art method by 22.3%.
Submitted 18 April, 2022;
originally announced April 2022.
-
KGI: An Integrated Framework for Knowledge Intensive Language Tasks
Authors:
Md Faisal Mahbub Chowdhury,
Michael Glass,
Gaetano Rossiello,
Alfio Gliozzo,
Nandana Mihindukulasooriya
Abstract:
In this paper, we present a system to showcase the capabilities of the latest state-of-the-art retrieval augmented generation models trained on knowledge-intensive language tasks, such as slot filling, open domain question answering, dialogue, and fact-checking. Moreover, given a user query, we show how the output from these different models can be combined to cross-examine the outputs of each other. Particularly, we show how accuracy in dialogue can be improved using the question answering model. We are also releasing all models used in the demo as a contribution of this paper. A short video demonstrating the system is available at https://ibm.box.com/v/emnlp2022-demo.
Submitted 21 September, 2022; v1 submitted 8 April, 2022;
originally announced April 2022.
-
A Benchmark for Generalizable and Interpretable Temporal Question Answering over Knowledge Bases
Authors:
Sumit Neelam,
Udit Sharma,
Hima Karanam,
Shajith Ikbal,
Pavan Kapanipathi,
Ibrahim Abdelaziz,
Nandana Mihindukulasooriya,
Young-Suk Lee,
Santosh Srivastava,
Cezar Pendus,
Saswati Dana,
Dinesh Garg,
Achille Fokoue,
G P Shrivatsa Bhargav,
Dinesh Khandelwal,
Srinivas Ravishankar,
Sairam Gurajada,
Maria Chang,
Rosario Uceda-Sosa,
Salim Roukos,
Alexander Gray,
Guilherme Lima,
Ryan Riegel,
Francois Luus,
L Venkata Subramaniam
Abstract:
Knowledge Base Question Answering (KBQA) tasks that involve complex reasoning are emerging as an important research direction. However, most existing KBQA datasets focus primarily on generic multi-hop reasoning over explicit facts, largely ignoring other reasoning types such as temporal, spatial, and taxonomic reasoning. In this paper, we present a benchmark dataset for temporal reasoning, TempQA-WD, to encourage research in extending the present approaches to target a more challenging set of complex reasoning tasks. Specifically, our benchmark is a temporal question answering dataset with the following advantages: (a) it is based on Wikidata, which is the most frequently curated, openly available knowledge base, (b) it includes intermediate SPARQL queries to facilitate the evaluation of semantic-parsing-based approaches for KBQA, and (c) it generalizes to multiple knowledge bases: Freebase and Wikidata. The TempQA-WD dataset is available at https://github.com/IBM/tempqa-wd.
Submitted 15 January, 2022;
originally announced January 2022.
-
Applying a Generic Sequence-to-Sequence Model for Simple and Effective Keyphrase Generation
Authors:
Md Faisal Mahbub Chowdhury,
Gaetano Rossiello,
Michael Glass,
Nandana Mihindukulasooriya,
Alfio Gliozzo
Abstract:
In recent years, a number of keyphrase generation (KPG) approaches have been proposed, consisting of complex model architectures, dedicated training paradigms and decoding strategies. In this work, we opt for simplicity and show how a commonly used seq2seq language model, BART, can be easily adapted to generate keyphrases from the text in a single batch computation using a simple training procedure. Empirical results on five benchmarks show that our approach is as good as the existing state-of-the-art KPG systems, while using a much simpler and easier-to-deploy framework.
Submitted 13 January, 2022;
originally announced January 2022.
-
Learning to Transpile AMR into SPARQL
Authors:
Mihaela Bornea,
Ramon Fernandez Astudillo,
Tahira Naseem,
Nandana Mihindukulasooriya,
Ibrahim Abdelaziz,
Pavan Kapanipathi,
Radu Florian,
Salim Roukos
Abstract:
We propose a transition-based system to transpile Abstract Meaning Representation (AMR) into SPARQL for Knowledge Base Question Answering (KBQA). This allows us to delegate part of the semantic representation to a strongly pre-trained semantic parser, while learning transpiling with a small amount of paired data. We depart from recent work relating AMR and SPARQL constructs, but rather than applying a set of rules, we teach a BART model to selectively use these relations. Further, we avoid explicitly encoding AMR but rather encode the parser state in the attention mechanism of BART, following recent semantic parsing works. The resulting model is simple, provides supporting text for its decisions, and outperforms recent approaches in KBQA across two knowledge bases: DBpedia (LC-QuAD 1.0, QALD-9) and Wikidata (WebQSP, SWQ-WD).
Submitted 8 December, 2022; v1 submitted 14 December, 2021;
originally announced December 2021.
-
Semantic Answer Type and Relation Prediction Task (SMART 2021)
Authors:
Nandana Mihindukulasooriya,
Mohnish Dubey,
Alfio Gliozzo,
Jens Lehmann,
Axel-Cyrille Ngonga Ngomo,
Ricardo Usbeck,
Gaetano Rossiello,
Uttam Kumar
Abstract:
Each year the International Semantic Web Conference organizes a set of Semantic Web Challenges to establish competitions that will advance state-of-the-art solutions in some problem domains. The Semantic Answer Type and Relation Prediction Task (SMART) is one of the ISWC 2021 Semantic Web challenges. This is the second year of the challenge after a successful SMART 2020 at ISWC 2020. This year's version focuses on two sub-tasks that are very important to Knowledge Base Question Answering (KBQA): Answer Type Prediction and Relation Prediction. Question type and answer type prediction can play a key role in knowledge base question answering systems, providing insights about the expected answer that are helpful to generate correct queries or rank the answer candidates. More concretely, given a question in natural language, the first task is to predict the answer type using a target ontology (e.g., DBpedia or Wikidata). Similarly, the second task is to identify relations in the natural language query and link them to the relations in a target ontology. This paper discusses the task descriptions, benchmark datasets, and evaluation metrics. For more information, please visit https://smart-task.github.io/2021/.
Submitted 10 January, 2022; v1 submitted 7 December, 2021;
originally announced December 2021.
-
SYGMA: System for Generalizable Modular Question Answering Over Knowledge Bases
Authors:
Sumit Neelam,
Udit Sharma,
Hima Karanam,
Shajith Ikbal,
Pavan Kapanipathi,
Ibrahim Abdelaziz,
Nandana Mihindukulasooriya,
Young-Suk Lee,
Santosh Srivastava,
Cezar Pendus,
Saswati Dana,
Dinesh Garg,
Achille Fokoue,
G P Shrivatsa Bhargav,
Dinesh Khandelwal,
Srinivas Ravishankar,
Sairam Gurajada,
Maria Chang,
Rosario Uceda-Sosa,
Salim Roukos,
Alexander Gray,
Guilherme Lima,
Ryan Riegel,
Francois Luus,
L Venkata Subramaniam
Abstract:
Knowledge Base Question Answering (KBQA) tasks that involve complex reasoning are emerging as an important research direction. However, most KBQA systems struggle with generalizability, particularly on two dimensions: (a) across multiple reasoning types, where both datasets and systems have primarily focused on multi-hop reasoning, and (b) across multiple knowledge bases, where KBQA approaches are specifically tuned to a single knowledge base. In this paper, we present SYGMA, a modular approach facilitating generalizability across multiple knowledge bases and multiple reasoning types. Specifically, SYGMA contains three high-level modules: 1) a KB-agnostic question understanding module that is common across KBs, 2) rules to support additional reasoning types, and 3) a KB-specific question mapping and answering module to address the KB-specific aspects of answer extraction. We demonstrate the effectiveness of our system by evaluating on datasets belonging to two distinct knowledge bases, DBpedia and Wikidata. In addition, to demonstrate extensibility to additional reasoning types, we evaluate on multi-hop reasoning datasets and a new temporal KBQA benchmark dataset on Wikidata, named TempQA-WD, introduced in this paper. We show that our generalizable approach has better competitive performance on multiple datasets on DBpedia and Wikidata that require both multi-hop and temporal reasoning.
Submitted 27 September, 2021;
originally announced September 2021.
-
Generative Relation Linking for Question Answering over Knowledge Bases
Authors:
Gaetano Rossiello,
Nandana Mihindukulasooriya,
Ibrahim Abdelaziz,
Mihaela Bornea,
Alfio Gliozzo,
Tahira Naseem,
Pavan Kapanipathi
Abstract:
Relation linking is essential to enable question answering over knowledge bases. Although there are various efforts to improve relation linking performance, the current state-of-the-art methods do not achieve optimal results, therefore, negatively impacting the overall end-to-end question answering performance. In this work, we propose a novel approach for relation linking framing it as a generative problem facilitating the use of pre-trained sequence-to-sequence models. We extend such sequence-to-sequence models with the idea of infusing structured data from the target knowledge base, primarily to enable these models to handle the nuances of the knowledge base. Moreover, we train the model with the aim to generate a structured output consisting of a list of argument-relation pairs, enabling a knowledge validation step. We compared our method against the existing relation linking systems on four different datasets derived from DBpedia and Wikidata. Our method reports large improvements over the state-of-the-art while using a much simpler model that can be easily adapted to different knowledge bases.
Submitted 16 August, 2021;
originally announced August 2021.
-
Type Prediction Systems
Authors:
Sarthak Dash,
Nandana Mihindukulasooriya,
Alfio Gliozzo,
Mustafa Canim
Abstract:
Inferring semantic types for entity mentions within text documents is an important asset for many downstream NLP tasks, such as Semantic Role Labelling, Entity Disambiguation, Knowledge Base Question Answering, etc. Prior works have mostly focused on supervised solutions that generally operate on relatively small-to-medium-sized type systems. In this work, we describe two systems aimed at predicting type information: a TypeSuggest module, an unsupervised system designed to predict types for a set of user-entered query terms, and an Answer Type prediction module, which determines the correct type of the answer expected for a given query. Our systems generalize to arbitrary type systems of any size, making them a highly appealing solution for extracting type information at any granularity.
Submitted 2 April, 2021;
originally announced April 2021.
-
Open Knowledge Graphs Canonicalization using Variational Autoencoders
Authors:
Sarthak Dash,
Gaetano Rossiello,
Nandana Mihindukulasooriya,
Sugato Bagchi,
Alfio Gliozzo
Abstract:
Noun phrases and Relation phrases in open knowledge graphs are not canonicalized, leading to an explosion of redundant and ambiguous subject-relation-object triples. Existing approaches to solve this problem take a two-step approach. First, they generate embedding representations for both noun and relation phrases, then a clustering algorithm is used to group them using the embeddings as features. In this work, we propose Canonicalizing Using Variational Autoencoders (CUVA), a joint model to learn both embeddings and cluster assignments in an end-to-end approach, which leads to a better vector representation for the noun and relation phrases. Our evaluation over multiple benchmarks shows that CUVA outperforms the existing state-of-the-art approaches. Moreover, we introduce CanonicNell, a novel dataset to evaluate entity canonicalization systems.
Submitted 27 September, 2021; v1 submitted 8 December, 2020;
originally announced December 2020.
-
Leveraging Abstract Meaning Representation for Knowledge Base Question Answering
Authors:
Pavan Kapanipathi,
Ibrahim Abdelaziz,
Srinivas Ravishankar,
Salim Roukos,
Alexander Gray,
Ramon Astudillo,
Maria Chang,
Cristina Cornelio,
Saswati Dana,
Achille Fokoue,
Dinesh Garg,
Alfio Gliozzo,
Sairam Gurajada,
Hima Karanam,
Naweed Khan,
Dinesh Khandelwal,
Young-Suk Lee,
Yunyao Li,
Francois Luus,
Ndivhuwo Makondo,
Nandana Mihindukulasooriya,
Tahira Naseem,
Sumit Neelam,
Lucian Popa,
Revanth Reddy,
et al. (5 additional authors not shown)
Abstract:
Knowledge base question answering (KBQA) is an important task in Natural Language Processing. Existing approaches face significant challenges, including complex question understanding, the necessity for reasoning, and the lack of large end-to-end training datasets. In this work, we propose Neuro-Symbolic Question Answering (NSQA), a modular KBQA system that leverages (1) Abstract Meaning Representation (AMR) parses for task-independent question understanding; (2) a simple yet effective graph transformation approach to convert AMR parses into candidate logical queries that are aligned to the KB; and (3) a pipeline-based approach which integrates multiple, reusable modules that are trained specifically for their individual tasks (semantic parser, entity and relationship linkers, and neuro-symbolic reasoner) and do not require end-to-end training data. NSQA achieves state-of-the-art performance on two prominent KBQA datasets based on DBpedia (QALD-9 and LC-QuAD 1.0). Furthermore, our analysis emphasizes that AMR is a powerful tool for KBQA systems.
Submitted 2 June, 2021; v1 submitted 3 December, 2020;
originally announced December 2020.
-
SeMantic AnsweR Type prediction task (SMART) at ISWC 2020 Semantic Web Challenge
Authors:
Nandana Mihindukulasooriya,
Mohnish Dubey,
Alfio Gliozzo,
Jens Lehmann,
Axel-Cyrille Ngonga Ngomo,
Ricardo Usbeck
Abstract:
Each year the International Semantic Web Conference accepts a set of Semantic Web Challenges to establish competitions that will advance state-of-the-art solutions in a given problem domain. The SeMantic AnsweR Type prediction task (SMART) was one of the ISWC 2020 challenges. Question type and answer type prediction can play a key role in knowledge base question answering systems, providing insights that help generate correct queries or rank the answer candidates. More concretely, given a question in natural language, the task of the SMART challenge is to predict the answer type using a target ontology (e.g., DBpedia or Wikidata).
Submitted 1 December, 2020;
originally announced December 2020.
-
Leveraging Semantic Parsing for Relation Linking over Knowledge Bases
Authors:
Nandana Mihindukulasooriya,
Gaetano Rossiello,
Pavan Kapanipathi,
Ibrahim Abdelaziz,
Srinivas Ravishankar,
Mo Yu,
Alfio Gliozzo,
Salim Roukos,
Alexander Gray
Abstract:
Knowledgebase question answering systems are heavily dependent on relation extraction and linking modules. However, the task of extracting and linking relations from text to knowledgebases faces two primary challenges: the ambiguity of natural language and the lack of training data. To overcome these challenges, we present SLING, a relation linking framework which leverages semantic parsing using Abstract Meaning Representation (AMR) and distant supervision. SLING integrates multiple relation linking approaches that capture complementary signals such as linguistic cues, rich semantic representation, and information from the knowledgebase. Experiments on relation linking using three KBQA datasets (QALD-7, QALD-9, and LC-QuAD 1.0) demonstrate that the proposed approach achieves state-of-the-art performance on all benchmarks.
Submitted 16 September, 2020;
originally announced September 2020.
-
Hypernym Detection Using Strict Partial Order Networks
Authors:
Sarthak Dash,
Md Faisal Mahbub Chowdhury,
Alfio Gliozzo,
Nandana Mihindukulasooriya,
Nicolas Rodolfo Fauceglia
Abstract:
This paper introduces Strict Partial Order Networks (SPON), a novel neural network architecture designed to enforce the asymmetry and transitivity properties as soft constraints. We apply it to induce hypernymy relations by training with is-a pairs. We also present an augmented variant of SPON that can generalize type information learned for in-vocabulary terms to previously unseen ones. An extensive evaluation over eleven benchmarks across different tasks shows that SPON consistently either outperforms or attains the state of the art on all but one of these benchmarks.
Submitted 22 November, 2019; v1 submitted 23 September, 2019;
originally announced September 2019.
-
Completeness and Consistency Analysis for Evolving Knowledge Bases
Authors:
Mohammad Rifat Ahmmad Rashid,
Giuseppe Rizzo,
Marco Torchiano,
Nandana Mihindukulasooriya,
Oscar Corcho,
Raúl García-Castro
Abstract:
Assessing the quality of an evolving knowledge base is a challenging task, as it often requires identifying the correct quality assessment procedures.
Since data is often derived from autonomous and increasingly large data sources, it is impractical to manually curate the data, and challenging to continuously and automatically assess its quality.
In this paper, we explore two main areas of quality assessment related to evolving knowledge bases: (i) identification of completeness issues using knowledge base evolution analysis, and (ii) identification of consistency issues based on integrity constraints, such as minimum and maximum cardinality, and range constraints.
For completeness analysis, we use data profiling information from consecutive knowledge base releases to estimate completeness measures that allow predicting quality issues. Then, we perform consistency checks to validate the results of the completeness analysis using integrity constraints and learning models.
The approach has been tested both quantitatively and qualitatively by using a subset of datasets from both DBpedia and 3cixty knowledge bases. The performance of the approach is evaluated using precision, recall, and F1 score. From completeness analysis, we observe a 94% precision for the English DBpedia KB and 95% precision for the 3cixty Nice KB. We also assessed the performance of our consistency analysis by using five learning models over three sub-tasks, namely minimum cardinality, maximum cardinality, and range constraint. We observed that the best performing model in our experimental setup is the Random Forest, reaching an F1 score greater than 90% for minimum and maximum cardinality and 84% for range constraints.
Submitted 30 November, 2018;
originally announced November 2018.