Showing 1–30 of 30 results for author: Simperl, E

  1. arXiv:2405.10205 [pdf, other]

    cs.HC

    Exploring the Impact of ChatGPT on Wikipedia Engagement

    Authors: Neal Reeves, Wenjie Yin, Elena Simperl

    Abstract: Wikipedia is one of the most popular websites in the world, serving as a major source of information and learning resource for millions of users worldwide. While motivations for its usage vary, prior research suggests shallow information gathering -- looking up facts and information or answering questions -- dominates over more in-depth usage. On the 22nd of November 2022, ChatGPT was released to…

    Submitted 29 May, 2024; v1 submitted 16 May, 2024; originally announced May 2024.

    Comments: 12 pages, 4 figures, submitted to ACM Collective Intelligence

  2. arXiv:2403.19546 [pdf, other]

    cs.LG cs.AI cs.DB cs.IR

    Croissant: A Metadata Format for ML-Ready Datasets

    Authors: Mubashara Akhtar, Omar Benjelloun, Costanza Conforti, Pieter Gijsbers, Joan Giner-Miguelez, Nitisha Jain, Michael Kuchnik, Quentin Lhoest, Pierre Marcenac, Manil Maskey, Peter Mattson, Luis Oala, Pierre Ruyssen, Rajat Shinde, Elena Simperl, Goeffry Thomas, Slava Tykhonov, Joaquin Vanschoren, Jos van der Velde, Steffen Vogler, Carole-Jean Wu

    Abstract: Data is a critical resource for Machine Learning (ML), yet working with data remains a key friction point. This paper introduces Croissant, a metadata format for datasets that simplifies how data is used by ML tools and frameworks. Croissant makes datasets more discoverable, portable and interoperable, thereby addressing significant challenges in ML data management and responsible AI. Croissant is…

    Submitted 30 May, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

    Comments: Published in Proceedings of ACM SIGMOD/PODS'24 Data Management for End-to-End Machine Learning (DEEM) Workshop https://dl.acm.org/doi/10.1145/3650203.3663326

  3. arXiv:2403.15861 [pdf, other]

    cs.HC

    User Experience in Dataset Search Platform Interfaces

    Authors: Yihang Zhao, Albert Meroño-Peñuela, Elena Simperl

    Abstract: This research investigates User Experience (UX) issues in dataset search platform interfaces, targeting Google Dataset Search and data.europa.eu. It focuses on 6 areas within UX: Initial Interaction, Search Process, Dataset Exploration, Filtering and Sorting, Dataset Actions, and Assistance and Feedback. The evaluation method combines 'The Pandemic Puzzle' user task, think-aloud methods, and demog…

    Submitted 23 March, 2024; originally announced March 2024.

  4. arXiv:2402.01495 [pdf, other]

    cs.CL

    A Comparative Analysis of Conversational Large Language Models in Knowledge-Based Text Generation

    Authors: Phillip Schneider, Manuel Klettner, Elena Simperl, Florian Matthes

    Abstract: Generating natural language text from graph-structured data is essential for conversational information seeking. Semantic triples derived from knowledge graphs can serve as a valuable source for grounding responses from conversational agents by providing a factual basis for the information they communicate. This is especially relevant in the context of large language models, which offer great pote…

    Submitted 2 February, 2024; originally announced February 2024.

    Comments: Accepted to EACL 2024

  5. arXiv:2401.01711 [pdf, ps, other]

    cs.CL cs.IR

    Evaluating Large Language Models in Semantic Parsing for Conversational Question Answering over Knowledge Graphs

    Authors: Phillip Schneider, Manuel Klettner, Kristiina Jokinen, Elena Simperl, Florian Matthes

    Abstract: Conversational question answering systems often rely on semantic parsing to enable interactive information retrieval, which involves the generation of structured database queries from a natural language input. For information-seeking conversations about facts stored within a knowledge graph, dialogue utterances are transformed into graph queries in a process that is called knowledge-based conversa…

    Submitted 3 January, 2024; originally announced January 2024.

    Comments: Accepted to ICAART 2024

  6. arXiv:2312.09947 [pdf, other]

    cs.HC

    Prompting Datasets: Data Discovery with Conversational Agents

    Authors: Johanna Walker, Elisavet Koutsiana, Joe Massey, Gefion Thuermer, Elena Simperl

    Abstract: Can large language models assist in data discovery? Data discovery predominantly happens via search on a data portal or the web, followed by assessment of the dataset to ensure it is fit for the intended purpose. The ability of conversational generative AI (CGAI) to support recommendations with reasoning implies it can suggest datasets to users, explain why it has done so, and provide information…

    Submitted 15 December, 2023; originally announced December 2023.

    Comments: 27 pages, 9 figures

  7. arXiv:2311.07453 [pdf, other]

    cs.CL cs.CV

    ChartCheck: Explainable Fact-Checking over Real-World Chart Images

    Authors: Mubashara Akhtar, Nikesh Subedi, Vivek Gupta, Sahar Tahmasebi, Oana Cocarascu, Elena Simperl

    Abstract: Whilst fact verification has attracted substantial interest in the natural language processing community, verifying misinforming statements against data visualizations such as charts has so far been overlooked. Charts are commonly used in the real-world to summarize and communicate key information, but they can also be easily misused to spread misinformation and promote certain agendas. In this pa…

    Submitted 16 February, 2024; v1 submitted 13 November, 2023; originally announced November 2023.

  8. arXiv:2311.02216 [pdf, other]

    cs.CL cs.LG

    Exploring the Numerical Reasoning Capabilities of Language Models: A Comprehensive Analysis on Tabular Data

    Authors: Mubashara Akhtar, Abhilash Shankarampeta, Vivek Gupta, Arpit Patil, Oana Cocarascu, Elena Simperl

    Abstract: Numbers are crucial for various real-world domains such as finance, economics, and science. Thus, understanding and reasoning with numbers are essential skills for language models to solve different tasks. While different numerical benchmarks have been introduced in recent years, they are limited to specific numerical aspects mostly. In this paper, we propose a hierarchical taxonomy for numerical…

    Submitted 3 November, 2023; originally announced November 2023.

    Comments: Accepted at EMNLP 2023 (Findings)

  9. arXiv:2309.08491 [pdf, other]

    cs.CL cs.AI

    Using Large Language Models for Knowledge Engineering (LLMKE): A Case Study on Wikidata

    Authors: Bohui Zhang, Ioannis Reklos, Nitisha Jain, Albert Meroño Peñuela, Elena Simperl

    Abstract: In this work, we explore the use of Large Language Models (LLMs) for knowledge engineering tasks in the context of the ISWC 2023 LM-KBC Challenge. For this task, given subject and relation pairs sourced from Wikidata, we utilize pre-trained LLMs to produce the relevant objects in string format and link them to their respective Wikidata QIDs. We developed a pipeline using LLMs for Knowledge Enginee…

    Submitted 15 September, 2023; originally announced September 2023.

    Comments: Knowledge Base Construction from Pre-trained Language Models (LM-KBC) Challenge @ ISWC 2023

  10. arXiv:2306.11766 [pdf, other]

    cs.HC

    Agreeing and Disagreeing in Collaborative Knowledge Graph Construction: An Analysis of Wikidata

    Authors: Elisavet Koutsiana, Tushita Yadav, Nitisha Jain, Albert Meroño-Peñuela, Elena Simperl

    Abstract: In this work, we study disagreement in discussions around Wikidata, an online knowledge community that builds the data backend of Wikipedia. Discussions are important in collaborative work as they can increase contributor performance and encourage the emergence of shared norms and practices. While disagreements can play a productive role in discussions, they can also lead to conflicts and controve…

    Submitted 20 June, 2023; originally announced June 2023.

  11. arXiv:2305.13507 [pdf, other]

    cs.CL cs.AI cs.CV

    Multimodal Automated Fact-Checking: A Survey

    Authors: Mubashara Akhtar, Michael Schlichtkrull, Zhijiang Guo, Oana Cocarascu, Elena Simperl, Andreas Vlachos

    Abstract: Misinformation is often conveyed in multiple modalities, e.g. a miscaptioned image. Multimodal misinformation is perceived as more credible by humans, and spreads faster than its text-only counterparts. While an increasing body of research investigates automated fact-checking (AFC), previous surveys mostly focus on text. In this survey, we conceptualise a framework for AFC including subtasks uniqu…

    Submitted 25 October, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

    Comments: The 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP): Findings

  12. arXiv:2301.11843 [pdf, other]

    cs.CL cs.CV

    Reading and Reasoning over Chart Images for Evidence-based Automated Fact-Checking

    Authors: Mubashara Akhtar, Oana Cocarascu, Elena Simperl

    Abstract: Evidence data for automated fact-checking (AFC) can be in multiple modalities such as text, tables, images, audio, or video. While there is increasing interest in using images for AFC, previous works mostly focus on detecting manipulated or fake images. We propose a novel task, chart-based fact-checking, and introduce ChartBERT as the first model for AFC against chart evidence. ChartBERT leverages…

    Submitted 27 January, 2023; originally announced January 2023.

    Comments: Accepted to EACL 2023 (Findings)

  13. arXiv:2212.01818 [pdf]

    cs.HC cs.IR

    Exploring and Eliciting Needs and Preferences from Editors for Wikidata Recommendations

    Authors: Kholoud Alghamdi, Miaojing Shi, Elena Simperl

    Abstract: Wikidata is an open knowledge graph created, managed, and maintained collaboratively by a global community of volunteers. As it continues to grow, it faces substantial editor engagement challenges, including acquiring new editors to tackle an increasing workload and retaining existing editors. Experiences from other online communities and peer-production systems, including Wikipedia, suggest that…

    Submitted 4 December, 2022; originally announced December 2022.

  14. arXiv:2210.14846 [pdf, other]

    cs.CL

    ProVe: A Pipeline for Automated Provenance Verification of Knowledge Graphs against Textual Sources

    Authors: Gabriel Amaral, Odinaldo Rodrigues, Elena Simperl

    Abstract: Knowledge Graphs are repositories of information that gather data from a multitude of domains and sources in the form of semantic triples, serving as a source of structured data for various crucial applications in the modern web landscape, from Wikipedia infoboxes to search engines. Such graphs mainly serve as secondary sources of information and depend on well-documented and verifiable provenance…

    Submitted 26 October, 2022; originally announced October 2022.

  15. arXiv:2210.00105 [pdf, other]

    cs.CL cs.AI

    A Decade of Knowledge Graphs in Natural Language Processing: A Survey

    Authors: Phillip Schneider, Tim Schopf, Juraj Vladika, Mikhail Galkin, Elena Simperl, Florian Matthes

    Abstract: In pace with developments in the research field of artificial intelligence, knowledge graphs (KGs) have attracted a surge of interest from both academia and industry. As a representation of semantic relations between entities, KGs have proven to be particularly relevant for natural language processing (NLP), experiencing a rapid spread and wide adoption within recent years. Given the increasing am…

    Submitted 30 September, 2022; originally announced October 2022.

    Comments: Accepted to AACL-IJCNLP 2022

  16. arXiv:2206.08709 [pdf, other]

    cs.CL cs.LG

    Statistical and Neural Methods for Cross-lingual Entity Label Mapping in Knowledge Graphs

    Authors: Gabriel Amaral, Mārcis Pinnis, Inguna Skadiņa, Odinaldo Rodrigues, Elena Simperl

    Abstract: Knowledge bases such as Wikidata amass vast amounts of named entity information, such as multilingual labels, which can be extremely useful for various multilingual and cross-lingual applications. However, such labels are not guaranteed to match across languages from an information consistency standpoint, greatly compromising their usefulness for fields such as machine translation. In this work, w…

    Submitted 17 June, 2022; originally announced June 2022.

  17. arXiv:2205.02627 [pdf, other]

    cs.CL

    WDV: A Broad Data Verbalisation Dataset Built from Wikidata

    Authors: Gabriel Amaral, Odinaldo Rodrigues, Elena Simperl

    Abstract: Data verbalisation is a task of great importance in the current field of natural language processing, as there is great benefit in the transformation of our abundant structured and semi-structured data into human-readable formats. Verbalising Knowledge Graph (KG) data focuses on converting interconnected triple-based claims, formed of subject, predicate, and object, into text. Although KG verbalis…

    Submitted 5 May, 2022; originally announced May 2022.

  18. arXiv:2109.09405 [pdf, other]

    cs.AI cs.CL

    Assessing the quality of sources in Wikidata across languages: a hybrid approach

    Authors: Gabriel Amaral, Alessandro Piscopo, Lucie-Aimée Kaffee, Odinaldo Rodrigues, Elena Simperl

    Abstract: Wikidata is one of the most important sources of structured data on the web, built by a worldwide community of volunteers. As a secondary source, its contents must be backed by credible references; this is particularly important as Wikidata explicitly encourages editors to add claims for which there is no broad consensus, as long as they are corroborated by references. Nevertheless, despite this e…

    Submitted 20 September, 2021; originally announced September 2021.

  19. arXiv:2107.06423 [pdf, other]

    cs.IR

    Learning to Recommend Items to Wikidata Editors

    Authors: Kholoud AlGhamdi, Miaojing Shi, Elena Simperl

    Abstract: Wikidata is an open knowledge graph built by a global community of volunteers. As it advances in scale, it faces substantial challenges around editor engagement. These challenges are in terms of both attracting new editors to keep up with the sheer amount of work and retaining existing editors. Experience from other online communities and peer-production systems, including Wikipedia, suggests that…

    Submitted 30 July, 2021; v1 submitted 13 July, 2021; originally announced July 2021.

    Comments: The paper is accepted to appear in ISWC 2021

  20. Talking datasets: Understanding data sensemaking behaviours

    Authors: Laura Koesten, Kathleen Gregory, Paul Groth, Elena Simperl

    Abstract: The sharing and reuse of data are seen as critical to solving the most complex problems of today. Despite this potential, relatively little is known about a key step in data reuse: people's behaviours involved in data-centric sensemaking. We aim to address this gap by presenting a mixed-methods study combining in-depth interviews, a think-aloud task and a screen recording analysis with 31 research…

    Submitted 18 July, 2020; v1 submitted 20 November, 2019; originally announced November 2019.

    Comments: 26 pages, 7 figures, 6 tables

  21. arXiv:1901.09264 [pdf, other]

    cs.CY

    On the mapping of Points of Interest through StreetView imagery and paid crowdsourcing

    Authors: Eddy Maddalena, Luis-Daniel Ibáñez, Elena Simperl

    Abstract: The use of volunteers has emerged as a low-cost alternative to generate accurate geographical information, an approach known as Volunteered Geographic Information (VGI). However, VGI is limited by the number and availability of volunteers in the area to be mapped, hindering scalability for large areas and making it difficult to map within a time-frame. Fortunately, the availability of street-view image…

    Submitted 26 January, 2019; originally announced January 2019.

    Comments: 25 pages

  22. arXiv:1901.05670 [pdf, other]

    cs.CY

    Beyond monetary incentives: experiments in paid microtask contests modelled as continuous-time markov chains

    Authors: Oluwaseyi Feyisetan, Elena Simperl

    Abstract: In this paper, we aim to gain a better understanding into how paid microtask crowdsourcing could leverage its appeal and scaling power by using contests to boost crowd performance and engagement. We introduce our microtask-based annotation platform Wordsmith, which features incentives such as points, leaderboards and badges on top of financial remuneration. Our analysis focuses on a particular typ…

    Submitted 17 January, 2019; originally announced January 2019.

  23. Dataset search: a survey

    Authors: Adriane Chapman, Elena Simperl, Laura Koesten, George Konstantinidis, Luis-Daniel Ibáñez-Gonzalez, Emilia Kacprzak, Paul Groth

    Abstract: Generating value from data requires the ability to find, access and make sense of datasets. There are many efforts underway to encourage data sharing and reuse, from scientific publishers asking authors to submit data alongside manuscripts to data marketplaces, open data portals and data communities. Google recently beta released a search service for datasets, which allows users to discover data s…

    Submitted 3 January, 2019; originally announced January 2019.

    Comments: 20 pages, 153 references

  24. arXiv:1810.12423 [pdf, other]

    cs.IR

    Everything you always wanted to know about a dataset: studies in data summarisation

    Authors: Laura Koesten, Elena Simperl, Emilia Kacprzak, Tom Blount, Jeni Tennison

    Abstract: Summarising data as text helps people make sense of it. It also improves data discovery, as search algorithms can match this text against keyword queries. In this paper, we explore the characteristics of text summaries of data in order to understand what meaningful summaries look like. We present two complementary studies: a data-search diary study with 69 students, which offers insight into the in…

    Submitted 23 October, 2018; originally announced October 2018.

  25. arXiv:1805.11883 [pdf, ps, other]

    cs.OH

    DATA:SEARCH'18 -- Searching Data on the Web

    Authors: Paul Groth, Laura Koesten, Philipp Mayr, Maarten de Rijke, Elena Simperl

    Abstract: This half day workshop explores challenges in data search, with a particular focus on data on the web. We want to stimulate an interdisciplinary discussion around how to improve the description, discovery, ranking and presentation of structured and semi-structured data, across data formats and domain applications. We welcome contributions describing algorithms and systems, as well as frameworks an…

    Submitted 30 May, 2018; originally announced May 2018.

  26. arXiv:1803.07116 [pdf, other]

    cs.CL

    Learning to Generate Wikipedia Summaries for Underserved Languages from Wikidata

    Authors: Lucie-Aimée Kaffee, Hady Elsahar, Pavlos Vougiouklis, Christophe Gravier, Frédérique Laforest, Jonathon Hare, Elena Simperl

    Abstract: While Wikipedia exists in 287 languages, its content is unevenly distributed among them. In this work, we investigate the generation of open domain Wikipedia summaries in underserved languages using structured data from Wikidata. To this end, we propose a neural network architecture equipped with copy actions that learns to generate single-sentence and comprehensible textual summaries from Wikidat…

    Submitted 29 April, 2018; v1 submitted 19 March, 2018; originally announced March 2018.

    Comments: NAACL HLT 2018

  27. Neural Wikipedian: Generating Textual Summaries from Knowledge Base Triples

    Authors: Pavlos Vougiouklis, Hady Elsahar, Lucie-Aimée Kaffee, Christoph Gravier, Frederique Laforest, Jonathon Hare, Elena Simperl

    Abstract: Most people do not interact with Semantic Web data directly. Unless they have the expertise to understand the underlying technology, they need textual or visual interfaces to help them make sense of it. We explore the problem of generating natural language summaries for Semantic Web data. This is non-trivial, especially in an open-domain context. To address this problem, we explore the use of neur…

    Submitted 31 October, 2017; originally announced November 2017.

  28. arXiv:1710.04203 [pdf, other]

    cs.CL cs.HC

    Crowdsourcing for Beyond Polarity Sentiment Analysis A Pure Emotion Lexicon

    Authors: Giannis Haralabopoulos, Elena Simperl

    Abstract: Sentiment analysis aims to uncover emotions conveyed through information. In its simplest form, it is performed on a polarity basis, where the goal is to classify information with positive or negative emotion. Recent research has explored more nuanced ways to capture emotions that go beyond polarity. For these methods to work, they require a critical resource: a lexicon that is appropriate for the…

    Submitted 4 October, 2017; originally announced October 2017.

    Comments: Keywords: Beyond Polarity, Pure Sentiment, Crowdsourcing, Sentiment Analysis, Lexicon Acquisition, Reddit, Twitter, Brexit [19 pages, 6 figures, 4 tables]

  29. arXiv:1503.02911 [pdf, other]

    cs.DB

    RDF-Hunter: Automatically Crowdsourcing the Execution of Queries Against RDF Data Sets

    Authors: Maribel Acosta, Elena Simperl, Fabian Flöck, Maria-Esther Vidal, Rudi Studer

    Abstract: In the last years, a large number of RDF data sets has become available on the Web. However, due to the semi-structured nature of RDF data, missing values affect answer completeness of queries that are posed against this data. To overcome this limitation, we propose RDF-Hunter, a novel hybrid query processing approach that brings together machine and human computation to execute queries against RD…

    Submitted 10 March, 2015; originally announced March 2015.

  30. arXiv:1406.7551 [pdf]

    cs.SI cs.CY physics.soc-ph

    Collective Intelligence in Citizen Science -- A Study of Performers and Talkers

    Authors: Ramine Tinati, Elena Simperl, Markus Luczak-Roesch, Max Van Kleek, Nigel Shadbolt

    Abstract: The recent emergence of online citizen science is illustrative of an efficient and effective means to harness the crowd in order to achieve a range of scientific discoveries. Fundamentally, citizen science projects draw upon crowds of non-expert volunteers to complete short Tasks, which can vary in domain and complexity. However, unlike most human-computational systems, participants in these syste…

    Submitted 29 June, 2014; originally announced June 2014.

    Report number: ci-2014/28