
Showing 1–4 of 4 results for author: Borkakoty, H

  1. arXiv:2406.19116  [pdf, other]

    cs.CL cs.AI cs.LG

    CHEW: A Dataset of CHanging Events in Wikipedia

    Authors: Hsuvas Borkakoty, Luis Espinosa-Anke

    Abstract: We introduce CHEW, a novel dataset of changing events in Wikipedia expressed in naturally occurring text. We use CHEW for probing LLMs for their timeline understanding of Wikipedia entities and events in generative and classification experiments. Our results suggest that LLMs, despite having temporal information available, struggle to construct accurate timelines. We further show the usefulness of…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: Short Paper

  2. arXiv:2406.09948  [pdf, other]

    cs.CL

    BLEnD: A Benchmark for LLMs on Everyday Knowledge in Diverse Cultures and Languages

    Authors: Junho Myung, Nayeon Lee, Yi Zhou, Jiho Jin, Rifki Afina Putri, Dimosthenis Antypas, Hsuvas Borkakoty, Eunsu Kim, Carla Perez-Almendros, Abinew Ali Ayele, Víctor Gutiérrez-Basulto, Yazmín Ibáñez-García, Hwaran Lee, Shamsuddeen Hassan Muhammad, Kiwoong Park, Anar Sabuhi Rzayev, Nina White, Seid Muhie Yimam, Mohammad Taher Pilehvar, Nedjma Ousidhoum, Jose Camacho-Collados, Alice Oh

    Abstract: Large language models (LLMs) often lack culture-specific knowledge of daily life, especially across diverse regions and non-English languages. Existing benchmarks for evaluating LLMs' cultural sensitivities are limited to a single language or collected from online sources such as Wikipedia, which do not reflect the mundane everyday lifestyles of diverse regions. That is, information about the food…

    Submitted 14 June, 2024; originally announced June 2024.

  3. arXiv:2405.02175  [pdf, other]

    cs.CL cs.AI cs.LG

    Hoaxpedia: A Unified Wikipedia Hoax Articles Dataset

    Authors: Hsuvas Borkakoty, Luis Espinosa-Anke

    Abstract: Hoaxes are a recognised form of disinformation created deliberately, with potentially serious implications for the credibility of reference knowledge resources such as Wikipedia. What makes detecting Wikipedia hoaxes hard is that they are often written according to the official style guidelines. In this work, we first provide a systematic analysis of the similarities and discrepancies between legitim…

    Submitted 15 May, 2024; v1 submitted 3 May, 2024; originally announced May 2024.

    Comments: Short paper

  4. arXiv:2308.03582  [pdf, other]

    cs.CL

    WIKITIDE: A Wikipedia-Based Timestamped Definition Pairs Dataset

    Authors: Hsuvas Borkakoty, Luis Espinosa-Anke

    Abstract: A fundamental challenge in the current NLP context, dominated by language models, comes from the inflexibility of current architectures to 'learn' new information. While model-centric solutions like continual learning or parameter-efficient fine-tuning are available, the question remains of how to reliably identify changes in language or in the world. In this paper, we propose WikiTiDe, a da…

    Submitted 18 August, 2023; v1 submitted 7 August, 2023; originally announced August 2023.

    Comments: Accepted by RANLP 2023 main conference