Showing 1–30 of 30 results for author: Guerin, F

  1. arXiv:2406.20038  [pdf, other]

    cs.CL

    BioMNER: A Dataset for Biomedical Method Entity Recognition

    Authors: Chen Tang, Bohao Yang, Kun Zhao, Bo Lv, Chenghao Xiao, Frank Guerin, Chenghua Lin

    Abstract: Named entity recognition (NER) stands as a fundamental and pivotal task within the realm of Natural Language Processing. Particularly within the domain of Biomedical Method NER, this task presents notable challenges, stemming from the continual influx of domain-specific terminologies in scholarly literature. Current research in Biomedical Method (BioMethod) NER suffers from a scarcity of resources…

    Submitted 28 June, 2024; originally announced June 2024.

  2. arXiv:2406.03447  [pdf, other]

    cs.CV cs.AI cs.LG

    FILS: Self-Supervised Video Feature Prediction In Semantic Language Space

    Authors: Mona Ahmadian, Frank Guerin, Andrew Gilbert

    Abstract: This paper demonstrates a self-supervised approach for learning semantic video representations. Recent vision studies show that a masking strategy for vision and natural language supervision has contributed to developing transferable visual pretraining. Our goal is to achieve a more semantic video representation by leveraging the text related to the video content during the pretraining in a fully…

    Submitted 5 June, 2024; originally announced June 2024.

  3. arXiv:2402.00861  [pdf, other]

    cs.CL cs.AI

    Evaluating Large Language Models for Generalization and Robustness via Data Compression

    Authors: Yucheng Li, Yunhao Guo, Frank Guerin, Chenghua Lin

    Abstract: Existing methods for evaluating large language models face challenges such as data contamination, sensitivity to prompts, and the high cost of benchmark creation. To address this, we propose a lossless data compression based evaluation approach that tests how models' predictive abilities generalize after their training cutoff. Specifically, we collect comprehensive test data spanning 83 months fro…

    Submitted 3 February, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

  4. arXiv:2401.16012  [pdf, other]

    cs.CL

    Finding Challenging Metaphors that Confuse Pretrained Language Models

    Authors: Yucheng Li, Frank Guerin, Chenghua Lin

    Abstract: Metaphors are considered to pose challenges for a wide spectrum of NLP tasks. This gives rise to the area of computational metaphor processing. However, it remains unclear what types of metaphors challenge current state-of-the-art models. In this paper, we test various NLP models on the VUA metaphor dataset and quantify to what extent metaphors affect models' performance on various downstream task…

    Submitted 29 January, 2024; originally announced January 2024.

  5. arXiv:2312.12343  [pdf, other]

    cs.CL cs.AI

    LatestEval: Addressing Data Contamination in Language Model Evaluation through Dynamic and Time-Sensitive Test Construction

    Authors: Yucheng Li, Frank Guerin, Chenghua Lin

    Abstract: Data contamination in evaluation is getting increasingly prevalent with the emergence of language models pre-trained on super large, automatically crawled corpora. This problem leads to significant challenges in the accurate assessment of model capabilities and generalisations. In this paper, we propose LatestEval, an automatic method that leverages the most recent texts to create uncontaminated r…

    Submitted 1 March, 2024; v1 submitted 19 December, 2023; originally announced December 2023.

    Comments: AAAI 2024

  6. arXiv:2310.20467  [pdf, other]

    cs.CL cs.AI

    ACL Anthology Helper: A Tool to Retrieve and Manage Literature from ACL Anthology

    Authors: Chen Tang, Frank Guerin, Chenghua Lin

    Abstract: The ACL Anthology is an online repository that serves as a comprehensive collection of publications in the field of natural language processing (NLP) and computational linguistics (CL). This paper presents a tool called "ACL Anthology Helper". It automates the process of parsing and downloading papers along with their meta-information, which are then stored in a local MySQL database. This allows…

    Submitted 31 October, 2023; originally announced October 2023.

  7. arXiv:2310.17589  [pdf, other]

    cs.CL cs.AI

    An Open Source Data Contamination Report for Large Language Models

    Authors: Yucheng Li, Frank Guerin, Chenghua Lin

    Abstract: Data contamination in model evaluation has become increasingly prevalent with the growing popularity of large language models. It allows models to "cheat" via memorisation instead of displaying true capabilities. Therefore, contamination analysis has become a crucial part of reliable model evaluation to validate results. However, existing contamination analysis is usually conducted internally by…

    Submitted 28 January, 2024; v1 submitted 26 October, 2023; originally announced October 2023.

  8. arXiv:2310.06201  [pdf, other]

    cs.CL

    Compressing Context to Enhance Inference Efficiency of Large Language Models

    Authors: Yucheng Li, Bo Dong, Chenghua Lin, Frank Guerin

    Abstract: Large language models (LLMs) achieved remarkable performance across various tasks. However, they face challenges in managing long documents and extended conversations, due to significantly increased computational requirements, both in memory and inference time, and potential context truncation when the input exceeds the LLM's fixed context length. This paper proposes a method called Selective Cont…

    Submitted 9 October, 2023; originally announced October 2023.

    Comments: EMNLP 2023. arXiv admin note: substantial text overlap with arXiv:2304.12102; text overlap with arXiv:2303.11076 by other authors

  9. arXiv:2308.12488  [pdf, other]

    cs.AI cs.CL

    GPTEval: A Survey on Assessments of ChatGPT and GPT-4

    Authors: Rui Mao, Guanyi Chen, Xulang Zhang, Frank Guerin, Erik Cambria

    Abstract: The emergence of ChatGPT has generated much speculation in the press about its potential to disrupt social and economic systems. Its astonishing language ability has aroused strong curiosity among scholars about its performance in different domains. There have been many studies evaluating the ability of ChatGPT and GPT-4 in different tasks and disciplines. However, a comprehensive review summarizi…

    Submitted 23 August, 2023; originally announced August 2023.

  10. arXiv:2308.12447  [pdf, other]

    cs.CV

    MOFO: MOtion FOcused Self-Supervision for Video Understanding

    Authors: Mona Ahmadian, Frank Guerin, Andrew Gilbert

    Abstract: Self-supervised learning (SSL) techniques have recently produced outstanding results in learning visual representations from unlabeled videos. Despite the importance of motion in supervised learning techniques for action recognition, SSL methods often do not explicitly consider motion information in videos. To address this issue, we propose MOFO (MOtion FOcused), a novel SSL method for focusing re…

    Submitted 1 November, 2023; v1 submitted 23 August, 2023; originally announced August 2023.

    Comments: Accepted at the NeurIPS 2023 Workshop: Self-Supervised Learning - Theory and Practice

  11. arXiv:2306.16195  [pdf, other]

    cs.CL cs.AI

    Enhancing Dialogue Generation via Dynamic Graph Knowledge Aggregation

    Authors: Chen Tang, Hongbo Zhang, Tyler Loakman, Chenghua Lin, Frank Guerin

    Abstract: Incorporating external graph knowledge into neural chatbot models has been proven effective for enhancing dialogue generation. However, in conventional graph neural networks (GNNs), message passing on a graph is independent from text, resulting in the graph representation hidden space differing from that of the text. This training regime of existing models therefore leads to a semantic gap between…

    Submitted 28 June, 2023; originally announced June 2023.

    Comments: Accepted by ACL 2023

  12. arXiv:2305.01082  [pdf, other]

    cs.CL cs.IR cs.LG

    Contextual Multilingual Spellchecker for User Queries

    Authors: Sanat Sharma, Josep Valls-Vargas, Tracy Holloway King, Francois Guerin, Chirag Arora

    Abstract: Spellchecking is one of the most fundamental and widely used search features. Correcting incorrectly spelled user queries not only enhances the user experience but is expected by the user. However, most widely available spellchecking solutions are either lower accuracy than state-of-the-art solutions or too slow to be used for search use cases where latency is a key requirement. Furthermore, most…

    Submitted 14 June, 2023; v1 submitted 1 May, 2023; originally announced May 2023.

    Comments: 5 pages, In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '23)

  13. arXiv:2302.05611  [pdf, other]

    cs.CL

    Metaphor Detection with Effective Context Denoising

    Authors: Shun Wang, Yucheng Li, Chenghua Lin, Loïc Barrault, Frank Guerin

    Abstract: We propose a novel RoBERTa-based model, RoPPT, which introduces a target-oriented parse tree structure in metaphor detection. Compared to existing models, RoPPT focuses on semantically relevant information and achieves the state-of-the-art on several main metaphor datasets. We also compare our approach against several popular denoising and pruning methods, demonstrating the effectiveness of our ap…

    Submitted 11 February, 2023; originally announced February 2023.

  14. arXiv:2302.04834  [pdf, other]

    cs.CL

    FrameBERT: Conceptual Metaphor Detection with Frame Embedding Learning

    Authors: Yucheng Li, Shun Wang, Chenghua Lin, Frank Guerin, Loïc Barrault

    Abstract: In this paper, we propose FrameBERT, a RoBERTa-based model that can explicitly learn and incorporate FrameNet Embeddings for concept-level metaphor detection. FrameBERT not only achieves better or comparable performance to the state-of-the-art, but is also more explainable and interpretable than existing models, owing to its ability to account for external knowledge from FrameNet.

    Submitted 9 February, 2023; originally announced February 2023.

  15. arXiv:2301.13042  [pdf, other]

    cs.CL

    The Secret of Metaphor on Expressing Stronger Emotion

    Authors: Yucheng Li, Frank Guerin, Chenghua Lin

    Abstract: Metaphors are proven to have stronger emotional impact than literal expressions. Although this conclusion is shown to be promising in benefiting various NLP applications, the reasons behind this phenomenon are not well studied. This paper conducts the first study in exploring how metaphors convey stronger emotion than their literal counterparts. We find that metaphors are generally more specific t…

    Submitted 30 January, 2023; originally announced January 2023.

    Comments: FigLang@EMNLP2022

  16. arXiv:2210.15551  [pdf, other]

    cs.CL

    Terminology-aware Medical Dialogue Generation

    Authors: Chen Tang, Hongbo Zhang, Tyler Loakman, Chenghua Lin, Frank Guerin

    Abstract: Medical dialogue generation aims to generate responses according to a history of dialogue turns between doctors and patients. Unlike open-domain dialogue generation, this requires background knowledge specific to the medical domain. Existing generative frameworks for medical dialogue generation fall short of incorporating domain-specific knowledge, especially with regard to medical terminology. In…

    Submitted 14 March, 2023; v1 submitted 27 October, 2022; originally announced October 2022.

    Comments: Accepted by ICASSP 2023

  17. arXiv:2210.12463  [pdf, other]

    cs.CL cs.AI

    EtriCA: Event-Triggered Context-Aware Story Generation Augmented by Cross Attention

    Authors: Chen Tang, Chenghua Lin, Henglin Huang, Frank Guerin, Zhihao Zhang

    Abstract: One of the key challenges of automatic story generation is how to generate a long narrative that can maintain fluency, relevance, and coherence. Despite recent progress, current story generation systems still face the challenge of how to effectively capture contextual and event features, which has a profound impact on a model's generation performance. To address these challenges, we present EtriCA…

    Submitted 22 October, 2022; originally announced October 2022.

    Comments: Accepted by EMNLP 2022 Findings

    Journal ref: EMNLP 2022 Findings

  18. arXiv:2210.10618  [pdf, other]

    cs.CL cs.AI

    Improving Chinese Story Generation via Awareness of Syntactic Dependencies and Semantics

    Authors: Henglin Huang, Chen Tang, Tyler Loakman, Frank Guerin, Chenghua Lin

    Abstract: Story generation aims to generate a long narrative conditioned on a given input. In spite of the success of prior works with the application of pre-trained models, current neural models for Chinese stories still struggle to generate high-quality long text narratives. We hypothesise that this stems from ambiguity in syntactically parsing the Chinese language, which does not have explicit delimiters…

    Submitted 19 October, 2022; originally announced October 2022.

    Journal ref: AACL 2022

  19. arXiv:2210.10602  [pdf, other]

    cs.CL cs.AI

    NGEP: A Graph-based Event Planning Framework for Story Generation

    Authors: Chen Tang, Zhihao Zhang, Tyler Loakman, Chenghua Lin, Frank Guerin

    Abstract: To improve the performance of long text generation, recent studies have leveraged automatically planned event structures (i.e. storylines) to guide story generation. Such prior works mostly employ end-to-end neural generation models to predict event sequences for a story. However, such generation models struggle to guarantee the narrative coherence of separate events due to the hallucination probl…

    Submitted 19 October, 2022; originally announced October 2022.

    Journal ref: AACL 2022

  20. arXiv:2203.03047  [pdf, other]

    cs.CL cs.AI

    Recent Advances in Neural Text Generation: A Task-Agnostic Survey

    Authors: Chen Tang, Frank Guerin, Chenghua Lin

    Abstract: In recent years, considerable research has been dedicated to the application of neural models in the field of natural language generation (NLG). The primary objective is to generate text that is both linguistically natural and human-like, while also exerting control over the generation process. This paper offers a comprehensive and task-agnostic survey of the recent advancements in neural text gen…

    Submitted 12 June, 2023; v1 submitted 6 March, 2022; originally announced March 2022.

    Comments: This has been updated with some recent advances in 2023

  21. arXiv:2107.05319  [pdf, other]

    cs.CV

    Human-like Relational Models for Activity Recognition in Video

    Authors: Joseph Chrol-Cannon, Andrew Gilbert, Ranko Lazic, Adithya Madhusoodanan, Frank Guerin

    Abstract: Video activity recognition by deep neural networks is impressive for many classes. However, it falls short of human performance, especially for activities that are challenging to discriminate. Humans differentiate these complex activities by recognising critical spatio-temporal relations among explicitly recognised objects and parts, for example, an object entering the aperture of a container. Deep neural…

    Submitted 11 January, 2022; v1 submitted 12 July, 2021; originally announced July 2021.

  22. arXiv:2104.03391  [pdf, other]

    cs.CL

    Interpreting Verbal Metaphors by Paraphrasing

    Authors: Rui Mao, Chenghua Lin, Frank Guerin

    Abstract: Metaphorical expressions are difficult linguistic phenomena, challenging diverse Natural Language Processing tasks. Previous works showed that paraphrasing a metaphor as its literal counterpart can help machines better process metaphors on downstream tasks. In this paper, we interpret metaphors with BERT and WordNet hypernyms and synonyms in an unsupervised manner, showing that our method signific…

    Submitted 7 April, 2021; originally announced April 2021.

  23. arXiv:2104.03285  [pdf, other]

    cs.CL

    Combining Pre-trained Word Embeddings and Linguistic Features for Sequential Metaphor Identification

    Authors: Rui Mao, Chenghua Lin, Frank Guerin

    Abstract: We tackle the problem of identifying metaphors in text, treated as a sequence tagging task. The pre-trained word embeddings GloVe, ELMo and BERT have individually shown good performance on sequential metaphor identification. These embeddings are generated by different models, training targets and corpora, thus encoding different semantic and syntactic information. We show that leveraging GloVe, EL…

    Submitted 7 April, 2021; originally announced April 2021.

  24. arXiv:2103.13512  [pdf, other]

    cs.AI cs.CV cs.RO

    Projection: A Mechanism for Human-like Reasoning in Artificial Intelligence

    Authors: Frank Guerin

    Abstract: Artificial Intelligence systems cannot yet match human abilities to apply knowledge to situations that vary from what they have been programmed for, or trained for. In visual object recognition, methods of inference exploiting top-down information (from a model) have been shown to be effective for recognising entities in difficult conditions. Here this type of inference, called 'projection', is sho…

    Submitted 17 May, 2022; v1 submitted 24 March, 2021; originally announced March 2021.

    Comments: 29 pages, 3 figures. Some minor additions/clarifications in this revision, e.g. mathematical description

  25. arXiv:2012.02128  [pdf, other]

    cs.CL cs.CV

    BERT-hLSTMs: BERT and Hierarchical LSTMs for Visual Storytelling

    Authors: Jing Su, Qingyun Dai, Frank Guerin, Mian Zhou

    Abstract: Visual storytelling is a creative and challenging task, aiming to automatically generate a story-like description for a sequence of images. The descriptions generated by previous visual storytelling approaches lack coherence because they use word-level sequence generation methods and do not adequately consider sentence-level dependencies. To tackle this problem, we propose a novel hierarchical vis…

    Submitted 3 December, 2020; originally announced December 2020.

  26. arXiv:2010.05204  [pdf, other]

    cs.HC

    Towards Somaesthetics Inspired Games: Exploring the Influence of a Mirror Effect on Self-Presentation in a Public Setting

    Authors: Fiona Guerin, Alice Rey, Enis Caliskan, Erik Kynast, Andreas Zimmerer, Ilhan Aslan, Elisabeth André

    Abstract: We report on an initial user study, which explores how players of an augmented mirror game self-style or self-present themselves when they are allowed to see themselves in the mirror compared to when they do not see themselves. To this end, we customized an open source fruit slicing game into an interactive installation for an architecture museum and conducted a field study with 36 visitors. Base…

    Submitted 11 October, 2020; originally announced October 2020.

    Comments: 11 pages

  27. arXiv:1907.12385  [pdf, other]

    cs.LG stat.ML

    Latent Space Factorisation and Manipulation via Matrix Subspace Projection

    Authors: Xiao Li, Chenghua Lin, Ruizhe Li, Chaozheng Wang, Frank Guerin

    Abstract: We tackle the problem of disentangling the latent space of an autoencoder in order to separate labelled attribute information from other characteristic information. This then allows us to change selected attributes while preserving other information. Our method, matrix subspace projection, is much simpler than previous approaches to latent space factorisation, for example not requiring multiple discr…

    Submitted 14 August, 2020; v1 submitted 26 July, 2019; originally announced July 2019.

    Comments: Final camera ready version for ICML 2020

  28. arXiv:1803.02743  [pdf, other]

    cs.RO

    Adapting Everyday Manipulation Skills to Varied Scenarios

    Authors: Pawel Gajewski, Paulo Ferreira, Georg Bartels, Chaozheng Wang, Frank Guerin, Bipin Indurkhya, Michael Beetz, Bartlomiej Sniezynski

    Abstract: We address the problem of executing tool-using manipulation skills in scenarios where the objects to be used may vary. We assume that point clouds of the tool and target object can be obtained, but no interpretation or further knowledge about these objects is provided. The system must interpret the point clouds and decide how to use the tool to complete a manipulation task with a target object; th…

    Submitted 4 March, 2019; v1 submitted 7 March, 2018; originally announced March 2018.

    Comments: accepted for ICRA 2019

  29. arXiv:1710.04970  [pdf, other]

    cs.RO

    Transfer of Tool Affordance and Manipulation Cues with 3D Vision Data

    Authors: Paulo Abelha, Frank Guerin

    Abstract: Future service robots working in human environments, such as kitchens, will face situations where they need to improvise. The usual tool for a given task might not be available and the robot will have to use some substitute tool. The robot needs to select an appropriate alternative tool from the candidates available, and also needs to know where to grasp it, how to orient it and what part to use a…

    Submitted 13 October, 2017; originally announced October 2017.

    Comments: 24 pages

  30. arXiv:1410.8292  [pdf, other]

    cs.RO

    A Decentralized Interactive Architecture for Aerial and Ground Mobile Robots Cooperation

    Authors: El Houssein Chouaib Harik, François Guérin, Frédéric Guinand, Jean-François Brethé, Hervé Pelvillain

    Abstract: This paper presents a novel decentralized interactive architecture for aerial and ground mobile robots cooperation. The aerial mobile robot is used to provide a global coverage during an area inspection, while the ground mobile robot is used to provide a local coverage of ground features. We include a human-in-the-loop to provide waypoints for the ground mobile robot to progress safely in the insp…

    Submitted 27 February, 2015; v1 submitted 30 October, 2014; originally announced October 2014.

    Comments: Submitted to 2015 International Conference on Control, Automation and Robotics (ICCAR)