subscribe to arXiv mailings

arXiv:2406.20038 [pdf, other]

BioMNER: A Dataset for Biomedical Method Entity Recognition

Authors: Chen Tang, Bohao Yang, Kun Zhao, Bo Lv, Chenghao Xiao, Frank Guerin, Chenghua Lin

Abstract: Named entity recognition (NER) stands as a fundamental and pivotal task within the realm of Natural Language Processing. Particularly within the domain of Biomedical Method NER, this task presents notable challenges, stemming from the continual influx of domain-specific terminologies in scholarly literature. Current research in Biomedical Method (BioMethod) NER suffers from a scarcity of resources… ▽ More Named entity recognition (NER) stands as a fundamental and pivotal task within the realm of Natural Language Processing. Particularly within the domain of Biomedical Method NER, this task presents notable challenges, stemming from the continual influx of domain-specific terminologies in scholarly literature. Current research in Biomedical Method (BioMethod) NER suffers from a scarcity of resources, primarily attributed to the intricate nature of methodological concepts, which necessitate a profound understanding for precise delineation. In this study, we propose a novel dataset for biomedical method entity recognition, employing an automated BioMethod entity recognition and information retrieval system to assist human annotation. Furthermore, we comprehensively explore a range of conventional and contemporary open-domain NER methodologies, including the utilization of cutting-edge large-scale language models (LLMs) customised to our dataset. Our empirical findings reveal that the large parameter counts of language models surprisingly inhibit the effective assimilation of entity extraction patterns pertaining to biomedical methods. Remarkably, the approach, leveraging the modestly sized ALBERT model (only 11MB), in conjunction with conditional random fields (CRF), achieves state-of-the-art (SOTA) performance. △ Less

Submitted 28 June, 2024; originally announced June 2024.

arXiv:2406.03447 [pdf, other]

FILS: Self-Supervised Video Feature Prediction In Semantic Language Space

Authors: Mona Ahmadian, Frank Guerin, Andrew Gilbert

Abstract: This paper demonstrates a self-supervised approach for learning semantic video representations. Recent vision studies show that a masking strategy for vision and natural language supervision has contributed to developing transferable visual pretraining. Our goal is to achieve a more semantic video representation by leveraging the text related to the video content during the pretraining in a fully… ▽ More This paper demonstrates a self-supervised approach for learning semantic video representations. Recent vision studies show that a masking strategy for vision and natural language supervision has contributed to developing transferable visual pretraining. Our goal is to achieve a more semantic video representation by leveraging the text related to the video content during the pretraining in a fully self-supervised manner. To this end, we present FILS, a novel self-supervised video Feature prediction In semantic Language Space (FILS). The vision model can capture valuable structured information by correctly predicting masked feature semantics in language space. It is learned using a patch-wise video-text contrastive strategy, in which the text representations act as prototypes for transforming vision features into a language space, which are then used as targets for semantically meaningful feature prediction using our masked encoder-decoder structure. FILS demonstrates remarkable transferability on downstream action recognition tasks, achieving state-of-the-art on challenging egocentric datasets, like Epic-Kitchens, Something-SomethingV2, Charades-Ego, and EGTEA, using ViT-Base. Our efficient method requires less computation and smaller batches compared to previous works. △ Less

Submitted 5 June, 2024; originally announced June 2024.

arXiv:2402.00861 [pdf, other]

Evaluating Large Language Models for Generalization and Robustness via Data Compression

Authors: Yucheng Li, Yunhao Guo, Frank Guerin, Chenghua Lin

Abstract: Existing methods for evaluating large language models face challenges such as data contamination, sensitivity to prompts, and the high cost of benchmark creation. To address this, we propose a lossless data compression based evaluation approach that tests how models' predictive abilities generalize after their training cutoff. Specifically, we collect comprehensive test data spanning 83 months fro… ▽ More Existing methods for evaluating large language models face challenges such as data contamination, sensitivity to prompts, and the high cost of benchmark creation. To address this, we propose a lossless data compression based evaluation approach that tests how models' predictive abilities generalize after their training cutoff. Specifically, we collect comprehensive test data spanning 83 months from 2017 to 2023 and split the data into training and testing periods according to models' training data cutoff. We measure: 1) the compression performance on the testing period as a measure of generalization on unseen data; and 2) the performance gap between the training and testing period as a measure of robustness. Our experiments test 14 representative large language models with various sizes on sources including Wikipedia, news articles, code, arXiv papers, and multi-modal data. We find that the compression rate of many models reduces significantly after their cutoff date, but models such as Mistral and Llama-2 demonstrate a good balance between performance and robustness. Results also suggest that models struggle to generalize on news and code data, but work especially well on arXiv papers. We also find the context size and tokenization implementation have a big impact of on the overall compression performance. △ Less

Submitted 3 February, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

arXiv:2401.16012 [pdf, other]

Finding Challenging Metaphors that Confuse Pretrained Language Models

Authors: Yucheng Li, Frank Guerin, Chenghua Lin

Abstract: Metaphors are considered to pose challenges for a wide spectrum of NLP tasks. This gives rise to the area of computational metaphor processing. However, it remains unclear what types of metaphors challenge current state-of-the-art models. In this paper, we test various NLP models on the VUA metaphor dataset and quantify to what extent metaphors affect models' performance on various downstream task… ▽ More Metaphors are considered to pose challenges for a wide spectrum of NLP tasks. This gives rise to the area of computational metaphor processing. However, it remains unclear what types of metaphors challenge current state-of-the-art models. In this paper, we test various NLP models on the VUA metaphor dataset and quantify to what extent metaphors affect models' performance on various downstream tasks. Analysis reveals that VUA includes a large number of metaphors that pose little difficulty to downstream tasks. We would like to shift the attention of researchers away from these metaphors to instead focus on challenging metaphors. To identify hard metaphors, we propose an automatic pipeline that identifies metaphors that challenge a particular model. Our analysis demonstrates that our detected hard metaphors contrast significantly with VUA and reduce the accuracy of machine translation by 16\%, QA performance by 4\%, NLI by 7\%, and metaphor identification recall by over 14\% for various popular NLP systems. △ Less

Submitted 29 January, 2024; originally announced January 2024.

arXiv:2312.12343 [pdf, other]

LatestEval: Addressing Data Contamination in Language Model Evaluation through Dynamic and Time-Sensitive Test Construction

Authors: Yucheng Li, Frank Guerin, Chenghua Lin

Abstract: Data contamination in evaluation is getting increasingly prevalent with the emergence of language models pre-trained on super large, automatically crawled corpora. This problem leads to significant challenges in the accurate assessment of model capabilities and generalisations. In this paper, we propose LatestEval, an automatic method that leverages the most recent texts to create uncontaminated r… ▽ More Data contamination in evaluation is getting increasingly prevalent with the emergence of language models pre-trained on super large, automatically crawled corpora. This problem leads to significant challenges in the accurate assessment of model capabilities and generalisations. In this paper, we propose LatestEval, an automatic method that leverages the most recent texts to create uncontaminated reading comprehension evaluations. LatestEval avoids data contamination by only using texts published within a recent time window, ensuring no overlap with the training corpora of pre-trained language models. We develop the LatestEval automated pipeline to 1) gather the latest texts; 2) identify key information, and 3) construct questions targeting the information while removing the existing answers from the context. This encourages models to infer the answers themselves based on the remaining context, rather than just copy-paste. Our experiments demonstrate that language models exhibit negligible memorisation behaviours on LatestEval as opposed to previous benchmarks, suggesting a significantly reduced risk of data contamination and leading to a more robust evaluation. Data and code are publicly available at: https://github.com/liyucheng09/LatestEval. △ Less

Submitted 1 March, 2024; v1 submitted 19 December, 2023; originally announced December 2023.

Comments: AAAI 2024

arXiv:2310.20467 [pdf, other]

ACL Anthology Helper: A Tool to Retrieve and Manage Literature from ACL Anthology

Authors: Chen Tang, Frank Guerin, Chenghua Lin

Abstract: The ACL Anthology is an online repository that serves as a comprehensive collection of publications in the field of natural language processing (NLP) and computational linguistics (CL). This paper presents a tool called ``ACL Anthology Helper''. It automates the process of parsing and downloading papers along with their meta-information, which are then stored in a local MySQL database. This allows… ▽ More The ACL Anthology is an online repository that serves as a comprehensive collection of publications in the field of natural language processing (NLP) and computational linguistics (CL). This paper presents a tool called ``ACL Anthology Helper''. It automates the process of parsing and downloading papers along with their meta-information, which are then stored in a local MySQL database. This allows for efficient management of the local papers using a wide range of operations, including "where," "group," "order," and more. By providing over 20 operations, this tool significantly enhances the retrieval of literature based on specific conditions. Notably, this tool has been successfully utilised in writing a survey paper (Tang et al.,2022a). By introducing the ACL Anthology Helper, we aim to enhance researchers' ability to effectively access and organise literature from the ACL Anthology. This tool offers a convenient solution for researchers seeking to explore the ACL Anthology's vast collection of publications while allowing for more targeted and efficient literature retrieval. △ Less

Submitted 31 October, 2023; originally announced October 2023.

arXiv:2310.17589 [pdf, other]

An Open Source Data Contamination Report for Large Language Models

Authors: Yucheng Li, Frank Guerin, Chenghua Lin

Abstract: Data contamination in model evaluation has become increasingly prevalent with the growing popularity of large language models. It allows models to "cheat" via memorisation instead of displaying true capabilities. Therefore, contamination analysis has become an crucial part of reliable model evaluation to validate results. However, existing contamination analysis is usually conducted internally by… ▽ More Data contamination in model evaluation has become increasingly prevalent with the growing popularity of large language models. It allows models to "cheat" via memorisation instead of displaying true capabilities. Therefore, contamination analysis has become an crucial part of reliable model evaluation to validate results. However, existing contamination analysis is usually conducted internally by large language model developers and often lacks transparency and completeness. This paper presents an extensive data contamination report for over 15 popular large language models across six popular multiple-choice QA benchmarks. We also introduce an open-source pipeline that enables the community to perform contamination analysis on customised data and models. Our experiments reveal varying contamination levels ranging from 1\% to 45\% across benchmarks, with the contamination degree increasing rapidly over time. Performance analysis of large language models indicates that data contamination does not necessarily lead to increased model metrics: while significant accuracy boosts of up to 14\% and 7\% are observed on contaminated C-Eval and Hellaswag benchmarks, only a minimal increase is noted on contaminated MMLU. We also find larger models seem able to gain more advantages than smaller models on contaminated test sets. △ Less

Submitted 28 January, 2024; v1 submitted 26 October, 2023; originally announced October 2023.

arXiv:2310.06201 [pdf, other]

Compressing Context to Enhance Inference Efficiency of Large Language Models

Authors: Yucheng Li, Bo Dong, Chenghua Lin, Frank Guerin

Abstract: Large language models (LLMs) achieved remarkable performance across various tasks. However, they face challenges in managing long documents and extended conversations, due to significantly increased computational requirements, both in memory and inference time, and potential context truncation when the input exceeds the LLM's fixed context length. This paper proposes a method called Selective Cont… ▽ More Large language models (LLMs) achieved remarkable performance across various tasks. However, they face challenges in managing long documents and extended conversations, due to significantly increased computational requirements, both in memory and inference time, and potential context truncation when the input exceeds the LLM's fixed context length. This paper proposes a method called Selective Context that enhances the inference efficiency of LLMs by identifying and pruning redundancy in the input context to make the input more compact. We test our approach using common data sources requiring long context processing: arXiv papers, news articles, and long conversations, on tasks of summarisation, question answering, and response generation. Experimental results show that Selective Context significantly reduces memory cost and decreases generation latency while maintaining comparable performance compared to that achieved when full context is used. Specifically, we achieve a 50\% reduction in context cost, resulting in a 36\% reduction in inference memory usage and a 32\% reduction in inference time, while observing only a minor drop of .023 in BERTscore and .038 in faithfulness on four downstream applications, indicating that our method strikes a good balance between efficiency and performance. △ Less

Submitted 9 October, 2023; originally announced October 2023.

Comments: EMNLP 2023. arXiv admin note: substantial text overlap with arXiv:2304.12102; text overlap with arXiv:2303.11076 by other authors

arXiv:2308.12488 [pdf, other]

GPTEval: A Survey on Assessments of ChatGPT and GPT-4

Authors: Rui Mao, Guanyi Chen, Xulang Zhang, Frank Guerin, Erik Cambria

Abstract: The emergence of ChatGPT has generated much speculation in the press about its potential to disrupt social and economic systems. Its astonishing language ability has aroused strong curiosity among scholars about its performance in different domains. There have been many studies evaluating the ability of ChatGPT and GPT-4 in different tasks and disciplines. However, a comprehensive review summarizi… ▽ More The emergence of ChatGPT has generated much speculation in the press about its potential to disrupt social and economic systems. Its astonishing language ability has aroused strong curiosity among scholars about its performance in different domains. There have been many studies evaluating the ability of ChatGPT and GPT-4 in different tasks and disciplines. However, a comprehensive review summarizing the collective assessment findings is lacking. The objective of this survey is to thoroughly analyze prior assessments of ChatGPT and GPT-4, focusing on its language and reasoning abilities, scientific knowledge, and ethical considerations. Furthermore, an examination of the existing evaluation methods is conducted, offering several recommendations for future research in evaluating large language models. △ Less

Submitted 23 August, 2023; originally announced August 2023.

arXiv:2308.12447 [pdf, other]

MOFO: MOtion FOcused Self-Supervision for Video Understanding

Authors: Mona Ahmadian, Frank Guerin, Andrew Gilbert

Abstract: Self-supervised learning (SSL) techniques have recently produced outstanding results in learning visual representations from unlabeled videos. Despite the importance of motion in supervised learning techniques for action recognition, SSL methods often do not explicitly consider motion information in videos. To address this issue, we propose MOFO (MOtion FOcused), a novel SSL method for focusing re… ▽ More Self-supervised learning (SSL) techniques have recently produced outstanding results in learning visual representations from unlabeled videos. Despite the importance of motion in supervised learning techniques for action recognition, SSL methods often do not explicitly consider motion information in videos. To address this issue, we propose MOFO (MOtion FOcused), a novel SSL method for focusing representation learning on the motion area of a video, for action recognition. MOFO automatically detects motion areas in videos and uses these to guide the self-supervision task. We use a masked autoencoder which randomly masks out a high proportion of the input sequence; we force a specified percentage of the inside of the motion area to be masked and the remainder from outside. We further incorporate motion information into the finetuning step to emphasise motion in the downstream task. We demonstrate that our motion-focused innovations can significantly boost the performance of the currently leading SSL method (VideoMAE) for action recognition. Our method improves the recent self-supervised Vision Transformer (ViT), VideoMAE, by achieving +2.6%, +2.1%, +1.3% accuracy on Epic-Kitchens verb, noun and action classification, respectively, and +4.7% accuracy on Something-Something V2 action classification. Our proposed approach significantly improves the performance of the current SSL method for action recognition, indicating the importance of explicitly encoding motion in SSL. △ Less

Submitted 1 November, 2023; v1 submitted 23 August, 2023; originally announced August 2023.

Comments: Accepted at the NeurIPS 2023 Workshop: Self-Supervised Learning - Theory and Practice

arXiv:2306.16195 [pdf, other]

Enhancing Dialogue Generation via Dynamic Graph Knowledge Aggregation

Authors: Chen Tang, Hongbo Zhang, Tyler Loakman, Chenghua Lin, Frank Guerin

Abstract: Incorporating external graph knowledge into neural chatbot models has been proven effective for enhancing dialogue generation. However, in conventional graph neural networks (GNNs), message passing on a graph is independent from text, resulting in the graph representation hidden space differing from that of the text. This training regime of existing models therefore leads to a semantic gap between… ▽ More Incorporating external graph knowledge into neural chatbot models has been proven effective for enhancing dialogue generation. However, in conventional graph neural networks (GNNs), message passing on a graph is independent from text, resulting in the graph representation hidden space differing from that of the text. This training regime of existing models therefore leads to a semantic gap between graph knowledge and text. In this study, we propose a novel framework for knowledge graph enhanced dialogue generation. We dynamically construct a multi-hop knowledge graph with pseudo nodes to involve the language model in feature aggregation within the graph at all steps. To avoid the semantic biases caused by learning on vanilla subgraphs, the proposed framework applies hierarchical graph attention to aggregate graph features on pseudo nodes and then attains a global feature. Therefore, the framework can better utilise the heterogeneous features from both the post and external graph knowledge. Extensive experiments demonstrate that our framework outperforms state-of-the-art (SOTA) baselines on dialogue generation. Further analysis also shows that our representation learning framework can fill the semantic gap by coagulating representations of both text and graph knowledge. Moreover, the language model also learns how to better select knowledge triples for a more informative response via exploiting subgraph patterns within our feature aggregation process. Our code and resources are available at https://github.com/tangg555/SaBART. △ Less

Submitted 28 June, 2023; originally announced June 2023.

Comments: Accepted by ACL 2023

arXiv:2305.01082 [pdf, other]

doi 10.1145/3539618.3591861

Contextual Multilingual Spellchecker for User Queries

Authors: Sanat Sharma, Josep Valls-Vargas, Tracy Holloway King, Francois Guerin, Chirag Arora

Abstract: Spellchecking is one of the most fundamental and widely used search features. Correcting incorrectly spelled user queries not only enhances the user experience but is expected by the user. However, most widely available spellchecking solutions are either lower accuracy than state-of-the-art solutions or too slow to be used for search use cases where latency is a key requirement. Furthermore, most… ▽ More Spellchecking is one of the most fundamental and widely used search features. Correcting incorrectly spelled user queries not only enhances the user experience but is expected by the user. However, most widely available spellchecking solutions are either lower accuracy than state-of-the-art solutions or too slow to be used for search use cases where latency is a key requirement. Furthermore, most innovative recent architectures focus on English and are not trained in a multilingual fashion and are trained for spell correction in longer text, which is a different paradigm from spell correction for user queries, where context is sparse (most queries are 1-2 words long). Finally, since most enterprises have unique vocabularies such as product names, off-the-shelf spelling solutions fall short of users' needs. In this work, we build a multilingual spellchecker that is extremely fast and scalable and that adapts its vocabulary and hence speller output based on a specific product's needs. Furthermore, our speller out-performs general purpose spellers by a wide margin on in-domain datasets. Our multilingual speller is used in search in Adobe products, powering autocomplete in various applications. △ Less

Submitted 14 June, 2023; v1 submitted 1 May, 2023; originally announced May 2023.

Comments: 5 pages, In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '23)

arXiv:2302.05611 [pdf, other]

Metaphor Detection with Effective Context Denoising

Authors: Shun Wang, Yucheng Li, Chenghua Lin, Loïc Barrault, Frank Guerin

Abstract: We propose a novel RoBERTa-based model, RoPPT, which introduces a target-oriented parse tree structure in metaphor detection. Compared to existing models, RoPPT focuses on semantically relevant information and achieves the state-of-the-art on several main metaphor datasets. We also compare our approach against several popular denoising and pruning methods, demonstrating the effectiveness of our ap… ▽ More We propose a novel RoBERTa-based model, RoPPT, which introduces a target-oriented parse tree structure in metaphor detection. Compared to existing models, RoPPT focuses on semantically relevant information and achieves the state-of-the-art on several main metaphor datasets. We also compare our approach against several popular denoising and pruning methods, demonstrating the effectiveness of our approach in context denoising. Our code and dataset can be found at https://github.com/MajiBear000/RoPPT △ Less

Submitted 11 February, 2023; originally announced February 2023.

arXiv:2302.04834 [pdf, other]

FrameBERT: Conceptual Metaphor Detection with Frame Embedding Learning

Authors: Yucheng Li, Shun Wang, Chenghua Lin, Frank Guerin, Loïc Barrault

Abstract: In this paper, we propose FrameBERT, a RoBERTa-based model that can explicitly learn and incorporate FrameNet Embeddings for concept-level metaphor detection. FrameBERT not only achieves better or comparable performance to the state-of-the-art, but also is more explainable and interpretable compared to existing models, attributing to its ability of accounting for external knowledge of FrameNet. In this paper, we propose FrameBERT, a RoBERTa-based model that can explicitly learn and incorporate FrameNet Embeddings for concept-level metaphor detection. FrameBERT not only achieves better or comparable performance to the state-of-the-art, but also is more explainable and interpretable compared to existing models, attributing to its ability of accounting for external knowledge of FrameNet. △ Less

Submitted 9 February, 2023; originally announced February 2023.

arXiv:2301.13042 [pdf, other]

The Secret of Metaphor on Expressing Stronger Emotion

Authors: Yucheng Li, Frank Guerin, Chenghua Lin

Abstract: Metaphors are proven to have stronger emotional impact than literal expressions. Although this conclusion is shown to be promising in benefiting various NLP applications, the reasons behind this phenomenon are not well studied. This paper conducts the first study in exploring how metaphors convey stronger emotion than their literal counterparts. We find that metaphors are generally more specific t… ▽ More Metaphors are proven to have stronger emotional impact than literal expressions. Although this conclusion is shown to be promising in benefiting various NLP applications, the reasons behind this phenomenon are not well studied. This paper conducts the first study in exploring how metaphors convey stronger emotion than their literal counterparts. We find that metaphors are generally more specific than literal expressions. The more specific property of metaphor can be one of the reasons for metaphors' superiority in emotion expression. When we compare metaphors with literal expressions with the same specificity level, the gap of emotion expressing ability between both reduces significantly. In addition, we observe specificity is crucial in literal language as well, as literal language can express stronger emotion by making it more specific. △ Less

Submitted 30 January, 2023; originally announced January 2023.

Comments: FigLang@EMNLP2022

arXiv:2210.15551 [pdf, other]

Terminology-aware Medical Dialogue Generation

Authors: Chen Tang, Hongbo Zhang, Tyler Loakman, Chenghua Lin, Frank Guerin

Abstract: Medical dialogue generation aims to generate responses according to a history of dialogue turns between doctors and patients. Unlike open-domain dialogue generation, this requires background knowledge specific to the medical domain. Existing generative frameworks for medical dialogue generation fall short of incorporating domain-specific knowledge, especially with regard to medical terminology. In… ▽ More Medical dialogue generation aims to generate responses according to a history of dialogue turns between doctors and patients. Unlike open-domain dialogue generation, this requires background knowledge specific to the medical domain. Existing generative frameworks for medical dialogue generation fall short of incorporating domain-specific knowledge, especially with regard to medical terminology. In this paper, we propose a novel framework to improve medical dialogue generation by considering features centered on domain-specific terminology. We leverage an attention mechanism to incorporate terminologically centred features, and fill in the semantic gap between medical background knowledge and common utterances by enforcing language models to learn terminology representations with an auxiliary terminology recognition task. Experimental results demonstrate the effectiveness of our approach, in which our proposed framework outperforms SOTA language models. Additionally, we provide a new dataset with medical terminology annotations to support the research on medical dialogue generation. Our dataset and code are available at https://github.com/tangg555/meddialog. △ Less

Submitted 14 March, 2023; v1 submitted 27 October, 2022; originally announced October 2022.

Comments: Accepted by ICASSP 2023

arXiv:2210.12463 [pdf, other]

EtriCA: Event-Triggered Context-Aware Story Generation Augmented by Cross Attention

Authors: Chen Tang, Chenghua Lin, Henglin Huang, Frank Guerin, Zhihao Zhang

Abstract: One of the key challenges of automatic story generation is how to generate a long narrative that can maintain fluency, relevance, and coherence. Despite recent progress, current story generation systems still face the challenge of how to effectively capture contextual and event features, which has a profound impact on a model's generation performance. To address these challenges, we present EtriCA… ▽ More One of the key challenges of automatic story generation is how to generate a long narrative that can maintain fluency, relevance, and coherence. Despite recent progress, current story generation systems still face the challenge of how to effectively capture contextual and event features, which has a profound impact on a model's generation performance. To address these challenges, we present EtriCA, a novel neural generation model, which improves the relevance and coherence of the generated stories through residually mapping context features to event sequences with a cross-attention mechanism. Such a feature capturing mechanism allows our model to better exploit the logical relatedness between events when generating stories. Extensive experiments based on both automatic and human evaluations show that our model significantly outperforms state-of-the-art baselines, demonstrating the effectiveness of our model in leveraging context and event features. △ Less

Submitted 22 October, 2022; originally announced October 2022.

Comments: Accepted by EMNLP 2022 Findings

Journal ref: EMNLP 2022 Findings

arXiv:2210.10618 [pdf, other]

Improving Chinese Story Generation via Awareness of Syntactic Dependencies and Semantics

Authors: Henglin Huang, Chen Tang, Tyler Loakman, Frank Guerin, Chenghua Lin

Abstract: Story generation aims to generate a long narrative conditioned on a given input. In spite of the success of prior works with the application of pre-trained models, current neural models for Chinese stories still struggle to generate high-quality long text narratives. We hypothesise that this stems from ambiguity in syntactically parsing the Chinese language, which does not have explicit delimiters… ▽ More Story generation aims to generate a long narrative conditioned on a given input. In spite of the success of prior works with the application of pre-trained models, current neural models for Chinese stories still struggle to generate high-quality long text narratives. We hypothesise that this stems from ambiguity in syntactically parsing the Chinese language, which does not have explicit delimiters for word segmentation. Consequently, neural models suffer from the inefficient capturing of features in Chinese narratives. In this paper, we present a new generation framework that enhances the feature capturing mechanism by informing the generation model of dependencies between words and additionally augmenting the semantic representation learning through synonym denoising training. We conduct a range of experiments, and the results demonstrate that our framework outperforms the state-of-the-art Chinese generation models on all evaluation metrics, demonstrating the benefits of enhanced dependency and semantic representation learning. △ Less

Submitted 19 October, 2022; originally announced October 2022.

Journal ref: AACL 2022

arXiv:2210.10602 [pdf, other]

NGEP: A Graph-based Event Planning Framework for Story Generation

Authors: Chen Tang, Zhihao Zhang, Tyler Loakman, Chenghua Lin, Frank Guerin

Abstract: To improve the performance of long text generation, recent studies have leveraged automatically planned event structures (i.e. storylines) to guide story generation. Such prior works mostly employ end-to-end neural generation models to predict event sequences for a story. However, such generation models struggle to guarantee the narrative coherence of separate events due to the hallucination probl… ▽ More To improve the performance of long text generation, recent studies have leveraged automatically planned event structures (i.e. storylines) to guide story generation. Such prior works mostly employ end-to-end neural generation models to predict event sequences for a story. However, such generation models struggle to guarantee the narrative coherence of separate events due to the hallucination problem, and additionally the generated event sequences are often hard to control due to the end-to-end nature of the models. To address these challenges, we propose NGEP, an novel event planning framework which generates an event sequence by performing inference on an automatically constructed event graph and enhances generalisation ability through a neural event advisor. We conduct a range of experiments on multiple criteria, and the results demonstrate that our graph-based neural framework outperforms the state-of-the-art (SOTA) event planning approaches, considering both the performance of event sequence generation and the effectiveness on the downstream task of story generation. △ Less

Submitted 19 October, 2022; originally announced October 2022.

Journal ref: AACL 2022

arXiv:2203.03047 [pdf, other]

Recent Advances in Neural Text Generation: A Task-Agnostic Survey

Authors: Chen Tang, Frank Guerin, Chenghua Lin

Abstract: In recent years, considerable research has been dedicated to the application of neural models in the field of natural language generation (NLG). The primary objective is to generate text that is both linguistically natural and human-like, while also exerting control over the generation process. This paper offers a comprehensive and task-agnostic survey of the recent advancements in neural text gen… ▽ More In recent years, considerable research has been dedicated to the application of neural models in the field of natural language generation (NLG). The primary objective is to generate text that is both linguistically natural and human-like, while also exerting control over the generation process. This paper offers a comprehensive and task-agnostic survey of the recent advancements in neural text generation. These advancements have been facilitated through a multitude of developments, which we categorize into four key areas: data construction, neural frameworks, training and inference strategies, and evaluation metrics. By examining these different aspects, we aim to provide a holistic overview of the progress made in the field. Furthermore, we explore the future directions for the advancement of neural text generation, which encompass the utilization of neural pipelines and the incorporation of background knowledge. These avenues present promising opportunities to further enhance the capabilities of NLG systems. Overall, this survey serves to consolidate the current state of the art in neural text generation and highlights potential avenues for future research and development in this dynamic field. △ Less

Submitted 12 June, 2023; v1 submitted 6 March, 2022; originally announced March 2022.

Comments: This has been updated with some recent advances in 2023

arXiv:2107.05319 [pdf, other]

Human-like Relational Models for Activity Recognition in Video

Authors: Joseph Chrol-Cannon, Andrew Gilbert, Ranko Lazic, Adithya Madhusoodanan, Frank Guerin

Abstract: Video activity recognition by deep neural networks is impressive for many classes. However, it falls short of human performance, especially for challenging to discriminate activities. Humans differentiate these complex activities by recognising critical spatio-temporal relations among explicitly recognised objects and parts, for example, an object entering the aperture of a container. Deep neural… ▽ More Video activity recognition by deep neural networks is impressive for many classes. However, it falls short of human performance, especially for challenging to discriminate activities. Humans differentiate these complex activities by recognising critical spatio-temporal relations among explicitly recognised objects and parts, for example, an object entering the aperture of a container. Deep neural networks can struggle to learn such critical relationships effectively. Therefore we propose a more human-like approach to activity recognition, which interprets a video in sequential temporal phases and extracts specific relationships among objects and hands in those phases. Random forest classifiers are learnt from these extracted relationships. We apply the method to a challenging subset of the something-something dataset and achieve a more robust performance against neural network baselines on challenging activities. △ Less

Submitted 11 January, 2022; v1 submitted 12 July, 2021; originally announced July 2021.

arXiv:2104.03391 [pdf, other]

Interpreting Verbal Metaphors by Paraphrasing

Authors: Rui Mao, Chenghua Lin, Frank Guerin

Abstract: Metaphorical expressions are difficult linguistic phenomena, challenging diverse Natural Language Processing tasks. Previous works showed that paraphrasing a metaphor as its literal counterpart can help machines better process metaphors on downstream tasks. In this paper, we interpret metaphors with BERT and WordNet hypernyms and synonyms in an unsupervised manner, showing that our method signific… ▽ More Metaphorical expressions are difficult linguistic phenomena, challenging diverse Natural Language Processing tasks. Previous works showed that paraphrasing a metaphor as its literal counterpart can help machines better process metaphors on downstream tasks. In this paper, we interpret metaphors with BERT and WordNet hypernyms and synonyms in an unsupervised manner, showing that our method significantly outperforms the state-of-the-art baseline. We also demonstrate that our method can help a machine translation system improve its accuracy in translating English metaphors to 8 target languages. △ Less

Submitted 7 April, 2021; originally announced April 2021.

arXiv:2104.03285 [pdf, other]

Combining Pre-trained Word Embeddings and Linguistic Features for Sequential Metaphor Identification

Authors: Rui Mao, Chenghua Lin, Frank Guerin

Abstract: We tackle the problem of identifying metaphors in text, treated as a sequence tagging task. The pre-trained word embeddings GloVe, ELMo and BERT have individually shown good performance on sequential metaphor identification. These embeddings are generated by different models, training targets and corpora, thus encoding different semantic and syntactic information. We show that leveraging GloVe, EL… ▽ More We tackle the problem of identifying metaphors in text, treated as a sequence tagging task. The pre-trained word embeddings GloVe, ELMo and BERT have individually shown good performance on sequential metaphor identification. These embeddings are generated by different models, training targets and corpora, thus encoding different semantic and syntactic information. We show that leveraging GloVe, ELMo and feature-based BERT based on a multi-channel CNN and a Bidirectional LSTM model can significantly outperform any single word embedding method and the combination of the two embeddings. Incorporating linguistic features into our model can further improve model performance, yielding state-of-the-art performance on three public metaphor datasets. We also provide in-depth analysis on the effectiveness of leveraging multiple word embeddings, including analysing the spatial distribution of different embedding methods for metaphors and literals, and showing how well the embeddings complement each other in different genres and parts of speech. △ Less

Submitted 7 April, 2021; originally announced April 2021.

arXiv:2103.13512 [pdf, other]

Projection: A Mechanism for Human-like Reasoning in Artificial Intelligence

Authors: Frank Guerin

Abstract: Artificial Intelligence systems cannot yet match human abilities to apply knowledge to situations that vary from what they have been programmed for, or trained for. In visual object recognition methods of inference exploiting top-down information (from a model) have been shown to be effective for recognising entities in difficult conditions. Here this type of inference, called `projection', is sho… ▽ More Artificial Intelligence systems cannot yet match human abilities to apply knowledge to situations that vary from what they have been programmed for, or trained for. In visual object recognition methods of inference exploiting top-down information (from a model) have been shown to be effective for recognising entities in difficult conditions. Here this type of inference, called `projection', is shown to be a key mechanism to solve the problem of applying knowledge to varied or challenging situations, across a range of AI domains, such as vision, robotics, or language. Finally the relevance of projection to tackling the commonsense knowledge problem is discussed. △ Less

Submitted 17 May, 2022; v1 submitted 24 March, 2021; originally announced March 2021.

Comments: 29 pages, 3 figures. Some minor additions/clarifications in this revision, e.g. mathematical description

arXiv:2012.02128 [pdf, other]

BERT-hLSTMs: BERT and Hierarchical LSTMs for Visual Storytelling

Authors: Jing Su, Qingyun Dai, Frank Guerin, Mian Zhou

Abstract: Visual storytelling is a creative and challenging task, aiming to automatically generate a story-like description for a sequence of images. The descriptions generated by previous visual storytelling approaches lack coherence because they use word-level sequence generation methods and do not adequately consider sentence-level dependencies. To tackle this problem, we propose a novel hierarchical vis… ▽ More Visual storytelling is a creative and challenging task, aiming to automatically generate a story-like description for a sequence of images. The descriptions generated by previous visual storytelling approaches lack coherence because they use word-level sequence generation methods and do not adequately consider sentence-level dependencies. To tackle this problem, we propose a novel hierarchical visual storytelling framework which separately models sentence-level and word-level semantics. We use the transformer-based BERT to obtain embeddings for sentences and words. We then employ a hierarchical LSTM network: the bottom LSTM receives as input the sentence vector representation from BERT, to learn the dependencies between the sentences corresponding to images, and the top LSTM is responsible for generating the corresponding word vector representations, taking input from the bottom LSTM. Experimental results demonstrate that our model outperforms most closely related baselines under automatic evaluation metrics BLEU and CIDEr, and also show the effectiveness of our method with human evaluation. △ Less

Submitted 3 December, 2020; originally announced December 2020.

arXiv:2010.05204 [pdf, other]

Towards Somaesthetics Inspired Games: Exploring the Influence of a Mirror Effect on Self-Presentation in a Public Setting

Authors: Fiona Guerin, Alice Rey, Enis Caliskan, Erik Kynast, Andreas Zimmerer, Ilhan Aslan, Elisabeth André

Abstract: We report on an initial user study, which explores how players of an augmented mirror game, self-style or self-present themselves when they are allowed to see themselves in the mirror compared to when they do not see themselves. To this end, we customized an open source fruit slicing game into an interactive installation for an architecture museum and conducted with 36 visitors a field study. Base… ▽ More We report on an initial user study, which explores how players of an augmented mirror game, self-style or self-present themselves when they are allowed to see themselves in the mirror compared to when they do not see themselves. To this end, we customized an open source fruit slicing game into an interactive installation for an architecture museum and conducted with 36 visitors a field study. Based on an analysis of video recordings of participants we identified, for example significant differences in how often participants smile. Ultimately, presenting a self-image to gamers in a social setting resulted in behavior change, which we argue could be utilized carefully from a Somaesthetics perspective as an experience design feature in future games. △ Less

Submitted 11 October, 2020; originally announced October 2020.

Comments: 11 pages

arXiv:1907.12385 [pdf, other]

Latent Space Factorisation and Manipulation via Matrix Subspace Projection

Authors: Xiao Li, Chenghua Lin, Ruizhe Li, Chaozheng Wang, Frank Guerin

Abstract: We tackle the problem disentangling the latent space of an autoencoder in order to separate labelled attribute information from other characteristic information. This then allows us to change selected attributes while preserving other information. Our method, matrix subspace projection, is much simpler than previous approaches to latent space factorisation, for example not requiring multiple discr… ▽ More We tackle the problem disentangling the latent space of an autoencoder in order to separate labelled attribute information from other characteristic information. This then allows us to change selected attributes while preserving other information. Our method, matrix subspace projection, is much simpler than previous approaches to latent space factorisation, for example not requiring multiple discriminators or a careful weighting among their loss functions. Furthermore our new model can be applied to autoencoders as a plugin, and works across diverse domains such as images or text. We demonstrate the utility of our method for attribute manipulation in autoencoders trained across varied domains, using both human evaluation and automated methods. The quality of generation of our new model (e.g. reconstruction, conditional generation) is highly competitive to a number of strong baselines. △ Less

Submitted 14 August, 2020; v1 submitted 26 July, 2019; originally announced July 2019.

Comments: Final camera ready version for ICML 2020

arXiv:1803.02743 [pdf, other]

Adapting Everyday Manipulation Skills to Varied Scenarios

Authors: Pawel Gajewski, Paulo Ferreira, Georg Bartels, Chaozheng Wang, Frank Guerin, Bipin Indurkhya, Michael Beetz, Bartlomiej Sniezynski

Abstract: We address the problem of executing tool-using manipulation skills in scenarios where the objects to be used may vary. We assume that point clouds of the tool and target object can be obtained, but no interpretation or further knowledge about these objects is provided. The system must interpret the point clouds and decide how to use the tool to complete a manipulation task with a target object; th… ▽ More We address the problem of executing tool-using manipulation skills in scenarios where the objects to be used may vary. We assume that point clouds of the tool and target object can be obtained, but no interpretation or further knowledge about these objects is provided. The system must interpret the point clouds and decide how to use the tool to complete a manipulation task with a target object; this means it must adjust motion trajectories appropriately to complete the task. We tackle three everyday manipulations: scraping material from a tool into a container, cutting, and scooping from a container. Our solution encodes these manipulation skills in a generic way, with parameters that can be filled in at run-time via queries to a robot perception module; the perception module abstracts the functional parts for the tool and extracts key parameters that are needed for the task. The approach is evaluated in simulation and with selected examples on a PR2 robot. △ Less

Submitted 4 March, 2019; v1 submitted 7 March, 2018; originally announced March 2018.

Comments: accepted for ICRA 2019

arXiv:1710.04970 [pdf, other]

Transfer of Tool Affordance and Manipulation Cues with 3D Vision Data

Authors: Paulo Abelha, Frank Guerin

Abstract: Future service robots working in human environments, such as kitchens, will face situations where they need to improvise. The usual tool for a given task might not be available and the robot will have to use some substitute tool. The robot needs to select an appropriate alternative tool from the candidates available, and also needs to know where to grasp it, how to orient it and what part to use a… ▽ More Future service robots working in human environments, such as kitchens, will face situations where they need to improvise. The usual tool for a given task might not be available and the robot will have to use some substitute tool. The robot needs to select an appropriate alternative tool from the candidates available, and also needs to know where to grasp it, how to orient it and what part to use as the end-effector. We present a system which takes as input a candidate tool's point cloud and weight, and outputs a score for how effective that tool is for a task, and how to use it. Our key novelty is in taking a task-driven approach, where the task exerts a top-down influence on how low level vision data is interpreted. This facilitates the type of 'everyday creativity' where an object such as a wine bottle could be used as a rolling pin, because the interpretation of the object is not fixed in advance, but rather results from the interaction between the bottom-up and top-down pressures at run-time. The top-down influence is implemented by transfer: prior knowledge of geometric features that make a tool good for a task is used to seek similar features in a candidate tool. The prior knowledge is learned by simulating Web models performing the tasks. We evaluate on a set of fifty household objects and five tasks. We compare our system with the closest one in the literature and show that we achieve significantly better results △ Less

Submitted 13 October, 2017; originally announced October 2017.

Comments: 24 pages

arXiv:1410.8292 [pdf, other]

A Decentralized Interactive Architecture for Aerial and Ground Mobile Robots Cooperation

Authors: El Houssein Chouaib Harik, François Guérin, Frédéric Guinand, Jean-François Brethé, Hervé Pelvillain

Abstract: This paper presents a novel decentralized interactive architecture for aerial and ground mobile robots cooperation. The aerial mobile robot is used to provide a global coverage during an area inspection, while the ground mobile robot is used to provide a local coverage of ground features. We include a human-in-the-loop to provide waypoints for the ground mobile robot to progress safely in the insp… ▽ More This paper presents a novel decentralized interactive architecture for aerial and ground mobile robots cooperation. The aerial mobile robot is used to provide a global coverage during an area inspection, while the ground mobile robot is used to provide a local coverage of ground features. We include a human-in-the-loop to provide waypoints for the ground mobile robot to progress safely in the inspected area. The aerial mobile robot follows continuously the ground mobile robot in order to always keep it in its coverage view. △ Less

Submitted 27 February, 2015; v1 submitted 30 October, 2014; originally announced October 2014.

Comments: Submitted to 2015 International Conference on Control, Automation and Robotics (ICCAR)

Showing 1–30 of 30 results for author: Guerin, F