Showing 1–8 of 8 results for author: Labbé, C

  1. arXiv:2402.03370  [pdf, other]

    cs.IR cs.AI cs.CL cs.DL

    Detection of tortured phrases in scientific literature

    Authors: Eléna Martel, Martin Lentschat, Cyril Labbé

    Abstract: This paper presents various automatic detection methods to extract so-called tortured phrases from scientific papers. These tortured phrases, e.g. "flag to clamor" instead of "signal to noise", are the results of paraphrasing tools used to escape plagiarism detection. We built a dataset and evaluated several strategies to flag previously undocumented tortured phrases. The proposed and tested methods a…

    Submitted 2 February, 2024; originally announced February 2024.

    Journal ref: Proceedings of the 2nd Workshop on Information Extraction from Scientific Publications, Nov 2023, Bali, Indonesia

  2. arXiv:2402.03362  [pdf, other]

    cs.IR cs.AI cs.CL

    NanoNER: Named Entity Recognition for nanobiology using experts' knowledge and distant supervision

    Authors: Martin Lentschat, Cyril Labbé, Ran Cheng

    Abstract: Here we present the training and evaluation of NanoNER, a Named Entity Recognition (NER) model for Nanobiology. NER consists in the identification of specific entities in spans of unstructured texts and is often a primary task in Natural Language Processing (NLP) and Information Extraction. The aim of our model is to recognise entities previously identified by domain experts as constituting the es…

    Submitted 30 January, 2024; originally announced February 2024.

  3. Sneaked references: Cooked reference metadata inflate citation counts

    Authors: Lonni Besançon, Guillaume Cabanac, Cyril Labbé, Alexander Magazinov

    Abstract: We report evidence of an undocumented method to manipulate citation counts involving 'sneaked' references. Sneaked references are registered as metadata for scientific articles in which they do not appear. This manipulation exploits trusted relationships between various actors: publishers, the Crossref metadata registration agency, digital libraries, and bibliometric platforms. By collecting metad…

    Submitted 3 October, 2023; originally announced October 2023.

    ACM Class: H.3.7

  4. arXiv:2210.13024  [pdf, other]

    cs.CL cs.IR

    Investigating the detection of Tortured Phrases in Scientific Literature

    Authors: Puthineath Lay, Martin Lentschat, Cyril Labbé

    Abstract: With the help of online tools, unscrupulous authors can today generate a pseudo-scientific article and attempt to publish it. Some of these tools work by replacing or paraphrasing existing texts to produce new content, but they have a tendency to generate nonsensical expressions. A recent study introduced the concept of 'tortured phrase', an unexpected odd phrase that appears instead of the fixed…

    Submitted 24 October, 2022; originally announced October 2022.

  5. arXiv:2210.04895  [pdf]

    cs.DL cs.IR

    The 'Problematic Paper Screener' automatically selects suspect publications for post-publication (re)assessment

    Authors: Guillaume Cabanac, Cyril Labbé, Alexander Magazinov

    Abstract: Post-publication assessment remains necessary to check erroneous or fraudulent scientific publications. We present an online platform, the 'Problematic Paper Screener' (https://www.irit.fr/~Guillaume.Cabanac/problematic-paper-screener), that leverages both automatic machine detection and human assessment to identify and flag already published problematic articles. We provide a new effective tool to…

    Submitted 7 October, 2022; originally announced October 2022.

    Comments: Presented at WCRI 2022: 7th World Conference on Research Integrity, Cape Town, South Africa. 29 May -- 1 June 2022

  6. arXiv:2209.04703  [pdf]

    cs.DL

    Improper legitimization of hijacked journals through citations

    Authors: Anna Abalkina, Guillaume Cabanac, Cyril Labbé, Alexander Magazinov

    Abstract: The goal is to study the prevalence of citejacked papers in academic literature: papers in authentic scientific journals that cite hijacked journals. A Citejacked detector was designed as a part of the Problematic Paper Screener (https://www.irit.fr/~Guillaume.Cabanac/problematic-paper-screener/citejacked) to trace if the references to articles originating from hijacked journals infiltrate scientifi…

    Submitted 10 September, 2022; originally announced September 2022.

  7. arXiv:2107.06751  [pdf, other]

    cs.DL cs.CL cs.CY cs.IR

    Tortured phrases: A dubious writing style emerging in science. Evidence of critical issues affecting established journals

    Authors: Guillaume Cabanac, Cyril Labbé, Alexander Magazinov

    Abstract: Probabilistic text generators have been used to produce fake scientific papers for more than a decade. Such nonsensical papers are easily detected by both human and machine. Now more complex AI-powered generation techniques produce texts indistinguishable from that of humans and the generation of scientific texts from a few keywords has been documented. Our study introduces the concept of tortured…

    Submitted 12 July, 2021; originally announced July 2021.

  8. arXiv:1910.03484  [pdf, ps, other]

    cs.CL

    Semi-Supervised Neural Text Generation by Joint Learning of Natural Language Generation and Natural Language Understanding Models

    Authors: Raheel Qader, François Portet, Cyril Labbé

    Abstract: In Natural Language Generation (NLG), End-to-End (E2E) systems trained through deep learning have recently gained a strong interest. Such deep models need a large amount of carefully annotated data to reach satisfactory performance. However, acquiring such datasets for every new NLG application is a tedious and time-consuming task. In this paper, we propose a semi-supervised deep learning scheme t…

    Submitted 29 September, 2019; originally announced October 2019.