-
EnzChemRED, a rich enzyme chemistry relation extraction dataset
Authors:
Po-Ting Lai,
Elisabeth Coudert,
Lucila Aimo,
Kristian Axelsen,
Lionel Breuza,
Edouard de Castro,
Marc Feuermann,
Anne Morgat,
Lucille Pourcel,
Ivo Pedruzzi,
Sylvain Poux,
Nicole Redaschi,
Catherine Rivoire,
Anastasia Sveshnikova,
Chih-Hsuan Wei,
Robert Leaman,
Ling Luo,
Zhiyong Lu,
Alan Bridge
Abstract:
Expert curation is essential to capture knowledge of enzyme functions from the scientific literature in FAIR open knowledgebases but cannot keep pace with the rate of new discoveries and new publications. In this work we present EnzChemRED, for Enzyme Chemistry Relation Extraction Dataset, a new training and benchmarking dataset to support the development of Natural Language Processing (NLP) methods such as (large) language models that can assist enzyme curation. EnzChemRED consists of 1,210 expert curated PubMed abstracts in which enzymes and the chemical reactions they catalyze are annotated using identifiers from the UniProt Knowledgebase (UniProtKB) and the ontology of Chemical Entities of Biological Interest (ChEBI). We show that fine-tuning pre-trained language models with EnzChemRED can significantly boost their ability to identify mentions of proteins and chemicals in text (Named Entity Recognition, or NER) and to extract the chemical conversions in which they participate (Relation Extraction, or RE), with average F1 score of 86.30% for NER, 86.66% for RE for chemical conversion pairs, and 83.79% for RE for chemical conversion pairs and linked enzymes. We combine the best performing methods after fine-tuning using EnzChemRED to create an end-to-end pipeline for knowledge extraction from text and apply this to abstracts at PubMed scale to create a draft map of enzyme functions in literature to guide curation efforts in UniProtKB and the reaction knowledgebase Rhea. The EnzChemRED corpus is freely available at https://ftp.expasy.org/databases/rhea/nlp/.
Submitted 22 April, 2024;
originally announced April 2024.
-
Video In Sentences Out
Authors:
Andrei Barbu,
Alexander Bridge,
Zachary Burchill,
Dan Coroian,
Sven Dickinson,
Sanja Fidler,
Aaron Michaux,
Sam Mussman,
Siddharth Narayanaswamy,
Dhaval Salvi,
Lara Schmidt,
Jiangnan Shangguan,
Jeffrey Mark Siskind,
Jarrell Waggoner,
Song Wang,
Jinlian Wei,
Yifan Yin,
Zhiqi Zhang
Abstract:
We present a system that produces sentential descriptions of video: who did what to whom, and where and how they did it. Action class is rendered as a verb, participant objects as noun phrases, properties of those objects as adjectival modifiers in those noun phrases, spatial relations between those participants as prepositional phrases, and characteristics of the event as prepositional-phrase adjuncts and adverbial modifiers. Extracting the information needed to render these linguistic entities requires an approach to event recognition that recovers object tracks, the track-to-role assignments, and changing body posture.
Submitted 9 August, 2014;
originally announced August 2014.
-
Large-Scale Automatic Labeling of Video Events with Verbs Based on Event-Participant Interaction
Authors:
Andrei Barbu,
Alexander Bridge,
Dan Coroian,
Sven Dickinson,
Sam Mussman,
Siddharth Narayanaswamy,
Dhaval Salvi,
Lara Schmidt,
Jiangnan Shangguan,
Jeffrey Mark Siskind,
Jarrell Waggoner,
Song Wang,
Jinlian Wei,
Yifan Yin,
Zhiqi Zhang
Abstract:
We present an approach to labeling short video clips with English verbs as event descriptions. A key distinguishing aspect of this work is that it labels videos with verbs that describe the spatiotemporal interaction between event participants, humans and objects interacting with each other, abstracting away all object-class information and fine-grained image characteristics, and relying solely on the coarse-grained motion of the event participants. We apply our approach to a large set of 22 distinct verb classes and a corpus of 2,584 videos, yielding two surprising outcomes. First, a classification accuracy of greater than 70% on a 1-out-of-22 labeling task and greater than 85% on a variety of 1-out-of-10 subsets of this labeling task is independent of the choice of which of two different time-series classifiers we employ. Second, we achieve this level of accuracy using a highly impoverished intermediate representation consisting solely of the bounding boxes of one or two event participants as a function of time. This indicates that successful event recognition depends more on the choice of appropriate features that characterize the linguistic invariants of the event classes than on the particular classifier algorithms.
Submitted 16 April, 2012;
originally announced April 2012.
-
Video In Sentences Out
Authors:
Andrei Barbu,
Alexander Bridge,
Zachary Burchill,
Dan Coroian,
Sven Dickinson,
Sanja Fidler,
Aaron Michaux,
Sam Mussman,
Siddharth Narayanaswamy,
Dhaval Salvi,
Lara Schmidt,
Jiangnan Shangguan,
Jeffrey Mark Siskind,
Jarrell Waggoner,
Song Wang,
Jinlian Wei,
Yifan Yin,
Zhiqi Zhang
Abstract:
We present a system that produces sentential descriptions of video: who did what to whom, and where and how they did it. Action class is rendered as a verb, participant objects as noun phrases, properties of those objects as adjectival modifiers in those noun phrases, spatial relations between those participants as prepositional phrases, and characteristics of the event as prepositional-phrase adjuncts and adverbial modifiers. Extracting the information needed to render these linguistic entities requires an approach to event recognition that recovers object tracks, the track-to-role assignments, and changing body posture.
Submitted 12 April, 2012;
originally announced April 2012.