Showing 1–15 of 15 results for author: Mungall, C J

  1. arXiv:2406.02623  [pdf]

    q-bio.OT cs.DL

    The Vertebrate Breed Ontology: Towards Effective Breed Data Standardization

    Authors: Kathleen R. Mullen, Imke Tammen, Nicolas A. Matentzoglu, Marius Mather, Christopher J. Mungall, Melissa A. Haendel, Frank W. Nicholas, Sabrina Toro, the Vertebrate Breed Ontology Consortium

    Abstract: Background: Limited universally adopted data standards in veterinary science hinders data interoperability and therefore integration and comparison; this ultimately impedes application of existing information-based tools to support advancement in veterinary diagnostics, treatments, and precision medicine. Objectives: Creation of a Vertebrate Breed Ontology (VBO) as a single, coherent logic-based…

    Submitted 3 June, 2024; originally announced June 2024.

  2. arXiv:2406.00063  [pdf]

    cs.DB

    Methods for Linking Data to Online Resources and Ontologies with Applications to Neurophysiology

    Authors: Matthew Avaylon, Ryan Ly, Andrew Tritt, Benjamin Dichter, Kristofer E. Bouchard, Christopher J. Mungall, Oliver Ruebel

    Abstract: Across many domains, large swaths of digital assets are being stored across distributed data repositories, e.g., the DANDI Archive [8]. The distribution and diversity of these repositories impede researchers from formally defining terminology within experiments, integrating information across datasets, and easily querying, reusing, and analyzing data that follow the FAIR principles [15]. As such,…

    Submitted 30 May, 2024; originally announced June 2024.

  3. arXiv:2404.03044  [pdf]

    cs.LG cs.AI

    The Artificial Intelligence Ontology: LLM-assisted construction of AI concept hierarchies

    Authors: Marcin P. Joachimiak, Mark A. Miller, J. Harry Caufield, Ryan Ly, Nomi L. Harris, Andrew Tritt, Christopher J. Mungall, Kristofer E. Bouchard

    Abstract: The Artificial Intelligence Ontology (AIO) is a systematization of artificial intelligence (AI) concepts, methodologies, and their interrelations. Developed via manual curation, with the additional assistance of large language models (LLMs), AIO aims to address the rapidly evolving landscape of AI by providing a comprehensive framework that encompasses both technical and ethical aspects of AI tech…

    Submitted 3 April, 2024; originally announced April 2024.

  4. arXiv:2312.10904  [pdf]

    cs.AI

    Dynamic Retrieval Augmented Generation of Ontologies using Artificial Intelligence (DRAGON-AI)

    Authors: Sabrina Toro, Anna V Anagnostopoulos, Sue Bello, Kai Blumberg, Rhiannon Cameron, Leigh Carmody, Alexander D Diehl, Damion Dooley, William Duncan, Petra Fey, Pascale Gaudet, Nomi L Harris, Marcin Joachimiak, Leila Kiani, Tiago Lubiana, Monica C Munoz-Torres, Shawn O'Neil, David Osumi-Sutherland, Aleix Puig, Justin P Reese, Leonore Reiser, Sofia Robb, Troy Ruemping, James Seager, Eric Sid, et al. (5 additional authors not shown)

    Abstract: Background: Ontologies are fundamental components of informatics infrastructure in domains such as biomedical, environmental, and food sciences, representing consensus knowledge in an accurate and computable form. However, their construction and maintenance demand substantial resources and necessitate substantial collaboration between domain experts, curators, and ontology experts. We present Dyna…

    Submitted 12 June, 2024; v1 submitted 17 December, 2023; originally announced December 2023.

  5. arXiv:2311.05042  [pdf, other]

    cs.IR cs.AI cs.LG q-bio.GN

    Automated Annotation of Scientific Texts for ML-based Keyphrase Extraction and Validation

    Authors: Oluwamayowa O. Amusat, Harshad Hegde, Christopher J. Mungall, Anna Giannakou, Neil P. Byers, Dan Gunter, Kjiersten Fagnan, Lavanya Ramakrishnan

    Abstract: Advanced omics technologies and facilities generate a wealth of valuable data daily; however, the data often lacks the essential metadata required for researchers to find and search them effectively. The lack of metadata poses a significant challenge in the utilization of these datasets. Machine learning-based metadata extraction techniques have emerged as a potentially viable approach to automati…

    Submitted 8 November, 2023; originally announced November 2023.

    Comments: 33 pages, 6 figures, 10 tables

  6. arXiv:2310.03666  [pdf]

    cs.CL cs.AI

    MapperGPT: Large Language Models for Linking and Mapping Entities

    Authors: Nicolas Matentzoglu, J. Harry Caufield, Harshad B. Hegde, Justin T. Reese, Sierra Moxon, Hyeongsik Kim, Nomi L. Harris, Melissa A Haendel, Christopher J. Mungall

    Abstract: Aligning terminological resources, including ontologies, controlled vocabularies, taxonomies, and value sets is a critical part of data integration in many domains such as healthcare, chemistry, and biomedical research. Entity mapping is the process of determining correspondences between entities across these resources, such as gene identifiers, disease concepts, or chemical entity identifiers. Ma…

    Submitted 5 October, 2023; originally announced October 2023.

  7. arXiv:2309.17169  [pdf]

    cs.CL cs.AI

    An evaluation of GPT models for phenotype concept recognition

    Authors: Tudor Groza, Harry Caufield, Dylan Gration, Gareth Baynam, Melissa A Haendel, Peter N Robinson, Christopher J Mungall, Justin T Reese

    Abstract: Objective: Clinical deep phenotyping and phenotype annotation play a critical role in both the diagnosis of patients with rare disorders as well as in building computationally-tractable knowledge in the rare disorders field. These processes rely on using ontology concepts, often from the Human Phenotype Ontology, in conjunction with a phenotype concept recognition task (supported usually by machin…

    Submitted 22 November, 2023; v1 submitted 29 September, 2023; originally announced September 2023.

  8. arXiv:2307.05727  [pdf]

    cs.AI cs.CE

    An Open-Source Knowledge Graph Ecosystem for the Life Sciences

    Authors: Tiffany J. Callahan, Ignacio J. Tripodi, Adrianne L. Stefanski, Luca Cappelletti, Sanya B. Taneja, Jordan M. Wyrwa, Elena Casiraghi, Nicolas A. Matentzoglu, Justin Reese, Jonathan C. Silverstein, Charles Tapley Hoyt, Richard D. Boyce, Scott A. Malec, Deepak R. Unni, Marcin P. Joachimiak, Peter N. Robinson, Christopher J. Mungall, Emanuele Cavalleri, Tommaso Fontana, Giorgio Valentini, Marco Mesiti, Lucas A. Gillenwater, Brook Santangelo, Nicole A. Vasilevsky, Robert Hoehndorf, et al. (7 additional authors not shown)

    Abstract: Translational research requires data at multiple scales of biological organization. Advancements in sequencing and multi-omics technologies have increased the availability of these data, but researchers face significant integration challenges. Knowledge graphs (KGs) are used to model complex phenomena, and methods exist to construct them automatically. However, tackling complex biomedical integrat…

    Submitted 30 January, 2024; v1 submitted 11 July, 2023; originally announced July 2023.

  9. arXiv:2305.13338  [pdf]

    q-bio.GN cs.AI cs.CL q-bio.QM

    Gene Set Summarization using Large Language Models

    Authors: Marcin P. Joachimiak, J. Harry Caufield, Nomi L. Harris, Hyeongsik Kim, Christopher J. Mungall

    Abstract: Molecular biologists frequently interpret gene lists derived from high-throughput experiments and computational analysis. This is typically done as a statistical enrichment analysis that measures the over- or under-representation of biological function terms associated with genes or their properties, based on curated assertions from a knowledge base (KB) such as the Gene Ontology (GO). Interpretin…

    Submitted 3 July, 2024; v1 submitted 20 May, 2023; originally announced May 2023.

  10. arXiv:2304.02711  [pdf, other]

    cs.AI cs.LG

    Structured prompt interrogation and recursive extraction of semantics (SPIRES): A method for populating knowledge bases using zero-shot learning

    Authors: J. Harry Caufield, Harshad Hegde, Vincent Emonet, Nomi L. Harris, Marcin P. Joachimiak, Nicolas Matentzoglu, HyeongSik Kim, Sierra A. T. Moxon, Justin T. Reese, Melissa A. Haendel, Peter N. Robinson, Christopher J. Mungall

    Abstract: Creating knowledge bases and ontologies is a time consuming task that relies on a manual curation. AI/NLP approaches can assist expert curators in populating these knowledge bases, but current approaches rely on extensive training data, and are not able to populate arbitrary complex nested knowledge schemas. Here we present Structured Prompt Interrogation and Recursive Extraction of Semantics (S…

    Submitted 22 December, 2023; v1 submitted 5 April, 2023; originally announced April 2023.

    Comments: Updated 2023-12-22

  11. arXiv:2302.10800  [pdf]

    q-bio.QM cs.AI cs.LG

    KG-Hub -- Building and Exchanging Biological Knowledge Graphs

    Authors: J Harry Caufield, Tim Putman, Kevin Schaper, Deepak R Unni, Harshad Hegde, Tiffany J Callahan, Luca Cappelletti, Sierra AT Moxon, Vida Ravanmehr, Seth Carbon, Lauren E Chan, Katherina Cortes, Kent A Shefchek, Glass Elsarboukh, James P Balhoff, Tommaso Fontana, Nicolas Matentzoglu, Richard M Bruskiewich, Anne E Thessen, Nomi L Harris, Monica C Munoz-Torres, Melissa A Haendel, Peter N Robinson, Marcin P Joachimiak, Christopher J Mungall, et al. (1 additional author not shown)

    Abstract: Knowledge graphs (KGs) are a powerful approach for integrating heterogeneous data and making inferences in biology and many other domains, but a coherent solution for constructing, exchanging, and facilitating the downstream use of knowledge graphs is lacking. Here we present KG-Hub, a platform that enables standardized construction, exchange, and reuse of knowledge graphs. Features include a simp…

    Submitted 31 January, 2023; originally announced February 2023.

  12. Ontology Development Kit: a toolkit for building, maintaining, and standardising biomedical ontologies

    Authors: Nicolas Matentzoglu, Damien Goutte-Gattat, Shawn Zheng Kai Tan, James P. Balhoff, Seth Carbon, Anita R. Caron, William D. Duncan, Joe E. Flack, Melissa Haendel, Nomi L. Harris, William R Hogan, Charles Tapley Hoyt, Rebecca C. Jackson, HyeongSik Kim, Huseyin Kir, Martin Larralde, Julie A. McMurry, James A. Overton, Bjoern Peters, Clare Pilgrim, Ray Stefancsik, Sofia MC Robb, Sabrina Toro, Nicole A Vasilevsky, Ramona Walls, et al. (2 additional authors not shown)

    Abstract: Similar to managing software packages, managing the ontology life cycle involves multiple complex workflows such as preparing releases, continuous quality control checking, and dependency management. To manage these processes, a diverse set of tools is required, from command line utilities to powerful ontology engineering environments such as ROBOT. Particularly in the biomedical domain, which has…

    Submitted 5 July, 2022; originally announced July 2022.

    Comments: 19 pages, 2 supplementary tables, 1 supplementary figure

  13. Biolink Model: A Universal Schema for Knowledge Graphs in Clinical, Biomedical, and Translational Science

    Authors: Deepak R. Unni, Sierra A. T. Moxon, Michael Bada, Matthew Brush, Richard Bruskiewich, Paul Clemons, Vlado Dancik, Michel Dumontier, Karamarie Fecho, Gustavo Glusman, Jennifer J. Hadlock, Nomi L. Harris, Arpita Joshi, Tim Putman, Guangrong Qin, Stephen A. Ramsey, Kent A. Shefchek, Harold Solbrig, Karthik Soman, Anne T. Thessen, Melissa A. Haendel, Chris Bizon, Christopher J. Mungall, the Biomedical Data Translator Consortium

    Abstract: Within clinical, biomedical, and translational science, an increasing number of projects are adopting graphs for knowledge representation. Graph-based data models elucidate the interconnectedness between core biomedical concepts, enable data structures to be easily updated, and support intuitive queries, visualizations, and inference algorithms. However, knowledge discovery across these "knowledge…

    Submitted 25 March, 2022; originally announced March 2022.

  14. A Simple Standard for Sharing Ontological Mappings (SSSOM)

    Authors: Nicolas Matentzoglu, James P. Balhoff, Susan M. Bello, Chris Bizon, Matthew Brush, Tiffany J. Callahan, Christopher G Chute, William D. Duncan, Chris T. Evelo, Davera Gabriel, John Graybeal, Alasdair Gray, Benjamin M. Gyori, Melissa Haendel, Henriette Harmse, Nomi L. Harris, Ian Harrow, Harshad Hegde, Amelia L. Hoyt, Charles T. Hoyt, Dazhi Jiao, Ernesto Jiménez-Ruiz, Simon Jupp, Hyeongsik Kim, Sebastian Koehler, et al. (19 additional authors not shown)

    Abstract: Despite progress in the development of standards for describing and exchanging scientific information, the lack of easy-to-use standards for mapping between different representations of the same or similar objects in different databases poses a major impediment to data integration and interoperability. Mappings often lack the metadata needed to be correctly interpreted and applied. For example, ar…

    Submitted 13 December, 2021; originally announced December 2021.

    Comments: Corresponding author: Christopher J. Mungall <cjmungall@lbl.gov>

  15. arXiv:2110.06196  [pdf, other]

    cs.LG cs.DC

    GRAPE for Fast and Scalable Graph Processing and random walk-based Embedding

    Authors: Luca Cappelletti, Tommaso Fontana, Elena Casiraghi, Vida Ravanmehr, Tiffany J. Callahan, Carlos Cano, Marcin P. Joachimiak, Christopher J. Mungall, Peter N. Robinson, Justin Reese, Giorgio Valentini

    Abstract: Graph Representation Learning (GRL) methods opened new avenues for addressing complex, real-world problems represented by graphs. However, many graphs used in these applications comprise millions of nodes and billions of edges and are beyond the capabilities of current methods and software implementations. We present GRAPE, a software resource for graph processing and embedding that can scale with…

    Submitted 7 May, 2023; v1 submitted 12 October, 2021; originally announced October 2021.

    ACM Class: D.m; E.2; I.2.6; I.5.5