Showing 1–7 of 7 results for author: Wilbur, W J

  1. arXiv:2402.13225

    cs.CL cs.AI

    AgentMD: Empowering Language Agents for Risk Prediction with Large-Scale Clinical Tool Learning

    Authors: Qiao Jin, Zhizheng Wang, Yifan Yang, Qingqing Zhu, Donald Wright, Thomas Huang, W John Wilbur, Zhe He, Andrew Taylor, Qingyu Chen, Zhiyong Lu

    Abstract: Clinical calculators play a vital role in healthcare by offering accurate evidence-based predictions for various purposes such as prognosis. Nevertheless, their widespread utilization is frequently hindered by usability challenges, poor dissemination, and restricted functionality. Augmenting large language models with extensive collections of clinical calculators presents an opportunity to overcom…

    Submitted 20 February, 2024; originally announced February 2024.

    Comments: Work in progress

  2. arXiv:2307.00589

    cs.IR cs.AI cs.CL q-bio.QM

    MedCPT: Contrastive Pre-trained Transformers with Large-scale PubMed Search Logs for Zero-shot Biomedical Information Retrieval

    Authors: Qiao Jin, Won Kim, Qingyu Chen, Donald C. Comeau, Lana Yeganova, W. John Wilbur, Zhiyong Lu

    Abstract: Information retrieval (IR) is essential in biomedical knowledge acquisition and clinical decision support. While recent progress has shown that language model encoders perform better semantic retrieval, training such models requires abundant query-article annotations that are difficult to obtain in biomedicine. As a result, most biomedical IR systems only conduct lexical matching. In response, we…

    Submitted 3 October, 2023; v1 submitted 2 July, 2023; originally announced July 2023.

    Comments: The MedCPT code and API are available at https://github.com/ncbi/MedCPT

  3. arXiv:2209.08124

    cs.LG

    Comprehensively identifying Long Covid articles with human-in-the-loop machine learning

    Authors: Robert Leaman, Rezarta Islamaj, Alexis Allot, Qingyu Chen, W. John Wilbur, Zhiyong Lu

    Abstract: A significant percentage of COVID-19 survivors experience ongoing multisystemic symptoms that often affect daily living, a condition known as Long Covid or post-acute-sequelae of SARS-CoV-2 infection. However, identifying scientific articles relevant to Long Covid is challenging since there is no standardized or consensus terminology. We developed an iterative human-in-the-loop machine learning fr…

    Submitted 28 October, 2022; v1 submitted 16 September, 2022; originally announced September 2022.

  4. arXiv:2008.03397

    cs.DL cs.DB cs.IR cs.LG

    Navigating the landscape of COVID-19 research through literature analysis: A bird's eye view

    Authors: Lana Yeganova, Rezarta Islamaj, Qingyu Chen, Robert Leaman, Alexis Allot, Chin-Hsuan Wei, Donald C. Comeau, Won Kim, Yifan Peng, W. John Wilbur, Zhiyong Lu

    Abstract: Timely access to accurate scientific literature in the battle with the ongoing COVID-19 pandemic is critical. This unprecedented public health risk has motivated research towards understanding the disease in general, identifying drugs to treat the disease, developing potential vaccines, etc. This has given rise to a rapidly growing body of literature that doubles in number of publications every 20…

    Submitted 11 September, 2020; v1 submitted 7 August, 2020; originally announced August 2020.

    Comments: 10 pages, 8 Figures, Submitted to KDD 2020 Health Day

    Journal ref: KDD 2020 Health Day: AI for COVID, August 23-27, 2020, Virtual Conference, CA, US

  5. arXiv:1912.02077

    cs.CL cs.IR

    PDC -- a probabilistic distributional clustering algorithm: a case study on suicide articles in PubMed

    Authors: Rezarta Islamaj, Lana Yeganova, Won Kim, Natalie Xie, W. John Wilbur, Zhiyong Lu

    Abstract: The need to organize a large collection in a manner that facilitates human comprehension is crucial given the ever-increasing volumes of information. In this work, we present PDC (probabilistic distributional clustering), a novel algorithm that, given a document collection, computes disjoint term sets representing topics in the collection. The algorithm relies on probabilities of word co-occurrenc…

    Submitted 4 December, 2019; originally announced December 2019.

    Comments: AMIA Informatics Summit 2020, 18 pages, Algorithm in the Appendix, 3 figures

  6. arXiv:1909.03044

    cs.CL cs.LG stat.ML

    Deep learning with sentence embeddings pre-trained on biomedical corpora improves the performance of finding similar sentences in electronic medical records

    Authors: Qingyu Chen, Jingcheng Du, Sun Kim, W. John Wilbur, Zhiyong Lu

    Abstract: Capturing sentence semantics plays a vital role in a range of text mining applications. Despite continuous efforts on the development of related datasets and models in the general domain, both datasets and models are limited in biomedical and clinical domains. The BioCreative/OHNLP organizers have made the first attempt to annotate 1,068 sentence pairs from clinical notes and have called for a com…

    Submitted 6 September, 2019; originally announced September 2019.

    Comments: 15 pages, 5 figures, 2 tables

  7. Bridging the Gap: Incorporating a Semantic Similarity Measure for Effectively Mapping PubMed Queries to Documents

    Authors: Sun Kim, Nicolas Fiorini, W. John Wilbur, Zhiyong Lu

    Abstract: The main approach of traditional information retrieval (IR) is to examine how many words from a query appear in a document. A drawback of this approach, however, is that it may fail to detect relevant documents where no or only few words from a query are found. The semantic analysis methods such as LSA (latent semantic analysis) and LDA (latent Dirichlet allocation) have been proposed to address t…

    Submitted 17 October, 2017; v1 submitted 5 August, 2016; originally announced August 2016.

    Comments: 10 pages, 1 figure, 3 tables