Showing 1–50 of 76 results for author: Wallace, B C

  1. arXiv:2407.09429  [pdf, other]

    cs.CL

    Open (Clinical) LLMs are Sensitive to Instruction Phrasings

    Authors: Alberto Mario Ceballos Arroyo, Monica Munnangi, Jiuding Sun, Karen Y. C. Zhang, Denis Jered McInerney, Byron C. Wallace, Silvio Amir

    Abstract: Instruction-tuned Large Language Models (LLMs) can perform a wide range of tasks given natural language instructions to do so, but they are sensitive to how such instructions are phrased. This issue is especially concerning in healthcare, as clinicians are unlikely to be experienced prompt engineers and the potential consequences of inaccurate outputs are heightened in this domain. This raises a…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: To appear at BioNLP, ACL 2024

  2. arXiv:2407.00211  [pdf, other]

    cs.CL

    Detection and Measurement of Syntactic Templates in Generated Text

    Authors: Chantal Shaib, Yanai Elazar, Junyi Jessy Li, Byron C. Wallace

    Abstract: Recent work on evaluating the diversity of text generated by LLMs has focused on word-level features. Here we offer an analysis of syntactic features to characterize general repetition in models, beyond frequent n-grams. Specifically, we define syntactic templates and show that models tend to produce templated text in downstream tasks at a higher rate than what is found in human-reference texts. W…

    Submitted 28 June, 2024; originally announced July 2024.

  3. arXiv:2406.14511  [pdf, other]

    cs.CL

    Investigating Mysteries of CoT-Augmented Distillation

    Authors: Somin Wadhwa, Silvio Amir, Byron C. Wallace

    Abstract: Eliciting "chain of thought" (CoT) rationales -- sequences of tokens that convey a "reasoning" process -- has been shown to consistently improve LLM performance on tasks like question answering. More recent efforts have shown that such rationales can also be used for model distillation: Including CoT sequences (elicited from a large "teacher" model) in addition to target labels when fine-tuning a s…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Draft; under review

  4. arXiv:2406.09330  [pdf, other]

    cs.CL

    Learning from Natural Language Explanations for Generalizable Entity Matching

    Authors: Somin Wadhwa, Adit Krishnan, Runhui Wang, Byron C. Wallace, Chris Kong

    Abstract: Entity matching is the task of linking records from different sources that refer to the same real-world entity. Past work has primarily treated entity linking as a standard supervised learning problem. However, supervised entity matching models often do not generalize well to new data, and collecting exhaustive labeled training data is often cost prohibitive. Further, recent efforts have adopted L…

    Submitted 13 June, 2024; originally announced June 2024.

  5. arXiv:2405.01686  [pdf, other]

    cs.CL cs.AI

    Automatically Extracting Numerical Results from Randomized Controlled Trials with Large Language Models

    Authors: Hye Sun Yun, David Pogrebitskiy, Iain J. Marshall, Byron C. Wallace

    Abstract: Meta-analyses statistically aggregate the findings of different randomized controlled trials (RCTs) to assess treatment effectiveness. Because this yields robust estimates of treatment effectiveness, results from meta-analyses are considered the strongest form of evidence. However, rigorous evidence syntheses are time-consuming and labor-intensive, requiring manual extraction of data from individu…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: 24 pages, 7 figures, 6 tables

  6. arXiv:2404.00152  [pdf, other]

    cs.CL

    On-the-fly Definition Augmentation of LLMs for Biomedical NER

    Authors: Monica Munnangi, Sergey Feldman, Byron C Wallace, Silvio Amir, Tom Hope, Aakanksha Naik

    Abstract: Despite their general capabilities, LLMs still struggle on biomedical NER tasks, which are difficult due to the presence of specialized terminology and lack of training data. In this work we set out to improve LLM performance on biomedical NER in limited data settings via a new knowledge augmentation approach which incorporates definitions of relevant concepts on-the-fly. During this process, to p…

    Submitted 23 April, 2024; v1 submitted 29 March, 2024; originally announced April 2024.

    Comments: To appear at NAACL 2024 (Main)

  7. arXiv:2403.00553  [pdf, other]

    cs.CL

    Standardizing the Measurement of Text Diversity: A Tool and a Comparative Analysis of Scores

    Authors: Chantal Shaib, Joe Barrow, Jiuding Sun, Alexa F. Siu, Byron C. Wallace, Ani Nenkova

    Abstract: The diversity across outputs generated by large language models shapes the perception of their quality and utility. Prompt leaks, templated answer structure, and canned responses across different interactions are readily noticed by people, but there is no standard score to measure this aspect of model behavior. In this work we empirically investigate diversity scores on English texts. We find that…

    Submitted 1 March, 2024; originally announced March 2024.

    Comments: Preprint

  8. arXiv:2402.18756  [pdf, other]

    cs.CL

    How Much Annotation is Needed to Compare Summarization Models?

    Authors: Chantal Shaib, Joe Barrow, Alexa F. Siu, Byron C. Wallace, Ani Nenkova

    Abstract: Modern instruction-tuned models have become highly capable in text generation tasks such as summarization, and are expected to be released at a steady pace. In practice one may now wish to choose confidently, but with minimal effort, the best performing summarization model when applied to a new domain or purpose. In this work, we empirically investigate the test sample size necessary to select a p…

    Submitted 28 February, 2024; originally announced February 2024.

    Comments: Preprint

  9. arXiv:2402.15663  [pdf, other]

    cs.CL

    Leveraging ChatGPT in Pharmacovigilance Event Extraction: An Empirical Study

    Authors: Zhaoyue Sun, Gabriele Pergola, Byron C. Wallace, Yulan He

    Abstract: With the advent of large language models (LLMs), there has been growing interest in exploring their potential for medical applications. This research aims to investigate the ability of LLMs, specifically ChatGPT, in the context of pharmacovigilance event extraction, of which the main goal is to identify and extract adverse events or potential therapeutic events from textual medical sources. We con…

    Submitted 23 February, 2024; originally announced February 2024.

    Comments: 14 pages, 2 figures, accepted by EACL 2024

  10. arXiv:2402.12566  [pdf, other]

    cs.CL cs.LG

    GenAudit: Fixing Factual Errors in Language Model Outputs with Evidence

    Authors: Kundan Krishna, Sanjana Ramprasad, Prakhar Gupta, Byron C. Wallace, Zachary C. Lipton, Jeffrey P. Bigham

    Abstract: LLMs can generate factually incorrect statements even when provided access to reference documents. Such errors can be dangerous in high-stakes applications (e.g., document-grounded QA for healthcare or finance). We present GenAudit -- a tool intended to assist fact-checking LLM responses for document-grounded tasks. GenAudit suggests edits to the LLM response by revising or removing claims that ar…

    Submitted 16 March, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

    Comments: Code and models available at https://genaudit.org

  11. arXiv:2402.11456  [pdf, other]

    cs.CL

    FactPICO: Factuality Evaluation for Plain Language Summarization of Medical Evidence

    Authors: Sebastian Antony Joseph, Lily Chen, Jan Trienes, Hannah Louisa Göke, Monika Coers, Wei Xu, Byron C Wallace, Junyi Jessy Li

    Abstract: Plain language summarization with LLMs can be useful for improving textual accessibility of technical content. But how factual are these summaries in a high-stakes domain like medicine? This paper presents FactPICO, a factuality benchmark for plain language summarization of medical texts describing randomized controlled trials (RCTs), which are the basis of evidence-based medicine and can directly…

    Submitted 4 June, 2024; v1 submitted 17 February, 2024; originally announced February 2024.

    Comments: Preprint has been updated to match the final revision for ACL 2024

  12. arXiv:2402.10109  [pdf, other]

    cs.AI cs.CL cs.LG

    Towards Reducing Diagnostic Errors with Interpretable Risk Prediction

    Authors: Denis Jered McInerney, William Dickinson, Lucy C. Flynn, Andrea C. Young, Geoffrey S. Young, Jan-Willem van de Meent, Byron C. Wallace

    Abstract: Many diagnostic errors occur because clinicians cannot easily access relevant information in patient Electronic Health Records (EHRs). In this work we propose a method to use LLMs to identify pieces of evidence in patient EHR data that indicate increased or decreased risk of specific diagnoses; our ultimate aim is to increase access to evidence and reduce diagnostic errors. In particular, we propo…

    Submitted 19 March, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

  13. arXiv:2402.03509  [pdf, other]

    cs.CL cs.AI cs.LG

    Evaluating the Factuality of Zero-shot Summarizers Across Varied Domains

    Authors: Sanjana Ramprasad, Kundan Krishna, Zachary C Lipton, Byron C Wallace

    Abstract: Recent work has shown that large language models (LLMs) are capable of generating summaries zero-shot (i.e., without explicit supervision) that, under human assessment, are often comparable or even preferred to manually composed reference summaries. However, this prior work has focussed almost exclusively on evaluating news article summarization. How do zero-shot summarizers perform in other (pote…

    Submitted 5 February, 2024; originally announced February 2024.

  14. arXiv:2401.16475  [pdf, other]

    cs.CL

    InfoLossQA: Characterizing and Recovering Information Loss in Text Simplification

    Authors: Jan Trienes, Sebastian Joseph, Jörg Schlötterer, Christin Seifert, Kyle Lo, Wei Xu, Byron C. Wallace, Junyi Jessy Li

    Abstract: Text simplification aims to make technical texts more accessible to laypeople but often results in deletion of information and vagueness. This work proposes InfoLossQA, a framework to characterize and recover simplification-induced information loss in the form of question-and-answer (QA) pairs. Building on the theory of Question Under Discussion, the QA pairs are designed to help readers deepen their…

    Submitted 4 June, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

    Comments: Accepted at ACL 2024 (main conference)

  15. arXiv:2311.11211  [pdf]

    cs.AI

    Leveraging Generative AI for Clinical Evidence Summarization Needs to Ensure Trustworthiness

    Authors: Gongbo Zhang, Qiao Jin, Denis Jered McInerney, Yong Chen, Fei Wang, Curtis L. Cole, Qian Yang, Yanshan Wang, Bradley A. Malin, Mor Peleg, Byron C. Wallace, Zhiyong Lu, Chunhua Weng, Yifan Peng

    Abstract: Evidence-based medicine promises to improve the quality of healthcare by empowering medical decisions and practices with the best available evidence. The rapid growth of medical evidence, which can be obtained from various sources, poses a challenge in collecting, appraising, and synthesizing the evidential information. Recent advancements in generative AI, exemplified by large language models, ho…

    Submitted 31 March, 2024; v1 submitted 18 November, 2023; originally announced November 2023.

  16. Future Lens: Anticipating Subsequent Tokens from a Single Hidden State

    Authors: Koyena Pal, Jiuding Sun, Andrew Yuan, Byron C. Wallace, David Bau

    Abstract: We conjecture that hidden state vectors corresponding to individual input tokens encode information sufficient to accurately predict several tokens ahead. More concretely, in this paper we ask: Given a hidden (internal) representation of a single token at position $t$ in an input, can we reliably anticipate the tokens that will appear at positions $\geq t + 2$? To test this, we measure linear appr…

    Submitted 8 November, 2023; originally announced November 2023.

    Comments: Accepted at CoNLL 2023

  17. arXiv:2310.15213  [pdf, other]

    cs.CL cs.LG

    Function Vectors in Large Language Models

    Authors: Eric Todd, Millicent L. Li, Arnab Sen Sharma, Aaron Mueller, Byron C. Wallace, David Bau

    Abstract: We report the presence of a simple neural mechanism that represents an input-output function as a vector within autoregressive transformer language models (LMs). Using causal mediation analysis on a diverse range of in-context-learning (ICL) tasks, we find that a small number of attention heads transport a compact representation of the demonstrated task, which we call a function vector (FV). FVs are…

    Submitted 25 February, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

    Comments: ICLR 2024. 52 pages, 30 figures, 23 tables. Code and data at https://functions.baulab.info

  18. arXiv:2309.04550  [pdf, other]

    cs.CL

    Retrieving Evidence from EHRs with LLMs: Possibilities and Challenges

    Authors: Hiba Ahsan, Denis Jered McInerney, Jisoo Kim, Christopher Potter, Geoffrey Young, Silvio Amir, Byron C. Wallace

    Abstract: Unstructured data in Electronic Health Records (EHRs) often contains critical information -- complementary to imaging -- that could inform radiologists' diagnoses. But the large volume of notes often associated with patients together with time constraints renders manually identifying relevant evidence practically infeasible. In this work we propose and evaluate a zero-shot strategy for using LLMs…

    Submitted 10 June, 2024; v1 submitted 8 September, 2023; originally announced September 2023.

  19. arXiv:2306.11270  [pdf, other]

    cs.CL cs.LG

    Evaluating the Zero-shot Robustness of Instruction-tuned Language Models

    Authors: Jiuding Sun, Chantal Shaib, Byron C. Wallace

    Abstract: Instruction fine-tuning has recently emerged as a promising approach for improving the zero-shot capabilities of Large Language Models (LLMs) on new tasks. This technique has shown particular strength in improving the performance of modestly sized LLMs, sometimes inducing performance competitive with much larger model variants. In this paper we ask two questions: (1) How sensitive are instruction-…

    Submitted 8 July, 2023; v1 submitted 19 June, 2023; originally announced June 2023.

  20. arXiv:2305.14296  [pdf, other]

    cs.CL cs.LG

    USB: A Unified Summarization Benchmark Across Tasks and Domains

    Authors: Kundan Krishna, Prakhar Gupta, Sanjana Ramprasad, Byron C. Wallace, Jeffrey P. Bigham, Zachary C. Lipton

    Abstract: While the NLP community has produced numerous summarization benchmarks, none provide the rich annotations required to simultaneously address many important problems related to control and reliability. We introduce a Wikipedia-derived benchmark, complemented by a rich set of crowd-sourced annotations, that supports $8$ interrelated tasks: (i) extractive summarization; (ii) abstractive summarization…

    Submitted 4 December, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: EMNLP Findings 2023 Camera Ready

  21. arXiv:2305.13693  [pdf, other]

    cs.CL

    Automated Metrics for Medical Multi-Document Summarization Disagree with Human Evaluations

    Authors: Lucy Lu Wang, Yulia Otmakhova, Jay DeYoung, Thinh Hung Truong, Bailey E. Kuehl, Erin Bransom, Byron C. Wallace

    Abstract: Evaluating multi-document summarization (MDS) quality is difficult. This is especially true in the case of MDS for biomedical literature reviews, where models must synthesize contradicting evidence reported across different documents. Prior work has shown that rather than performing the task, models may exploit shortcuts that are difficult to detect using standard n-gram similarity metrics such as…

    Submitted 23 May, 2023; originally announced May 2023.

    Comments: ACL 2023; Github: https://github.com/allenai/mslr-annotated-dataset

  22. arXiv:2305.12532  [pdf, other]

    cs.CL

    Multilingual Simplification of Medical Texts

    Authors: Sebastian Joseph, Kathryn Kazanas, Keziah Reina, Vishnesh J. Ramanathan, Wei Xu, Byron C. Wallace, Junyi Jessy Li

    Abstract: Automated text simplification aims to produce simple versions of complex texts. This task is especially useful in the medical domain, where the latest medical findings are typically communicated via complex and technical articles. This creates barriers for laypeople seeking access to up-to-date medical findings, consequently impeding progress on health literacy. Most existing work on medical text…

    Submitted 18 October, 2023; v1 submitted 21 May, 2023; originally announced May 2023.

    Comments: This version will be in EMNLP 2023 main

  23. arXiv:2305.11828  [pdf, other]

    cs.CL cs.AI cs.HC

    Appraising the Potential Uses and Harms of LLMs for Medical Systematic Reviews

    Authors: Hye Sun Yun, Iain J. Marshall, Thomas A. Trikalinos, Byron C. Wallace

    Abstract: Medical systematic reviews play a vital role in healthcare decision making and policy. However, their production is time-consuming, limiting the availability of high-quality and up-to-date evidence summaries. Recent advancements in large language models (LLMs) offer the potential to automatically generate literature reviews on demand, addressing this issue. However, LLMs sometimes generate inaccur…

    Submitted 18 October, 2023; v1 submitted 19 May, 2023; originally announced May 2023.

    Comments: 18 pages, 2 figures, 8 tables. Accepted as an EMNLP 2023 main paper

  24. arXiv:2305.06299  [pdf, other]

    cs.CL

    Summarizing, Simplifying, and Synthesizing Medical Evidence Using GPT-3 (with Varying Success)

    Authors: Chantal Shaib, Millicent L. Li, Sebastian Joseph, Iain J. Marshall, Junyi Jessy Li, Byron C. Wallace

    Abstract: Large language models, particularly GPT-3, are able to produce high quality summaries of general domain news articles in few- and zero-shot settings. However, it is unclear if such models are similarly capable in more specialized, high-stakes domains such as biomedicine. In this paper, we enlist domain experts (individuals with medical training) to evaluate summaries of biomedical articles generat…

    Submitted 11 May, 2023; v1 submitted 10 May, 2023; originally announced May 2023.

    Comments: Accepted short paper to ACL 2023

  25. arXiv:2305.05003  [pdf, other]

    cs.CL

    Revisiting Relation Extraction in the era of Large Language Models

    Authors: Somin Wadhwa, Silvio Amir, Byron C. Wallace

    Abstract: Relation extraction (RE) is the core NLP task of inferring semantic relationships between entities from text. Standard supervised RE techniques entail training modules to tag tokens comprising entity spans and then predict the relationship between them. Recent work has instead treated the problem as a sequence-to-sequence task, linearizing relations between entities as target strings to be…

    Submitted 16 July, 2024; v1 submitted 8 May, 2023; originally announced May 2023.

    Comments: Accepted to ACL 2023

  26. arXiv:2305.03642  [pdf, other]

    cs.CL

    Jointly Extracting Interventions, Outcomes, and Findings from RCT Reports with LLMs

    Authors: Somin Wadhwa, Jay DeYoung, Benjamin Nye, Silvio Amir, Byron C. Wallace

    Abstract: Results from Randomized Controlled Trials (RCTs) establish the comparative effectiveness of interventions, and are in turn critical inputs for evidence-based care. However, results from RCTs are presented in (often unstructured) natural language articles describing the design, execution, and outcomes of trials; clinicians must manually extract findings pertaining to interventions and outcomes of i…

    Submitted 17 July, 2023; v1 submitted 5 May, 2023; originally announced May 2023.

    Comments: Accepted to MLHC 2023

  27. arXiv:2303.05392  [pdf, other]

    cs.CL cs.IR cs.LG

    Automatically Summarizing Evidence from Clinical Trials: A Prototype Highlighting Current Challenges

    Authors: Sanjana Ramprasad, Denis Jered McInerney, Iain J. Marshall, Byron C. Wallace

    Abstract: We present TrialsSummarizer, a system that aims to automatically summarize evidence presented in the set of randomized controlled trials most relevant to a given query. Building on prior work, the system retrieves trial publications matching a query specifying a combination of condition, intervention(s), and outcome(s), and ranks these according to sample size and estimated study quality. The top-…

    Submitted 7 March, 2023; originally announced March 2023.

  28. arXiv:2302.12343  [pdf, other]

    cs.CL cs.AI cs.LG

    CHiLL: Zero-shot Custom Interpretable Feature Extraction from Clinical Notes with Large Language Models

    Authors: Denis Jered McInerney, Geoffrey Young, Jan-Willem van de Meent, Byron C. Wallace

    Abstract: We propose CHiLL (Crafting High-Level Latents), an approach for natural-language specification of features for linear models. CHiLL prompts LLMs with expert-crafted queries to generate interpretable features from health records. The resulting noisy labels are then used to train a simple linear classifier. Generating features based on queries to an LLM can empower physicians to use their domain exp…

    Submitted 19 October, 2023; v1 submitted 23 February, 2023; originally announced February 2023.

    Comments: To be published at EMNLP Findings 2023

  29. arXiv:2302.05574  [pdf, other]

    cs.CL

    NapSS: Paragraph-level Medical Text Simplification via Narrative Prompting and Sentence-matching Summarization

    Authors: Junru Lu, Jiazheng Li, Byron C. Wallace, Yulan He, Gabriele Pergola

    Abstract: Accessing medical literature is difficult for laypeople as the content is written for specialists and contains medical jargon. Automated text simplification methods offer a potential means to address this issue. In this work, we propose a summarize-then-simplify two-stage strategy, which we call NapSS, identifying the relevant content to simplify while ensuring that the original narrative flow is…

    Submitted 10 February, 2023; originally announced February 2023.

    Comments: Findings of EACL 2023

  30. arXiv:2302.02169  [pdf, other]

    cs.LG cs.AI cs.CL

    How Many and Which Training Points Would Need to be Removed to Flip this Prediction?

    Authors: Jinghan Yang, Sarthak Jain, Byron C. Wallace

    Abstract: We consider the problem of identifying a minimal subset of training data $\mathcal{S}_t$ such that if the instances comprising $\mathcal{S}_t$ had been removed prior to training, the categorization of a given test point $x_t$ would have been different. Identifying such a set may be of interest for a few reasons. First, the cardinality of $\mathcal{S}_t$ provides a measure of robustness (if…

    Submitted 8 February, 2023; v1 submitted 4 February, 2023; originally announced February 2023.

    Comments: Accepted to EACL 2023

  31. arXiv:2301.13844  [pdf, other]

    cs.CL

    Do Multi-Document Summarization Models Synthesize?

    Authors: Jay DeYoung, Stephanie C. Martinez, Iain J. Marshall, Byron C. Wallace

    Abstract: Multi-document summarization entails producing concise synopses of collections of inputs. For some applications, the synopsis should accurately synthesize inputs with respect to a key aspect, e.g., a synopsis of film reviews written about a particular movie should reflect the average critic consensus. As a more consequential example, narrative summaries that accompany biomedical systematic reviews…

    Submitted 12 July, 2024; v1 submitted 31 January, 2023; originally announced January 2023.

    Comments: Accepted to TACL, to be presented at ACL 2024 in Bangkok, Thailand. 9 Figures, 11 Tables, 14 pages of main content, 20 pages total. This paper has some _history_. Buy me a drink if you want to hear about it

    Report number: TACL 6011

  32. arXiv:2212.01641  [pdf, other]

    cs.CL cs.LG

    Intermediate Entity-based Sparse Interpretable Representation Learning

    Authors: Diego Garcia-Olano, Yasumasa Onoe, Joydeep Ghosh, Byron C. Wallace

    Abstract: Interpretable entity representations (IERs) are sparse embeddings that are "human-readable" in that dimensions correspond to fine-grained entity types and values are predicted probabilities that a given entity is of the corresponding type. These methods perform well in zero-shot and low supervision settings. Compared to standard dense neural embeddings, such interpretable representations may permi…

    Submitted 3 December, 2022; originally announced December 2022.

    Comments: Accepted into BlackBox NLP Workshop at EMNLP 2022

  33. arXiv:2210.14177  [pdf, other]

    cs.CL cs.AI cs.LG stat.ML

    Influence Functions for Sequence Tagging Models

    Authors: Sarthak Jain, Varun Manjunatha, Byron C. Wallace, Ani Nenkova

    Abstract: Many language tasks (e.g., Named Entity Recognition, Part-of-Speech tagging, and Semantic Role Labeling) are naturally framed as sequence tagging problems. However, there has been comparatively little work on interpretability methods for sequence tagging models. In this paper, we extend influence functions - which aim to trace predictions back to the training points that informed them - to sequenc…

    Submitted 25 October, 2022; originally announced October 2022.

    Comments: Accepted to Findings of EMNLP 2022

  34. arXiv:2210.12560  [pdf, other]

    cs.CL

    PHEE: A Dataset for Pharmacovigilance Event Extraction from Text

    Authors: Zhaoyue Sun, Jiazheng Li, Gabriele Pergola, Byron C. Wallace, Bino John, Nigel Greene, Joseph Kim, Yulan He

    Abstract: The primary goal of drug safety researchers and regulators is to promptly identify adverse drug reactions. Doing so may in turn prevent or reduce the harm to patients and ultimately improve public health. Evaluating and monitoring drug safety (i.e., pharmacovigilance) involves analyzing an ever growing collection of spontaneous reports from health professionals, physicians, and pharmacists, and in…

    Submitted 22 October, 2022; originally announced October 2022.

    Comments: 17 pages, 3 figures, EMNLP2022 accepted

  35. arXiv:2210.08145  [pdf, other]

    cs.CL

    Self-Repetition in Abstractive Neural Summarizers

    Authors: Nikita Salkar, Thomas Trikalinos, Byron C. Wallace, Ani Nenkova

    Abstract: We provide a quantitative and qualitative analysis of self-repetition in the output of neural summarizers. We measure self-repetition as the number of n-grams of length four or longer that appear in multiple outputs of the same system. We analyze the behavior of three popular architectures (BART, T5, and Pegasus), fine-tuned on five datasets. In a regression analysis, we find that the three archit…

    Submitted 14 October, 2022; originally announced October 2022.

  36. arXiv:2210.06565  [pdf, other]

    cs.LG cs.AI cs.CV eess.IV

    That's the Wrong Lung! Evaluating and Improving the Interpretability of Unsupervised Multimodal Encoders for Medical Data

    Authors: Denis Jered McInerney, Geoffrey Young, Jan-Willem van de Meent, Byron C. Wallace

    Abstract: Pretraining multimodal models on Electronic Health Records (EHRs) provides a means of learning representations that can transfer to downstream tasks with minimal supervision. Recent multimodal models induce soft local alignments between image regions and sentences. This is of particular interest in the medical domain, where alignments might highlight regions in an image relevant to specific phenom…

    Submitted 22 October, 2022; v1 submitted 12 October, 2022; originally announced October 2022.

    Journal ref: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

  37. arXiv:2206.02696  [pdf, other]

    cs.CL

    Learning to Ask Like a Physician

    Authors: Eric Lehman, Vladislav Lialin, Katelyn Y. Legaspi, Anne Janelle R. Sy, Patricia Therese S. Pile, Nicole Rose I. Alberto, Richard Raymund R. Ragasa, Corinna Victoria M. Puyat, Isabelle Rose I. Alberto, Pia Gabrielle I. Alfonso, Marianne Taliño, Dana Moukheiber, Byron C. Wallace, Anna Rumshisky, Jenifer J. Liang, Preethi Raghavan, Leo Anthony Celi, Peter Szolovits

    Abstract: Existing question answering (QA) datasets derived from electronic health records (EHR) are artificially generated and consequently fail to capture realistic physician information needs. We present Discharge Summary Clinical Questions (DiSCQ), a newly curated question dataset composed of 2,000+ questions paired with the snippets of text (triggers) that prompted each question. The questions are gene…

    Submitted 6 June, 2022; originally announced June 2022.

  38. arXiv:2204.07562  [pdf, other]

    cs.CL

    Evaluating Factuality in Text Simplification

    Authors: Ashwin Devaraj, William Sheffield, Byron C. Wallace, Junyi Jessy Li

    Abstract: Automated simplification models aim to make input texts more readable. Such methods have the potential to make complex information accessible to a wider audience, e.g., providing access to recent medical literature which might otherwise be impenetrable for a lay reader. However, such models risk introducing errors into automatically simplified texts, for instance by inserting statements unsupporte…

    Submitted 15 April, 2022; originally announced April 2022.

    Comments: ACL 2022

  39. arXiv:2109.10415  [pdf, other]

    cs.CL cs.AI

    What Would it Take to get Biomedical QA Systems into Practice?

    Authors: Gregory Kell, Iain J. Marshall, Byron C. Wallace, Andre Jaun

    Abstract: Medical question answering (QA) systems have the potential to answer clinicians' uncertainties about treatment and diagnosis on demand, informed by the latest evidence. However, despite the significant progress in general QA made by the NLP community, medical QA systems are still not widely used in clinical environments. One likely reason for this is that clinicians may not readily trust QA system…

    Submitted 21 September, 2021; originally announced September 2021.

    Comments: Accepted by MRQA workshop at EMNLP 2021

  40. arXiv:2107.00323  [pdf, other]

    cs.CL cs.LG

    Combining Feature and Instance Attribution to Detect Artifacts

    Authors: Pouya Pezeshkpour, Sarthak Jain, Sameer Singh, Byron C. Wallace

    Abstract: Training the deep neural networks that dominate NLP requires large datasets. These are often collected automatically or via crowdsourcing, and may exhibit systematic biases or annotation artifacts. By the latter we mean spurious correlations between inputs and outputs that do not represent a generally held causal relationship between features and classes; models that exploit such correlations may…

    Submitted 25 March, 2022; v1 submitted 1 July, 2021; originally announced July 2021.

    Comments: ACL Findings 2022

  41. arXiv:2106.09502  [pdf, other]

    cs.CL cs.LG

    Biomedical Interpretable Entity Representations

    Authors: Diego Garcia-Olano, Yasumasa Onoe, Ioana Baldini, Joydeep Ghosh, Byron C. Wallace, Kush R. Varshney

    Abstract: Pre-trained language models induce dense entity representations that offer strong performance on entity-centric NLP tasks, but such representations are not immediately interpretable. This can be a barrier to model uptake in important domains such as biomedicine. There has been recent work on general interpretable representation learning (Onoe and Durrett, 2020), but these domain-agnostic represent…

    Submitted 17 June, 2021; originally announced June 2021.

    Comments: Accepted into Findings of ACL-IJCNLP 2021

  42. arXiv:2104.07762  [pdf, other]

    cs.CL cs.AI cs.LG

    Does BERT Pretrained on Clinical Notes Reveal Sensitive Data?

    Authors: Eric Lehman, Sarthak Jain, Karl Pichotta, Yoav Goldberg, Byron C. Wallace

    Abstract: Large Transformers pretrained over clinical notes from Electronic Health Records (EHR) have afforded substantial gains in performance on predictive clinical tasks. The cost of training such models (and the necessity of data access to do so) coupled with their utility motivates parameter sharing, i.e., the release of pretrained models such as ClinicalBERT. While most efforts have used deidentified…

    Submitted 22 April, 2021; v1 submitted 15 April, 2021; originally announced April 2021.

    Comments: NAACL Camera Ready Submission

  43. arXiv:2104.07155  [pdf, other]

    cs.CL cs.LG

    Disentangling Representations of Text by Masking Transformers

    Authors: Xiongyi Zhang, Jan-Willem van de Meent, Byron C. Wallace

    Abstract: Representations from large pretrained models such as BERT encode a range of features into monolithic vectors, affording strong predictive accuracy across a multitude of downstream tasks. In this paper we explore whether it is possible to learn disentangled representations by identifying existing subnetworks within pretrained models that encode distinct, complementary aspect representations. Concre…

    Submitted 10 September, 2021; v1 submitted 14 April, 2021; originally announced April 2021.

    Comments: 14 pages, 9 figures

  44. arXiv:2104.06338  [pdf, other]

    cs.CL

    On the Impact of Random Seeds on the Fairness of Clinical Classifiers

    Authors: Silvio Amir, Jan-Willem van de Meent, Byron C. Wallace

    Abstract: Recent work has shown that fine-tuning large networks is surprisingly sensitive to changes in random seed(s). We explore the implications of this phenomenon for model fairness across demographic groups in clinical prediction tasks over electronic health records (EHR) in MIMIC-III -- the standard dataset in clinical NLP research. Apparent subgroup performance varies substantially for seeds that yie…

    Submitted 13 April, 2021; originally announced April 2021.

    Comments: Accepted for publication at NAACL 2021

  45. arXiv:2104.05767  [pdf, other]

    cs.CL

    Paragraph-level Simplification of Medical Texts

    Authors: Ashwin Devaraj, Iain J. Marshall, Byron C. Wallace, Junyi Jessy Li

    Abstract: We consider the problem of learning to simplify medical texts. This is important because most reliable, up-to-date information in biomedicine is dense with jargon and thus practically inaccessible to the lay audience. Furthermore, manual simplification does not scale to the rapidly growing body of biomedical literature, motivating the need for automated approaches. Unfortunately, there are no larg…

    Submitted 12 April, 2021; originally announced April 2021.

    Comments: NAACL 2021

  46. arXiv:2104.04128  [pdf, other]

    cs.CL cs.LG

    An Empirical Comparison of Instance Attribution Methods for NLP

    Authors: Pouya Pezeshkpour, Sarthak Jain, Byron C. Wallace, Sameer Singh

    Abstract: Widespread adoption of deep models has motivated a pressing need for approaches to interpret network outputs and to facilitate model debugging. Instance attribution methods constitute one means of accomplishing these goals by retrieving training instances that (may have) led to a particular prediction. Influence functions (IF; Koh and Liang 2017) provide machinery for doing this by quantifying the…

    Submitted 8 April, 2021; originally announced April 2021.

    Comments: NAACL 2021

  47. arXiv:2010.11966  [pdf, other]

    cs.CL cs.LG

    Unsupervised Data Augmentation with Naive Augmentation and without Unlabeled Data

    Authors: David Lowell, Brian E. Howard, Zachary C. Lipton, Byron C. Wallace

    Abstract: Unsupervised Data Augmentation (UDA) is a semi-supervised technique that applies a consistency loss to penalize differences between a model's predictions on (a) observed (unlabeled) examples; and (b) corresponding 'noised' examples produced via data augmentation. While UDA has gained popularity for text classification, open questions linger over which design decisions are necessary and over how to…

    Submitted 22 October, 2020; originally announced October 2020.

  48. arXiv:2010.03550  [pdf, other]

    cs.CL

    Understanding Clinical Trial Reports: Extracting Medical Entities and Their Relations

    Authors: Benjamin E. Nye, Jay DeYoung, Eric Lehman, Ani Nenkova, Iain J. Marshall, Byron C. Wallace

    Abstract: The best evidence concerning comparative treatment effectiveness comes from clinical trials, the results of which are reported in unstructured articles. Medical experts must manually extract information from articles to inform decision-making, which is time-consuming and expensive. Here we consider the end-to-end task of both (a) extracting treatments and outcomes from full-text articles describin…

    Submitted 7 January, 2022; v1 submitted 7 October, 2020; originally announced October 2020.

  49. arXiv:2008.11293  [pdf, other]

    cs.CL

    Generating (Factual?) Narrative Summaries of RCTs: Experiments with Neural Multi-Document Summarization

    Authors: Byron C. Wallace, Sayantan Saha, Frank Soboczenski, Iain J. Marshall

    Abstract: We consider the problem of automatically generating a narrative biomedical evidence summary from multiple trial reports. We evaluate modern neural models for abstractive summarization of relevant article abstracts from systematic reviews previously conducted by members of the Cochrane collaboration, using the authors' conclusions section of the review abstract as our target. We enlist medical profe…

    Submitted 22 December, 2020; v1 submitted 25 August, 2020; originally announced August 2020.

    Comments: 11 pages, 2 figures. Accepted for presentation at the 2021 AMIA Informatics Summit

  50. arXiv:2005.10865  [pdf, other]

    cs.IR cs.CL cs.HC cs.LG

    Trialstreamer: Mapping and Browsing Medical Evidence in Real-Time

    Authors: Benjamin E. Nye, Ani Nenkova, Iain J. Marshall, Byron C. Wallace

    Abstract: We introduce Trialstreamer, a living database of clinical trial reports. Here we mainly describe the evidence extraction component; this extracts from biomedical abstracts key pieces of information that clinicians need when appraising the literature, and also the relations between these. Specifically, the system extracts descriptions of trial participants, the treatments compared in each arm (the…

    Submitted 21 May, 2020; originally announced May 2020.

    Comments: 6 pages, 4 figures

    ACM Class: I.2.7; J.3