Showing 1–24 of 24 results for author: Delbrouck, J

  1. arXiv:2406.06512  [pdf, other]

    cs.CV cs.AI

    Merlin: A Vision Language Foundation Model for 3D Computed Tomography

    Authors: Louis Blankemeier, Joseph Paul Cohen, Ashwin Kumar, Dave Van Veen, Syed Jamal Safdar Gardezi, Magdalini Paschali, Zhihong Chen, Jean-Benoit Delbrouck, Eduardo Reis, Cesar Truyts, Christian Bluethgen, Malte Engmann Kjeldskov Jensen, Sophie Ostmeier, Maya Varma, Jeya Maria Jose Valanarasu, Zhongnan Fang, Zepeng Huo, Zaid Nabulsi, Diego Ardila, Wei-Hung Weng, Edson Amaro Junior, Neera Ahuja, Jason Fries, Nigam H. Shah, Andrew Johnston, et al. (6 additional authors not shown)

    Abstract: Over 85 million computed tomography (CT) scans are performed annually in the US, of which approximately one quarter focus on the abdomen. Given the current radiologist shortage, there is a large impetus to use artificial intelligence to alleviate the burden of interpreting these complex imaging studies. Prior state-of-the-art approaches for automated medical image interpretation leverage vision la…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 18 pages, 7 figures

  2. arXiv:2405.19538  [pdf, other]

    cs.CL cs.AI cs.CV cs.LG

    CheXpert Plus: Augmenting a Large Chest X-ray Dataset with Text Radiology Reports, Patient Demographics and Additional Image Formats

    Authors: Pierre Chambon, Jean-Benoit Delbrouck, Thomas Sounack, Shih-Cheng Huang, Zhihong Chen, Maya Varma, Steven QH Truong, Chu The Chuong, Curtis P. Langlotz

    Abstract: Since the release of the original CheXpert paper five years ago, CheXpert has become one of the most widely used and cited clinical AI datasets. The emergence of vision language models has sparked an increase in demands for sharing reports linked to CheXpert images, along with a growing interest among AI fairness researchers in obtaining demographic data. To address this, CheXpert Plus serves as a…

    Submitted 3 June, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

    Comments: 13 pages. Updated title

  3. arXiv:2405.03595  [pdf, other]

    cs.CL cs.AI

    GREEN: Generative Radiology Report Evaluation and Error Notation

    Authors: Sophie Ostmeier, Justin Xu, Zhihong Chen, Maya Varma, Louis Blankemeier, Christian Bluethgen, Arne Edward Michalson, Michael Moseley, Curtis Langlotz, Akshay S Chaudhari, Jean-Benoit Delbrouck

    Abstract: Evaluating radiology reports is a challenging problem as factual correctness is extremely important due to the need for accurate medical communication about medical images. Existing automatic evaluation metrics either suffer from failing to consider factual correctness (e.g., BLEU and ROUGE) or are limited in their interpretability (e.g., F1CheXpert and F1RadGraph). In this paper, we introduce GRE…

    Submitted 6 May, 2024; originally announced May 2024.

  4. arXiv:2401.12208  [pdf, other]

    cs.CV cs.CL

    CheXagent: Towards a Foundation Model for Chest X-Ray Interpretation

    Authors: Zhihong Chen, Maya Varma, Jean-Benoit Delbrouck, Magdalini Paschali, Louis Blankemeier, Dave Van Veen, Jeya Maria Jose Valanarasu, Alaa Youssef, Joseph Paul Cohen, Eduardo Pontes Reis, Emily B. Tsai, Andrew Johnston, Cameron Olsen, Tanishq Mathew Abraham, Sergios Gatidis, Akshay S. Chaudhari, Curtis Langlotz

    Abstract: Chest X-rays (CXRs) are the most frequently performed imaging test in clinical practice. Recent advances in the development of vision-language foundation models (FMs) give rise to the possibility of performing automated CXR interpretation, which can assist physicians with clinical decision-making and improve patient outcomes. However, developing FMs that can accurately interpret CXRs is challengin…

    Submitted 22 January, 2024; originally announced January 2024.

    Comments: 24 pages, 8 figures

  5. Adapted Large Language Models Can Outperform Medical Experts in Clinical Text Summarization

    Authors: Dave Van Veen, Cara Van Uden, Louis Blankemeier, Jean-Benoit Delbrouck, Asad Aali, Christian Bluethgen, Anuj Pareek, Malgorzata Polacin, Eduardo Pontes Reis, Anna Seehofnerova, Nidhi Rohatgi, Poonam Hosamani, William Collins, Neera Ahuja, Curtis P. Langlotz, Jason Hom, Sergios Gatidis, John Pauly, Akshay S. Chaudhari

    Abstract: Analyzing vast textual data and summarizing key information from electronic health records imposes a substantial burden on how clinicians allocate their time. Although large language models (LLMs) have shown promise in natural language processing (NLP), their effectiveness on a diverse range of clinical summarization tasks remains unproven. In this study, we apply adaptation methods to eight LLMs,…

    Submitted 11 April, 2024; v1 submitted 14 September, 2023; originally announced September 2023.

    Comments: 27 pages, 19 figures

    Journal ref: Nature Medicine, 2024

  6. arXiv:2308.11194  [pdf, other]

    cs.CV cs.AI

    ViLLA: Fine-Grained Vision-Language Representation Learning from Real-World Data

    Authors: Maya Varma, Jean-Benoit Delbrouck, Sarah Hooper, Akshay Chaudhari, Curtis Langlotz

    Abstract: Vision-language models (VLMs), such as CLIP and ALIGN, are generally trained on datasets consisting of image-caption pairs obtained from the web. However, real-world multimodal datasets, such as healthcare data, are significantly more complex: each image (e.g. X-ray) is often paired with text (e.g. physician report) that describes many distinct attributes occurring in fine-grained regions of the i…

    Submitted 22 August, 2023; originally announced August 2023.

    Comments: ICCV 2023

  7. arXiv:2305.01146  [pdf, other]

    cs.CL

    RadAdapt: Radiology Report Summarization via Lightweight Domain Adaptation of Large Language Models

    Authors: Dave Van Veen, Cara Van Uden, Maayane Attias, Anuj Pareek, Christian Bluethgen, Malgorzata Polacin, Wah Chiu, Jean-Benoit Delbrouck, Juan Manuel Zambrano Chaves, Curtis P. Langlotz, Akshay S. Chaudhari, John Pauly

    Abstract: We systematically investigate lightweight strategies to adapt large language models (LLMs) for the task of radiology report summarization (RRS). Specifically, we focus on domain adaptation via pretraining (on natural language, biomedical text, or clinical text) and via discrete prompting or parameter-efficient fine-tuning. Our results consistently achieve best performance by maximally adapting to…

    Submitted 20 July, 2023; v1 submitted 1 May, 2023; originally announced May 2023.

    Comments: 12 pages, 10 figures. Published in ACL BioNLP. Compared to v1, v2 includes minor edits and one additional figure in the appendix. Compared to v2, v3 includes a link to the project's GitHub repository

  8. arXiv:2211.12737  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    RoentGen: Vision-Language Foundation Model for Chest X-ray Generation

    Authors: Pierre Chambon, Christian Bluethgen, Jean-Benoit Delbrouck, Rogier Van der Sluijs, Małgorzata Połacin, Juan Manuel Zambrano Chaves, Tanishq Mathew Abraham, Shivanshu Purohit, Curtis P. Langlotz, Akshay Chaudhari

    Abstract: Multimodal models trained on large natural image-text pair datasets have exhibited astounding abilities in generating high-quality images. Medical imaging data is fundamentally different to natural images, and the language used to succinctly capture relevant details in medical data uses a different, narrow but semantically rich, domain-specific vocabulary. Not surprisingly, multi-modal models trai…

    Submitted 23 November, 2022; originally announced November 2022.

    Comments: 19 pages

  9. Toward expanding the scope of radiology report summarization to multiple anatomies and modalities

    Authors: Zhihong Chen, Maya Varma, Xiang Wan, Curtis Langlotz, Jean-Benoit Delbrouck

    Abstract: Radiology report summarization (RRS) is a growing area of research. Given the Findings section of a radiology report, the goal is to generate a summary (called an Impression section) that highlights the key observations and conclusions of the radiology study. However, RRS currently faces essential limitations. First, many prior studies conduct experiments on private datasets, preventing reproductio…

    Submitted 21 July, 2023; v1 submitted 15 November, 2022; originally announced November 2022.

    Journal ref: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). 2023

  10. arXiv:2210.12186  [pdf, other]

    cs.CL cs.AI

    Improving the Factual Correctness of Radiology Report Generation with Semantic Rewards

    Authors: Jean-Benoit Delbrouck, Pierre Chambon, Christian Bluethgen, Emily Tsai, Omar Almusa, Curtis P. Langlotz

    Abstract: Neural image-to-text radiology report generation systems offer the potential to improve radiology reporting by reducing the repetitive process of report drafting and identifying possible medical errors. These systems have achieved promising performance as measured by widely used NLG metrics such as BLEU and CIDEr. However, the current systems face important limitations. First, they present an incr…

    Submitted 21 October, 2022; originally announced October 2022.

    Comments: Findings of EMNLP 2022

  11. arXiv:2203.14960  [pdf, other]

    cs.LG cs.AI

    Domino: Discovering Systematic Errors with Cross-Modal Embeddings

    Authors: Sabri Eyuboglu, Maya Varma, Khaled Saab, Jean-Benoit Delbrouck, Christopher Lee-Messer, Jared Dunnmon, James Zou, Christopher Ré

    Abstract: Machine learning models that achieve high overall accuracy often make systematic errors on important subsets (or slices) of data. Identifying underperforming slices is particularly challenging when working with high-dimensional inputs (e.g. images, audio), where important slices are often unlabeled. In order to address this issue, recent studies have proposed automated slice discovery methods (SDM…

    Submitted 21 May, 2022; v1 submitted 24 March, 2022; originally announced March 2022.

    Comments: ICLR 2022 (Oral)

  12. arXiv:2010.02057  [pdf, other]

    cs.CL cs.HC cs.LG

    Modulated Fusion using Transformer for Linguistic-Acoustic Emotion Recognition

    Authors: Jean-Benoit Delbrouck, Noé Tits, Stéphane Dupont

    Abstract: This paper aims to bring a new lightweight yet powerful solution for the task of Emotion Recognition and Sentiment Analysis. Our motivation is to propose two architectures based on Transformers and modulation that combine the linguistic and acoustic inputs from a wide range of datasets to challenge, and sometimes surpass, the state-of-the-art in the field. To demonstrate the efficiency of our mode…

    Submitted 5 October, 2020; originally announced October 2020.

    Comments: EMNLP 2020 workshop: NLP Beyond Text (NLPBT)

  13. A Transformer-based joint-encoding for Emotion Recognition and Sentiment Analysis

    Authors: Jean-Benoit Delbrouck, Noé Tits, Mathilde Brousmiche, Stéphane Dupont

    Abstract: Understanding expressed sentiment and emotions are two crucial factors in human multimodal language. This paper describes a Transformer-based joint-encoding (TBJE) for the task of Emotion Recognition and Sentiment Analysis. In addition to use the Transformer architecture, our approach relies on a modular co-attention and a glimpse layer to jointly encode one or more modalities. The proposed soluti…

    Submitted 29 June, 2020; originally announced June 2020.

    Comments: Winner of the ACL20: Second Grand-Challenge on Multimodal Language

  14. arXiv:1910.14609  [pdf, other]

    cs.CL cs.CV cs.LG

    Can adversarial training learn image captioning ?

    Authors: Jean-Benoit Delbrouck, Bastien Vanderplaetse, Stéphane Dupont

    Abstract: Recently, generative adversarial networks (GAN) have gathered a lot of interest. Their efficiency in generating unseen samples of high quality, especially images, has improved over the years. In the field of Natural Language Generation (NLG), the use of the adversarial setting to generate meaningful sentences has shown to be difficult for two reasons: the lack of existing architectures to produce…

    Submitted 31 October, 2019; originally announced October 2019.

    Comments: Accepted to NeurIPS 2019 ViGiL workshop

  15. arXiv:1910.03343  [pdf, ps, other]

    cs.CV cs.CL cs.LG

    Modulated Self-attention Convolutional Network for VQA

    Authors: Jean-Benoit Delbrouck, Antoine Maiorca, Nathan Hubens, Stéphane Dupont

    Abstract: As new data-sets for real-world visual reasoning and compositional question answering are emerging, it might be needed to use the visual feature extraction as a end-to-end process during training. This small contribution aims to suggest new ideas to improve the visual processing of traditional convolutional network for visual question answering (VQA). In this paper, we propose to modulate by a lin…

    Submitted 31 October, 2019; v1 submitted 8 October, 2019; originally announced October 2019.

    Comments: Accepted at NeurIPS 2019 workshop: ViGIL

  16. arXiv:1910.02766  [pdf, other]

    cs.CL

    Adversarial reconstruction for Multi-modal Machine Translation

    Authors: Jean-Benoit Delbrouck, Stéphane Dupont

    Abstract: Even with the growing interest in problems at the intersection of Computer Vision and Natural Language, grounding (i.e. identifying) the components of a structured description in an image still remains a challenging task. This contribution aims to propose a model which learns grounding by reconstructing the visual features for the Multi-modal translation task. Previous works have partially investi…

    Submitted 7 October, 2019; originally announced October 2019.

  17. arXiv:1811.09178  [pdf, other]

    cs.CV

    Object-oriented Targets for Visual Navigation using Rich Semantic Representations

    Authors: Jean-Benoit Delbrouck, Stéphane Dupont

    Abstract: When searching for an object humans navigate through a scene using semantic information and spatial relationships. We look for an object using our knowledge of its attributes and relationships with other objects to infer the probable location. In this paper, we propose to tackle the visual navigation problem using rich semantic representations of the observed scene and object-oriented targets to t…

    Submitted 17 December, 2018; v1 submitted 22 November, 2018; originally announced November 2018.

    Comments: Presented at NIPS workshop (ViGIL)

  18. arXiv:1810.06245  [pdf, other]

    cs.CL

    Bringing back simplicity and lightliness into neural image captioning

    Authors: Jean-Benoit Delbrouck, Stéphane Dupont

    Abstract: Neural Image Captioning (NIC) or neural caption generation has attracted a lot of attention over the last few years. Describing an image with a natural language has been an emerging challenge in both fields of computer vision and language processing. Therefore a lot of research has focused on driving this task forward with new creative ideas. So far, the goal has been to maximize scores on automat…

    Submitted 15 October, 2018; originally announced October 2018.

  19. arXiv:1810.06233  [pdf, ps, other]

    cs.CL

    UMONS Submission for WMT18 Multimodal Translation Task

    Authors: Jean-Benoit Delbrouck, Stéphane Dupont

    Abstract: This paper describes the UMONS solution for the Multimodal Machine Translation Task presented at the third conference on machine translation (WMT18). We explore a novel architecture, called deepGRU, based on recent findings in the related task of Neural Image Captioning (NIC). The models presented in the following sections lead to the best METEOR translation score for both constrained (English, im…

    Submitted 15 October, 2018; originally announced October 2018.

  20. arXiv:1805.02489  [pdf, other]

    cs.HC cs.LG cs.SD eess.AS

    Transformer for Emotion Recognition

    Authors: Jean-Benoit Delbrouck

    Abstract: This paper describes the UMONS solution for the OMG-Emotion Challenge. We explore a context-dependent architecture where the arousal and valence of an utterance are predicted according to its surrounding context (i.e. the preceding and following utterances of the video). We report an improvement when taking into account context for both unimodal and multimodal predictions.

    Submitted 30 May, 2018; v1 submitted 3 May, 2018; originally announced May 2018.

  21. arXiv:1712.03449  [pdf, other]

    cs.CL

    Modulating and attending the source image during encoding improves Multimodal Translation

    Authors: Jean-Benoit Delbrouck, Stéphane Dupont

    Abstract: We propose a new and fully end-to-end approach for multimodal translation where the source text encoder modulates the entire visual input processing using conditional batch normalization, in order to compute the most informative image features for our task. Additionally, we propose a new attention mechanism derived from this original idea, where the attention model for the visual input is conditio…

    Submitted 9 December, 2017; originally announced December 2017.

    Comments: Accepted at NIPS Workshop

    Journal ref: Visually-Grounded Interaction and Language, NIPS 2017 Workshop

  22. Visually Grounded Word Embeddings and Richer Visual Features for Improving Multimodal Neural Machine Translation

    Authors: Jean-Benoit Delbrouck, Stéphane Dupont, Omar Seddati

    Abstract: In Multimodal Neural Machine Translation (MNMT), a neural model generates a translated sentence that describes an image, given the image itself and one source descriptions in English. This is considered as the multimodal image caption translation task. The images are processed with Convolutional Neural Network (CNN) to extract visual features exploitable by the translation model. So far, the CNNs…

    Submitted 16 December, 2017; v1 submitted 4 July, 2017; originally announced July 2017.

    Comments: Accepted to GLU 2017. arXiv admin note: text overlap with arXiv:1707.00995

    Journal ref: Proc. GLU 2017 International Workshop on Grounding Language Understanding

  23. An empirical study on the effectiveness of images in Multimodal Neural Machine Translation

    Authors: Jean-Benoit Delbrouck, Stéphane Dupont

    Abstract: In state-of-the-art Neural Machine Translation (NMT), an attention mechanism is used during decoding to enhance the translation. At every step, the decoder uses this mechanism to focus on different parts of the source sentence to gather the most useful information before outputting its target word. Recently, the effectiveness of the attention mechanism has also been explored for multimodal tasks,…

    Submitted 4 July, 2017; originally announced July 2017.

    Comments: Accepted to EMNLP 2017

    Journal ref: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

  24. arXiv:1703.08084  [pdf, other]

    cs.CL

    Multimodal Compact Bilinear Pooling for Multimodal Neural Machine Translation

    Authors: Jean-Benoit Delbrouck, Stephane Dupont

    Abstract: In state-of-the-art Neural Machine Translation, an attention mechanism is used during decoding to enhance the translation. At every step, the decoder uses this mechanism to focus on different parts of the source sentence to gather the most useful information before outputting its target word. Recently, the effectiveness of the attention mechanism has also been explored for multimodal tasks, where…

    Submitted 23 March, 2017; originally announced March 2017.

    Comments: Submitted to ICLR Workshop 2017