Skip to main content

Showing 1–5 of 5 results for author: Grasch, P

  1. arXiv:2407.02477  [pdf, other

    cs.CV cs.CL

    Understanding Alignment in Multimodal LLMs: A Comprehensive Study

    Authors: Elmira Amirloo, Jean-Philippe Fauconnier, Christoph Roesmann, Christian Kerl, Rinu Boney, Yusu Qian, Zirui Wang, Afshin Dehghan, Yinfei Yang, Zhe Gan, Peter Grasch

    Abstract: Preference alignment has become a crucial component in enhancing the performance of Large Language Models (LLMs), yet its impact in Multimodal Large Language Models (MLLMs) remains comparatively underexplored. Similar to language models, MLLMs for image understanding tasks encounter challenges like hallucination. In MLLMs, hallucination can occur not only by stating incorrect facts but also by pro… ▽ More

    Submitted 2 July, 2024; originally announced July 2024.

  2. arXiv:2407.01509  [pdf, other

    cs.CV cs.CL

    MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs

    Authors: Yusu Qian, Hanrong Ye, Jean-Philippe Fauconnier, Peter Grasch, Yinfei Yang, Zhe Gan

    Abstract: We introduce MIA-Bench, a new benchmark designed to evaluate multimodal large language models (MLLMs) on their ability to strictly adhere to complex instructions. Our benchmark comprises a diverse set of 400 image-prompt pairs, each crafted to challenge the models' compliance with layered instructions in generating accurate responses that satisfy specific requested patterns. Evaluation results fro… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

  3. arXiv:2403.09611  [pdf, other

    cs.CV cs.CL cs.LG

    MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training

    Authors: Brandon McKinzie, Zhe Gan, Jean-Philippe Fauconnier, Sam Dodge, Bowen Zhang, Philipp Dufter, Dhruti Shah, Xianzhi Du, Futang Peng, Floris Weers, Anton Belyi, Haotian Zhang, Karanjeet Singh, Doug Kang, Ankur Jain, Hongyu Hè, Max Schwarzer, Tom Gunter, Xiang Kong, Aonan Zhang, Jianyu Wang, Chong Wang, Nan Du, Tao Lei, Sam Wiseman , et al. (7 additional authors not shown)

    Abstract: In this work, we discuss building performant Multimodal Large Language Models (MLLMs). In particular, we study the importance of various architecture components and data choices. Through careful and comprehensive ablations of the image encoder, the vision language connector, and various pre-training data choices, we identified several crucial design lessons. For example, we demonstrate that for la… ▽ More

    Submitted 18 April, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

  4. arXiv:2201.05692  [pdf, other

    cs.CL

    Model Stability with Continuous Data Updates

    Authors: Huiting Liu, Avinesh P. V. S., Siddharth Patwardhan, Peter Grasch, Sachin Agarwal

    Abstract: In this paper, we study the "stability" of machine learning (ML) models within the context of larger, complex NLP systems with continuous training data updates. For this study, we propose a methodology for the assessment of model stability (which we refer to as jitter under various experimental conditions. We find that model design choices, including network architecture and input representation,… ▽ More

    Submitted 14 January, 2022; originally announced January 2022.

  5. Noise Robust Named Entity Understanding for Voice Assistants

    Authors: Deepak Muralidharan, Joel Ruben Antony Moniz, Sida Gao, Xiao Yang, Justine Kao, Stephen Pulman, Atish Kothari, Ray Shen, Yinying Pan, Vivek Kaul, Mubarak Seyed Ibrahim, Gang Xiang, Nan Dun, Yidan Zhou, Andy O, Yuan Zhang, Pooja Chitkara, Xuan Wang, Alkesh Patel, Kushal Tayal, Roger Zheng, Peter Grasch, Jason D. Williams, Lin Li

    Abstract: Named Entity Recognition (NER) and Entity Linking (EL) play an essential role in voice assistant interaction, but are challenging due to the special difficulties associated with spoken user queries. In this paper, we propose a novel architecture that jointly solves the NER and EL tasks by combining them in a joint reranking module. We show that our proposed framework improves NER accuracy by up to… ▽ More

    Submitted 10 August, 2021; v1 submitted 29 May, 2020; originally announced May 2020.

    Comments: NAACL 2021 Industry Track

    MSC Class: 68T50 ACM Class: I.2.7