Showing 1–6 of 6 results for author: Weers, F

  1. arXiv:2404.05719

    cs.CV cs.CL cs.HC

    Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs

    Authors: Keen You, Haotian Zhang, Eldon Schoop, Floris Weers, Amanda Swearngin, Jeffrey Nichols, Yinfei Yang, Zhe Gan

    Abstract: Recent advancements in multimodal large language models (MLLMs) have been noteworthy, yet, these general-domain MLLMs often fall short in their ability to comprehend and interact effectively with user interface (UI) screens. In this paper, we present Ferret-UI, a new MLLM tailored for enhanced understanding of mobile UI screens, equipped with referring, grounding, and reasoning capabilities. Given…

    Submitted 8 April, 2024; originally announced April 2024.

  2. arXiv:2403.09611

    cs.CV cs.CL cs.LG

    MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training

    Authors: Brandon McKinzie, Zhe Gan, Jean-Philippe Fauconnier, Sam Dodge, Bowen Zhang, Philipp Dufter, Dhruti Shah, Xianzhi Du, Futang Peng, Floris Weers, Anton Belyi, Haotian Zhang, Karanjeet Singh, Doug Kang, Ankur Jain, Hongyu Hè, Max Schwarzer, Tom Gunter, Xiang Kong, Aonan Zhang, Jianyu Wang, Chong Wang, Nan Du, Tao Lei, Sam Wiseman, et al. (7 additional authors not shown)

    Abstract: In this work, we discuss building performant Multimodal Large Language Models (MLLMs). In particular, we study the importance of various architecture components and data choices. Through careful and comprehensive ablations of the image encoder, the vision language connector, and various pre-training data choices, we identified several crucial design lessons. For example, we demonstrate that for la…

    Submitted 18 April, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

  3. arXiv:2311.00613

    cs.SD cs.LG eess.AS

    Controllable Music Production with Diffusion Models and Guidance Gradients

    Authors: Mark Levy, Bruno Di Giorgi, Floris Weers, Angelos Katharopoulos, Tom Nickson

    Abstract: We demonstrate how conditional generation from diffusion models can be used to tackle a variety of realistic tasks in the production of music in 44.1kHz stereo audio with sampling-time guidance. The scenarios we consider include continuation, inpainting and regeneration of musical audio, the creation of smooth transitions between two different music tracks, and the transfer of desired stylistic ch…

    Submitted 5 December, 2023; v1 submitted 1 November, 2023; originally announced November 2023.

  4. arXiv:2309.04354

    cs.CV cs.LG stat.ML

    Mobile V-MoEs: Scaling Down Vision Transformers via Sparse Mixture-of-Experts

    Authors: Erik Daxberger, Floris Weers, Bowen Zhang, Tom Gunter, Ruoming Pang, Marcin Eichner, Michael Emmersberger, Yinfei Yang, Alexander Toshev, Xianzhi Du

    Abstract: Sparse Mixture-of-Experts models (MoEs) have recently gained popularity due to their ability to decouple model size from inference efficiency by only activating a small subset of the model parameters for any given input token. As such, sparse MoEs have enabled unprecedented scalability, resulting in tremendous successes across domains such as natural language processing and computer vision. In thi…

    Submitted 8 September, 2023; originally announced September 2023.

  5. arXiv:2301.07836

    cs.CV cs.AI

    Masked Autoencoding Does Not Help Natural Language Supervision at Scale

    Authors: Floris Weers, Vaishaal Shankar, Angelos Katharopoulos, Yinfei Yang, Tom Gunter

    Abstract: Self supervision and natural language supervision have emerged as two exciting ways to train general purpose image encoders which excel at a variety of downstream tasks. Recent works such as M3AE and SLIP have suggested that these approaches can be effectively combined, but most notably their results use small pre-training datasets (<50M samples) and don't effectively reflect the large-scale regim…

    Submitted 15 May, 2023; v1 submitted 18 January, 2023; originally announced January 2023.

    Comments: Accepted at CVPR 2023

  6. arXiv:2202.08143

    cs.CV

    Bias in Automated Image Colorization: Metrics and Error Types

    Authors: Frank Stapel, Floris Weers, Doina Bucur

    Abstract: We measure the color shifts present in colorized images from the ADE20K dataset, when colorized by the automatic GAN-based DeOldify model. We introduce fine-grained local and regional bias measurements between the original and the colorized images, and observe many colorization effects. We confirm a general desaturation effect, and also provide novel observations: a shift towards the training aver…

    Submitted 16 February, 2022; originally announced February 2022.

    Comments: 5 pages, 8 figures

    MSC Class: 68T45

    ACM Class: I.4.4