Showing 1–50 of 160 results for author: Keutzer, K

  1. arXiv:2407.08892  [pdf, other]

    cs.CL cs.LG

    Characterizing Prompt Compression Methods for Long Context Inference

    Authors: Siddharth Jha, Lutfi Eren Erdogan, Sehoon Kim, Kurt Keutzer, Amir Gholami

    Abstract: Long context inference presents challenges at the system level with increased compute and memory requirements, as well as from an accuracy perspective in being able to reason over long contexts. Recently, several methods have been proposed to compress the prompt to reduce the context length. However, there has been little work on comparing the different proposed methods across different tasks thro…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: Es-FoMo @ ICML 2024

  2. arXiv:2407.03442  [pdf, other]

    cs.CV

    Fisher-aware Quantization for DETR Detectors with Critical-category Objectives

    Authors: Huanrui Yang, Yafeng Huang, Zhen Dong, Denis A Gudovskiy, Tomoyuki Okuno, Yohei Nakata, Yuan Du, Kurt Keutzer, Shanghang Zhang

    Abstract: The impact of quantization on the overall performance of deep learning models is a well-studied problem. However, understanding and mitigating its effects on a more fine-grained level is still lacking, especially for harder tasks such as object detection with both classification and regression objectives. This work defines the performance for a subset of task-critical categories, i.e. the critical…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: Poster presentation at the 2nd Workshop on Advancing Neural Network Training: Computational Efficiency, Scalability, and Resource Optimization (WANT@ICML 2024)

  3. arXiv:2406.12303  [pdf, other]

    cs.CV

    Immiscible Diffusion: Accelerating Diffusion Training with Noise Assignment

    Authors: Yiheng Li, Heyang Jiang, Akio Kodaira, Masayoshi Tomizuka, Kurt Keutzer, Chenfeng Xu

    Abstract: In this paper, we point out suboptimal noise-data mapping leads to slow training of diffusion models. During diffusion training, current methods diffuse each image across the entire noise space, resulting in a mixture of all images at every point in the noise layer. We emphasize that this random mixture of noise-data mapping complicates the optimization of the denoising function in diffusion model…

    Submitted 18 June, 2024; originally announced June 2024.

  4. arXiv:2406.07296  [pdf, other]

    cs.RO cs.CL

    Instruct Large Language Models to Drive like Humans

    Authors: Ruijun Zhang, Xianda Guo, Wenzhao Zheng, Chenming Zhang, Kurt Keutzer, Long Chen

    Abstract: Motion planning in complex scenarios is the core challenge in autonomous driving. Conventional methods apply predefined rules or learn from driving data to plan the future trajectory. Recent methods seek the knowledge preserved in large language models (LLMs) and apply them in the driving scenarios. Despite the promising results, it is still unclear whether the LLM learns the underlying human logi…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: project page: https://github.com/bonbon-rj/InstructDriver

  5. arXiv:2405.20323  [pdf, other]

    cs.CV cs.AI

    $\textit{S}^3$Gaussian: Self-Supervised Street Gaussians for Autonomous Driving

    Authors: Nan Huang, Xiaobao Wei, Wenzhao Zheng, Pengju An, Ming Lu, Wei Zhan, Masayoshi Tomizuka, Kurt Keutzer, Shanghang Zhang

    Abstract: Photorealistic 3D reconstruction of street scenes is a critical technique for developing real-world simulators for autonomous driving. Despite the efficacy of Neural Radiance Fields (NeRF) for driving scenes, 3D Gaussian Splatting (3DGS) emerges as a promising direction due to its faster speed and more explicit representation. However, most existing street 3DGS methods require tracked 3D vehicle b…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: Code is available at: https://github.com/nnanhuang/S3Gaussian/

  6. arXiv:2405.15757  [pdf, other]

    cs.CV cs.MM

    Looking Backward: Streaming Video-to-Video Translation with Feature Banks

    Authors: Feng Liang, Akio Kodaira, Chenfeng Xu, Masayoshi Tomizuka, Kurt Keutzer, Diana Marculescu

    Abstract: This paper introduces StreamV2V, a diffusion model that achieves real-time streaming video-to-video (V2V) translation with user prompts. Unlike prior V2V methods using batches to process limited frames, we opt to process frames in a streaming fashion, to support unlimited frames. At the heart of StreamV2V lies a backward-looking principle that relates the present to the past. This is realized by m…

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: Project page: https://jeff-liangf.github.io/projects/streamv2v

  7. arXiv:2404.08985  [pdf, other]

    cs.LG cs.AI

    Intuition-aware Mixture-of-Rank-1-Experts for Parameter Efficient Finetuning

    Authors: Yijiang Liu, Rongyu Zhang, Huanrui Yang, Kurt Keutzer, Yuan Du, Li Du, Shanghang Zhang

    Abstract: Large Language Models (LLMs) have demonstrated significant potential in performing multiple tasks in multimedia applications, ranging from content generation to interactive entertainment, and artistic creation. However, the diversity of downstream tasks in multitask scenarios presents substantial adaptation challenges for LLMs. While traditional methods often succumb to knowledge confusion on thei…

    Submitted 13 April, 2024; originally announced April 2024.

    Comments: 13 pages, 5 figures

  8. arXiv:2404.07979  [pdf, other]

    cs.CL cs.AI cs.LG

    LLoCO: Learning Long Contexts Offline

    Authors: Sijun Tan, Xiuyu Li, Shishir Patil, Ziyang Wu, Tianjun Zhang, Kurt Keutzer, Joseph E. Gonzalez, Raluca Ada Popa

    Abstract: Processing long contexts remains a challenge for large language models (LLMs) due to the quadratic computational and memory overhead of the self-attention mechanism and the substantial KV cache sizes during generation. We propose a novel approach to address this problem by learning contexts offline through context compression and in-domain parameter-efficient finetuning. Our method enables an LLM…

    Submitted 11 April, 2024; originally announced April 2024.

    Comments: The first two authors contributed equally to this work

  9. arXiv:2403.15042  [pdf, other]

    cs.CL

    LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement

    Authors: Nicholas Lee, Thanakul Wattanawong, Sehoon Kim, Karttikeya Mangalam, Sheng Shen, Gopala Anumanchipalli, Michael W. Mahoney, Kurt Keutzer, Amir Gholami

    Abstract: Pretrained large language models (LLMs) are currently state-of-the-art for solving the vast majority of natural language processing tasks. While many real-world applications still require fine-tuning to reach satisfactory levels of performance, many of them are in the low-data regime, making fine-tuning challenging. To address this, we propose LLM2LLM, a targeted and iterative data augmentation st…

    Submitted 13 July, 2024; v1 submitted 22 March, 2024; originally announced March 2024.

    Comments: ACL 2024

  10. arXiv:2403.14123  [pdf, other]

    cs.LG cs.AR cs.DC

    AI and Memory Wall

    Authors: Amir Gholami, Zhewei Yao, Sehoon Kim, Coleman Hooper, Michael W. Mahoney, Kurt Keutzer

    Abstract: The availability of unprecedented unsupervised training data, along with neural scaling laws, has resulted in an unprecedented surge in model size and compute requirements for serving/training LLMs. However, the main performance bottleneck is increasingly shifting to memory bandwidth. Over the past 20 years, peak server hardware FLOPS has been scaling at 3.0x/2yrs, outpacing the growth of DRAM and…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: Published in IEEE Micro Journal

  11. arXiv:2403.12052  [pdf, other]

    cs.CV

    A Dataset and Benchmark for Copyright Infringement Unlearning from Text-to-Image Diffusion Models

    Authors: Rui Ma, Qiang Zhou, Yizhu Jin, Daquan Zhou, Bangjun Xiao, Xiuyu Li, Yi Qu, Aishani Singh, Kurt Keutzer, Jingtong Hu, Xiaodong Xie, Zhen Dong, Shanghang Zhang, Shiji Zhou

    Abstract: Copyright law confers upon creators the exclusive rights to reproduce, distribute, and monetize their creative works. However, recent progress in text-to-image generation has introduced formidable challenges to copyright enforcement. These technologies enable the unauthorized learning and replication of copyrighted content, artistic creations, and likenesses, leading to the proliferation of unregu…

    Submitted 21 June, 2024; v1 submitted 4 January, 2024; originally announced March 2024.

    Comments: 20 pages, 7 figures, 3 table

  12. arXiv:2403.12031  [pdf, other]

    cs.LG cs.AI

    RouterBench: A Benchmark for Multi-LLM Routing System

    Authors: Qitian Jason Hu, Jacob Bieker, Xiuyu Li, Nan Jiang, Benjamin Keigwin, Gaurav Ranganath, Kurt Keutzer, Shriyash Kaustubh Upadhyay

    Abstract: As the range of applications for Large Language Models (LLMs) continues to grow, the demand for effective serving solutions becomes increasingly critical. Despite the versatility of LLMs, no single model can optimally address all tasks and applications, particularly when balancing performance with cost. This limitation has led to the development of LLM routing systems, which combine the strengths…

    Submitted 28 March, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

  13. arXiv:2403.08125  [pdf, other]

    cs.CV

    Q-SLAM: Quadric Representations for Monocular SLAM

    Authors: Chensheng Peng, Chenfeng Xu, Yue Wang, Mingyu Ding, Heng Yang, Masayoshi Tomizuka, Kurt Keutzer, Marco Pavone, Wei Zhan

    Abstract: Monocular SLAM has long grappled with the challenge of accurately modeling 3D geometries. Recent advances in Neural Radiance Fields (NeRF)-based monocular SLAM have shown promise, yet these methods typically focus on novel view synthesis rather than precise 3D geometry modeling. This focus results in a significant disconnect between NeRF applications, i.e., novel-view synthesis and the requirement…

    Submitted 12 March, 2024; originally announced March 2024.

  14. arXiv:2402.16363  [pdf, other]

    cs.CL cs.AI

    LLM Inference Unveiled: Survey and Roofline Model Insights

    Authors: Zhihang Yuan, Yuzhang Shang, Yang Zhou, Zhen Dong, Zhe Zhou, Chenhao Xue, Bingzhe Wu, Zhikai Li, Qingyi Gu, Yong Jae Lee, Yan Yan, Beidi Chen, Guangyu Sun, Kurt Keutzer

    Abstract: The field of efficient Large Language Model (LLM) inference is rapidly evolving, presenting a unique blend of opportunities and challenges. Although the field has expanded and is vibrant, there hasn't been a concise framework that analyzes the various methods of LLM Inference to provide a clear understanding of this domain. Our survey stands out from traditional literature reviews by not only summ…

    Submitted 1 May, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

  15. arXiv:2402.09368  [pdf, other]

    cs.CV cs.AI

    Magic-Me: Identity-Specific Video Customized Diffusion

    Authors: Ze Ma, Daquan Zhou, Chun-Hsiao Yeh, Xue-She Wang, Xiuyu Li, Huanrui Yang, Zhen Dong, Kurt Keutzer, Jiashi Feng

    Abstract: Creating content with specified identities (ID) has attracted significant interest in the field of generative models. In the field of text-to-image generation (T2I), subject-driven creation has achieved great progress with the identity controlled via reference images. However, its extension to video generation is not well explored. In this work, we propose a simple yet effective subject identity c…

    Submitted 20 March, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

    Comments: Project Page at https://magic-me-webpage.github.io

  16. arXiv:2401.18079  [pdf, other]

    cs.LG

    KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization

    Authors: Coleman Hooper, Sehoon Kim, Hiva Mohammadzadeh, Michael W. Mahoney, Yakun Sophia Shao, Kurt Keutzer, Amir Gholami

    Abstract: LLMs are seeing growing use for applications such as document analysis and summarization which require large context windows, and with these large context windows KV cache activations surface as the dominant contributor to memory consumption during inference. Quantization is a promising approach for compressing KV cache activations; however, existing solutions fail to represent activations accurat…

    Submitted 4 July, 2024; v1 submitted 31 January, 2024; originally announced January 2024.

  17. arXiv:2401.07886  [pdf, other]

    cs.LG cs.AI cs.CL cs.DC

    Learned Best-Effort LLM Serving

    Authors: Siddharth Jha, Coleman Hooper, Xiaoxuan Liu, Sehoon Kim, Kurt Keutzer

    Abstract: Many applications must provide low-latency LLM service to users or risk unacceptable user experience. However, over-provisioning resources to serve fluctuating request patterns is often prohibitively expensive. In this work, we present a best-effort serving system that employs deep reinforcement learning to adjust service quality based on the task distribution and system load. Our best-effort syst…

    Submitted 14 July, 2024; v1 submitted 15 January, 2024; originally announced January 2024.

    Comments: Es-FoMo @ ICML 2024

  18. arXiv:2401.07853  [pdf, other]

    cs.CV

    VeCAF: Vision-language Collaborative Active Finetuning with Training Objective Awareness

    Authors: Rongyu Zhang, Zefan Cai, Huanrui Yang, Zidong Liu, Denis Gudovskiy, Tomoyuki Okuno, Yohei Nakata, Kurt Keutzer, Baobao Chang, Yuan Du, Li Du, Shanghang Zhang

    Abstract: Finetuning a pretrained vision model (PVM) is a common technique for learning downstream vision tasks. However, the conventional finetuning process with randomly sampled data points results in diminished training efficiency. To address this drawback, we propose a novel approach, Vision-language Collaborative Active Finetuning (VeCAF). With the emerging availability of labels and natural language a…

    Submitted 13 April, 2024; v1 submitted 15 January, 2024; originally announced January 2024.

    Comments: 13 pages

  19. arXiv:2312.16610  [pdf, other]

    cs.CV cs.LG

    Efficient Deweather Mixture-of-Experts with Uncertainty-aware Feature-wise Linear Modulation

    Authors: Rongyu Zhang, Yulin Luo, Jiaming Liu, Huanrui Yang, Zhen Dong, Denis Gudovskiy, Tomoyuki Okuno, Yohei Nakata, Kurt Keutzer, Yuan Du, Shanghang Zhang

    Abstract: The Mixture-of-Experts (MoE) approach has demonstrated outstanding scalability in multi-task learning including low-level upstream tasks such as concurrent removal of multiple adverse weather effects. However, the conventional MoE architecture with parallel Feed Forward Network (FFN) experts leads to significant parameter and computational overheads that hinder its efficient deployment. In additio…

    Submitted 27 December, 2023; originally announced December 2023.

    Comments: AAAI 2024

  20. arXiv:2312.12491  [pdf, other]

    cs.CV cs.GR cs.LG

    StreamDiffusion: A Pipeline-level Solution for Real-time Interactive Generation

    Authors: Akio Kodaira, Chenfeng Xu, Toshiki Hazama, Takanori Yoshimoto, Kohei Ohno, Shogo Mitsuhori, Soichi Sugano, Hanying Cho, Zhijian Liu, Kurt Keutzer

    Abstract: We introduce StreamDiffusion, a real-time diffusion pipeline designed for interactive image generation. Existing diffusion models are adept at creating images from text or image prompts, yet they often fall short in real-time interaction. This limitation becomes particularly evident in scenarios involving continuous input, such as Metaverse, live video streaming, and broadcasting, where high throu…

    Submitted 19 December, 2023; originally announced December 2023.

    Comments: tech report, the code is available at https://github.com/cumulo-autumn/StreamDiffusion

  21. arXiv:2312.09148  [pdf, other]

    cs.LG cs.CV

    Split-Ensemble: Efficient OOD-aware Ensemble via Task and Model Splitting

    Authors: Anthony Chen, Huanrui Yang, Yulu Gan, Denis A Gudovskiy, Zhen Dong, Haofan Wang, Tomoyuki Okuno, Yohei Nakata, Kurt Keutzer, Shanghang Zhang

    Abstract: Uncertainty estimation is crucial for machine learning models to detect out-of-distribution (OOD) inputs. However, the conventional discriminative deep learning classifiers produce uncalibrated closed-set predictions for OOD data. A more robust classifiers with the uncertainty estimation typically require a potentially unavailable OOD dataset for outlier exposure training, or a considerable amount…

    Submitted 27 May, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

    Comments: ICML2024. Project website is available at https://antonioo-c.github.io/projects/split-ensemble

  22. arXiv:2312.04511  [pdf, other]

    cs.CL

    An LLM Compiler for Parallel Function Calling

    Authors: Sehoon Kim, Suhong Moon, Ryan Tabrizi, Nicholas Lee, Michael W. Mahoney, Kurt Keutzer, Amir Gholami

    Abstract: The reasoning capabilities of the recent LLMs enable them to execute external function calls to overcome their inherent limitations, such as knowledge cutoffs, poor arithmetic skills, or lack of access to private data. This development has allowed LLMs to select and coordinate multiple functions based on the context to tackle more complex problems. However, current methods for function calling oft…

    Submitted 4 June, 2024; v1 submitted 7 December, 2023; originally announced December 2023.

    Comments: ICML 2024

  23. arXiv:2311.08562  [pdf, other]

    cs.CL

    MAgIC: Investigation of Large Language Model Powered Multi-Agent in Cognition, Adaptability, Rationality and Collaboration

    Authors: Lin Xu, Zhiyuan Hu, Daquan Zhou, Hongyu Ren, Zhen Dong, Kurt Keutzer, See Kiong Ng, Jiashi Feng

    Abstract: Large Language Models (LLMs) have marked a significant advancement in the field of natural language processing, demonstrating exceptional capabilities in reasoning, tool usage, and memory. As their applications extend into multi-agent environments, a need has arisen for a comprehensive evaluation framework that captures their abilities in reasoning, planning, collaboration, and more. This work int…

    Submitted 16 November, 2023; v1 submitted 14 November, 2023; originally announced November 2023.

    Comments: work in progress

  24. arXiv:2311.07620  [pdf, other]

    cs.AR cs.LG

    EPIM: Efficient Processing-In-Memory Accelerators based on Epitome

    Authors: Chenyu Wang, Zhen Dong, Daquan Zhou, Zhenhua Zhu, Yu Wang, Jiashi Feng, Kurt Keutzer

    Abstract: The utilization of large-scale neural networks on Processing-In-Memory (PIM) accelerators encounters challenges due to constrained on-chip memory capacity. To tackle this issue, current works explore model compression algorithms to reduce the size of Convolutional Neural Networks (CNNs). Most of these algorithms either aim to represent neural operators with reduced-size parameters (e.g., quantizat…

    Submitted 17 April, 2024; v1 submitted 12 November, 2023; originally announced November 2023.

  25. Simple and Effective Input Reformulations for Translation

    Authors: Brian Yu, Hansen Lillemark, Kurt Keutzer

    Abstract: Foundation language models learn from their finetuning input context in different ways. In this paper, we reformulate inputs during finetuning for challenging translation tasks, leveraging model strengths from pretraining in novel ways to improve downstream performance. These reformulations are simple data level modifications, require no additional collection of training data or modification of da…

    Submitted 11 November, 2023; originally announced November 2023.

    Comments: 13 pages, 6 figures. To be published in Empirical Methods in Natural Language Processing (Main) 2023

  26. arXiv:2311.03285  [pdf, other]

    cs.LG cs.AI cs.DC

    S-LoRA: Serving Thousands of Concurrent LoRA Adapters

    Authors: Ying Sheng, Shiyi Cao, Dacheng Li, Coleman Hooper, Nicholas Lee, Shuo Yang, Christopher Chou, Banghua Zhu, Lianmin Zheng, Kurt Keutzer, Joseph E. Gonzalez, Ion Stoica

    Abstract: The "pretrain-then-finetune" paradigm is commonly adopted in the deployment of large language models. Low-Rank Adaptation (LoRA), a parameter-efficient fine-tuning method, is often employed to adapt a base model to a multitude of tasks, resulting in a substantial collection of LoRA adapters derived from one base model. We observe that this paradigm presents significant opportunities for batched in…

    Submitted 5 June, 2024; v1 submitted 6 November, 2023; originally announced November 2023.

  27. arXiv:2310.16003  [pdf, other]

    cs.CV

    CVPR 2023 Text Guided Video Editing Competition

    Authors: Jay Zhangjie Wu, Xiuyu Li, Difei Gao, Zhen Dong, Jinbin Bai, Aishani Singh, Xiaoyu Xiang, Youzeng Li, Zuwei Huang, Yuanxi Sun, Rui He, Feng Hu, Junhua Hu, Hai Huang, Hanyu Zhu, Xu Cheng, Jie Tang, Mike Zheng Shou, Kurt Keutzer, Forrest Iandola

    Abstract: Humans watch more than a billion hours of video per day. Most of this video was edited manually, which is a tedious process. However, AI-enabled video-generation and video-editing is on the rise. Building on text-to-image models like Stable Diffusion and Imagen, generative AI has improved dramatically on video tasks. But it's hard to evaluate progress in these video tasks because there is no stand…

    Submitted 24 October, 2023; originally announced October 2023.

    Comments: Project page: https://sites.google.com/view/loveucvpr23/track4

  28. arXiv:2310.15166  [pdf, other]

    cs.CV cs.CL

    Large Language Models are Visual Reasoning Coordinators

    Authors: Liangyu Chen, Bo Li, Sheng Shen, Jingkang Yang, Chunyuan Li, Kurt Keutzer, Trevor Darrell, Ziwei Liu

    Abstract: Visual reasoning requires multimodal perception and commonsense cognition of the world. Recently, multiple vision-language models (VLMs) have been proposed with excellent commonsense reasoning ability in various domains. However, how to harness the collective power of these complementary VLMs is rarely explored. Existing methods like ensemble still struggle to aggregate these models with the desir…

    Submitted 23 October, 2023; originally announced October 2023.

    Comments: Accepted at NeurIPS 2023

  29. arXiv:2310.12072  [pdf, other]

    cs.CL

    SPEED: Speculative Pipelined Execution for Efficient Decoding

    Authors: Coleman Hooper, Sehoon Kim, Hiva Mohammadzadeh, Hasan Genc, Kurt Keutzer, Amir Gholami, Sophia Shao

    Abstract: Generative Large Language Models (LLMs) based on the Transformer architecture have recently emerged as a dominant foundation model for a wide range of Natural Language Processing tasks. Nevertheless, their application in real-time scenarios has been highly restricted due to the significant inference latency associated with these models. This is particularly pronounced due to the autoregressive nat…

    Submitted 2 January, 2024; v1 submitted 18 October, 2023; originally announced October 2023.

    Comments: NeurIPS Workshop on Efficient Natural Language and Speech Processing (2023)

  30. arXiv:2310.10008  [pdf, other]

    cs.CV cs.AI cs.LG

    Towards Unified and Effective Domain Generalization

    Authors: Yiyuan Zhang, Kaixiong Gong, Xiaohan Ding, Kaipeng Zhang, Fangrui Lv, Kurt Keutzer, Xiangyu Yue

    Abstract: We propose $\textbf{UniDG}$, a novel and $\textbf{Uni}$fied framework for $\textbf{D}$omain $\textbf{G}$eneralization that is capable of significantly enhancing the out-of-distribution generalization performance of foundation models regardless of their architectures. The core idea of UniDG is to finetune models during the inference stage, which saves the cost of iterative training. Specifically, w…

    Submitted 15 October, 2023; originally announced October 2023.

    Comments: Project Website: https://invictus717.github.io/Generalization/

  31. arXiv:2310.07147  [pdf, other]

    cs.CL cs.LG

    QFT: Quantized Full-parameter Tuning of LLMs with Affordable Resources

    Authors: Zhikai Li, Xiaoxuan Liu, Banghua Zhu, Zhen Dong, Qingyi Gu, Kurt Keutzer

    Abstract: Large Language Models (LLMs) have showcased remarkable impacts across a wide spectrum of natural language processing tasks. Fine-tuning these pre-trained models on downstream datasets provides further significant performance gains, but this process has been challenging due to its extraordinary resource requirements. To this end, existing efforts focus on parameter-efficient fine-tuning, which, unf…

    Submitted 10 October, 2023; originally announced October 2023.

  32. arXiv:2310.01779  [pdf, other]

    cs.CV

    HallE-Control: Controlling Object Hallucination in Large Multimodal Models

    Authors: Bohan Zhai, Shijia Yang, Chenfeng Xu, Sheng Shen, Kurt Keutzer, Chunyuan Li, Manling Li

    Abstract: Current Large Multimodal Models (LMMs) achieve remarkable progress, yet there remains significant uncertainty regarding their ability to accurately apprehend visual details, that is, in performing detailed captioning. To address this, we introduce $\textit{CCEval}$, a GPT-4 assisted evaluation method for detailed captioning. Interestingly, while LMMs demonstrate minimal object existence hallucinat…

    Submitted 28 March, 2024; v1 submitted 3 October, 2023; originally announced October 2023.

    Comments: Our code is publicly available at https://github.com/bronyayang/HallE_Control

  33. arXiv:2309.14525  [pdf, other]

    cs.CV cs.CL

    Aligning Large Multimodal Models with Factually Augmented RLHF

    Authors: Zhiqing Sun, Sheng Shen, Shengcao Cao, Haotian Liu, Chunyuan Li, Yikang Shen, Chuang Gan, Liang-Yan Gui, Yu-Xiong Wang, Yiming Yang, Kurt Keutzer, Trevor Darrell

    Abstract: Large Multimodal Models (LMM) are built across modalities and the misalignment between two modalities can result in "hallucination", generating textual outputs that are not grounded by the multimodal information in context. To address the multimodal misalignment issue, we adapt the Reinforcement Learning from Human Feedback (RLHF) from the text domain to the task of vision-language alignment, wher…

    Submitted 25 September, 2023; originally announced September 2023.

    Comments: Preprint

  34. arXiv:2308.10515  [pdf, other]

    cs.CV

    QD-BEV: Quantization-aware View-guided Distillation for Multi-view 3D Object Detection

    Authors: Yifan Zhang, Zhen Dong, Huanrui Yang, Ming Lu, Cheng-Ching Tseng, Yuan Du, Kurt Keutzer, Li Du, Shanghang Zhang

    Abstract: Multi-view 3D detection based on BEV (bird-eye-view) has recently achieved significant improvements. However, the huge memory consumption of state-of-the-art models makes it hard to deploy them on vehicles, and the non-trivial latency will affect the real-time perception of streaming applications. Despite the wide application of quantization to lighten models, we show in our paper that directly ap…

    Submitted 21 August, 2023; originally announced August 2023.

    Comments: ICCV 2023 Accept

  35. arXiv:2307.14620  [pdf, other]

    cs.CV

    NeRF-Det: Learning Geometry-Aware Volumetric Representation for Multi-View 3D Object Detection

    Authors: Chenfeng Xu, Bichen Wu, Ji Hou, Sam Tsai, Ruilong Li, Jialiang Wang, Wei Zhan, Zijian He, Peter Vajda, Kurt Keutzer, Masayoshi Tomizuka

    Abstract: We present NeRF-Det, a novel method for indoor 3D detection with posed RGB images as input. Unlike existing indoor 3D detection methods that struggle to model scene geometry, our method makes novel use of NeRF in an end-to-end manner to explicitly estimate 3D geometry, thereby improving 3D detection performance. Specifically, to avoid the significant extra latency associated with per-scene optimiz…

    Submitted 27 July, 2023; originally announced July 2023.

    Comments: Accepted by ICCV 2023

  36. arXiv:2306.07629  [pdf, other]

    cs.CL cs.LG

    SqueezeLLM: Dense-and-Sparse Quantization

    Authors: Sehoon Kim, Coleman Hooper, Amir Gholami, Zhen Dong, Xiuyu Li, Sheng Shen, Michael W. Mahoney, Kurt Keutzer

    Abstract: Generative Large Language Models (LLMs) have demonstrated remarkable results for a wide range of tasks. However, deploying these models for inference has been a significant challenge due to their unprecedented resource requirements. This has forced existing deployment frameworks to use multi-GPU inference pipelines, which are often complex and costly, or to use smaller and less performant models.…

    Submitted 4 June, 2024; v1 submitted 13 June, 2023; originally announced June 2023.

    Comments: ICML 2024

  37. arXiv:2306.00258  [pdf, other]

    cs.LG math.NA

    Towards Foundation Models for Scientific Machine Learning: Characterizing Scaling and Transfer Behavior

    Authors: Shashank Subramanian, Peter Harrington, Kurt Keutzer, Wahid Bhimji, Dmitriy Morozov, Michael Mahoney, Amir Gholami

    Abstract: Pre-trained machine learning (ML) models have shown great performance for a wide range of applications, in particular in natural language processing (NLP) and computer vision (CV). Here, we study how pre-training could be used for scientific machine learning (SciML) applications, specifically in the context of transfer learning. We study the transfer behavior of these models as (i) the pre-trained…

    Submitted 31 May, 2023; originally announced June 2023.

    Comments: 16 pages, 11 figures

    Journal ref: NeurIPS 2023

  38. arXiv:2305.14705  [pdf, other]

    cs.CL

    Mixture-of-Experts Meets Instruction Tuning: A Winning Combination for Large Language Models

    Authors: Sheng Shen, Le Hou, Yanqi Zhou, Nan Du, Shayne Longpre, Jason Wei, Hyung Won Chung, Barret Zoph, William Fedus, Xinyun Chen, Tu Vu, Yuexin Wu, Wuyang Chen, Albert Webson, Yunxuan Li, Vincent Zhao, Hongkun Yu, Kurt Keutzer, Trevor Darrell, Denny Zhou

    Abstract: Sparse Mixture-of-Experts (MoE) is a neural architecture design that can be utilized to add learnable parameters to Large Language Models (LLMs) without increasing inference cost. Instruction tuning is a technique for training LLMs to follow instructions. We advocate combining these two approaches, as we find that MoE models benefit more from instruction tuning than dense models. In particular, we…

    Submitted 5 July, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: Preprint

  39. arXiv:2304.14340  [pdf, other]

    cs.CV

    SparseFusion: Fusing Multi-Modal Sparse Representations for Multi-Sensor 3D Object Detection

    Authors: Yichen Xie, Chenfeng Xu, Marie-Julie Rakotosaona, Patrick Rim, Federico Tombari, Kurt Keutzer, Masayoshi Tomizuka, Wei Zhan

    Abstract: By identifying four important components of existing LiDAR-camera 3D object detection methods (LiDAR and camera candidates, transformation, and fusion outputs), we observe that all existing methods either find dense candidates or yield dense representations of scenes. However, given that objects occupy only a small part of a scene, finding dense candidates and generating dense representations is n…

    Submitted 27 April, 2023; originally announced April 2023.

  40. arXiv:2304.14190  [pdf, other]

    cs.CV

    Quadric Representations for LiDAR Odometry, Mapping and Localization

    Authors: Chao Xia, Chenfeng Xu, Patrick Rim, Mingyu Ding, Nanning Zheng, Kurt Keutzer, Masayoshi Tomizuka, Wei Zhan

    Abstract: Current LiDAR odometry, mapping and localization methods leverage point-wise representations of 3D scenes and achieve high accuracy in autonomous driving tasks. However, the space-inefficiency of methods that use point-wise representations limits their development and usage in practical applications. In particular, scan-submap matching and global map representation methods are restricted by the in…

    Submitted 27 April, 2023; originally announced April 2023.

  41. arXiv:2304.00788   

    cs.CV

    Open-Vocabulary Point-Cloud Object Detection without 3D Annotation

    Authors: Yuheng Lu, Chenfeng Xu, Xiaobao Wei, Xiaodong Xie, Masayoshi Tomizuka, Kurt Keutzer, Shanghang Zhang

    Abstract: The goal of open-vocabulary detection is to identify novel objects based on arbitrary textual descriptions. In this paper, we address open-vocabulary 3D point-cloud detection by a dividing-and-conquering strategy, which involves: 1) developing a point-cloud detector that can learn a general representation for localizing various objects, and 2) connecting textual and point-cloud representations to…

    Submitted 16 May, 2023; v1 submitted 3 April, 2023; originally announced April 2023.

    Comments: I want to update this manuscript

  42. arXiv:2303.07226  [pdf, other]

    cs.CV cs.CL

    Scaling Vision-Language Models with Sparse Mixture of Experts

    Authors: Sheng Shen, Zhewei Yao, Chunyuan Li, Trevor Darrell, Kurt Keutzer, Yuxiong He

    Abstract: The field of natural language processing (NLP) has made significant strides in recent years, particularly in the development of large-scale vision-language models (VLMs). These models aim to bridge the gap between text and visual information, enabling a more comprehensive understanding of multimedia data. However, as these models become larger and more complex, they also become more challenging to…

    Submitted 13 March, 2023; originally announced March 2023.

    Comments: Preprint

  43. arXiv:2302.14017  [pdf, other]

    cs.CL cs.LG

    Full Stack Optimization of Transformer Inference: a Survey

    Authors: Sehoon Kim, Coleman Hooper, Thanakul Wattanawong, Minwoo Kang, Ruohan Yan, Hasan Genc, Grace Dinh, Qijing Huang, Kurt Keutzer, Michael W. Mahoney, Yakun Sophia Shao, Amir Gholami

    Abstract: Recent advances in state-of-the-art DNN architecture design have been moving toward Transformer models. These models achieve superior accuracy across a wide range of applications. This trend has been consistent over the past several years since Transformer models were originally introduced. However, the amount of compute and bandwidth required for inference of recent Transformer models is growing…

    Submitted 27 February, 2023; originally announced February 2023.

    Journal ref: Presented in Workshop on Architecture and System Support for Transformer Models (ASSYST) at ISCA 2023

  44. arXiv:2302.07863  [pdf, other]

    cs.CL

    Speculative Decoding with Big Little Decoder

    Authors: Sehoon Kim, Karttikeya Mangalam, Suhong Moon, Jitendra Malik, Michael W. Mahoney, Amir Gholami, Kurt Keutzer

    Abstract: The recent emergence of Large Language Models based on the Transformer architecture has enabled dramatic advancements in the field of Natural Language Processing. However, these models have long inference latency, which limits their deployment and makes them prohibitively expensive for various real-time applications. The inference latency is further exacerbated by autoregressive generative tasks,…

    Submitted 12 October, 2023; v1 submitted 15 February, 2023; originally announced February 2023.

    Comments: NeurIPS 2023

  45. arXiv:2302.04304  [pdf, other]

    cs.CV cs.LG

    Q-Diffusion: Quantizing Diffusion Models

    Authors: Xiuyu Li, Yijiang Liu, Long Lian, Huanrui Yang, Zhen Dong, Daniel Kang, Shanghang Zhang, Kurt Keutzer

    Abstract: Diffusion models have achieved great success in image synthesis through iterative noise estimation using deep neural networks. However, the slow inference, high memory consumption, and computation intensity of the noise estimation model hinder the efficient adoption of diffusion models. Although post-training quantization (PTQ) is considered a go-to compression method for other tasks, it does not…

    Submitted 8 June, 2023; v1 submitted 8 February, 2023; originally announced February 2023.

    Comments: The code is available at https://github.com/Xiuyu-Li/q-diffusion

  46. arXiv:2212.14532  [pdf, other]

    cs.CV

    Scale-MAE: A Scale-Aware Masked Autoencoder for Multiscale Geospatial Representation Learning

    Authors: Colorado J. Reed, Ritwik Gupta, Shufan Li, Sarah Brockman, Christopher Funk, Brian Clipp, Kurt Keutzer, Salvatore Candido, Matt Uyttendaele, Trevor Darrell

    Abstract: Large, pretrained models are commonly finetuned with imagery that is heavily augmented to mimic different conditions and scales, with the resulting models used for various tasks with imagery from a range of spatial scales. Such models overlook scale-specific information in the data for scale-dependent domains, such as remote sensing. In this paper, we present Scale-MAE, a pretraining method that e…

    Submitted 21 September, 2023; v1 submitted 29 December, 2022; originally announced December 2022.

    Comments: International Conference on Computer Vision 2023

  47. arXiv:2212.02770  [pdf, other]

    cs.CV

    CSQ: Growing Mixed-Precision Quantization Scheme with Bi-level Continuous Sparsification

    Authors: Lirui Xiao, Huanrui Yang, Zhen Dong, Kurt Keutzer, Li Du, Shanghang Zhang

    Abstract: Mixed-precision quantization has been widely applied on deep neural networks (DNNs) as it leads to significantly better efficiency-accuracy tradeoffs compared to uniform quantization. Meanwhile, determining the exact precision of each layer remains challenging. Previous attempts on bit-level regularization and pruning-based dynamic precision adjustment during training suffer from noisy gradients a…

    Submitted 27 February, 2023; v1 submitted 6 December, 2022; originally announced December 2022.

    Comments: Published as a conference paper at DAC 2023

  48. arXiv:2211.16056  [pdf, other]

    cs.CV

    NoisyQuant: Noisy Bias-Enhanced Post-Training Activation Quantization for Vision Transformers

    Authors: Yijiang Liu, Huanrui Yang, Zhen Dong, Kurt Keutzer, Li Du, Shanghang Zhang

    Abstract: The complicated architecture and high training cost of vision transformers urge the exploration of post-training quantization. However, the heavy-tailed distribution of vision transformer activations hinders the effectiveness of previous post-training quantization methods, even with advanced quantizer designs. Instead of tuning the quantizer to better fit the complicated activation distribution, t…

    Submitted 19 April, 2023; v1 submitted 29 November, 2022; originally announced November 2022.

    Comments: Accepted to CVPR2023

  49. arXiv:2211.11720  [pdf, other]

    cs.CV cs.CL

    Multitask Vision-Language Prompt Tuning

    Authors: Sheng Shen, Shijia Yang, Tianjun Zhang, Bohan Zhai, Joseph E. Gonzalez, Kurt Keutzer, Trevor Darrell

    Abstract: Prompt Tuning, conditioning on task-specific learned prompt vectors, has emerged as a data-efficient and parameter-efficient method for adapting large pretrained vision-language models to multiple downstream tasks. However, existing approaches usually consider learning prompt vectors for each task independently from scratch, thereby failing to exploit the rich shareable knowledge across different…

    Submitted 5 December, 2022; v1 submitted 21 November, 2022; originally announced November 2022.

    Comments: Preprint

  50. arXiv:2210.02443  [pdf, other]

    cs.CV cs.AI cs.LG cs.RO

    Time Will Tell: New Outlooks and A Baseline for Temporal Multi-View 3D Object Detection

    Authors: Jinhyung Park, Chenfeng Xu, Shijia Yang, Kurt Keutzer, Kris Kitani, Masayoshi Tomizuka, Wei Zhan

    Abstract: While recent camera-only 3D detection methods leverage multiple timesteps, the limited history they use significantly hampers the extent to which temporal fusion can improve object perception. Observing that existing works' fusion of multi-frame images are instances of temporal stereo matching, we find that performance is hindered by the interplay between 1) the low granularity of matching resolut…

    Submitted 5 October, 2022; originally announced October 2022.

    Comments: Code will be released at https://github.com/Divadi/SOLOFusion