Showing 1–23 of 23 results for author: Vemulapalli, R

  1. arXiv:2407.09435 [pdf, other]

    cs.AI

    MUSCLE: A Model Update Strategy for Compatible LLM Evolution

    Authors: Jessica Echterhoff, Fartash Faghri, Raviteja Vemulapalli, Ting-Yao Hu, Chun-Liang Li, Oncel Tuzel, Hadi Pouransari

    Abstract: Large Language Models (LLMs) are frequently updated due to data or architecture changes to improve their performance. When updating models, developers often focus on increasing overall performance metrics with less emphasis on being compatible with previous model versions. However, users often build a mental model of the functionality and capabilities of a particular machine learning model they ar…

    Submitted 12 July, 2024; originally announced July 2024.

  2. arXiv:2312.09299 [pdf, other]

    cs.LG cs.CL cs.CV

    Weight subcloning: direct initialization of transformers using larger pretrained ones

    Authors: Mohammad Samragh, Mehrdad Farajtabar, Sachin Mehta, Raviteja Vemulapalli, Fartash Faghri, Devang Naik, Oncel Tuzel, Mohammad Rastegari

    Abstract: Training large transformer models from scratch for a target task requires lots of data and is computationally demanding. The usual practice of transfer learning overcomes this challenge by initializing the model with weights of a pretrained model of the same size and specification to increase the convergence and training speed. However, what if no pretrained model of the required size is available…

    Submitted 14 December, 2023; originally announced December 2023.

  3. arXiv:2311.18237 [pdf, other]

    cs.CV cs.LG

    Knowledge Transfer from Vision Foundation Models for Efficient Training of Small Task-specific Models

    Authors: Raviteja Vemulapalli, Hadi Pouransari, Fartash Faghri, Sachin Mehta, Mehrdad Farajtabar, Mohammad Rastegari, Oncel Tuzel

    Abstract: Vision Foundation Models (VFMs) pretrained on massive datasets exhibit impressive performance on various downstream tasks, especially with limited labeled target data. However, due to their high inference compute cost, these models cannot be deployed for many real-world applications. Motivated by this, we ask the following important question, "How can we leverage the knowledge from a large VFM to…

    Submitted 1 July, 2024; v1 submitted 29 November, 2023; originally announced November 2023.

    Comments: International Conference on Machine Learning, 2024

  4. arXiv:2311.18168 [pdf, other]

    cs.CV cs.LG eess.AS

    Probabilistic Speech-Driven 3D Facial Motion Synthesis: New Benchmarks, Methods, and Applications

    Authors: Karren D. Yang, Anurag Ranjan, Jen-Hao Rick Chang, Raviteja Vemulapalli, Oncel Tuzel

    Abstract: We consider the task of animating 3D facial geometry from speech signal. Existing works are primarily deterministic, focusing on learning a one-to-one mapping from speech signal to 3D face meshes on small datasets with limited speakers. While these models can achieve high-quality lip articulation for speakers in the training set, they are unable to capture the full and diverse distribution of 3D f…

    Submitted 29 November, 2023; originally announced November 2023.

  5. arXiv:2311.17049 [pdf, other]

    cs.CV cs.CL cs.LG

    MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training

    Authors: Pavan Kumar Anasosalu Vasu, Hadi Pouransari, Fartash Faghri, Raviteja Vemulapalli, Oncel Tuzel

    Abstract: Contrastive pretraining of image-text foundation models, such as CLIP, demonstrated excellent zero-shot performance and improved robustness on a wide range of downstream tasks. However, these models utilize large transformer-based encoders with significant memory and latency overhead which pose challenges for deployment on mobile devices. In this work, we introduce MobileCLIP -- a new family of ef…

    Submitted 1 April, 2024; v1 submitted 28 November, 2023; originally announced November 2023.

    Comments: CVPR 2024

  6. arXiv:2310.16226 [pdf, other]

    cs.CV cs.CL cs.LG

    TiC-CLIP: Continual Training of CLIP Models

    Authors: Saurabh Garg, Mehrdad Farajtabar, Hadi Pouransari, Raviteja Vemulapalli, Sachin Mehta, Oncel Tuzel, Vaishaal Shankar, Fartash Faghri

    Abstract: Keeping large foundation models up to date on latest data is inherently expensive. To avoid the prohibitive costs of constantly retraining, it is imperative to continually train these models. This problem is exacerbated by the lack of any large scale continual learning benchmarks or baselines. We introduce the first set of web-scale Time-Continual (TiC) benchmarks for training vision-language mode…

    Submitted 21 March, 2024; v1 submitted 24 October, 2023; originally announced October 2023.

    Comments: ICLR 2024

  7. arXiv:2310.15308 [pdf, other]

    cs.CV cs.LG

    SAM-CLIP: Merging Vision Foundation Models towards Semantic and Spatial Understanding

    Authors: Haoxiang Wang, Pavan Kumar Anasosalu Vasu, Fartash Faghri, Raviteja Vemulapalli, Mehrdad Farajtabar, Sachin Mehta, Mohammad Rastegari, Oncel Tuzel, Hadi Pouransari

    Abstract: The landscape of publicly available vision foundation models (VFMs), such as CLIP and Segment Anything Model (SAM), is expanding rapidly. VFMs are endowed with distinct capabilities stemming from their pre-training objectives. For instance, CLIP excels in semantic understanding, while SAM specializes in spatial understanding for segmentation. In this work, we introduce a simple recipe to efficient…

    Submitted 10 June, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

  8. arXiv:2310.14108 [pdf, other]

    cs.LG cs.AI cs.CV

    CLIP meets Model Zoo Experts: Pseudo-Supervision for Visual Enhancement

    Authors: Mohammadreza Salehi, Mehrdad Farajtabar, Maxwell Horton, Fartash Faghri, Hadi Pouransari, Raviteja Vemulapalli, Oncel Tuzel, Ali Farhadi, Mohammad Rastegari, Sachin Mehta

    Abstract: Contrastive language image pretraining (CLIP) is a standard method for training vision-language models. While CLIP is scalable, promptable, and robust to distribution shifts on image classification tasks, it lacks object localization capabilities. This paper studies the following question: Can we augment CLIP training with task-specific vision models from model zoos to improve its visual represent…

    Submitted 21 October, 2023; originally announced October 2023.

  9. arXiv:2309.10707 [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Corpus Synthesis for Zero-shot ASR domain Adaptation using Large Language Models

    Authors: Hsuan Su, Ting-Yao Hu, Hema Swetha Koppula, Raviteja Vemulapalli, Jen-Hao Rick Chang, Karren Yang, Gautam Varma Mantena, Oncel Tuzel

    Abstract: While Automatic Speech Recognition (ASR) systems are widely used in many real-world applications, they often do not generalize well to new domains and need to be finetuned on data from these domains. However, target-domain data usually are not readily available in many scenarios. In this paper, we propose a new strategy for adapting ASR models to new target domains without any text or speech from…

    Submitted 18 September, 2023; originally announced September 2023.

  10. arXiv:2309.05213 [pdf, other]

    cs.LG cs.AI cs.DC

    Towards Federated Learning Under Resource Constraints via Layer-wise Training and Depth Dropout

    Authors: Pengfei Guo, Warren Richard Morningstar, Raviteja Vemulapalli, Karan Singhal, Vishal M. Patel, Philip Andrew Mansfield

    Abstract: Large machine learning models trained on diverse data have recently seen unprecedented success. Federated learning enables training on private data that may otherwise be inaccessible, such as domain-specific datasets decentralized across many clients. However, federated learning can be difficult to scale to large models when clients have limited resources. This challenge often results in a trade-o…

    Submitted 10 September, 2023; originally announced September 2023.

  11. arXiv:2210.00092 [pdf, other]

    cs.LG cs.CV

    Federated Training of Dual Encoding Models on Small Non-IID Client Datasets

    Authors: Raviteja Vemulapalli, Warren Richard Morningstar, Philip Andrew Mansfield, Hubert Eichner, Karan Singhal, Arash Afkanpour, Bradley Green

    Abstract: Dual encoding models that encode a pair of inputs are widely used for representation learning. Many approaches train dual encoding models by maximizing agreement between pairs of encodings on centralized training data. However, in many scenarios, datasets are inherently decentralized across many clients (user devices or organizations) due to privacy concerns, motivating federated learning. In this…

    Submitted 10 April, 2023; v1 submitted 30 September, 2022; originally announced October 2022.

    Comments: ICLR 2023 Workshop on Pitfalls of Limited Data and Computation for Trustworthy ML

  12. arXiv:2104.12835 [pdf, other]

    cs.CV cs.AI cs.LG

    Less is more: Selecting informative and diverse subsets with balancing constraints

    Authors: Srikumar Ramalingam, Daniel Glasner, Kaushal Patel, Raviteja Vemulapalli, Sadeep Jayasumana, Sanjiv Kumar

    Abstract: Deep learning has yielded extraordinary results in vision and natural language processing, but this achievement comes at a cost. Most models require enormous resources during training, both in terms of computation and in human labeling effort. We show that we can identify informative and diverse subsets of data that lead to deep learning models with similar performance as the ones trained with the…

    Submitted 8 October, 2021; v1 submitted 26 April, 2021; originally announced April 2021.

    Comments: Added error bars to the experiments

  13. arXiv:2104.07608 [pdf, other]

    cs.CV

    Camera View Adjustment Prediction for Improving Image Composition

    Authors: Yu-Chuan Su, Raviteja Vemulapalli, Ben Weiss, Chun-Te Chu, Philip Andrew Mansfield, Lior Shapira, Colvin Pitts

    Abstract: Image composition plays an important role in the quality of a photo. However, not every camera user possesses the knowledge and expertise required for capturing well-composed photos. While post-capture cropping can improve the composition sometimes, it does not work in many common scenarios in which the photographer needs to adjust the camera view to capture the best shot. To address this issue, w…

    Submitted 15 April, 2021; originally announced April 2021.

  14. arXiv:2012.06985 [pdf, other]

    cs.CV cs.AI cs.LG

    Contrastive Learning for Label-Efficient Semantic Segmentation

    Authors: Xiangyun Zhao, Raviteja Vemulapalli, Philip Mansfield, Boqing Gong, Bradley Green, Lior Shapira, Ying Wu

    Abstract: Collecting labeled data for the task of semantic segmentation is expensive and time-consuming, as it requires dense pixel-level annotations. While recent Convolutional Neural Network (CNN) based semantic segmentation approaches have achieved impressive results by using large amounts of labeled training data, their performance drops significantly as the amount of labeled data decreases. This happen…

    Submitted 18 August, 2021; v1 submitted 13 December, 2020; originally announced December 2020.

    Comments: International Conference on Computer Vision (ICCV), 2021

  15. arXiv:2010.07811 [pdf, other]

    cs.CV

    Boosting Image-based Mutual Gaze Detection using Pseudo 3D Gaze

    Authors: Bardia Doosti, Ching-Hui Chen, Raviteja Vemulapalli, Xuhui Jia, Yukun Zhu, Bradley Green

    Abstract: Mutual gaze detection, i.e., predicting whether or not two people are looking at each other, plays an important role in understanding human interactions. In this work, we focus on the task of image-based mutual gaze detection, and propose a simple and effective approach to boost the performance by using an auxiliary 3D gaze estimation task during the training phase. We achieve the performance boos…

    Submitted 22 December, 2020; v1 submitted 15 October, 2020; originally announced October 2020.

  16. arXiv:2010.03019 [pdf, other]

    cs.CV cs.LG

    Global Self-Attention Networks for Image Recognition

    Authors: Zhuoran Shen, Irwan Bello, Raviteja Vemulapalli, Xuhui Jia, Ching-Hui Chen

    Abstract: Recently, a series of works in computer vision have shown promising results on various image and video understanding tasks using self-attention. However, due to the quadratic computational and memory complexities of self-attention, these works either apply attention only to low-resolution feature maps in later stages of a deep network or restrict the receptive field of attention in each layer to a…

    Submitted 14 October, 2020; v1 submitted 6 October, 2020; originally announced October 2020.

  17. arXiv:1911.09074 [pdf, other]

    cs.CV cs.LG

    Search to Distill: Pearls are Everywhere but not the Eyes

    Authors: Yu Liu, Xuhui Jia, Mingxing Tan, Raviteja Vemulapalli, Yukun Zhu, Bradley Green, Xiaogang Wang

    Abstract: Standard Knowledge Distillation (KD) approaches distill the knowledge of a cumbersome teacher model into the parameters of a student model with a pre-defined architecture. However, the knowledge of a neural network, which is represented by the network's output distribution conditioned on its input, depends not only on its parameters but also on its architecture. Hence, a more generalized approach…

    Submitted 16 March, 2020; v1 submitted 20 November, 2019; originally announced November 2019.

    Comments: Accepted as an oral presentation to CVPR 2020

  18. arXiv:1811.11283 [pdf, other]

    cs.CV cs.AI

    A Compact Embedding for Facial Expression Similarity

    Authors: Raviteja Vemulapalli, Aseem Agarwala

    Abstract: Most of the existing work on automatic facial expression analysis focuses on discrete emotion recognition, or facial action unit detection. However, facial expressions do not always fall neatly into pre-defined semantic categories. Also, the similarity between expressions measured in the action unit space need not correspond to how humans perceive expression similarity. Different from previous wor…

    Submitted 9 January, 2019; v1 submitted 27 November, 2018; originally announced November 2018.

  19. arXiv:1801.04590 [pdf, other]

    cs.CV cs.AI stat.ML

    Frame-Recurrent Video Super-Resolution

    Authors: Mehdi S. M. Sajjadi, Raviteja Vemulapalli, Matthew Brown

    Abstract: Recent advances in video super-resolution have shown that convolutional neural networks combined with motion compensation are able to merge information from multiple low-resolution (LR) frames to generate high-quality images. Current state-of-the-art methods process a batch of LR frames to generate a single high-resolution (HR) frame and run this scheme in a sliding window fashion over the entire…

    Submitted 25 March, 2018; v1 submitted 14 January, 2018; originally announced January 2018.

    Comments: Accepted at CVPR 2018

  20. arXiv:1702.01499 [pdf, other]

    cs.CV

    Designing Deep Convolutional Neural Networks for Continuous Object Orientation Estimation

    Authors: Kota Hara, Raviteja Vemulapalli, Rama Chellappa

    Abstract: Deep Convolutional Neural Networks (DCNN) have been proven to be effective for various computer vision problems. In this work, we demonstrate their effectiveness on a continuous object orientation estimation task, which requires prediction of 0 to 360 degrees orientation of the objects. We do so by proposing and comparing three continuous orientation prediction approaches designed for the DCNNs. The…

    Submitted 6 February, 2017; originally announced February 2017.

  21. arXiv:1511.04067 [pdf, other]

    cs.CV

    Deep Gaussian Conditional Random Field Network: A Model-based Deep Network for Discriminative Denoising

    Authors: Raviteja Vemulapalli, Oncel Tuzel, Ming-Yu Liu

    Abstract: We propose a novel deep network architecture for image denoising based on a Gaussian Conditional Random Field (GCRF) model. In contrast to the existing discriminative denoising methods that train a separate model for each noise level, the proposed deep network explicitly models the input noise variance and hence is capable of handling a range of noise levels. Our deep network, which we refer to…

    Submitted 12 November, 2015; originally announced November 2015.

    Comments: 10 pages, 5 figures

  22. arXiv:1501.02393 [pdf, ps, other]

    cs.CV cs.LG

    Riemannian Metric Learning for Symmetric Positive Definite Matrices

    Authors: Raviteja Vemulapalli, David W. Jacobs

    Abstract: Over the past few years, symmetric positive definite (SPD) matrices have been receiving considerable attention from the computer vision community. Though various distance measures have been proposed in the past for comparing SPD matrices, the two most widely-used measures are affine-invariant distance and log-Euclidean distance. This is because these two measures are true geodesic distances induced by…

    Submitted 10 January, 2015; originally announced January 2015.

  23. arXiv:1410.4470 [pdf, other]

    cs.CV cs.LG

    MKL-RT: Multiple Kernel Learning for Ratio-trace Problems via Convex Optimization

    Authors: Raviteja Vemulapalli, Vinay Praneeth Boda, Rama Chellappa

    Abstract: In the recent past, automatic selection or combination of kernels (or features) based on multiple kernel learning (MKL) approaches has been receiving significant attention from various research communities. Though MKL has been extensively studied in the context of support vector machines (SVM), it is relatively less explored for ratio-trace problems. In this paper, we show that MKL can be formulat…

    Submitted 17 October, 2014; v1 submitted 16 October, 2014; originally announced October 2014.