Showing 1–21 of 21 results for author: Fedorov, I

  1. arXiv:2407.06167  [pdf, other]

    cs.CV cs.LG

    DεpS: Delayed ε-Shrinking for Faster Once-For-All Training

    Authors: Aditya Annavajjala, Alind Khare, Animesh Agrawal, Igor Fedorov, Hugo Latapie, Myungjin Lee, Alexey Tumanov

    Abstract: CNNs are increasingly deployed across different hardware, dynamic environments, and low-power embedded devices. This has led to the design and training of CNN architectures with the goal of maximizing accuracy subject to such variable deployment constraints. As the number of deployment scenarios grows, there is a need to find scalable solutions to design and train specialized CNNs. Once-for-all tr…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: Accepted to the 18th European Conference on Computer Vision (ECCV 2024)

  2. arXiv:2405.16406  [pdf, other]

    cs.LG cs.AI cs.CL cs.CV

    SpinQuant: LLM quantization with learned rotations

    Authors: Zechun Liu, Changsheng Zhao, Igor Fedorov, Bilge Soran, Dhruv Choudhary, Raghuraman Krishnamoorthi, Vikas Chandra, Yuandong Tian, Tijmen Blankevoort

    Abstract: Post-training quantization (PTQ) techniques applied to weights, activations, and the KV cache greatly reduce memory usage, latency, and power consumption of Large Language Models (LLMs), but may lead to large quantization errors when outliers are present. Recent findings suggest that rotating activation or weight matrices helps remove outliers and benefits quantization. In this work, we identify a…

    Submitted 28 May, 2024; v1 submitted 25 May, 2024; originally announced May 2024.

  3. arXiv:2402.14905  [pdf, other]

    cs.LG cs.AI cs.CL

    MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases

    Authors: Zechun Liu, Changsheng Zhao, Forrest Iandola, Chen Lai, Yuandong Tian, Igor Fedorov, Yunyang Xiong, Ernie Chang, Yangyang Shi, Raghuraman Krishnamoorthi, Liangzhen Lai, Vikas Chandra

    Abstract: This paper addresses the growing need for efficient large language models (LLMs) on mobile devices, driven by increasing cloud costs and latency concerns. We focus on designing top-quality LLMs with fewer than a billion parameters, a practical choice for mobile deployment. Contrary to prevailing belief emphasizing the pivotal role of data and parameter quantity in determining model quality, our in…

    Submitted 26 June, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

    Comments: ICML 2024. Code is available at https://github.com/facebookresearch/MobileLLM

  4. arXiv:2311.13169  [pdf, other]

    cs.LG cs.AI

    SiGeo: Sub-One-Shot NAS via Information Theory and Geometry of Loss Landscape

    Authors: Hua Zheng, Kuang-Hung Liu, Igor Fedorov, Xin Zhang, Wen-Yen Chen, Wei Wen

    Abstract: Neural Architecture Search (NAS) has become a widely used tool for automating neural network design. While one-shot NAS methods have successfully reduced computational requirements, they often require extensive training. On the other hand, zero-shot NAS utilizes training-free proxies to evaluate a candidate architecture's test performance but has two limitations: (1) inability to use the informati…

    Submitted 22 November, 2023; originally announced November 2023.

    Comments: 24 pages, 7 figures

  5. arXiv:2311.08430  [pdf, other]

    cs.LG cs.AI cs.IR

    Rankitect: Ranking Architecture Search Battling World-class Engineers at Meta Scale

    Authors: Wei Wen, Kuang-Hung Liu, Igor Fedorov, Xin Zhang, Hang Yin, Weiwei Chu, Kaveh Hassani, Mengying Sun, Jiang Liu, Xu Wang, Lin Jiang, Yuxin Chen, Buyun Zhang, Xi Liu, Dehua Cheng, Zhengxing Chen, Guang Zhao, Fangqiu Han, Jiyan Yang, Yuchen Hao, Liang Xiong, Wen-Yen Chen

    Abstract: Neural Architecture Search (NAS) has demonstrated its efficacy in computer vision and potential for ranking systems. However, prior work focused on academic problems, which are evaluated at small scale under well-controlled fixed baselines. In industry system, such as ranking system in Meta, it is unclear whether NAS algorithms from the literature can outperform production baselines because of: (1…

    Submitted 13 November, 2023; originally announced November 2023.

    Comments: Wei Wen and Kuang-Hung Liu contribute equally

  6. arXiv:2311.00231  [pdf, other]

    cs.IR cs.LG

    DistDNAS: Search Efficient Feature Interactions within 2 Hours

    Authors: Tunhou Zhang, Wei Wen, Igor Fedorov, Xi Liu, Buyun Zhang, Fangqiu Han, Wen-Yen Chen, Yiping Han, Feng Yan, Hai Li, Yiran Chen

    Abstract: Search efficiency and serving efficiency are two major axes in building feature interactions and expediting the model development process in recommender systems. On large-scale benchmarks, searching for the optimal feature interaction design requires extensive cost due to the sequential workflow on the large volume of data. In addition, fusing interactions of various sources, orders, and mathemati…

    Submitted 31 October, 2023; originally announced November 2023.

  7. arXiv:2301.10999  [pdf, other]

    cs.LG cs.PF

    PerfSAGE: Generalized Inference Performance Predictor for Arbitrary Deep Learning Models on Edge Devices

    Authors: Yuji Chai, Devashree Tripathy, Chuteng Zhou, Dibakar Gope, Igor Fedorov, Ramon Matas, David Brooks, Gu-Yeon Wei, Paul Whatmough

    Abstract: The ability to accurately predict deep neural network (DNN) inference performance metrics, such as latency, power, and memory footprint, for an arbitrary DNN on a target hardware platform is essential to the design of DNN based models. This ability is critical for the (manual or automatic) design, optimization, and deployment of practical DNNs for a specific hardware deployment platform. Unfortuna…

    Submitted 26 January, 2023; originally announced January 2023.

  8. arXiv:2208.08562  [pdf, other]

    cs.CV cs.AI stat.ML

    Restructurable Activation Networks

    Authors: Kartikeya Bhardwaj, James Ward, Caleb Tung, Dibakar Gope, Lingchuan Meng, Igor Fedorov, Alex Chalfin, Paul Whatmough, Danny Loh

    Abstract: Is it possible to restructure the non-linear activation functions in a deep network to create hardware-efficient models? To address this question, we propose a new paradigm called Restructurable Activation Networks (RANs) that manipulate the amount of non-linearity in models to improve their hardware-awareness and efficiency. First, we propose RAN-explicit (RAN-e) -- a new hardware-aware search sp…

    Submitted 7 September, 2022; v1 submitted 17 August, 2022; originally announced August 2022.

    Comments: This work was presented at an Arm AI virtual tech talk. Video is available at https://www.youtube.com/watch?v=EUqFNE28Kq4

  9. arXiv:2202.13826  [pdf, ps, other]

    eess.AS cs.LG cs.SD

    Magnitude-aware Probabilistic Speaker Embeddings

    Authors: Nikita Kuzmin, Igor Fedorov, Alexey Sholokhov

    Abstract: Recently, hyperspherical embeddings have established themselves as a dominant technique for face and voice recognition. Specifically, Euclidean space vector embeddings are learned to encode person-specific information in their direction while ignoring the magnitude. However, recent studies have shown that the magnitudes of the embeddings extracted by deep neural networks may indicate the quality o…

    Submitted 23 October, 2022; v1 submitted 28 February, 2022; originally announced February 2022.

    Comments: Accepted to Odyssey 2022: The Speaker and Language Recognition Workshop, camera-ready version

  10. arXiv:2201.05842  [pdf, other]

    cs.LG

    UDC: Unified DNAS for Compressible TinyML Models

    Authors: Igor Fedorov, Ramon Matas, Hokchhay Tann, Chuteng Zhou, Matthew Mattina, Paul Whatmough

    Abstract: Deploying TinyML models on low-cost IoT hardware is very challenging, due to limited device memory capacity. Neural processing unit (NPU) hardware addresses the memory challenge by using model compression to exploit weight quantization and sparsity to fit more parameters in the same footprint. However, designing compressible neural networks (NNs) is challenging, as it expands the design space across…

    Submitted 5 January, 2023; v1 submitted 15 January, 2022; originally announced January 2022.

  11. arXiv:2010.11267  [pdf, other]

    cs.LG

    MicroNets: Neural Network Architectures for Deploying TinyML Applications on Commodity Microcontrollers

    Authors: Colby Banbury, Chuteng Zhou, Igor Fedorov, Ramon Matas Navarro, Urmish Thakker, Dibakar Gope, Vijay Janapa Reddi, Matthew Mattina, Paul N. Whatmough

    Abstract: Executing machine learning workloads locally on resource constrained microcontrollers (MCUs) promises to drastically expand the application space of IoT. However, so-called TinyML presents severe technical challenges, as deep neural network inference demands a large compute and memory budget. To address this challenge, neural architecture search (NAS) promises to help design accurate ML models tha…

    Submitted 12 April, 2021; v1 submitted 21 October, 2020; originally announced October 2020.

    Comments: 10 pages, 8 figures, 3 tables

  12. arXiv:2005.11394  [pdf, other]

    cs.LG stat.ML

    MANGO: A Python Library for Parallel Hyperparameter Tuning

    Authors: Sandeep Singh Sandha, Mohit Aggarwal, Igor Fedorov, Mani Srivastava

    Abstract: Tuning hyperparameters for machine learning algorithms is a tedious task, one that is typically done manually. To enable automated hyperparameter tuning, recent works have started to use techniques based on Bayesian optimization. However, to practically enable automated tuning for large scale machine learning training pipelines, significant gaps remain in existing libraries, including lack of abst…

    Submitted 22 May, 2020; originally announced May 2020.

    Comments: 5 pages, 3 figures, ICASSP Conference

  13. arXiv:2005.11138  [pdf, other]

    eess.AS cs.LG cs.SD stat.ML

    TinyLSTMs: Efficient Neural Speech Enhancement for Hearing Aids

    Authors: Igor Fedorov, Marko Stamenovic, Carl Jensen, Li-Chia Yang, Ari Mandell, Yiming Gan, Matthew Mattina, Paul N. Whatmough

    Abstract: Modern speech enhancement algorithms achieve remarkable noise suppression by means of large recurrent neural networks (RNNs). However, large RNNs limit practical deployment in hearing aid hardware (HW) form-factors, which are battery powered and run on resource-constrained microcontroller units (MCUs) with limited memory capacity and compute capability. In this work, we use model compression techn…

    Submitted 20 May, 2020; originally announced May 2020.

    Comments: First four authors contributed equally. For audio samples, see https://github.com/BoseCorp/efficient-neural-speech-enhancement

  14. arXiv:1910.02558  [pdf, other]

    cs.LG stat.ML

    Pushing the limits of RNN Compression

    Authors: Urmish Thakker, Igor Fedorov, Jesse Beu, Dibakar Gope, Chu Zhou, Ganesh Dasika, Matthew Mattina

    Abstract: Recurrent Neural Networks (RNN) can be difficult to deploy on resource constrained devices due to their size. As a result, there is a need for compression techniques that can significantly compress RNNs without negatively impacting task accuracy. This paper introduces a method to compress RNNs for resource constrained environments using Kronecker product (KP). KPs can compress RNN layers by 16-38x…

    Submitted 9 October, 2019; v1 submitted 4 October, 2019; originally announced October 2019.

    Comments: 6 pages. arXiv admin note: substantial text overlap with arXiv:1906.02876

    Journal ref: 5th edition of Workshop on Energy Efficient Machine Learning and Cognitive Computing at NeurIPS 2019

  15. arXiv:1906.02876  [pdf, other]

    cs.LG cs.NE stat.ML

    Compressing RNNs for IoT devices by 15-38x using Kronecker Products

    Authors: Urmish Thakker, Jesse Beu, Dibakar Gope, Chu Zhou, Igor Fedorov, Ganesh Dasika, Matthew Mattina

    Abstract: Recurrent Neural Networks (RNN) can be difficult to deploy on resource constrained devices due to their size. As a result, there is a need for compression techniques that can significantly compress RNNs without negatively impacting task accuracy. This paper introduces a method to compress RNNs for resource constrained environments using Kronecker product (KP). KPs can compress RNN layers by 15-38x…

    Submitted 31 January, 2020; v1 submitted 6 June, 2019; originally announced June 2019.

  16. arXiv:1905.12107  [pdf, ps, other]

    cs.LG cs.CV

    SpArSe: Sparse Architecture Search for CNNs on Resource-Constrained Microcontrollers

    Authors: Igor Fedorov, Ryan P. Adams, Matthew Mattina, Paul N. Whatmough

    Abstract: The vast majority of processors in the world are actually microcontroller units (MCUs), which find widespread use performing simple control tasks in applications ranging from automobiles to medical devices and office equipment. The Internet of Things (IoT) promises to inject machine learning into many of these every-day objects via tiny, cheap MCUs. However, these resource-impoverished hardware pl…

    Submitted 28 May, 2019; originally announced May 2019.

  17. arXiv:1804.03740  [pdf, other]

    stat.ML cs.LG

    Multimodal Sparse Bayesian Dictionary Learning

    Authors: Igor Fedorov, Bhaskar D. Rao

    Abstract: This paper addresses the problem of learning dictionaries for multimodal datasets, i.e. datasets collected from multiple data sources. We present an algorithm called multimodal sparse Bayesian dictionary learning (MSBDL). MSBDL leverages information from all available data modalities through a joint sparsity constraint. The underlying framework offers a considerable amount of flexibility to practi…

    Submitted 28 May, 2019; v1 submitted 10 April, 2018; originally announced April 2018.

  18. arXiv:1802.01616  [pdf, ps, other]

    cs.LG

    Re-Weighted Learning for Sparsifying Deep Neural Networks

    Authors: Igor Fedorov, Bhaskar D. Rao

    Abstract: This paper addresses the topic of sparsifying deep neural networks (DNN's). While DNN's are powerful models that achieve state-of-the-art performance on a large number of tasks, the large number of model parameters poses serious storage and computational challenges. To combat these difficulties, a growing line of work focuses on pruning network weights without sacrificing performance. We propose a…

    Submitted 5 February, 2018; originally announced February 2018.

  19. arXiv:1703.10645  [pdf, other]

    cs.CV

    Relevance Subject Machine: A Novel Person Re-identification Framework

    Authors: Igor Fedorov, Ritwik Giri, Bhaskar D. Rao, Truong Q. Nguyen

    Abstract: We propose a novel method called the Relevance Subject Machine (RSM) to solve the person re-identification (re-id) problem. RSM falls under the category of Bayesian sparse recovery algorithms and uses the sparse representation of the input video under a pre-defined dictionary to identify the subject in the video. Our approach focuses on the multi-shot re-id problem, which is the prevalent problem…

    Submitted 30 March, 2017; originally announced March 2017.

    Comments: Submitted to IEEE Transactions on Pattern Analysis and Machine Intelligence

  20. arXiv:1605.02057  [pdf, other]

    cs.CV

    Robust Bayesian Method for Simultaneous Block Sparse Signal Recovery with Applications to Face Recognition

    Authors: Igor Fedorov, Ritwik Giri, Bhaskar D. Rao, Truong Q. Nguyen

    Abstract: In this paper, we present a novel Bayesian approach to recover simultaneously block sparse signals in the presence of outliers. The key advantage of our proposed method is the ability to handle non-stationary outliers, i.e. outliers which have time varying support. We validate our approach with empirical results showing the superiority of the proposed method over competing approaches in synthetic…

    Submitted 10 May, 2016; v1 submitted 6 May, 2016; originally announced May 2016.

    Comments: To appear in ICIP 2016

  21. arXiv:1601.06207  [pdf, other]

    cs.LG stat.ML

    Rectified Gaussian Scale Mixtures and the Sparse Non-Negative Least Squares Problem

    Authors: Alican Nalci, Igor Fedorov, Maher Al-Shoukairi, Thomas T. Liu, Bhaskar D. Rao

    Abstract: In this paper, we develop a Bayesian evidence maximization framework to solve the sparse non-negative least squares (S-NNLS) problem. We introduce a family of probability densities referred to as the Rectified Gaussian Scale Mixture (R-GSM) to model the sparsity enforcing prior distribution for the solution. The R-GSM prior encompasses a variety of heavy-tailed densities such as the rectified Lap…

    Submitted 27 March, 2018; v1 submitted 22 January, 2016; originally announced January 2016.

    Comments: Under Review by IEEE Transactions on Signal Processing