Showing 1–17 of 17 results for author: Javaheripi, M

  1. arXiv:2404.14219 [pdf, other]

    cs.CL cs.AI

    Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

    Authors: Marah Abdin, Sam Ade Jacobs, Ammar Ahmad Awan, Jyoti Aneja, Ahmed Awadallah, Hany Awadalla, Nguyen Bach, Amit Bahree, Arash Bakhtiari, Jianmin Bao, Harkirat Behl, Alon Benhaim, Misha Bilenko, Johan Bjorck, Sébastien Bubeck, Qin Cai, Martin Cai, Caio César Teodoro Mendes, Weizhu Chen, Vishrav Chaudhary, Dong Chen, Dongdong Chen, Yen-Chun Chen, Yi-Ling Chen, Parul Chopra, et al. (90 additional authors not shown)

    Abstract: We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone. The innovation lies entirely in our dataset…

    Submitted 23 May, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: 19 pages

  2. arXiv:2306.11644 [pdf, other]

    cs.CL cs.AI cs.LG

    Textbooks Are All You Need

    Authors: Suriya Gunasekar, Yi Zhang, Jyoti Aneja, Caio César Teodoro Mendes, Allie Del Giorno, Sivakanth Gopi, Mojan Javaheripi, Piero Kauffmann, Gustavo de Rosa, Olli Saarikivi, Adil Salim, Shital Shah, Harkirat Singh Behl, Xin Wang, Sébastien Bubeck, Ronen Eldan, Adam Tauman Kalai, Yin Tat Lee, Yuanzhi Li

    Abstract: We introduce phi-1, a new large language model for code, with significantly smaller size than competing models: phi-1 is a Transformer-based model with 1.3B parameters, trained for 4 days on 8 A100s, using a selection of "textbook quality" data from the web (6B tokens) and synthetically generated textbooks and exercises with GPT-3.5 (1B tokens). Despite this small scale, phi-1 attains pass@1 accu…

    Submitted 2 October, 2023; v1 submitted 20 June, 2023; originally announced June 2023.

    Comments: 26 pages; changed color scheme of plot; fixed minor typos and added a couple of clarifications

  3. arXiv:2304.01441 [pdf, other]

    eess.IV cs.CR cs.CV

    NetFlick: Adversarial Flickering Attacks on Deep Learning Based Video Compression

    Authors: Jung-Woo Chang, Nojan Sheybani, Shehzeen Samarah Hussain, Mojan Javaheripi, Seira Hidano, Farinaz Koushanfar

    Abstract: Video compression plays a significant role in IoT devices for the efficient transport of visual data while satisfying all underlying bandwidth constraints. Deep learning-based video compression methods are rapidly replacing traditional algorithms and providing state-of-the-art results on edge devices. However, recently developed adversarial attacks demonstrate that digitally crafted perturbations…

    Submitted 3 April, 2023; originally announced April 2023.

    Comments: 8 pages; Accepted to ICLR 2023 ML4IoT workshop

  4. arXiv:2206.12100 [pdf, other]

    cs.LG cs.CR

    zPROBE: Zero Peek Robustness Checks for Federated Learning

    Authors: Zahra Ghodsi, Mojan Javaheripi, Nojan Sheybani, Xinqiao Zhang, Ke Huang, Farinaz Koushanfar

    Abstract: Privacy-preserving federated learning allows multiple users to jointly train a model with coordination of a central server. The server only learns the final aggregation result, thus the users' (private) training data is not leaked from the individual model updates. However, keeping the individual updates private allows malicious users to perform Byzantine attacks and degrade the accuracy without b…

    Submitted 5 September, 2023; v1 submitted 24 June, 2022; originally announced June 2022.

    Comments: ICCV 2023

  5. RoVISQ: Reduction of Video Service Quality via Adversarial Attacks on Deep Learning-based Video Compression

    Authors: Jung-Woo Chang, Mojan Javaheripi, Seira Hidano, Farinaz Koushanfar

    Abstract: Video compression plays a crucial role in video streaming and classification systems by maximizing the end-user quality of experience (QoE) at a given bandwidth budget. In this paper, we conduct the first systematic study for adversarial attacks on deep learning-based video compression and downstream classification systems. Our attack framework, dubbed RoVISQ, manipulates the Rate-Distortion (…

    Submitted 8 December, 2022; v1 submitted 18 March, 2022; originally announced March 2022.

    Comments: Accepted at NDSS 2023

  6. arXiv:2203.02094 [pdf, other]

    cs.LG cs.CL

    LiteTransformerSearch: Training-free Neural Architecture Search for Efficient Language Models

    Authors: Mojan Javaheripi, Gustavo H. de Rosa, Subhabrata Mukherjee, Shital Shah, Tomasz L. Religa, Caio C. T. Mendes, Sebastien Bubeck, Farinaz Koushanfar, Debadeepta Dey

    Abstract: The Transformer architecture is ubiquitously used as the building block of large-scale autoregressive language models. However, finding architectures with the optimal trade-off between task performance (perplexity) and hardware constraints like peak memory utilization and latency is non-trivial. This is exacerbated by the proliferation of various hardware. We leverage the somewhat surprising empir…

    Submitted 17 October, 2022; v1 submitted 3 March, 2022; originally announced March 2022.

  7. arXiv:2111.03985 [pdf]

    cs.LG cs.ET

    Machine Learning-Assisted E-jet Printing of Organic Flexible Biosensors

    Authors: Mehran Abbasi Shirsavar, Mehrnoosh Taghavimehr, Lionel J. Ouedraogo, Mojan Javaheripi, Nicole N. Hashemi, Farinaz Koushanfar, Reza Montazami

    Abstract: Electrohydrodynamic-jet (e-jet) printing technique enables the high-resolution printing of complex soft electronic devices. As such, it has an unmatched potential for becoming the conventional technique for printing soft electronic devices. In this study, the electrical conductivity of the e-jet printed circuits was studied as a function of key printing parameters (nozzle speed, ink flow rate, and…

    Submitted 6 November, 2021; originally announced November 2021.

  8. arXiv:2111.01932 [pdf, other]

    cs.CR cs.AI cs.LG

    HASHTAG: Hash Signatures for Online Detection of Fault-Injection Attacks on Deep Neural Networks

    Authors: Mojan Javaheripi, Farinaz Koushanfar

    Abstract: We propose HASHTAG, the first framework that enables high-accuracy detection of fault-injection attacks on Deep Neural Networks (DNNs) with provable bounds on detection performance. Recent literature in fault-injection attacks shows the severe DNN accuracy degradation caused by bit flips. In this scenario, the attacker changes a few weight bits during DNN execution by tampering with the program's…

    Submitted 2 November, 2021; originally announced November 2021.

  9. arXiv:2109.02836 [pdf, other]

    cs.LG

    Trojan Signatures in DNN Weights

    Authors: Greg Fields, Mohammad Samragh, Mojan Javaheripi, Farinaz Koushanfar, Tara Javidi

    Abstract: Deep neural networks have been shown to be vulnerable to backdoor, or trojan, attacks where an adversary has embedded a trigger in the network at training time such that the model correctly classifies all standard inputs, but generates a targeted, incorrect classification on any input which contains the trigger. In this paper, we present the first ultra light-weight and highly effective trojan det…

    Submitted 6 September, 2021; originally announced September 2021.

    Comments: 8 pages, 13 figures

  10. arXiv:2009.02326 [pdf, other]

    cs.LG cs.AR cs.CR cs.CV stat.ML

    CLEANN: Accelerated Trojan Shield for Embedded Neural Networks

    Authors: Mojan Javaheripi, Mohammad Samragh, Gregory Fields, Tara Javidi, Farinaz Koushanfar

    Abstract: We propose CLEANN, the first end-to-end framework that enables online mitigation of Trojans for embedded Deep Neural Network (DNN) applications. A Trojan attack works by injecting a backdoor in the DNN while training; during inference, the Trojan can be activated by the specific backdoor trigger. What differentiates CLEANN from the prior work is its lightweight methodology which recovers the groun…

    Submitted 4 September, 2020; originally announced September 2020.

  11. arXiv:2007.00051 [pdf, other]

    cs.LG stat.ML

    Extracurricular Learning: Knowledge Transfer Beyond Empirical Distribution

    Authors: Hadi Pouransari, Mojan Javaheripi, Vinay Sharma, Oncel Tuzel

    Abstract: Knowledge distillation has been used to transfer knowledge learned by a sophisticated model (teacher) to a simpler model (student). This technique is widely used to compress model complexity. However, in most applications the compressed student model suffers from an accuracy gap with its teacher. We propose extracurricular learning, a novel knowledge distillation method, that bridges this gap by (…

    Submitted 20 November, 2020; v1 submitted 30 June, 2020; originally announced July 2020.

  12. arXiv:2004.04249 [pdf, other]

    cs.LG cs.CV cs.NE stat.ML

    GeneCAI: Genetic Evolution for Acquiring Compact AI

    Authors: Mojan Javaheripi, Mohammad Samragh, Tara Javidi, Farinaz Koushanfar

    Abstract: In the contemporary big data realm, Deep Neural Networks (DNNs) are evolving towards more complex architectures to achieve higher inference accuracy. Model compression techniques can be leveraged to efficiently deploy such compute-intensive architectures on resource-limited mobile devices. Such methods comprise various hyper-parameters that require per-layer customization to ensure high accuracy.…

    Submitted 14 April, 2020; v1 submitted 8 April, 2020; originally announced April 2020.

  13. FastWave: Accelerating Autoregressive Convolutional Neural Networks on FPGA

    Authors: Shehzeen Hussain, Mojan Javaheripi, Paarth Neekhara, Ryan Kastner, Farinaz Koushanfar

    Abstract: Autoregressive convolutional neural networks (CNNs) have been widely exploited for sequence generation tasks such as audio synthesis, language modeling and neural machine translation. WaveNet is a deep autoregressive CNN composed of several stacked layers of dilated convolution that is used for sequence generation. While WaveNet produces state-of-the-art audio generation results, the naive inferen…

    Submitted 9 February, 2020; originally announced February 2020.

    Comments: Published as a conference paper at ICCAD 2019

    Journal ref: IEEE/ACM 2019 International Conference On Computer Aided Design (ICCAD), November 2019

  14. arXiv:1911.06471 [pdf, other]

    cs.LG cs.NE stat.ML

    ASCAI: Adaptive Sampling for acquiring Compact AI

    Authors: Mojan Javaheripi, Mohammad Samragh, Tara Javidi, Farinaz Koushanfar

    Abstract: This paper introduces ASCAI, a novel adaptive sampling methodology that can learn how to effectively compress Deep Neural Networks (DNNs) for accelerated inference on resource-constrained platforms. Modern DNN compression techniques comprise various hyperparameters that require per-layer customization to ensure high accuracy. Choosing such hyperparameters is cumbersome as the pertinent search spac…

    Submitted 14 November, 2019; originally announced November 2019.

  15. arXiv:1904.04862 [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    SWNet: Small-World Neural Networks and Rapid Convergence

    Authors: Mojan Javaheripi, Bita Darvish Rouhani, Farinaz Koushanfar

    Abstract: Training large and highly accurate deep learning (DL) models is computationally costly. This cost is in great part due to the excessive number of trained parameters, which are well-known to be redundant and compressible for the execution phase. This paper proposes a novel transformation which changes the topology of the DL architecture such that it reaches an optimal cross-layer connectivity. This…

    Submitted 9 April, 2019; originally announced April 2019.

  16. arXiv:1901.05582 [pdf, other]

    cs.LG stat.ML

    CodeX: Bit-Flexible Encoding for Streaming-based FPGA Acceleration of DNNs

    Authors: Mohammad Samragh, Mojan Javaheripi, Farinaz Koushanfar

    Abstract: This paper proposes CodeX, an end-to-end framework that facilitates encoding, bitwidth customization, fine-tuning, and implementation of neural networks on FPGA platforms. CodeX incorporates nonlinear encoding to the computation flow of neural networks to save memory. The encoded features demand significantly lower storage compared to the raw full-precision activation values; therefore, the execut…

    Submitted 16 January, 2019; originally announced January 2019.

  17. arXiv:1709.02538 [pdf, other]

    cs.CR cs.LG stat.ML

    DeepFense: Online Accelerated Defense Against Adversarial Deep Learning

    Authors: Bita Darvish Rouhani, Mohammad Samragh, Mojan Javaheripi, Tara Javidi, Farinaz Koushanfar

    Abstract: Recent advances in adversarial Deep Learning (DL) have opened up a largely unexplored surface for malicious attacks jeopardizing the integrity of autonomous DL systems. With the wide-spread usage of DL in critical and time-sensitive applications, including unmanned vehicles, drones, and video surveillance systems, online detection of malicious inputs is of utmost importance. We propose DeepFense,…

    Submitted 20 August, 2018; v1 submitted 8 September, 2017; originally announced September 2017.

    Comments: Adding hardware acceleration for real-time execution of defender modules