
Showing 1–19 of 19 results for author: Awan, A A

  1. arXiv:2404.14219

    cs.CL cs.AI

    Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

    Authors: Marah Abdin, Sam Ade Jacobs, Ammar Ahmad Awan, Jyoti Aneja, Ahmed Awadallah, Hany Awadalla, Nguyen Bach, Amit Bahree, Arash Bakhtiari, Jianmin Bao, Harkirat Behl, Alon Benhaim, Misha Bilenko, Johan Bjorck, Sébastien Bubeck, Qin Cai, Martin Cai, Caio César Teodoro Mendes, Weizhu Chen, Vishrav Chaudhary, Dong Chen, Dongdong Chen, Yen-Chun Chen, Yi-Ling Chen, Parul Chopra, et al. (90 additional authors not shown)

    Abstract: We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone. The innovation lies entirely in our dataset…

    Submitted 23 May, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: 19 pages

  2. arXiv:2401.08671

    cs.PF cs.LG

    DeepSpeed-FastGen: High-throughput Text Generation for LLMs via MII and DeepSpeed-Inference

    Authors: Connor Holmes, Masahiro Tanaka, Michael Wyatt, Ammar Ahmad Awan, Jeff Rasley, Samyam Rajbhandari, Reza Yazdani Aminabadi, Heyang Qin, Arash Bakhtiari, Lev Kurilenko, Yuxiong He

    Abstract: The deployment and scaling of large language models (LLMs) have become critical as they permeate various applications, demanding high-throughput and low-latency serving systems. Existing frameworks struggle to balance these requirements, especially for workloads with long prompts. This paper introduces DeepSpeed-FastGen, a system that employs Dynamic SplitFuse, a novel prompt and generation compos…

    Submitted 9 January, 2024; originally announced January 2024.

  3. arXiv:2310.04610

    cs.AI cs.LG

    DeepSpeed4Science Initiative: Enabling Large-Scale Scientific Discovery through Sophisticated AI System Technologies

    Authors: Shuaiwen Leon Song, Bonnie Kruft, Minjia Zhang, Conglong Li, Shiyang Chen, Chengming Zhang, Masahiro Tanaka, Xiaoxia Wu, Jeff Rasley, Ammar Ahmad Awan, Connor Holmes, Martin Cai, Adam Ghanem, Zhongzhu Zhou, Yuxiong He, Pete Luferenko, Divya Kumar, Jonathan Weyn, Ruixiong Zhang, Sylwester Klocek, Volodymyr Vragov, Mohammed AlQuraishi, Gustaf Ahdritz, Christina Floristean, Cristina Negri, et al. (67 additional authors not shown)

    Abstract: In the upcoming decade, deep learning may revolutionize the natural sciences, enhancing our capacity to model and predict natural occurrences. This could herald a new era of scientific exploration, bringing significant advancements across sectors from drug development to renewable energy. To answer this call, we present DeepSpeed4Science initiative (deepspeed4science.ai) which aims to build unique…

    Submitted 11 October, 2023; v1 submitted 6 October, 2023; originally announced October 2023.

  4. arXiv:2309.14327

    cs.CV cs.CL

    DeepSpeed-VisualChat: Multi-Round Multi-Image Interleave Chat via Multi-Modal Causal Attention

    Authors: Zhewei Yao, Xiaoxia Wu, Conglong Li, Minjia Zhang, Heyang Qin, Olatunji Ruwase, Ammar Ahmad Awan, Samyam Rajbhandari, Yuxiong He

    Abstract: Most of the existing multi-modal models, hindered by their incapacity to adeptly manage interleaved image-and-text inputs in multi-image, multi-round dialogues, face substantial constraints in resource allocation for training and data accessibility, impacting their adaptability and scalability across varied interaction realms. To address this, we present the DeepSpeed-VisualChat framework, designe…

    Submitted 29 November, 2023; v1 submitted 25 September, 2023; originally announced September 2023.

  5. arXiv:2308.01320

    cs.LG cs.AI cs.CL

    DeepSpeed-Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales

    Authors: Zhewei Yao, Reza Yazdani Aminabadi, Olatunji Ruwase, Samyam Rajbhandari, Xiaoxia Wu, Ammar Ahmad Awan, Jeff Rasley, Minjia Zhang, Conglong Li, Connor Holmes, Zhongzhu Zhou, Michael Wyatt, Molly Smith, Lev Kurilenko, Heyang Qin, Masahiro Tanaka, Shuai Che, Shuaiwen Leon Song, Yuxiong He

    Abstract: ChatGPT-like models have revolutionized various applications in artificial intelligence, from summarization and coding to translation, matching or even surpassing human performance. However, the current landscape lacks an accessible, efficient, and cost-effective end-to-end RLHF (Reinforcement Learning with Human Feedback) training pipeline for these powerful models, particularly when training at…

    Submitted 2 August, 2023; originally announced August 2023.

    Comments: 14 pages, 7 figures

  6. arXiv:2303.08374

    cs.DC cs.LG

    MCR-DL: Mix-and-Match Communication Runtime for Deep Learning

    Authors: Quentin Anthony, Ammar Ahmad Awan, Jeff Rasley, Yuxiong He, Aamir Shafi, Mustafa Abduljabbar, Hari Subramoni, Dhabaleswar Panda

    Abstract: In recent years, the training requirements of many state-of-the-art Deep Learning (DL) models have scaled beyond the compute and memory capabilities of a single processor, and necessitated distribution among processors. Training such massive models necessitates advanced parallelism strategies to maintain efficiency. However, such distributed DL parallelism strategies require a varied mixture of co…

    Submitted 15 March, 2023; originally announced March 2023.

    Comments: Accepted, to be presented at IPDPS 2023

  7. arXiv:2303.06318

    cs.LG cs.AI cs.DC cs.PF

    A Hybrid Tensor-Expert-Data Parallelism Approach to Optimize Mixture-of-Experts Training

    Authors: Siddharth Singh, Olatunji Ruwase, Ammar Ahmad Awan, Samyam Rajbhandari, Yuxiong He, Abhinav Bhatele

    Abstract: Mixture-of-Experts (MoE) is a neural network architecture that adds sparsely activated expert blocks to a base model, increasing the number of parameters without impacting computational costs. However, current distributed deep learning frameworks are limited in their ability to train high-quality MoE models with large base models. In this work, we present DeepSpeed-TED, a novel, three-dimensional,…

    Submitted 13 May, 2023; v1 submitted 11 March, 2023; originally announced March 2023.

  8. arXiv:2207.00032

    cs.LG cs.DC cs.PF

    DeepSpeed Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale

    Authors: Reza Yazdani Aminabadi, Samyam Rajbhandari, Minjia Zhang, Ammar Ahmad Awan, Cheng Li, Du Li, Elton Zheng, Jeff Rasley, Shaden Smith, Olatunji Ruwase, Yuxiong He

    Abstract: The past several years have witnessed the success of transformer-based models, and their scale and application scenarios continue to grow aggressively. The current landscape of transformer models is increasingly diverse: the model size varies drastically with the largest being of hundred-billion parameters; the model characteristics differ due to the sparsity introduced by the Mixture-of-Experts;…

    Submitted 30 June, 2022; originally announced July 2022.

  9. arXiv:2201.05596

    cs.LG cs.AI cs.DC

    DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale

    Authors: Samyam Rajbhandari, Conglong Li, Zhewei Yao, Minjia Zhang, Reza Yazdani Aminabadi, Ammar Ahmad Awan, Jeff Rasley, Yuxiong He

    Abstract: As the training of giant dense models hits the boundary on the availability and capability of the hardware resources today, Mixture-of-Experts (MoE) models become one of the most promising model architectures due to their significant training cost reduction compared to a quality-equivalent dense model. Its training cost saving is demonstrated from encoder-decoder models (prior works) to a 5x savin…

    Submitted 21 July, 2022; v1 submitted 14 January, 2022; originally announced January 2022.

    Comments: This paper is published at ICML 2022: https://proceedings.mlr.press/v162/rajbhandari22a

  10. arXiv:2109.10465

    cs.CL cs.AI cs.LG

    Scalable and Efficient MoE Training for Multitask Multilingual Models

    Authors: Young Jin Kim, Ammar Ahmad Awan, Alexandre Muzio, Andres Felipe Cruz Salinas, Liyang Lu, Amr Hendy, Samyam Rajbhandari, Yuxiong He, Hany Hassan Awadalla

    Abstract: The Mixture of Experts (MoE) models are an emerging class of sparsely activated deep learning models that have sublinear compute costs with respect to their parameters. In contrast with dense models, the sparse architecture of MoE offers opportunities for drastically growing model size with significant accuracy gain while consuming much lower compute budget. However, supporting large scale MoE tra…

    Submitted 21 September, 2021; originally announced September 2021.

  11. arXiv:2104.06069

    cs.LG cs.DC

    1-bit LAMB: Communication Efficient Large-Scale Large-Batch Training with LAMB's Convergence Speed

    Authors: Conglong Li, Ammar Ahmad Awan, Hanlin Tang, Samyam Rajbhandari, Yuxiong He

    Abstract: To train large models (like BERT and GPT-3) on hundreds of GPUs, communication has become a major bottleneck, especially on commodity systems with limited-bandwidth TCP network. On one side large batch-size optimization such as LAMB algorithm was proposed to reduce the frequency of communication. On the other side, communication compression algorithms such as 1-bit Adam help to reduce the volume o…

    Submitted 5 October, 2021; v1 submitted 13 April, 2021; originally announced April 2021.

  12. arXiv:2102.02888

    cs.LG cs.DC

    1-bit Adam: Communication Efficient Large-Scale Training with Adam's Convergence Speed

    Authors: Hanlin Tang, Shaoduo Gan, Ammar Ahmad Awan, Samyam Rajbhandari, Conglong Li, Xiangru Lian, Ji Liu, Ce Zhang, Yuxiong He

    Abstract: Scalable training of large models (like BERT and GPT-3) requires careful optimization rooted in model design, architecture, and system capabilities. From a system standpoint, communication has become a major bottleneck, especially on commodity systems with standard TCP interconnects that offer limited network bandwidth. Communication compression is an important technique to reduce training time on…

    Submitted 29 June, 2021; v1 submitted 4 February, 2021; originally announced February 2021.

    Comments: arXiv admin note: text overlap with arXiv:2008.11343

  13. arXiv:1911.05146

    cs.DC cs.AI cs.LG cs.PF

    HyPar-Flow: Exploiting MPI and Keras for Scalable Hybrid-Parallel DNN Training using TensorFlow

    Authors: Ammar Ahmad Awan, Arpan Jain, Quentin Anthony, Hari Subramoni, Dhabaleswar K. Panda

    Abstract: To reduce training time of large-scale DNNs, scientists have started to explore parallelization strategies like data-parallelism, model-parallelism, and hybrid-parallelism. While data-parallelism has been extensively studied and developed, several problems exist in realizing model-parallelism and hybrid-parallelism efficiently. Four major problems we focus on are: 1) defining a notion of a distrib…

    Submitted 19 February, 2020; v1 submitted 12 November, 2019; originally announced November 2019.

    Comments: 18 pages, 10 figures, Accepted, to be presented at ISC '20

  14. Scalable Distributed DNN Training using TensorFlow and CUDA-Aware MPI: Characterization, Designs, and Performance Evaluation

    Authors: Ammar Ahmad Awan, Jeroen Bedorf, Ching-Hsiang Chu, Hari Subramoni, Dhabaleswar K. Panda

    Abstract: TensorFlow has been the most widely adopted Machine/Deep Learning framework. However, little exists in the literature that provides a thorough understanding of the capabilities which TensorFlow offers for the distributed training of large ML/DL models that need computation and communication at scale. Most commonly used distributed training approaches for TF can be categorized as follows: 1) Google…

    Submitted 25 October, 2018; originally announced October 2018.

    Comments: 10 pages, 9 figures, submitted to IEEE IPDPS 2019 for peer-review

    Journal ref: IEEE CCGrid, 2019

  15. arXiv:1808.00878

    cs.LG cs.CV eess.SP stat.ML

    Supervised classification for object identification in urban areas using satellite imagery

    Authors: Hazrat Ali, Adnan Ali Awan, Sanaullah Khan, Omer Shafique, Atiq ur Rahman, Shahid Khan

    Abstract: This paper presents a useful method to achieve classification in satellite imagery. The approach is based on pixel level study employing various features such as correlation, homogeneity, energy and contrast. In this study gray-scale images are used for training the classification model. For supervised classification, two classification techniques are employed namely the Support Vector Machine (SV…

    Submitted 2 August, 2018; originally announced August 2018.

    Comments: 2018 International Conference on Computing, Mathematics and Engineering Technologies (iCoMET)

    Journal ref: H. Ali et al., 2018 International Conference on Computing, Mathematics and Engineering Technologies (iCoMET), Sukkur, 2018, pp. 1-4

  16. arXiv:1707.09414

    cs.DC

    Optimized Broadcast for Deep Learning Workloads on Dense-GPU InfiniBand Clusters: MPI or NCCL?

    Authors: Ammar Ahmad Awan, Ching-Hsiang Chu, Hari Subramoni, Dhabaleswar K. Panda

    Abstract: Dense Multi-GPU systems have recently gained a lot of attention in the HPC arena. Traditionally, MPI runtimes have been primarily designed for clusters with a large number of nodes. However, with the advent of MPI+CUDA applications and CUDA-Aware MPI runtimes like MVAPICH2 and OpenMPI, it has become important to address efficient communication schemes for such dense Multi-GPU nodes. This coupled w…

    Submitted 28 July, 2017; originally announced July 2017.

    Comments: 8 pages, 3 figures

  17. arXiv:1309.4374

    cs.NI

    An Energy Efficient Decoding Scheme for Wireless Body Area Sensor Networks

    Authors: O. Rehman, N. Javaid, A. Haider, N. Amjad, A. A. Awan, M. Qamar, Z. A. Khan, U. Qasim

    Abstract: One of the major challenges in Wireless Body Area Networks (WBANs) is to prolong the lifetime of network. Traditional research work focuses on minimizing transmit power; however, in the case of short range communication the consumption power in decoding is significantly larger than transmit power. This paper investigates the minimization of total power consumption by reducing the decoding power co…

    Submitted 17 September, 2013; originally announced September 2013.

    Comments: Journal of Basic and Applied Scientific Research. 2013. arXiv admin note: substantial text overlap with arXiv:1309.0752

  18. DREEM-ME: Distributed Regional Energy Efficient Multi-hop Routing Protocol based on Maximum Energy in WSNs

    Authors: N. Amjad, N. Javaid, A. Haider, A. A. Awan, M. Rahman

    Abstract: Wireless distributed sensor network consists of randomly deployed sensors having low energy assets. These networks can be used for monitoring a variety of environments. Major problems of these networks are energy constraints and their finite lifetimes. To overcome these problems different routing protocols and clustering techniques are introduced. We propose DREEM-ME which uses a unique technique…

    Submitted 26 July, 2013; originally announced July 2013.

    Comments: IEEE 8th International Conference on Broadband and Wireless Computing, Communication and Applications (BWCCA'13), Compiegne, France

  19. REECH-ME: Regional Energy Efficient Cluster Heads based on Maximum Energy Routing Protocol for WSNs

    Authors: A. Haider, N. Javaid, N. Amjad, A. A. Awan, A. Khan, N. Khan

    Abstract: In this paper, we propose Regional Energy Efficient Cluster Heads based on Maximum Energy (REECH-ME) Routing Protocol for Wireless Sensor Networks (WSNs). The main purpose of this protocol is to improve the network lifetime and particularly the stability period of the network. In REECH-ME, the node with the maximum energy in a region becomes Cluster Head (CH) of that region for that particular ro…

    Submitted 26 July, 2013; originally announced July 2013.

    Comments: IEEE 8th International Conference on Broadband and Wireless Computing, Communication and Applications (BWCCA'13), Compiegne, France