Showing 1–17 of 17 results for author: Zhang, S Q

  1. arXiv:2405.19751

    cs.CV cs.AI

    HQ-DiT: Efficient Diffusion Transformer with FP4 Hybrid Quantization

    Authors: Wenxuan Liu, Sai Qian Zhang

    Abstract: Diffusion Transformers (DiTs) have recently gained substantial attention in both industrial and academic fields for their superior visual generation capabilities, outperforming traditional diffusion models that use U-Net. However, the enhanced performance of DiTs also comes with high parameter counts and implementation costs, seriously restricting their use on resource-limited devices such as mobil…

    Submitted 31 May, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

  2. arXiv:2404.05182

    cs.LG cs.AI cs.CL cs.DC

    DLoRA: Distributed Parameter-Efficient Fine-Tuning Solution for Large Language Model

    Authors: Chao Gao, Sai Qian Zhang

    Abstract: To enhance the performance of large language models (LLMs) on downstream tasks, one solution is to fine-tune certain LLM parameters so that the model better aligns with the characteristics of the training dataset. This process is commonly known as parameter-efficient fine-tuning (PEFT). Due to the scale of LLMs, PEFT operations are usually executed in a public environment (e.g., a cloud server). This neces…

    Submitted 8 April, 2024; originally announced April 2024.

  3. arXiv:2403.14608

    cs.LG

    Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey

    Authors: Zeyu Han, Chao Gao, Jinyang Liu, Jeff Zhang, Sai Qian Zhang

    Abstract: Large models represent a groundbreaking advancement in multiple application fields, enabling remarkable achievements across various tasks. However, their unprecedented scale comes with significant computational costs. These models, often consisting of billions of parameters, require vast amounts of computational resources for execution. Especially, the expansive scale and computational demands pos…

    Submitted 12 July, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

    Comments: 42 pages, 12 figures. Due to word limit, the abstract here is truncated. The full abstract is available in the PDF

  4. arXiv:2311.17218

    cs.CV cs.LG

    BIM: Block-Wise Self-Supervised Learning with Masked Image Modeling

    Authors: Yixuan Luo, Mengye Ren, Sai Qian Zhang

    Abstract: Like masked language modeling (MLM) in natural language processing, masked image modeling (MIM) aims to extract valuable insights from image patches to enhance the feature extraction capabilities of the underlying deep neural network (DNN). In contrast to other training paradigms such as supervised learning and unsupervised contrastive learning, MIM pretraining typically dema…

    Submitted 28 November, 2023; originally announced November 2023.

  5. arXiv:2311.13290

    cs.AR

    Softmax Acceleration with Adaptive Numeric Format for both Training and Inference

    Authors: Tianhua Xia, Sai Qian Zhang

    Abstract: The attention mechanism is a pivotal element within the Transformer architecture, making a substantial contribution to its exceptional performance. Within this attention mechanism, Softmax is an imperative component that enables the model to assess the degree of correlation between various segments of the input. Yet, prior research has shown that Softmax operations can significantly increase proce…

    Submitted 22 November, 2023; originally announced November 2023.

  6. arXiv:2305.03148

    cs.AR cs.LG cs.NE

    CAMEL: Co-Designing AI Models and Embedded DRAMs for Efficient On-Device Learning

    Authors: Sai Qian Zhang, Thierry Tambe, Nestor Cuevas, Gu-Yeon Wei, David Brooks

    Abstract: On-device learning allows AI models to adapt to user data, thereby enhancing service quality on edge platforms. However, training AI on resource-limited devices poses significant challenges due to the demanding computing workload and the substantial memory consumption and data access required by deep neural networks (DNNs). To address these issues, we propose utilizing embedded dynamic random-acce…

    Submitted 22 December, 2023; v1 submitted 4 May, 2023; originally announced May 2023.

  7. arXiv:2207.09413

    cs.LG cs.AI cs.CV cs.DC

    SphereFed: Hyperspherical Federated Learning

    Authors: Xin Dong, Sai Qian Zhang, Ang Li, H. T. Kung

    Abstract: Federated Learning aims at training a global model from multiple decentralized devices (i.e., clients) without exchanging their private local data. A key challenge is the handling of non-i.i.d. (not independent and identically distributed) data across multiple clients, which may induce disparities in their local features. We introduce the Hyperspherical Federated Learning (SphereFed) framework to address the…

    Submitted 19 July, 2022; originally announced July 2022.

    Comments: European Conference on Computer Vision 2022

  8. arXiv:2201.02932

    cs.LG cs.AI

    A Multi-agent Reinforcement Learning Approach for Efficient Client Selection in Federated Learning

    Authors: Sai Qian Zhang, Jieyu Lin, Qi Zhang

    Abstract: Federated learning (FL) is a training technique that enables client devices to jointly learn a shared model by aggregating locally computed models without exposing their raw data. While most existing work focuses on improving FL model accuracy, in this paper we focus on improving training efficiency, which is often a hurdle for adopting FL in real-world applications. Specifical…

    Submitted 9 January, 2022; originally announced January 2022.

    Comments: To appear at AAAI 2022

  9. arXiv:2110.15456

    cs.LG cs.AR

    FAST: DNN Training Under Variable Precision Block Floating Point with Stochastic Rounding

    Authors: Sai Qian Zhang, Bradley McDanel, H. T. Kung

    Abstract: Block Floating Point (BFP) can efficiently support quantization for Deep Neural Network (DNN) training by providing a wide dynamic range via a shared exponent across a group of values. In this paper, we propose a Fast First, Accurate Second Training (FAST) system for DNNs, where the weights, activations, and gradients are represented in BFP. FAST supports matrix multiplication with variable precis…

    Submitted 28 October, 2021; originally announced October 2021.

  10. arXiv:2010.14391

    cs.AI cs.LG cs.MA

    Succinct and Robust Multi-Agent Communication With Temporal Message Control

    Authors: Sai Qian Zhang, Jieyu Lin, Qi Zhang

    Abstract: Recent studies have shown that introducing communication between agents can significantly improve overall performance in cooperative multi-agent reinforcement learning (MARL). However, existing communication schemes often require agents to exchange an excessive number of messages at run-time under a reliable communication channel, which hinders their practicality in many real-world situations. In th…

    Submitted 24 December, 2020; v1 submitted 27 October, 2020; originally announced October 2020.

  11. arXiv:2007.06389

    cs.CV cs.LG

    Term Revealing: Furthering Quantization at Run Time on Quantized DNNs

    Authors: H. T. Kung, Bradley McDanel, Sai Qian Zhang

    Abstract: We present a novel technique, called Term Revealing (TR), for furthering quantization at run time for improved performance of Deep Neural Networks (DNNs) already quantized with conventional quantization methods. TR operates on power-of-two terms in binary expressions of values. In computing a dot product, TR dynamically selects a fixed number of largest terms to use from the values of…

    Submitted 26 July, 2020; v1 submitted 13 July, 2020; originally announced July 2020.

    Comments: 13 pages, 19 figures, 4 tables. To appear in Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2020. Update: Revised writing/figures and added more references for Section IV. Update: Revised Section IV writing/figures and added additional references on signed digit representations.

  12. arXiv:2003.03722

    cs.LG cs.CR stat.ML

    On the Robustness of Cooperative Multi-Agent Reinforcement Learning

    Authors: Jieyu Lin, Kristina Dzeparoska, Sai Qian Zhang, Alberto Leon-Garcia, Nicolas Papernot

    Abstract: In cooperative multi-agent reinforcement learning (c-MARL), agents learn to cooperatively take actions as a team to maximize a total team reward. We analyze the robustness of c-MARL to adversaries capable of attacking one of the agents on a team. Through the ability to manipulate this agent's observations, the adversary seeks to decrease the total team reward. Attacking c-MARL is challenging for…

    Submitted 8 March, 2020; originally announced March 2020.

  13. arXiv:1912.02057

    cs.LG eess.SP

    RTN: Reparameterized Ternary Network

    Authors: Yuhang Li, Xin Dong, Sai Qian Zhang, Haoli Bai, Yuanpeng Chen, Wei Wang

    Abstract: To deploy deep neural networks on resource-limited devices, quantization has been widely explored. In this work, we study extremely low-bit networks, which offer tremendous speed-up and memory savings with quantized activations and weights. We first raise three overlooked issues in extremely low-bit networks: the squashing range of quantized values; the gradient vanishing during backpropagation and t…

    Submitted 12 December, 2019; v1 submitted 4 December, 2019; originally announced December 2019.

    Comments: To appear at AAAI-20

  14. arXiv:1909.02682

    cs.LG stat.ML

    Efficient Communication in Multi-Agent Reinforcement Learning via Variance Based Control

    Authors: Sai Qian Zhang, Qi Zhang, Jieyu Lin

    Abstract: Multi-agent reinforcement learning (MARL) has recently received considerable attention due to its applicability to a wide range of real-world applications. However, achieving efficient communication among agents has always been an overarching problem in MARL. In this work, we propose Variance Based Control (VBC), a simple yet efficient technique to improve communication efficiency in MARL. By limi…

    Submitted 1 November, 2019; v1 submitted 5 September, 2019; originally announced September 2019.

  15. arXiv:1905.00462

    cs.LG

    Full-stack Optimization for Accelerating CNNs with FPGA Validation

    Authors: Bradley McDanel, Sai Qian Zhang, H. T. Kung, Xin Dong

    Abstract: We present a full-stack optimization framework for accelerating inference of CNNs (Convolutional Neural Networks) and validate the approach with field-programmable gate array (FPGA) implementations. By jointly optimizing CNN models, computing architectures, and hardware implementations, our full-stack approach achieves unprecedented performance in the trade-off space characterized by inference la…

    Submitted 1 May, 2019; originally announced May 2019.

  16. arXiv:1811.04770

    cs.LG cs.AR stat.ML

    Packing Sparse Convolutional Neural Networks for Efficient Systolic Array Implementations: Column Combining Under Joint Optimization

    Authors: H. T. Kung, Bradley McDanel, Sai Qian Zhang

    Abstract: This paper describes a novel approach of packing sparse convolutional neural networks for their efficient systolic array implementations. By combining subsets of columns in the original filter matrix associated with a convolutional layer, we increase the utilization efficiency of the systolic array substantially (e.g., ~4x) due to the increased density of nonzeros in the resulting packed filter ma…

    Submitted 7 November, 2018; originally announced November 2018.

    Comments: To appear in ASPLOS 2019

  17. arXiv:1802.03373

    cs.NI

    InferBeam: A Fast Beam Alignment Protocol for Millimeter-wave Networking

    Authors: Sai Qian Zhang, H. T. Kung, Youngjune Gwon

    Abstract: We introduce fast selection of a millimeter-wave base station (BS) and its antenna sector for user equipment based on its location. Using a conditional random field inference model with specially designed parameters, which are robust to changes in the environment, InferBeam allows the use of measurement samples on best beam selection at a small number of locations to infer the rest dynamically. Compared to…

    Submitted 5 March, 2018; v1 submitted 9 February, 2018; originally announced February 2018.