Showing 1–11 of 11 results for author: McDanel, B

  1. arXiv:2301.01947  [pdf, ps, other]

    cs.LG cs.AI cs.CV

    StitchNet: Composing Neural Networks from Pre-Trained Fragments

    Authors: Surat Teerapittayanon, Marcus Comiter, Brad McDanel, H. T. Kung

    Abstract: We propose StitchNet, a novel neural network creation paradigm that stitches together fragments (one or more consecutive network layers) from multiple pre-trained neural networks. StitchNet allows the creation of high-performing neural networks without the large compute and data requirements needed under traditional model creation processes via backpropagation training. We leverage Centered Kernel…

    Submitted 23 September, 2023; v1 submitted 5 January, 2023; originally announced January 2023.

  2. arXiv:2208.09520  [pdf, other]

    cs.CV

    Accelerating Vision Transformer Training via a Patch Sampling Schedule

    Authors: Bradley McDanel, Chi Phuong Huynh

    Abstract: We introduce the notion of a Patch Sampling Schedule (PSS) that varies the number of Vision Transformer (ViT) patches used per batch during training. Since not all patches are equally important for most vision objectives (e.g., classification), we argue that less important patches can be used in fewer training iterations, leading to shorter training time with minimal impact on performance. Additi…

    Submitted 19 August, 2022; originally announced August 2022.

    Comments: 7 pages, 3 page appendix, 13 figures

  3. arXiv:2202.00774  [pdf, other]

    cs.LG cs.CV

    Accelerating DNN Training with Structured Data Gradient Pruning

    Authors: Bradley McDanel, Helia Dinh, John Magallanes

    Abstract: Weight pruning is a technique to make Deep Neural Network (DNN) inference more computationally efficient by reducing the number of model parameters over the course of training. However, most weight pruning techniques generally do not speed up DNN training and can even require more iterations to reach model convergence. In this work, we propose a novel Structured Data Gradient Pruning (SDGP) meth…

    Submitted 1 February, 2022; originally announced February 2022.

  4. arXiv:2110.15456  [pdf, other]

    cs.LG cs.AR

    FAST: DNN Training Under Variable Precision Block Floating Point with Stochastic Rounding

    Authors: Sai Qian Zhang, Bradley McDanel, H. T. Kung

    Abstract: Block Floating Point (BFP) can efficiently support quantization for Deep Neural Network (DNN) training by providing a wide dynamic range via a shared exponent across a group of values. In this paper, we propose a Fast First, Accurate Second Training (FAST) system for DNNs, where the weights, activations, and gradients are represented in BFP. FAST supports matrix multiplication with variable precis…

    Submitted 28 October, 2021; originally announced October 2021.

  5. arXiv:2007.06389  [pdf, other]

    cs.CV cs.LG

    Term Revealing: Furthering Quantization at Run Time on Quantized DNNs

    Authors: H. T. Kung, Bradley McDanel, Sai Qian Zhang

    Abstract: We present a novel technique, called Term Revealing (TR), for furthering quantization at run time for improved performance of Deep Neural Networks (DNNs) already quantized with conventional quantization methods. TR operates on power-of-two terms in binary expressions of values. When computing a dot product, TR dynamically selects a fixed number of largest terms to use from the values of…

    Submitted 26 July, 2020; v1 submitted 13 July, 2020; originally announced July 2020.

    Comments: 13 pages, 19 figures, 4 tables. To appear in Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2020. Update: Revised writing/figures and added more references for Section IV. Update: Revised Section IV writing/figures and added additional references on signed digit representations.

  6. arXiv:1905.00462  [pdf, other]

    cs.LG

    Full-stack Optimization for Accelerating CNNs with FPGA Validation

    Authors: Bradley McDanel, Sai Qian Zhang, H. T. Kung, Xin Dong

    Abstract: We present a full-stack optimization framework for accelerating inference of CNNs (Convolutional Neural Networks) and validate the approach with field-programmable gate array (FPGA) implementations. By jointly optimizing CNN models, computing architectures, and hardware implementations, our full-stack approach achieves unprecedented performance in the trade-off space characterized by inference la…

    Submitted 1 May, 2019; originally announced May 2019.

  7. arXiv:1811.04770  [pdf, other]

    cs.LG cs.AR stat.ML

    Packing Sparse Convolutional Neural Networks for Efficient Systolic Array Implementations: Column Combining Under Joint Optimization

    Authors: H. T. Kung, Bradley McDanel, Sai Qian Zhang

    Abstract: This paper describes a novel approach to packing sparse convolutional neural networks for efficient systolic array implementations. By combining subsets of columns in the original filter matrix associated with a convolutional layer, we increase the utilization efficiency of the systolic array substantially (e.g., ~4x) due to the increased density of nonzeros in the resulting packed filter ma…

    Submitted 7 November, 2018; originally announced November 2018.

    Comments: To appear in ASPLOS 2019

  8. arXiv:1710.07830  [pdf, other]

    cs.LG cs.CV stat.ML

    Incomplete Dot Products for Dynamic Computation Scaling in Neural Network Inference

    Authors: Bradley McDanel, Surat Teerapittayanon, H. T. Kung

    Abstract: We propose the use of incomplete dot products (IDP) to dynamically adjust the number of input channels used in each layer of a convolutional neural network during feedforward inference. IDP adds monotonically non-increasing coefficients, referred to as a "profile", to the channels during training. The profile orders the contribution of each channel in non-increasing order. At inference time, the n…

    Submitted 21 October, 2017; originally announced October 2017.

  9. arXiv:1709.02260  [pdf, other]

    cs.CV cs.LG

    Embedded Binarized Neural Networks

    Authors: Bradley McDanel, Surat Teerapittayanon, H. T. Kung

    Abstract: We study embedded Binarized Neural Networks (eBNNs) with the aim of allowing current binarized neural networks (BNNs) in the literature to perform feedforward inference efficiently on small embedded devices. We focus on minimizing the required memory footprint, given that these devices often have memory as small as tens of kilobytes (KB). Beyond minimizing the memory required to store weights, as…

    Submitted 6 September, 2017; originally announced September 2017.

  10. arXiv:1709.01921  [pdf, other]

    cs.CV cs.DC

    Distributed Deep Neural Networks over the Cloud, the Edge and End Devices

    Authors: Surat Teerapittayanon, Bradley McDanel, H. T. Kung

    Abstract: We propose distributed deep neural networks (DDNNs) over distributed computing hierarchies, consisting of the cloud, the edge (fog) and end devices. While being able to accommodate inference of a deep neural network (DNN) in the cloud, a DDNN also allows fast and localized inference using shallow portions of the neural network at the edge and end devices. When supported by a scalable distributed c…

    Submitted 6 September, 2017; originally announced September 2017.

  11. arXiv:1709.01686  [pdf, other]

    cs.NE cs.CV cs.LG

    BranchyNet: Fast Inference via Early Exiting from Deep Neural Networks

    Authors: Surat Teerapittayanon, Bradley McDanel, H. T. Kung

    Abstract: Deep neural networks are state-of-the-art methods for many learning tasks due to their ability to extract increasingly better features at each network layer. However, the improved performance of additional layers in a deep network comes at the cost of added latency and energy usage in feedforward inference. As networks continue to get deeper and larger, these costs become more prohibitive for real…

    Submitted 6 September, 2017; originally announced September 2017.