Showing 1–29 of 29 results for author: Kung, H T

  1. arXiv:2403.16451  [pdf, other]

    cs.LG cs.AI

    DeepMachining: Online Prediction of Machining Errors of Lathe Machines

    Authors: Xiang-Li Lu, Hwai-Jung Hsu, Che-Wei Chou, H. T. Kung, Chen-Hsin Lee, Sheng-Mao Cheng

    Abstract: We describe DeepMachining, a deep learning-based AI system for online prediction of machining errors of lathe machine operations. We have built and evaluated DeepMachining based on manufacturing data from factories. Specifically, we first pretrain a deep learning model for a given lathe machine's operations to learn the salient features of machining states. Then, we fine-tune the pretrained model…

    Submitted 28 March, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

  2. arXiv:2402.15504  [pdf, other]

    cs.CV cs.AI

    Gen4Gen: Generative Data Pipeline for Generative Multi-Concept Composition

    Authors: Chun-Hsiao Yeh, Ta-Ying Cheng, He-Yen Hsieh, Chuan-En Lin, Yi Ma, Andrew Markham, Niki Trigoni, H. T. Kung, Yubei Chen

    Abstract: Recent text-to-image diffusion models are able to learn and synthesize images containing novel, personalized concepts (e.g., their own pets or specific items) with just a few examples for training. This paper tackles two interconnected issues within this realm of personalizing text-to-image diffusion models. First, current personalization techniques fail to reliably extend to multiple concepts --…

    Submitted 23 February, 2024; originally announced February 2024.

    Comments: Preprint; Project Page: https://danielchyeh.github.io/Gen4Gen/

  3. arXiv:2307.03930  [pdf, other]

    cs.LG cs.AR cs.PF cs.PL

    Rosko: Row Skipping Outer Products for Sparse Matrix Multiplication Kernels

    Authors: Vikas Natesh, Andrew Sabot, H. T. Kung, Mark Ting

    Abstract: We propose Rosko -- row skipping outer products -- for deriving sparse matrix multiplication (SpMM) kernels in reducing computation and memory access requirements of deep neural networks (DNNs). Rosko allows skipping of entire row computations during program execution with low sparsity-management overheads. We analytically derive sparse CPU kernels that adapt to given hardware characteristics to e…

    Submitted 8 July, 2023; originally announced July 2023.

    Comments: Rosko's CPU implementation can be found at https://github.com/vnatesh/Rosko

  4. arXiv:2304.05544  [pdf, other]

    cs.LG cs.AR cs.PF cs.PL

    MEMA Runtime Framework: Minimizing External Memory Accesses for TinyML on Microcontrollers

    Authors: Andrew Sabot, Vikas Natesh, H. T. Kung, Wei-Te Ting

    Abstract: We present the MEMA framework for the easy and quick derivation of efficient inference runtimes that minimize external memory accesses for matrix multiplication on TinyML systems. The framework accounts for hardware resource constraints and problem sizes in analytically determining optimized schedules and kernels that minimize memory accesses. MEMA provides a solution to a well-known problem in th…

    Submitted 11 April, 2023; originally announced April 2023.

    Comments: Accepted as a full paper by the TinyML Research Symposium 2023

  5. arXiv:2301.01947  [pdf, ps, other]

    cs.LG cs.AI cs.CV

    StitchNet: Composing Neural Networks from Pre-Trained Fragments

    Authors: Surat Teerapittayanon, Marcus Comiter, Brad McDanel, H. T. Kung

    Abstract: We propose StitchNet, a novel neural network creation paradigm that stitches together fragments (one or more consecutive network layers) from multiple pre-trained neural networks. StitchNet allows the creation of high-performing neural networks without the large compute and data requirements needed under traditional model creation processes via backpropagation training. We leverage Centered Kernel…

    Submitted 23 September, 2023; v1 submitted 5 January, 2023; originally announced January 2023.

  6. arXiv:2209.12127  [pdf, other]

    cs.LG

    SpeedLimit: Neural Architecture Search for Quantized Transformer Models

    Authors: Yuji Chai, Luke Bailey, Yunho Jin, Matthew Karle, Glenn G. Ko, David Brooks, Gu-Yeon Wei, H. T. Kung

    Abstract: While research in the field of transformer models has primarily focused on enhancing performance metrics such as accuracy and perplexity, practical applications in industry often necessitate a rigorous consideration of inference latency constraints. Addressing this challenge, we introduce SpeedLimit, a novel Neural Architecture Search (NAS) technique that optimizes accuracy whilst adhering to an u…

    Submitted 13 October, 2023; v1 submitted 24 September, 2022; originally announced September 2022.

  7. arXiv:2207.09413  [pdf, other]

    cs.LG cs.AI cs.CV cs.DC

    SphereFed: Hyperspherical Federated Learning

    Authors: Xin Dong, Sai Qian Zhang, Ang Li, H. T. Kung

    Abstract: Federated Learning aims at training a global model from multiple decentralized devices (i.e. clients) without exchanging their private local data. A key challenge is the handling of non-i.i.d. (independent identically distributed) data across multiple clients that may induce disparities of their local features. We introduce the Hyperspherical Federated Learning (SphereFed) framework to address the…

    Submitted 19 July, 2022; originally announced July 2022.

    Comments: European Conference on Computer Vision 2022

  8. arXiv:2204.04705  [pdf, other]

    cs.LG cs.AI cs.DC

    SplitNets: Designing Neural Architectures for Efficient Distributed Computing on Head-Mounted Systems

    Authors: Xin Dong, Barbara De Salvo, Meng Li, Chiao Liu, Zhongnan Qu, H. T. Kung, Ziyun Li

    Abstract: We design deep neural networks (DNNs) and corresponding networks' splittings to distribute DNNs' workload to camera sensors and a centralized aggregator on head mounted devices to meet system performance targets in inference accuracy and latency under the given hardware resource constraints. To achieve an optimal balance among computation, communication, and performance, a split-aware neural archi…

    Submitted 10 April, 2022; originally announced April 2022.

    Comments: IEEE/CVF Conference on Computer Vision and Pattern Recognition 2022

  9. arXiv:2110.15456  [pdf, other]

    cs.LG cs.AR

    FAST: DNN Training Under Variable Precision Block Floating Point with Stochastic Rounding

    Authors: Sai Qian Zhang, Bradley McDanel, H. T. Kung

    Abstract: Block Floating Point (BFP) can efficiently support quantization for Deep Neural Network (DNN) training by providing a wide dynamic range via a shared exponent across a group of values. In this paper, we propose a Fast First, Accurate Second Training (FAST) system for DNNs, where the weights, activations, and gradients are represented in BFP. FAST supports matrix multiplication with variable precis…

    Submitted 28 October, 2021; originally announced October 2021.

  10. arXiv:2107.06304  [pdf, other]

    cs.LG cs.CV

    Privacy Vulnerability of Split Computing to Data-Free Model Inversion Attacks

    Authors: Xin Dong, Hongxu Yin, Jose M. Alvarez, Jan Kautz, Pavlo Molchanov, H. T. Kung

    Abstract: Mobile edge devices see increased demands in deep neural networks (DNNs) inference while suffering from stringent constraints in computing resources. Split computing (SC) emerges as a popular approach to the issue by executing only initial layers on devices and offloading the remaining to the cloud. Prior works usually assume that SC offers privacy benefits as only intermediate features, instead o…

    Submitted 24 October, 2022; v1 submitted 13 July, 2021; originally announced July 2021.

    Comments: A new data-free inversion method to reverse neural networks and get input from intermediate feature maps. BMVC'22

  11. arXiv:2104.11408  [pdf, other]

    cs.LG

    Neural Mean Discrepancy for Efficient Out-of-Distribution Detection

    Authors: Xin Dong, Junfeng Guo, Ang Li, Wei-Te Ting, Cong Liu, H. T. Kung

    Abstract: Various approaches have been proposed for out-of-distribution (OOD) detection by augmenting models, input examples, training sets, and optimization objectives. Deviating from existing work, we have a simple hypothesis that standard off-the-shelf models may already contain sufficient information about the training set distribution which can be leveraged for reliable OOD detection. Our empirical stu…

    Submitted 26 March, 2022; v1 submitted 23 April, 2021; originally announced April 2021.

    Comments: IEEE/CVF Conference on Computer Vision and Pattern Recognition 2022

  12. arXiv:2007.06389  [pdf, other]

    cs.CV cs.LG

    Term Revealing: Furthering Quantization at Run Time on Quantized DNNs

    Authors: H. T. Kung, Bradley McDanel, Sai Qian Zhang

    Abstract: We present a novel technique, called Term Revealing (TR), for furthering quantization at run time for improved performance of Deep Neural Networks (DNNs) already quantized with conventional quantization methods. TR operates on power-of-two terms in binary expressions of values. In computing a dot-product computation, TR dynamically selects a fixed number of largest terms to use from the values of…

    Submitted 26 July, 2020; v1 submitted 13 July, 2020; originally announced July 2020.

    Comments: 13 pages, 19 figures, 4 tables, To appear in Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2020 Update: Revised writing/figures and added more references for Section IV Update: Revised Section IV writing/figures and added additional references on signed digit representations

  13. arXiv:1907.08377  [pdf, other]

    cs.LG cs.AI cs.CR

    DaiMoN: A Decentralized Artificial Intelligence Model Network

    Authors: Surat Teerapittayanon, H. T. Kung

    Abstract: We introduce DaiMoN, a decentralized artificial intelligence model network, which incentivizes peer collaboration in improving the accuracy of machine learning models for a given classification problem. It is an autonomous network where peers may submit models with improved accuracy and other peers may verify the accuracy improvement. The system maintains an append-only decentralized ledger to kee…

    Submitted 19 July, 2019; originally announced July 2019.

    Comments: 2019 IEEE International Conference on Blockchain

  14. arXiv:1906.07148  [pdf, other]

    cs.LG cs.CR stat.ML

    CheckNet: Secure Inference on Untrusted Devices

    Authors: Marcus Comiter, Surat Teerapittayanon, H. T. Kung

    Abstract: We introduce CheckNet, a method for secure inference with deep neural networks on untrusted devices. CheckNet is like a checksum for neural network inference: it verifies the integrity of the inference computation performed by untrusted devices to 1) ensure the inference has actually been performed, and 2) ensure the inference has not been manipulated by an attacker. CheckNet is completely transpa…

    Submitted 17 June, 2019; originally announced June 2019.

  15. arXiv:1905.00462  [pdf, other]

    cs.LG

    Full-stack Optimization for Accelerating CNNs with FPGA Validation

    Authors: Bradley McDanel, Sai Qian Zhang, H. T. Kung, Xin Dong

    Abstract: We present a full-stack optimization framework for accelerating inference of CNNs (Convolutional Neural Networks) and validate the approach with field-programmable gate arrays (FPGA) implementations. By jointly optimizing CNN models, computing architectures, and hardware implementations, our full-stack approach achieves unprecedented performance in the trade-off space characterized by inference la…

    Submitted 1 May, 2019; originally announced May 2019.

  16. arXiv:1812.05083  [pdf, other]

    cs.CV cs.CL cs.LG

    Adversarial Learning of Semantic Relevance in Text to Image Synthesis

    Authors: Miriam Cha, Youngjune L. Gwon, H. T. Kung

    Abstract: We describe a new approach that improves the training of generative adversarial nets (GANs) for synthesizing diverse images from a text input. Our approach is based on the conditional version of GANs and expands on previous work leveraging an auxiliary task in the discriminator. Our generated images are not limited to certain classes and do not suffer from mode collapse while semantically matching…

    Submitted 5 February, 2019; v1 submitted 12 December, 2018; originally announced December 2018.

  17. arXiv:1811.04770  [pdf, other]

    cs.LG cs.AR stat.ML

    Packing Sparse Convolutional Neural Networks for Efficient Systolic Array Implementations: Column Combining Under Joint Optimization

    Authors: H. T. Kung, Bradley McDanel, Sai Qian Zhang

    Abstract: This paper describes a novel approach of packing sparse convolutional neural networks for their efficient systolic array implementations. By combining subsets of columns in the original filter matrix associated with a convolutional layer, we increase the utilization efficiency of the systolic array substantially (e.g., ~4x) due to the increased density of nonzeros in the resulting packed filter ma…

    Submitted 7 November, 2018; originally announced November 2018.

    Comments: To appear in ASPLOS 2019

  18. arXiv:1802.03373  [pdf, other]

    cs.NI

    InferBeam: A Fast Beam Alignment Protocol for Millimeter-wave Networking

    Authors: Sai Qian Zhang, H. T. Kung, Youngjune Gwon

    Abstract: We introduce fast millimeter-wave base station (BS) and its antenna sector selection for user equipment based on its location. Using a conditional random field inference model with specially designed parameters, which are robust to change of environment, InferBeam allows the use of measurement samples on best beam selection at a small number of locations to infer the rest dynamically. Compared to…

    Submitted 5 March, 2018; v1 submitted 9 February, 2018; originally announced February 2018.

  19. arXiv:1710.07830  [pdf, other]

    cs.LG cs.CV stat.ML

    Incomplete Dot Products for Dynamic Computation Scaling in Neural Network Inference

    Authors: Bradley McDanel, Surat Teerapittayanon, H. T. Kung

    Abstract: We propose the use of incomplete dot products (IDP) to dynamically adjust the number of input channels used in each layer of a convolutional neural network during feedforward inference. IDP adds monotonically non-increasing coefficients, referred to as a "profile", to the channels during training. The profile orders the contribution of each channel in non-increasing order. At inference time, the n…

    Submitted 21 October, 2017; originally announced October 2017.

  20. arXiv:1709.02260  [pdf, other]

    cs.CV cs.LG

    Embedded Binarized Neural Networks

    Authors: Bradley McDanel, Surat Teerapittayanon, H. T. Kung

    Abstract: We study embedded Binarized Neural Networks (eBNNs) with the aim of allowing current binarized neural networks (BNNs) in the literature to perform feedforward inference efficiently on small embedded devices. We focus on minimizing the required memory footprint, given that these devices often have memory as small as tens of kilobytes (KB). Beyond minimizing the memory required to store weights, as…

    Submitted 6 September, 2017; originally announced September 2017.

  21. arXiv:1709.01921  [pdf, other]

    cs.CV cs.DC

    Distributed Deep Neural Networks over the Cloud, the Edge and End Devices

    Authors: Surat Teerapittayanon, Bradley McDanel, H. T. Kung

    Abstract: We propose distributed deep neural networks (DDNNs) over distributed computing hierarchies, consisting of the cloud, the edge (fog) and end devices. While being able to accommodate inference of a deep neural network (DNN) in the cloud, a DDNN also allows fast and localized inference using shallow portions of the neural network at the edge and end devices. When supported by a scalable distributed c…

    Submitted 6 September, 2017; originally announced September 2017.

  22. arXiv:1709.01888  [pdf, other]

    cs.CL cs.LG

    Language Modeling by Clustering with Word Embeddings for Text Readability Assessment

    Authors: Miriam Cha, Youngjune Gwon, H. T. Kung

    Abstract: We present a clustering-based language model using word embeddings for text readability prediction. Presumably, an Euclidean semantic space hypothesis holds true for word embeddings whose training is done by observing word co-occurrences. We argue that clustering with word embeddings in the metric space should yield feature representations in a higher semantic space appropriate for text regression…

    Submitted 4 September, 2017; originally announced September 2017.

  23. arXiv:1709.01686  [pdf, other]

    cs.NE cs.CV cs.LG

    BranchyNet: Fast Inference via Early Exiting from Deep Neural Networks

    Authors: Surat Teerapittayanon, Bradley McDanel, H. T. Kung

    Abstract: Deep neural networks are state of the art methods for many learning tasks due to their ability to extract increasingly better features at each network layer. However, the improved performance of additional layers in a deep network comes at the cost of added latency and energy usage in feedforward inference. As networks continue to get deeper and larger, these costs become more prohibitive for real…

    Submitted 6 September, 2017; originally announced September 2017.

  24. arXiv:1708.09321  [pdf, other]

    cs.CV

    Adversarial nets with perceptual losses for text-to-image synthesis

    Authors: Miriam Cha, Youngjune Gwon, H. T. Kung

    Abstract: Recent approaches in generative adversarial networks (GANs) can automatically synthesize realistic images from descriptive text. Despite the overall fair quality, the generated images often expose visible flaws that lack structural definition for an object of interest. In this paper, we aim to extend state of the art for GAN-based text-to-image synthesis by improving perceptual quality of generate…

    Submitted 30 August, 2017; originally announced August 2017.

  25. arXiv:1605.05212  [pdf, other]

    cs.LG cs.CV

    Multimodal Sparse Coding for Event Detection

    Authors: Youngjune Gwon, William Campbell, Kevin Brady, Douglas Sturim, Miriam Cha, H. T. Kung

    Abstract: Unsupervised feature learning methods have proven effective for classification tasks based on a single modality. We present multimodal sparse coding for learning feature representations shared across multiple modalities. The shared representations are applied to multimedia event detection (MED) and evaluated in comparison to unimodal counterparts, as well as other feature learning methods such as…

    Submitted 17 May, 2016; originally announced May 2016.

    Comments: Multimodal Machine Learning Workshop at NIPS 2015

  26. arXiv:1511.06238  [pdf, other]

    cs.LG cs.CV stat.ML

    Multimodal sparse representation learning and applications

    Authors: Miriam Cha, Youngjune Gwon, H. T. Kung

    Abstract: Unsupervised methods have proven effective for discriminative tasks in a single-modality scenario. In this paper, we present a multimodal framework for learning sparse representations that can capture semantic correlation between modalities. The framework can model relationships at a higher level by forcing the shared sparse representation. In particular, we propose the use of joint dictionary lea…

    Submitted 2 March, 2016; v1 submitted 19 November, 2015; originally announced November 2015.

  27. arXiv:1212.2894  [pdf, other]

    cs.IT cs.DC

    Reducing Reconciliation Communication Cost with Compressed Sensing

    Authors: H. T. Kung, Chia-Mu Yu

    Abstract: We consider a reconciliation problem, where two hosts wish to synchronize their respective sets. Efficient solutions for minimizing the communication cost between the two hosts have been previously proposed in the literature. However, they rely on prior knowledge about the size of the set differences between the two sets to be reconciled. In this paper, we propose a method which can achieve compar…

    Submitted 4 December, 2012; originally announced December 2012.

    Comments: 4 pages, 2 figures

  28. arXiv:1004.3716  [pdf, ps, other]

    cs.DS cs.DC math.NA

    Some linear-time algorithms for systolic arrays

    Authors: Richard P. Brent, Franklin T. Luk, H. T. Kung

    Abstract: We survey some results on linear-time algorithms for systolic arrays. In particular, we show how the greatest common divisor (GCD) of two polynomials of degree n over a finite field can be computed in time O(n) on a linear systolic array of O(n) cells; similarly for the GCD of two n-bit binary numbers. We show how n * n Toeplitz systems of linear equations can be solved in time O(n) on a linear ar…

    Submitted 21 April, 2010; originally announced April 2010.

    Comments: Corrected version of an old (1983) paper. 23 pages. For further details, see http://wwwmaths.anu.edu.au/~brent/pub/pub079.html

    Report number: Report TR-CS-82-15, DCS, Australian National University, December 1982

    MSC Class: 65Y05 (Primary) 37B15; 68Q10; 68Q80 (Secondary)

    ACM Class: G.1.3; B.6.1; C.1.3

    Journal ref: Information Processing 83 (edited by R.E.A. Mason), North-Holland, Amsterdam, 1983, 865-876

  29. arXiv:cs/9811028  [pdf, ps]

    cs.NI

    TCP Trunking

    Authors: H. T. Kung, S. Y. Wang

    Abstract: A TCP trunk is an IP tunnel under TCP control, capable of carrying packets from any number of user flows. By exploiting properties of TCP, a TCP trunk provides elastic and reliable transmission over a network, and automatically shares the network fairly with other competing trunks. Moreover, by aggregating user flows into a single trunk flow, TCP trunking can significantly reduce the number of f…

    Submitted 20 November, 1998; originally announced November 1998.

    Comments: postscript file

    ACM Class: C.2.1