arXiv:2406.13578 [pdf, other]

Enhancing Distractor Generation for Multiple-Choice Questions with Retrieval Augmented Pretraining and Knowledge Graph Integration

Authors: Han-Cheng Yu, Yu-An Shih, Kin-Man Law, Kai-Yu Hsieh, Yu-Chen Cheng, Hsin-Chih Ho, Zih-An Lin, Wen-Chuan Hsu, Yao-Chung Fan

Abstract: In this paper, we tackle the task of distractor generation (DG) for multiple-choice questions. Our study introduces two key designs. First, we propose retrieval augmented pretraining, which involves refining the language model pretraining to align it more closely with the downstream task of DG. Second, we explore the integration of knowledge graphs to enhance the performance of DG. Through experiments with benchmarking datasets, we show that our models significantly outperform the state-of-the-art results. Our best-performing model advances the F1@3 score from 14.80 to 16.47 on the MCQ dataset and from 15.92 to 16.50 on the SciQ dataset.

Submitted 19 June, 2024; originally announced June 2024.

Comments: Findings at ACL 2024

arXiv:2402.05625 [pdf, other]

Coded Many-User Multiple Access via Approximate Message Passing

Authors: Xiaoqi Liu, Kuan Hsieh, Ramji Venkataramanan

Abstract: We consider communication over the Gaussian multiple-access channel in the regime where the number of users grows linearly with the codelength. In this regime, schemes based on sparse superposition coding can achieve a near-optimal tradeoff between spectral efficiency and signal-to-noise ratio. However, these schemes are feasible only for small values of user payload. This paper investigates efficient schemes for larger user payloads, focusing on coded CDMA schemes where each user's information is encoded via a linear code before being modulated with a signature sequence. We propose an efficient approximate message passing (AMP) decoder that can be tailored to the structure of the linear code, and provide an exact asymptotic characterization of its performance. Based on this result, we consider a decoder that integrates AMP and belief propagation and characterize its tradeoff between spectral efficiency and signal-to-noise ratio, for a given target error rate. Simulation results show that the decoder achieves state-of-the-art performance at finite lengths, with a coded CDMA scheme defined using LDPC codes and a spatially coupled matrix of signature sequences.

Submitted 1 July, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

Comments: 23 pages, 8 figures. A shorter version of this paper to appear in the Proceedings of IEEE ISIT 2024

arXiv:2310.01673 [pdf, other]

doi 10.1109/ICDH60066.2023.00021

A Versatile Data Fabric for Advanced IoT-Based Remote Health Monitoring

Authors: Italo Buleje, Vince S. Siu, Kuan Yu Hsieh, Nigel Hinds, Bing Dang, Erhan Bilal, Thanhnha Nguyen, Ellen E. Lee, Colin A. Depp, Jeffrey L. Rogers

Abstract: This paper presents a data-centric and security-focused data fabric designed for digital health applications. With the increasing interest in digital health research, there has been a surge in the volume of Internet of Things (IoT) data derived from smartphones, wearables, and ambient sensors. Managing this vast amount of data, encompassing diverse data types and varying time scales, is crucial. Moreover, compliance with regulatory and contractual obligations is essential. The proposed data fabric comprises an architecture and a toolkit that facilitate the integration of heterogeneous data sources, across different environments, to provide a unified view of the data in dashboards. Furthermore, the data fabric supports the development of reusable and configurable data integration components, which can be shared as open-source or inner-source software. These components are used to generate data pipelines that can be deployed and scheduled to run either in the cloud or on-premises. Additionally, we present the implementation of our data fabric in a home-based telemonitoring research project involving older adults, conducted in collaboration with the University of California, San Diego (UCSD). The study showcases the streamlined integration of data collected from various IoT sensors and mobile applications to create a unified view of older adults' health for further analysis and research.

Submitted 2 October, 2023; originally announced October 2023.

Journal ref: 2023 IEEE International Conference on Digital Health (ICDH), Chicago, IL, USA, 2023, pp. 88-90

arXiv:2309.08404 [pdf, other]

Bayes-Optimal Estimation in Generalized Linear Models via Spatial Coupling

Authors: Pablo Pascual Cobo, Kuan Hsieh, Ramji Venkataramanan

Abstract: We consider the problem of signal estimation in a generalized linear model (GLM). GLMs include many canonical problems in statistical estimation, such as linear regression, phase retrieval, and 1-bit compressed sensing. Recent work has precisely characterized the asymptotic minimum mean-squared error (MMSE) for GLMs with i.i.d. Gaussian sensing matrices. However, in many models there is a significant gap between the MMSE and the performance of the best known feasible estimators. In this work, we address this issue by considering GLMs defined via spatially coupled sensing matrices. We propose an efficient approximate message passing (AMP) algorithm for estimation and prove that with a simple choice of spatially coupled design, the MSE of a carefully tuned AMP estimator approaches the asymptotic MMSE in the high-dimensional limit. To prove the result, we first rigorously characterize the asymptotic performance of AMP for a GLM with a generic spatially coupled design. This characterization is in terms of a deterministic recursion ('state evolution') that depends on the parameters defining the spatial coupling. Then, using a simple spatially coupled design and judicious choice of functions defining the AMP, we analyze the fixed points of the resulting state evolution and show that it achieves the asymptotic MMSE. Numerical results for phase retrieval and rectified linear regression show that spatially coupled designs can yield substantially lower MSE than i.i.d. Gaussian designs at finite dimensions when used with AMP algorithms.

Submitted 15 September, 2023; originally announced September 2023.

Comments: 39 pages, 4 figures. A shorter version of this paper appeared in the proceedings of the 2023 IEEE International Symposium on Information Theory

arXiv:2308.06261 [pdf, other]

Enhancing Network Management Using Code Generated by Large Language Models

Authors: Sathiya Kumaran Mani, Yajie Zhou, Kevin Hsieh, Santiago Segarra, Ranveer Chandra, Srikanth Kandula

Abstract: Analyzing network topologies and communication graphs plays a crucial role in contemporary network management. However, the absence of a cohesive approach leads to a challenging learning curve, heightened errors, and inefficiencies. In this paper, we introduce a novel approach to facilitate a natural-language-based network management experience, utilizing large language models (LLMs) to generate task-specific code from natural language queries. This method tackles the challenges of explainability, scalability, and privacy by allowing network operators to inspect the generated code, eliminating the need to share network data with LLMs, and concentrating on application-specific requests combined with general program synthesis techniques. We design and evaluate a prototype system using benchmark applications, showcasing high accuracy, cost-effectiveness, and the potential for further enhancements using complementary program synthesis techniques.

Submitted 11 August, 2023; originally announced August 2023.

arXiv:2305.13792 [pdf, other]

Mitigating the Performance Impact of Network Failures in Public Clouds

Authors: Pooria Namyar, Behnaz Arzani, Daniel Crankshaw, Daniel S. Berger, Kevin Hsieh, Srikanth Kandula, Ramesh Govindan

Abstract: Some faults in data center networks require hours to days to repair because they may need reboots, re-imaging, or manual work by technicians. To reduce traffic impact, cloud providers mitigate the effect of faults, for example, by steering traffic to alternate paths. The state of the art in automatic network mitigations uses simple safety checks and proxy metrics to determine mitigations. SWARM, the approach described in this paper, can pick orders of magnitude better mitigations by estimating end-to-end connection-level performance (CLP) metrics. At its core is a scalable CLP estimator that quickly ranks mitigations with high fidelity and, on failures observed at a large cloud provider, outperforms the state of the art by over 700× in some cases.

Submitted 23 May, 2023; originally announced May 2023.

arXiv:2211.06330 [pdf, other]

doi 10.1109/ICDH55609.2022.00015

Health Guardian Platform: A technology stack to accelerate discovery in Digital Health research

Authors: Bo Wen, Vince S. Siu, Italo Buleje, Kuan Yu Hsieh, Takashi Itoh, Lukas Zimmerli, Nigel Hinds, Elif Eyigoz, Bing Dang, Stefan von Cavallar, Jeffrey L. Rogers

Abstract: This paper highlights the design philosophy and architecture of the Health Guardian, a platform developed by the IBM Digital Health team to accelerate discoveries of new digital biomarkers and development of digital health technologies. The Health Guardian allows for rapid translation of artificial intelligence (AI) research into cloud-based microservices that can be tested with data from clinical cohorts to understand disease and enable early prevention. The platform can be connected to mobile applications, wearables, or Internet of Things (IoT) devices to collect health-related data into a secure database. When the analytics are created, the researchers can containerize and deploy their code on the cloud using pre-defined templates, and validate the models using the data collected from one or more sensing devices. The Health Guardian platform currently supports time-series, text, audio, and video inputs with 70+ analytic capabilities and is used for non-commercial scientific research. We provide an example of the Alzheimer's disease (AD) assessment microservice, which uses AI methods to extract linguistic features from audio recordings to evaluate an individual's mini-mental state and the likelihood of having AD, and to predict the onset of AD before the age of 85. Today, IBM research teams across the globe use the Health Guardian internally as a test bed for early-stage research ideas, and externally with collaborators to support and enhance AI model development and clinical study efforts.

Submitted 10 November, 2022; originally announced November 2022.

Comments: 6 pages, 3 figures, https://ieeexplore.ieee.org/document/9861047

Journal ref: IEEE International Conference on Digital Health (ICDH), 2022, pp. 40-46

arXiv:2206.00799 [pdf, other]

Federated Learning under Distributed Concept Drift

Authors: Ellango Jothimurugesan, Kevin Hsieh, Jianyu Wang, Gauri Joshi, Phillip B. Gibbons

Abstract: Federated Learning (FL) under distributed concept drift is a largely unexplored area. Although concept drift is itself a well-studied phenomenon, it poses particular challenges for FL, because drifts arise staggered in time and space (across clients). To the best of our knowledge, this work is the first to explicitly study data heterogeneity in both dimensions. We first demonstrate that prior solutions to drift adaptation that use a single global model are ill-suited to staggered drifts, necessitating multiple-model solutions. We identify the problem of drift adaptation as a time-varying clustering problem, and we propose two new clustering algorithms for reacting to drifts based on local drift detection and hierarchical clustering. Empirical evaluation shows that our solutions achieve significantly higher accuracy than existing baselines, and are comparable to an idealized algorithm with oracle knowledge of the ground-truth clustering of clients to concepts at each time step.

Submitted 27 February, 2023; v1 submitted 1 June, 2022; originally announced June 2022.

Comments: 20 pages. Published in AISTATS 2023

ACM Class: I.2.6

arXiv:2202.01267 [pdf, other]

FedSpace: An Efficient Federated Learning Framework at Satellites and Ground Stations

Authors: Jinhyun So, Kevin Hsieh, Behnaz Arzani, Shadi Noghabi, Salman Avestimehr, Ranveer Chandra

Abstract: Large-scale deployments of low Earth orbit (LEO) satellites collect massive amounts of Earth imagery and sensor data, which can empower machine learning (ML) to address global challenges such as real-time disaster navigation and mitigation. However, it is often infeasible to download all the high-resolution images and train these ML models on the ground because of limited downlink bandwidth, sparse connectivity, and regularization constraints on the imagery resolution. To address these challenges, we leverage Federated Learning (FL), where ground stations and satellites collaboratively train a global ML model without sharing the captured images on the satellites. We show fundamental challenges in applying existing FL algorithms among satellites and ground stations, and we formulate an optimization problem which captures a unique trade-off between staleness and idleness. We propose a novel FL framework, named FedSpace, which dynamically schedules model aggregation based on the deterministic and time-varying connectivity according to satellite orbits. Extensive numerical evaluations based on real-world satellite images and satellite networks show that FedSpace reduces the training time by 1.7 days (38.6%) over the state-of-the-art FL algorithms.

Submitted 2 February, 2022; originally announced February 2022.

arXiv:2110.05554 [pdf, other]

Towards a Cost vs. Quality Sweet Spot for Monitoring Networks

Authors: Nofel Yaseen, Behnaz Arzani, Krishna Chintalapudi, Vaishnavi Ranganathan, Felipe Frujeri, Kevin Hsieh, Daniel Berger, Vincent Liu, Srikanth Kandula

Abstract: Continuously monitoring a wide variety of performance and fault metrics has become a crucial part of operating large-scale datacenter networks. In this work, we ask whether we can reduce the costs to monitor -- in terms of collection, storage and analysis -- by judiciously controlling how much and which measurements we collect. By positing that we can treat almost all measured signals as sampled time-series, we show that we can use signal processing techniques such as the Nyquist-Shannon theorem to avoid wasteful data collection. We show that large savings appear possible by analyzing tens of popular measurements from a production datacenter network. We also discuss the technical challenges that must be solved when applying these techniques in practice.

Submitted 11 October, 2021; originally announced October 2021.

arXiv:2102.11267 [pdf, other]

Interpret-able feedback for AutoML systems

Authors: Behnaz Arzani, Kevin Hsieh, Haoxian Chen

Abstract: Automated machine learning (AutoML) systems aim to enable training machine learning (ML) models for non-ML experts. A shortcoming of these systems is that when they fail to produce a model with high accuracy, the user has no path to improve the model other than hiring a data scientist or learning ML -- this defeats the purpose of AutoML and limits its adoption. We introduce an interpretable data feedback solution for AutoML. Our solution suggests new data points for the user to label (without requiring a pool of unlabeled data) to improve the model's accuracy. Our solution analyzes how features influence the prediction among all ML models in an AutoML ensemble, and we suggest more data samples from feature ranges that have high variance in such analysis. Our evaluation shows that our solution can improve the accuracy of AutoML by 7-8% and significantly outperforms popular active learning solutions in data efficiency, all the while providing the added benefit of being interpretable.

Submitted 22 February, 2021; originally announced February 2021.

arXiv:2102.04730 [pdf, other]

doi 10.1109/JSAIT.2022.3158827

Near-Optimal Coding for Many-user Multiple Access Channels

Authors: Kuan Hsieh, Cynthia Rush, Ramji Venkataramanan

Abstract: This paper considers the Gaussian multiple-access channel (MAC) in the asymptotic regime where the number of users grows linearly with the code length. We propose efficient coding schemes based on random linear models with approximate message passing (AMP) decoding and derive the asymptotic error rate achieved for a given user density, user payload (in bits), and user energy. The tradeoff between energy-per-bit and achievable user density (for a fixed user payload and target error rate) is studied, and it is demonstrated that in the large system limit, a spatially coupled coding scheme with AMP decoding achieves near-optimal tradeoffs for a wide range of user densities. Furthermore, in the regime where the user payload is large, we also study the tradeoff between energy-per-bit and spectral efficiency and discuss methods to reduce decoding complexity.

Submitted 9 March, 2022; v1 submitted 9 February, 2021; originally announced February 2021.

Comments: 15 pages, 4 figures. To appear in IEEE Journal on Selected Areas in Information Theory

Journal ref: IEEE Journal on Selected Areas in Information Theory, vol. 3, no. 1, pp. 21-36, March 2022

arXiv:2012.10557 [pdf, other]

Ekya: Continuous Learning of Video Analytics Models on Edge Compute Servers

Authors: Romil Bhardwaj, Zhengxu Xia, Ganesh Ananthanarayanan, Junchen Jiang, Nikolaos Karianakis, Yuanchao Shu, Kevin Hsieh, Victor Bahl, Ion Stoica

Abstract: Video analytics applications use edge compute servers for the analytics of the videos (for bandwidth and privacy). Compressed models that are deployed on the edge servers for inference suffer from data drift, where the live video data diverges from the training data. Continuous learning handles data drift by periodically retraining the models on new data. Our work addresses the challenge of jointly supporting inference and retraining tasks on edge servers, which requires navigating the fundamental tradeoff between the retrained model's accuracy and the inference accuracy. Our solution Ekya balances this tradeoff across multiple models and uses a micro-profiler to identify the models that will benefit the most by retraining. Ekya's accuracy gain compared to a baseline scheduler is 29% higher, and the baseline requires 4x more GPU resources to achieve the same accuracy as Ekya.

Submitted 18 December, 2020; originally announced December 2020.

arXiv:2009.10931 [pdf]

doi 10.1038/s41598-021-02353-5

Drug repurposing for COVID-19 using graph neural network and harmonizing multiple evidence

Authors: Kanglin Hsieh, Yinyin Wang, Luyao Chen, Zhongming Zhao, Sean Savitz, Xiaoqian Jiang, Jing Tang, Yejin Kim

Abstract: Amid the pandemic of 2019 novel coronavirus disease (COVID-19) caused by SARS-CoV-2, a vast amount of drug research for prevention and treatment has been quickly conducted, but these efforts have been unsuccessful thus far. Our objective is to prioritize repurposable drugs using a drug repurposing pipeline that systematically integrates multiple SARS-CoV-2 and drug interactions, deep graph neural networks, and in-vitro/population-based validations. We first collected all the available drugs (n = 3,635) involved in COVID-19 patient treatment through CTDbase. We built a SARS-CoV-2 knowledge graph based on the interactions among virus baits, host genes, pathways, drugs, and phenotypes. A deep graph neural network approach was used to derive the candidate representation based on the biological interactions. We prioritized the candidate drugs using clinical trial history, and then validated them with their genetic profiles, in vitro experimental efficacy, and electronic health records. We highlight the top 22 drugs including Azithromycin, Atorvastatin, Aspirin, Acetaminophen, and Albuterol. We further pinpointed drug combinations that may synergistically target COVID-19. In summary, we demonstrated that the integration of extensive interactions, deep neural networks, and rigorous validation can facilitate the rapid identification of candidate drugs for COVID-19 treatment.
This is a post-peer-review, pre-copyedit version of an article published in Scientific Reports. The final authenticated version is available online at: https://www.nature.com/articles/s41598-021-02353-5

Submitted 1 February, 2022; v1 submitted 23 September, 2020; originally announced September 2020.

Comments: 13 pages

Journal ref: Sci Rep 11, 23179 (2021)

arXiv:2004.09549 [pdf, other]

doi 10.1109/TIT.2021.3081368

Modulated Sparse Superposition Codes for the Complex AWGN Channel

Authors: Kuan Hsieh, Ramji Venkataramanan

Abstract: This paper studies a generalization of sparse superposition codes (SPARCs) for communication over the complex additive white Gaussian noise (AWGN) channel. In a SPARC, the codebook is defined in terms of a design matrix, and each codeword is generated by multiplying the design matrix with a sparse message vector. In the standard SPARC construction, information is encoded in the locations of the non-zero entries of the message vector. In this paper we generalize the construction and consider modulated SPARCs, where information is encoded in both the locations and the values of the non-zero entries of the message vector. We focus on the case where the non-zero entries take values from a phase-shift keying (PSK) constellation. We propose a computationally efficient approximate message passing (AMP) decoder, and obtain analytical bounds on the state evolution parameters which predict the error performance of the decoder. Using these bounds we show that PSK-modulated SPARCs are asymptotically capacity achieving for the complex AWGN channel, with either spatial coupling or power allocation. We also provide numerical simulation results to demonstrate the error performance at finite code lengths. These results show that introducing modulation to the SPARC design can significantly reduce decoding complexity without sacrificing error performance.

Submitted 11 May, 2021; v1 submitted 20 April, 2020; originally announced April 2020.

Comments: 20 pages, 6 figures. To appear in IEEE Transactions on Information Theory

Journal ref: IEEE Transactions on Information Theory, vol. 67, no. 7, pp. 4385-4404, July 2021

arXiv:2002.07844 [pdf, other]

doi 10.1109/TIT.2021.3083733

Capacity-achieving Spatially Coupled Sparse Superposition Codes with AMP Decoding

Authors: Cynthia Rush, Kuan Hsieh, Ramji Venkataramanan

Abstract: Sparse superposition codes, also called sparse regression codes (SPARCs), are a class of codes for efficient communication over the AWGN channel at rates approaching the channel capacity. In a standard SPARC, codewords are sparse linear combinations of columns of an i.i.d. Gaussian design matrix, while in a spatially coupled SPARC the design matrix has a block-wise structure, where the variance of the Gaussian entries can be varied across blocks. A well-designed spatial coupling structure can significantly enhance the error performance of iterative decoding algorithms such as Approximate Message Passing (AMP). In this paper, we obtain a non-asymptotic bound on the probability of error of spatially coupled SPARCs with AMP decoding. Applying this bound to a simple band-diagonal design matrix, we prove that spatially coupled SPARCs with AMP decoding achieve the capacity of the AWGN channel. The bound also highlights how the decay of error probability depends on each design parameter of the spatially coupled SPARC. An attractive feature of AMP decoding is that its asymptotic mean squared error (MSE) can be predicted via a deterministic recursion called state evolution. Our result provides the first proof that the MSE concentrates on the state evolution prediction for spatially coupled designs. Combined with the state evolution prediction, this result implies that spatially coupled SPARCs with the proposed band-diagonal design are capacity-achieving.
Using the proof technique used to establish the main result, we also obtain a concentration inequality for the MSE of AMP applied to compressed sensing with spatially coupled design matrices. Finally, we provide numerical simulation results that demonstrate the finite length error performance of spatially coupled SPARCs. The performance is compared with coded modulation schemes that use LDPC codes from the DVB-S2 standard.

Submitted 8 May, 2021; v1 submitted 18 February, 2020; originally announced February 2020.

Comments: To appear in IEEE Transactions on Information Theory. This version contains proofs of two technical lemmas that were omitted in the journal version

Journal ref: IEEE Transactions on Information Theory, vol. 67, no. 7, pp. 4446-4484, July 2021

arXiv:1910.08663 [pdf, other]

Machine Learning Systems for Highly-Distributed and Rapidly-Growing Data

Authors: Kevin Hsieh

Abstract: The usability and practicality of any machine learning (ML) application are largely influenced by two critical but hard-to-attain factors: low latency and low cost. Unfortunately, achieving low latency and low cost is very challenging when ML depends on real-world data that are highly distributed and rapidly growing (e.g., data collected by mobile phones and video cameras all over the world). Such real-world data pose many challenges in communication and computation. For example, when training data are distributed across data centers that span multiple continents, communication among data centers can easily overwhelm the limited wide-area network bandwidth, leading to prohibitively high latency and high cost. In this dissertation, we demonstrate that the latency and cost of ML on highly-distributed and rapidly-growing data can be improved by one to two orders of magnitude by designing ML systems that exploit the characteristics of ML algorithms, ML model structures, and ML training/serving data. We support this thesis statement with three contributions. First, we design a system that provides both low-latency and low-cost ML serving (inferencing) over large-scale and continuously-growing datasets, such as videos. Second, we build a system that makes ML training over geo-distributed datasets as fast as training within a single data center.
Third, we present a first detailed study and a system-level solution on a fundamental and largely overlooked problem: ML training over non-IID (i.e., not independent and identically distributed) data partitions (e.g., facial images collected by cameras vary according to the demographics of each camera's location).

Submitted 18 October, 2019; originally announced October 2019.

arXiv:1910.00189 [pdf, other]

The Non-IID Data Quagmire of Decentralized Machine Learning

Authors: Kevin Hsieh, Amar Phanishayee, Onur Mutlu, Phillip B. Gibbons

Abstract: Many large-scale machine learning (ML) applications need to perform decentralized learning over datasets generated at different devices and locations. Such datasets pose a significant challenge to decentralized learning because their different contexts result in significant data distribution skew across devices/locations. In this paper, we take a step toward better understanding this challenge by presenting a detailed experimental study of decentralized DNN training on a common type of data skew: skewed distribution of data labels across devices/locations. Our study shows that: (i) skewed data labels are a fundamental and pervasive problem for decentralized learning, causing significant accuracy loss across many ML applications, DNN models, training datasets, and decentralized learning algorithms; (ii) the problem is particularly challenging for DNN models with batch normalization; and (iii) the degree of data skew is a key determinant of the difficulty of the problem. Based on these findings, we present SkewScout, a system-level approach that adapts the communication frequency of decentralized learning algorithms to the (skew-induced) accuracy loss between data partitions. We also show that group normalization can recover much of the accuracy loss of batch normalization.

Submitted 18 August, 2020; v1 submitted 30 September, 2019; originally announced October 2019.

Journal ref: International Conference on Machine Learning (ICML), 2020

arXiv:1805.03154 [pdf, other]

Flexible-Latency DRAM: Understanding and Exploiting Latency Variation in Modern DRAM Chips

Authors: Kevin K. Chang, Abhijith Kashyap, Hasan Hassan, Saugata Ghose, Kevin Hsieh, Donghyuk Lee, Tianshi Li, Gennady Pekhimenko, Samira Khan, Onur Mutlu

Abstract: This article summarizes key results of our work on experimental characterization and analysis of latency variation and latency-reliability trade-offs in modern DRAM chips, which was published in SIGMETRICS 2016, and examines the work's significance and future potential. The goal of this work is to (i) experimentally characterize and understand the latency variation across cells within a DRAM chip for these three fundamental DRAM operations, and (ii) develop new mechanisms that exploit our understanding of the latency variation to reliably improve performance. To this end, we comprehensively characterize 240 DRAM chips from three major vendors, and make six major new observations about latency variation within DRAM. Notably, we find that (i) there is large latency variation across the cells for each of the three operations; (ii) variation characteristics exhibit significant spatial locality: slower cells are clustered in certain regions of a DRAM chip; and (iii) the three fundamental operations exhibit different reliability characteristics when the latency of each operation is reduced. Based on our observations, we propose Flexible-LatencY DRAM (FLY-DRAM), a mechanism that exploits latency variation across DRAM cells within a DRAM chip to improve system performance. The key idea of FLY-DRAM is to exploit the spatial locality of slower cells within DRAM, and access the faster DRAM regions with reduced latencies for the fundamental operations. Our evaluations show that FLY-DRAM improves the performance of a wide range of applications by 13.3%, 17.6%, and 19.5%, on average, for each of the three different vendors' real DRAM chips, in a simulated 8-core system.

Submitted 8 May, 2018; originally announced May 2018.

arXiv:1805.02498 [pdf, other]

Decoupling GPU Programming Models from Resource Management for Enhanced Programming Ease, Portability, and Performance

Authors: Nandita Vijaykumar, Kevin Hsieh, Gennady Pekhimenko, Samira Khan, Ashish Shrestha, Saugata Ghose, Adwait Jog, Phillip B. Gibbons, Onur Mutlu

Abstract: The application resource specification--a static specification of several parameters such as the number of threads and the scratchpad memory usage per thread block--forms a critical component of modern GPU programming models. This specification determines the parallelism, and hence performance, of the application during execution because the corresponding on-chip hardware resources are allocated and managed based on this specification. This tight-coupling between the software-provided resource specification and resource management in hardware leads to significant challenges in programming ease, portability, and performance. Zorua is a new resource virtualization framework, that decouples the programmer-specified resource usage of a GPU application from the actual allocation in the on-chip hardware resources. Zorua enables this decoupling by virtualizing each resource transparently to the programmer. We demonstrate that by providing the illusion of more resources than physically available via controlled and coordinated virtualization, Zorua offers several important benefits: (i) Programming Ease. Zorua eases the burden on the programmer to provide code that is tuned to efficiently utilize the physically available on-chip resources. (ii) Portability. Zorua alleviates the necessity of re-tuning an application's resource usage when porting the application across GPU generations. (iii) Performance. By dynamically allocating resources and carefully oversubscribing them when necessary, Zorua improves or retains the performance of applications that are already highly tuned to best utilize the resources.

Submitted 2 May, 2018; originally announced May 2018.

Comments: arXiv admin note: substantial text overlap with arXiv:1802.02573

arXiv:1803.08625 [pdf, other]

A Concept Learning Tool Based On Calculating Version Space Cardinality

Authors: Kuo-Kai Hsieh, Li-C. Wang

Abstract: In this paper, we proposed VeSC-CoL (Version Space Cardinality based Concept Learning) to deal with concept learning on extremely imbalanced datasets, especially when cross-validation is not a viable option. VeSC-CoL uses version space cardinality as a measure for model quality to replace cross-validation. Instead of naive enumeration of the version space, Ordered Binary Decision Diagram and Boolean Satisfiability are used to compute the version space. Experiments show that VeSC-CoL can accurately learn the target concept when computational resource is allowed.

Submitted 22 March, 2018; originally announced March 2018.

arXiv:1802.02573 [pdf, other]

Zorua: Enhancing Programming Ease, Portability, and Performance in GPUs by Decoupling Programming Models from Resource Management

Authors: Nandita Vijaykumar, Kevin Hsieh, Gennady Pekhimenko, Samira Khan, Ashish Shrestha, Saugata Ghose, Phillip B. Gibbons, Onur Mutlu

Abstract: The application resource specification--a static specification of several parameters such as the number of threads and the scratchpad memory usage per thread block--forms a critical component of the existing GPU programming models. This specification determines the performance of the application during execution because the corresponding on-chip hardware resources are allocated and managed purely based on this specification. This tight coupling between the software-provided resource specification and resource management in hardware leads to significant challenges in programming ease, portability, and performance, as we demonstrate in this work. Our goal in this work is to reduce the dependence of performance on the software-provided resource specification to simultaneously alleviate the above challenges. To this end, we introduce Zorua, a new resource virtualization framework, that decouples the programmer-specified resource usage of a GPU application from the actual allocation in the on-chip hardware resources. Zorua enables this decoupling by virtualizing each resource transparently to the programmer. We demonstrate that by providing the illusion of more resources than physically available, Zorua offers several important benefits: (i) Programming Ease: Zorua eases the burden on the programmer to provide code that is tuned to efficiently utilize the physically available on-chip resources. (ii) Portability: Zorua alleviates the necessity of re-tuning an application's resource usage when porting the application across GPU generations. (iii) Performance: By dynamically allocating resources and carefully oversubscribing them when necessary, Zorua improves or retains the performance of applications that are already highly tuned to best utilize the resources. The holistic virtualization provided by Zorua has many other potential uses which we describe in this paper.

Submitted 7 February, 2018; originally announced February 2018.

Report number: SAFARI Technical Report 2016-005

arXiv:1802.00320 [pdf, other]

Enabling the Adoption of Processing-in-Memory: Challenges, Mechanisms, Future Research Directions

Authors: Saugata Ghose, Kevin Hsieh, Amirali Boroumand, Rachata Ausavarungnirun, Onur Mutlu

Abstract: Poor DRAM technology scaling over the course of many years has caused DRAM-based main memory to increasingly become a larger system bottleneck. A major reason for the bottleneck is that data stored within DRAM must be moved across a pin-limited memory channel to the CPU before any computation can take place. This requires a high latency and energy overhead, and the data often cannot benefit from caching in the CPU, making it difficult to amortize the overhead. Modern 3D-stacked DRAM architectures include a logic layer, where compute logic can be integrated underneath multiple layers of DRAM cell arrays within the same chip. Architects can take advantage of the logic layer to perform processing-in-memory (PIM), or near-data processing. In a PIM architecture, the logic layer within DRAM has access to the high internal bandwidth available within 3D-stacked DRAM (which is much greater than the bandwidth available between DRAM and the CPU). Thus, PIM architectures can effectively free up valuable memory channel bandwidth while reducing system energy consumption. A number of important issues arise when we add compute logic to DRAM. In particular, the logic does not have low-latency access to common CPU structures that are essential for modern application execution, such as the virtual memory and cache coherence mechanisms. To ease the widespread adoption of PIM, we ideally would like to maintain traditional virtual memory abstractions and the shared memory programming model. This requires efficient mechanisms that can provide logic in DRAM with access to CPU structures without having to communicate frequently with the CPU. To this end, we propose and evaluate two general-purpose solutions that minimize unnecessary off-chip communication for PIM architectures. We show that both mechanisms improve the performance and energy consumption of many important memory-intensive applications.

Submitted 1 February, 2018; originally announced February 2018.

arXiv:1801.03493 [pdf, other]

Focus: Querying Large Video Datasets with Low Latency and Low Cost

Authors: Kevin Hsieh, Ganesh Ananthanarayanan, Peter Bodik, Paramvir Bahl, Matthai Philipose, Phillip B. Gibbons, Onur Mutlu

Abstract: Large volumes of videos are continuously recorded from cameras deployed for traffic control and surveillance with the goal of answering "after the fact" queries: identify video frames with objects of certain classes (cars, bags) from many days of recorded video. While advancements in convolutional neural networks (CNNs) have enabled answering such queries with high accuracy, they are too expensive and slow. We build Focus, a system for low-latency and low-cost querying on large video datasets. Focus uses cheap ingestion techniques to index the videos by the objects occurring in them. At ingest-time, it uses compression and video-specific specialization of CNNs. Focus handles the lower accuracy of the cheap CNNs by judiciously leveraging expensive CNNs at query-time. To reduce query time latency, we cluster similar objects and hence avoid redundant processing. Using experiments on video streams from traffic, surveillance and news channels, we see that Focus uses 58X fewer GPU cycles than running expensive ingest processors and is 37X faster than processing all the video at query time.

Submitted 10 January, 2018; originally announced January 2018.

arXiv:1801.01796 [pdf, other]

Spatially Coupled Sparse Regression Codes: Design and State Evolution Analysis

Authors: Kuan Hsieh, Cynthia Rush, Ramji Venkataramanan

Abstract: We consider the design and analysis of spatially coupled sparse regression codes (SC-SPARCs), which were recently introduced by Barbier et al. for efficient communication over the additive white Gaussian noise channel. SC-SPARCs can be efficiently decoded using an Approximate Message Passing (AMP) decoder, whose performance in each iteration can be predicted via a set of equations called state evolution. In this paper, we give an asymptotic characterization of the state evolution equations for SC-SPARCs. For any given base matrix (that defines the coupling structure of the SC-SPARC) and rate, this characterization can be used to predict whether or not AMP decoding will succeed in the large system limit. We then consider a simple base matrix defined by two parameters $(\omega, \Lambda)$, and show that AMP decoding succeeds in the large system limit for all rates $R < \mathcal{C}$. The asymptotic result also indicates how the parameters of the base matrix affect the decoding progression. Simulation results are presented to evaluate the performance of SC-SPARCs defined with the proposed base matrix.

Submitted 26 April, 2018; v1 submitted 5 January, 2018; originally announced January 2018.

Comments: 8 pages, 6 figures. A shorter version of this paper to appear in ISIT 2018

arXiv:1711.03906 [pdf, other]

doi 10.1145/3084041.3084049

D-SLATS: Distributed Simultaneous Localization and Time Synchronization

Authors: Amr Alanwar, Henrique Ferraz, Kevin Hsieh, Rohit Thazhath, Paul Martin, Joao Hespanha, Mani Srivastava

Abstract: Through the last decade, we have witnessed a surge of Internet of Things (IoT) devices, and with that a greater need to choreograph their actions across both time and space. Although these two problems, namely time synchronization and localization, share many aspects in common, they are traditionally treated separately or combined on centralized approaches that result in an inefficient use of resources, or in solutions that are not scalable in terms of the number of IoT devices. Therefore, we propose D-SLATS, a framework comprised of three different and independent algorithms to jointly solve time synchronization and localization problems in a distributed fashion. The first two algorithms are based mainly on the distributed Extended Kalman Filter (EKF) whereas the third one uses optimization techniques. No fusion center is required, and the devices only communicate with their neighbors. The proposed methods are evaluated on a custom Ultra-Wideband communication testbed and a quadrotor, representing a network of both static and mobile nodes. Our algorithms achieve up to three microseconds time synchronization accuracy and 30 cm localization error.

Submitted 10 November, 2017; originally announced November 2017.

arXiv:1706.03162 [pdf, other]

LazyPIM: Efficient Support for Cache Coherence in Processing-in-Memory Architectures

Authors: Amirali Boroumand, Saugata Ghose, Minesh Patel, Hasan Hassan, Brandon Lucia, Nastaran Hajinazar, Kevin Hsieh, Krishna T. Malladi, Hongzhong Zheng, Onur Mutlu

Abstract: Processing-in-memory (PIM) architectures have seen an increase in popularity recently, as the high internal bandwidth available within 3D-stacked memory provides greater incentive to move some computation into the logic layer of the memory. To maintain program correctness, the portions of a program that are executed in memory must remain coherent with the portions of the program that continue to execute within the processor. Unfortunately, PIM architectures cannot use traditional approaches to cache coherence due to the high off-chip traffic consumed by coherence messages, which, as we illustrate in this work, can undo the benefits of PIM execution for many data-intensive applications. We propose LazyPIM, a new hardware cache coherence mechanism designed specifically for PIM. Prior approaches for coherence in PIM are ill-suited to applications that share a large amount of data between the processor and the PIM logic. LazyPIM uses a combination of speculative cache coherence and compressed coherence signatures to greatly reduce the overhead of keeping PIM coherent with the processor, even when a large amount of sharing exists. We find that LazyPIM improves average performance across a range of data-intensive PIM applications by 19.6%, reduces off-chip traffic by 30.9%, and reduces energy consumption by 18.0%, over the best prior approaches to PIM coherence.

Submitted 9 June, 2017; originally announced June 2017.

Showing 1–27 of 27 results for author: Hsieh, K