-
Enhancing Distractor Generation for Multiple-Choice Questions with Retrieval Augmented Pretraining and Knowledge Graph Integration
Authors:
Han-Cheng Yu,
Yu-An Shih,
Kin-Man Law,
Kai-Yu Hsieh,
Yu-Chen Cheng,
Hsin-Chih Ho,
Zih-An Lin,
Wen-Chuan Hsu,
Yao-Chung Fan
Abstract:
In this paper, we tackle the task of distractor generation (DG) for multiple-choice questions. Our study introduces two key designs. First, we propose \textit{retrieval augmented pretraining}, which involves refining the language model pretraining to align it more closely with the downstream task of DG. Second, we explore the integration of knowledge graphs to enhance the performance of DG. Through experiments with benchmarking datasets, we show that our models significantly outperform the state-of-the-art results. Our best-performing model advances the F1@3 score from 14.80 to 16.47 on the MCQ dataset and from 15.92 to 16.50 on the SciQ dataset.
Submitted 19 June, 2024;
originally announced June 2024.
-
Coded Many-User Multiple Access via Approximate Message Passing
Authors:
Xiaoqi Liu,
Kuan Hsieh,
Ramji Venkataramanan
Abstract:
We consider communication over the Gaussian multiple-access channel in the regime where the number of users grows linearly with the codelength. In this regime, schemes based on sparse superposition coding can achieve a near-optimal tradeoff between spectral efficiency and signal-to-noise ratio. However, these schemes are feasible only for small values of user payload. This paper investigates efficient schemes for larger user payloads, focusing on coded CDMA schemes where each user's information is encoded via a linear code before being modulated with a signature sequence. We propose an efficient approximate message passing (AMP) decoder that can be tailored to the structure of the linear code, and provide an exact asymptotic characterization of its performance. Based on this result, we consider a decoder that integrates AMP and belief propagation and characterize its tradeoff between spectral efficiency and signal-to-noise ratio, for a given target error rate. Simulation results show that the decoder achieves state-of-the-art performance at finite lengths, with a coded CDMA scheme defined using LDPC codes and a spatially coupled matrix of signature sequences.
Submitted 1 July, 2024; v1 submitted 8 February, 2024;
originally announced February 2024.
-
A Versatile Data Fabric for Advanced IoT-Based Remote Health Monitoring
Authors:
Italo Buleje,
Vince S. Siu,
Kuan Yu Hsieh,
Nigel Hinds,
Bing Dang,
Erhan Bilal,
Thanhnha Nguyen,
Ellen E. Lee,
Colin A. Depp,
Jeffrey L. Rogers
Abstract:
This paper presents a data-centric and security-focused data fabric designed for digital health applications. With the increasing interest in digital health research, there has been a surge in the volume of Internet of Things (IoT) data derived from smartphones, wearables, and ambient sensors. Managing this vast amount of data, encompassing diverse data types and varying time scales, is crucial. Moreover, compliance with regulatory and contractual obligations is essential. The proposed data fabric comprises an architecture and a toolkit that facilitate the integration of heterogeneous data sources, across different environments, to provide a unified view of the data in dashboards. Furthermore, the data fabric supports the development of reusable and configurable data integration components, which can be shared as open-source or inner-source software. These components are used to generate data pipelines that can be deployed and scheduled to run either in the cloud or on-premises. Additionally, we present the implementation of our data fabric in a home-based telemonitoring research project involving older adults, conducted in collaboration with the University of California, San Diego (UCSD). The study showcases the streamlined integration of data collected from various IoT sensors and mobile applications to create a unified view of older adults' health for further analysis and research.
Submitted 2 October, 2023;
originally announced October 2023.
-
Bayes-Optimal Estimation in Generalized Linear Models via Spatial Coupling
Authors:
Pablo Pascual Cobo,
Kuan Hsieh,
Ramji Venkataramanan
Abstract:
We consider the problem of signal estimation in a generalized linear model (GLM). GLMs include many canonical problems in statistical estimation, such as linear regression, phase retrieval, and 1-bit compressed sensing. Recent work has precisely characterized the asymptotic minimum mean-squared error (MMSE) for GLMs with i.i.d. Gaussian sensing matrices. However, in many models there is a significant gap between the MMSE and the performance of the best known feasible estimators. In this work, we address this issue by considering GLMs defined via spatially coupled sensing matrices. We propose an efficient approximate message passing (AMP) algorithm for estimation and prove that with a simple choice of spatially coupled design, the MSE of a carefully tuned AMP estimator approaches the asymptotic MMSE in the high-dimensional limit. To prove the result, we first rigorously characterize the asymptotic performance of AMP for a GLM with a generic spatially coupled design. This characterization is in terms of a deterministic recursion (`state evolution') that depends on the parameters defining the spatial coupling. Then, using a simple spatially coupled design and judicious choice of functions defining the AMP, we analyze the fixed points of the resulting state evolution and show that it achieves the asymptotic MMSE. Numerical results for phase retrieval and rectified linear regression show that spatially coupled designs can yield substantially lower MSE than i.i.d. Gaussian designs at finite dimensions when used with AMP algorithms.
Submitted 15 September, 2023;
originally announced September 2023.
-
Enhancing Network Management Using Code Generated by Large Language Models
Authors:
Sathiya Kumaran Mani,
Yajie Zhou,
Kevin Hsieh,
Santiago Segarra,
Ranveer Chandra,
Srikanth Kandula
Abstract:
Analyzing network topologies and communication graphs plays a crucial role in contemporary network management. However, the absence of a cohesive approach leads to a challenging learning curve, heightened errors, and inefficiencies. In this paper, we introduce a novel approach to facilitate a natural-language-based network management experience, utilizing large language models (LLMs) to generate task-specific code from natural language queries. This method tackles the challenges of explainability, scalability, and privacy by allowing network operators to inspect the generated code, eliminating the need to share network data with LLMs, and concentrating on application-specific requests combined with general program synthesis techniques. We design and evaluate a prototype system using benchmark applications, showcasing high accuracy, cost-effectiveness, and the potential for further enhancements using complementary program synthesis techniques.
Submitted 11 August, 2023;
originally announced August 2023.
-
Mitigating the Performance Impact of Network Failures in Public Clouds
Authors:
Pooria Namyar,
Behnaz Arzani,
Daniel Crankshaw,
Daniel S. Berger,
Kevin Hsieh,
Srikanth Kandula,
Ramesh Govindan
Abstract:
Some faults in data center networks require hours to days to repair because they may need reboots, re-imaging, or manual work by technicians. To reduce traffic impact, cloud providers \textit{mitigate} the effect of faults, for example, by steering traffic to alternate paths. The state of the art in automatic network mitigations uses simple safety checks and proxy metrics to determine mitigations. SWARM, the approach described in this paper, can pick orders of magnitude better mitigations by estimating end-to-end connection-level performance (CLP) metrics. At its core is a scalable CLP estimator that quickly ranks mitigations with high fidelity and, on failures observed at a large cloud provider, outperforms the state of the art by over 700$\times$ in some cases.
Submitted 23 May, 2023;
originally announced May 2023.
-
Health Guardian Platform: A technology stack to accelerate discovery in Digital Health research
Authors:
Bo Wen,
Vince S. Siu,
Italo Buleje,
Kuan Yu Hsieh,
Takashi Itoh,
Lukas Zimmerli,
Nigel Hinds,
Elif Eyigoz,
Bing Dang,
Stefan von Cavallar,
Jeffrey L. Rogers
Abstract:
This paper highlights the design philosophy and architecture of the Health Guardian, a platform developed by the IBM Digital Health team to accelerate discoveries of new digital biomarkers and development of digital health technologies. The Health Guardian allows for rapid translation of artificial intelligence (AI) research into cloud-based microservices that can be tested with data from clinical cohorts to understand disease and enable early prevention. The platform can be connected to mobile applications, wearables, or Internet of things (IoT) devices to collect health-related data into a secure database. When the analytics are created, the researchers can containerize and deploy their code on the cloud using pre-defined templates, and validate the models using the data collected from one or more sensing devices. The Health Guardian platform currently supports time-series, text, audio, and video inputs with 70+ analytic capabilities and is used for non-commercial scientific research. We provide an example of the Alzheimer's disease (AD) assessment microservice which uses AI methods to extract linguistic features from audio recordings to evaluate an individual's mini-mental state, the likelihood of having AD, and to predict the onset of AD before turning the age of 85. Today, IBM research teams across the globe use the Health Guardian internally as a test bed for early-stage research ideas, and externally with collaborators to support and enhance AI model development and clinical study efforts.
Submitted 10 November, 2022;
originally announced November 2022.
-
Federated Learning under Distributed Concept Drift
Authors:
Ellango Jothimurugesan,
Kevin Hsieh,
Jianyu Wang,
Gauri Joshi,
Phillip B. Gibbons
Abstract:
Federated Learning (FL) under distributed concept drift is a largely unexplored area. Although concept drift is itself a well-studied phenomenon, it poses particular challenges for FL, because drifts arise staggered in time and space (across clients). To the best of our knowledge, this work is the first to explicitly study data heterogeneity in both dimensions. We first demonstrate that prior solutions to drift adaptation that use a single global model are ill-suited to staggered drifts, necessitating multiple-model solutions. We identify the problem of drift adaptation as a time-varying clustering problem, and we propose two new clustering algorithms for reacting to drifts based on local drift detection and hierarchical clustering. Empirical evaluation shows that our solutions achieve significantly higher accuracy than existing baselines, and are comparable to an idealized algorithm with oracle knowledge of the ground-truth clustering of clients to concepts at each time step.
Submitted 27 February, 2023; v1 submitted 1 June, 2022;
originally announced June 2022.
-
FedSpace: An Efficient Federated Learning Framework at Satellites and Ground Stations
Authors:
Jinhyun So,
Kevin Hsieh,
Behnaz Arzani,
Shadi Noghabi,
Salman Avestimehr,
Ranveer Chandra
Abstract:
Large-scale deployments of low Earth orbit (LEO) satellites collect massive amounts of Earth imagery and sensor data, which can empower machine learning (ML) to address global challenges such as real-time disaster navigation and mitigation. However, it is often infeasible to download all the high-resolution images and train these ML models on the ground because of limited downlink bandwidth, sparse connectivity, and regularization constraints on the imagery resolution. To address these challenges, we leverage Federated Learning (FL), where ground stations and satellites collaboratively train a global ML model without sharing the captured images on the satellites. We show fundamental challenges in applying existing FL algorithms among satellites and ground stations, and we formulate an optimization problem which captures a unique trade-off between staleness and idleness. We propose a novel FL framework, named FedSpace, which dynamically schedules model aggregation based on the deterministic and time-varying connectivity according to satellite orbits. Extensive numerical evaluations based on real-world satellite images and satellite networks show that FedSpace reduces the training time by 1.7 days (38.6%) over the state-of-the-art FL algorithms.
Submitted 2 February, 2022;
originally announced February 2022.
-
Towards a Cost vs. Quality Sweet Spot for Monitoring Networks
Authors:
Nofel Yaseen,
Behnaz Arzani,
Krishna Chintalapudi,
Vaishnavi Ranganathan,
Felipe Frujeri,
Kevin Hsieh,
Daniel Berger,
Vincent Liu,
Srikanth Kandula
Abstract:
Continuously monitoring a wide variety of performance and fault metrics has become a crucial part of operating large-scale datacenter networks. In this work, we ask whether we can reduce the costs to monitor -- in terms of collection, storage and analysis -- by judiciously controlling how much and which measurements we collect. By positing that we can treat almost all measured signals as sampled time-series, we show that we can use signal processing techniques such as the Nyquist-Shannon theorem to avoid wasteful data collection. We show that large savings appear possible by analyzing tens of popular measurements from a production datacenter network. We also discuss the technical challenges that must be solved when applying these techniques in practice.
Submitted 11 October, 2021;
originally announced October 2021.
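The sampling idea in the abstract above can be illustrated with a minimal sketch. This is not the authors' method: it assumes a metric is available as evenly spaced samples, uses a plain discrete Fourier transform, and picks the lowest frequency bin below which a chosen fraction of the spectral energy lies. The function names and the `energy_fraction` threshold are hypothetical choices made for illustration.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Magnitudes of DFT bins 0..N//2 of the mean-removed signal."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        mags.append(abs(acc))
    return mags


def min_sampling_rate(samples, current_rate_hz, energy_fraction=0.99):
    """Estimate twice the highest significant frequency (the Nyquist rate).

    energy_fraction is an illustrative threshold (an assumption, not taken
    from the paper): the cutoff is the first bin at which the cumulative
    spectral energy reaches this fraction of the total.
    """
    energy = [m * m for m in dft_magnitudes(samples)]
    total = sum(energy)
    if total == 0.0:
        return 0.0  # constant signal: no variation to capture
    n = len(samples)
    cumulative = 0.0
    for k, e in enumerate(energy):
        cumulative += e
        if cumulative >= energy_fraction * total:
            highest_hz = k * current_rate_hz / n
            # The sampling theorem asks for a rate above 2 * highest_hz;
            # a real collector would add some headroom on top of this.
            return 2.0 * highest_hz
    return current_rate_hz


# A 2 Hz sine collected at 64 Hz for one second is heavily oversampled.
signal = [math.sin(2 * math.pi * 2 * t / 64) for t in range(64)]
suggested_rate = min_sampling_rate(signal, 64.0)
```

In this toy case the estimate comes out at about 4 Hz, far below the 64 Hz collection rate, which is the kind of saving the abstract alludes to. Real measurements are noisy and non-stationary, so the threshold and the analysis window would need tuning.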
-
Interpret-able feedback for AutoML systems
Authors:
Behnaz Arzani,
Kevin Hsieh,
Haoxian Chen
Abstract:
Automated machine learning (AutoML) systems aim to enable training machine learning (ML) models for non-ML experts. A shortcoming of these systems is that when they fail to produce a model with high accuracy, the user has no path to improve the model other than hiring a data scientist or learning ML -- this defeats the purpose of AutoML and limits its adoption. We introduce an interpretable data feedback solution for AutoML. Our solution suggests new data points for the user to label (without requiring a pool of unlabeled data) to improve the model's accuracy. Our solution analyzes how features influence the prediction among all ML models in an AutoML ensemble, and we suggest more data samples from feature ranges that have high variance in such analysis. Our evaluation shows that our solution can improve the accuracy of AutoML by 7-8% and significantly outperforms popular active learning solutions in data efficiency, all the while providing the added benefit of being interpretable.
Submitted 22 February, 2021;
originally announced February 2021.
-
Near-Optimal Coding for Many-user Multiple Access Channels
Authors:
Kuan Hsieh,
Cynthia Rush,
Ramji Venkataramanan
Abstract:
This paper considers the Gaussian multiple-access channel (MAC) in the asymptotic regime where the number of users grows linearly with the code length. We propose efficient coding schemes based on random linear models with approximate message passing (AMP) decoding and derive the asymptotic error rate achieved for a given user density, user payload (in bits), and user energy. The tradeoff between energy-per-bit and achievable user density (for a fixed user payload and target error rate) is studied, and it is demonstrated that in the large system limit, a spatially coupled coding scheme with AMP decoding achieves near-optimal tradeoffs for a wide range of user densities. Furthermore, in the regime where the user payload is large, we also study the tradeoff between energy-per-bit and spectral efficiency and discuss methods to reduce decoding complexity.
Submitted 9 March, 2022; v1 submitted 9 February, 2021;
originally announced February 2021.
-
Ekya: Continuous Learning of Video Analytics Models on Edge Compute Servers
Authors:
Romil Bhardwaj,
Zhengxu Xia,
Ganesh Ananthanarayanan,
Junchen Jiang,
Nikolaos Karianakis,
Yuanchao Shu,
Kevin Hsieh,
Victor Bahl,
Ion Stoica
Abstract:
Video analytics applications use edge compute servers for the analytics of the videos (for bandwidth and privacy). Compressed models that are deployed on the edge servers for inference suffer from data drift, where the live video data diverges from the training data. Continuous learning handles data drift by periodically retraining the models on new data. Our work addresses the challenge of jointly supporting inference and retraining tasks on edge servers, which requires navigating the fundamental tradeoff between the retrained model's accuracy and the inference accuracy. Our solution Ekya balances this tradeoff across multiple models and uses a micro-profiler to identify the models that will benefit the most by retraining. Ekya's accuracy gain compared to a baseline scheduler is 29% higher, and the baseline requires 4x more GPU resources to achieve the same accuracy as Ekya.
Submitted 18 December, 2020;
originally announced December 2020.
-
Drug repurposing for COVID-19 using graph neural network and harmonizing multiple evidence
Authors:
Kanglin Hsieh,
Yinyin Wang,
Luyao Chen,
Zhongming Zhao,
Sean Savitz,
Xiaoqian Jiang,
Jing Tang,
Yejin Kim
Abstract:
Amid the pandemic of 2019 novel coronavirus disease (COVID-19) caused by SARS-CoV-2, a vast amount of drug research for prevention and treatment has been quickly conducted, but these efforts have been unsuccessful thus far. Our objective is to prioritize repurposable drugs using a drug repurposing pipeline that systematically integrates multiple SARS-CoV-2 and drug interactions, deep graph neural networks, and in-vitro/population-based validations. We first collected all the available drugs (n = 3,635) involved in COVID-19 patient treatment through CTDbase. We built a SARS-CoV-2 knowledge graph based on the interactions among virus baits, host genes, pathways, drugs, and phenotypes. A deep graph neural network approach was used to derive the candidate representation based on the biological interactions. We prioritized the candidate drugs using clinical trial history, and then validated them with their genetic profiles, in vitro experimental efficacy, and electronic health records. We highlight the top 22 drugs including Azithromycin, Atorvastatin, Aspirin, Acetaminophen, and Albuterol. We further pinpointed drug combinations that may synergistically target COVID-19. In summary, we demonstrated that the integration of extensive interactions, deep neural networks, and rigorous validation can facilitate the rapid identification of candidate drugs for COVID-19 treatment. This is a post-peer-review, pre-copyedit version of an article published in Scientific Reports. The final authenticated version is available online at: https://www.nature.com/articles/s41598-021-02353-5
Submitted 1 February, 2022; v1 submitted 23 September, 2020;
originally announced September 2020.
-
Modulated Sparse Superposition Codes for the Complex AWGN Channel
Authors:
Kuan Hsieh,
Ramji Venkataramanan
Abstract:
This paper studies a generalization of sparse superposition codes (SPARCs) for communication over the complex additive white Gaussian noise (AWGN) channel. In a SPARC, the codebook is defined in terms of a design matrix, and each codeword is generated by multiplying the design matrix with a sparse message vector. In the standard SPARC construction, information is encoded in the locations of the non-zero entries of the message vector. In this paper we generalize the construction and consider modulated SPARCs, where information is encoded in both the locations and the values of the non-zero entries of the message vector. We focus on the case where the non-zero entries take values from a phase-shift keying (PSK) constellation. We propose a computationally efficient approximate message passing (AMP) decoder, and obtain analytical bounds on the state evolution parameters which predict the error performance of the decoder. Using these bounds we show that PSK-modulated SPARCs are asymptotically capacity achieving for the complex AWGN channel, with either spatial coupling or power allocation. We also provide numerical simulation results to demonstrate the error performance at finite code lengths. These results show that introducing modulation to the SPARC design can significantly reduce decoding complexity without sacrificing error performance.
Submitted 11 May, 2021; v1 submitted 20 April, 2020;
originally announced April 2020.
-
Capacity-achieving Spatially Coupled Sparse Superposition Codes with AMP Decoding
Authors:
Cynthia Rush,
Kuan Hsieh,
Ramji Venkataramanan
Abstract:
Sparse superposition codes, also called sparse regression codes (SPARCs), are a class of codes for efficient communication over the AWGN channel at rates approaching the channel capacity. In a standard SPARC, codewords are sparse linear combinations of columns of an i.i.d. Gaussian design matrix, while in a spatially coupled SPARC the design matrix has a block-wise structure, where the variance of the Gaussian entries can be varied across blocks. A well-designed spatial coupling structure can significantly enhance the error performance of iterative decoding algorithms such as Approximate Message Passing (AMP).
In this paper, we obtain a non-asymptotic bound on the probability of error of spatially coupled SPARCs with AMP decoding. Applying this bound to a simple band-diagonal design matrix, we prove that spatially coupled SPARCs with AMP decoding achieve the capacity of the AWGN channel. The bound also highlights how the decay of error probability depends on each design parameter of the spatially coupled SPARC. An attractive feature of AMP decoding is that its asymptotic mean squared error (MSE) can be predicted via a deterministic recursion called state evolution. Our result provides the first proof that the MSE concentrates on the state evolution prediction for spatially coupled designs. Combined with the state evolution prediction, this result implies that spatially coupled SPARCs with the proposed band-diagonal design are capacity-achieving. Using the proof technique used to establish the main result, we also obtain a concentration inequality for the MSE of AMP applied to compressed sensing with spatially coupled design matrices. Finally we provide numerical simulation results that demonstrate the finite length error performance of spatially coupled SPARCs. The performance is compared with coded modulation schemes that use LDPC codes from the DVB-S2 standard.
Submitted 8 May, 2021; v1 submitted 18 February, 2020;
originally announced February 2020.
-
Machine Learning Systems for Highly-Distributed and Rapidly-Growing Data
Authors:
Kevin Hsieh
Abstract:
The usability and practicality of any machine learning (ML) applications are largely influenced by two critical but hard-to-attain factors: low latency and low cost. Unfortunately, achieving low latency and low cost is very challenging when ML depends on real-world data that are highly distributed and rapidly growing (e.g., data collected by mobile phones and video cameras all over the world). Such real-world data pose many challenges in communication and computation. For example, when training data are distributed across data centers that span multiple continents, communication among data centers can easily overwhelm the limited wide-area network bandwidth, leading to prohibitively high latency and high cost.
In this dissertation, we demonstrate that the latency and cost of ML on highly-distributed and rapidly-growing data can be improved by one to two orders of magnitude by designing ML systems that exploit the characteristics of ML algorithms, ML model structures, and ML training/serving data. We support this thesis statement with three contributions. First, we design a system that provides both low-latency and low-cost ML serving (inferencing) over large-scale and continuously-growing datasets, such as videos. Second, we build a system that makes ML training over geo-distributed datasets as fast as training within a single data center. Third, we present a first detailed study and a system-level solution on a fundamental and largely overlooked problem: ML training over non-IID (i.e., not independent and identically distributed) data partitions (e.g., facial images collected by cameras vary according to the demographics of each camera's location).
Submitted 18 October, 2019;
originally announced October 2019.
-
The Non-IID Data Quagmire of Decentralized Machine Learning
Authors:
Kevin Hsieh,
Amar Phanishayee,
Onur Mutlu,
Phillip B. Gibbons
Abstract:
Many large-scale machine learning (ML) applications need to perform decentralized learning over datasets generated at different devices and locations. Such datasets pose a significant challenge to decentralized learning because their different contexts result in significant data distribution skew across devices/locations. In this paper, we take a step toward better understanding this challenge by presenting a detailed experimental study of decentralized DNN training on a common type of data skew: skewed distribution of data labels across devices/locations. Our study shows that: (i) skewed data labels are a fundamental and pervasive problem for decentralized learning, causing significant accuracy loss across many ML applications, DNN models, training datasets, and decentralized learning algorithms; (ii) the problem is particularly challenging for DNN models with batch normalization; and (iii) the degree of data skew is a key determinant of the difficulty of the problem. Based on these findings, we present SkewScout, a system-level approach that adapts the communication frequency of decentralized learning algorithms to the (skew-induced) accuracy loss between data partitions. We also show that group normalization can recover much of the accuracy loss of batch normalization.
Submitted 18 August, 2020; v1 submitted 30 September, 2019;
originally announced October 2019.
-
Flexible-Latency DRAM: Understanding and Exploiting Latency Variation in Modern DRAM Chips
Authors:
Kevin K. Chang,
Abhijith Kashyap,
Hasan Hassan,
Saugata Ghose,
Kevin Hsieh,
Donghyuk Lee,
Tianshi Li,
Gennady Pekhimenko,
Samira Khan,
Onur Mutlu
Abstract:
This article summarizes key results of our work on experimental characterization and analysis of latency variation and latency-reliability trade-offs in modern DRAM chips, which was published in SIGMETRICS 2016, and examines the work's significance and future potential.
The goal of this work is to (i) experimentally characterize and understand the latency variation across cells within a DRAM chip for these three fundamental DRAM operations, and (ii) develop new mechanisms that exploit our understanding of the latency variation to reliably improve performance. To this end, we comprehensively characterize 240 DRAM chips from three major vendors, and make six major new observations about latency variation within DRAM. Notably, we find that (i) there is large latency variation across the cells for each of the three operations; (ii) variation characteristics exhibit significant spatial locality: slower cells are clustered in certain regions of a DRAM chip; and (iii) the three fundamental operations exhibit different reliability characteristics when the latency of each operation is reduced.
Based on our observations, we propose Flexible-LatencY DRAM (FLY-DRAM), a mechanism that exploits latency variation across DRAM cells within a DRAM chip to improve system performance. The key idea of FLY-DRAM is to exploit the spatial locality of slower cells within DRAM, and access the faster DRAM regions with reduced latencies for the fundamental operations. Our evaluations show that FLY-DRAM improves the performance of a wide range of applications by 13.3%, 17.6%, and 19.5%, on average, for each of the three different vendors' real DRAM chips, in a simulated 8-core system.
Submitted 8 May, 2018;
originally announced May 2018.
-
Decoupling GPU Programming Models from Resource Management for Enhanced Programming Ease, Portability, and Performance
Authors:
Nandita Vijaykumar,
Kevin Hsieh,
Gennady Pekhimenko,
Samira Khan,
Ashish Shrestha,
Saugata Ghose,
Adwait Jog,
Phillip B. Gibbons,
Onur Mutlu
Abstract:
The application resource specification--a static specification of several parameters such as the number of threads and the scratchpad memory usage per thread block--forms a critical component of modern GPU programming models. This specification determines the parallelism, and hence performance, of the application during execution because the corresponding on-chip hardware resources are allocated and managed based on this specification. This tight coupling between the software-provided resource specification and resource management in hardware leads to significant challenges in programming ease, portability, and performance. Zorua is a new resource virtualization framework that decouples the programmer-specified resource usage of a GPU application from the actual allocation in the on-chip hardware resources. Zorua enables this decoupling by virtualizing each resource transparently to the programmer.
We demonstrate that by providing the illusion of more resources than physically available via controlled and coordinated virtualization, Zorua offers several important benefits: (i) Programming Ease. Zorua eases the burden on the programmer to provide code that is tuned to efficiently utilize the physically available on-chip resources. (ii) Portability. Zorua alleviates the necessity of re-tuning an application's resource usage when porting the application across GPU generations. (iii) Performance. By dynamically allocating resources and carefully oversubscribing them when necessary, Zorua improves or retains the performance of applications that are already highly tuned to best utilize the resources.
Submitted 2 May, 2018;
originally announced May 2018.
-
A Concept Learning Tool Based On Calculating Version Space Cardinality
Authors:
Kuo-Kai Hsieh,
Li-C. Wang
Abstract:
In this paper, we propose VeSC-CoL (Version Space Cardinality based Concept Learning) to deal with concept learning on extremely imbalanced datasets, especially when cross-validation is not a viable option. VeSC-CoL uses version space cardinality as a measure of model quality to replace cross-validation. Instead of naive enumeration of the version space, Ordered Binary Decision Diagrams and Boolean Satisfiability are used to compute the version space. Experiments show that VeSC-CoL can accurately learn the target concept when sufficient computational resources are available.
Submitted 22 March, 2018;
originally announced March 2018.
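To make the notion of version space cardinality in the abstract above concrete, here is a minimal sketch. It is not the VeSC-CoL method: it uses exactly the naive enumeration the abstract says the tool avoids, and it assumes a small hypothesis space of conjunctions over Boolean features. The function name and the example data are hypothetical.

```python
from itertools import product


def version_space_cardinality(examples, n_features):
    """Count conjunctive hypotheses consistent with every labelled example.

    Assumption for illustration: a hypothesis gives each feature one of
    1 (must be true), 0 (must be false), or None (don't care), and predicts
    positive only if every constrained feature matches the input.
    """
    def predicts_positive(hypothesis, x):
        return all(h is None or h == xi for h, xi in zip(hypothesis, x))

    count = 0
    # Naive enumeration over all 3**n_features hypotheses.
    for hypothesis in product((0, 1, None), repeat=n_features):
        if all(predicts_positive(hypothesis, x) == label for x, label in examples):
            count += 1
    return count


# Hypothetical toy data: two positive examples and one negative example.
examples = [((1, 0, 1), True), ((1, 1, 1), True), ((0, 0, 1), False)]
cardinality = version_space_cardinality(examples, n_features=3)
print(cardinality)
```

Enumeration grows as 3^n in the number of features, which is why a symbolic representation such as the decision diagrams and satisfiability solving named in the abstract becomes attractive for anything beyond toy sizes.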
-
Zorua: Enhancing Programming Ease, Portability, and Performance in GPUs by Decoupling Programming Models from Resource Management
Authors:
Nandita Vijaykumar,
Kevin Hsieh,
Gennady Pekhimenko,
Samira Khan,
Ashish Shrestha,
Saugata Ghose,
Phillip B. Gibbons,
Onur Mutlu
Abstract:
The application resource specification--a static specification of several parameters such as the number of threads and the scratchpad memory usage per thread block--forms a critical component of the existing GPU programming models. This specification determines the performance of the application during execution because the corresponding on-chip hardware resources are allocated and managed purely based on this specification. This tight coupling between the software-provided resource specification and resource management in hardware leads to significant challenges in programming ease, portability, and performance, as we demonstrate in this work.
Our goal in this work is to reduce the dependence of performance on the software-provided resource specification to simultaneously alleviate the above challenges. To this end, we introduce Zorua, a new resource virtualization framework that decouples the programmer-specified resource usage of a GPU application from the actual allocation in the on-chip hardware resources. Zorua enables this decoupling by virtualizing each resource transparently to the programmer.
We demonstrate that by providing the illusion of more resources than physically available, Zorua offers several important benefits: (i) Programming Ease: Zorua eases the burden on the programmer to provide code that is tuned to efficiently utilize the physically available on-chip resources. (ii) Portability: Zorua alleviates the necessity of re-tuning an application's resource usage when porting the application across GPU generations. (iii) Performance: By dynamically allocating resources and carefully oversubscribing them when necessary, Zorua improves or retains the performance of applications that are already highly tuned to best utilize the resources. The holistic virtualization provided by Zorua has many other potential uses which we describe in this paper.
Submitted 7 February, 2018;
originally announced February 2018.
-
Enabling the Adoption of Processing-in-Memory: Challenges, Mechanisms, Future Research Directions
Authors:
Saugata Ghose,
Kevin Hsieh,
Amirali Boroumand,
Rachata Ausavarungnirun,
Onur Mutlu
Abstract:
Poor DRAM technology scaling over the course of many years has caused DRAM-based main memory to increasingly become a larger system bottleneck. A major reason for the bottleneck is that data stored within DRAM must be moved across a pin-limited memory channel to the CPU before any computation can take place. This requires a high latency and energy overhead, and the data often cannot benefit from caching in the CPU, making it difficult to amortize the overhead.
Modern 3D-stacked DRAM architectures include a logic layer, where compute logic can be integrated underneath multiple layers of DRAM cell arrays within the same chip. Architects can take advantage of the logic layer to perform processing-in-memory (PIM), or near-data processing. In a PIM architecture, the logic layer within DRAM has access to the high internal bandwidth available within 3D-stacked DRAM (which is much greater than the bandwidth available between DRAM and the CPU). Thus, PIM architectures can effectively free up valuable memory channel bandwidth while reducing system energy consumption.
A number of important issues arise when we add compute logic to DRAM. In particular, the logic does not have low-latency access to common CPU structures that are essential for modern application execution, such as the virtual memory and cache coherence mechanisms. To ease the widespread adoption of PIM, we ideally would like to maintain traditional virtual memory abstractions and the shared memory programming model. This requires efficient mechanisms that can provide logic in DRAM with access to CPU structures without having to communicate frequently with the CPU. To this end, we propose and evaluate two general-purpose solutions that minimize unnecessary off-chip communication for PIM architectures. We show that both mechanisms improve the performance and energy consumption of many important memory-intensive applications.
Submitted 1 February, 2018;
originally announced February 2018.
-
Focus: Querying Large Video Datasets with Low Latency and Low Cost
Authors:
Kevin Hsieh,
Ganesh Ananthanarayanan,
Peter Bodik,
Paramvir Bahl,
Matthai Philipose,
Phillip B. Gibbons,
Onur Mutlu
Abstract:
Large volumes of videos are continuously recorded from cameras deployed for traffic control and surveillance with the goal of answering "after the fact" queries: identify video frames with objects of certain classes (cars, bags) from many days of recorded video. While advancements in convolutional neural networks (CNNs) have enabled answering such queries with high accuracy, they are too expensive and slow. We build Focus, a system for low-latency and low-cost querying on large video datasets. Focus uses cheap ingestion techniques to index the videos by the objects occurring in them. At ingest-time, it uses compression and video-specific specialization of CNNs. Focus handles the lower accuracy of the cheap CNNs by judiciously leveraging expensive CNNs at query-time. To reduce query time latency, we cluster similar objects and hence avoid redundant processing. Using experiments on video streams from traffic, surveillance and news channels, we see that Focus uses 58X fewer GPU cycles than running expensive ingest processors and is 37X faster than processing all the video at query time.
Submitted 10 January, 2018;
originally announced January 2018.
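The ingest-time/query-time split described in the Focus abstract above can be sketched roughly as follows. This is an illustration under stated assumptions, not the Focus implementation: the classifiers are stand-in callables rather than CNNs, indexing each frame under the cheap model's top-k classes is an assumption of this sketch, and the clustering step from the abstract is omitted. All names are hypothetical.

```python
from collections import defaultdict


def build_index(frames, cheap_classifier, k):
    """Ingest time: index each frame under the cheap model's top-k classes."""
    index = defaultdict(list)
    for frame_id, frame in frames:
        scores = cheap_classifier(frame)  # mapping: class name -> score
        top_k = sorted(scores, key=scores.get, reverse=True)[:k]
        for cls in top_k:
            index[cls].append(frame_id)
    return index


def query(index, frames_by_id, expensive_classifier, target):
    """Query time: run the expensive model only on indexed candidates."""
    return [fid for fid in index.get(target, [])
            if expensive_classifier(frames_by_id[fid]) == target]


# Hypothetical toy frames: each carries made-up cheap scores and a true label.
frames = [
    (0, {"cheap": {"car": 0.6, "bus": 0.3, "bag": 0.1}, "truth": "car"}),
    (1, {"cheap": {"bus": 0.5, "car": 0.4, "bag": 0.1}, "truth": "car"}),
    (2, {"cheap": {"bag": 0.9, "car": 0.06, "bus": 0.04}, "truth": "bag"}),
]
frames_by_id = dict(frames)
index = build_index(frames, cheap_classifier=lambda f: f["cheap"], k=2)
car_frames = query(index, frames_by_id, lambda f: f["truth"], "car")
print(car_frames)
```

In this toy run the cheap model ranks "car" second for frame 1, so a top-1 index would miss it; keeping the top two classes retains the frame as a candidate, and the expensive check at query time discards frame 2, which was indexed under "car" only as a low-scoring guess.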
-
Spatially Coupled Sparse Regression Codes: Design and State Evolution Analysis
Authors:
Kuan Hsieh,
Cynthia Rush,
Ramji Venkataramanan
Abstract:
We consider the design and analysis of spatially coupled sparse regression codes (SC-SPARCs), which were recently introduced by Barbier et al. for efficient communication over the additive white Gaussian noise channel. SC-SPARCs can be efficiently decoded using an Approximate Message Passing (AMP) decoder, whose performance in each iteration can be predicted via a set of equations called state evolution. In this paper, we give an asymptotic characterization of the state evolution equations for SC-SPARCs. For any given base matrix (that defines the coupling structure of the SC-SPARC) and rate, this characterization can be used to predict whether or not AMP decoding will succeed in the large system limit. We then consider a simple base matrix defined by two parameters $(ω, Λ)$, and show that AMP decoding succeeds in the large system limit for all rates $R < \mathcal{C}$. The asymptotic result also indicates how the parameters of the base matrix affect the decoding progression. Simulation results are presented to evaluate the performance of SC-SPARCs defined with the proposed base matrix.
Submitted 26 April, 2018; v1 submitted 5 January, 2018;
originally announced January 2018.
-
D-SLATS: Distributed Simultaneous Localization and Time Synchronization
Authors:
Amr Alanwar,
Henrique Ferraz,
Kevin Hsieh,
Rohit Thazhath,
Paul Martin,
Joao Hespanha,
Mani Srivastava
Abstract:
Over the last decade, we have witnessed a surge of Internet of Things (IoT) devices, and with that a greater need to choreograph their actions across both time and space. Although these two problems, namely time synchronization and localization, share many aspects, they are traditionally treated separately or combined in centralized approaches that result in an inefficient use of resources, or in solutions that are not scalable in terms of the number of IoT devices. Therefore, we propose D-SLATS, a framework comprising three different and independent algorithms to jointly solve time synchronization and localization problems in a distributed fashion. The first two algorithms are based mainly on the distributed Extended Kalman Filter (EKF), whereas the third one uses optimization techniques. No fusion center is required, and the devices only communicate with their neighbors. The proposed methods are evaluated on a custom ultra-wideband communication testbed and a quadrotor, representing a network of both static and mobile nodes. Our algorithms achieve up to three microseconds time synchronization accuracy and 30 cm localization error.
Submitted 10 November, 2017;
originally announced November 2017.
-
LazyPIM: Efficient Support for Cache Coherence in Processing-in-Memory Architectures
Authors:
Amirali Boroumand,
Saugata Ghose,
Minesh Patel,
Hasan Hassan,
Brandon Lucia,
Nastaran Hajinazar,
Kevin Hsieh,
Krishna T. Malladi,
Hongzhong Zheng,
Onur Mutlu
Abstract:
Processing-in-memory (PIM) architectures have seen an increase in popularity recently, as the high internal bandwidth available within 3D-stacked memory provides greater incentive to move some computation into the logic layer of the memory. To maintain program correctness, the portions of a program that are executed in memory must remain coherent with the portions of the program that continue to execute within the processor. Unfortunately, PIM architectures cannot use traditional approaches to cache coherence due to the high off-chip traffic consumed by coherence messages, which, as we illustrate in this work, can undo the benefits of PIM execution for many data-intensive applications. We propose LazyPIM, a new hardware cache coherence mechanism designed specifically for PIM. Prior approaches for coherence in PIM are ill-suited to applications that share a large amount of data between the processor and the PIM logic. LazyPIM uses a combination of speculative cache coherence and compressed coherence signatures to greatly reduce the overhead of keeping PIM coherent with the processor, even when a large amount of sharing exists. We find that LazyPIM improves average performance across a range of data-intensive PIM applications by 19.6%, reduces off-chip traffic by 30.9%, and reduces energy consumption by 18.0%, over the best prior approaches to PIM coherence.
Submitted 9 June, 2017;
originally announced June 2017.