subscribe to arXiv mailings

Model-Based Diffusion for Trajectory Optimization

Authors: Chaoyi Pan, Zeji Yi, Guanya Shi, Guannan Qu

Abstract: Recent advances in diffusion models have demonstrated their strong capabilities in generating high-fidelity samples from complex distributions through an iterative refinement process. Despite the empirical success of diffusion models in motion planning and control, the model-free nature of these methods does not leverage readily available model information and limits their generalization to new sc… ▽ More Recent advances in diffusion models have demonstrated their strong capabilities in generating high-fidelity samples from complex distributions through an iterative refinement process. Despite the empirical success of diffusion models in motion planning and control, the model-free nature of these methods does not leverage readily available model information and limits their generalization to new scenarios beyond the training data (e.g., new robots with different dynamics). In this work, we introduce Model-Based Diffusion (MBD), an optimization approach using the diffusion process to solve trajectory optimization (TO) problems without data. The key idea is to explicitly compute the score function by leveraging the model information in TO problems, which is why we refer to our approach as model-based diffusion. Moreover, although MBD does not require external data, it can be naturally integrated with data of diverse qualities to steer the diffusion process. We also reveal that MBD has interesting connections to sampling-based optimization. Empirical evaluations show that MBD outperforms state-of-the-art reinforcement learning and sampling-based TO methods in challenging contact-rich tasks. Additionally, MBD's ability to integrate with data enhances its versatility and practical applicability, even with imperfect and infeasible data (e.g., partial-state demonstrations for high-dimensional humanoids), beyond the scope of standard diffusion models. △ Less

Submitted 28 May, 2024; originally announced July 2024.

Comments: Website: https://lecar-lab.github.io/mbd/

arXiv:2406.06823 [pdf, other]

Locally Interdependent Multi-Agent MDP: Theoretical Framework for Decentralized Agents with Dynamic Dependencies

Authors: Alex DeWeese, Guannan Qu

Abstract: Many multi-agent systems in practice are decentralized and have dynamically varying dependencies. There has been a lack of attempts in the literature to analyze these systems theoretically. In this paper, we propose and theoretically analyze a decentralized model with dynamically varying dependencies called the Locally Interdependent Multi-Agent MDP. This model can represent problems in many dispa… ▽ More Many multi-agent systems in practice are decentralized and have dynamically varying dependencies. There has been a lack of attempts in the literature to analyze these systems theoretically. In this paper, we propose and theoretically analyze a decentralized model with dynamically varying dependencies called the Locally Interdependent Multi-Agent MDP. This model can represent problems in many disparate domains such as cooperative navigation, obstacle avoidance, and formation control. Despite the intractability that general partially observable multi-agent systems suffer from, we propose three closed-form policies that are theoretically near-optimal in this setting and can be scalable to compute and store. Consequentially, we reveal a fundamental property of Locally Interdependent Multi-Agent MDP's that the partially observable decentralized solution is exponentially close to the fully observable solution with respect to the visibility radius. We then discuss extensions of our closed-form policies to further improve tractability. We conclude by providing simulations to investigate some long horizon behaviors of our closed-form policies. △ Less

Submitted 10 June, 2024; originally announced June 2024.

Comments: Accepted to International Conference on Machine Learning 2024

arXiv:2406.00234 [pdf, other]

Learning to Stabilize Unknown LTI Systems on a Single Trajectory under Stochastic Noise

Authors: Ziyi Zhang, Yorie Nakahira, Guannan Qu

Abstract: We study the problem of learning to stabilize unknown noisy Linear Time-Invariant (LTI) systems on a single trajectory. It is well known in the literature that the learn-to-stabilize problem suffers from exponential blow-up in which the state norm blows up in the order of $Θ(2^n)$ where $n$ is the state space dimension. This blow-up is due to the open-loop instability when exploring the $n$-dimens… ▽ More We study the problem of learning to stabilize unknown noisy Linear Time-Invariant (LTI) systems on a single trajectory. It is well known in the literature that the learn-to-stabilize problem suffers from exponential blow-up in which the state norm blows up in the order of $Θ(2^n)$ where $n$ is the state space dimension. This blow-up is due to the open-loop instability when exploring the $n$-dimensional state space. To address this issue, we develop a novel algorithm that decouples the unstable subspace of the LTI system from the stable subspace, based on which the algorithm only explores and stabilizes the unstable subspace, the dimension of which can be much smaller than $n$. With a new singular-value-decomposition(SVD)-based analytical framework, we prove that the system is stabilized before the state norm reaches $2^{O(k \log n)}$, where $k$ is the dimension of the unstable subspace. Critically, this bound avoids exponential blow-up in state dimension in the order of $Θ(2^n)$ as in the previous works, and to the best of our knowledge, this is the first paper to avoid exponential blow-up in dimension for stabilizing LTI systems with noise. △ Less

Submitted 31 May, 2024; originally announced June 2024.

arXiv:2405.15322 [pdf, other]

Dishonest Approximate Computing: A Coming Crisis for Cloud Clients

Authors: Ye Wang, Jian Dong, Ming Han, Jin Wu, Gang Qu

Abstract: Approximate Computing (AC) has emerged as a promising technique for achieving energy-efficient architectures and is expected to become an effective technique for reducing the electricity cost for cloud service providers (CSP). However, the potential misuse of AC has not received adequate attention, which is a coming crisis behind the blueprint of AC. Driven by the pursuit of illegal financial prof… ▽ More Approximate Computing (AC) has emerged as a promising technique for achieving energy-efficient architectures and is expected to become an effective technique for reducing the electricity cost for cloud service providers (CSP). However, the potential misuse of AC has not received adequate attention, which is a coming crisis behind the blueprint of AC. Driven by the pursuit of illegal financial profits, untrusted CSPs may deploy low-cost AC devices and deceive clients by presenting AC services as promised accurate computing products, while falsely claiming AC outputs as accurate results. This misuse of AC will cause both financial loss and computing degradation to cloud clients. In this paper, we define this malicious attack as DisHonest Approximate Computing (DHAC) and analyze the technical challenges faced by clients in detecting such attacks. To address this issue, we propose two golden model free detection methods: Residual Class Check (RCC) and Forward-Backward Check (FBC). RCC provides clients a low-cost approach to infer the residual class to which a legitimate accurate output should belong. By comparing the residual class of the returned result, clients can determine whether a computing service contains any AC elements. FBC detects potential DHAC by computing an invertible check branch using the intermediate values of the program. It compares the values before entering and after returning from the check branch to identify any discrepancies. Both RCC and FBC can be executed concurrently with real computing tasks, enabling real-time DHAC detection with current inputs. Our experimental results show that both RCC and FBC can detect over 96%-99% of DHAC cases without misjudging any legitimate accurate results. △ Less

Submitted 24 May, 2024; originally announced May 2024.

Comments: 12 pages, 9 figures

arXiv:2405.15307 [pdf, other]

Before Generation, Align it! A Novel and Effective Strategy for Mitigating Hallucinations in Text-to-SQL Generation

Authors: Ge Qu, Jinyang Li, Bowen Li, Bowen Qin, Nan Huo, Chenhao Ma, Reynold Cheng

Abstract: Large Language Models (LLMs) driven by In-Context Learning (ICL) have significantly improved the performance of text-to-SQL. Previous methods generally employ a two-stage reasoning framework, namely 1) schema linking and 2) logical synthesis, making the framework not only effective but also interpretable. Despite these advancements, the inherent bad nature of the generalization of LLMs often resul… ▽ More Large Language Models (LLMs) driven by In-Context Learning (ICL) have significantly improved the performance of text-to-SQL. Previous methods generally employ a two-stage reasoning framework, namely 1) schema linking and 2) logical synthesis, making the framework not only effective but also interpretable. Despite these advancements, the inherent bad nature of the generalization of LLMs often results in hallucinations, which limits the full potential of LLMs. In this work, we first identify and categorize the common types of hallucinations at each stage in text-to-SQL. We then introduce a novel strategy, Task Alignment (TA), designed to mitigate hallucinations at each stage. TA encourages LLMs to take advantage of experiences from similar tasks rather than starting the tasks from scratch. This can help LLMs reduce the burden of generalization, thereby mitigating hallucinations effectively. We further propose TA-SQL, a text-to-SQL framework based on this strategy. The experimental results and comprehensive analysis demonstrate the effectiveness and robustness of our framework. Specifically, it enhances the performance of the GPT-4 baseline by 21.23% relatively on BIRD dev and it yields significant improvements across six models and four mainstream, complex text-to-SQL benchmarks. △ Less

Submitted 24 May, 2024; originally announced May 2024.

Comments: Accepted to ACL Findings 2024

arXiv:2405.07977 [pdf, other]

A Demographic-Conditioned Variational Autoencoder for fMRI Distribution Sampling and Removal of Confounds

Authors: Anton Orlichenko, Gang Qu, Ziyu Zhou, Anqi Liu, Hong-Wen Deng, Zhengming Ding, Julia M. Stephen, Tony W. Wilson, Vince D. Calhoun, Yu-Ping Wang

Abstract: Objective: fMRI and derived measures such as functional connectivity (FC) have been used to predict brain age, general fluid intelligence, psychiatric disease status, and preclinical neurodegenerative disease. However, it is not always clear that all demographic confounds, such as age, sex, and race, have been removed from fMRI data. Additionally, many fMRI datasets are restricted to authorized re… ▽ More Objective: fMRI and derived measures such as functional connectivity (FC) have been used to predict brain age, general fluid intelligence, psychiatric disease status, and preclinical neurodegenerative disease. However, it is not always clear that all demographic confounds, such as age, sex, and race, have been removed from fMRI data. Additionally, many fMRI datasets are restricted to authorized researchers, making dissemination of these valuable data sources challenging. Methods: We create a variational autoencoder (VAE)-based model, DemoVAE, to decorrelate fMRI features from demographics and generate high-quality synthetic fMRI data based on user-supplied demographics. We train and validate our model using two large, widely used datasets, the Philadelphia Neurodevelopmental Cohort (PNC) and Bipolar and Schizophrenia Network for Intermediate Phenotypes (BSNIP). Results: We find that DemoVAE recapitulates group differences in fMRI data while capturing the full breadth of individual variations. Significantly, we also find that most clinical and computerized battery fields that are correlated with fMRI data are not correlated with DemoVAE latents. An exception are several fields related to schizophrenia medication and symptom severity. Conclusion: Our model generates fMRI data that captures the full distribution of FC better than traditional VAE or GAN models. We also find that most prediction using fMRI data is dependent on correlation with, and prediction of, demographics. Significance: Our DemoVAE model allows for generation of high quality synthetic data conditioned on subject demographics as well as the removal of the confounding effects of demographics. We identify that FC-based prediction tasks are highly influenced by demographic confounds. △ Less

Submitted 13 May, 2024; originally announced May 2024.

Comments: 12 pages

arXiv:2405.03990 [pdf, other]

TrimCaching: Parameter-sharing AI Model Caching in Wireless Edge Networks

Authors: Guanqiao Qu, Zheng Lin, Fangming Liu, Xianhao Chen, Kaibin Huang

Abstract: Next-generation mobile networks are expected to facilitate fast AI model downloading to end users. By caching models on edge servers, mobile networks can deliver models to end users with low latency, resulting in a paradigm called edge model caching. In this paper, we develop a novel model placement scheme, called parameter-sharing model caching (TrimCaching). TrimCaching exploits the key observat… ▽ More Next-generation mobile networks are expected to facilitate fast AI model downloading to end users. By caching models on edge servers, mobile networks can deliver models to end users with low latency, resulting in a paradigm called edge model caching. In this paper, we develop a novel model placement scheme, called parameter-sharing model caching (TrimCaching). TrimCaching exploits the key observation that a wide range of AI models, such as convolutional neural networks or large language models, can share a significant proportion of parameter blocks containing reusable knowledge, thereby improving storage efficiency. To this end, we formulate a parameter-sharing model placement problem to maximize the cache hit ratio in multi-edge wireless networks by balancing the fundamental tradeoff between storage efficiency and service latency. We show that the formulated problem is a submodular maximization problem with submodular constraints, for which no polynomial-time approximation algorithm exists. To overcome this challenge, we study an important special case, where a small fixed number of parameter blocks are shared across models, which often holds in practice. In such a case, a polynomial-time algorithm with $\left(1-ε\right)/2$-approximation guarantee is developed. Subsequently, we address the original problem for the general case by developing a greedy algorithm. Simulation results demonstrate that the proposed TrimCaching framework significantly improves the cache hit ratio compared with state-of-the-art content caching without exploiting shared parameters in AI models. △ Less

Submitted 19 May, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

Comments: 11 pages, 7 figures. This paper has been accepted by ICDCS 2024. The extended version of this paper is at arXiv:2404.14204

arXiv:2404.15819 [pdf, other]

APACHE: A Processing-Near-Memory Architecture for Multi-Scheme Fully Homomorphic Encryption

Authors: Lin Ding, Song Bian, Penggao He, Yan Xu, Gang Qu, Jiliang Zhang

Abstract: Fully Homomorphic Encryption (FHE) allows one to outsource computation over encrypted data to untrusted servers without worrying about data breaching. Since FHE is known to be extremely computationally-intensive, application-specific accelerators emerged as a powerful solution to narrow the performance gap. Nonetheless, due to the increasing complexities in FHE schemes per se and multi-scheme FHE… ▽ More Fully Homomorphic Encryption (FHE) allows one to outsource computation over encrypted data to untrusted servers without worrying about data breaching. Since FHE is known to be extremely computationally-intensive, application-specific accelerators emerged as a powerful solution to narrow the performance gap. Nonetheless, due to the increasing complexities in FHE schemes per se and multi-scheme FHE algorithm designs in end-to-end privacy-preserving tasks, existing FHE accelerators often face the challenges of low hardware utilization rates and insufficient memory bandwidth. In this work, we present APACHE, a layered near-memory computing hierarchy tailored for multi-scheme FHE acceleration. By closely inspecting the data flow across different FHE schemes, we propose a layered near-memory computing architecture with fine-grained functional unit design to significantly enhance the utilization rates of both computational resources and memory bandwidth. In addition, we propose a multi-scheme operator compiler to efficiently schedule high-level FHE computations across lower-level functional units. In the experiment, we evaluate APACHE on various FHE applications, such as Lola MNIST, HELR, fully-packed bootstrapping, and fully homomorphic processors. The results illustrate that APACHE outperforms the state-of-the-art ASIC FHE accelerators by 2.4x to 19.8x over a variety of operator and application benchmarks. △ Less

Submitted 24 April, 2024; originally announced April 2024.

arXiv:2404.14204 [pdf, other]

TrimCaching: Parameter-sharing Edge Caching for AI Model Downloading

Authors: Guanqiao Qu, Zheng Lin, Qian Chen, Jian Li, Fangming Liu, Xianhao Chen, Kaibin Huang

Abstract: Next-generation mobile networks are expected to facilitate fast AI model downloading to end users. By caching models on edge servers, mobile networks can deliver models to end users with low latency, resulting in a paradigm called edge model caching. In this paper, we develop a novel model placement scheme, called parameter-sharing model caching (TrimCaching). TrimCaching exploits the key observat… ▽ More Next-generation mobile networks are expected to facilitate fast AI model downloading to end users. By caching models on edge servers, mobile networks can deliver models to end users with low latency, resulting in a paradigm called edge model caching. In this paper, we develop a novel model placement scheme, called parameter-sharing model caching (TrimCaching). TrimCaching exploits the key observation that a wide range of AI models, such as convolutional neural networks or large language models, can share a significant proportion of parameter blocks containing reusable knowledge, thereby improving storage efficiency. To this end, we formulate a parameter-sharing model placement problem to maximize the cache hit ratio in multi-edge wireless networks by balancing the fundamental tradeoff between storage efficiency and service latency. We show that the formulated problem is a submodular maximization problem with submodular constraints, for which no polynomial-time approximation algorithm exists. To overcome this challenge, we study an important special case, where a small fixed number of parameter blocks are shared across models, which often holds in practice. In such a case, a polynomial-time algorithm with $\left(1-ε\right)/2$-approximation guarantee is developed. Subsequently, we address the original problem for the general case by developing a greedy algorithm. Simulation results demonstrate that the proposed TrimCaching framework significantly improves the cache hit ratio compared with state-of-the-art content caching without exploiting shared parameters in AI models. △ Less

Submitted 12 May, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

Comments: 15 pages, 11 figures. Part of this work has been accepted by ICDCS 2024

arXiv:2404.05001 [pdf, other]

Dual-Scale Transformer for Large-Scale Single-Pixel Imaging

Authors: Gang Qu, Ping Wang, Xin Yuan

Abstract: Single-pixel imaging (SPI) is a potential computational imaging technique which produces image by solving an illposed reconstruction problem from few measurements captured by a single-pixel detector. Deep learning has achieved impressive success on SPI reconstruction. However, previous poor reconstruction performance and impractical imaging model limit its real-world applications. In this paper, w… ▽ More Single-pixel imaging (SPI) is a potential computational imaging technique which produces image by solving an illposed reconstruction problem from few measurements captured by a single-pixel detector. Deep learning has achieved impressive success on SPI reconstruction. However, previous poor reconstruction performance and impractical imaging model limit its real-world applications. In this paper, we propose a deep unfolding network with hybrid-attention Transformer on Kronecker SPI model, dubbed HATNet, to improve the imaging quality of real SPI cameras. Specifically, we unfold the computation graph of the iterative shrinkagethresholding algorithm (ISTA) into two alternative modules: efficient tensor gradient descent and hybrid-attention multiscale denoising. By virtue of Kronecker SPI, the gradient descent module can avoid high computational overheads rooted in previous gradient descent modules based on vectorized SPI. The denoising module is an encoder-decoder architecture powered by dual-scale spatial attention for high- and low-frequency aggregation and channel attention for global information recalibration. Moreover, we build a SPI prototype to verify the effectiveness of the proposed method. Extensive experiments on synthetic and real data demonstrate that our method achieves the state-of-the-art performance. The source code and pre-trained models are available at https://github.com/Gang-Qu/HATNet-SPI. △ Less

Submitted 7 April, 2024; originally announced April 2024.

Comments: CVPR 2024

arXiv:2404.00144 [pdf, other]

An Interpretable Cross-Attentive Multi-modal MRI Fusion Framework for Schizophrenia Diagnosis

Authors: Ziyu Zhou, Anton Orlichenko, Gang Qu, Zening Fu, Vince D Calhoun, Zhengming Ding, Yu-Ping Wang

Abstract: Both functional and structural magnetic resonance imaging (fMRI and sMRI) are widely used for the diagnosis of mental disorder. However, combining complementary information from these two modalities is challenging due to their heterogeneity. Many existing methods fall short of capturing the interaction between these modalities, frequently defaulting to a simple combination of latent features. In t… ▽ More Both functional and structural magnetic resonance imaging (fMRI and sMRI) are widely used for the diagnosis of mental disorder. However, combining complementary information from these two modalities is challenging due to their heterogeneity. Many existing methods fall short of capturing the interaction between these modalities, frequently defaulting to a simple combination of latent features. In this paper, we propose a novel Cross-Attentive Multi-modal Fusion framework (CAMF), which aims to capture both intra-modal and inter-modal relationships between fMRI and sMRI, enhancing multi-modal data representation. Specifically, our CAMF framework employs self-attention modules to identify interactions within each modality while cross-attention modules identify interactions between modalities. Subsequently, our approach optimizes the integration of latent features from both modalities. This approach significantly improves classification accuracy, as demonstrated by our evaluations on two extensive multi-modal brain imaging datasets, where CAMF consistently outperforms existing methods. Furthermore, the gradient-guided Score-CAM is applied to interpret critical functional networks and brain regions involved in schizophrenia. The bio-markers identified by CAMF align with established research, potentially offering new insights into the diagnosis and pathological endophenotypes of schizophrenia. △ Less

Submitted 29 March, 2024; originally announced April 2024.

arXiv:2403.13101 [pdf, other]

AdaptSFL: Adaptive Split Federated Learning in Resource-constrained Edge Networks

Authors: Zheng Lin, Guanqiao Qu, Wei Wei, Xianhao Chen, Kin K. Leung

Abstract: The increasing complexity of deep neural networks poses significant barriers to democratizing them to resource-limited edge devices. To address this challenge, split federated learning (SFL) has emerged as a promising solution by of floading the primary training workload to a server via model partitioning while enabling parallel training among edge devices. However, although system optimization su… ▽ More The increasing complexity of deep neural networks poses significant barriers to democratizing them to resource-limited edge devices. To address this challenge, split federated learning (SFL) has emerged as a promising solution by of floading the primary training workload to a server via model partitioning while enabling parallel training among edge devices. However, although system optimization substantially influences the performance of SFL under resource-constrained systems, the problem remains largely uncharted. In this paper, we provide a convergence analysis of SFL which quantifies the impact of model splitting (MS) and client-side model aggregation (MA) on the learning performance, serving as a theoretical foundation. Then, we propose AdaptSFL, a novel resource-adaptive SFL framework, to expedite SFL under resource-constrained edge computing systems. Specifically, AdaptSFL adaptively controls client-side MA and MS to balance communication-computing latency and training convergence. Extensive simulations across various datasets validate that our proposed AdaptSFL framework takes considerably less time to achieve a target accuracy than benchmarks, demonstrating the effectiveness of the proposed strategies. △ Less

Submitted 22 May, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

Comments: 15 pages, 10 figures

arXiv:2403.05307 [pdf, other]

Tapilot-Crossing: Benchmarking and Evolving LLMs Towards Interactive Data Analysis Agents

Authors: Jinyang Li, Nan Huo, Yan Gao, Jiayi Shi, Yingxiu Zhao, Ge Qu, Yurong Wu, Chenhao Ma, Jian-Guang Lou, Reynold Cheng

Abstract: Interactive Data Analysis, the collaboration between humans and LLM agents, enables real-time data exploration for informed decision-making. The challenges and costs of collecting realistic interactive logs for data analysis hinder the quantitative evaluation of Large Language Model (LLM) agents in this task. To mitigate this issue, we introduce Tapilot-Crossing, a new benchmark to evaluate LLM ag… ▽ More Interactive Data Analysis, the collaboration between humans and LLM agents, enables real-time data exploration for informed decision-making. The challenges and costs of collecting realistic interactive logs for data analysis hinder the quantitative evaluation of Large Language Model (LLM) agents in this task. To mitigate this issue, we introduce Tapilot-Crossing, a new benchmark to evaluate LLM agents on interactive data analysis. Tapilot-Crossing contains 1024 interactions, covering 4 practical scenarios: Normal, Action, Private, and Private Action. Notably, Tapilot-Crossing is constructed by an economical multi-agent environment, Decision Company, with few human efforts. We evaluate popular and advanced LLM agents in Tapilot-Crossing, which underscores the challenges of interactive data analysis. Furthermore, we propose Adaptive Interaction Reflection (AIR), a self-generated reflection strategy that guides LLM agents to learn from successful history. Experiments demonstrate that Air can evolve LLMs into effective interactive data analysis agents, achieving a relative performance improvement of up to 44.5%. △ Less

Submitted 8 March, 2024; originally announced March 2024.

Comments: 30 pages, 7 figures

arXiv:2403.00222 [pdf, other]

Efficient Reinforcement Learning for Global Decision Making in the Presence of Local Agents at Scale

Authors: Emile Anand, Guannan Qu

Abstract: We study reinforcement learning for global decision-making in the presence of many local agents, where the global decision-maker makes decisions affecting all local agents, and the objective is to learn a policy that maximizes the rewards of both the global and the local agents. Such problems find many applications, e.g. demand response, EV charging, queueing, etc. In this setting, scalability has… ▽ More We study reinforcement learning for global decision-making in the presence of many local agents, where the global decision-maker makes decisions affecting all local agents, and the objective is to learn a policy that maximizes the rewards of both the global and the local agents. Such problems find many applications, e.g. demand response, EV charging, queueing, etc. In this setting, scalability has been a long-standing challenge due to the size of the state/action space which can be exponential in the number of agents. This work proposes the $\texttt{SUB-SAMPLE-Q}$ algorithm where the global agent subsamples $k\leq n$ local agents to compute an optimal policy in time that is only exponential in $k$, providing an exponential speedup from standard methods that are exponential in $n$. We show that the learned policy converges to the optimal policy in the order of $\tilde{O}(1/\sqrt{k}+ε_{k,m})$ as the number of sub-sampled agents $k$ increases, where $ε_{k,m}$ is the Bellman noise, by proving a novel generalization of the Dvoretzky-Kiefer-Wolfowitz inequality to the regime of sampling without replacement. We also conduct numerical simulations in a demand-response setting and a queueing setting. △ Less

Submitted 22 May, 2024; v1 submitted 29 February, 2024; originally announced March 2024.

Comments: 30 pages, 6 figures

ACM Class: I.2.6

arXiv:2402.01147 [pdf, other]

Efficient Reinforcement Learning for Routing Jobs in Heterogeneous Queueing Systems

Authors: Neharika Jali, Guannan Qu, Weina Wang, Gauri Joshi

Abstract: We consider the problem of efficiently routing jobs that arrive into a central queue to a system of heterogeneous servers. Unlike homogeneous systems, a threshold policy, that routes jobs to the slow server(s) when the queue length exceeds a certain threshold, is known to be optimal for the one-fast-one-slow two-server system. But an optimal policy for the multi-server system is unknown and non-tr… ▽ More We consider the problem of efficiently routing jobs that arrive into a central queue to a system of heterogeneous servers. Unlike homogeneous systems, a threshold policy, that routes jobs to the slow server(s) when the queue length exceeds a certain threshold, is known to be optimal for the one-fast-one-slow two-server system. But an optimal policy for the multi-server system is unknown and non-trivial to find. While Reinforcement Learning (RL) has been recognized to have great potential for learning policies in such cases, our problem has an exponentially large state space size, rendering standard RL inefficient. In this work, we propose ACHQ, an efficient policy gradient based algorithm with a low dimensional soft threshold policy parameterization that leverages the underlying queueing structure. We provide stationary-point convergence guarantees for the general case and despite the low-dimensional parameterization prove that ACHQ converges to an approximate global optimum for the special case of two servers. Simulations demonstrate an improvement in expected response time of up to ~30% over the greedy policy that routes to the fastest available server. △ Less

Submitted 21 April, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

Comments: AISTATS 2024; Corrected typos

arXiv:2401.16448 [pdf, other]

doi 10.1109/AsianHOST59942.2023.10409307

LLM4SecHW: Leveraging Domain Specific Large Language Model for Hardware Debugging

Authors: Weimin Fu, Kaichen Yang, Raj Gautam Dutta, Xiaolong Guo, Gang Qu

Abstract: This paper presents LLM4SecHW, a novel framework for hardware debugging that leverages domain specific Large Language Model (LLM). Despite the success of LLMs in automating various software development tasks, their application in the hardware security domain has been limited due to the constraints of commercial LLMs and the scarcity of domain specific data. To address these challenges, we propose… ▽ More This paper presents LLM4SecHW, a novel framework for hardware debugging that leverages domain specific Large Language Model (LLM). Despite the success of LLMs in automating various software development tasks, their application in the hardware security domain has been limited due to the constraints of commercial LLMs and the scarcity of domain specific data. To address these challenges, we propose a unique approach to compile a dataset of open source hardware design defects and their remediation steps, utilizing version control data. This dataset provides a substantial foundation for training machine learning models for hardware. LLM4SecHW employs fine tuning of medium sized LLMs based on this dataset, enabling the identification and rectification of bugs in hardware designs. This pioneering approach offers a reference workflow for the application of fine tuning domain specific LLMs in other research areas. We evaluate the performance of our proposed system on various open source hardware designs, demonstrating its efficacy in accurately identifying and correcting defects. Our work brings a new perspective on automating the quality control process in hardware design. △ Less

Submitted 28 January, 2024; originally announced January 2024.

Comments: 6 pages. 1 figure

Journal ref: 2023 Asian Hardware Oriented Security and Trust Symposium (AsianHOST), Tianjin, China, 2023, pp. 1-6

arXiv:2401.10348 [pdf, other]

Exploring General Intelligence via Gated Graph Transformer in Functional Connectivity Studies

Authors: Gang Qu, Anton Orlichenko, Junqi Wang, Gemeng Zhang, Li Xiao, Aiying Zhang, Zhengming Ding, Yu-Ping Wang

Abstract: Functional connectivity (FC) as derived from fMRI has emerged as a pivotal tool in elucidating the intricacies of various psychiatric disorders and delineating the neural pathways that underpin cognitive and behavioral dynamics inherent to the human brain. While Graph Neural Networks (GNNs) offer a structured approach to represent neuroimaging data, they are limited by their need for a predefined… ▽ More Functional connectivity (FC) as derived from fMRI has emerged as a pivotal tool in elucidating the intricacies of various psychiatric disorders and delineating the neural pathways that underpin cognitive and behavioral dynamics inherent to the human brain. While Graph Neural Networks (GNNs) offer a structured approach to represent neuroimaging data, they are limited by their need for a predefined graph structure to depict associations between brain regions, a detail not solely provided by FCs. To bridge this gap, we introduce the Gated Graph Transformer (GGT) framework, designed to predict cognitive metrics based on FCs. Empirical validation on the Philadelphia Neurodevelopmental Cohort (PNC) underscores the superior predictive prowess of our model, further accentuating its potential in identifying pivotal neural connectivities that correlate with human cognitive processes. △ Less

Submitted 18 January, 2024; originally announced January 2024.

arXiv:2401.07369 [pdf, other]

CoVO-MPC: Theoretical Analysis of Sampling-based MPC and Optimal Covariance Design

Authors: Zeji Yi, Chaoyi Pan, Guanqi He, Guannan Qu, Guanya Shi

Abstract: Sampling-based Model Predictive Control (MPC) has been a practical and effective approach in many domains, notably model-based reinforcement learning, thanks to its flexibility and parallelizability. Despite its appealing empirical performance, the theoretical understanding, particularly in terms of convergence analysis and hyperparameter tuning, remains absent. In this paper, we characterize the… ▽ More Sampling-based Model Predictive Control (MPC) has been a practical and effective approach in many domains, notably model-based reinforcement learning, thanks to its flexibility and parallelizability. Despite its appealing empirical performance, the theoretical understanding, particularly in terms of convergence analysis and hyperparameter tuning, remains absent. In this paper, we characterize the convergence property of a widely used sampling-based MPC method, Model Predictive Path Integral Control (MPPI). We show that MPPI enjoys at least linear convergence rates when the optimization is quadratic, which covers time-varying LQR systems. We then extend to more general nonlinear systems. Our theoretical analysis directly leads to a novel sampling-based MPC algorithm, CoVariance-Optimal MPC (CoVo-MPC) that optimally schedules the sampling covariance to optimize the convergence rate. Empirically, CoVo-MPC significantly outperforms standard MPPI by 43-54% in both simulations and real-world quadrotor agile control tasks. Videos and Appendices are available at \url{https://lecar-lab.github.io/CoVO-MPC/}. △ Less

Submitted 14 January, 2024; originally announced January 2024.

Comments: 32 pages, 4 figures

arXiv:2312.11501 [pdf, other]

SYNC+SYNC: Software Cache Write Covert Channels Exploiting Memory-disk Synchronization

Authors: Congcong Chen, Jinhua Cui, Gang Qu, Jiliang Zhang

Abstract: Memory-disk synchronization is a critical technology for ensuring data correctness, integrity, and security, especially in systems that handle sensitive information like financial transactions and medical records. We propose SYNC+SYNC, a group of attacks that exploit the memory-disk synchronization primitives. SYNC+SYNC works by subtly varying the timing of synchronization on the write buffer, off… ▽ More Memory-disk synchronization is a critical technology for ensuring data correctness, integrity, and security, especially in systems that handle sensitive information like financial transactions and medical records. We propose SYNC+SYNC, a group of attacks that exploit the memory-disk synchronization primitives. SYNC+SYNC works by subtly varying the timing of synchronization on the write buffer, offering several advantages: 1) implemented purely in software, enabling deployment on any hardware devices; 2) resilient against existing cache partitioning and randomization techniques; 3) unaffected by prefetching techniques and cache replacement strategies. We present the principles of SYNC+SYNC through the implementation of two write covert channel protocols, using either a single file or page, and introduce three enhanced strategies that utilize multiple files and pages. The feasibility of these channels is demonstrated in both cross-process and cross-sandbox scenarios across diverse operating systems (OSes). Experimental results show that, the average rate can reach 2.036 Kb/s (with a peak rate of 14.762 Kb/s) and the error rate is 0% on Linux; when running on macOS, the average rate achieves 10.211 Kb/s (with a peak rate of 253.022 Kb/s) and the error rate is 0.004%. To the best of our knowledge, SYNC+SYNC is the first high-speed write covert channel for software cache. △ Less

Submitted 8 December, 2023; originally announced December 2023.

Comments: This manuscript was first submitted to the 33rd USENIX Security Symposium on June 6, 2023 (Summer Review Cycle)

arXiv:2312.04371 [pdf, other]

A Scalable Network-Aware Multi-Agent Reinforcement Learning Framework for Decentralized Inverter-based Voltage Control

Authors: Han Xu, Jialin Zheng, Guannan Qu

Abstract: This paper addresses the challenges associated with decentralized voltage control in power grids due to an increase in distributed generations (DGs). Traditional model-based voltage control methods struggle with the rapid energy fluctuations and uncertainties of these DGs. While multi-agent reinforcement learning (MARL) has shown potential for decentralized secondary control, scalability issues ar… ▽ More This paper addresses the challenges associated with decentralized voltage control in power grids due to an increase in distributed generations (DGs). Traditional model-based voltage control methods struggle with the rapid energy fluctuations and uncertainties of these DGs. While multi-agent reinforcement learning (MARL) has shown potential for decentralized secondary control, scalability issues arise when dealing with a large number of DGs. This problem lies in the dominant centralized training and decentralized execution (CTDE) framework, where the critics take global observations and actions. To overcome these challenges, we propose a scalable network-aware (SNA) framework that leverages network structure to truncate the input to the critic's Q-function, thereby improving scalability and reducing communication costs during training. Further, the SNA framework is theoretically grounded with provable approximation guarantee, and it can seamlessly integrate with multiple multi-agent actor-critic algorithms. The proposed SNA framework is successfully demonstrated in a system with 114 DGs, providing a promising solution for decentralized voltage control in increasingly complex power grid systems. △ Less

Submitted 7 December, 2023; originally announced December 2023.

arXiv:2312.01832 [pdf, other]

SPECRUN: The Danger of Speculative Runahead Execution in Processors

Authors: Chaoqun Shen, Gang Qu, Jiliang Zhang

Abstract: Runahead execution is a continuously evolving microarchitectural technique for processor performance. This paper introduces the first transient execution attack on the runahead execution, called SPECRUN, which exploits the unresolved branch prediction during runahead execution. We show that SPECRUN eliminates the limitation on the number of transient instructions posed by the reorder buffer size,… ▽ More Runahead execution is a continuously evolving microarchitectural technique for processor performance. This paper introduces the first transient execution attack on the runahead execution, called SPECRUN, which exploits the unresolved branch prediction during runahead execution. We show that SPECRUN eliminates the limitation on the number of transient instructions posed by the reorder buffer size, enhancing the exploitability and harmfulness of the attack. We concretely demonstrate a proof-of-concept attack that causes leaking secrets from a victim process, validate the merit of SPECRUN, and design a secure runahead execution scheme. This paper highlights the need to consider the security of potential optimization techniques before implementing them in a processor. △ Less

Submitted 4 December, 2023; originally announced December 2023.

arXiv:2309.16739 [pdf, other]

Pushing Large Language Models to the 6G Edge: Vision, Challenges, and Opportunities

Authors: Zheng Lin, Guanqiao Qu, Qiyuan Chen, Xianhao Chen, Zhe Chen, Kaibin Huang

Abstract: Large language models (LLMs), which have shown remarkable capabilities, are revolutionizing AI development and potentially shaping our future. However, given their multimodality, the status quo cloud-based deployment faces some critical challenges: 1) long response time; 2) high bandwidth costs; and 3) the violation of data privacy. 6G mobile edge computing (MEC) systems may resolve these pressing… ▽ More Large language models (LLMs), which have shown remarkable capabilities, are revolutionizing AI development and potentially shaping our future. However, given their multimodality, the status quo cloud-based deployment faces some critical challenges: 1) long response time; 2) high bandwidth costs; and 3) the violation of data privacy. 6G mobile edge computing (MEC) systems may resolve these pressing issues. In this article, we explore the potential of deploying LLMs at the 6G edge. We start by introducing killer applications powered by multimodal LLMs, including robotics and healthcare, to highlight the need for deploying LLMs in the vicinity of end users. Then, we identify the critical challenges for LLM deployment at the edge and envision the 6G MEC architecture for LLMs. Furthermore, we delve into two design aspects, i.e., edge training and edge inference for LLMs. In both aspects, considering the inherent resource limitations at the edge, we discuss various cutting-edge techniques, including split learning/inference, parameter-efficient fine-tuning, quantization, and parameter-sharing inference, to facilitate the efficient deployment of LLMs. This article serves as a position paper for thoroughly identifying the motivation, challenges, and pathway for empowering LLMs at the 6G edge. △ Less

Submitted 4 March, 2024; v1 submitted 28 September, 2023; originally announced September 2023.

Comments: 7 pages, 5 figures

arXiv:2308.08896 [pdf, other]

Optimal Resource Allocation for U-Shaped Parallel Split Learning

Authors: Song Lyu, Zheng Lin, Guanqiao Qu, Xianhao Chen, Xiaoxia Huang, Pan Li

Abstract: Split learning (SL) has emerged as a promising approach for model training without revealing the raw data samples from the data owners. However, traditional SL inevitably leaks label privacy as the tail model (with the last layers) should be placed on the server. To overcome this limitation, one promising solution is to utilize U-shaped architecture to leave both early layers and last layers on th… ▽ More Split learning (SL) has emerged as a promising approach for model training without revealing the raw data samples from the data owners. However, traditional SL inevitably leaks label privacy as the tail model (with the last layers) should be placed on the server. To overcome this limitation, one promising solution is to utilize U-shaped architecture to leave both early layers and last layers on the user side. In this paper, we develop a novel parallel U-shaped split learning and devise the optimal resource optimization scheme to improve the performance of edge networks. In the proposed framework, multiple users communicate with an edge server for SL. We analyze the end-to-end delay of each client during the training process and design an efficient resource allocation algorithm, called LSCRA, which finds the optimal computing resource allocation and split layers. Our experimental results show the effectiveness of LSCRA and that U-shaped parallel split learning can achieve a similar performance with other SL baselines while preserving label privacy. Index Terms: U-shaped network, split learning, label privacy, resource allocation, 5G/6G edge networks. △ Less

Submitted 8 October, 2023; v1 submitted 17 August, 2023; originally announced August 2023.

Comments: 6 pages, 6 figures

arXiv:2307.07291 [pdf, other]

Sampling-Priors-Augmented Deep Unfolding Network for Robust Video Compressive Sensing

Authors: Yuhao Huang, Gangrong Qu, Youran Ge

Abstract: Video Compressed Sensing (VCS) aims to reconstruct multiple frames from one single captured measurement, thus achieving high-speed scene recording with a low-frame-rate sensor. Although there have been impressive advances in VCS recently, those state-of-the-art (SOTA) methods also significantly increase model complexity and suffer from poor generality and robustness, which means that those network… ▽ More Video Compressed Sensing (VCS) aims to reconstruct multiple frames from one single captured measurement, thus achieving high-speed scene recording with a low-frame-rate sensor. Although there have been impressive advances in VCS recently, those state-of-the-art (SOTA) methods also significantly increase model complexity and suffer from poor generality and robustness, which means that those networks need to be retrained to accommodate the new system. Such limitations hinder the real-time imaging and practical deployment of models. In this work, we propose a Sampling-Priors-Augmented Deep Unfolding Network (SPA-DUN) for efficient and robust VCS reconstruction. Under the optimization-inspired deep unfolding framework, a lightweight and efficient U-net is exploited to downsize the model while improving overall performance. Moreover, the prior knowledge from the sampling model is utilized to dynamically modulate the network features to enable single SPA-DUN to handle arbitrary sampling settings, augmenting interpretability and generality. Extensive experiments on both simulation and real datasets demonstrate that SPA-DUN is not only applicable for various sampling settings with one single model but also achieves SOTA performance with incredible efficiency. △ Less

Submitted 14 July, 2023; originally announced July 2023.

arXiv:2306.12194 [pdf, ps, other]

Split Learning in 6G Edge Networks

Authors: Zheng Lin, Guanqiao Qu, Xianhao Chen, Kaibin Huang

Abstract: With the proliferation of distributed edge computing resources, the 6G mobile network will evolve into a network for connected intelligence. Along this line, the proposal to incorporate federated learning into the mobile edge has gained considerable interest in recent years. However, the deployment of federated learning faces substantial challenges as massive resource-limited IoT devices can hardl… ▽ More With the proliferation of distributed edge computing resources, the 6G mobile network will evolve into a network for connected intelligence. Along this line, the proposal to incorporate federated learning into the mobile edge has gained considerable interest in recent years. However, the deployment of federated learning faces substantial challenges as massive resource-limited IoT devices can hardly support on-device model training. This leads to the emergence of split learning (SL) which enables servers to handle the major training workload while still enhancing data privacy. In this article, we offer a brief overview of key advancements in SL and articulate its seamless integration with wireless edge networks. We begin by illustrating the tailored 6G architecture to support edge SL. Then, we examine the critical design issues for edge SL, including innovative resource-efficient learning frameworks and resource management strategies under a single edge server. Additionally, we expand the scope to multi-edge scenarios, exploring multi-edge collaboration and mobility management from a networking perspective. Finally, we discuss open problems for edge SL, including convergence analysis, asynchronous SL and U-shaped SL. △ Less

Submitted 24 January, 2024; v1 submitted 21 June, 2023; originally announced June 2023.

Comments: 7 pages, 6 figures

arXiv:2305.03111 [pdf, other]

Can LLM Already Serve as A Database Interface? A BIg Bench for Large-Scale Database Grounded Text-to-SQLs

Authors: Jinyang Li, Binyuan Hui, Ge Qu, Jiaxi Yang, Binhua Li, Bowen Li, Bailin Wang, Bowen Qin, Rongyu Cao, Ruiying Geng, Nan Huo, Xuanhe Zhou, Chenhao Ma, Guoliang Li, Kevin C. C. Chang, Fei Huang, Reynold Cheng, Yongbin Li

Abstract: Text-to-SQL parsing, which aims at converting natural language instructions into executable SQLs, has gained increasing attention in recent years. In particular, Codex and ChatGPT have shown impressive results in this task. However, most of the prevalent benchmarks, i.e., Spider, and WikiSQL, focus on database schema with few rows of database contents leaving the gap between academic study and rea… ▽ More Text-to-SQL parsing, which aims at converting natural language instructions into executable SQLs, has gained increasing attention in recent years. In particular, Codex and ChatGPT have shown impressive results in this task. However, most of the prevalent benchmarks, i.e., Spider, and WikiSQL, focus on database schema with few rows of database contents leaving the gap between academic study and real-world applications. To mitigate this gap, we present Bird, a big benchmark for large-scale database grounded in text-to-SQL tasks, containing 12,751 pairs of text-to-SQL data and 95 databases with a total size of 33.4 GB, spanning 37 professional domains. Our emphasis on database values highlights the new challenges of dirty database contents, external knowledge between NL questions and database contents, and SQL efficiency, particularly in the context of massive databases. To solve these problems, text-to-SQL models must feature database value comprehension in addition to semantic parsing. The experimental results demonstrate the significance of database values in generating accurate text-to-SQLs for big databases. Furthermore, even the most effective text-to-SQL models, i.e. ChatGPT, only achieves 40.08% in execution accuracy, which is still far from the human result of 92.96%, proving that challenges still stand. Besides, we also provide an efficiency analysis to offer insights into generating text-to-efficient-SQLs that are beneficial to industries. We believe that BIRD will contribute to advancing real-world applications of text-to-SQL research. The leaderboard and source code are available: https://bird-bench.github.io/. △ Less

Submitted 14 November, 2023; v1 submitted 4 May, 2023; originally announced May 2023.

Comments: NeurIPS 2023

arXiv:2304.12072 [pdf, other]

Exploration and Exploitation of Hidden PMU Events

Authors: Yihao Yang, Pengfei Qiu, Chunlu Wang, Yu Jin, Dongsheng Wang, Gang Qu

Abstract: Performance Monitoring Unit (PMU) is a common hardware module in Intel CPUs. It can be used to record various CPU behaviors therefore it is often used for performance analysis and optimization. Of the 65536 event spaces, Intel has officially published only 200 or so. In this paper, we design a hidden PMU event collection method. And we found a large number of undocumented PMU events in CPUs of Sky… ▽ More Performance Monitoring Unit (PMU) is a common hardware module in Intel CPUs. It can be used to record various CPU behaviors therefore it is often used for performance analysis and optimization. Of the 65536 event spaces, Intel has officially published only 200 or so. In this paper, we design a hidden PMU event collection method. And we found a large number of undocumented PMU events in CPUs of Skylake, Kabylake, and Alderlake microarchitectures. We further demonstrate the existence of these events by using them for transient execution attack detection and build-side channel attacks. This also implies that these hidden PMU events have huge exploitation potential and security threats. △ Less

Submitted 24 April, 2023; originally announced April 2023.

arXiv:2304.11397 [pdf, other]

Vehicle as a Service (VaaS): Leverage Vehicles to Build Service Networks and Capabilities for Smart Cities

Authors: Xianhao Chen, Yiqin Deng, Haichuan Ding, Guanqiao Qu, Haixia Zhang, Pan Li, Yuguang Fang

Abstract: Smart cities demand resources for rich immersive sensing, ubiquitous communications, powerful computing, large storage, and high intelligence (SCCSI) to support various kinds of applications, such as public safety, connected and autonomous driving, smart and connected health, and smart living. At the same time, it is widely recognized that vehicles such as autonomous cars, equipped with significan… ▽ More Smart cities demand resources for rich immersive sensing, ubiquitous communications, powerful computing, large storage, and high intelligence (SCCSI) to support various kinds of applications, such as public safety, connected and autonomous driving, smart and connected health, and smart living. At the same time, it is widely recognized that vehicles such as autonomous cars, equipped with significantly powerful SCCSI capabilities, will become ubiquitous in future smart cities. By observing the convergence of these two trends, this article advocates the use of vehicles to build a cost-effective service network, called the Vehicle as a Service (VaaS) paradigm, where vehicles empowered with SCCSI capability form a web of mobile servers and communicators to provide SCCSI services in smart cities. Towards this direction, we first examine the potential use cases in smart cities and possible upgrades required for the transition from traditional vehicular ad hoc networks (VANETs) to VaaS. Then, we will introduce the system architecture of the VaaS paradigm and discuss how it can provide SCCSI services in future smart cities, respectively. At last, we identify the open problems of this paradigm and future research directions, including architectural design, service provisioning, incentive design, and security & privacy. We expect that this paper paves the way towards developing a cost-effective and sustainable approach for building smart cities. △ Less

Submitted 8 September, 2023; v1 submitted 22 April, 2023; originally announced April 2023.

Comments: 32 pages, 11 figures

arXiv:2304.10877 [pdf, other]

Timing the Transient Execution: A New Side-Channel Attack on Intel CPUs

Authors: Yu Jin, Pengfei Qiu, Chunlu Wang, Yihao Yang, Dongsheng Wang, Gang Qu

Abstract: The transient execution attack is a type of attack leveraging the vulnerability of modern CPU optimization technologies. New attacks surface rapidly. The side-channel is a key part of transient execution attacks to leak data. In this work, we discover a vulnerability that the change of the EFLAGS register in transient execution may have a side effect on the Jcc (jump on condition code) instruction… ▽ More The transient execution attack is a type of attack leveraging the vulnerability of modern CPU optimization technologies. New attacks surface rapidly. The side-channel is a key part of transient execution attacks to leak data. In this work, we discover a vulnerability that the change of the EFLAGS register in transient execution may have a side effect on the Jcc (jump on condition code) instruction after it in Intel CPUs. Based on our discovery, we propose a new side-channel attack that leverages the timing of both transient execution and Jcc instructions to deliver data. This attack encodes secret data to the change of register which makes the execution time of context slightly slower, which can be measured by the attacker to decode data. This attack doesn't rely on the cache system and doesn't need to reset the EFLAGS register manually to its initial state before the attack, which may make it more difficult to detect or mitigate. We implemented this side-channel on machines with Intel Core i7-6700, i7-7700, and i9-10980XE CPUs. In the first two processors, we combined it as the side-channel of the Meltdown attack, which could achieve 100\% success leaking rate. We evaluate and discuss potential defenses against the attack. Our contributions include discovering security vulnerabilities in the implementation of Jcc instructions and EFLAGS register and proposing a new side-channel attack that does not rely on the cache system. △ Less

Submitted 21 April, 2023; originally announced April 2023.

arXiv:2211.17116 [pdf, other]

Global Convergence of Localized Policy Iteration in Networked Multi-Agent Reinforcement Learning

Authors: Yizhou Zhang, Guannan Qu, Pan Xu, Yiheng Lin, Zaiwei Chen, Adam Wierman

Abstract: We study a multi-agent reinforcement learning (MARL) problem where the agents interact over a given network. The goal of the agents is to cooperatively maximize the average of their entropy-regularized long-term rewards. To overcome the curse of dimensionality and to reduce communication, we propose a Localized Policy Iteration (LPI) algorithm that provably learns a near-globally-optimal policy us… ▽ More We study a multi-agent reinforcement learning (MARL) problem where the agents interact over a given network. The goal of the agents is to cooperatively maximize the average of their entropy-regularized long-term rewards. To overcome the curse of dimensionality and to reduce communication, we propose a Localized Policy Iteration (LPI) algorithm that provably learns a near-globally-optimal policy using only local information. In particular, we show that, despite restricting each agent's attention to only its $κ$-hop neighborhood, the agents are able to learn a policy with an optimality gap that decays polynomially in $κ$. In addition, we show the finite-sample convergence of LPI to the global optimal policy, which explicitly captures the trade-off between optimality and computational complexity in choosing $κ$. Numerical simulations demonstrate the effectiveness of LPI. △ Less

Submitted 30 November, 2022; originally announced November 2022.

arXiv:2211.11855 [pdf, other]

MES-Attacks: Software-Controlled Covert Channels based on Mutual Exclusion and Synchronization

Authors: Chaoqun Shen, Jiliang Zhang, Gang Qu

Abstract: Multi-process concurrency is effective in improving program efficiency and maximizing CPU utilization. The correct execution of concurrency is ensured by the mutual exclusion and synchronization mechanism (MESM) that manages the shared hardware and software resources. We propose MES-Attacks, a new set of software-controlled covert channel attacks based on MESM to transmit confidential information.… ▽ More Multi-process concurrency is effective in improving program efficiency and maximizing CPU utilization. The correct execution of concurrency is ensured by the mutual exclusion and synchronization mechanism (MESM) that manages the shared hardware and software resources. We propose MES-Attacks, a new set of software-controlled covert channel attacks based on MESM to transmit confidential information. MES-Attacks offer several advantages: 1) the covert channels are constructed at software level and can be deployed on any hardware; 2) closed share of resource ensures the quality of the channels with low interference and makes them hard to be detected; and 3) it utilizes the system's software resources which are abound and hence difficult to isolate. We built covert channels using different MESMs on Windows and Linux, including Event, Timer, FileLockEX, Mutex, Semaphore and flock. Experimental results demonstrate that these covert channels can achieve transmission rate of 13.105 kb/s, 12.383 kb/s, and 6.552 kb/s, respectively in the scenarios of local, cross-sandbox and cross-VM, where the bit error rates are all under 1\%. △ Less

Submitted 21 November, 2022; originally announced November 2022.

arXiv:2207.11689 [pdf, other]

PMUSpill: The Counters in Performance Monitor Unit that Leak SGX-Protected Secrets

Authors: Pengfei Qiu, Yongqiang Lyu, Haixia Wang, Dongsheng Wang, Chang Liu, Qiang Gao, Chunlu Wang, Rihui Sun, Gang Qu

Abstract: Performance Monitor Unit (PMU) is a significant hardware module on the current processors, which counts the events launched by processor into a set of PMU counters. Ideally, the events triggered by instructions that are executed but the results are not successfully committed (transient execution) should not be recorded. However, in this study, we discover that some PMU events triggered by the tran… ▽ More Performance Monitor Unit (PMU) is a significant hardware module on the current processors, which counts the events launched by processor into a set of PMU counters. Ideally, the events triggered by instructions that are executed but the results are not successfully committed (transient execution) should not be recorded. However, in this study, we discover that some PMU events triggered by the transient execution instructions will actually be recorded by PMU. Based on this, we propose the PMUSpill attack, which enables attackers to maliciously leak the secret data that are loaded during transient executions. The biggest challenge is how to encode the secret data into PMU events. We construct an instruction gadget to solve this challenge, whose execution path that can be identified by PMU counters represents what values the secret data are. We successfully implement the PMUSpill attack to leak the secret data stored in Intel Software Guard Extensions (SGX) (a Trusted Execution Environment (TEE) in the Intel's processors) through real experiments. Besides, we locate the vulnerable PMU counters and their trigger instructions by iterating all the valid PMU counters and instructions. The experiment results demonstrate that there are up to 20 PMU counters available to implement the PMUSpill attack. We also provide some possible hardware and software-based countermeasures for addressing the PMUSpill attack, which can be utilized to enhance the security of processors in future. △ Less

Submitted 24 July, 2022; originally announced July 2022.

arXiv:2206.01704 [pdf, ps, other]

KCRL: Krasovskii-Constrained Reinforcement Learning with Guaranteed Stability in Nonlinear Dynamical Systems

Authors: Sahin Lale, Yuanyuan Shi, Guannan Qu, Kamyar Azizzadenesheli, Adam Wierman, Anima Anandkumar

Abstract: Learning a dynamical system requires stabilizing the unknown dynamics to avoid state blow-ups. However, current reinforcement learning (RL) methods lack stabilization guarantees, which limits their applicability for the control of safety-critical systems. We propose a model-based RL framework with formal stability guarantees, Krasovskii Constrained RL (KCRL), that adopts Krasovskii's family of Lya… ▽ More Learning a dynamical system requires stabilizing the unknown dynamics to avoid state blow-ups. However, current reinforcement learning (RL) methods lack stabilization guarantees, which limits their applicability for the control of safety-critical systems. We propose a model-based RL framework with formal stability guarantees, Krasovskii Constrained RL (KCRL), that adopts Krasovskii's family of Lyapunov functions as a stability constraint. The proposed method learns the system dynamics up to a confidence interval using feature representation, e.g. Random Fourier Features. It then solves a constrained policy optimization problem with a stability constraint based on Krasovskii's method using a primal-dual approach to recover a stabilizing policy. We show that KCRL is guaranteed to learn a stabilizing policy in a finite number of interactions with the underlying unknown system. We also derive the sample complexity upper bound for stabilization of unknown nonlinear dynamical systems via the KCRL framework. △ Less

Submitted 3 June, 2022; originally announced June 2022.

arXiv:2206.01341 [pdf, other]

Equipping Black-Box Policies with Model-Based Advice for Stable Nonlinear Control

Authors: Tongxin Li, Ruixiao Yang, Guannan Qu, Yiheng Lin, Steven Low, Adam Wierman

Abstract: Machine-learned black-box policies are ubiquitous for nonlinear control problems. Meanwhile, crude model information is often available for these problems from, e.g., linear approximations of nonlinear dynamics. We study the problem of equipping a black-box control policy with model-based advice for nonlinear control on a single trajectory. We first show a general negative result that a naive conv… ▽ More Machine-learned black-box policies are ubiquitous for nonlinear control problems. Meanwhile, crude model information is often available for these problems from, e.g., linear approximations of nonlinear dynamics. We study the problem of equipping a black-box control policy with model-based advice for nonlinear control on a single trajectory. We first show a general negative result that a naive convex combination of a black-box policy and a linear model-based policy can lead to instability, even if the two policies are both stabilizing. We then propose an adaptive $λ$-confident policy, with a coefficient $λ$ indicating the confidence in a black-box policy, and prove its stability. With bounded nonlinearity, in addition, we show that the adaptive $λ$-confident policy achieves a bounded competitive ratio when a black-box policy is near-optimal. Finally, we propose an online learning approach to implement the adaptive $λ$-confident policy and verify its efficacy in case studies about the CartPole problem and a real-world electric vehicle (EV) charging problem with data bias due to COVID-19. △ Less

Submitted 2 June, 2022; originally announced June 2022.

Comments: 33 pages, 7 figures

arXiv:2204.05551 [pdf, other]

Near-Optimal Distributed Linear-Quadratic Regulator for Networked Systems

Authors: Sungho Shin, Yiheng Lin, Guannan Qu, Adam Wierman, Mihai Anitescu

Abstract: This paper studies the trade-off between the degree of decentralization and the performance of a distributed controller in a linear-quadratic control setting. We study a system of interconnected agents over a graph and a distributed controller, called $κ$-distributed control, which lets the agents make control decisions based on the state information within distance $κ$ on the underlying graph. Th… ▽ More This paper studies the trade-off between the degree of decentralization and the performance of a distributed controller in a linear-quadratic control setting. We study a system of interconnected agents over a graph and a distributed controller, called $κ$-distributed control, which lets the agents make control decisions based on the state information within distance $κ$ on the underlying graph. This controller can tune its degree of decentralization using the parameter $κ$ and thus allows a characterization of the relationship between decentralization and performance. We show that under mild assumptions, including stabilizability, detectability, and a subexponentially growing graph condition, the performance difference between $κ$-distributed control and centralized optimal control becomes exponentially small in $κ$. This result reveals that distributed control can achieve near-optimal performance with a moderate degree of decentralization, and thus it is an effective controller architecture for large-scale networked systems. △ Less

Submitted 11 September, 2022; v1 submitted 12 April, 2022; originally announced April 2022.

arXiv:2201.06192 [pdf, other]

Fooling the Eyes of Autonomous Vehicles: Robust Physical Adversarial Examples Against Traffic Sign Recognition Systems

Authors: Wei Jia, Zhaojun Lu, Haichun Zhang, Zhenglin Liu, Jie Wang, Gang Qu

Abstract: Adversarial Examples (AEs) can deceive Deep Neural Networks (DNNs) and have received a lot of attention recently. However, majority of the research on AEs is in the digital domain and the adversarial patches are static, which is very different from many real-world DNN applications such as Traffic Sign Recognition (TSR) systems in autonomous vehicles. In TSR systems, object detectors use DNNs to pr… ▽ More Adversarial Examples (AEs) can deceive Deep Neural Networks (DNNs) and have received a lot of attention recently. However, majority of the research on AEs is in the digital domain and the adversarial patches are static, which is very different from many real-world DNN applications such as Traffic Sign Recognition (TSR) systems in autonomous vehicles. In TSR systems, object detectors use DNNs to process streaming video in real time. From the view of object detectors, the traffic sign`s position and quality of the video are continuously changing, rendering the digital AEs ineffective in the physical world. In this paper, we propose a systematic pipeline to generate robust physical AEs against real-world object detectors. Robustness is achieved in three ways. First, we simulate the in-vehicle cameras by extending the distribution of image transformations with the blur transformation and the resolution transformation. Second, we design the single and multiple bounding boxes filters to improve the efficiency of the perturbation training. Third, we consider four representative attack vectors, namely Hiding Attack, Appearance Attack, Non-Target Attack and Target Attack. We perform a comprehensive set of experiments under a variety of environmental conditions, and considering illuminations in sunny and cloudy weather as well as at night. The experimental results show that the physical AEs generated from our pipeline are effective and robust when attacking the YOLO v5 based TSR system. The attacks have good transferability and can deceive other state-of-the-art object detectors. We launched HA and NTA on a brand-new 2021 model vehicle. Both attacks are successful in fooling the TSR system, which could be a life-threatening case for autonomous vehicles. Finally, we discuss three defense mechanisms based on image preprocessing, AEs detection, and model enhancing. △ Less

Submitted 16 January, 2022; originally announced January 2022.

Comments: 17 pages, 15 figures

arXiv:2112.03662 [pdf, other]

Lightning: Striking the Secure Isolation on GPU Clouds with Transient Hardware Faults

Authors: Rihui Sun, Pefei Qiu, Yongqiang Lyu, Donsheng Wang, Jiang Dong, Gang Qu

Abstract: GPU clouds have become a popular computing platform because of the cost of owning and maintaining high-performance computing clusters. Many cloud architectures have also been proposed to ensure a secure execution environment for guest applications by enforcing strong security policies to isolate the untrusted hypervisor from the guest virtual machines (VMs). In this paper, we study the impact of G… ▽ More GPU clouds have become a popular computing platform because of the cost of owning and maintaining high-performance computing clusters. Many cloud architectures have also been proposed to ensure a secure execution environment for guest applications by enforcing strong security policies to isolate the untrusted hypervisor from the guest virtual machines (VMs). In this paper, we study the impact of GPU chip's hardware faults on the security of cloud "trusted" execution environment using Deep Neural Network (DNN) as the underlying application. We show that transient hardware faults of GPUs can be generated by exploiting the Dynamic Voltage and Frequency Scaling (DVFS) technology, and these faults may cause computation errors, but they have limited impact on the inference accuracy of DNN due to the robustness and fault-tolerant nature of well-developed DNN models. To take full advantage of these transient hardware faults, we propose the Lightning attack to locate the fault injection targets of DNNs and to control the fault injection precision in terms of timing and position. We conduct experiments on three commodity GPUs to attack four widely-used DNNs. Experimental results show that the proposed attack can reduce the inference accuracy of the models by as high as 78.3\% and 64.5\% on average. More importantly, 67.9\% of the targeted attacks have successfully misled the models to give our desired incorrect inference result. This demonstrates that the secure isolation on GPU clouds is vulnerable against transient hardware faults and the computation results may not be trusted. △ Less

Submitted 7 December, 2021; originally announced December 2021.

arXiv:2112.00471 [pdf, other]

doi 10.1109/TC.2021.3131049

Triangle Counting Accelerations: From Algorithm to In-Memory Computing Architecture

Authors: Xueyan Wang, Jianlei Yang, Yinglin Zhao, Xiaotao Jia, Rong Yin, Xuhang Chen, Gang Qu, Weisheng Zhao

Abstract: Triangles are the basic substructure of networks and triangle counting (TC) has been a fundamental graph computing problem in numerous fields such as social network analysis. Nevertheless, like other graph computing problems, due to the high memory-computation ratio and random memory access pattern, TC involves a large amount of data transfers thus suffers from the bandwidth bottleneck in the trad… ▽ More Triangles are the basic substructure of networks and triangle counting (TC) has been a fundamental graph computing problem in numerous fields such as social network analysis. Nevertheless, like other graph computing problems, due to the high memory-computation ratio and random memory access pattern, TC involves a large amount of data transfers thus suffers from the bandwidth bottleneck in the traditional Von-Neumann architecture. To overcome this challenge, in this paper, we propose to accelerate TC with the emerging processing-in-memory (PIM) architecture through an algorithm-architecture co-optimization manner. To enable the efficient in-memory implementations, we come up to reformulate TC with bitwise logic operations (such as AND), and develop customized graph compression and mapping techniques for efficient data flow management. With the emerging computational Spin-Transfer Torque Magnetic RAM (STT-MRAM) array, which is one of the most promising PIM enabling techniques, the device-to-architecture co-simulation results demonstrate that the proposed TC in-memory accelerator outperforms the state-of-the-art GPU and FPGA accelerations by 12.2x and 31.8x, respectively, and achieves a 34x energy efficiency improvement over the FPGA accelerator. △ Less

Submitted 1 December, 2021; originally announced December 2021.

Comments: arXiv admin note: substantial text overlap with arXiv:2007.10702

Journal ref: IEEE Transactions on Computers, 2021

arXiv:2110.00096 [pdf, other]

Decentralized Graph-Based Multi-Agent Reinforcement Learning Using Reward Machines

Authors: Jueming Hu, Zhe Xu, Weichang Wang, Guannan Qu, Yutian Pang, Yongming Liu

Abstract: In multi-agent reinforcement learning (MARL), it is challenging for a collection of agents to learn complex temporally extended tasks. The difficulties lie in computational complexity and how to learn the high-level ideas behind reward functions. We study the graph-based Markov Decision Process (MDP) where the dynamics of neighboring agents are coupled. We use a reward machine (RM) to encode each… ▽ More In multi-agent reinforcement learning (MARL), it is challenging for a collection of agents to learn complex temporally extended tasks. The difficulties lie in computational complexity and how to learn the high-level ideas behind reward functions. We study the graph-based Markov Decision Process (MDP) where the dynamics of neighboring agents are coupled. We use a reward machine (RM) to encode each agent's task and expose reward function internal structures. RM has the capacity to describe high-level knowledge and encode non-Markovian reward functions. We propose a decentralized learning algorithm to tackle computational complexity, called decentralized graph-based reinforcement learning using reward machines (DGRM), that equips each agent with a localized policy, allowing agents to make decisions independently, based on the information available to the agents. DGRM uses the actor-critic structure, and we introduce the tabular Q-function for discrete state problems. We show that the dependency of Q-function on other agents decreases exponentially as the distance between them increases. Furthermore, the complexity of DGRM is related to the local information size of the largest $κ$-hop neighborhood, and DGRM can find an $O(ρ^{κ+1})$-approximation of a stationary point of the objective function. To further improve efficiency, we also propose the deep DGRM algorithm, using deep neural networks to approximate the Q-function and policy function to solve large-scale or continuous state problems. The effectiveness of the proposed DGRM algorithm is evaluated by two case studies, UAV package delivery and COVID-19 pandemic mitigation. Experimental results show that local information is sufficient for DGRM and agents can accomplish complex tasks with the help of RM. DGRM improves the global accumulated reward by 119% compared to the baseline in the case of COVID-19 pandemic mitigation. △ Less

Submitted 30 September, 2021; originally announced October 2021.

arXiv:2104.14134 [pdf, other]

Stable Online Control of Linear Time-Varying Systems

Authors: Guannan Qu, Yuanyuan Shi, Sahin Lale, Anima Anandkumar, Adam Wierman

Abstract: Linear time-varying (LTV) systems are widely used for modeling real-world dynamical systems due to their generality and simplicity. Providing stability guarantees for LTV systems is one of the central problems in control theory. However, existing approaches that guarantee stability typically lead to significantly sub-optimal cumulative control cost in online settings where only current or short-te… ▽ More Linear time-varying (LTV) systems are widely used for modeling real-world dynamical systems due to their generality and simplicity. Providing stability guarantees for LTV systems is one of the central problems in control theory. However, existing approaches that guarantee stability typically lead to significantly sub-optimal cumulative control cost in online settings where only current or short-term system information is available. In this work, we propose an efficient online control algorithm, COvariance Constrained Online Linear Quadratic (COCO-LQ) control, that guarantees input-to-state stability for a large class of LTV systems while also minimizing the control cost. The proposed method incorporates a state covariance constraint into the semi-definite programming (SDP) formulation of the LQ optimal controller. We empirically demonstrate the performance of COCO-LQ in both synthetic experiments and a power system frequency control example. △ Less

Submitted 29 April, 2021; v1 submitted 29 April, 2021; originally announced April 2021.

Comments: 3rd Annual Learning for Dynamics & Control Conference (L4DC)

arXiv:2103.03701 [pdf, other]

Don't Forget to Sign the Gradients!

Authors: Omid Aramoon, Pin-Yu Chen, Gang Qu

Abstract: Engineering a top-notch deep learning model is an expensive procedure that involves collecting data, hiring human resources with expertise in machine learning, and providing high computational resources. For that reason, deep learning models are considered as valuable Intellectual Properties (IPs) of the model vendors. To ensure reliable commercialization of deep learning models, it is crucial to… ▽ More Engineering a top-notch deep learning model is an expensive procedure that involves collecting data, hiring human resources with expertise in machine learning, and providing high computational resources. For that reason, deep learning models are considered as valuable Intellectual Properties (IPs) of the model vendors. To ensure reliable commercialization of deep learning models, it is crucial to develop techniques to protect model vendors against IP infringements. One of such techniques that recently has shown great promise is digital watermarking. However, current watermarking approaches can embed very limited amount of information and are vulnerable against watermark removal attacks. In this paper, we present GradSigns, a novel watermarking framework for deep neural networks (DNNs). GradSigns embeds the owner's signature into the gradient of the cross-entropy cost function with respect to inputs to the model. Our approach has a negligible impact on the performance of the protected model and it allows model vendors to remotely verify the watermark through prediction APIs. We evaluate GradSigns on DNNs trained for different image classification tasks using CIFAR-10, SVHN, and YTF datasets. Experimental results show that GradSigns is robust against all known counter-watermark attacks and can embed a large amount of information into DNNs. △ Less

Submitted 5 March, 2021; originally announced March 2021.

Comments: Accepted to MLSys 2021

arXiv:2102.05561 [pdf, other]

Meta Federated Learning

Authors: Omid Aramoon, Pin-Yu Chen, Gang Qu, Yuan Tian

Abstract: Due to its distributed methodology alongside its privacy-preserving features, Federated Learning (FL) is vulnerable to training time adversarial attacks. In this study, our focus is on backdoor attacks in which the adversary's goal is to cause targeted misclassifications for inputs embedded with an adversarial trigger while maintaining an acceptable performance on the main learning task at hand. C… ▽ More Due to its distributed methodology alongside its privacy-preserving features, Federated Learning (FL) is vulnerable to training time adversarial attacks. In this study, our focus is on backdoor attacks in which the adversary's goal is to cause targeted misclassifications for inputs embedded with an adversarial trigger while maintaining an acceptable performance on the main learning task at hand. Contemporary defenses against backdoor attacks in federated learning require direct access to each individual client's update which is not feasible in recent FL settings where Secure Aggregation is deployed. In this study, we seek to answer the following question, Is it possible to defend against backdoor attacks when secure aggregation is in place?, a question that has not been addressed by prior arts. To this end, we propose Meta Federated Learning (Meta-FL), a novel variant of federated learning which not only is compatible with secure aggregation protocol but also facilitates defense against backdoor attacks. We perform a systematic evaluation of Meta-FL on two classification datasets: SVHN and GTSRB. The results show that Meta-FL not only achieves better utility than classic FL, but also enhances the performance of contemporary defenses in terms of robustness against adversarial attacks. △ Less

Submitted 10 February, 2021; originally announced February 2021.

Comments: 11 pages, 5 figures

arXiv:2102.01168 [pdf, other]

doi 10.1109/TSG.2022.3154718

Reinforcement Learning for Selective Key Applications in Power Systems: Recent Advances and Future Challenges

Authors: Xin Chen, Guannan Qu, Yujie Tang, Steven Low, Na Li

Abstract: With large-scale integration of renewable generation and distributed energy resources, modern power systems are confronted with new operational challenges, such as growing complexity, increasing uncertainty, and aggravating volatility. Meanwhile, more and more data are becoming available owing to the widespread deployment of smart meters, smart sensors, and upgraded communication networks. As a re… ▽ More With large-scale integration of renewable generation and distributed energy resources, modern power systems are confronted with new operational challenges, such as growing complexity, increasing uncertainty, and aggravating volatility. Meanwhile, more and more data are becoming available owing to the widespread deployment of smart meters, smart sensors, and upgraded communication networks. As a result, data-driven control techniques, especially reinforcement learning (RL), have attracted surging attention in recent years. This paper provides a comprehensive review of various RL techniques and how they can be applied to decision-making and control in power systems. In particular, we select three key applications, i.e., frequency regulation, voltage control, and energy management, as examples to illustrate RL-based models and solutions. We then present the critical issues in the application of RL, i.e., safety, robustness, scalability, and data. Several potential future directions are discussed as well. △ Less

Submitted 25 February, 2022; v1 submitted 26 January, 2021; originally announced February 2021.

arXiv:2101.08316 [pdf, other]

Ensemble manifold based regularized multi-modal graph convolutional network for cognitive ability prediction

Authors: Gang Qu, Li Xiao, Wenxing Hu, Kun Zhang, Vince D. Calhoun, Yu-Ping Wang

Abstract: Objective: Multi-modal functional magnetic resonance imaging (fMRI) can be used to make predictions about individual behavioral and cognitive traits based on brain connectivity networks. Methods: To take advantage of complementary information from multi-modal fMRI, we propose an interpretable multi-modal graph convolutional network (MGCN) model, incorporating the fMRI time series and the functiona… ▽ More Objective: Multi-modal functional magnetic resonance imaging (fMRI) can be used to make predictions about individual behavioral and cognitive traits based on brain connectivity networks. Methods: To take advantage of complementary information from multi-modal fMRI, we propose an interpretable multi-modal graph convolutional network (MGCN) model, incorporating the fMRI time series and the functional connectivity (FC) between each pair of brain regions. Specifically, our model learns a graph embedding from individual brain networks derived from multi-modal data. A manifold-based regularization term is then enforced to consider the relationships of subjects both within and between modalities. Furthermore, we propose the gradient-weighted regression activation mapping (Grad-RAM) and the edge mask learning to interpret the model, which is used to identify significant cognition-related biomarkers. Results: We validate our MGCN model on the Philadelphia Neurodevelopmental Cohort to predict individual wide range achievement test (WRAT) score. Our model obtains superior predictive performance over GCN with a single modality and other competing approaches. The identified biomarkers are cross-validated from different approaches. Conclusion and Significance: This paper develops a new interpretable graph deep learning framework for cognitive ability prediction, with the potential to overcome the limitations of several current data-fusion models. The results demonstrate the power of MGCN in analyzing multi-modal fMRI and discovering significant biomarkers for human brain studies. △ Less

Submitted 20 January, 2021; originally announced January 2021.

arXiv:2008.09930 [pdf, other]

DMRO:A Deep Meta Reinforcement Learning-based Task Offloading Framework for Edge-Cloud Computing

Authors: Guanjin Qu, Huaming Wu

Abstract: With the continuous growth of mobile data and the unprecedented demand for computing power, resource-constrained edge devices cannot effectively meet the requirements of Internet of Things (IoT) applications and Deep Neural Network (DNN) computing. As a distributed computing paradigm, edge offloading that migrates complex tasks from IoT devices to edge-cloud servers can break through the resource… ▽ More With the continuous growth of mobile data and the unprecedented demand for computing power, resource-constrained edge devices cannot effectively meet the requirements of Internet of Things (IoT) applications and Deep Neural Network (DNN) computing. As a distributed computing paradigm, edge offloading that migrates complex tasks from IoT devices to edge-cloud servers can break through the resource limitation of IoT devices, reduce the computing burden and improve the efficiency of task processing. However, the problem of optimal offloading decision-making is NP-hard, traditional optimization methods are difficult to achieve results efficiently. Besides, there are still some shortcomings in existing deep learning methods, e.g., the slow learning speed and the failure of the original network parameters when the environment changes. To tackle these challenges, we propose a Deep Meta Reinforcement Learning-based offloading (DMRO) algorithm, which combines multiple parallel DNNs with Q-learning to make fine-grained offloading decisions. By aggregating the perceptive ability of deep learning, the decision-making ability of reinforcement learning, and the rapid environment learning ability of meta-learning, it is possible to quickly and flexibly obtain the optimal offloading strategy from the IoT environment. Simulation results demonstrate that the proposed algorithm achieves obvious improvement over the Deep Q-Learning algorithm and has strong portability in making real-time offloading decisions even in time-varying IoT environments. △ Less

Submitted 22 August, 2020; originally announced August 2020.

arXiv:2007.15348

Who Is Charging My Phone? Identifying Wireless Chargers via Fingerprinting

Authors: Zhiyun Wang, Jiayu Zhang, Xiaoyu Ji, Wenyuan Xu, Gang Qu, Minjian Zhao

Abstract: With the increasing popularity of the Internet of Things(IoT) devices, the demand for fast and convenient battery charging services grows rapidly. Wireless charging is a promising technology for such a purpose and its usage has become ubiquitous. However, the close distance between the charger and the device being charged not only makes proximity-based and near field communication attacks possible… ▽ More With the increasing popularity of the Internet of Things(IoT) devices, the demand for fast and convenient battery charging services grows rapidly. Wireless charging is a promising technology for such a purpose and its usage has become ubiquitous. However, the close distance between the charger and the device being charged not only makes proximity-based and near field communication attacks possible, but also introduces a new type of vulnerabilities. In this paper, we propose to create fingerprints for wireless chargers based on the intrinsic non-linear distortion effects of the underlying charging circuit. Using such fingerprints, we design the WirelessID system to detect potential short-range malicious wireless charging attacks. WirelessID collects signals in the standby state of the charging process and sends them to a trusted server, which can extract the fingerprint and then identify the charger. △ Less

Submitted 4 August, 2020; v1 submitted 30 July, 2020; originally announced July 2020.

Comments: Sorry, the content of this paper has to be revised a lot, so we decided to withdraw it first

arXiv:2007.10702 [pdf, other]

TCIM: Triangle Counting Acceleration With Processing-In-MRAM Architecture

Authors: Xueyan Wang, Jianlei Yang, Yinglin Zhao, Yingjie Qi, Meichen Liu, Xingzhou Cheng, Xiaotao Jia, Xiaoming Chen, Gang Qu, Weisheng Zhao

Abstract: Triangle counting (TC) is a fundamental problem in graph analysis and has found numerous applications, which motivates many TC acceleration solutions in the traditional computing platforms like GPU and FPGA. However, these approaches suffer from the bandwidth bottleneck because TC calculation involves a large amount of data transfers. In this paper, we propose to overcome this challenge by designi… ▽ More Triangle counting (TC) is a fundamental problem in graph analysis and has found numerous applications, which motivates many TC acceleration solutions in the traditional computing platforms like GPU and FPGA. However, these approaches suffer from the bandwidth bottleneck because TC calculation involves a large amount of data transfers. In this paper, we propose to overcome this challenge by designing a TC accelerator utilizing the emerging processing-in-MRAM (PIM) architecture. The true innovation behind our approach is a novel method to perform TC with bitwise logic operations (such as \texttt{AND}), instead of the traditional approaches such as matrix computations. This enables the efficient in-memory implementations of TC computation, which we demonstrate in this paper with computational Spin-Transfer Torque Magnetic RAM (STT-MRAM) arrays. Furthermore, we develop customized graph slicing and mapping techniques to speed up the computation and reduce the energy consumption. We use a device-to-architecture co-simulation framework to validate our proposed TC accelerator. The results show that our data mapping strategy could reduce $99.99\%$ of the computation and $72\%$ of the memory \texttt{WRITE} operations. Compared with the existing GPU or FPGA accelerators, our in-memory accelerator achieves speedups of $9\times$ and $23.4\times$, respectively, and a $20.6\times$ energy efficiency improvement over the FPGA accelerator. △ Less

Submitted 21 July, 2020; originally announced July 2020.

Comments: published on DAC 2020

arXiv:2006.11029 [pdf, ps, other]

Learning Optimal Power Flow: Worst-Case Guarantees for Neural Networks

Authors: Andreas Venzke, Guannan Qu, Steven Low, Spyros Chatzivasileiadis

Abstract: This paper introduces for the first time a framework to obtain provable worst-case guarantees for neural network performance, using learning for optimal power flow (OPF) problems as a guiding example. Neural networks have the potential to substantially reduce the computing time of OPF solutions. However, the lack of guarantees for their worst-case performance remains a major barrier for their adop… ▽ More This paper introduces for the first time a framework to obtain provable worst-case guarantees for neural network performance, using learning for optimal power flow (OPF) problems as a guiding example. Neural networks have the potential to substantially reduce the computing time of OPF solutions. However, the lack of guarantees for their worst-case performance remains a major barrier for their adoption in practice. This work aims to remove this barrier. We formulate mixed-integer linear programs to obtain worst-case guarantees for neural network predictions related to (i) maximum constraint violations, (ii) maximum distances between predicted and optimal decision variables, and (iii) maximum sub-optimality. We demonstrate our methods on a range of PGLib-OPF networks up to 300 buses. We show that the worst-case guarantees can be up to one order of magnitude larger than the empirical lower bounds calculated with conventional methods. More importantly, we show that the worst-case predictions appear at the boundaries of the training input domain, and we demonstrate how we can systematically reduce the worst-case guarantees by training on a larger input domain than the domain they are evaluated on. △ Less

Submitted 19 June, 2020; originally announced June 2020.

Comments: The code to reproduce the simulation results is available https://doi.org/10.5281/zenodo.3871755

arXiv:2006.07476 [pdf, other]

Combining Model-Based and Model-Free Methods for Nonlinear Control: A Provably Convergent Policy Gradient Approach

Authors: Guannan Qu, Chenkai Yu, Steven Low, Adam Wierman

Abstract: Model-free learning-based control methods have seen great success recently. However, such methods typically suffer from poor sample complexity and limited convergence guarantees. This is in sharp contrast to classical model-based control, which has a rich theory but typically requires strong modeling assumptions. In this paper, we combine the two approaches to achieve the best of both worlds. We c… ▽ More Model-free learning-based control methods have seen great success recently. However, such methods typically suffer from poor sample complexity and limited convergence guarantees. This is in sharp contrast to classical model-based control, which has a rich theory but typically requires strong modeling assumptions. In this paper, we combine the two approaches to achieve the best of both worlds. We consider a dynamical system with both linear and non-linear components and develop a novel approach to use the linear model to define a warm start for a model-free, policy gradient method. We show this hybrid approach outperforms the model-based controller while avoiding the convergence issues associated with model-free approaches via both numerical experiments and theoretical analyses, in which we derive sufficient conditions on the non-linear component such that our approach is guaranteed to converge to the (nearly) global optimal controller. △ Less

Submitted 12 June, 2020; originally announced June 2020.

arXiv:2006.06626 [pdf, other]

Scalable Multi-Agent Reinforcement Learning for Networked Systems with Average Reward

Authors: Guannan Qu, Yiheng Lin, Adam Wierman, Na Li

Abstract: It has long been recognized that multi-agent reinforcement learning (MARL) faces significant scalability issues due to the fact that the size of the state and action spaces are exponentially large in the number of agents. In this paper, we identify a rich class of networked MARL problems where the model exhibits a local dependence structure that allows it to be solved in a scalable manner. Specifi… ▽ More It has long been recognized that multi-agent reinforcement learning (MARL) faces significant scalability issues due to the fact that the size of the state and action spaces are exponentially large in the number of agents. In this paper, we identify a rich class of networked MARL problems where the model exhibits a local dependence structure that allows it to be solved in a scalable manner. Specifically, we propose a Scalable Actor-Critic (SAC) method that can learn a near optimal localized policy for optimizing the average reward with complexity scaling with the state-action space size of local neighborhoods, as opposed to the entire network. Our result centers around identifying and exploiting an exponential decay property that ensures the effect of agents on each other decays exponentially fast in their graph distance. △ Less

Submitted 11 June, 2020; originally announced June 2020.

Showing 1–50 of 62 results for author: Qu, G