-
Towards more realistic evaluation of LLM-based code generation: an experimental study and beyond
Authors:
Dewu Zheng,
Yanlin Wang,
Ensheng Shi,
Ruikai Zhang,
Yuchi Ma,
Hongyu Zhang,
Zibin Zheng
Abstract:
To evaluate the code generation capabilities of Large Language Models (LLMs) in complex real-world software development scenarios, many evaluation approaches have been developed. They typically leverage contextual code from the latest version of a project to facilitate LLMs in accurately generating the desired function. However, such evaluation approaches fail to consider the dynamic evolution of…
▽ More
To evaluate the code generation capabilities of Large Language Models (LLMs) in complex real-world software development scenarios, many evaluation approaches have been developed. They typically leverage contextual code from the latest version of a project to facilitate LLMs in accurately generating the desired function. However, such evaluation approaches fail to consider the dynamic evolution of software projects over time, which we refer to as evolving-ignored situation, leading to issues of future context leakage and useful context missing. This in turn results in inaccurate evaluation of LLMs' performance. In this paper, we conduct an empirical study to deeply understand LLMs' code generation performance within settings that reflect the evolving nature of software development. To achieve this, we first construct an evolving-aware repository-level code generation dataset, namely HumanEvo, equipped with an automated execution-based evaluation tool. Second, we manually categorize HumanEvo according to dependency levels to more comprehensively analyze the model's performance in generating functions with different dependency levels. Third, we conduct extensive experiments on HumanEvo with seven representative and diverse LLMs to verify the effectiveness of the proposed benchmark. We obtain many important findings through our experimental study. For example, we find that previous evolving-ignored evaluation approaches lead to inflated performance of the LLMs, ranging from 10.0% to 61.1%. Based on the findings, we give actionable suggestions on more realistic evaluation of LLMs on code generation. We also build a shared evolving-aware code generation toolbox to facilitate future research. Replication package including source code, datasets and appendix is available at https://github.com/DeepSoftwareAnalytics/EvoEval.
△ Less
Submitted 10 June, 2024;
originally announced June 2024.
-
Performance Analysis of RIS-aided MISO Systems with EMI and Channel Aging
Authors:
Taoyu Song,
Enyu Shi,
Yu Lu,
Yiyang Zhu,
Jiayi Zhang,
Bo Ai
Abstract:
In this paper, we investigate a reconfigurable intelligent surface (RIS)-aided multiple-input single-output (MISO) system in the presence of electromagnetic interference (EMI) and channel aging with a Rician fading channel model between the base station (BS) and user equipment (UE). Specifically, we derive the closed-form expression for downlink spectral efficiency (SE) with maximum ratio transmis…
▽ More
In this paper, we investigate a reconfigurable intelligent surface (RIS)-aided multiple-input single-output (MISO) system in the presence of electromagnetic interference (EMI) and channel aging with a Rician fading channel model between the base station (BS) and user equipment (UE). Specifically, we derive the closed-form expression for downlink spectral efficiency (SE) with maximum ratio transmission (MRT) precoding. The Monte-Carlo simulation supports the theoretical results, demonstrating that amplifying the weight of the line-of-sight (LoS) component in Rician fading channels can boost SE, while EMI has a detrimental impact. Furthermore, continuously increasing the number of RIS elements is not an optimal choice when EMI exists. Nonetheless, RIS can be deployed to compensate for SE degradation caused by channel aging effects. Finally, enlarging the RIS elements size can significantly improve system performance.
△ Less
Submitted 15 May, 2024;
originally announced May 2024.
-
Multi-agent Reinforcement Learning-based Joint Precoding and Phase Shift Optimization for RIS-aided Cell-Free Massive MIMO Systems
Authors:
Yiyang Zhu,
Enyu Shi,
Ziheng Liu,
Jiayi Zhang,
Bo Ai
Abstract:
Cell-free (CF) massive multiple-input multiple-output (mMIMO) is a promising technique for achieving high spectral efficiency (SE) using multiple distributed access points (APs). However, harsh propagation environments often lead to significant communication performance degradation due to high penetration loss. To overcome this issue, we introduce the reconfigurable intelligent surface (RIS) into…
▽ More
Cell-free (CF) massive multiple-input multiple-output (mMIMO) is a promising technique for achieving high spectral efficiency (SE) using multiple distributed access points (APs). However, harsh propagation environments often lead to significant communication performance degradation due to high penetration loss. To overcome this issue, we introduce the reconfigurable intelligent surface (RIS) into the CF mMIMO system as a low-cost and power-efficient solution. In this paper, we focus on optimizing the joint precoding design of the RIS-aided CF mMIMO system to maximize the sum SE. This involves optimizing the precoding matrix at the APs and the reflection coefficients at the RIS. To tackle this problem, we propose a fully distributed multi-agent reinforcement learning (MARL) algorithm that incorporates fuzzy logic (FL). Unlike conventional approaches that rely on alternating optimization techniques, our FL-based MARL algorithm only requires local channel state information, which reduces the need for high backhaul capacity. Simulation results demonstrate that our proposed FL-MARL algorithm effectively reduces computational complexity while achieving similar performance as conventional MARL methods.
△ Less
Submitted 22 April, 2024;
originally announced April 2024.
-
Graph Neural Network Meets Multi-Agent Reinforcement Learning: Fundamentals, Applications, and Future Directions
Authors:
Ziheng Liu,
Jiayi Zhang,
Enyu Shi,
Zhilong Liu,
Dusit Niyato,
Bo Ai,
Xuemin,
Shen
Abstract:
Multi-agent reinforcement learning (MARL) has become a fundamental component of next-generation wireless communication systems. Theoretically, although MARL has the advantages of low computational complexity and fast convergence rate, there exist several challenges including partial observability, non-stationary, and scalability. In this article, we investigate a novel MARL with graph neural netwo…
▽ More
Multi-agent reinforcement learning (MARL) has become a fundamental component of next-generation wireless communication systems. Theoretically, although MARL has the advantages of low computational complexity and fast convergence rate, there exist several challenges including partial observability, non-stationary, and scalability. In this article, we investigate a novel MARL with graph neural network-aided communication (GNNComm-MARL) to address the aforementioned challenges by making use of graph attention networks to effectively sample neighborhoods and selectively aggregate messages. Furthermore, we thoroughly study the architecture of GNNComm-MARL and present a systematic design solution. We then present the typical applications of GNNComm-MARL from two aspects: resource allocation and mobility management. The results obtained unveil that GNNComm-MARL can achieve better performance with lower communication overhead compared to conventional communication schemes. Finally, several important research directions regarding GNNComm-MARL are presented to facilitate further investigation.
△ Less
Submitted 7 April, 2024;
originally announced April 2024.
-
Instance-optimal Clipping for Summation Problems in the Shuffle Model of Differential Privacy
Authors:
Wei Dong,
Qiyao Luo,
Giulia Fanti,
Elaine Shi,
Ke Yi
Abstract:
Differentially private mechanisms achieving worst-case optimal error bounds (e.g., the classical Laplace mechanism) are well-studied in the literature. However, when typical data are far from the worst case, \emph{instance-specific} error bounds -- which depend on the largest value in the dataset -- are more meaningful. For example, consider the sum estimation problem, where each user has an integ…
▽ More
Differentially private mechanisms achieving worst-case optimal error bounds (e.g., the classical Laplace mechanism) are well-studied in the literature. However, when typical data are far from the worst case, \emph{instance-specific} error bounds -- which depend on the largest value in the dataset -- are more meaningful. For example, consider the sum estimation problem, where each user has an integer $x_i$ from the domain $\{0,1,\dots,U\}$ and we wish to estimate $\sum_i x_i$. This has a worst-case optimal error of $O(U/\varepsilon)$, while recent work has shown that the clipping mechanism can achieve an instance-optimal error of $O(\max_i x_i \cdot \log\log U /\varepsilon)$. Under the shuffle model, known instance-optimal protocols are less communication-efficient. The clipping mechanism also works in the shuffle model, but requires two rounds: Round one finds the clipping threshold, and round two does the clipping and computes the noisy sum of the clipped data. In this paper, we show how these two seemingly sequential steps can be done simultaneously in one round using just $1+o(1)$ messages per user, while maintaining the instance-optimal error bound. We also extend our technique to the high-dimensional sum estimation problem and sparse vector aggregation (a.k.a. frequency estimation under user-level differential privacy). Our experiments show order-of-magnitude improvements of our protocols in terms of error compared with prior work.
△ Less
Submitted 15 March, 2024;
originally announced March 2024.
-
Mechanism Design for Automated Market Makers
Authors:
T-H. Hubert Chan,
Ke Wu,
Elaine Shi
Abstract:
Blockchains have popularized automated market makers (AMMs). An AMM exchange is an application running on a blockchain which maintains a pool of crypto-assets and automatically trades assets with users governed by some pricing function that prices the assets based on their relative demand/supply. AMMs have created an important challenge commonly known as the Miner Extractable Value (MEV). In parti…
▽ More
Blockchains have popularized automated market makers (AMMs). An AMM exchange is an application running on a blockchain which maintains a pool of crypto-assets and automatically trades assets with users governed by some pricing function that prices the assets based on their relative demand/supply. AMMs have created an important challenge commonly known as the Miner Extractable Value (MEV). In particular, the miners who control the contents and ordering of transactions in a block can extract value by front-running and back-running users' transactions, leading to arbitrage opportunities that guarantee them risk-free returns.
In this paper, we consider how to design AMM mechanisms that eliminate MEV opportunities. Specifically, we propose a new AMM mechanism that processes all transactions contained within a block in a batch. We show that our new mechanism satisfies two tiers of guarantees. First, for legacy blockchains where each block is proposed by a single (possibly rotating) miner, we prove that our mechanism satisfies arbitrage resilience, i.e., a miner cannot gain risk-free profit. Moreover, we also guarantee fair treatment among all transactions within the same block, such that the miner is unable to sell off favorable positions in the block to users or arbitragers. Second, for blockchains where the block proposal process is decentralized and offers sequencing-fairness, we prove a stronger notion called incentive compatibility -- roughly speaking, we guarantee that any individual user's best response is to follow the honest strategy.
△ Less
Submitted 21 April, 2024; v1 submitted 14 February, 2024;
originally announced February 2024.
-
Collusion-Resilience in Transaction Fee Mechanism Design
Authors:
Hao Chung,
Tim Roughgarden,
Elaine Shi
Abstract:
Users bid in a transaction fee mechanism (TFM) to get their transactions included and confirmed by a blockchain protocol. Roughgarden (EC'21) initiated the formal treatment of TFMs and proposed three requirements: user incentive compatibility (UIC), miner incentive compatibility (MIC), and a form of collusion-resilience called OCA-proofness. Ethereum's EIP-1559 mechanism satisfies all three proper…
▽ More
Users bid in a transaction fee mechanism (TFM) to get their transactions included and confirmed by a blockchain protocol. Roughgarden (EC'21) initiated the formal treatment of TFMs and proposed three requirements: user incentive compatibility (UIC), miner incentive compatibility (MIC), and a form of collusion-resilience called OCA-proofness. Ethereum's EIP-1559 mechanism satisfies all three properties simultaneously when there is no contention between transactions, but loses the UIC property when there are too many eligible transactions to fit in a single block. Chung and Shi (SODA'23) considered an alternative notion of collusion-resilience, called c-side-contract-proofness (c-SCP), and showed that, when there is contention between transactions, no TFM can satisfy UIC, MIC, and c-SCP for any c at least 1. OCA-proofness asserts that the users and a miner should not be able to "steal from the protocol." On the other hand, the c-SCP condition requires that a coalition of a miner and a subset of users should not be able to profit through strategic deviations (whether at the expense of the protocol or of the users outside the coalition).
Our main result is the first proof that, when there is contention between transactions, no (possibly randomized) TFM in which users are expected to bid truthfully satisfies UIC, MIC, and OCA-proofness. This result resolves the main open question in Roughgarden (EC'21). We also suggest several relaxations of the basic model that allow our impossibility result to be circumvented.
△ Less
Submitted 19 June, 2024; v1 submitted 14 February, 2024;
originally announced February 2024.
-
Large Language Models for Robotics: Opportunities, Challenges, and Perspectives
Authors:
Jiaqi Wang,
Zihao Wu,
Yiwei Li,
Hanqi Jiang,
Peng Shu,
Enze Shi,
Huawen Hu,
Chong Ma,
Yiheng Liu,
Xuhui Wang,
Yincheng Yao,
Xuan Liu,
Huaqin Zhao,
Zhengliang Liu,
Haixing Dai,
Lin Zhao,
Bao Ge,
Xiang Li,
Tianming Liu,
Shu Zhang
Abstract:
Large language models (LLMs) have undergone significant expansion and have been increasingly integrated across various domains. Notably, in the realm of robot task planning, LLMs harness their advanced reasoning and language comprehension capabilities to formulate precise and efficient action plans based on natural language instructions. However, for embodied tasks, where robots interact with comp…
▽ More
Large language models (LLMs) have undergone significant expansion and have been increasingly integrated across various domains. Notably, in the realm of robot task planning, LLMs harness their advanced reasoning and language comprehension capabilities to formulate precise and efficient action plans based on natural language instructions. However, for embodied tasks, where robots interact with complex environments, text-only LLMs often face challenges due to a lack of compatibility with robotic visual perception. This study provides a comprehensive overview of the emerging integration of LLMs and multimodal LLMs into various robotic tasks. Additionally, we propose a framework that utilizes multimodal GPT-4V to enhance embodied task planning through the combination of natural language instructions and robot visual perceptions. Our results, based on diverse datasets, indicate that GPT-4V effectively enhances robot performance in embodied tasks. This extensive survey and evaluation of LLMs and multimodal LLMs across a variety of robotic tasks enriches the understanding of LLM-centric embodied intelligence and provides forward-looking insights toward bridging the gap in Human-Robot-Environment interaction.
△ Less
Submitted 8 January, 2024;
originally announced January 2024.
-
Hallucinations in Neural Automatic Speech Recognition: Identifying Errors and Hallucinatory Models
Authors:
Rita Frieske,
Bertram E. Shi
Abstract:
Hallucinations are a type of output error produced by deep neural networks. While this has been studied in natural language processing, they have not been researched previously in automatic speech recognition. Here, we define hallucinations in ASR as transcriptions generated by a model that are semantically unrelated to the source utterance, yet still fluent and coherent. The similarity of halluci…
▽ More
Hallucinations are a type of output error produced by deep neural networks. While this has been studied in natural language processing, they have not been researched previously in automatic speech recognition. Here, we define hallucinations in ASR as transcriptions generated by a model that are semantically unrelated to the source utterance, yet still fluent and coherent. The similarity of hallucinations to probable natural language outputs of the model creates a danger of deception and impacts the credibility of the system. We show that commonly used metrics, such as word error rates, cannot differentiate between hallucinatory and non-hallucinatory models. To address this, we propose a perturbation-based method for assessing the susceptibility of an automatic speech recognition (ASR) model to hallucination at test time, which does not require access to the training dataset. We demonstrate that this method helps to distinguish between hallucinatory and non-hallucinatory models that have similar baseline word error rates. We further explore the relationship between the types of ASR errors and the types of dataset noise to determine what types of noise are most likely to create hallucinatory outputs. We devise a framework for identifying hallucinations by analysing their semantic connection with the ground truth and their fluency. Finally, we discover how to induce hallucinations with a random noise injection to the utterance.
△ Less
Submitted 3 January, 2024;
originally announced January 2024.
-
Connected Components in Linear Work and Near-Optimal Time
Authors:
Alireza Farhadi,
S. Cliff Liu,
Elaine Shi
Abstract:
Computing the connected components of a graph is a fundamental problem in algorithmic graph theory. A major question in this area is whether we can compute connected components in $o(\log n)$ parallel time. Recent works showed an affirmative answer in the Massively Parallel Computation (MPC) model for a wide class of graphs. Specifically, Behnezhad et al. (FOCS'19) showed that connected components…
▽ More
Computing the connected components of a graph is a fundamental problem in algorithmic graph theory. A major question in this area is whether we can compute connected components in $o(\log n)$ parallel time. Recent works showed an affirmative answer in the Massively Parallel Computation (MPC) model for a wide class of graphs. Specifically, Behnezhad et al. (FOCS'19) showed that connected components can be computed in $O(\log d + \log \log n)$ rounds in the MPC model. More recently, Liu et al. (SPAA'20) showed that the same result can be achieved in the standard PRAM model but their result incurs $Θ((m+n) \cdot (\log d + \log \log n))$ work which is sub-optimal.
In this paper, we show that for graphs that contain \emph{well-connected} components, we can compute connected components on a PRAM in sub-logarithmic parallel time with \emph{optimal}, i.e., $O(m+n)$ total work. Specifically, our algorithm achieves $O(\log(1/λ) + \log \log n)$ parallel time with high probability, where $λ$ is the minimum spectral gap of any connected component in the input graph. The algorithm requires no prior knowledge on $λ$.
Additionally, based on the \textsc{2-Cycle} Conjecture we provide a time lower bound of $Ω(\log(1/λ))$ for solving connected components on a PRAM with $O(m+n)$ total memory when $λ\le (1/\log n)^c$, giving conditional optimality to the running time of our algorithm as a parameter of $λ$.
△ Less
Submitted 20 May, 2024; v1 submitted 4 December, 2023;
originally announced December 2023.
-
RIS-Aided Cell-Free Massive MIMO Systems for 6G: Fundamentals, System Design, and Applications
Authors:
Enyu Shi,
Jiayi Zhang,
Hongyang Du,
Bo Ai,
Chau Yuen,
Dusit Niyato,
Khaled B. Letaief,
Xuemin Shen
Abstract:
An introduction of intelligent interconnectivity for people and things has posed higher demands and more challenges for sixth-generation (6G) networks, such as high spectral efficiency and energy efficiency, ultra-low latency, and ultra-high reliability. Cell-free (CF) massive multiple-input multiple-output (mMIMO) and reconfigurable intelligent surface (RIS), also called intelligent reflecting su…
▽ More
An introduction of intelligent interconnectivity for people and things has posed higher demands and more challenges for sixth-generation (6G) networks, such as high spectral efficiency and energy efficiency, ultra-low latency, and ultra-high reliability. Cell-free (CF) massive multiple-input multiple-output (mMIMO) and reconfigurable intelligent surface (RIS), also called intelligent reflecting surface (IRS), are two promising technologies for coping with these unprecedented demands. Given their distinct capabilities, integrating the two technologies to further enhance wireless network performances has received great research and development attention. In this paper, we provide a comprehensive survey of research on RIS-aided CF mMIMO wireless communication systems. We first introduce system models focusing on system architecture and application scenarios, channel models, and communication protocols. Subsequently, we summarize the relevant studies on system operation and resource allocation, providing in-depth analyses and discussions. Following this, we present practical challenges faced by RIS-aided CF mMIMO systems, particularly those introduced by RIS, such as hardware impairments and electromagnetic interference. We summarize corresponding analyses and solutions to further facilitate the implementation of RIS-aided CF mMIMO systems. Furthermore, we explore an interplay between RIS-aided CF mMIMO and other emerging 6G technologies, such as next-generation multiple-access (NGMA), simultaneous wireless information and power transfer (SWIPT), and millimeter wave (mmWave). Finally, we outline several research directions for future RIS-aided CF mMIMO systems.
△ Less
Submitted 22 May, 2024; v1 submitted 30 September, 2023;
originally announced October 2023.
-
SoTaNa: The Open-Source Software Development Assistant
Authors:
Ensheng Shi,
Fengji Zhang,
Yanlin Wang,
Bei Chen,
Lun Du,
Hongyu Zhang,
Shi Han,
Dongmei Zhang,
Hongbin Sun
Abstract:
Software development plays a crucial role in driving innovation and efficiency across modern societies. To meet the demands of this dynamic field, there is a growing need for an effective software development assistant. However, existing large language models represented by ChatGPT suffer from limited accessibility, including training data and model weights. Although other large open-source models…
▽ More
Software development plays a crucial role in driving innovation and efficiency across modern societies. To meet the demands of this dynamic field, there is a growing need for an effective software development assistant. However, existing large language models represented by ChatGPT suffer from limited accessibility, including training data and model weights. Although other large open-source models like LLaMA have shown promise, they still struggle with understanding human intent. In this paper, we present SoTaNa, an open-source software development assistant. SoTaNa utilizes ChatGPT to generate high-quality instruction-based data for the domain of software engineering and employs a parameter-efficient fine-tuning approach to enhance the open-source foundation model, LLaMA. We evaluate the effectiveness of \our{} in answering Stack Overflow questions and demonstrate its capabilities. Additionally, we discuss its capabilities in code summarization and generation, as well as the impact of varying the volume of generated data on model performance. Notably, SoTaNa can run on a single GPU, making it accessible to a broader range of researchers. Our code, model weights, and data are public at \url{https://github.com/DeepSoftwareAnalytics/SoTaNa}.
△ Less
Submitted 25 August, 2023;
originally announced August 2023.
-
SAMAug: Point Prompt Augmentation for Segment Anything Model
Authors:
Haixing Dai,
Chong Ma,
Zhiling Yan,
Zhengliang Liu,
Enze Shi,
Yiwei Li,
Peng Shu,
Xiaozheng Wei,
Lin Zhao,
Zihao Wu,
Fang Zeng,
Dajiang Zhu,
Wei Liu,
Quanzheng Li,
Lichao Sun,
Shu Zhang Tianming Liu,
Xiang Li
Abstract:
This paper introduces SAMAug, a novel visual point augmentation method for the Segment Anything Model (SAM) that enhances interactive image segmentation performance. SAMAug generates augmented point prompts to provide more information about the user's intention to SAM. Starting with an initial point prompt, SAM produces an initial mask, which is then fed into our proposed SAMAug to generate augmen…
▽ More
This paper introduces SAMAug, a novel visual point augmentation method for the Segment Anything Model (SAM) that enhances interactive image segmentation performance. SAMAug generates augmented point prompts to provide more information about the user's intention to SAM. Starting with an initial point prompt, SAM produces an initial mask, which is then fed into our proposed SAMAug to generate augmented point prompts. By incorporating these extra points, SAM can generate augmented segmentation masks based on both the augmented point prompts and the initial prompt, resulting in improved segmentation performance. We conducted evaluations using four different point augmentation strategies: random sampling, sampling based on maximum difference entropy, maximum distance, and saliency. Experiment results on the COCO, Fundus, COVID QUEx, and ISIC2018 datasets show that SAMAug can boost SAM's segmentation results, especially using the maximum distance and saliency. SAMAug demonstrates the potential of visual prompt augmentation for computer vision. Codes of SAMAug are available at github.com/yhydhx/SAMAug
△ Less
Submitted 19 March, 2024; v1 submitted 3 July, 2023;
originally announced July 2023.
-
Review of Large Vision Models and Visual Prompt Engineering
Authors:
Jiaqi Wang,
Zhengliang Liu,
Lin Zhao,
Zihao Wu,
Chong Ma,
Sigang Yu,
Haixing Dai,
Qiushi Yang,
Yiheng Liu,
Songyao Zhang,
Enze Shi,
Yi Pan,
Tuo Zhang,
Dajiang Zhu,
Xiang Li,
Xi Jiang,
Bao Ge,
Yixuan Yuan,
Dinggang Shen,
Tianming Liu,
Shu Zhang
Abstract:
Visual prompt engineering is a fundamental technology in the field of visual and image Artificial General Intelligence, serving as a key component for achieving zero-shot capabilities. As the development of large vision models progresses, the importance of prompt engineering becomes increasingly evident. Designing suitable prompts for specific visual tasks has emerged as a meaningful research dire…
▽ More
Visual prompt engineering is a fundamental technology in the field of visual and image Artificial General Intelligence, serving as a key component for achieving zero-shot capabilities. As the development of large vision models progresses, the importance of prompt engineering becomes increasingly evident. Designing suitable prompts for specific visual tasks has emerged as a meaningful research direction. This review aims to summarize the methods employed in the computer vision domain for large vision models and visual prompt engineering, exploring the latest advancements in visual prompt engineering. We present influential large models in the visual domain and a range of prompt engineering methods employed on these models. It is our hope that this review provides a comprehensive and systematic description of prompt engineering methods based on large visual models, offering valuable insights for future researchers in their exploration of this field.
△ Less
Submitted 3 July, 2023;
originally announced July 2023.
-
Uplink Performance of RIS-aided Cell-Free Massive MIMO System with Electromagnetic Interference
Authors:
Enyu Shi,
Jiayi Zhang,
Derrick Wing Kwan Ng,
Bo Ai
Abstract:
Cell-free (CF) massive multiple-input multiple-output (MIMO) and reconfigurable intelligent surface (RIS) are two promising technologies for realizing future beyond-fifth generation (B5G) networks. In this paper, we consider a practical spatially correlated RIS-aided CF massive MIMO system with multi-antenna access points (APs) over spatially correlated fading channels. Different from previous wor…
▽ More
Cell-free (CF) massive multiple-input multiple-output (MIMO) and reconfigurable intelligent surface (RIS) are two promising technologies for realizing future beyond-fifth generation (B5G) networks. In this paper, we consider a practical spatially correlated RIS-aided CF massive MIMO system with multi-antenna access points (APs) over spatially correlated fading channels. Different from previous work, the electromagnetic interference (EMI) at RIS is considered to further characterize the system performance of the actual environment. Then, we derive the closed-form expression for the system spectral efficiency (SE) with the maximum ratio (MR) combining at the APs and the large-scale fading decoding (LSFD) at the central processing unit (CPU). Moreover, to counteract the near-far effect and EMI, we propose practical fractional power control (FPC) and max-min power control algorithms to further improve the system performance. We unveil the impact of EMI, channel correlations, and different signal processing methods on the uplink SE of user equipments (UEs). The accuracy of our derived analytical results is verified by extensive Monte-Carlo simulations. Our results show that the EMI can substantially degrade the SE, especially for those UEs with unsatisfactory channel conditions. Besides, increasing the number of RIS elements is always beneficial in terms of the SE, but with diminishing returns when the number of RIS elements is sufficiently large. Furthermore, the existence of spatial correlations among RIS elements can deteriorate the system performance when RIS is impaired by EMI.
△ Less
Submitted 14 June, 2023;
originally announced June 2023.
-
Prompt Engineering for Healthcare: Methodologies and Applications
Authors:
Jiaqi Wang,
Enze Shi,
Sigang Yu,
Zihao Wu,
Chong Ma,
Haixing Dai,
Qiushi Yang,
Yanqing Kang,
Jinru Wu,
Huawen Hu,
Chenxi Yue,
Haiyang Zhang,
Yiheng Liu,
Yi Pan,
Zhengliang Liu,
Lichao Sun,
Xiang Li,
Bao Ge,
Xi Jiang,
Dajiang Zhu,
Yixuan Yuan,
Dinggang Shen,
Tianming Liu,
Shu Zhang
Abstract:
Prompt engineering is a critical technique in the field of natural language processing that involves designing and optimizing the prompts used to input information into models, aiming to enhance their performance on specific tasks. With the recent advancements in large language models, prompt engineering has shown significant superiority across various domains and has become increasingly important…
▽ More
Prompt engineering is a critical technique in the field of natural language processing that involves designing and optimizing the prompts used to input information into models, aiming to enhance their performance on specific tasks. With the recent advancements in large language models, prompt engineering has shown significant superiority across various domains and has become increasingly important in the healthcare domain. However, there is a lack of comprehensive reviews specifically focusing on prompt engineering in the medical field. This review will introduce the latest advances in prompt engineering in the field of natural language processing for the medical field. First, we will provide the development of prompt engineering and emphasize its significant contributions to healthcare natural language processing applications such as question-answering systems, text summarization, and machine translation. With the continuous improvement of general large language models, the importance of prompt engineering in the healthcare domain is becoming increasingly prominent. The aim of this article is to provide useful resources and bridges for healthcare natural language processing researchers to better explore the application of prompt engineering in this field. We hope that this review can provide new ideas and inspire for research and application in medical natural language processing.
△ Less
Submitted 23 March, 2024; v1 submitted 28 April, 2023;
originally announced April 2023.
-
Towards Efficient Fine-tuning of Pre-trained Code Models: An Experimental Study and Beyond
Authors:
Ensheng Shi,
Yanlin Wang,
Hongyu Zhang,
Lun Du,
Shi Han,
Dongmei Zhang,
Hongbin Sun
Abstract:
Recently, fine-tuning pre-trained code models such as CodeBERT on downstream tasks has achieved great success in many software testing and analysis tasks. While effective and prevalent, fine-tuning the pre-trained parameters incurs a large computational cost. In this paper, we conduct an extensive experimental study to explore what happens to layer-wise pre-trained representations and their encode…
▽ More
Recently, fine-tuning pre-trained code models such as CodeBERT on downstream tasks has achieved great success in many software testing and analysis tasks. While effective and prevalent, fine-tuning the pre-trained parameters incurs a large computational cost. In this paper, we conduct an extensive experimental study to explore what happens to layer-wise pre-trained representations and their encoded code knowledge during fine-tuning. We then propose efficient alternatives to fine-tune the large pre-trained code model based on the above findings. Our experimental study shows that (1) lexical, syntactic and structural properties of source code are encoded in the lower, intermediate, and higher layers, respectively, while the semantic property spans across the entire model. (2) The process of fine-tuning preserves most of the code properties. Specifically, the basic code properties captured by lower and intermediate layers are still preserved during fine-tuning. Furthermore, we find that only the representations of the top two layers change most during fine-tuning for various downstream tasks. (3) Based on the above findings, we propose Telly to efficiently fine-tune pre-trained code models via layer freezing. The extensive experimental results on five various downstream tasks demonstrate that training parameters and the corresponding time cost are greatly reduced, while performances are similar or better. Replication package including source code, datasets, and online Appendix is available at: \url{https://github.com/DeepSoftwareAnalytics/Telly}.
△ Less
Submitted 11 April, 2023;
originally announced April 2023.
-
Maximizing Miner Revenue in Transaction Fee Mechanism Design
Authors:
Ke Wu,
Elaine Shi,
Hao Chung
Abstract:
Transaction fee mechanism design is a new decentralized mechanism design problem where users bid for space on the blockchain. Several recent works showed that the transaction fee mechanism design fundamentally departs from classical mechanism design. They then systematically explored the mathematical landscape of this new decentralized mechanism design problem in two settings: in the plain setting…
▽ More
Transaction fee mechanism design is a new decentralized mechanism design problem where users bid for space on the blockchain. Several recent works showed that the transaction fee mechanism design fundamentally departs from classical mechanism design. They then systematically explored the mathematical landscape of this new decentralized mechanism design problem in two settings: in the plain setting where no cryptography is employed, and in a cryptography-assisted setting where the rules of the mechanism are enforced by a multi-party computation protocol. Unfortunately, in both settings, prior works showed that if we want the mechanism to incentivize honest behavior for both users as well as miners (possibly colluding with users), then the miner revenue has to be zero. Although adopting a relaxed, approximate notion of incentive compatibility gets around this zero miner-revenue limitation, the scaling of the miner revenue is nonetheless poor.
In this paper, we show that if we make a mildly stronger reasonable-world assumption than prior works, we can circumvent the known limitations on miner revenue, and design auctions that generate optimal miner revenue. We also systematically explore the mathematical landscape of transaction fee mechanism design under the new reasonable-world and demonstrate how such assumptions can alter the feasibility and infeasibility landscape.
△ Less
Submitted 21 April, 2024; v1 submitted 24 February, 2023;
originally announced February 2023.
-
XCRYPT: Accelerating Lattice Based Cryptography with Memristor Crossbar Arrays
Authors:
Sarabjeet Singh,
Xiong Fan,
Ananth Krishna Prasad,
Lin Jia,
Anirban Nag,
Rajeev Balasubramonian,
Mahdi Nazm Bojnordi,
Elaine Shi
Abstract:
This paper makes a case for accelerating lattice-based post quantum cryptography (PQC) with memristor based crossbars, and shows that these inherently error-tolerant algorithms are a good fit for noisy analog MAC operations in crossbars. We compare different NIST round-3 lattice-based candidates for PQC, and identify that SABER is not only a front-runner when executing on traditional systems, but…
▽ More
This paper makes a case for accelerating lattice-based post quantum cryptography (PQC) with memristor based crossbars, and shows that these inherently error-tolerant algorithms are a good fit for noisy analog MAC operations in crossbars. We compare different NIST round-3 lattice-based candidates for PQC, and identify that SABER is not only a front-runner when executing on traditional systems, but it is also amenable to acceleration with crossbars. SABER is a module-LWR based approach, which performs modular polynomial multiplications with rounding. We map the polynomial multiplications in SABER on crossbars and show that analog dot-products can yield a $1.7-32.5\times$ performance and energy efficiency improvement, compared to recent hardware proposals. This initial design combines the innovations in multiple state-of-the-art works -- the algorithm in SABER and the memristive acceleration principles proposed in ISAAC (for deep neural network acceleration). We then identify the bottlenecks in this initial design and introduce several additional techniques to improve its efficiency. These techniques are synergistic and especially benefit from SABER's power-of-two modulo operation. First, we show that some of the software techniques used in SABER, that are effective on CPU platforms, are unhelpful in crossbar-based accelerators. Relying on simpler algorithms further improves our efficiencies by $1.3-3.6\times$. Second, we exploit the nature of SABER's computations to stagger the operations in crossbars and share a few variable precision ADCs, resulting in up to $1.8\times$ higher efficiency. Third, to further reduce ADC pressure, we propose a simple analog Shift-and-Add technique, which results in a $1.3-6.3\times$ increase in the efficiency. Overall, our designs achieve $3-15\times$ higher efficiency over initial design, and $3-51\times$ higher than prior work.
△ Less
Submitted 31 January, 2023;
originally announced February 2023.
-
Adore: Differentially Oblivious Relational Database Operators
Authors:
Lianke Qin,
Rajesh Jayaram,
Elaine Shi,
Zhao Song,
Danyang Zhuo,
Shumo Chu
Abstract:
There has been a recent effort in applying differential privacy on memory access patterns to enhance data privacy. This is called differential obliviousness. Differential obliviousness is a promising direction because it provides a principled trade-off between performance and desired level of privacy. To date, it is still an open question whether differential obliviousness can speed up database pr…
▽ More
There has been a recent effort in applying differential privacy on memory access patterns to enhance data privacy. This is called differential obliviousness. Differential obliviousness is a promising direction because it provides a principled trade-off between performance and desired level of privacy. To date, it is still an open question whether differential obliviousness can speed up database processing with respect to full obliviousness. In this paper, we present the design and implementation of three new major database operators: selection with projection, grouping with aggregation, and foreign key join. We prove that they satisfy the notion of differential obliviousness. Our differentially oblivious operators have reduced cache complexity, runtime complexity, and output size compared to their state-of-the-art fully oblivious counterparts. We also demonstrate that our implementation of these differentially oblivious operators can outperform their state-of-the-art fully oblivious counterparts by up to $7.4\times$.
△ Less
Submitted 29 September, 2023; v1 submitted 9 December, 2022;
originally announced December 2022.
-
Efficient Privacy-Preserving Machine Learning with Lightweight Trusted Hardware
Authors:
Pengzhi Huang,
Thang Hoang,
Yueying Li,
Elaine Shi,
G. Edward Suh
Abstract:
In this paper, we propose a new secure machine learning inference platform assisted by a small dedicated security processor, which will be easier to protect and deploy compared to today's TEEs integrated into high-performance processors. Our platform provides three main advantages over the state-of-the-art:
(i) We achieve significant performance improvements compared to state-of-the-art distribu…
▽ More
In this paper, we propose a new secure machine learning inference platform assisted by a small dedicated security processor, which will be easier to protect and deploy compared to today's TEEs integrated into high-performance processors. Our platform provides three main advantages over the state-of-the-art:
(i) We achieve significant performance improvements compared to state-of-the-art distributed Privacy-Preserving Machine Learning (PPML) protocols, with only a small security processor that is comparable to a discrete security chip such as the Trusted Platform Module (TPM) or on-chip security subsystems in SoCs similar to the Apple enclave processor. In the semi-honest setting with WAN/GPU, our scheme is 4X-63X faster than Falcon (PoPETs'21) and AriaNN (PoPETs'22) and 3.8X-12X more communication efficient. We achieve even higher performance improvements in the malicious setting.
(ii) Our platform guarantees security with abort against malicious adversaries under honest majority assumption.
(iii) Our technique is not limited by the size of secure memory in a TEE and can support high-capacity modern neural networks like ResNet18 and Transformer.
While previous work investigated the use of high-performance TEEs in PPML, this work represents the first to show that even tiny secure hardware with really limited performance can be leveraged to significantly speed-up distributed PPML protocols if the protocol can be carefully designed for lightweight trusted hardware.
△ Less
Submitted 17 June, 2024; v1 submitted 18 October, 2022;
originally announced October 2022.
-
Reducing Stress and Anxiety in the Metaverse: A Systematic Review of Meditation, Mindfulness and Virtual Reality
Authors:
Xian Wang,
Xiaoyu Mo,
Mingming Fan,
Lik-Hang Lee,
Bertram E. Shi,
Pan Hui
Abstract:
Meditation, or mindfulness, is widely used to improve mental health. With the emergence of Virtual Reality technology, many studies have provided evidence that meditation with VR can bring health benefits. However, to our knowledge, there are no guidelines and comprehensive reviews in the literature on how to conduct such research in virtual reality. In order to understand the role of VR technolog…
▽ More
Meditation, or mindfulness, is widely used to improve mental health. With the emergence of Virtual Reality technology, many studies have provided evidence that meditation with VR can bring health benefits. However, to our knowledge, there are no guidelines and comprehensive reviews in the literature on how to conduct such research in virtual reality. In order to understand the role of VR technology in meditation and future research opportunities, we conducted a systematic literature review in the IEEE and ACM databases. Our process yielded 19 eligible papers and we conducted a structured analysis. We understand the state-of-art of meditation type, design consideration and VR and technology through these papers and conclude research opportunities and challenges for the future.
△ Less
Submitted 29 September, 2022;
originally announced September 2022.
-
Decentralized Coordinated Precoding Design in Cell-Free Massive MIMO Systems for URLLC
Authors:
Enyu Shi,
Jing Zhang,
Jiayi Zhang,
Derrick Wing Kwan Ng,
Bo Ai
Abstract:
Cell-free massive multiple-input multiple-output (MIMO) is a promising network to offer huge improvement of the achievable rate compared with conventional cellular massive MIMO systems. However, the commonly adopted Shannon-type achievable rate is only valid in the long block length regime that is not applicable to the emerging short-packet communication. To realize ultra-reliable and low-latency…
▽ More
Cell-free massive multiple-input multiple-output (MIMO) is a promising network to offer huge improvement of the achievable rate compared with conventional cellular massive MIMO systems. However, the commonly adopted Shannon-type achievable rate is only valid in the long block length regime that is not applicable to the emerging short-packet communication. To realize ultra-reliable and low-latency communication (URLLC) in cell-free massive MIMO systems, we optimize the precoding vector at the access points (APs) to maximize the minimum user rate in both the centralized and decentralized fashion. The design takes into account the impact of URLLC and we propose path-following algorithms (PFA) to address the considered problem which generates a sequence of advanced feasible points and converges to at least a locally optimal solution of the design problem. Moreover, we investigate the requirement of the precoding schemes, the length of the transmission duration, the number of antennas equipped at each AP, and the size of each AP cluster on the URLLC rate. Numerical results show that the decentralized PFA precoding can achieve 80\% of the 95\%-likely URLLC rate of the centralized precoding and 89\% of the average URLLC rate with only 12\% computational complexity of the centralized precoding.
△ Less
Submitted 28 September, 2022;
originally announced September 2022.
-
What Can Cryptography Do For Decentralized Mechanism Design
Authors:
Elaine Shi,
Hao Chung,
Ke Wu
Abstract:
Recent works of Roughgarden (EC'21) and Chung and Shi (SODA'23) initiate the study of a new decentralized mechanism design problem called transaction fee mechanism design (TFM). Unlike the classical mechanism design literature, in the decentralized environment, even the auctioneer (i.e., the miner) can be a strategic player, and it can even collude with a subset of the users facilitated by binding…
▽ More
Recent works of Roughgarden (EC'21) and Chung and Shi (SODA'23) initiate the study of a new decentralized mechanism design problem called transaction fee mechanism design (TFM). Unlike the classical mechanism design literature, in the decentralized environment, even the auctioneer (i.e., the miner) can be a strategic player, and it can even collude with a subset of the users facilitated by binding side contracts. Chung and Shi showed two main impossibility results that rule out the existence of a {\it dream} TFM. First, any TFM that provides incentive compatibility for individual users and miner-user coalitions must always have zero miner revenue, no matter whether the block size is finite or infinite. Second, assuming finite block size, no non-trivial TFM can simultaenously provide incentive compatibility for any individual user, and for any miner-user coalition.
In this work, we explore what new models and meaningful relaxations can allow us to circumvent the impossibility results of Chung and Shi. Besides today's model that does not employ cryptography, we introduce a new MPC-assisted model where the TFM is implemented by a joint multi-party computation (MPC) protocol among the miners. We prove several feasibility and infeasibility results for achieving {\it strict} and {\it approximate} incentive compatibility, respectively, in the plain model as well as the MPC-assisted model. We show that while cryptography is not a panacea, it indeed allows us to overcome some impossibility results pertaining to the plain model, leading to non-trivial mechanisms with useful guarantees that are otherwise impossible in the plain model. Our work is also the first to characterize the mathematical landscape of transaction fee mechanism design under approximate incentive compatibility, as well as in a cryptography-assisted model.
△ Less
Submitted 19 February, 2023; v1 submitted 28 September, 2022;
originally announced September 2022.
-
Uplink Performance of RIS-aided Cell-Free Massive MIMO System Over Spatially Correlated Channels
Authors:
Enyu Shi,
Jiayi Zhang,
Zhe Wang,
Derrick Wing Kwan Ng,
Bo Ai
Abstract:
We consider a practical spatially correlated reconfigurable intelligent surface (RIS)-aided cell-free (CF) massive multiple-input-multiple-output (mMIMO) system with multi-antenna access points (APs) over spatially correlated Rician fading channels. The minimum mean square error (MMSE) channel estimator is adopted to estimate the aggregated RIS channels. Then, we investigate the uplink spectral ef…
▽ More
We consider a practical spatially correlated reconfigurable intelligent surface (RIS)-aided cell-free (CF) massive multiple-input-multiple-output (mMIMO) system with multi-antenna access points (APs) over spatially correlated Rician fading channels. The minimum mean square error (MMSE) channel estimator is adopted to estimate the aggregated RIS channels. Then, we investigate the uplink spectral efficiency (SE) with the maximum ratio (MR) and the local minimum mean squared error (L-MMSE) combining at the APs and obtain the closed-form expression for characterizing the performance of the former. The accuracy of our derived analytical results has been verified by extensive Monte-Carlo simulations. Our results show that increasing the number of RIS elements is always beneficial, but with diminishing returns when the number of RIS elements is sufficiently large. Furthermore, the effect of the number of AP antennas on system performance is more pronounced under a small number of RIS elements, while the spatial correlation of RIS elements imposes a more severe negative impact on the system performance than that of the AP antennas.
△ Less
Submitted 28 September, 2022;
originally announced September 2022.
-
CoCoSoDa: Effective Contrastive Learning for Code Search
Authors:
Ensheng Shi,
Yanlin Wang,
Wenchao Gu,
Lun Du,
Hongyu Zhang,
Shi Han,
Dongmei Zhang,
Hongbin Sun
Abstract:
Code search aims to retrieve semantically relevant code snippets for a given natural language query. Recently, many approaches employing contrastive learning have shown promising results on code representation learning and greatly improved the performance of code search. However, there is still a lot of room for improvement in using contrastive learning for code search. In this paper, we propose C…
▽ More
Code search aims to retrieve semantically relevant code snippets for a given natural language query. Recently, many approaches employing contrastive learning have shown promising results on code representation learning and greatly improved the performance of code search. However, there is still a lot of room for improvement in using contrastive learning for code search. In this paper, we propose CoCoSoDa to effectively utilize contrastive learning for code search via two key factors in contrastive learning: data augmentation and negative samples. Specifically, soft data augmentation is to dynamically masking or replacing some tokens with their types for input sequences to generate positive samples. Momentum mechanism is used to generate large and consistent representations of negative samples in a mini-batch through maintaining a queue and a momentum encoder. In addition, multimodal contrastive learning is used to pull together representations of code-query pairs and push apart the unpaired code snippets and queries. We conduct extensive experiments to evaluate the effectiveness of our approach on a large-scale dataset with six programming languages. Experimental results show that: (1) CoCoSoDa outperforms 14 baselines and especially exceeds CodeBERT, GraphCodeBERT, and UniXcoder by 13.3%, 10.5%, and 5.9% on average MRR scores, respectively. (2) The ablation studies show the effectiveness of each component of our approach. (3) We adapt our techniques to several different pre-trained models such as RoBERTa, CodeBERT, and GraphCodeBERT and observe a significant boost in their performance in code search. (4) Our model performs robustly under different hyper-parameters. Furthermore, we perform qualitative and quantitative analyses to explore reasons behind the good performance of our model.
△ Less
Submitted 12 February, 2023; v1 submitted 7 April, 2022;
originally announced April 2022.
-
RACE: Retrieval-Augmented Commit Message Generation
Authors:
Ensheng Shi,
Yanlin Wang,
Wei Tao,
Lun Du,
Hongyu Zhang,
Shi Han,
Dongmei Zhang,
Hongbin Sun
Abstract:
Commit messages are important for software development and maintenance. Many neural network-based approaches have been proposed and shown promising results on automatic commit message generation. However, the generated commit messages could be repetitive or redundant. In this paper, we propose RACE, a new retrieval-augmented neural commit message generation method, which treats the retrieved simil…
▽ More
Commit messages are important for software development and maintenance. Many neural network-based approaches have been proposed and shown promising results on automatic commit message generation. However, the generated commit messages could be repetitive or redundant. In this paper, we propose RACE, a new retrieval-augmented neural commit message generation method, which treats the retrieved similar commit as an exemplar and leverages it to generate an accurate commit message. As the retrieved commit message may not always accurately describe the content/intent of the current code diff, we also propose an exemplar guider, which learns the semantic similarity between the retrieved and current code diff and then guides the generation of commit message based on the similarity. We conduct extensive experiments on a large public dataset with five programming languages. Experimental results show that RACE can outperform all baselines. Furthermore, RACE can boost the performance of existing Seq2Seq models in commit message generation.
△ Less
Submitted 22 October, 2022; v1 submitted 5 March, 2022;
originally announced March 2022.
-
Wireless Energy Transfer in RIS-Aided Cell-Free Massive MIMO Systems: Opportunities and Challenges
Authors:
Enyu Shi,
Jiayi Zhang,
Shuaifei Chen,
Jiakang Zheng,
Yan Zhang,
Derrick Wing Kwan Ng,
Bo Ai
Abstract:
In future sixth-generation (6G) mobile networks, the Internet-of-Everything (IoE) is expected to provide extremely massive connectivity for small battery-powered devices. Indeed, massive devices with limited energy storage capacity impose persistent energy demand hindering the lifetime of communication networks. As a remedy, wireless energy transfer (WET) is a key technology to address these criti…
▽ More
In future sixth-generation (6G) mobile networks, the Internet-of-Everything (IoE) is expected to provide extremely massive connectivity for small battery-powered devices. Indeed, massive devices with limited energy storage capacity impose persistent energy demand hindering the lifetime of communication networks. As a remedy, wireless energy transfer (WET) is a key technology to address these critical energy supply issues. On the other hand, cell-free (CF) massive multiple-input multiple-output (MIMO) systems offer an efficient network architecture to realize the roll-out of the IoE. In this article, we first propose the paradigm of reconfigurable intelligent surface (RIS)-aided CF massive MIMO systems for WET, including its potential application scenarios and system architecture. The four-stage transmission procedure is discussed and analyzed to illustrate the practicality of the architecture. Then we put forward and analyze the hardware design of RIS. Particularly, we discuss the three corresponding operating modes and the amalgamation of WET technology and RIS-aided CF massive MIMO. Representative simulation results are given to confirm the superior performance achieved by our proposed schemes. Also, we investigate the optimal location of deploying multiple RISs to achieve the best system performance. Finally, several important research directions of RIS-aided CF massive MIMO systems with WET are presented to inspire further potential investigation.
△ Less
Submitted 28 January, 2022; v1 submitted 26 January, 2022;
originally announced January 2022.
-
Uplink Performance of High-Mobility Cell-Free Massive MIMO-OFDM Systems
Authors:
Jiakang Zheng,
Jiayi Zhang,
Enyu Shi,
Jing Jiang,
Bo Ai
Abstract:
High-speed train (HST) communications with orthogonal frequency division multiplexing (OFDM) techniques have received significant attention in recent years. Besides, cell-free (CF) massive multiple-input multiple-output (MIMO) is considered a promising technology to achieve the ultimate performance limit. In this paper, we focus on the performance of CF massive MIMO-OFDM systems with both matched…
▽ More
High-speed train (HST) communications with orthogonal frequency division multiplexing (OFDM) techniques have received significant attention in recent years. Besides, cell-free (CF) massive multiple-input multiple-output (MIMO) is considered a promising technology to achieve the ultimate performance limit. In this paper, we focus on the performance of CF massive MIMO-OFDM systems with both matched filter and large-scale fading decoding (LSFD) receivers in HST communications. HST communications with small cell and cellular massive MIMO-OFDM systems are also analyzed for comparison. Considering the bad effect of Doppler frequency offset (DFO) on system performance, exact closed-form expressions for uplink spectral efficiency (SE) of all systems are derived. According to the simulation results, we find that the CF massive MIMO-OFDM system with LSFD achieves both larger SE and lower SE drop percentages than other systems. In addition, increasing the number of access points (APs) and antennas per AP can effectively compensate for the performance loss from the DFO. Moreover, there is an optimal vertical distance between APs and HST to achieve the maximum SE.
△ Less
Submitted 24 January, 2022;
originally announced January 2022.
-
CI-AVSR: A Cantonese Audio-Visual Speech Dataset for In-car Command Recognition
Authors:
Wenliang Dai,
Samuel Cahyawijaya,
Tiezheng Yu,
Elham J. Barezi,
Peng Xu,
Cheuk Tung Shadow Yiu,
Rita Frieske,
Holy Lovenia,
Genta Indra Winata,
Qifeng Chen,
Xiaojuan Ma,
Bertram E. Shi,
Pascale Fung
Abstract:
With the rise of deep learning and intelligent vehicle, the smart assistant has become an essential in-car component to facilitate driving and provide extra functionalities. In-car smart assistants should be able to process general as well as car-related commands and perform corresponding actions, which eases driving and improves safety. However, there is a data scarcity issue for low resource lan…
▽ More
With the rise of deep learning and intelligent vehicle, the smart assistant has become an essential in-car component to facilitate driving and provide extra functionalities. In-car smart assistants should be able to process general as well as car-related commands and perform corresponding actions, which eases driving and improves safety. However, there is a data scarcity issue for low resource languages, hindering the development of research and applications. In this paper, we introduce a new dataset, Cantonese In-car Audio-Visual Speech Recognition (CI-AVSR), for in-car command recognition in the Cantonese language with both video and audio data. It consists of 4,984 samples (8.3 hours) of 200 in-car commands recorded by 30 native Cantonese speakers. Furthermore, we augment our dataset using common in-car background noises to simulate real environments, producing a dataset 10 times larger than the collected one. We provide detailed statistics of both the clean and the augmented versions of our dataset. Moreover, we implement two multimodal baselines to demonstrate the validity of CI-AVSR. Experiment results show that leveraging the visual signal improves the overall performance of the model. Although our best model can achieve a considerable quality on the clean test set, the speech recognition quality on the noisy data is still inferior and remains as an extremely challenging task for real in-car speech recognition systems. The dataset and code will be released at https://github.com/HLTCHKUST/CI-AVSR.
△ Less
Submitted 14 March, 2022; v1 submitted 11 January, 2022;
originally announced January 2022.
-
Automatic Speech Recognition Datasets in Cantonese: A Survey and New Dataset
Authors:
Tiezheng Yu,
Rita Frieske,
Peng Xu,
Samuel Cahyawijaya,
Cheuk Tung Shadow Yiu,
Holy Lovenia,
Wenliang Dai,
Elham J. Barezi,
Qifeng Chen,
Xiaojuan Ma,
Bertram E. Shi,
Pascale Fung
Abstract:
Automatic speech recognition (ASR) on low resource languages improves the access of linguistic minorities to technological advantages provided by artificial intelligence (AI). In this paper, we address the problem of data scarcity for the Hong Kong Cantonese language by creating a new Cantonese dataset. Our dataset, Multi-Domain Cantonese Corpus (MDCC), consists of 73.6 hours of clean read speech…
▽ More
Automatic speech recognition (ASR) on low resource languages improves the access of linguistic minorities to technological advantages provided by artificial intelligence (AI). In this paper, we address the problem of data scarcity for the Hong Kong Cantonese language by creating a new Cantonese dataset. Our dataset, Multi-Domain Cantonese Corpus (MDCC), consists of 73.6 hours of clean read speech paired with transcripts, collected from Cantonese audiobooks from Hong Kong. It comprises philosophy, politics, education, culture, lifestyle and family domains, covering a wide range of topics. We also review all existing Cantonese datasets and analyze them according to their speech type, data source, total size and availability. We further conduct experiments with Fairseq S2T Transformer, a state-of-the-art ASR model, on the biggest existing dataset, Common Voice zh-HK, and our proposed MDCC, and the results show the effectiveness of our dataset. In addition, we create a powerful and robust Cantonese ASR model by applying multi-dataset learning on MDCC and Common Voice zh-HK.
△ Less
Submitted 17 January, 2022; v1 submitted 7 January, 2022;
originally announced January 2022.
-
ASCEND: A Spontaneous Chinese-English Dataset for Code-switching in Multi-turn Conversation
Authors:
Holy Lovenia,
Samuel Cahyawijaya,
Genta Indra Winata,
Peng Xu,
Xu Yan,
Zihan Liu,
Rita Frieske,
Tiezheng Yu,
Wenliang Dai,
Elham J. Barezi,
Qifeng Chen,
Xiaojuan Ma,
Bertram E. Shi,
Pascale Fung
Abstract:
Code-switching is a speech phenomenon occurring when a speaker switches language during a conversation. Despite the spontaneous nature of code-switching in conversational spoken language, most existing works collect code-switching data from read speech instead of spontaneous speech. ASCEND (A Spontaneous Chinese-English Dataset) is a high-quality Mandarin Chinese-English code-switching corpus buil…
▽ More
Code-switching is a speech phenomenon occurring when a speaker switches language during a conversation. Despite the spontaneous nature of code-switching in conversational spoken language, most existing works collect code-switching data from read speech instead of spontaneous speech. ASCEND (A Spontaneous Chinese-English Dataset) is a high-quality Mandarin Chinese-English code-switching corpus built on spontaneous multi-turn conversational dialogue sources collected in Hong Kong. We report ASCEND's design and procedure for collecting the speech data, including annotations. ASCEND consists of 10.62 hours of clean speech, collected from 23 bilingual speakers of Chinese and English. Furthermore, we conduct baseline experiments using pre-trained wav2vec 2.0 models, achieving a best performance of 22.69\% character error rate and 27.05% mixed error rate.
△ Less
Submitted 3 May, 2022; v1 submitted 12 December, 2021;
originally announced December 2021.
-
Locally Differentially Private Sparse Vector Aggregation
Authors:
Mingxun Zhou,
Tianhao Wang,
T-H. Hubert Chan,
Giulia Fanti,
Elaine Shi
Abstract:
Vector mean estimation is a central primitive in federated analytics. In vector mean estimation, each user $i \in [n]$ holds a real-valued vector $v_i\in [-1, 1]^d$, and a server wants to estimate the mean of all $n$ vectors. Not only so, we would like to protect each individual user's privacy. In this paper, we consider the $k$-sparse version of the vector mean estimation problem, that is, suppos…
▽ More
Vector mean estimation is a central primitive in federated analytics. In vector mean estimation, each user $i \in [n]$ holds a real-valued vector $v_i\in [-1, 1]^d$, and a server wants to estimate the mean of all $n$ vectors. Not only so, we would like to protect each individual user's privacy. In this paper, we consider the $k$-sparse version of the vector mean estimation problem, that is, suppose that each user's vector has at most $k$ non-zero coordinates in its $d$-dimensional vector, and moreover, $k \ll d$. In practice, since the universe size $d$ can be very large (e.g., the space of all possible URLs), we would like the per-user communication to be succinct, i.e., independent of or (poly-)logarithmic in the universe size.
In this paper, we are the first to show matching upper- and lower-bounds for the $k$-sparse vector mean estimation problem under local differential privacy. Specifically, we construct new mechanisms that achieve asymptotically optimal error as well as succinct communication, either under user-level-LDP or event-level-LDP. We implement our algorithms and evaluate them on synthetic as well as real-world datasets. Our experiments show that we can often achieve one or two orders of magnitude reduction in error in comparison with prior works under typical choices of parameters, while incurring insignificant communication cost.
△ Less
Submitted 26 February, 2022; v1 submitted 6 December, 2021;
originally announced December 2021.
-
Foundations of Transaction Fee Mechanism Design
Authors:
Hao Chung,
Elaine Shi
Abstract:
In blockchains such as Bitcoin and Ethereum, users compete in a transaction fee auction to get their transactions confirmed in the next block. A line of recent works set forth the desiderata for a "dream" transaction fee mechanism (TFM), and explored whether such a mechanism existed. A dream TFM should satisfy 1) user incentive compatibility (UIC), i.e., truthful bidding should be a user's dominan…
▽ More
In blockchains such as Bitcoin and Ethereum, users compete in a transaction fee auction to get their transactions confirmed in the next block. A line of recent works set forth the desiderata for a "dream" transaction fee mechanism (TFM), and explored whether such a mechanism existed. A dream TFM should satisfy 1) user incentive compatibility (UIC), i.e., truthful bidding should be a user's dominant strategy; 2) miner incentive compatibility (MIC), i.e., the miner's dominant strategy is to faithfully implement the prescribed mechanism; and 3) miner-user side contract proofness (SCP), i.e., no coalition of the miner and one or more user(s) can increase their joint utility by deviating from the honest behavior. The weakest form of SCP is called 1-SCP, where we only aim to provide resilience against the collusion of the miner and a single user. Sadly, despite the various attempts, to the best of knowledge, no existing mechanism can satisfy all three properties in all situations.
Since the TFM departs from classical mechanism design in modeling and assumptions, to date, our understanding of the design space is relatively little. In this paper, we further unravel the mathematical structure of transaction fee mechanism design by proving the following results:
- Can we have a dream TFM?
- Rethinking the incentive compatibility notions.
- Do the new design elements make a difference?
△ Less
Submitted 4 November, 2022; v1 submitted 4 November, 2021;
originally announced November 2021.
-
The Benefits of Being Categorical Distributional: Uncertainty-aware Regularized Exploration in Reinforcement Learning
Authors:
Ke Sun,
Yingnan Zhao,
Enze Shi,
Yafei Wang,
Xiaodong Yan,
Bei Jiang,
Linglong Kong
Abstract:
The theoretical advantages of distributional reinforcement learning~(RL) over classical RL remain elusive despite its remarkable empirical performance. Starting from Categorical Distributional RL~(CDRL), we attribute the potential superiority of distributional RL to a derived distribution-matching regularization by applying a return density function decomposition technique. This unexplored regular…
▽ More
The theoretical advantages of distributional reinforcement learning~(RL) over classical RL remain elusive despite its remarkable empirical performance. Starting from Categorical Distributional RL~(CDRL), we attribute the potential superiority of distributional RL to a derived distribution-matching regularization by applying a return density function decomposition technique. This unexplored regularization in the distributional RL context is aimed at capturing additional return distribution information regardless of only its expectation, contributing to an augmented reward signal in the policy optimization. Compared with the entropy regularization in MaxEnt RL that explicitly optimizes the policy to encourage the exploration, the resulting regularization in CDRL implicitly optimizes policies guided by the new reward signal to align with the uncertainty of target return distributions, leading to an uncertainty-aware exploration effect. Finally, extensive experiments substantiate the importance of this uncertainty-aware regularization in distributional RL on the empirical benefits over classical RL.
△ Less
Submitted 2 February, 2024; v1 submitted 6 October, 2021;
originally announced October 2021.
-
CAST: Enhancing Code Summarization with Hierarchical Splitting and Reconstruction of Abstract Syntax Trees
Authors:
Ensheng Shi,
Yanlin Wang,
Lun Du,
Hongyu Zhang,
Shi Han,
Dongmei Zhang,
Hongbin Sun
Abstract:
Code summarization aims to generate concise natural language descriptions of source code, which can help improve program comprehension and maintenance. Recent studies show that syntactic and structural information extracted from abstract syntax trees (ASTs) is conducive to summary generation. However, existing approaches fail to fully capture the rich information in ASTs because of the large size/…
▽ More
Code summarization aims to generate concise natural language descriptions of source code, which can help improve program comprehension and maintenance. Recent studies show that syntactic and structural information extracted from abstract syntax trees (ASTs) is conducive to summary generation. However, existing approaches fail to fully capture the rich information in ASTs because of the large size/depth of ASTs. In this paper, we propose a novel model CAST that hierarchically splits and reconstructs ASTs. First, we hierarchically split a large AST into a set of subtrees and utilize a recursive neural network to encode the subtrees. Then, we aggregate the embeddings of subtrees by reconstructing the split ASTs to get the representation of the complete AST. Finally, AST representation, together with source code embedding obtained by a vanilla code token encoder, is used for code summarization. Extensive experiments, including the ablation study and the human evaluation, on benchmarks have demonstrated the power of CAST. To facilitate reproducibility, our code and data are available at https://anonymous.4open.science/r/CAST/.
△ Less
Submitted 30 November, 2021; v1 submitted 30 August, 2021;
originally announced August 2021.
-
Iterative Distillation for Better Uncertainty Estimates in Multitask Emotion Recognition
Authors:
Didan Deng,
Liang Wu,
Bertram E. Shi
Abstract:
When recognizing emotions, subtle nuances in displays of emotion generate ambiguity or uncertainty in emotion perception. Emotion uncertainty has been previously interpreted as inter-rater disagreement among multiple annotators. In this paper, we consider a more common and challenging scenario: modeling emotion uncertainty when only single emotion labels are available. From a Bayesian perspective,…
▽ More
When recognizing emotions, subtle nuances in displays of emotion generate ambiguity or uncertainty in emotion perception. Emotion uncertainty has been previously interpreted as inter-rater disagreement among multiple annotators. In this paper, we consider a more common and challenging scenario: modeling emotion uncertainty when only single emotion labels are available. From a Bayesian perspective, we propose to use deep ensembles to capture uncertainty for multiple emotion descriptors, i.e., action units, discrete expression labels and continuous descriptors. We further apply iterative self-distillation. Iterative distillation over multiple generations significantly improves performance in both emotion recognition and uncertainty estimation. Our method generates single student models that provide accurate estimates of uncertainty for in-domain samples and a student ensemble that can detect out-of-domain samples. Our experiments on emotion recognition and uncertainty estimation using the Aff-wild2 dataset demonstrate that our algorithm gives more reliable uncertainty estimates than both Temperature Scaling and Monte Carol Dropout.
△ Less
Submitted 17 October, 2021; v1 submitted 21 July, 2021;
originally announced August 2021.
-
On the Evaluation of Neural Code Summarization
Authors:
Ensheng Shi,
Yanlin Wang,
Lun Du,
Junjie Chen,
Shi Han,
Hongyu Zhang,
Dongmei Zhang,
Hongbin Sun
Abstract:
Source code summaries are important for program comprehension and maintenance. However, there are plenty of programs with missing, outdated, or mismatched summaries. Recently, deep learning techniques have been exploited to automatically generate summaries for given code snippets. To achieve a profound understanding of how far we are from solving this problem and provide suggestions to future rese…
▽ More
Source code summaries are important for program comprehension and maintenance. However, there are plenty of programs with missing, outdated, or mismatched summaries. Recently, deep learning techniques have been exploited to automatically generate summaries for given code snippets. To achieve a profound understanding of how far we are from solving this problem and provide suggestions to future research, in this paper, we conduct a systematic and in-depth analysis of 5 state-of-the-art neural code summarization models on 6 widely used BLEU variants, 4 pre-processing operations and their combinations, and 3 widely used datasets. The evaluation results show that some important factors have a great influence on the model evaluation, especially on the performance of models and the ranking among the models. However, these factors might be easily overlooked. Specifically, (1) the BLEU metric widely used in existing work of evaluating code summarization models has many variants. Ignoring the differences among these variants could greatly affect the validity of the claimed results. Furthermore, we conduct human evaluations and find that the metric BLEU-DC is most correlated to human perception; (2) code pre-processing choices can have a large (from -18\% to +25\%) impact on the summarization performance and should not be neglected. We also explore the aggregation of pre-processing combinations and boost the performance of models; (3) some important characteristics of datasets (corpus sizes, data splitting methods, and duplication ratios) have a significant impact on model evaluation. Based on the experimental results, we give actionable suggestions for evaluating code summarization and choosing the best method in different scenarios. We also build a shared code summarization toolbox to facilitate future research.
△ Less
Submitted 11 February, 2022; v1 submitted 15 July, 2021;
originally announced July 2021.
-
On the Evaluation of Commit Message Generation Models: An Experimental Study
Authors:
Wei Tao,
Yanlin Wang,
Ensheng Shi,
Lun Du,
Shi Han,
Hongyu Zhang,
Dongmei Zhang,
Wenqiang Zhang
Abstract:
Commit messages are natural language descriptions of code changes, which are important for program understanding and maintenance. However, writing commit messages manually is time-consuming and laborious, especially when the code is updated frequently. Various approaches utilizing generation or retrieval techniques have been proposed to automatically generate commit messages. To achieve a better u…
▽ More
Commit messages are natural language descriptions of code changes, which are important for program understanding and maintenance. However, writing commit messages manually is time-consuming and laborious, especially when the code is updated frequently. Various approaches utilizing generation or retrieval techniques have been proposed to automatically generate commit messages. To achieve a better understanding of how the existing approaches perform in solving this problem, this paper conducts a systematic and in-depth analysis of the state-of-the-art models and datasets. We find that: (1) Different variants of the BLEU metric are used in previous works, which affects the evaluation and understanding of existing methods. (2) Most existing datasets are crawled only from Java repositories while repositories in other programming languages are not sufficiently explored. (3) Dataset splitting strategies can influence the performance of existing models by a large margin. Some models show better performance when the datasets are split by commit, while other models perform better when the datasets are split by timestamp or by project. Based on our findings, we conduct a human evaluation and find the BLEU metric that best correlates with the human scores for the task. We also collect a large-scale, information-rich, and multi-language commit message dataset MCMD and evaluate existing models on this dataset. Furthermore, we conduct extensive experiments under different dataset splitting strategies and suggest the suitable models under different scenarios. Based on the experimental results and findings, we provide feasible suggestions for comprehensively evaluating commit message generation models and discuss possible future research directions. We believe this work can help practitioners and researchers better evaluate and select models for automatic commit message generation.
△ Less
Submitted 26 July, 2021; v1 submitted 12 July, 2021;
originally announced July 2021.
-
Is a Single Model Enough? MuCoS: A Multi-Model Ensemble Learning for Semantic Code Search
Authors:
Lun Du,
Xiaozhou Shi,
Yanlin Wang,
Ensheng Shi,
Shi Han,
Dongmei Zhang
Abstract:
Recently, deep learning methods have become mainstream in code search since they do better at capturing semantic correlations between code snippets and search queries and have promising performance. However, code snippets have diverse information from different dimensions, such as business logic, specific algorithm, and hardware communication, so it is hard for a single code representation module…
▽ More
Recently, deep learning methods have become mainstream in code search since they do better at capturing semantic correlations between code snippets and search queries and have promising performance. However, code snippets have diverse information from different dimensions, such as business logic, specific algorithm, and hardware communication, so it is hard for a single code representation module to cover all the perspectives. On the other hand, as a specific query may focus on one or several perspectives, it is difficult for a single query representation module to represent different user intents. In this paper, we propose MuCoS, a multi-model ensemble learning architecture for semantic code search. It combines several individual learners, each of which emphasizes a specific perspective of code snippets. We train the individual learners on different datasets which contain different perspectives of code information, and we use a data augmentation strategy to get these different datasets. Then we ensemble the learners to capture comprehensive features of code snippets.
△ Less
Submitted 12 July, 2021; v1 submitted 10 July, 2021;
originally announced July 2021.
-
CoCoSum: Contextual Code Summarization with Multi-Relational Graph Neural Network
Authors:
Yanlin Wang,
Ensheng Shi,
Lun Du,
Xiaodi Yang,
Yuxuan Hu,
Shi Han,
Hongyu Zhang,
Dongmei Zhang
Abstract:
Source code summaries are short natural language descriptions of code snippets that help developers better understand and maintain source code. There has been a surge of work on automatic code summarization to reduce the burden of writing summaries manually. However, most contemporary approaches mainly leverage the information within the boundary of the method being summarized (i.e., local context…
▽ More
Source code summaries are short natural language descriptions of code snippets that help developers better understand and maintain source code. There has been a surge of work on automatic code summarization to reduce the burden of writing summaries manually. However, most contemporary approaches mainly leverage the information within the boundary of the method being summarized (i.e., local context), and ignore the broader context that could assist with code summarization. This paper explores two global contexts, namely intra-class and inter-class contexts, and proposes the model CoCoSUM: Contextual Code Summarization with Multi-Relational Graph Neural Networks. CoCoSUM first incorporates class names as the intra-class context to generate the class semantic embeddings. Then, relevant Unified Modeling Language (UML) class diagrams are extracted as inter-class context and are encoded into the class relational embeddings using a novel Multi-Relational Graph Neural Network (MRGNN). Class semantic embeddings and class relational embeddings, together with the outputs from code token encoder and AST encoder, are passed to a decoder armed with a two-level attention mechanism to generate high-quality, context-aware code summaries. We conduct extensive experiments to evaluate our approach and compare it with other automatic code summarization models. The experimental results show that CoCoSUM is effective and outperforms state-of-the-art methods. Our source code and experimental data are available in the supplementary materials and will be made publicly available.
△ Less
Submitted 5 July, 2021;
originally announced July 2021.
-
Differentially Private Densest Subgraph
Authors:
Alireza Farhadi,
MohammadTaghi Hajiaghayi,
Elaine Shi
Abstract:
Given a graph, the densest subgraph problem asks for a set of vertices such that the average degree among these vertices is maximized. Densest subgraph has numerous applications in learning, e.g., community detection in social networks, link spam detection, correlation mining, bioinformatics, and so on. Although there are efficient algorithms that output either exact or approximate solutions to th…
▽ More
Given a graph, the densest subgraph problem asks for a set of vertices such that the average degree among these vertices is maximized. Densest subgraph has numerous applications in learning, e.g., community detection in social networks, link spam detection, correlation mining, bioinformatics, and so on. Although there are efficient algorithms that output either exact or approximate solutions to the densest subgraph problem, existing algorithms may violate the privacy of the individuals in the network, e.g., leaking the existence/non-existence of edges.
In this paper, we study the densest subgraph problem in the framework of the differential privacy, and we derive the first upper and lower bounds for this problem. We show that there exists a linear-time $ε$-differentially private algorithm that finds a $2$-approximation of the densest subgraph with an extra poly-logarithmic additive error. Our algorithm not only reports the approximate density of the densest subgraph, but also reports the vertices that form the dense subgraph.
Our upper bound almost matches the famous $2$-approximation by Charikar both in performance and in approximation ratio, but we additionally achieve differential privacy. In comparison with Charikar's algorithm, our algorithm has an extra poly-logarithmic additive error. We partly justify the additive error with a new lower bound, showing that for any differentially private algorithm that provides a constant-factor approximation, a sub-logarithmic additive error is inherent.
We also practically study our differentially private algorithm on real-world graphs, and we show that in practice the algorithm finds a solution which is very close to the optimal
△ Less
Submitted 14 November, 2022; v1 submitted 1 June, 2021;
originally announced June 2021.
-
Selfish Mining Attacks Exacerbated by Elastic Hash Supply
Authors:
Yoko Shibuya,
Go Yamamoto,
Fuhito Kojima,
Elaine Shi,
Shin'ichiro Matsuo,
Aron Laszka
Abstract:
Several attacks have been proposed against Proof-of-Work blockchains, which may increase the attacker's share of mining rewards (e.g., selfish mining, block withholding). A further impact of such attacks, which has not been considered in prior work, is that decreasing the profitability of mining for honest nodes incentivizes them to stop mining or to leave the attacked chain for a more profitable…
▽ More
Several attacks have been proposed against Proof-of-Work blockchains, which may increase the attacker's share of mining rewards (e.g., selfish mining, block withholding). A further impact of such attacks, which has not been considered in prior work, is that decreasing the profitability of mining for honest nodes incentivizes them to stop mining or to leave the attacked chain for a more profitable one. The departure of honest nodes exacerbates the attack and may further decrease profitability and incentivize more honest nodes to leave. In this paper, we first present an empirical analysis showing that there is a statistically significant correlation between the profitability of mining and the total hash rate, confirming that miners indeed respond to changing profitability. Second, we present a theoretical analysis showing that selfish mining under such elastic hash supply leads either to the collapse of a chain, i.e., all honest nodes leaving, or to a stable equilibrium depending on the attacker's initial share.
△ Less
Submitted 14 March, 2021;
originally announced March 2021.
-
Learning Hierarchical Integration of Foveal and Peripheral Vision for Vergence Control by Active Efficient Coding
Authors:
Zhetuo Zhao,
Jochen Triesch,
Bertram E. Shi
Abstract:
The active efficient coding (AEC) framework parsimoniously explains the joint development of visual processing and eye movements, e.g., the emergence of binocular disparity selective neurons and fusional vergence, the disjunctive eye movements that align left and right eye images. Vergence can be driven by information in both the fovea and periphery, which play complementary roles. The high resolu…
▽ More
The active efficient coding (AEC) framework parsimoniously explains the joint development of visual processing and eye movements, e.g., the emergence of binocular disparity selective neurons and fusional vergence, the disjunctive eye movements that align left and right eye images. Vergence can be driven by information in both the fovea and periphery, which play complementary roles. The high resolution fovea can drive precise short range movements. The lower resolution periphery supports coarser long range movements. The fovea and periphery may also contain conflicting information, e.g. due to objects at different depths. While past AEC models did integrate peripheral and foveal information, they did not explicitly take into account these characteristics. We propose here a two-level hierarchical approach that does. The bottom level generates different vergence actions from foveal and peripheral regions. The top level selects one. We demonstrate that the hierarchical approach performs better than prior approaches in realistic environments, exhibiting better alignment and less oscillation.
△ Less
Submitted 29 January, 2021;
originally announced March 2021.
-
Optimal Sorting Circuits for Short Keys
Authors:
Wei-Kai Lin,
Elaine Shi
Abstract:
A long-standing open question in the algorithms and complexity literature is whether there exist sorting circuits of size $o(n \log n)$. A recent work by Asharov, Lin, and Shi (SODA'21) showed that if the elements to be sorted have short keys whose length $k = o(\log n)$, then one can indeed overcome the $n\log n$ barrier for sorting circuits, by leveraging non-comparison-based techniques. More sp…
▽ More
A long-standing open question in the algorithms and complexity literature is whether there exist sorting circuits of size $o(n \log n)$. A recent work by Asharov, Lin, and Shi (SODA'21) showed that if the elements to be sorted have short keys whose length $k = o(\log n)$, then one can indeed overcome the $n\log n$ barrier for sorting circuits, by leveraging non-comparison-based techniques. More specifically, Asharov et al.~showed that there exist $O(n) \cdot \min(k, \log n)$-sized sorting circuits for $k$-bit keys, ignoring $poly\log^*$ factors. Interestingly, the recent works by Farhadi et al. (STOC'19) and Asharov et al. (SODA'21) also showed that the above result is essentially optimal for every key length $k$, assuming that the famous Li-Li network coding conjecture holds. Note also that proving any {\it unconditional} super-linear circuit lower bound for a wide class of problems is beyond the reach of current techniques.
Unfortunately, the approach taken by Asharov et al.~to achieve optimality in size somewhat crucially relies on sacrificing the depth: specifically, their circuit is super-{\it poly}logarithmic in depth even for 1-bit keys. Asharov et al.~phrase it as an open question how to achieve optimality both in size and depth. In this paper, we close this important gap in our understanding. We construct a sorting circuit of size $O(n) \cdot \min(k, \log n)$ (ignoring $poly\log^*$ terms) and depth $O(\log n)$. To achieve this, our approach departs significantly from the prior works. Our result can be viewed as a generalization of the landmark result by Ajtai, Komlós, and Szemerédi (STOC'83), simultaneously in terms of size and depth. Specifically, for $k = o(\log n)$, we achieve asymptotical improvements in size over the AKS sorting circuit, while preserving optimality in depth.
△ Less
Submitted 6 November, 2021; v1 submitted 22 February, 2021;
originally announced February 2021.
-
Self-Calibrating Active Binocular Vision via Active Efficient Coding with Deep Autoencoders
Authors:
Charles Wilmot,
Bertram E. Shi,
Jochen Triesch
Abstract:
We present a model of the self-calibration of active binocular vision comprising the simultaneous learning of visual representations, vergence, and pursuit eye movements. The model follows the principle of Active Efficient Coding (AEC), a recent extension of the classic Efficient Coding Hypothesis to active perception. In contrast to previous AEC models, the present model uses deep autoencoders to…
▽ More
We present a model of the self-calibration of active binocular vision comprising the simultaneous learning of visual representations, vergence, and pursuit eye movements. The model follows the principle of Active Efficient Coding (AEC), a recent extension of the classic Efficient Coding Hypothesis to active perception. In contrast to previous AEC models, the present model uses deep autoencoders to learn sensory representations. We also propose a new formulation of the intrinsic motivation signal that guides the learning of behavior. We demonstrate the performance of the model in simulations.
△ Less
Submitted 27 January, 2021;
originally announced January 2021.
-
AVGCN: Trajectory Prediction using Graph Convolutional Networks Guided by Human Attention
Authors:
Congcong Liu,
Yuying Chen,
Ming Liu,
Bertram E. Shi
Abstract:
Pedestrian trajectory prediction is a critical yet challenging task, especially for crowded scenes. We suggest that introducing an attention mechanism to infer the importance of different neighbors is critical for accurate trajectory prediction in scenes with varying crowd size. In this work, we propose a novel method, AVGCN, for trajectory prediction utilizing graph convolutional networks (GCN) b…
▽ More
Pedestrian trajectory prediction is a critical yet challenging task, especially for crowded scenes. We suggest that introducing an attention mechanism to infer the importance of different neighbors is critical for accurate trajectory prediction in scenes with varying crowd size. In this work, we propose a novel method, AVGCN, for trajectory prediction utilizing graph convolutional networks (GCN) based on human attention (A denotes attention, V denotes visual field constraints). First, we train an attention network that estimates the importance of neighboring pedestrians, using gaze data collected as subjects perform a bird's eye view crowd navigation task. Then, we incorporate the learned attention weights modulated by constraints on the pedestrian's visual field into a trajectory prediction network that uses a GCN to aggregate information from neighbors efficiently. AVGCN also considers the stochastic nature of pedestrian trajectories by taking advantage of variational trajectory prediction. Our approach achieves state-of-the-art performance on several trajectory prediction benchmarks, and the lowest average prediction error over all considered benchmarks.
△ Less
Submitted 14 January, 2021;
originally announced January 2021.
-
Sorting Short Keys in Circuits of Size o(n log n)
Authors:
Gilad Asharov,
Wei-Kai Lin,
Elaine Shi
Abstract:
We consider the classical problem of sorting an input array containing $n$ elements, where each element is described with a $k$-bit comparison-key and a $w$-bit payload. A long-standing open problem is whether there exist $(k + w) \cdot o(n \log n)$-sized boolean circuits for sorting. We show that one can overcome the $n\log n$ barrier when the keys to be sorted are short. Specifically, we prove t…
▽ More
We consider the classical problem of sorting an input array containing $n$ elements, where each element is described with a $k$-bit comparison-key and a $w$-bit payload. A long-standing open problem is whether there exist $(k + w) \cdot o(n \log n)$-sized boolean circuits for sorting. We show that one can overcome the $n\log n$ barrier when the keys to be sorted are short. Specifically, we prove that there is a circuit with $(k + w) \cdot O(n k) \cdot \poly(\log^*n - \log^* (w + k))$ boolean gates capable of sorting any input array containing $n$ elements, each described with a $k$-bit key and a $w$-bit payload. Therefore, if the keys to be sorted are short, say, $k < o(\log n)$, our result is asymptotically better than the classical AKS sorting network (ignoring $\poly\log^*$ terms); and we also overcome the $n \log n$ barrier in such cases. Such a result might be surprising initially because it is long known that comparator-based techniques must incur $Ω(n \log n)$ comparator gates even when the keys to be sorted are only $1$-bit long (e.g., see Knuth's "Art of Programming" textbook). To the best of our knowledge, we are the first to achieve non-trivial results for sorting circuits using non-comparison-based techniques. We also show that if the Li-Li network coding conjecture is true, our upper bound is optimal, barring $\poly\log^*$ terms, for every $k$ as long as $k = O(\log n)$.
△ Less
Submitted 26 October, 2020; v1 submitted 15 October, 2020;
originally announced October 2020.
-
HGCN-GJS: Hierarchical Graph Convolutional Network with Groupwise Joint Sampling for Trajectory Prediction
Authors:
Yuying Chen,
Congcong Liu,
Xiaodong Mei,
Bertram E. Shi,
Ming Liu
Abstract:
Accurate pedestrian trajectory prediction is of great importance for downstream tasks such as autonomous driving and mobile robot navigation. Fully investigating the social interactions within the crowd is crucial for accurate pedestrian trajectory prediction. However, most existing methods do not capture group level interactions well, focusing only on pairwise interactions and neglecting group-wi…
▽ More
Accurate pedestrian trajectory prediction is of great importance for downstream tasks such as autonomous driving and mobile robot navigation. Fully investigating the social interactions within the crowd is crucial for accurate pedestrian trajectory prediction. However, most existing methods do not capture group level interactions well, focusing only on pairwise interactions and neglecting group-wise interactions. In this work, we propose a hierarchical graph convolutional network, HGCN-GJS, for trajectory prediction which well leverages group level interactions within the crowd. Furthermore, we introduce a novel joint sampling scheme for modeling the joint distribution of multiple pedestrians in the future trajectories. Based on the group information, this scheme associates the trajectory of one person with the trajectory of other people in the group, but maintains the independence of the trajectories of outsiders. We demonstrate the performance of our network on several trajectory prediction datasets, achieving state-of-the-art results on all datasets considered.
△ Less
Submitted 15 September, 2023; v1 submitted 15 September, 2020;
originally announced September 2020.
-
Bucket Oblivious Sort: An Extremely Simple Oblivious Sort
Authors:
Gilad Asharov,
T-H. Hubert Chan,
Kartik Nayak,
Rafael Pass,
Ling Ren,
Elaine Shi
Abstract:
We propose a conceptually simple oblivious sort and oblivious random permutation algorithms called bucket oblivious sort and bucket oblivious random permutation. Bucket oblivious sort uses $6n\log n$ time (measured by the number of memory accesses) and $2Z$ client storage with an error probability exponentially small in $Z$. The above runtime is only $3\times$ slower than a non-oblivious merge sor…
▽ More
We propose a conceptually simple oblivious sort and oblivious random permutation algorithms called bucket oblivious sort and bucket oblivious random permutation. Bucket oblivious sort uses $6n\log n$ time (measured by the number of memory accesses) and $2Z$ client storage with an error probability exponentially small in $Z$. The above runtime is only $3\times$ slower than a non-oblivious merge sort baseline; for $2^{30}$ elements, it is $5\times$ faster than bitonic sort, the de facto oblivious sorting algorithm in practical implementations.
△ Less
Submitted 29 April, 2021; v1 submitted 4 August, 2020;
originally announced August 2020.