subscribe to arXiv mailings

Near-Field Channel Estimation for Extremely Large-Scale Terahertz Communications

Authors: Songjie Yang, Yizhou Peng, Wanting Lyu, Ya Li, Hongjun He, Zhongpei Zhang, Chau Yuen

Abstract: Future Terahertz communications exhibit significant potential in accommodating ultra-high-rate services. Employing extremely large-scale array antennas is a key approach to realize this potential, as they can harness substantial beamforming gains to overcome the severe path loss and leverage the electromagnetic advantages in the near field. This paper proposes novel estimation methods designed to… ▽ More Future Terahertz communications exhibit significant potential in accommodating ultra-high-rate services. Employing extremely large-scale array antennas is a key approach to realize this potential, as they can harness substantial beamforming gains to overcome the severe path loss and leverage the electromagnetic advantages in the near field. This paper proposes novel estimation methods designed to enhance efficiency in Terahertz widely-spaced multi-subarray (WSMS) systems. Initially, we introduce three sparse channel representation methods: polar-domain representation (PD-R), multi-angular-domain representation (MAD-R), and two-dimensional polar-angular-domain representation (2D-PAD-R). Each method is meticulously developed for near-field WSMS channels, capitalizing on their sparsity characteristics. Building on this, we propose four estimation frameworks using the sparse recovery theory: polar-domain estimation (PD-E), multi-angular-domain estimation (MAD-E), two-stage polar-angular-domain estimation (TS-PAD-E), and two-dimensional polar-angular-domain estimation (2D-PAD-E). Particularly, 2D-PAD-E, integrating a 2D dictionary process, and TS-PAD-E, with its sequential approach to angle and distance estimation, stand out as particularly effective for near-field angle-distance estimation, enabling decoupled calculation of these parameters. Overall, these frameworks provide versatile and efficient solutions for WSMS channel estimation, balancing low complexity with high-performance outcomes. Additionally, they represent a fresh perspective on near-field signal processing. △ Less

Submitted 8 June, 2024; originally announced June 2024.

arXiv:2404.13414 [pdf, other]

doi 10.1145/3657604.3662036

Evaluating the Effectiveness of LLMs in Introductory Computer Science Education: A Semester-Long Field Study

Authors: Wenhan Lyu, Yimeng Wang, Tingting, Chung, Yifan Sun, Yixuan Zhang

Abstract: The integration of AI assistants, especially through the development of Large Language Models (LLMs), into computer science education has sparked significant debate. An emerging body of work has looked into using LLMs in education, but few have examined the impacts of LLMs on students in entry-level programming courses, particularly in real-world contexts and over extended periods. To address this… ▽ More The integration of AI assistants, especially through the development of Large Language Models (LLMs), into computer science education has sparked significant debate. An emerging body of work has looked into using LLMs in education, but few have examined the impacts of LLMs on students in entry-level programming courses, particularly in real-world contexts and over extended periods. To address this research gap, we conducted a semester-long, between-subjects study with 50 students using CodeTutor, an LLM-powered assistant developed by our research team. Our study results show that students who used CodeTutor (the experimental group) achieved statistically significant improvements in their final scores compared to peers who did not use the tool (the control group). Within the experimental group, those without prior experience with LLM-powered tools demonstrated significantly greater performance gain than their counterparts. We also found that students expressed positive feedback regarding CodeTutor's capability, though they also had concerns about CodeTutor's limited role in developing critical thinking skills. Over the semester, students' agreement with CodeTutor's suggestions decreased, with a growing preference for support from traditional human teaching assistants. Our analysis further reveals that the quality of user prompts was significantly correlated with CodeTutor's response effectiveness. Building upon our results, we discuss the implications of our findings for integrating Generative AI literacy into curricula to foster critical thinking skills and turn to examining the temporal dynamics of user engagement with LLM-powered tools. We further discuss the discrepancy between the anticipated functions of tools and students' actual capabilities, which sheds light on the need for tailored strategies to improve educational outcomes. △ Less

Submitted 2 May, 2024; v1 submitted 20 April, 2024; originally announced April 2024.

Comments: Accepted to Learning @ Scale 2024

arXiv:2404.07977 [pdf, other]

Gaga: Group Any Gaussians via 3D-aware Memory Bank

Authors: Weijie Lyu, Xueting Li, Abhijit Kundu, Yi-Hsuan Tsai, Ming-Hsuan Yang

Abstract: We introduce Gaga, a framework that reconstructs and segments open-world 3D scenes by leveraging inconsistent 2D masks predicted by zero-shot segmentation models. Contrasted to prior 3D scene segmentation approaches that heavily rely on video object tracking, Gaga utilizes spatial information and effectively associates object masks across diverse camera poses. By eliminating the assumption of cont… ▽ More We introduce Gaga, a framework that reconstructs and segments open-world 3D scenes by leveraging inconsistent 2D masks predicted by zero-shot segmentation models. Contrasted to prior 3D scene segmentation approaches that heavily rely on video object tracking, Gaga utilizes spatial information and effectively associates object masks across diverse camera poses. By eliminating the assumption of continuous view changes in training images, Gaga demonstrates robustness to variations in camera poses, particularly beneficial for sparsely sampled images, ensuring precise mask label consistency. Furthermore, Gaga accommodates 2D segmentation masks from diverse sources and demonstrates robust performance with different open-world zero-shot segmentation models, enhancing its versatility. Extensive qualitative and quantitative evaluations demonstrate that Gaga performs favorably against state-of-the-art methods, emphasizing its potential for real-world applications such as scene understanding and manipulation. △ Less

Submitted 11 April, 2024; originally announced April 2024.

Comments: Project Page: https://www.gaga.gallery

arXiv:2403.17155 [pdf, other]

Task-Agnostic Detector for Insertion-Based Backdoor Attacks

Authors: Weimin Lyu, Xiao Lin, Songzhu Zheng, Lu Pang, Haibin Ling, Susmit Jha, Chao Chen

Abstract: Textual backdoor attacks pose significant security threats. Current detection approaches, typically relying on intermediate feature representation or reconstructing potential triggers, are task-specific and less effective beyond sentence classification, struggling with tasks like question answering and named entity recognition. We introduce TABDet (Task-Agnostic Backdoor Detector), a pioneering ta… ▽ More Textual backdoor attacks pose significant security threats. Current detection approaches, typically relying on intermediate feature representation or reconstructing potential triggers, are task-specific and less effective beyond sentence classification, struggling with tasks like question answering and named entity recognition. We introduce TABDet (Task-Agnostic Backdoor Detector), a pioneering task-agnostic method for backdoor detection. TABDet leverages final layer logits combined with an efficient pooling technique, enabling unified logit representation across three prominent NLP tasks. TABDet can jointly learn from diverse task-specific models, demonstrating superior detection efficacy over traditional task-specific methods. △ Less

Submitted 25 March, 2024; originally announced March 2024.

Comments: Findings of NAACL 2024

arXiv:2402.18847 [pdf, other]

doi 10.1109/LWC.2024.3372569

Flexible Precoding for Multi-User Movable Antenna Communications

Authors: Songjie Yang, Wanting Lyu, Boyu Ning, Zhongpei Zhang, Chau Yuen

Abstract: This letter rethinks traditional precoding in multi-user wireless communications with movable antennas (MAs). Utilizing MAs for optimal antenna positioning, we introduce a sparse optimization (SO)-based approach focusing on regularized zero-forcing (RZF). This framework targets the optimization of antenna positions and the precoding matrix to minimize inter-user interference and transmit power. We… ▽ More This letter rethinks traditional precoding in multi-user wireless communications with movable antennas (MAs). Utilizing MAs for optimal antenna positioning, we introduce a sparse optimization (SO)-based approach focusing on regularized zero-forcing (RZF). This framework targets the optimization of antenna positions and the precoding matrix to minimize inter-user interference and transmit power. We propose an off-grid regularized least squares-based orthogonal matching pursuit (RLS-OMP) method for this purpose. Moreover, we provide deeper insights into antenna position optimization using RLS-OMP, viewed from a subspace projection angle. Overall, our proposed flexible precoding scheme demonstrates a sum rate that exceeds more than twice that of fixed antenna positions. △ Less

Submitted 28 February, 2024; originally announced February 2024.

Journal ref: IEEE Wireless Communications Letters (2024)

arXiv:2402.09923 [pdf, other]

A Dataset of Open-Domain Question Answering with Multiple-Span Answers

Authors: Zhiyi Luo, Yingying Zhang, Shuyun Luo, Ying Zhao, Wentao Lyu

Abstract: Multi-span answer extraction, also known as the task of multi-span question answering (MSQA), is critical for real-world applications, as it requires extracting multiple pieces of information from a text to answer complex questions. Despite the active studies and rapid progress in English MSQA research, there is a notable lack of publicly available MSQA benchmark in Chinese. Previous efforts for c… ▽ More Multi-span answer extraction, also known as the task of multi-span question answering (MSQA), is critical for real-world applications, as it requires extracting multiple pieces of information from a text to answer complex questions. Despite the active studies and rapid progress in English MSQA research, there is a notable lack of publicly available MSQA benchmark in Chinese. Previous efforts for constructing MSQA datasets predominantly emphasized entity-centric contextualization, resulting in a bias towards collecting factoid questions and potentially overlooking questions requiring more detailed descriptive responses. To overcome these limitations, we present CLEAN, a comprehensive Chinese multi-span question answering dataset that involves a wide range of open-domain subjects with a substantial number of instances requiring descriptive answers. Additionally, we provide established models from relevant literature as baselines for CLEAN. Experimental results and analysis show the characteristics and challenge of the newly proposed CLEAN dataset for the community. Our dataset, CLEAN, will be publicly released at zhiyiluo.site/misc/clean_v1.0_ sample.json. △ Less

Submitted 15 February, 2024; originally announced February 2024.

arXiv:2401.05561 [pdf, other]

TrustLLM: Trustworthiness in Large Language Models

Authors: Lichao Sun, Yue Huang, Haoran Wang, Siyuan Wu, Qihui Zhang, Yuan Li, Chujie Gao, Yixin Huang, Wenhan Lyu, Yixuan Zhang, Xiner Li, Zhengliang Liu, Yixin Liu, Yijue Wang, Zhikun Zhang, Bertie Vidgen, Bhavya Kailkhura, Caiming Xiong, Chaowei Xiao, Chunyuan Li, Eric Xing, Furong Huang, Hao Liu, Heng Ji, Hongyi Wang , et al. (45 additional authors not shown)

Abstract: Large language models (LLMs), exemplified by ChatGPT, have gained considerable attention for their excellent natural language processing capabilities. Nonetheless, these LLMs present many challenges, particularly in the realm of trustworthiness. Therefore, ensuring the trustworthiness of LLMs emerges as an important topic. This paper introduces TrustLLM, a comprehensive study of trustworthiness in… ▽ More Large language models (LLMs), exemplified by ChatGPT, have gained considerable attention for their excellent natural language processing capabilities. Nonetheless, these LLMs present many challenges, particularly in the realm of trustworthiness. Therefore, ensuring the trustworthiness of LLMs emerges as an important topic. This paper introduces TrustLLM, a comprehensive study of trustworthiness in LLMs, including principles for different dimensions of trustworthiness, established benchmark, evaluation, and analysis of trustworthiness for mainstream LLMs, and discussion of open challenges and future directions. Specifically, we first propose a set of principles for trustworthy LLMs that span eight different dimensions. Based on these principles, we further establish a benchmark across six dimensions including truthfulness, safety, fairness, robustness, privacy, and machine ethics. We then present a study evaluating 16 mainstream LLMs in TrustLLM, consisting of over 30 datasets. Our findings firstly show that in general trustworthiness and utility (i.e., functional effectiveness) are positively related. Secondly, our observations reveal that proprietary LLMs generally outperform most open-source counterparts in terms of trustworthiness, raising concerns about the potential risks of widely accessible open-source LLMs. However, a few open-source LLMs come very close to proprietary ones. Thirdly, it is important to note that some LLMs may be overly calibrated towards exhibiting trustworthiness, to the extent that they compromise their utility by mistakenly treating benign prompts as harmful and consequently not responding. Finally, we emphasize the importance of ensuring transparency not only in the models themselves but also in the technologies that underpin trustworthiness. Knowing the specific trustworthy technologies that have been employed is crucial for analyzing their effectiveness. △ Less

Submitted 17 March, 2024; v1 submitted 10 January, 2024; originally announced January 2024.

Comments: This work is still under work and we welcome your contribution

arXiv:2312.08371 [pdf, other]

PTT: Point-Trajectory Transformer for Efficient Temporal 3D Object Detection

Authors: Kuan-Chih Huang, Weijie Lyu, Ming-Hsuan Yang, Yi-Hsuan Tsai

Abstract: Recent temporal LiDAR-based 3D object detectors achieve promising performance based on the two-stage proposal-based approach. They generate 3D box candidates from the first-stage dense detector, followed by different temporal aggregation methods. However, these approaches require per-frame objects or whole point clouds, posing challenges related to memory bank utilization. Moreover, point clouds a… ▽ More Recent temporal LiDAR-based 3D object detectors achieve promising performance based on the two-stage proposal-based approach. They generate 3D box candidates from the first-stage dense detector, followed by different temporal aggregation methods. However, these approaches require per-frame objects or whole point clouds, posing challenges related to memory bank utilization. Moreover, point clouds and trajectory features are combined solely based on concatenation, which may neglect effective interactions between them. In this paper, we propose a point-trajectory transformer with long short-term memory for efficient temporal 3D object detection. To this end, we only utilize point clouds of current-frame objects and their historical trajectories as input to minimize the memory bank storage requirement. Furthermore, we introduce modules to encode trajectory features, focusing on long short-term and future-aware perspectives, and then effectively aggregate them with point cloud features. We conduct extensive experiments on the large-scale Waymo dataset to demonstrate that our approach performs well against state-of-the-art methods. Code and models will be made publicly available at https://github.com/kuanchihhuang/PTT. △ Less

Submitted 24 April, 2024; v1 submitted 13 December, 2023; originally announced December 2023.

Comments: Accepted to CVPR 2024. Project page: https://github.com/kuanchihhuang/PTT

arXiv:2310.20145 [pdf, other]

Efficient Robust Bayesian Optimization for Arbitrary Uncertain Inputs

Authors: Lin Yang, Junlong Lyu, Wenlong Lyu, Zhitang Chen

Abstract: Bayesian Optimization (BO) is a sample-efficient optimization algorithm widely employed across various applications. In some challenging BO tasks, input uncertainty arises due to the inevitable randomness in the optimization process, such as machining errors, execution noise, or contextual variability. This uncertainty deviates the input from the intended value before evaluation, resulting in sign… ▽ More Bayesian Optimization (BO) is a sample-efficient optimization algorithm widely employed across various applications. In some challenging BO tasks, input uncertainty arises due to the inevitable randomness in the optimization process, such as machining errors, execution noise, or contextual variability. This uncertainty deviates the input from the intended value before evaluation, resulting in significant performance fluctuations in the final result. In this paper, we introduce a novel robust Bayesian Optimization algorithm, AIRBO, which can effectively identify a robust optimum that performs consistently well under arbitrary input uncertainty. Our method directly models the uncertain inputs of arbitrary distributions by empowering the Gaussian Process with the Maximum Mean Discrepancy (MMD) and further accelerates the posterior inference via Nystrom approximation. Rigorous theoretical regret bound is established under MMD estimation error and extensive experiments on synthetic functions and real problems demonstrate that our approach can handle various input uncertainties and achieve state-of-the-art performance. △ Less

Submitted 3 November, 2023; v1 submitted 30 October, 2023; originally announced October 2023.

Comments: Accepted by NeurIPS 2023

arXiv:2310.14480 [pdf, other]

Attention-Enhancing Backdoor Attacks Against BERT-based Models

Authors: Weimin Lyu, Songzhu Zheng, Lu Pang, Haibin Ling, Chao Chen

Abstract: Recent studies have revealed that \textit{Backdoor Attacks} can threaten the safety of natural language processing (NLP) models. Investigating the strategies of backdoor attacks will help to understand the model's vulnerability. Most existing textual backdoor attacks focus on generating stealthy triggers or modifying model weights. In this paper, we directly target the interior structure of neural… ▽ More Recent studies have revealed that \textit{Backdoor Attacks} can threaten the safety of natural language processing (NLP) models. Investigating the strategies of backdoor attacks will help to understand the model's vulnerability. Most existing textual backdoor attacks focus on generating stealthy triggers or modifying model weights. In this paper, we directly target the interior structure of neural networks and the backdoor mechanism. We propose a novel Trojan Attention Loss (TAL), which enhances the Trojan behavior by directly manipulating the attention patterns. Our loss can be applied to different attacking methods to boost their attack efficacy in terms of attack successful rates and poisoning rates. It applies to not only traditional dirty-label attacks, but also the more challenging clean-label attacks. We validate our method on different backbone models (BERT, RoBERTa, and DistilBERT) and various tasks (Sentiment Analysis, Toxic Detection, and Topic Classification). △ Less

Submitted 24 October, 2023; v1 submitted 22 October, 2023; originally announced October 2023.

Comments: Findings of EMNLP 2023

arXiv:2309.08681 [pdf, other]

Enhancing Near-Field Sensing and Communications with Sparse Arrays: Potentials, Challenges, and Emerging Trends

Authors: Songjie Yang, Wanting Lyu, Zhongpei Zhang, Chau Yuen

Abstract: As a promising technique, extremely large-scale (XL)-arrays offer potential solutions for overcoming the severe path loss in millimeter-wave (mmWave) and TeraHertz (THz) channels, crucial for enabling 6G. Nevertheless, XL-arrays introduce deviations in electromagnetic propagation compared to traditional arrays, fundamentally challenging the assumption with the planar-wave model. Instead, it ushers… ▽ More As a promising technique, extremely large-scale (XL)-arrays offer potential solutions for overcoming the severe path loss in millimeter-wave (mmWave) and TeraHertz (THz) channels, crucial for enabling 6G. Nevertheless, XL-arrays introduce deviations in electromagnetic propagation compared to traditional arrays, fundamentally challenging the assumption with the planar-wave model. Instead, it ushers in the spherical-wave (SW) model to accurately represent the near-field propagation characteristics, significantly increasing signal processing complexity. Fortunately, the SW model shows remarkable benefits on sensing and communications (S\&C), e.g., improving communication multiplexing capability, spatial resolution, and degrees of freedom. In this context, this article first overviews hardware/algorithm challenges, fundamental potentials, promising applications of near-field S\&C enabled by XL-arrays. To overcome the limitations of existing XL-arrays with dense uniform array layouts and improve S\&C applications, we introduce sparse arrays (SAs). Exploring their potential, we propose XL-SAs for mmWave/THz systems using multi-subarray designs. Finally, several applications, challenges and resarch directions are identified. △ Less

Submitted 15 September, 2023; originally announced September 2023.

arXiv:2309.05944 [pdf, other]

Performance Bounds for Near-Field Localization with Widely-Spaced Multi-Subarray mmWave/THz MIMO

Authors: Songjie Yang, Xinyi Chen, Yue Xiu, Wanting Lyu, Zhongpei Zhang, Chau Yuen

Abstract: This paper investigates the potential of near-field localization using widely-spaced multi-subarrays (WSMSs) and analyzing the corresponding angle and range Cramér-Rao bounds (CRBs). By employing the Riemann sum, closed-form CRB expressions are derived for the spherical wavefront-based WSMS (SW-WSMS). We find that the CRBs can be characterized by the angular span formed by the line connecting the… ▽ More This paper investigates the potential of near-field localization using widely-spaced multi-subarrays (WSMSs) and analyzing the corresponding angle and range Cramér-Rao bounds (CRBs). By employing the Riemann sum, closed-form CRB expressions are derived for the spherical wavefront-based WSMS (SW-WSMS). We find that the CRBs can be characterized by the angular span formed by the line connecting the array's two ends to the target, and the different WSMSs with same angular spans but different number of subarrays have identical normalized CRBs. We provide a theoretical proof that, in certain scenarios, the CRB of WSMSs is smaller than that of uniform arrays. We further yield the closed-form CRBs for the hybrid spherical and planar wavefront-based WSMS (HSPW-WSMS), and its components can be seen as decompositions of the parameters from the CRBs for the SW-WSMS. Simulations are conducted to validate the accuracy of the derived closed-form CRBs and provide further insights into various system characteristics. Basically, this paper underscores the high resolution of utilizing WSMS for localization, reinforces the validity of adopting the HSPW assumption, and, considering its applications in communications, indicates a promising outlook for integrated sensing and communications based on HSPW-WSMSs. △ Less

Submitted 11 September, 2023; originally announced September 2023.

arXiv:2308.04660 [pdf, other]

Efficient Bayesian Optimization with Deep Kernel Learning and Transformer Pre-trained on Multiple Heterogeneous Datasets

Authors: Wenlong Lyu, Shoubo Hu, Jie Chuai, Zhitang Chen

Abstract: Bayesian optimization (BO) is widely adopted in black-box optimization problems and it relies on a surrogate model to approximate the black-box response function. With the increasing number of black-box optimization tasks solved and even more to solve, the ability to learn from multiple prior tasks to jointly pre-train a surrogate model is long-awaited to further boost optimization efficiency. In… ▽ More Bayesian optimization (BO) is widely adopted in black-box optimization problems and it relies on a surrogate model to approximate the black-box response function. With the increasing number of black-box optimization tasks solved and even more to solve, the ability to learn from multiple prior tasks to jointly pre-train a surrogate model is long-awaited to further boost optimization efficiency. In this paper, we propose a simple approach to pre-train a surrogate, which is a Gaussian process (GP) with a kernel defined on deep features learned from a Transformer-based encoder, using datasets from prior tasks with possibly heterogeneous input spaces. In addition, we provide a simple yet effective mix-up initialization strategy for input tokens corresponding to unseen input variables and therefore accelerate new tasks' convergence. Experiments on both synthetic and real benchmark problems demonstrate the effectiveness of our proposed pre-training and transfer BO strategy over existing methods. △ Less

Submitted 8 August, 2023; originally announced August 2023.

arXiv:2307.01430 [pdf, other]

Continual Learning in Open-vocabulary Classification with Complementary Memory Systems

Authors: Zhen Zhu, Weijie Lyu, Yao Xiao, Derek Hoiem

Abstract: We introduce a method for flexible and efficient continual learning in open-vocabulary image classification, drawing inspiration from the complementary learning systems observed in human cognition. Specifically, we propose to combine predictions from a CLIP zero-shot model and the exemplar-based model, using the zero-shot estimated probability that a sample's class is within the exemplar classes.… ▽ More We introduce a method for flexible and efficient continual learning in open-vocabulary image classification, drawing inspiration from the complementary learning systems observed in human cognition. Specifically, we propose to combine predictions from a CLIP zero-shot model and the exemplar-based model, using the zero-shot estimated probability that a sample's class is within the exemplar classes. We also propose a "tree probe" method, an adaption of lazy learning principles, which enables fast learning from new examples with competitive accuracy to batch-trained linear models. We test in data incremental, class incremental, and task incremental settings, as well as ability to perform flexible inference on varying subsets of zero-shot and learned categories. Our proposed method achieves a good balance of learning speed, target task effectiveness, and zero-shot effectiveness. Code will be available at https://github.com/jessemelpolio/TreeProbe. △ Less

Submitted 3 October, 2023; v1 submitted 3 July, 2023; originally announced July 2023.

Comments: In review

arXiv:2307.01425 [pdf, other]

Consistent Multimodal Generation via A Unified GAN Framework

Authors: Zhen Zhu, Yijun Li, Weijie Lyu, Krishna Kumar Singh, Zhixin Shu, Soeren Pirk, Derek Hoiem

Abstract: We investigate how to generate multimodal image outputs, such as RGB, depth, and surface normals, with a single generative model. The challenge is to produce outputs that are realistic, and also consistent with each other. Our solution builds on the StyleGAN3 architecture, with a shared backbone and modality-specific branches in the last layers of the synthesis network, and we propose per-modality… ▽ More We investigate how to generate multimodal image outputs, such as RGB, depth, and surface normals, with a single generative model. The challenge is to produce outputs that are realistic, and also consistent with each other. Our solution builds on the StyleGAN3 architecture, with a shared backbone and modality-specific branches in the last layers of the synthesis network, and we propose per-modality fidelity discriminators and a cross-modality consistency discriminator. In experiments on the Stanford2D3D dataset, we demonstrate realistic and consistent generation of RGB, depth, and normal images. We also show a training recipe to easily extend our pretrained model on a new domain, even with a few pairwise data. We further evaluate the use of synthetically generated RGB and depth pairs for training or fine-tuning depth estimators. Code will be available at https://github.com/jessemelpolio/MultimodalGAN. △ Less

Submitted 3 July, 2023; originally announced July 2023.

Comments: In review

arXiv:2306.05704 [pdf, other]

Exploring Effective Mask Sampling Modeling for Neural Image Compression

Authors: Lin Liu, Mingming Zhao, Shanxin Yuan, Wenlong Lyu, Wengang Zhou, Houqiang Li, Yanfeng Wang, Qi Tian

Abstract: Image compression aims to reduce the information redundancy in images. Most existing neural image compression methods rely on side information from hyperprior or context models to eliminate spatial redundancy, but rarely address the channel redundancy. Inspired by the mask sampling modeling in recent self-supervised learning methods for natural language processing and high-level vision, we propose… ▽ More Image compression aims to reduce the information redundancy in images. Most existing neural image compression methods rely on side information from hyperprior or context models to eliminate spatial redundancy, but rarely address the channel redundancy. Inspired by the mask sampling modeling in recent self-supervised learning methods for natural language processing and high-level vision, we propose a novel pretraining strategy for neural image compression. Specifically, Cube Mask Sampling Module (CMSM) is proposed to apply both spatial and channel mask sampling modeling to image compression in the pre-training stage. Moreover, to further reduce channel redundancy, we propose the Learnable Channel Mask Module (LCMM) and the Learnable Channel Completion Module (LCCM). Our plug-and-play CMSM, LCMM, LCCM modules can apply to both CNN-based and Transformer-based architectures, significantly reduce the computational cost, and improve the quality of images. Experiments on the public Kodak and Tecnick datasets demonstrate that our method achieves competitive performance with lower computational complexity compared to state-of-the-art image compression methods. △ Less

Submitted 9 June, 2023; originally announced June 2023.

Comments: 10 pages

arXiv:2305.05722 [pdf]

Enhancing Clinical Predictive Modeling through Model Complexity-Driven Class Proportion Tuning for Class Imbalanced Data: An Empirical Study on Opioid Overdose Prediction

Authors: Yinan Liu, Xinyu Dong, Weimin Lyu, Richard N. Rosenthal, Rachel Wong, Tengfei Ma, Fusheng Wang

Abstract: Class imbalance problems widely exist in the medical field and heavily deteriorates performance of clinical predictive models. Most techniques to alleviate the problem rebalance class proportions and they predominantly assume the rebalanced proportions should be a function of the original data and oblivious to the model one uses. This work challenges this prevailing assumption and proposes that li… ▽ More Class imbalance problems widely exist in the medical field and heavily deteriorates performance of clinical predictive models. Most techniques to alleviate the problem rebalance class proportions and they predominantly assume the rebalanced proportions should be a function of the original data and oblivious to the model one uses. This work challenges this prevailing assumption and proposes that links the optimal class proportions to the model complexity, thereby tuning the class proportions per model. Our experiments on the opioid overdose prediction problem highlight the performance gain of tuning class proportions. Rigorous regression analysis also confirms the advantages of the theoretical framework proposed and the statistically significant correlation between the hyperparameters controlling the model complexity and the optimal class proportions. △ Less

Submitted 9 May, 2023; originally announced May 2023.

arXiv:2305.04436 [pdf, other]

Adversarial Examples Detection with Enhanced Image Difference Features based on Local Histogram Equalization

Authors: Zhaoxia Yin, Shaowei Zhu, Hang Su, Jianteng Peng, Wanli Lyu, Bin Luo

Abstract: Deep Neural Networks (DNNs) have recently made significant progress in many fields. However, studies have shown that DNNs are vulnerable to adversarial examples, where imperceptible perturbations can greatly mislead DNNs even if the full underlying model parameters are not accessible. Various defense methods have been proposed, such as feature compression and gradient masking. However, numerous st… ▽ More Deep Neural Networks (DNNs) have recently made significant progress in many fields. However, studies have shown that DNNs are vulnerable to adversarial examples, where imperceptible perturbations can greatly mislead DNNs even if the full underlying model parameters are not accessible. Various defense methods have been proposed, such as feature compression and gradient masking. However, numerous studies have proven that previous methods create detection or defense against certain attacks, which renders the method ineffective in the face of the latest unknown attack methods. The invisibility of adversarial perturbations is one of the evaluation indicators for adversarial example attacks, which also means that the difference in the local correlation of high-frequency information in adversarial examples and normal examples can be used as an effective feature to distinguish the two. Therefore, we propose an adversarial example detection framework based on a high-frequency information enhancement strategy, which can effectively extract and amplify the feature differences between adversarial examples and normal examples. Experimental results show that the feature augmentation module can be combined with existing detection models in a modular way under this framework. Improve the detector's performance and reduce the deployment cost without modifying the existing detection model. △ Less

Submitted 7 May, 2023; originally announced May 2023.

arXiv:2304.00440 [pdf, other]

Near-Field Channel Estimation for Extremely Large-Scale Reconfigurable Intelligent Surface (XL-RIS)-Aided Wideband mmWave Systems

Authors: Songjie Yang, Chenfei Xie, Wanting Lyu, Boyu Ning, Zhongpei Zhang, Chau Yuen

Abstract: Near-field communications present new opportunities over near-field channels, however, the spherical wavefront propagation makes near-field signal processing challenging. In this context, this paper proposes efficient near-field channel estimation methods for wideband MIMO mmWave systems with the aid of extremely large-scale reconfigurable intelligent surfaces (XL-RIS). For the wideband signals re… ▽ More Near-field communications present new opportunities over near-field channels, however, the spherical wavefront propagation makes near-field signal processing challenging. In this context, this paper proposes efficient near-field channel estimation methods for wideband MIMO mmWave systems with the aid of extremely large-scale reconfigurable intelligent surfaces (XL-RIS). For the wideband signals reflected by the analog RIS, we characterize their near-field beam squint effect in both angle and distance domains. Based on the mathematical analysis of the near-field beam patterns over all frequencies, a wideband spherical-domain dictionary is constructed by minimizing the coherence of two arbitrary beams. In light of this, we formulate a two-dimensional compressive sensing problem to recover the channel parameter based on the spherical-domain sparsity of mmWave channels. To this end, we present a correlation coefficient-based atom matching method within our proposed multi-frequency parallelizable subspace recovery framework for efficient solutions. Additionally, we propose a two-dimensional oracle estimator as a benchmark and derive its lower bound across all subcarriers. Our findings emphasize the significance of system hyperparameters and the sensing matrix of each subcarrier in determining the accuracy of the estimation. Finally, numerical results show that our proposed method achieves considerable performance compared with the lower bound and has a time complexity linear to the number of RIS elements. △ Less

Submitted 1 April, 2023; originally announced April 2023.

arXiv:2303.14400 [pdf, other]

Reconfigurable Intelligent Surface-Aided Full-Duplex mmWave MIMO: Channel Estimation, Passive and Hybrid Beamforming

Authors: Songjie Yang, Wanting Lyu, Yunis Xanthos, Zhongpei Zhang, Chadi Assi, Chau Yuen

Abstract: Millimeter wave (mmWave) full-duplex (FD) is a promising technique for improving capacity by maximizing the utilization of both time and the rich mmWave frequency resources. Still, it has restrictions due to FD self-interference (SI) and mmWave's limited coverage. Therefore, this study dives into FD mmWave MIMO with the assistance of reconfigurable intelligent surfaces (RIS) for capacity improveme… ▽ More Millimeter wave (mmWave) full-duplex (FD) is a promising technique for improving capacity by maximizing the utilization of both time and the rich mmWave frequency resources. Still, it has restrictions due to FD self-interference (SI) and mmWave's limited coverage. Therefore, this study dives into FD mmWave MIMO with the assistance of reconfigurable intelligent surfaces (RIS) for capacity improvement. First, we demonstrate the angular-domain reciprocity of FD antenna arrays under the far-field planar wavefront assumption. Accordingly, a strategy for joint downlink-uplink (DL-UL) channel estimation is presented. For estimating the SI channel, the direct channel, and the cascaded channel, the Khatri-Rao product-based compressive sensing (KR-CS), distributed CS (D-CS), and two-stage multiple measurement vector-based D-CS (M-D-CS) frameworks are proposed, respectively. Additionally, we propose a passive beamforming optimization solution based on the angular-domain cascaded channel. With hybrid beamforming architectures, a novel hybrid weighted minimum mean squared error method for SI cancellation (H-WMMSE-SIC) is proposed. Simulations have revealed that joint DL-UL processing significantly improves estimation performance in comparison to separate DL/UL channel estimation. Particularly, when the interference-to-noise ratio is less than 35 dB, our proposed H-WMMSE-SIC offers spectral efficiency performance comparable to fully-digital WMMSE-SIC. Finally, the computational complexity is analyzed for our proposed methods. △ Less

Submitted 25 March, 2023; originally announced March 2023.

arXiv:2302.09501 [pdf, other]

Meta Computing

Authors: Xiuzhen Cheng, Minghui Xu, Runyu Pan, Dongxiao Yu, Chenxu Wang, Xue Xiao, Weifeng Lyu

Abstract: With the continuous improvement of information infrastructures, academia and industry have been constantly exploring new computing paradigms to fully exploit computing powers. In this paper, we propose Meta Computing, a new computing paradigm that aims to utilize all available computing resources hooked on the Internet, provide efficient, fault-tolerant, and personalized services with strong secur… ▽ More With the continuous improvement of information infrastructures, academia and industry have been constantly exploring new computing paradigms to fully exploit computing powers. In this paper, we propose Meta Computing, a new computing paradigm that aims to utilize all available computing resources hooked on the Internet, provide efficient, fault-tolerant, and personalized services with strong security and privacy guarantee, and virtualize the Internet as a giant computer, that is, ``Network-as-a-Computer, NaaC'', or ``Meta Computer'' for short, for any task or any person on-demand. △ Less

Submitted 19 February, 2023; originally announced February 2023.

Comments: 10 papes, 4 figures

arXiv:2301.12640 [pdf, other]

Reweighted Interacting Langevin Diffusions: an Accelerated Sampling Methodfor Optimization

Authors: Junlong Lyu, Zhitang Chen, Wenlong Lyu, Jianye Hao

Abstract: We proposed a new technique to accelerate sampling methods for solving difficult optimization problems. Our method investigates the intrinsic connection between posterior distribution sampling and optimization with Langevin dynamics, and then we propose an interacting particle scheme that approximates a Reweighted Interacting Langevin Diffusion system (RILD). The underlying system is designed by a… ▽ More We proposed a new technique to accelerate sampling methods for solving difficult optimization problems. Our method investigates the intrinsic connection between posterior distribution sampling and optimization with Langevin dynamics, and then we propose an interacting particle scheme that approximates a Reweighted Interacting Langevin Diffusion system (RILD). The underlying system is designed by adding a multiplicative source term into the classical Langevin operator, leading to a higher convergence rate and a more concentrated invariant measure. We analyze the convergence rate of our algorithm and the improvement compared to existing results in the asymptotic situation. We also design various tests to verify our theoretical results, showing the advantages of accelerating convergence and breaking through barriers of suspicious local minimums, especially in high-dimensional non-convex settings. Our algorithms and analysis shed some light on combining gradient and genetic algorithms using Partial Differential Equations (PDEs) with provable guarantees. △ Less

Submitted 29 January, 2023; originally announced January 2023.

arXiv:2301.01369 [pdf, other]

Brain Tissue Segmentation Across the Human Lifespan via Supervised Contrastive Learning

Authors: Xiaoyang Chen, Jinjian Wu, Wenjiao Lyu, Yicheng Zou, Kim-Han Thung, Siyuan Liu, Ye Wu, Sahar Ahmad, Pew-Thian Yap

Abstract: Automatic segmentation of brain MR images into white matter (WM), gray matter (GM), and cerebrospinal fluid (CSF) is critical for tissue volumetric analysis and cortical surface reconstruction. Due to dramatic structural and appearance changes associated with developmental and aging processes, existing brain tissue segmentation methods are only viable for specific age groups. Consequently, methods… ▽ More Automatic segmentation of brain MR images into white matter (WM), gray matter (GM), and cerebrospinal fluid (CSF) is critical for tissue volumetric analysis and cortical surface reconstruction. Due to dramatic structural and appearance changes associated with developmental and aging processes, existing brain tissue segmentation methods are only viable for specific age groups. Consequently, methods developed for one age group may fail for another. In this paper, we make the first attempt to segment brain tissues across the entire human lifespan (0-100 years of age) using a unified deep learning model. To overcome the challenges related to structural variability underpinned by biological processes, intensity inhomogeneity, motion artifacts, scanner-induced differences, and acquisition protocols, we propose to use contrastive learning to improve the quality of feature representations in a latent space for effective lifespan tissue segmentation. We compared our approach with commonly used segmentation methods on a large-scale dataset of 2,464 MR images. Experimental results show that our model accurately segments brain tissues across the lifespan and outperforms existing methods. △ Less

Submitted 3 January, 2023; originally announced January 2023.

arXiv:2212.08341 [pdf, ps, other]

Adversarial Example Defense via Perturbation Grading Strategy

Authors: Shaowei Zhu, Wanli Lyu, Bin Li, Zhaoxia Yin, Bin Luo

Abstract: Deep Neural Networks have been widely used in many fields. However, studies have shown that DNNs are easily attacked by adversarial examples, which have tiny perturbations and greatly mislead the correct judgment of DNNs. Furthermore, even if malicious attackers cannot obtain all the underlying model parameters, they can use adversarial examples to attack various DNN-based task systems. Researcher… ▽ More Deep Neural Networks have been widely used in many fields. However, studies have shown that DNNs are easily attacked by adversarial examples, which have tiny perturbations and greatly mislead the correct judgment of DNNs. Furthermore, even if malicious attackers cannot obtain all the underlying model parameters, they can use adversarial examples to attack various DNN-based task systems. Researchers have proposed various defense methods to protect DNNs, such as reducing the aggressiveness of adversarial examples by preprocessing or improving the robustness of the model by adding modules. However, some defense methods are only effective for small-scale examples or small perturbations but have limited defense effects for adversarial examples with large perturbations. This paper assigns different defense strategies to adversarial perturbations of different strengths by grading the perturbations on the input examples. Experimental results show that the proposed method effectively improves defense performance. In addition, the proposed method does not modify any task model, which can be used as a preprocessing module, which significantly reduces the deployment cost in practical applications. △ Less

Submitted 16 December, 2022; originally announced December 2022.

arXiv:2211.16183 [pdf, other]

Active 3D Double-RIS-Aided Multi-User Communications: Two-Timescale-Based Separate Channel Estimation via Bayesian Learning

Authors: Songjie Yang, Wanting Lyu, Yue Xiu, Zhongpei Zhang, Chau Yuen

Abstract: Double-reconfigurable intelligent surface (RIS) is a promising technique, achieving a substantial gain improvement compared to single-RIS techniques. However, in double-RIS-aided systems, accurate channel estimation is more challenging than in single-RIS-aided systems. This work solves the problem of double-RIS-based channel estimation based on active RIS architectures with only one radio frequenc… ▽ More Double-reconfigurable intelligent surface (RIS) is a promising technique, achieving a substantial gain improvement compared to single-RIS techniques. However, in double-RIS-aided systems, accurate channel estimation is more challenging than in single-RIS-aided systems. This work solves the problem of double-RIS-based channel estimation based on active RIS architectures with only one radio frequency (RF) chain. Since the slow time-varying channels, i.e., the BS-RIS 1, BS-RIS 2, and RIS 1-RIS 2 channels, can be obtained with active RIS architectures, a novel multi-user two-timescale channel estimation protocol is proposed to minimize the pilot overhead. First, we propose an uplink training scheme for slow time-varying channel estimation, which can effectively address the double-reflection channel estimation problem. With channels' sparisty, a low-complexity Singular Value Decomposition Multiple Measurement Vector-Based Compressive Sensing (SVD-MMV-CS) framework with the line-of-sight (LoS)-aided off-grid MMV expectation maximization-based generalized approximate message passing (M-EM-GAMP) algorithm is proposed for channel parameter recovery. For fast time-varying channel estimation, based on the estimated large-timescale channels, a measurements-augmentation-estimate (MAE) framework is developed to decrease the pilot overhead.Additionally, a comprehensive analysis of pilot overhead and computing complexity is conducted. Finally, the simulation results demonstrate the effectiveness of our proposed multi-user two-timescale estimation strategy and the low-complexity Bayesian CS framework. △ Less

Submitted 29 November, 2022; originally announced November 2022.

arXiv:2208.10240 [pdf, other]

A Multimodal Transformer: Fusing Clinical Notes with Structured EHR Data for Interpretable In-Hospital Mortality Prediction

Authors: Weimin Lyu, Xinyu Dong, Rachel Wong, Songzhu Zheng, Kayley Abell-Hart, Fusheng Wang, Chao Chen

Abstract: Deep-learning-based clinical decision support using structured electronic health records (EHR) has been an active research area for predicting risks of mortality and diseases. Meanwhile, large amounts of narrative clinical notes provide complementary information, but are often not integrated into predictive models. In this paper, we provide a novel multimodal transformer to fuse clinical notes and… ▽ More Deep-learning-based clinical decision support using structured electronic health records (EHR) has been an active research area for predicting risks of mortality and diseases. Meanwhile, large amounts of narrative clinical notes provide complementary information, but are often not integrated into predictive models. In this paper, we provide a novel multimodal transformer to fuse clinical notes and structured EHR data for better prediction of in-hospital mortality. To improve interpretability, we propose an integrated gradients (IG) method to select important words in clinical notes and discover the critical structured EHR features with Shapley values. These important words and clinical features are visualized to assist with interpretation of the prediction outcomes. We also investigate the significance of domain adaptive pretraining and task adaptive fine-tuning on the Clinical BERT, which is used to learn the representations of clinical notes. Experiments demonstrated that our model outperforms other methods (AUCPR: 0.538, AUCROC: 0.877, F1:0.490). △ Less

Submitted 9 May, 2023; v1 submitted 8 August, 2022; originally announced August 2022.

Comments: AMIA Annual Symposium Proceedings 2022

Journal ref: AMIA Annu Symp Proc 2022

arXiv:2208.04946 [pdf, other]

Attention Hijacking in Trojan Transformers

Authors: Weimin Lyu, Songzhu Zheng, Tengfei Ma, Haibin Ling, Chao Chen

Abstract: Trojan attacks pose a severe threat to AI systems. Recent works on Transformer models received explosive popularity and the self-attentions are now indisputable. This raises a central question: Can we reveal the Trojans through attention mechanisms in BERTs and ViTs? In this paper, we investigate the attention hijacking pattern in Trojan AIs, \ie, the trigger token ``kidnaps'' the attention weight… ▽ More Trojan attacks pose a severe threat to AI systems. Recent works on Transformer models received explosive popularity and the self-attentions are now indisputable. This raises a central question: Can we reveal the Trojans through attention mechanisms in BERTs and ViTs? In this paper, we investigate the attention hijacking pattern in Trojan AIs, \ie, the trigger token ``kidnaps'' the attention weights when a specific trigger is present. We observe the consistent attention hijacking pattern in Trojan Transformers from both Natural Language Processing (NLP) and Computer Vision (CV) domains. This intriguing property helps us to understand the Trojan mechanism in BERTs and ViTs. We also propose an Attention-Hijacking Trojan Detector (AHTD) to discriminate the Trojan AIs from the clean ones. △ Less

Submitted 9 August, 2022; originally announced August 2022.

arXiv:2206.09682 [pdf, other]

SafeBench: A Benchmarking Platform for Safety Evaluation of Autonomous Vehicles

Authors: Chejian Xu, Wenhao Ding, Weijie Lyu, Zuxin Liu, Shuai Wang, Yihan He, Hanjiang Hu, Ding Zhao, Bo Li

Abstract: As shown by recent studies, machine intelligence-enabled systems are vulnerable to test cases resulting from either adversarial manipulation or natural distribution shifts. This has raised great concerns about deploying machine learning algorithms for real-world applications, especially in safety-critical domains such as autonomous driving (AD). On the other hand, traditional AD testing on natural… ▽ More As shown by recent studies, machine intelligence-enabled systems are vulnerable to test cases resulting from either adversarial manipulation or natural distribution shifts. This has raised great concerns about deploying machine learning algorithms for real-world applications, especially in safety-critical domains such as autonomous driving (AD). On the other hand, traditional AD testing on naturalistic scenarios requires hundreds of millions of driving miles due to the high dimensionality and rareness of the safety-critical scenarios in the real world. As a result, several approaches for autonomous driving evaluation have been explored, which are usually, however, based on different simulation platforms, types of safety-critical scenarios, scenario generation algorithms, and driving route variations. Thus, despite a large amount of effort in autonomous driving testing, it is still challenging to compare and understand the effectiveness and efficiency of different testing scenario generation algorithms and testing mechanisms under similar conditions. In this paper, we aim to provide the first unified platform SafeBench to integrate different types of safety-critical testing scenarios, scenario generation algorithms, and other variations such as driving routes and environments. Meanwhile, we implement 4 deep reinforcement learning-based AD algorithms with 4 types of input (e.g., bird's-eye view, camera) to perform fair comparisons on SafeBench. We find our generated testing scenarios are indeed more challenging and observe the trade-off between the performance of AD agents under benign and safety-critical testing scenarios. We believe our unified platform SafeBench for large-scale and effective autonomous driving testing will motivate the development of new testing scenario generation and safe AD algorithms. SafeBench is available at https://safebench.github.io. △ Less

Submitted 28 October, 2022; v1 submitted 20 June, 2022; originally announced June 2022.

Comments: Published as a conference paper at NeurIPS 2022 (Track on Datasets and Benchmarks)

arXiv:2205.08305 [pdf, other]

A Study of the Attention Abnormality in Trojaned BERTs

Authors: Weimin Lyu, Songzhu Zheng, Tengfei Ma, Chao Chen

Abstract: Trojan attacks raise serious security concerns. In this paper, we investigate the underlying mechanism of Trojaned BERT models. We observe the attention focus drifting behavior of Trojaned models, i.e., when encountering an poisoned input, the trigger token hijacks the attention focus regardless of the context. We provide a thorough qualitative and quantitative analysis of this phenomenon, reveali… ▽ More Trojan attacks raise serious security concerns. In this paper, we investigate the underlying mechanism of Trojaned BERT models. We observe the attention focus drifting behavior of Trojaned models, i.e., when encountering an poisoned input, the trigger token hijacks the attention focus regardless of the context. We provide a thorough qualitative and quantitative analysis of this phenomenon, revealing insights into the Trojan mechanism. Based on the observation, we propose an attention-based Trojan detector to distinguish Trojaned models from clean ones. To the best of our knowledge, this is the first paper to analyze the Trojan mechanism and to develop a Trojan detector based on the transformer's attention. △ Less

Submitted 4 July, 2022; v1 submitted 13 May, 2022; originally announced May 2022.

Comments: NAACL-HLT 2022

arXiv:2202.12456 [pdf, other]

doi 10.1109/TAFFC.2022.3154332

Prediction of Depression Severity Based on the Prosodic and Semantic Features with Bidirectional LSTM and Time Distributed CNN

Authors: Kaining Mao, Wei Zhang, Deborah Baofeng Wang, Ang Li, Rongqi Jiao, Yanhui Zhu, Bin Wu, Tiansheng Zheng, Lei Qian, Wei Lyu, Minjie Ye, Jie Chen

Abstract: Depression is increasingly impacting individuals both physically and psychologically worldwide. It has become a global major public health problem and attracts attention from various research fields. Traditionally, the diagnosis of depression is formulated through semi-structured interviews and supplementary questionnaires, which makes the diagnosis heavily relying on physicians experience and is… ▽ More Depression is increasingly impacting individuals both physically and psychologically worldwide. It has become a global major public health problem and attracts attention from various research fields. Traditionally, the diagnosis of depression is formulated through semi-structured interviews and supplementary questionnaires, which makes the diagnosis heavily relying on physicians experience and is subject to bias. Mental health monitoring and cloud-based remote diagnosis can be implemented through an automated depression diagnosis system. In this article, we propose an attention-based multimodality speech and text representation for depression prediction. Our model is trained to estimate the depression severity of participants using the Distress Analysis Interview Corpus-Wizard of Oz (DAIC-WOZ) dataset. For the audio modality, we use the collaborative voice analysis repository (COVAREP) features provided by the dataset and employ a Bidirectional Long Short-Term Memory Network (Bi-LSTM) followed by a Time-distributed Convolutional Neural Network (T-CNN). For the text modality, we use global vectors for word representation (GloVe) to perform word embeddings and the embeddings are fed into the Bi-LSTM network. Results show that both audio and text models perform well on the depression severity estimation task, with best sequence level F1 score of 0.9870 and patient-level F1 score of 0.9074 for the audio model over five classes (healthy, mild, moderate, moderately severe, and severe), as well as sequence level F1 score of 0.9709 and patient-level F1 score of 0.9245 for the text model over five classes. Results are similar for the multimodality fused model, with the highest F1 score of 0.9580 on the patient-level depression detection task over five classes. Experiments show statistically significant improvements over previous works. △ Less

Submitted 24 February, 2022; originally announced February 2022.

Comments: 15 pages, 7 figures, already accepted by IEEE Transactions on Affective Computing, listed in early access now

arXiv:2106.03609 [pdf, other]

High-Dimensional Bayesian Optimisation with Variational Autoencoders and Deep Metric Learning

Authors: Antoine Grosnit, Rasul Tutunov, Alexandre Max Maraval, Ryan-Rhys Griffiths, Alexander I. Cowen-Rivers, Lin Yang, Lin Zhu, Wenlong Lyu, Zhitang Chen, Jun Wang, Jan Peters, Haitham Bou-Ammar

Abstract: We introduce a method combining variational autoencoders (VAEs) and deep metric learning to perform Bayesian optimisation (BO) over high-dimensional and structured input spaces. By adapting ideas from deep metric learning, we use label guidance from the blackbox function to structure the VAE latent space, facilitating the Gaussian process fit and yielding improved BO performance. Importantly for B… ▽ More We introduce a method combining variational autoencoders (VAEs) and deep metric learning to perform Bayesian optimisation (BO) over high-dimensional and structured input spaces. By adapting ideas from deep metric learning, we use label guidance from the blackbox function to structure the VAE latent space, facilitating the Gaussian process fit and yielding improved BO performance. Importantly for BO problem settings, our method operates in semi-supervised regimes where only few labelled data points are available. We run experiments on three real-world tasks, achieving state-of-the-art results on the penalised logP molecule generation benchmark using just 3% of the labelled data required by previous approaches. As a theoretical contribution, we present a proof of vanishing regret for VAE BO. △ Less

Submitted 1 November, 2021; v1 submitted 7 June, 2021; originally announced June 2021.

arXiv:2012.03826 [pdf, other]

HEBO Pushing The Limits of Sample-Efficient Hyperparameter Optimisation

Authors: Alexander I. Cowen-Rivers, Wenlong Lyu, Rasul Tutunov, Zhi Wang, Antoine Grosnit, Ryan Rhys Griffiths, Alexandre Max Maraval, Hao Jianye, Jun Wang, Jan Peters, Haitham Bou Ammar

Abstract: In this work we rigorously analyse assumptions inherent to black-box optimisation hyper-parameter tuning tasks. Our results on the Bayesmark benchmark indicate that heteroscedasticity and non-stationarity pose significant challenges for black-box optimisers. Based on these findings, we propose a Heteroscedastic and Evolutionary Bayesian Optimisation solver (HEBO). HEBO performs non-linear input an… ▽ More In this work we rigorously analyse assumptions inherent to black-box optimisation hyper-parameter tuning tasks. Our results on the Bayesmark benchmark indicate that heteroscedasticity and non-stationarity pose significant challenges for black-box optimisers. Based on these findings, we propose a Heteroscedastic and Evolutionary Bayesian Optimisation solver (HEBO). HEBO performs non-linear input and output warping, admits exact marginal log-likelihood optimisation and is robust to the values of learned parameters. We demonstrate HEBO's empirical efficacy on the NeurIPS 2020 Black-Box Optimisation challenge, where HEBO placed first. Upon further analysis, we observe that HEBO significantly outperforms existing black-box optimisers on 108 machine learning hyperparameter tuning tasks comprising the Bayesmark benchmark. Our findings indicate that the majority of hyper-parameter tuning tasks exhibit heteroscedasticity and non-stationarity, multi-objective acquisition ensembles with Pareto front solutions improve queried configurations, and robust acquisition maximisers afford empirical advantages relative to their non-robust counterparts. We hope these findings may serve as guiding principles for practitioners of Bayesian optimisation. All code is made available at https://github.com/huawei-noah/HEBO. △ Less

Submitted 25 May, 2022; v1 submitted 7 December, 2020; originally announced December 2020.

Comments: Accepted at JAIR

arXiv:2011.04987 [pdf, other]

CircuitBot: Learning to Survive with Robotic Circuit Drawing

Authors: Xianglong Tan, Weijie Lyu, Andre Rosendo

Abstract: Robots with the ability to actively acquire power from surroundings will be greatly beneficial for long-term autonomy, and to survive in dynamic, uncertain environments. In this work, a scenario is presented where a robot has limited energy, and the only way to survive is to access the energy from a power source. With no cables or wires available, the robot learns to construct an electrical path a… ▽ More Robots with the ability to actively acquire power from surroundings will be greatly beneficial for long-term autonomy, and to survive in dynamic, uncertain environments. In this work, a scenario is presented where a robot has limited energy, and the only way to survive is to access the energy from a power source. With no cables or wires available, the robot learns to construct an electrical path and avoid potential obstacles during the connection. We present this robot, capable of drawing connected circuit patterns with graphene-based conductive ink. A state-of-the-art Mix-Variable Bayesian Optimization is adopted to optimize the placement of conductive shapes to maximize the power this robot receives. Our results show that, within a small number of trials, the robot learns to build parallel circuits to maximize the voltage received and avoid obstacles which steal energy from the robot. △ Less

Submitted 11 January, 2021; v1 submitted 10 November, 2020; originally announced November 2020.

arXiv:2007.03220 [pdf, other]

Sapphire: Automatic Configuration Recommendation for Distributed Storage Systems

Authors: Wenhao Lyu, Youyou Lu, Jiwu Shu, Wei Zhao

Abstract: Modern distributed storage systems come with aplethora of configurable parameters that controlmodule behavior and affect system performance. Default settings provided by developers are often suboptimal for specific user cases. Tuning parameters can provide significant performance gains but is a difficult task requiring profound experience and expertise, due to the immense number of configurable pa… ▽ More Modern distributed storage systems come with aplethora of configurable parameters that controlmodule behavior and affect system performance. Default settings provided by developers are often suboptimal for specific user cases. Tuning parameters can provide significant performance gains but is a difficult task requiring profound experience and expertise, due to the immense number of configurable parameters, complex inner dependencies and non-linearsystem behaviors. To overcome these difficulties, we propose an automatic simulation-based approach, Sapphire, to recommend optimal configurations by leveraging machine learning and black-box optimization techniques. We evaluate Sapphire on Ceph. Results show that Sapphire significantly boosts Ceph performance to 2.2x compared to the default configuration. △ Less

Submitted 7 July, 2020; originally announced July 2020.

arXiv:1912.00402 [pdf, other]

doi 10.23919/DATE.2019.8714788

Bayesian Optimization Approach for Analog Circuit Synthesis Using Neural Network

Authors: Shuhan Zhang, Wenlong Lyu, Fan Yang, Changhao Yan, Dian Zhou, Xuan Zeng

Abstract: Bayesian optimization with Gaussian process as surrogate model has been successfully applied to analog circuit synthesis. In the traditional Gaussian process regression model, the kernel functions are defined explicitly. The computational complexity of training is O(N 3 ), and the computation complexity of prediction is O(N 2 ), where N is the number of training data. Gaussian process model can al… ▽ More Bayesian optimization with Gaussian process as surrogate model has been successfully applied to analog circuit synthesis. In the traditional Gaussian process regression model, the kernel functions are defined explicitly. The computational complexity of training is O(N 3 ), and the computation complexity of prediction is O(N 2 ), where N is the number of training data. Gaussian process model can also be derived from a weight space view, where the original data are mapped to feature space, and the kernel function is defined as the inner product of nonlinear features. In this paper, we propose a Bayesian optimization approach for analog circuit synthesis using neural network. We use deep neural network to extract good feature representations, and then define Gaussian process using the extracted features. Model averaging method is applied to improve the quality of uncertainty prediction. Compared to Gaussian process model with explicitly defined kernel functions, the neural-network-based Gaussian process model can automatically learn a kernel function from data, which makes it possible to provide more accurate predictions and thus accelerate the follow-up optimization procedure. Also, the neural-network-based model has O(N) training time and constant prediction time. The efficiency of the proposed method has been verified by two real-world analog circuits. △ Less

Submitted 1 December, 2019; originally announced December 2019.

Journal ref: 2019 Design, Automation & Test in Europe Conference & Exhibition (DATE)

arXiv:1905.05615 [pdf]

Transfer Learning for Scientific Data Chain Extraction in Small Chemical Corpus with BERT-CRF Model

Authors: Na Pang, Li Qian, Weimin Lyu, Jin-Dong Yang

Abstract: Computational chemistry develops fast in recent years due to the rapid growth and breakthroughs in AI. Thanks for the progress in natural language processing, researchers can extract more fine-grained knowledge in publications to stimulate the development in computational chemistry. While the works and corpora in chemical entity extraction have been restricted in the biomedicine or life science fi… ▽ More Computational chemistry develops fast in recent years due to the rapid growth and breakthroughs in AI. Thanks for the progress in natural language processing, researchers can extract more fine-grained knowledge in publications to stimulate the development in computational chemistry. While the works and corpora in chemical entity extraction have been restricted in the biomedicine or life science field instead of the chemistry field, we build a new corpus in chemical bond field annotated for 7 types of entities: compound, solvent, method, bond, reaction, pKa and pKa value. This paper presents a novel BERT-CRF model to build scientific chemical data chains by extracting 7 chemical entities and relations from publications. And we propose a joint model to extract the entities and relations simultaneously. Experimental results on our Chemical Special Corpus demonstrate that we achieve state-of-art and competitive NER performance. △ Less

Submitted 12 May, 2019; originally announced May 2019.

arXiv:1806.06423 [pdf, other]

A Novel Hybrid Machine Learning Model for Auto-Classification of Retinal Diseases

Authors: C. -H. Huck Yang, Jia-Hong Huang, Fangyu Liu, Fang-Yi Chiu, Mengya Gao, Weifeng Lyu, I-Hung Lin M. D., Jesper Tegner

Abstract: Automatic clinical diagnosis of retinal diseases has emerged as a promising approach to facilitate discovery in areas with limited access to specialists. We propose a novel visual-assisted diagnosis hybrid model based on the support vector machine (SVM) and deep neural networks (DNNs). The model incorporates complementary strengths of DNNs and SVM. Furthermore, we present a new clinical retina lab… ▽ More Automatic clinical diagnosis of retinal diseases has emerged as a promising approach to facilitate discovery in areas with limited access to specialists. We propose a novel visual-assisted diagnosis hybrid model based on the support vector machine (SVM) and deep neural networks (DNNs). The model incorporates complementary strengths of DNNs and SVM. Furthermore, we present a new clinical retina label collection for ophthalmology incorporating 32 retina diseases classes. Using EyeNet, our model achieves 89.73% diagnosis accuracy and the model performance is comparable to the professional ophthalmologists. △ Less

Submitted 17 June, 2018; originally announced June 2018.

Comments: Accepted at the Joint ICML and IJCAI Workshop on Computational Biology (ICML-IJCAI WCB) to be held in Stockholm SWEDEN, 2018. Referring to https://sites.google.com/view/wcb2018/accepted-papers?authuser=0

Journal ref: ICML-IJCAI Workshop 2018

arXiv:1711.00727 [pdf, ps, other]

Performance Evaluation of Channel Decoding With Deep Neural Networks

Authors: Wei Lyu, Zhaoyang Zhang, Chunxu Jiao, Kangjian Qin, Huazi Zhang

Abstract: With the demand of high data rate and low latency in fifth generation (5G), deep neural network decoder (NND) has become a promising candidate due to its capability of one-shot decoding and parallel computing. In this paper, three types of NND, i.e., multi-layer perceptron (MLP), convolution neural network (CNN) and recurrent neural network (RNN), are proposed with the same parameter magnitude. Th… ▽ More With the demand of high data rate and low latency in fifth generation (5G), deep neural network decoder (NND) has become a promising candidate due to its capability of one-shot decoding and parallel computing. In this paper, three types of NND, i.e., multi-layer perceptron (MLP), convolution neural network (CNN) and recurrent neural network (RNN), are proposed with the same parameter magnitude. The performance of these deep neural networks are evaluated through extensive simulation. Numerical results show that RNN has the best decoding performance, yet at the price of the highest computational overhead. Moreover, we find there exists a saturation length for each type of neural network, which is caused by their restricted learning abilities. △ Less

Submitted 31 January, 2018; v1 submitted 1 November, 2017; originally announced November 2017.

Comments: 6 pages, 11 figures, Latex; typos corrected; IEEE ICC 2018 to appear

arXiv:1509.04823 [pdf]

Optimizing coverage of 3D Wireless Multimedia Sensor Networks by means of deploying redundant sensors

Authors: Shasha Xu, Weijie Lyu, Huilin Li

Abstract: Coverage is one of the fundamental issues in wireless multimedia sensor networks (WMSNs). It reflects the ability of WMSNs to detect the fields. Motivated by the existing-enhancing algorithm of traditional 2D WMSNs, a new 3D WMSNs sensing model is established and a new coverage-enhancing algorithm based on this model is proposed. This algorithm defines the sensing model as trapezoidal pyramid, cal… ▽ More Coverage is one of the fundamental issues in wireless multimedia sensor networks (WMSNs). It reflects the ability of WMSNs to detect the fields. Motivated by the existing-enhancing algorithm of traditional 2D WMSNs, a new 3D WMSNs sensing model is established and a new coverage-enhancing algorithm based on this model is proposed. This algorithm defines the sensing model as trapezoidal pyramid, calculates the key parameters (tilt angle) then improves coverage ratio by optimizing it. However, there still exists redundant sensors in this optimized networks. Aiming at efficiently utilizing these redundant sensors and enhancing coverage ratio, the authors selects the redundant sensors by introducing the set cover model algorithm, further deploys them to the uncovered area following the greedy policy, so that the whole path coverage performance of WMSNs is enhanced. △ Less

Submitted 16 September, 2015; originally announced September 2015.

Showing 1–39 of 39 results for author: Lyu, W