subscribe to arXiv mailings

GPTQT: Quantize Large Language Models Twice to Push the Efficiency

Authors: Yipin Guo, Yilin Lang, Qinyuan Ren

Abstract: Due to their large size, generative Large Language Models (LLMs) require significant computing and storage resources. This paper introduces a new post-training quantization method, GPTQT, to reduce memory usage and enhance processing speed by expressing the weight of LLM in 3bit/2bit. Practice has shown that minimizing the quantization error of weights is ineffective, leading to overfitting. There… ▽ More Due to their large size, generative Large Language Models (LLMs) require significant computing and storage resources. This paper introduces a new post-training quantization method, GPTQT, to reduce memory usage and enhance processing speed by expressing the weight of LLM in 3bit/2bit. Practice has shown that minimizing the quantization error of weights is ineffective, leading to overfitting. Therefore, GPTQT employs a progressive two-step approach: initially quantizing weights using Linear quantization to a relatively high bit, followed by converting obtained int weight to lower bit binary coding. A re-explore strategy is proposed to optimize initial scaling factor. During inference, these steps are merged into pure binary coding, enabling efficient computation. Testing across various models and datasets confirms GPTQT's effectiveness. Compared to the strong 3-bit quantization baseline, GPTQT further reduces perplexity by 4.01 on opt-66B and increases speed by 1.24 times on opt-30b. The results on Llama2 show that GPTQT is currently the best binary coding quantization method for such kind of LLMs. △ Less

Submitted 3 July, 2024; originally announced July 2024.

Comments: Accepted by 11th IEEE International Conference on Cybernetics and Intelligent Systems

arXiv:2407.02881 [pdf, other]

ShiftAddAug: Augment Multiplication-Free Tiny Neural Network with Hybrid Computation

Authors: Yipin Guo, Zihao Li, Yilin Lang, Qinyuan Ren

Abstract: Operators devoid of multiplication, such as Shift and Add, have gained prominence for their compatibility with hardware. However, neural networks (NNs) employing these operators typically exhibit lower accuracy compared to conventional NNs with identical structures. ShiftAddAug uses costly multiplication to augment efficient but less powerful multiplication-free operators, improving performance wi… ▽ More Operators devoid of multiplication, such as Shift and Add, have gained prominence for their compatibility with hardware. However, neural networks (NNs) employing these operators typically exhibit lower accuracy compared to conventional NNs with identical structures. ShiftAddAug uses costly multiplication to augment efficient but less powerful multiplication-free operators, improving performance without any inference overhead. It puts a ShiftAdd tiny NN into a large multiplicative model and encourages it to be trained as a sub-model to obtain additional supervision. In order to solve the weight discrepancy problem between hybrid operators, a new weight sharing method is proposed. Additionally, a novel two stage neural architecture search is used to obtain better augmentation effects for smaller but stronger multiplication-free tiny neural networks. The superiority of ShiftAddAug is validated through experiments in image classification and semantic segmentation, consistently delivering noteworthy enhancements. Remarkably, it secures up to a 4.95% increase in accuracy on the CIFAR100 compared to its directly trained counterparts, even surpassing the performance of multiplicative NNs. △ Less

Submitted 3 July, 2024; originally announced July 2024.

Comments: Accepted by 2024 CVPR Workshop : Efficient Deep Learning for Computer Vision

arXiv:2407.02878 [pdf, other]

Efficient Fusion and Task Guided Embedding for End-to-end Autonomous Driving

Authors: Yipin Guo, Yilin Lang, Qinyuan Ren

Abstract: To address the challenges of sensor fusion and safety risk prediction, contemporary closed-loop autonomous driving neural networks leveraging imitation learning typically require a substantial volume of parameters and computational resources to run neural networks. Given the constrained computational capacities of onboard vehicular computers, we introduce a compact yet potent solution named Effici… ▽ More To address the challenges of sensor fusion and safety risk prediction, contemporary closed-loop autonomous driving neural networks leveraging imitation learning typically require a substantial volume of parameters and computational resources to run neural networks. Given the constrained computational capacities of onboard vehicular computers, we introduce a compact yet potent solution named EfficientFuser. This approach employs EfficientViT for visual information extraction and integrates feature maps via cross attention. Subsequently, it utilizes a decoder-only transformer for the amalgamation of multiple features. For prediction purposes, learnable vectors are embedded as tokens to probe the association between the task and sensor features through attention. Evaluated on the CARLA simulation platform, EfficientFuser demonstrates remarkable efficiency, utilizing merely 37.6% of the parameters and 8.7% of the computations compared to the state-of-the-art lightweight method with only 0.4% lower driving score, and the safety score neared that of the leading safety-enhanced method, showcasing its efficacy and potential for practical deployment in autonomous driving systems. △ Less

Submitted 3 July, 2024; originally announced July 2024.

Comments: Best Paper Award of the IEEE 13th Data-Driven Control and Learning Systems Conference

arXiv:2406.19853 [pdf, other]

YuLan: An Open-source Large Language Model

Authors: Yutao Zhu, Kun Zhou, Kelong Mao, Wentong Chen, Yiding Sun, Zhipeng Chen, Qian Cao, Yihan Wu, Yushuo Chen, Feng Wang, Lei Zhang, Junyi Li, Xiaolei Wang, Lei Wang, Beichen Zhang, Zican Dong, Xiaoxue Cheng, Yuhan Chen, Xinyu Tang, Yupeng Hou, Qiangqiang Ren, Xincheng Pang, Shufang Xie, Wayne Xin Zhao, Zhicheng Dou , et al. (13 additional authors not shown)

Abstract: Large language models (LLMs) have become the foundation of many applications, leveraging their extensive capabilities in processing and understanding natural language. While many open-source LLMs have been released with technical reports, the lack of training details hinders further research and development. This paper presents the development of YuLan, a series of open-source LLMs with $12$ billi… ▽ More Large language models (LLMs) have become the foundation of many applications, leveraging their extensive capabilities in processing and understanding natural language. While many open-source LLMs have been released with technical reports, the lack of training details hinders further research and development. This paper presents the development of YuLan, a series of open-source LLMs with $12$ billion parameters. The base model of YuLan is pre-trained on approximately $1.7$T tokens derived from a diverse corpus, including massive English, Chinese, and multilingual texts. We design a three-stage pre-training method to enhance YuLan's overall capabilities. Subsequent phases of training incorporate instruction-tuning and human alignment, employing a substantial volume of high-quality synthesized data. To facilitate the learning of complex and long-tail knowledge, we devise a curriculum-learning framework throughout across these stages, which helps LLMs learn knowledge in an easy-to-hard manner. YuLan's training is finished on Jan, 2024 and has achieved performance on par with state-of-the-art LLMs across various English and Chinese benchmarks. This paper outlines a comprehensive technical roadmap for developing LLMs from scratch. Our model and codes are available at https://github.com/RUC-GSAI/YuLan-Chat. △ Less

Submitted 28 June, 2024; originally announced June 2024.

arXiv:2406.13583 [pdf, other]

Low-Rank Mixture-of-Experts for Continual Medical Image Segmentation

Authors: Qian Chen, Lei Zhu, Hangzhou He, Xinliang Zhang, Shuang Zeng, Qiushi Ren, Yanye Lu

Abstract: The primary goal of continual learning (CL) task in medical image segmentation field is to solve the "catastrophic forgetting" problem, where the model totally forgets previously learned features when it is extended to new categories (class-level) or tasks (task-level). Due to the privacy protection, the historical data labels are inaccessible. Prevalent continual learning methods primarily focus… ▽ More The primary goal of continual learning (CL) task in medical image segmentation field is to solve the "catastrophic forgetting" problem, where the model totally forgets previously learned features when it is extended to new categories (class-level) or tasks (task-level). Due to the privacy protection, the historical data labels are inaccessible. Prevalent continual learning methods primarily focus on generating pseudo-labels for old datasets to force the model to memorize the learned features. However, the incorrect pseudo-labels may corrupt the learned feature and lead to a new problem that the better the model is trained on the old task, the poorer the model performs on the new tasks. To avoid this problem, we propose a network by introducing the data-specific Mixture of Experts (MoE) structure to handle the new tasks or categories, ensuring that the network parameters of previous tasks are unaffected or only minimally impacted. To further overcome the tremendous memory costs caused by introducing additional structures, we propose a Low-Rank strategy which significantly reduces memory cost. We validate our method on both class-level and task-level continual learning challenges. Extensive experiments on multiple datasets show our model outperforms all other methods. △ Less

Submitted 19 June, 2024; originally announced June 2024.

arXiv:2406.11921 [pdf, other]

Rethinking Spatio-Temporal Transformer for Traffic Prediction:Multi-level Multi-view Augmented Learning Framework

Authors: Jiaqi Lin, Qianqian Ren

Abstract: Traffic prediction is a challenging spatio-temporal forecasting problem that involves highly complex spatio-temporal correlations. This paper proposes a Multi-level Multi-view Augmented Spatio-temporal Transformer (LVSTformer) for traffic prediction. The model aims to capture spatial dependencies from three different levels: local geographic, global semantic, and pivotal nodes, along with long- an… ▽ More Traffic prediction is a challenging spatio-temporal forecasting problem that involves highly complex spatio-temporal correlations. This paper proposes a Multi-level Multi-view Augmented Spatio-temporal Transformer (LVSTformer) for traffic prediction. The model aims to capture spatial dependencies from three different levels: local geographic, global semantic, and pivotal nodes, along with long- and short-term temporal dependencies. Specifically, we design three spatial augmented views to delve into the spatial information from the perspectives of local, global, and pivotal nodes. By combining three spatial augmented views with three parallel spatial self-attention mechanisms, the model can comprehensively captures spatial dependencies at different levels. We design a gated temporal self-attention mechanism to effectively capture long- and short-term temporal dependencies. Furthermore, a spatio-temporal context broadcasting module is introduced between two spatio-temporal layers to ensure a well-distributed allocation of attention scores, alleviating overfitting and information loss, and enhancing the generalization ability and robustness of the model. A comprehensive set of experiments is conducted on six well-known traffic benchmarks, the experimental results demonstrate that LVSTformer achieves state-of-the-art performance compared to competing baselines, with the maximum improvement reaching up to 4.32%. △ Less

Submitted 17 June, 2024; originally announced June 2024.

arXiv:2406.05914 [pdf, other]

Soundscape Captioning using Sound Affective Quality Network and Large Language Model

Authors: Yuanbo Hou, Qiaoqiao Ren, Andrew Mitchell, Wenwu Wang, Jian Kang, Tony Belpaeme, Dick Botteldooren

Abstract: We live in a rich and varied acoustic world, which is experienced by individuals or communities as a soundscape. Computational auditory scene analysis, disentangling acoustic scenes by detecting and classifying events, focuses on objective attributes of sounds, such as their category and temporal characteristics, ignoring the effect of sounds on people and failing to explore the relationship betwe… ▽ More We live in a rich and varied acoustic world, which is experienced by individuals or communities as a soundscape. Computational auditory scene analysis, disentangling acoustic scenes by detecting and classifying events, focuses on objective attributes of sounds, such as their category and temporal characteristics, ignoring the effect of sounds on people and failing to explore the relationship between sounds and the emotions they evoke within a context. To fill this gap and to automate soundscape analysis, which traditionally relies on labour-intensive subjective ratings and surveys, we propose the soundscape captioning (SoundSCap) task. SoundSCap generates context-aware soundscape descriptions by capturing the acoustic scene, event information, and the corresponding human affective qualities. To this end, we propose an automatic soundscape captioner (SoundSCaper) composed of an acoustic model, SoundAQnet, and a general large language model (LLM). SoundAQnet simultaneously models multi-scale information about acoustic scenes, events, and perceived affective qualities, while LLM generates soundscape captions by parsing the information captured by SoundAQnet to a common language. The soundscape caption's quality is assessed by a jury of 16 audio/soundscape experts. The average score (out of 5) of SoundSCaper-generated captions is lower than the score of captions generated by two soundscape experts by 0.21 and 0.25, respectively, on the evaluation set and the model-unknown mixed external dataset with varying lengths and acoustic properties, but the differences are not statistically significant. Overall, SoundSCaper-generated captions show promising performance compared to captions annotated by soundscape experts. The models' code, LLM scripts, human assessment data and instructions, and expert evaluation statistics are all publicly available. △ Less

Submitted 9 June, 2024; originally announced June 2024.

Comments: Code: https://github.com/Yuanbo2020/SoundSCaper

arXiv:2405.13025 [pdf, other]

A survey on fairness of large language models in e-commerce: progress, application, and challenge

Authors: Qingyang Ren, Zilin Jiang, Jinghan Cao, Sijia Li, Chiqu Li, Yiyang Liu, Shuning Huo, Tiange He, Yuan Chen

Abstract: This survey explores the fairness of large language models (LLMs) in e-commerce, examining their progress, applications, and the challenges they face. LLMs have become pivotal in the e-commerce domain, offering innovative solutions and enhancing customer experiences. This work presents a comprehensive survey on the applications and challenges of LLMs in e-commerce. The paper begins by introducing… ▽ More This survey explores the fairness of large language models (LLMs) in e-commerce, examining their progress, applications, and the challenges they face. LLMs have become pivotal in the e-commerce domain, offering innovative solutions and enhancing customer experiences. This work presents a comprehensive survey on the applications and challenges of LLMs in e-commerce. The paper begins by introducing the key principles underlying the use of LLMs in e-commerce, detailing the processes of pretraining, fine-tuning, and prompting that tailor these models to specific needs. It then explores the varied applications of LLMs in e-commerce, including product reviews, where they synthesize and analyze customer feedback; product recommendations, where they leverage consumer data to suggest relevant items; product information translation, enhancing global accessibility; and product question and answer sections, where they automate customer support. The paper critically addresses the fairness challenges in e-commerce, highlighting how biases in training data and algorithms can lead to unfair outcomes, such as reinforcing stereotypes or discriminating against certain groups. These issues not only undermine consumer trust, but also raise ethical and legal concerns. Finally, the work outlines future research directions, emphasizing the need for more equitable and transparent LLMs in e-commerce. It advocates for ongoing efforts to mitigate biases and improve the fairness of these systems, ensuring they serve diverse global markets effectively and ethically. Through this comprehensive analysis, the survey provides a holistic view of the current landscape of LLMs in e-commerce, offering insights into their potential and limitations, and guiding future endeavors in creating fairer and more inclusive e-commerce environments. △ Less

Submitted 21 June, 2024; v1 submitted 15 May, 2024; originally announced May 2024.

Comments: 21 pages, 9 figures

arXiv:2405.09708 [pdf, ps, other]

doi 10.1109/LRA.2024.3401117

No More Mumbles: Enhancing Robot Intelligibility through Speech Adaptation

Authors: Qiaoqiao Ren, Yuanbo Hou, Dick Botteldooren, Tony Belpaeme

Abstract: Spoken language interaction is at the heart of interpersonal communication, and people flexibly adapt their speech to different individuals and environments. It is surprising that robots, and by extension other digital devices, are not equipped to adapt their speech and instead rely on fixed speech parameters, which often hinder comprehension by the user. We conducted a speech comprehension study… ▽ More Spoken language interaction is at the heart of interpersonal communication, and people flexibly adapt their speech to different individuals and environments. It is surprising that robots, and by extension other digital devices, are not equipped to adapt their speech and instead rely on fixed speech parameters, which often hinder comprehension by the user. We conducted a speech comprehension study involving 39 participants who were exposed to different environmental and contextual conditions. During the experiment, the robot articulated words using different vocal parameters, and the participants were tasked with both recognising the spoken words and rating their subjective impression of the robot's speech. The experiment's primary outcome shows that spaces with good acoustic quality positively correlate with intelligibility and user experience. However, increasing the distance between the user and the robot exacerbated the user experience, while distracting background sounds significantly reduced speech recognition accuracy and user satisfaction. We next built an adaptive voice for the robot. For this, the robot needs to know how difficult it is for a user to understand spoken language in a particular setting. We present a prediction model that rates how annoying the ambient acoustic environment is and, consequentially, how hard it is to understand someone in this setting. Then, we develop a convolutional neural network model to adapt the robot's speech parameters to different users and spaces, while taking into account the influence of ambient acoustics on intelligibility. Finally, we present an evaluation with 27 users, demonstrating superior intelligibility and user experience with adaptive voice parameters compared to fixed voice. △ Less

Submitted 15 May, 2024; originally announced May 2024.

Comments: IEEE Robotics and Automation Letters (IEEE RAL)

arXiv:2404.17394 [pdf, other]

Child Speech Recognition in Human-Robot Interaction: Problem Solved?

Authors: Ruben Janssens, Eva Verhelst, Giulio Antonio Abbo, Qiaoqiao Ren, Maria Jose Pinto Bernal, Tony Belpaeme

Abstract: Automated Speech Recognition shows superhuman performance for adult English speech on a range of benchmarks, but disappoints when fed children's speech. This has long sat in the way of child-robot interaction. Recent evolutions in data-driven speech recognition, including the availability of Transformer architectures and unprecedented volumes of training data, might mean a breakthrough for child s… ▽ More Automated Speech Recognition shows superhuman performance for adult English speech on a range of benchmarks, but disappoints when fed children's speech. This has long sat in the way of child-robot interaction. Recent evolutions in data-driven speech recognition, including the availability of Transformer architectures and unprecedented volumes of training data, might mean a breakthrough for child speech recognition and social robot applications aimed at children. We revisit a study on child speech recognition from 2017 and show that indeed performance has increased, with newcomer OpenAI Whisper doing markedly better than leading commercial cloud services. While transcription is not perfect yet, the best model recognises 60.3% of sentences correctly barring small grammatical differences, with sub-second transcription time running on a local GPU, showing potential for usable autonomous child-robot speech interactions. △ Less

Submitted 26 April, 2024; originally announced April 2024.

Comments: Presented at 2024 International Symposium on Technological Advances in Human-Robot Interaction

arXiv:2404.00443 [pdf, ps, other]

UDE-based Dynamic Motion Force Control of Mobile Manipulators

Authors: Songqun Gao, Wendi Ding, Qinyuan Ren, Ben M. Chen

Abstract: Mobile manipulators are known for their superior mobility over manipulators on fixed bases, offering promising applications in smart industry and housekeeping scenarios. However, the dynamic coupling nature between the mobile base and the manipulator presents challenges for the physical interactive tasks of the mobile manipulator. Current methods suffer from complex modeling processes and poor tra… ▽ More Mobile manipulators are known for their superior mobility over manipulators on fixed bases, offering promising applications in smart industry and housekeeping scenarios. However, the dynamic coupling nature between the mobile base and the manipulator presents challenges for the physical interactive tasks of the mobile manipulator. Current methods suffer from complex modeling processes and poor transferability. To address this, this article presents a novel dynamic model of the manipulator on the mobile base that requires only the manipulator dynamics and the kinematic information of the mobile base. In addition, embedding the dynamic model, an uncertainty and disturbance estimator-based (UDE-based) dynamic motion/force control scheme is proposed for the mobile manipulator, which compensates for the dynamic coupling and other unmodeled uncertainties. Passivity and stability analyses justify the proposed control law. Simulation and experimental results on our mobile manipulator platform demonstrate the feasibility and effectiveness of our proposed methodology. △ Less

Submitted 30 March, 2024; originally announced April 2024.

arXiv:2403.07865 [pdf, other]

CodeAttack: Revealing Safety Generalization Challenges of Large Language Models via Code Completion

Authors: Qibing Ren, Chang Gao, Jing Shao, Junchi Yan, Xin Tan, Wai Lam, Lizhuang Ma

Abstract: The rapid advancement of Large Language Models (LLMs) has brought about remarkable generative capabilities but also raised concerns about their potential misuse. While strategies like supervised fine-tuning and reinforcement learning from human feedback have enhanced their safety, these methods primarily focus on natural languages, which may not generalize to other domains. This paper introduces C… ▽ More The rapid advancement of Large Language Models (LLMs) has brought about remarkable generative capabilities but also raised concerns about their potential misuse. While strategies like supervised fine-tuning and reinforcement learning from human feedback have enhanced their safety, these methods primarily focus on natural languages, which may not generalize to other domains. This paper introduces CodeAttack, a framework that transforms natural language inputs into code inputs, presenting a novel environment for testing the safety generalization of LLMs. Our comprehensive studies on state-of-the-art LLMs including GPT-4, Claude-2, and Llama-2 series reveal a new and universal safety vulnerability of these models against code input: CodeAttack bypasses the safety guardrails of all models more than 80\% of the time. We find that a larger distribution gap between CodeAttack and natural language leads to weaker safety generalization, such as encoding natural language input with data structures. Furthermore, we give our hypotheses about the success of CodeAttack: the misaligned bias acquired by LLMs during code training, prioritizing code completion over avoiding the potential safety risk. Finally, we analyze potential mitigation measures. These findings highlight new safety risks in the code domain and the need for more robust safety alignment algorithms to match the code capabilities of LLMs. △ Less

Submitted 9 June, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

Comments: ACL Findings 2024, Code is available at https://github.com/renqibing/CodeAttack

arXiv:2402.01163 [pdf, other]

Enhanced Urban Region Profiling with Adversarial Self-Supervised Learning

Authors: Weiliang Chan, Qianqian Ren, Jinbao Li

Abstract: Urban region profiling is pivotal for smart cities, but mining fine-grained semantics from noisy and incomplete urban data remains challenging. In response, we propose a novel self-supervised graph collaborative filtering model for urban region embedding called EUPAS. Specifically, region heterogeneous graphs containing human mobility data, point of interests (POIs) information, and geographic nei… ▽ More Urban region profiling is pivotal for smart cities, but mining fine-grained semantics from noisy and incomplete urban data remains challenging. In response, we propose a novel self-supervised graph collaborative filtering model for urban region embedding called EUPAS. Specifically, region heterogeneous graphs containing human mobility data, point of interests (POIs) information, and geographic neighborhood details for each region are fed into the model, which generates region embeddings that preserve intra-region and inter-region dependencies through GCNs and multi-head attention. Meanwhile, we introduce spatial perturbation augmentation to generate positive samples that are semantically similar and spatially close to the anchor, preparing for subsequent contrastive learning. Furthermore, adversarial training is employed to construct an effective pretext task by generating strong positive pairs and mining hard negative pairs for the region embeddings. Finally, we jointly optimize supervised and self-supervised learning to encourage the model to capture the high-level semantics of region embeddings while ignoring the noisy and unimportant details. Extensive experiments on real-world datasets demonstrate the superiority of our model over state-of-the-art methods. △ Less

Submitted 2 February, 2024; originally announced February 2024.

arXiv:2401.18057 [pdf, other]

Rank Supervised Contrastive Learning for Time Series Classification

Authors: Qianying Ren, Dongsheng Luo, Dongjin Song

Abstract: Recently, various contrastive learning techniques have been developed to categorize time series data and exhibit promising performance. A general paradigm is to utilize appropriate augmentations and construct feasible positive samples such that the encoder can yield robust and discriminative representations by mapping similar data points closer together in the feature space while pushing dissimila… ▽ More Recently, various contrastive learning techniques have been developed to categorize time series data and exhibit promising performance. A general paradigm is to utilize appropriate augmentations and construct feasible positive samples such that the encoder can yield robust and discriminative representations by mapping similar data points closer together in the feature space while pushing dissimilar data points farther apart. Despite its efficacy, the fine-grained relative similarity (e.g., rank) information of positive samples is largely ignored, especially when labeled samples are limited. To this end, we present Rank Supervised Contrastive Learning (RankSCL) to perform time series classification. Different from conventional contrastive learning frameworks, RankSCL augments raw data in a targeted way in the embedding space and adopts certain filtering rules to select more informative positive and negative pairs of samples. Moreover, a novel rank loss is developed to assign different weights for different levels of positive samples, enable the encoder to extract the fine-grained information of the same class, and produce a clear boundary among different classes. Thoroughly empirical studies on 128 UCR datasets and 30 UEA datasets demonstrate that the proposed RankSCL can achieve state-of-the-art performance compared to existing baseline methods. △ Less

Submitted 31 January, 2024; originally announced January 2024.

arXiv:2401.17802 [pdf, other]

doi 10.1016/j.ins.2024.120712

Distillation Enhanced Time Series Forecasting Network with Momentum Contrastive Learning

Authors: Haozhi Gao, Qianqian Ren, Jinbao Li

Abstract: Contrastive representation learning is crucial in time series analysis as it alleviates the issue of data noise and incompleteness as well as sparsity of supervision signal. However, existing constrastive learning frameworks usually focus on intral-temporal features, which fails to fully exploit the intricate nature of time series data. To address this issue, we propose DE-TSMCL, an innovative dis… ▽ More Contrastive representation learning is crucial in time series analysis as it alleviates the issue of data noise and incompleteness as well as sparsity of supervision signal. However, existing constrastive learning frameworks usually focus on intral-temporal features, which fails to fully exploit the intricate nature of time series data. To address this issue, we propose DE-TSMCL, an innovative distillation enhanced framework for long sequence time series forecasting. Specifically, we design a learnable data augmentation mechanism which adaptively learns whether to mask a timestamp to obtain optimized sub-sequences. Then, we propose a contrastive learning task with momentum update to explore inter-sample and intra-temporal correlations of time series to learn the underlying structure feature on the unlabeled time series. Meanwhile, we design a supervised task to learn more robust representations and facilitate the contrastive learning process. Finally, we jointly optimize the above two tasks. By developing model loss from multiple tasks, we can learn effective representations for downstream forecasting task. Extensive experiments, in comparison with state-of-the-arts, well demonstrate the effectiveness of DE-TSMCL, where the maximum improvement can reach to 27.3%. △ Less

Submitted 25 June, 2024; v1 submitted 31 January, 2024; originally announced January 2024.

arXiv:2401.15071 [pdf, other]

From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities

Authors: Chaochao Lu, Chen Qian, Guodong Zheng, Hongxing Fan, Hongzhi Gao, Jie Zhang, Jing Shao, Jingyi Deng, Jinlan Fu, Kexin Huang, Kunchang Li, Lijun Li, Limin Wang, Lu Sheng, Meiqi Chen, Ming Zhang, Qibing Ren, Sirui Chen, Tao Gui, Wanli Ouyang, Yali Wang, Yan Teng, Yaru Wang, Yi Wang, Yinan He , et al. (11 additional authors not shown)

Abstract: Multi-modal Large Language Models (MLLMs) have shown impressive abilities in generating reasonable responses with respect to multi-modal contents. However, there is still a wide gap between the performance of recent MLLM-based applications and the expectation of the broad public, even though the most powerful OpenAI's GPT-4 and Google's Gemini have been deployed. This paper strives to enhance unde… ▽ More Multi-modal Large Language Models (MLLMs) have shown impressive abilities in generating reasonable responses with respect to multi-modal contents. However, there is still a wide gap between the performance of recent MLLM-based applications and the expectation of the broad public, even though the most powerful OpenAI's GPT-4 and Google's Gemini have been deployed. This paper strives to enhance understanding of the gap through the lens of a qualitative study on the generalizability, trustworthiness, and causal reasoning capabilities of recent proprietary and open-source MLLMs across four modalities: ie, text, code, image, and video, ultimately aiming to improve the transparency of MLLMs. We believe these properties are several representative factors that define the reliability of MLLMs, in supporting various downstream applications. To be specific, we evaluate the closed-source GPT-4 and Gemini and 6 open-source LLMs and MLLMs. Overall we evaluate 230 manually designed cases, where the qualitative results are then summarized into 12 scores (ie, 4 modalities times 3 properties). In total, we uncover 14 empirical findings that are useful to understand the capabilities and limitations of both proprietary and open-source MLLMs, towards more reliable downstream multi-modal applications. △ Less

Submitted 29 January, 2024; v1 submitted 26 January, 2024; originally announced January 2024.

arXiv:2401.09067 [pdf, other]

Towards Continual Learning Desiderata via HSIC-Bottleneck Orthogonalization and Equiangular Embedding

Authors: Depeng Li, Tianqi Wang, Junwei Chen, Qining Ren, Kenji Kawaguchi, Zhigang Zeng

Abstract: Deep neural networks are susceptible to catastrophic forgetting when trained on sequential tasks. Various continual learning (CL) methods often rely on exemplar buffers or/and network expansion for balancing model stability and plasticity, which, however, compromises their practical value due to privacy and memory concerns. Instead, this paper considers a strict yet realistic setting, where the tr… ▽ More Deep neural networks are susceptible to catastrophic forgetting when trained on sequential tasks. Various continual learning (CL) methods often rely on exemplar buffers or/and network expansion for balancing model stability and plasticity, which, however, compromises their practical value due to privacy and memory concerns. Instead, this paper considers a strict yet realistic setting, where the training data from previous tasks is unavailable and the model size remains relatively constant during sequential training. To achieve such desiderata, we propose a conceptually simple yet effective method that attributes forgetting to layer-wise parameter overwriting and the resulting decision boundary distortion. This is achieved by the synergy between two key components: HSIC-Bottleneck Orthogonalization (HBO) implements non-overwritten parameter updates mediated by Hilbert-Schmidt independence criterion in an orthogonal space and EquiAngular Embedding (EAE) enhances decision boundary adaptation between old and new tasks with predefined basis vectors. Extensive experiments demonstrate that our method achieves competitive accuracy performance, even with absolute superiority of zero exemplar buffer and 1.02x the base model. △ Less

Submitted 17 January, 2024; originally announced January 2024.

Comments: Accepted to AAAI 2024

arXiv:2312.09952 [pdf, other]

Multi-level graph learning for audio event classification and human-perceived annoyance rating prediction

Authors: Yuanbo Hou, Qiaoqiao Ren, Siyang Song, Yuxin Song, Wenwu Wang, Dick Botteldooren

Abstract: WHO's report on environmental noise estimates that 22 M people suffer from chronic annoyance related to noise caused by audio events (AEs) from various sources. Annoyance may lead to health issues and adverse effects on metabolic and cognitive systems. In cities, monitoring noise levels does not provide insights into noticeable AEs, let alone their relations to annoyance. To create annoyance-relat… ▽ More WHO's report on environmental noise estimates that 22 M people suffer from chronic annoyance related to noise caused by audio events (AEs) from various sources. Annoyance may lead to health issues and adverse effects on metabolic and cognitive systems. In cities, monitoring noise levels does not provide insights into noticeable AEs, let alone their relations to annoyance. To create annoyance-related monitoring, this paper proposes a graph-based model to identify AEs in a soundscape, and explore relations between diverse AEs and human-perceived annoyance rating (AR). Specifically, this paper proposes a lightweight multi-level graph learning (MLGL) based on local and global semantic graphs to simultaneously perform audio event classification (AEC) and human annoyance rating prediction (ARP). Experiments show that: 1) MLGL with 4.1 M parameters improves AEC and ARP results by using semantic node information in local and global context aware graphs; 2) MLGL captures relations between coarse and fine-grained AEs and AR well; 3) Statistical analysis of MLGL results shows that some AEs from different sources significantly correlate with AR, which is consistent with previous research on human perception of these sound sources. △ Less

Submitted 15 December, 2023; originally announced December 2023.

Comments: Accepted by ICASSP 2024

arXiv:2311.09030 [pdf]

doi 10.1121/10.0022408

AI-based soundscape analysis: Jointly identifying sound sources and predicting annoyance

Authors: Yuanbo Hou, Qiaoqiao Ren, Huizhong Zhang, Andrew Mitchell, Francesco Aletta, Jian Kang, Dick Botteldooren

Abstract: Soundscape studies typically attempt to capture the perception and understanding of sonic environments by surveying users. However, for long-term monitoring or assessing interventions, sound-signal-based approaches are required. To this end, most previous research focused on psycho-acoustic quantities or automatic sound recognition. Few attempts were made to include appraisal (e.g., in circumplex… ▽ More Soundscape studies typically attempt to capture the perception and understanding of sonic environments by surveying users. However, for long-term monitoring or assessing interventions, sound-signal-based approaches are required. To this end, most previous research focused on psycho-acoustic quantities or automatic sound recognition. Few attempts were made to include appraisal (e.g., in circumplex frameworks). This paper proposes an artificial intelligence (AI)-based dual-branch convolutional neural network with cross-attention-based fusion (DCNN-CaF) to analyze automatic soundscape characterization, including sound recognition and appraisal. Using the DeLTA dataset containing human-annotated sound source labels and perceived annoyance, the DCNN-CaF is proposed to perform sound source classification (SSC) and human-perceived annoyance rating prediction (ARP). Experimental findings indicate that (1) the proposed DCNN-CaF using loudness and Mel features outperforms the DCNN-CaF using only one of them. (2) The proposed DCNN-CaF with cross-attention fusion outperforms other typical AI-based models and soundscape-related traditional machine learning methods on the SSC and ARP tasks. (3) Correlation analysis reveals that the relationship between sound sources and annoyance is similar for humans and the proposed AI-based DCNN-CaF model. (4) Generalization tests show that the proposed model's ARP in the presence of model-unknown sound sources is consistent with expert expectations and can explain previous findings from the literature on sound-scape augmentation. △ Less

Submitted 15 November, 2023; originally announced November 2023.

Comments: The Journal of the Acoustical Society of America, 154 (5), 3145

Journal ref: The Journal of the Acoustical Society of America, 154, 3145 (2023)

arXiv:2310.13347 [pdf, other]

NurViD: A Large Expert-Level Video Database for Nursing Procedure Activity Understanding

Authors: Ming Hu, Lin Wang, Siyuan Yan, Don Ma, Qingli Ren, Peng Xia, Wei Feng, Peibo Duan, Lie Ju, Zongyuan Ge

Abstract: The application of deep learning to nursing procedure activity understanding has the potential to greatly enhance the quality and safety of nurse-patient interactions. By utilizing the technique, we can facilitate training and education, improve quality control, and enable operational compliance monitoring. However, the development of automatic recognition systems in this field is currently hinder… ▽ More The application of deep learning to nursing procedure activity understanding has the potential to greatly enhance the quality and safety of nurse-patient interactions. By utilizing the technique, we can facilitate training and education, improve quality control, and enable operational compliance monitoring. However, the development of automatic recognition systems in this field is currently hindered by the scarcity of appropriately labeled datasets. The existing video datasets pose several limitations: 1) these datasets are small-scale in size to support comprehensive investigations of nursing activity; 2) they primarily focus on single procedures, lacking expert-level annotations for various nursing procedures and action steps; and 3) they lack temporally localized annotations, which prevents the effective localization of targeted actions within longer video sequences. To mitigate these limitations, we propose NurViD, a large video dataset with expert-level annotation for nursing procedure activity understanding. NurViD consists of over 1.5k videos totaling 144 hours, making it approximately four times longer than the existing largest nursing activity datasets. Notably, it encompasses 51 distinct nursing procedures and 177 action steps, providing a much more comprehensive coverage compared to existing datasets that primarily focus on limited procedures. To evaluate the efficacy of current deep learning methods on nursing activity understanding, we establish three benchmarks on NurViD: procedure recognition on untrimmed videos, procedure and action recognition on trimmed videos, and action detection. Our benchmark and code will be available at \url{https://github.com/minghu0830/NurViD-benchmark}. △ Less

Submitted 20 October, 2023; originally announced October 2023.

Comments: Accepted by NeurIPS 2023 Datasets and Benchmarks Track

arXiv:2309.11907 [pdf, other]

Learning to Recover for Safe Reinforcement Learning

Authors: Haoyu Wang, Xin Yuan, Qinqing Ren

Abstract: Safety controllers is widely used to achieve safe reinforcement learning. Most methods that apply a safety controller are using handcrafted safety constraints to construct the safety controller. However, when the environment dynamics are sophisticated, handcrafted safety constraints become unavailable. Therefore, it worth to research on constructing safety controllers by learning algorithms. We pr… ▽ More Safety controllers is widely used to achieve safe reinforcement learning. Most methods that apply a safety controller are using handcrafted safety constraints to construct the safety controller. However, when the environment dynamics are sophisticated, handcrafted safety constraints become unavailable. Therefore, it worth to research on constructing safety controllers by learning algorithms. We propose a three-stage architecture for safe reinforcement learning, namely TU-Recovery Architecture. A safety critic and a recovery policy is learned before task training. They form a safety controller to ensure safety in task training. Then a phenomenon induced by disagreement between task policy and recovery policy, called adversarial phenomenon, which reduces learning efficiency and model performance, is described. Auxiliary reward is proposed to mitigate adversarial phenomenon, while help the task policy to learn to recover from high-risk states. A series of experiments are conducted in a robot navigation environment. Experiments demonstrate that TU-Recovery outperforms unconstrained counterpart in both reward gaining and constraint violations during task training, and auxiliary reward further improve TU-Recovery in reward-to-cost ratio by significantly reduce constraint violations. △ Less

Submitted 21 September, 2023; originally announced September 2023.

arXiv:2309.11876 [pdf, other]

Multi-level Asymmetric Contrastive Learning for Volumetric Medical Image Segmentation Pre-training

Authors: Shuang Zeng, Lei Zhu, Xinliang Zhang, Qian Chen, Hangzhou He, Lujia Jin, Zifeng Tian, Qiushi Ren, Zhaoheng Xie, Yanye Lu

Abstract: Medical image segmentation is a fundamental yet challenging task due to the arduous process of acquiring large volumes of high-quality labeled data from experts. Contrastive learning offers a promising but still problematic solution to this dilemma. Because existing medical contrastive learning strategies focus on extracting image-level representation, which ignores abundant multi-level representa… ▽ More Medical image segmentation is a fundamental yet challenging task due to the arduous process of acquiring large volumes of high-quality labeled data from experts. Contrastive learning offers a promising but still problematic solution to this dilemma. Because existing medical contrastive learning strategies focus on extracting image-level representation, which ignores abundant multi-level representations. And they underutilize the decoder either by random initialization or separate pre-training from the encoder, thereby neglecting the potential collaboration between the encoder and decoder. To address these issues, we propose a novel multi-level asymmetric contrastive learning framework named MACL for volumetric medical image segmentation pre-training. Specifically, we design an asymmetric contrastive learning structure to pre-train encoder and decoder simultaneously to provide better initialization for segmentation models. Moreover, we develop a multi-level contrastive learning strategy that integrates correspondences across feature-level, image-level, and pixel-level representations to ensure the encoder and decoder capture comprehensive details from representations of varying scales and granularities during the pre-training phase. Finally, experiments on 12 volumetric medical image datasets indicate our MACL framework outperforms existing 11 contrastive learning strategies. {\itshape i.e.} Our MACL achieves a superior performance with more precise predictions from visualization figures and 2.28\%, 1.32\%, 1.62\% and 1.60\% Average Dice higher than previous best results on CHD, MMWHS, CHAOS and AMOS, respectively. And our MACL also has a strong generalization ability among 5 variant U-Net backbones. Our code will be available at https://github.com/stevezs315/MACL. △ Less

Submitted 13 May, 2024; v1 submitted 21 September, 2023; originally announced September 2023.

arXiv:2309.08854 [pdf, other]

Intention-Aware Planner for Robust and Safe Aerial Tracking

Authors: Qiuyu Ren, Huan Yu, Jiajun Dai, Zhi Zheng, Jun Meng, Li Xu, Chao Xu, Fei Gao, Yanjun Cao

Abstract: Autonomous target tracking with quadrotors has wide applications in many scenarios, such as cinematographic follow-up shooting or suspect chasing. Target motion prediction is necessary when designing the tracking planner. However, the widely used constant velocity or constant rotation assumption can not fully capture the dynamics of the target. The tracker may fail when the target happens to move… ▽ More Autonomous target tracking with quadrotors has wide applications in many scenarios, such as cinematographic follow-up shooting or suspect chasing. Target motion prediction is necessary when designing the tracking planner. However, the widely used constant velocity or constant rotation assumption can not fully capture the dynamics of the target. The tracker may fail when the target happens to move aggressively, such as sudden turn or deceleration. In this paper, we propose an intention-aware planner by additionally considering the intention of the target to enhance safety and robustness in aerial tracking applications. Firstly, a designated intention prediction method is proposed, which combines a user-defined potential assessment function and a state observation function. A reachable region is generated to specifically evaluate the turning intentions. Then we design an intention-driven hybrid A* method to predict the future possible positions for the target. Finally, an intention-aware optimization approach is designed to generate a spatial-temporal optimal trajectory, allowing the tracker to perceive unexpected situations from the target. Benchmark comparisons and real-world experiments are conducted to validate the performance of our method. △ Less

Submitted 20 March, 2024; v1 submitted 15 September, 2023; originally announced September 2023.

Comments: 8 pages, 10 figures, submitted to 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

arXiv:2309.06912 [pdf, other]

Multi-behavior Recommendation with SVD Graph Neural Networks

Authors: Shengxi Fu, Qianqian Ren, Xingfeng Lv, Jinbao Li

Abstract: Graph Neural Networks (GNNs) have been extensively employed in the field of recommendation systems, offering users personalized recommendations and yielding remarkable outcomes. Recently, GNNs incorporating contrastive learning have demonstrated promising performance in handling the sparse data problem of recommendation systems. However, existing contrastive learning methods still have limitations… ▽ More Graph Neural Networks (GNNs) have been extensively employed in the field of recommendation systems, offering users personalized recommendations and yielding remarkable outcomes. Recently, GNNs incorporating contrastive learning have demonstrated promising performance in handling the sparse data problem of recommendation systems. However, existing contrastive learning methods still have limitations in resisting noise interference, especially for multi-behavior recommendation. To mitigate the aforementioned issues, this paper proposes a GNN-based multi-behavior recommendation model called MB-SVD that utilizes Singular Value Decomposition (SVD) graphs to enhance model performance. In particular, MB-SVD considers user preferences across different behaviors, improving recommendation effectiveness. First, MB-SVD integrates the representation of users and items under different behaviors with learnable weight scores, which efficiently considers the influence of different behaviors. Then, MB-SVD generates augmented graph representation with global collaborative relations. Next, we simplify the contrastive learning framework by directly contrasting original representation with the enhanced representation using the InfoNCE loss. Through extensive experimentation, the remarkable performance of our proposed MB-SVD approach in multi-behavior recommendation endeavors across diverse real-world datasets is exhibited. △ Less

Submitted 9 May, 2024; v1 submitted 13 September, 2023; originally announced September 2023.

arXiv:2308.15150 [pdf, other]

Unleashing the Potential of Spiking Neural Networks for Sequential Modeling with Contextual Embedding

Authors: Xinyi Chen, Jibin Wu, Huajin Tang, Qinyuan Ren, Kay Chen Tan

Abstract: The human brain exhibits remarkable abilities in integrating temporally distant sensory inputs for decision-making. However, existing brain-inspired spiking neural networks (SNNs) have struggled to match their biological counterpart in modeling long-term temporal relationships. To address this problem, this paper presents a novel Contextual Embedding Leaky Integrate-and-Fire (CE-LIF) spiking neuro… ▽ More The human brain exhibits remarkable abilities in integrating temporally distant sensory inputs for decision-making. However, existing brain-inspired spiking neural networks (SNNs) have struggled to match their biological counterpart in modeling long-term temporal relationships. To address this problem, this paper presents a novel Contextual Embedding Leaky Integrate-and-Fire (CE-LIF) spiking neuron model. Specifically, the CE-LIF model incorporates a meticulously designed contextual embedding component into the adaptive neuronal firing threshold, thereby enhancing the memory storage of spiking neurons and facilitating effective sequential modeling. Additionally, theoretical analysis is provided to elucidate how the CE-LIF model enables long-term temporal credit assignment. Remarkably, when compared to state-of-the-art recurrent SNNs, feedforward SNNs comprising the proposed CE-LIF neurons demonstrate superior performance across extensive sequential modeling tasks in terms of classification accuracy, network convergence speed, and memory capacity. △ Less

Submitted 29 August, 2023; originally announced August 2023.

arXiv:2308.11980 [pdf, other]

Joint Prediction of Audio Event and Annoyance Rating in an Urban Soundscape by Hierarchical Graph Representation Learning

Authors: Yuanbo Hou, Siyang Song, Cheng Luo, Andrew Mitchell, Qiaoqiao Ren, Weicheng Xie, Jian Kang, Wenwu Wang, Dick Botteldooren

Abstract: Sound events in daily life carry rich information about the objective world. The composition of these sounds affects the mood of people in a soundscape. Most previous approaches only focus on classifying and detecting audio events and scenes, but may ignore their perceptual quality that may impact humans' listening mood for the environment, e.g. annoyance. To this end, this paper proposes a novel… ▽ More Sound events in daily life carry rich information about the objective world. The composition of these sounds affects the mood of people in a soundscape. Most previous approaches only focus on classifying and detecting audio events and scenes, but may ignore their perceptual quality that may impact humans' listening mood for the environment, e.g. annoyance. To this end, this paper proposes a novel hierarchical graph representation learning (HGRL) approach which links objective audio events (AE) with subjective annoyance ratings (AR) of the soundscape perceived by humans. The hierarchical graph consists of fine-grained event (fAE) embeddings with single-class event semantics, coarse-grained event (cAE) embeddings with multi-class event semantics, and AR embeddings. Experiments show the proposed HGRL successfully integrates AE with AR for AEC and ARP tasks, while coordinating the relations between cAE and fAE and further aligning the two different grains of AE information with the AR. △ Less

Submitted 23 August, 2023; originally announced August 2023.

Comments: INTERSPEECH 2023, Code and models: https://github.com/Yuanbo2020/HGRL

arXiv:2308.04949 [pdf, other]

Branches Mutual Promotion for End-to-End Weakly Supervised Semantic Segmentation

Authors: Lei Zhu, Hangzhou He, Xinliang Zhang, Qian Chen, Shuang Zeng, Qiushi Ren, Yanye Lu

Abstract: End-to-end weakly supervised semantic segmentation aims at optimizing a segmentation model in a single-stage training process based on only image annotations. Existing methods adopt an online-trained classification branch to provide pseudo annotations for supervising the segmentation branch. However, this strategy makes the classification branch dominate the whole concurrent training process, hind… ▽ More End-to-end weakly supervised semantic segmentation aims at optimizing a segmentation model in a single-stage training process based on only image annotations. Existing methods adopt an online-trained classification branch to provide pseudo annotations for supervising the segmentation branch. However, this strategy makes the classification branch dominate the whole concurrent training process, hindering these two branches from assisting each other. In our work, we treat these two branches equally by viewing them as diverse ways to generate the segmentation map, and add interactions on both their supervision and operation to achieve mutual promotion. For this purpose, a bidirectional supervision mechanism is elaborated to force the consistency between the outputs of these two branches. Thus, the segmentation branch can also give feedback to the classification branch to enhance the quality of localization seeds. Moreover, our method also designs interaction operations between these two branches to exchange their knowledge to assist each other. Experiments indicate our work outperforms existing end-to-end weakly supervised segmentation methods. △ Less

Submitted 9 August, 2023; originally announced August 2023.

arXiv:2307.03212 [pdf, other]

Attentive Graph Enhanced Region Representation Learning

Authors: Weiliang Chen, Qianqian Ren, Jinbao Li

Abstract: Representing urban regions accurately and comprehensively is essential for various urban planning and analysis tasks. Recently, with the expansion of the city, modeling long-range spatial dependencies with multiple data sources plays an important role in urban region representation. In this paper, we propose the Attentive Graph Enhanced Region Representation Learning (ATGRL) model, which aims to c… ▽ More Representing urban regions accurately and comprehensively is essential for various urban planning and analysis tasks. Recently, with the expansion of the city, modeling long-range spatial dependencies with multiple data sources plays an important role in urban region representation. In this paper, we propose the Attentive Graph Enhanced Region Representation Learning (ATGRL) model, which aims to capture comprehensive dependencies from multiple graphs and learn rich semantic representations of urban regions. Specifically, we propose a graph-enhanced learning module to construct regional graphs by incorporating mobility flow patterns, point of interests (POIs) functions, and check-in semantics with noise filtering. Then, we present a multi-graph aggregation module to capture both local and global spatial dependencies between regions by integrating information from multiple graphs. In addition, we design a dual-stage fusion module to facilitate information sharing between different views and efficiently fuse multi-view representations for urban region embedding using an improved linear attention mechanism. Finally, extensive experiments on real-world datasets for three downstream tasks demonstrate the superior performance of our model compared to state-of-the-art methods. △ Less

Submitted 31 May, 2024; v1 submitted 6 July, 2023; originally announced July 2023.

arXiv:2306.09718 [pdf, ps, other]

Label-noise-tolerant medical image classification via self-attention and self-supervised learning

Authors: Hongyang Jiang, Mengdi Gao, Yan Hu, Qiushi Ren, Zhaoheng Xie, Jiang Liu

Abstract: Deep neural networks (DNNs) have been widely applied in medical image classification and achieve remarkable classification performance. These achievements heavily depend on large-scale accurately annotated training data. However, label noise is inevitably introduced in the medical image annotation, as the labeling process heavily relies on the expertise and experience of annotators. Meanwhile, DNN… ▽ More Deep neural networks (DNNs) have been widely applied in medical image classification and achieve remarkable classification performance. These achievements heavily depend on large-scale accurately annotated training data. However, label noise is inevitably introduced in the medical image annotation, as the labeling process heavily relies on the expertise and experience of annotators. Meanwhile, DNNs suffer from overfitting noisy labels, degrading the performance of models. Therefore, in this work, we innovatively devise noise-robust training approach to mitigate the adverse effects of noisy labels in medical image classification. Specifically, we incorporate contrastive learning and intra-group attention mixup strategies into the vanilla supervised learning. The contrastive learning for feature extractor helps to enhance visual representation of DNNs. The intra-group attention mixup module constructs groups and assigns self-attention weights for group-wise samples, and subsequently interpolates massive noisy-suppressed samples through weighted mixup operation. We conduct comparative experiments on both synthetic and real-world noisy medical datasets under various noise levels. Rigorous experiments validate that our noise-robust method with contrastive learning and attention mixup can effectively handle with label noise, and is superior to state-of-the-art methods. An ablation study also shows that both components contribute to boost model performance. The proposed method demonstrates its capability of curb label noise and has certain potential toward real-world clinic applications. △ Less

Submitted 16 June, 2023; originally announced June 2023.

Comments: 11pages, 8 figures

arXiv:2305.08062 [pdf, other]

Off-Policy Evaluation for Large Action Spaces via Conjunct Effect Modeling

Authors: Yuta Saito, Qingyang Ren, Thorsten Joachims

Abstract: We study off-policy evaluation (OPE) of contextual bandit policies for large discrete action spaces where conventional importance-weighting approaches suffer from excessive variance. To circumvent this variance issue, we propose a new estimator, called OffCEM, that is based on the conjunct effect model (CEM), a novel decomposition of the causal effect into a cluster effect and a residual effect. O… ▽ More We study off-policy evaluation (OPE) of contextual bandit policies for large discrete action spaces where conventional importance-weighting approaches suffer from excessive variance. To circumvent this variance issue, we propose a new estimator, called OffCEM, that is based on the conjunct effect model (CEM), a novel decomposition of the causal effect into a cluster effect and a residual effect. OffCEM applies importance weighting only to action clusters and addresses the residual causal effect through model-based reward estimation. We show that the proposed estimator is unbiased under a new condition, called local correctness, which only requires that the residual-effect model preserves the relative expected reward differences of the actions within each cluster. To best leverage the CEM and local correctness, we also propose a new two-step procedure for performing model-based estimation that minimizes bias in the first step and variance in the second step. We find that the resulting OffCEM estimator substantially improves bias and variance compared to a range of conventional estimators. Experiments demonstrate that OffCEM provides substantial improvements in OPE especially in the presence of many actions. △ Less

Submitted 2 June, 2023; v1 submitted 14 May, 2023; originally announced May 2023.

Comments: accepted at ICML2023. arXiv admin note: text overlap with arXiv:2202.06317

arXiv:2305.04196 [pdf, ps, other]

doi 10.1109/TWC.2023.3266344

Handoff-Aware Distributed Computing in High Altitude Platform Station (HAPS)-Assisted Vehicular Networks

Authors: Qiqi Ren, Omid Abbasi, Gunes Karabulut Kurt, Halim Yanikomeroglu, Jian Chen

Abstract: Distributed computing enables Internet of vehicle (IoV) services by collaboratively utilizing the computing resources from the network edge and the vehicles. However, the computing interruption issue caused by frequent edge network handoffs, and a severe shortage of computing resources are two problems in providing IoV services. High altitude platform station (HAPS) computing can be a promising ad… ▽ More Distributed computing enables Internet of vehicle (IoV) services by collaboratively utilizing the computing resources from the network edge and the vehicles. However, the computing interruption issue caused by frequent edge network handoffs, and a severe shortage of computing resources are two problems in providing IoV services. High altitude platform station (HAPS) computing can be a promising addition to existing distributed computing frameworks because of its wide coverage and strong computational capabilities. In this regard, this paper proposes an adaptive scheme in a new distributed computing framework that involves HAPS computing to deal with the two problems of the IoV. Based on the diverse demands of vehicles, network dynamics, and the time-sensitivity of handoffs, the proposed scheme flexibly divides each task into three parts and assigns them to the vehicle, roadside units (RSUs), and a HAPS to perform synchronous computing. The scheme also constrains the computing of tasks at RSUs such that they are completed before handoffs to avoid the risk of computing interruptions. On this basis, we formulate a delay minimization problem that considers task-splitting ratio, transmit power, bandwidth allocation, and computing resource allocation. To solve the problem, variable replacement and successive convex approximation-based method are proposed. The simulation results show that this scheme not only avoids the negative effects caused by handoffs in a flexible manner, it also takes delay performance into account and maintains the delay stability. △ Less

Submitted 7 May, 2023; originally announced May 2023.

arXiv:2305.01939 [pdf, other]

Where We Have Arrived in Proving the Emergence of Sparse Symbolic Concepts in AI Models

Authors: Qihan Ren, Jiayang Gao, Wen Shen, Quanshi Zhang

Abstract: This paper aims to prove the emergence of symbolic concepts in well-trained AI models. We prove that if (1) the high-order derivatives of the model output w.r.t. the input variables are all zero, (2) the AI model can be used on occluded samples and will yield higher confidence when the input sample is less occluded, and (3) the confidence of the AI model does not significantly degrade on occluded… ▽ More This paper aims to prove the emergence of symbolic concepts in well-trained AI models. We prove that if (1) the high-order derivatives of the model output w.r.t. the input variables are all zero, (2) the AI model can be used on occluded samples and will yield higher confidence when the input sample is less occluded, and (3) the confidence of the AI model does not significantly degrade on occluded samples, then the AI model will encode sparse interactive concepts. Each interactive concept represents an interaction between a specific set of input variables, and has a certain numerical effect on the inference score of the model. Specifically, it is proved that the inference score of the model can always be represented as the sum of the interaction effects of all interactive concepts. In fact, we hope to prove that conditions for the emergence of symbolic concepts are quite common. It means that for most AI models, we can usually use a small number of interactive concepts to mimic the model outputs on any arbitrarily masked samples. △ Less

Submitted 3 May, 2023; originally announced May 2023.

arXiv:2303.15182 [pdf, other]

Hybrid Augmented Automated Graph Contrastive Learning

Authors: Yifu Chen, Qianqian Ren, Liu Yong

Abstract: Graph augmentations are essential for graph contrastive learning. Most existing works use pre-defined random augmentations, which are usually unable to adapt to different input graphs and fail to consider the impact of different nodes and edges on graph semantics. To address this issue, we propose a framework called Hybrid Augmented Automated Graph Contrastive Learning (HAGCL). HAGCL consists of a… ▽ More Graph augmentations are essential for graph contrastive learning. Most existing works use pre-defined random augmentations, which are usually unable to adapt to different input graphs and fail to consider the impact of different nodes and edges on graph semantics. To address this issue, we propose a framework called Hybrid Augmented Automated Graph Contrastive Learning (HAGCL). HAGCL consists of a feature-level learnable view generator and an edge-level learnable view generator. The view generators are end-to-end differentiable to learn the probability distribution of views conditioned on the input graph. It insures to learn the most semantically meaningful structure in terms of features and topology, respectively. Furthermore, we propose an improved joint training strategy, which can achieve better results than previous works without resorting to any weak label information in the downstream tasks and extensive evaluation of additional work. △ Less

Submitted 23 March, 2023; originally announced March 2023.

arXiv:2302.13095 [pdf, other]

Bayesian Neural Networks Avoid Encoding Complex and Perturbation-Sensitive Concepts

Authors: Qihan Ren, Huiqi Deng, Yunuo Chen, Siyu Lou, Quanshi Zhang

Abstract: In this paper, we focus on mean-field variational Bayesian Neural Networks (BNNs) and explore the representation capacity of such BNNs by investigating which types of concepts are less likely to be encoded by the BNN. It has been observed and studied that a relatively small set of interactive concepts usually emerge in the knowledge representation of a sufficiently-trained neural network, and such… ▽ More In this paper, we focus on mean-field variational Bayesian Neural Networks (BNNs) and explore the representation capacity of such BNNs by investigating which types of concepts are less likely to be encoded by the BNN. It has been observed and studied that a relatively small set of interactive concepts usually emerge in the knowledge representation of a sufficiently-trained neural network, and such concepts can faithfully explain the network output. Based on this, our study proves that compared to standard deep neural networks (DNNs), it is less likely for BNNs to encode complex concepts. Experiments verify our theoretical proofs. Note that the tendency to encode less complex concepts does not necessarily imply weak representation power, considering that complex concepts exhibit low generalization power and high adversarial vulnerability. The code is available at https://github.com/sjtu-xai-lab/BNN-concepts. △ Less

Submitted 1 December, 2023; v1 submitted 25 February, 2023; originally announced February 2023.

arXiv:2301.12344 [pdf, other]

TJ-FlyingFish: Design and Implementation of an Aerial-Aquatic Quadrotor with Tiltable Propulsion Units

Authors: Xuchen Liu, Minghao Dou, Dongyue Huang, Biao Wang, Jinqiang Cui, Qinyuan Ren, Lihua Dou, Zhi Gao, Jie Chen, Ben M. Chen

Abstract: Aerial-aquatic vehicles are capable to move in the two most dominant fluids, making them more promising for a wide range of applications. We propose a prototype with special designs for propulsion and thruster configuration to cope with the vast differences in the fluid properties of water and air. For propulsion, the operating range is switched for the different mediums by the dual-speed propulsi… ▽ More Aerial-aquatic vehicles are capable to move in the two most dominant fluids, making them more promising for a wide range of applications. We propose a prototype with special designs for propulsion and thruster configuration to cope with the vast differences in the fluid properties of water and air. For propulsion, the operating range is switched for the different mediums by the dual-speed propulsion unit, providing sufficient thrust and also ensuring output efficiency. For thruster configuration, thrust vectoring is realized by the rotation of the propulsion unit around the mount arm, thus enhancing the underwater maneuverability. This paper presents a quadrotor prototype of this concept and the design details and realization in practice. △ Less

Submitted 6 February, 2023; v1 submitted 28 January, 2023; originally announced January 2023.

Comments: 6 pages, 9 figures, accepted to 2023 IEEE International Conference on Robotics and Automation (ICRA)

arXiv:2211.00730 [pdf, other]

Tactile interaction with a robot leads to increased risk-taking

Authors: Qiaoqiao Ren, Tony Belpaeme

Abstract: Tactile interaction plays a crucial role in interactions between people. Touch can, for example, help people calm down and lower physiological stress responses. Consequently, it is believed that tactile and haptic interaction matter also in human-robot interaction. We study if the intensity of the tactile interaction has an impact on people, and do so by studying whether different intensities of t… ▽ More Tactile interaction plays a crucial role in interactions between people. Touch can, for example, help people calm down and lower physiological stress responses. Consequently, it is believed that tactile and haptic interaction matter also in human-robot interaction. We study if the intensity of the tactile interaction has an impact on people, and do so by studying whether different intensities of tactile interaction modulate physiological measures and task performance. We use a paradigm in which a small humanoid robot is used to encourage risk-taking behaviour, relying on peer encouragement to take more risks which might lead to a higher pay-off, but potentially also to higher losses. For this, the Balloon Analogue Risk Task (BART) is used as a proxy for the propensity to take risks. We study four conditions, one control condition in which the task is completed without a robot, and three experimental conditions in which a robot is present that encourages risk-taking behaviour with different degrees of tactile interaction. The results show that both low-intensity and high-intensity tactile interaction increase people's risk-taking behaviour. However, low-intensity tactile interaction increases comfort and lowers stress, whereas high-intensity touch does not. △ Less

Submitted 1 November, 2022; originally announced November 2022.

Comments: 10 pages, 5 figures, conference

MSC Class: International conference of social robotics

arXiv:2206.02424 [pdf]

doi 10.1007/s11554-024-01436-6

Slim-neck by GSConv: A lightweight-design for real-time detector architectures

Authors: Hulin Li, Jun Li, Hanbing Wei, Zheng Liu, Zhenfei Zhan, Qiliang Ren

Abstract: Real-time object detection is significant for industrial and research fields. On edge devices, a giant model is difficult to achieve the real-time detecting requirement and a lightweight model built from a large number of the depth-wise separable convolutional could not achieve the sufficient accuracy. We introduce a new lightweight convolutional technique, GSConv, to lighten the model but maintai… ▽ More Real-time object detection is significant for industrial and research fields. On edge devices, a giant model is difficult to achieve the real-time detecting requirement and a lightweight model built from a large number of the depth-wise separable convolutional could not achieve the sufficient accuracy. We introduce a new lightweight convolutional technique, GSConv, to lighten the model but maintain the accuracy. The GSConv accomplishes an excellent trade-off between the accuracy and speed. Furthermore, we provide a design suggestion based on the GSConv, Slim-Neck (SNs), to achieve a higher computational cost-effectiveness of the real-time detectors. The effectiveness of the SNs was robustly demonstrated in over twenty sets comparative experiments. In particular, the real-time detectors of ameliorated by the SNs obtain the state-of-the-art (70.9% AP50 for the SODA10M at a speed of ~ 100FPS on a Tesla T4) compared with the baselines. Code is available at https://github.com/alanli1997/slim-neck-by-gsconv △ Less

Submitted 1 July, 2024; v1 submitted 6 June, 2022; originally announced June 2022.

arXiv:2206.01044 [pdf, other]

doi 10.1007/978-3-031-19907-3_43

Artificial Open World for Evaluating AGI: a Conceptual Design

Authors: Bowen Xu, Quansheng Ren

Abstract: How to evaluate Artificial General Intelligence (AGI) is a critical problem that is discussed and unsolved for a long period. In the research of narrow AI, this seems not a severe problem, since researchers in that field focus on some specific problems as well as one or some aspects of cognition, and the criteria for evaluation are explicitly defined. By contrast, an AGI agent should solve problem… ▽ More How to evaluate Artificial General Intelligence (AGI) is a critical problem that is discussed and unsolved for a long period. In the research of narrow AI, this seems not a severe problem, since researchers in that field focus on some specific problems as well as one or some aspects of cognition, and the criteria for evaluation are explicitly defined. By contrast, an AGI agent should solve problems that are never-encountered by both agents and developers. However, once a developer tests and debugs the agent with a problem, the never-encountered problem becomes the encountered problem, as a result, the problem is solved by the developers to some extent, exploiting their experience, rather than the agents. This conflict, as we call the trap of developers' experience, leads to that this kind of problems is probably hard to become an acknowledged criterion. In this paper, we propose an evaluation method named Artificial Open World, aiming to jump out of the trap. The intuition is that most of the experience in the actual world should not be necessary to be applied to the artificial world, and the world should be open in some sense, such that developers are unable to perceive the world and solve problems by themselves before testing, though after that they are allowed to check all the data. The world is generated in a similar way as the actual world, and a general form of problems is proposed. A metric is proposed aiming to quantify the progress of research. This paper describes the conceptual design of the Artificial Open World, though the formalization and the implementation are left to the future. △ Less

Submitted 2 June, 2022; originally announced June 2022.

arXiv:2202.10206 [pdf, other]

DECLOAK: Enable Secure and Cheap Multi-Party Transactions on Legacy Blockchains by a Minimally Trusted TEE Network

Authors: Qian Ren, Yue Li, Yingjun Wu, Yuchen Wu, Hong Lei, Lei Wang, Bangdao Chen

Abstract: As the confidentiality and scalability of smart contracts have become a crucial demand of blockchains, off-chain contract execution frameworks have been promising. Some have recently expanded off-chain contracts to Multi-Party Computation (MPC), which seek to transition the on-chain states by off-chain MPC. The most general problem among these solutions is MPT, since its off-chain MPC takes on- an… ▽ More As the confidentiality and scalability of smart contracts have become a crucial demand of blockchains, off-chain contract execution frameworks have been promising. Some have recently expanded off-chain contracts to Multi-Party Computation (MPC), which seek to transition the on-chain states by off-chain MPC. The most general problem among these solutions is MPT, since its off-chain MPC takes on- and off-chain inputs, delivers on- and off-chain outputs, and can be publicly verified by the blockchain, thus capable of covering more scenarios. However, existing Multi-Party Transaction (MPT) solutions lack at least one of data availability, financial fairness, delivery fairness, and delivery atomicity. These properties are crucially valued by communities, e.g., the Ethereum community, or users. Even worse, these solutions require high-cost interactions between the blockchain and off-chain systems. This paper proposes a novel MPT-enabled off-chain contract execution framework, DECLOAK. DECLOAK is the first to achieve data availability of MPT, and our method can apply to other fields that seek to persist user data on-chain. Moreover, DECLOAK solves all mentioned shortcomings with even lower gas costs and weaker assumptions. Specifically, DECLOAK tolerates all but one Byzantine party and TEE executors. Evaluating on 10 MPTs, DECLOAK reduces the gas cost of the SOTA, Cloak, by 65.6%. Consequently, we are the first to not only achieve such level secure MPT in practical assumption, but also demonstrate that evaluating MPT in the comparable gas cost to normal Ethereum transaction is possible. And the cost superiority of DECLOAK increases as the number of MPT parties grows. △ Less

Submitted 22 May, 2023; v1 submitted 21 February, 2022; originally announced February 2022.

Comments: arXiv admin note: text overlap with arXiv:2106.13926

arXiv:2201.10797 [pdf, other]

An Automated Question-Answering Framework Based on Evolution Algorithm

Authors: Sinan Tan, Hui Xue, Qiyu Ren, Huaping Liu, Jing Bai

Abstract: Building a deep learning model for a Question-Answering (QA) task requires a lot of human effort, it may need several months to carefully tune various model architectures and find a best one. It's even harder to find different excellent models for multiple datasets. Recent works show that the best model structure is related to the dataset used, and one single model cannot adapt to all tasks. In th… ▽ More Building a deep learning model for a Question-Answering (QA) task requires a lot of human effort, it may need several months to carefully tune various model architectures and find a best one. It's even harder to find different excellent models for multiple datasets. Recent works show that the best model structure is related to the dataset used, and one single model cannot adapt to all tasks. In this paper, we propose an automated Question-Answering framework, which could automatically adjust network architecture for multiple datasets. Our framework is based on an innovative evolution algorithm, which is stable and suitable for multiple dataset scenario. The evolution algorithm for search combine prior knowledge into initial population and use a performance estimator to avoid inefficient mutation by predicting the performance of candidate model architecture. The prior knowledge used in initial population could improve the final result of the evolution algorithm. The performance estimator could quickly filter out models with bad performance in population as the number of trials increases, to speed up the convergence. Our framework achieves 78.9 EM and 86.1 F1 on SQuAD 1.1, 69.9 EM and 72.5 F1 on SQuAD 2.0. On NewsQA dataset, the found model achieves 47.0 EM and 62.9 F1. △ Less

Submitted 26 January, 2022; originally announced January 2022.

Comments: In Proceedings of the AAAI 2019 Workshop (WS13) on Reasoning and Complex Question-Answering (RCQA-19) https://researcher.watson.ibm.com/researcher/view_group.php?id=9632

arXiv:2201.09907 [pdf, other]

Ordinal-Quadruplet: Retrieval of Missing Classes in Ordinal Time Series

Authors: Jurijs Nazarovs, Cristian Lumezanu, Qianying Ren, Yuncong Chen, Takehiko Mizoguchi, Dongjin Song, Haifeng Chen

Abstract: In this paper, we propose an ordered time series classification framework that is robust against missing classes in the training data, i.e., during testing we can prescribe classes that are missing during training. This framework relies on two main components: (1) our newly proposed ordinal-quadruplet loss, which forces the model to learn latent representation while preserving the ordinal relation… ▽ More In this paper, we propose an ordered time series classification framework that is robust against missing classes in the training data, i.e., during testing we can prescribe classes that are missing during training. This framework relies on two main components: (1) our newly proposed ordinal-quadruplet loss, which forces the model to learn latent representation while preserving the ordinal relation among labels, (2) testing procedure, which utilizes the property of latent representation (order preservation). We conduct experiments based on real world multivariate time series data and show the significant improvement in the prediction of missing labels even with 40% of the classes are missing from training. Compared with the well-known triplet loss optimization augmented with interpolation for missing information, in some cases, we nearly double the accuracy. △ Less

Submitted 24 January, 2022; originally announced January 2022.

arXiv:2201.00402 [pdf, other]

A General Framework for Evaluating Robustness of Combinatorial Optimization Solvers on Graphs

Authors: Han Lu, Zenan Li, Runzhong Wang, Qibing Ren, Junchi Yan, Xiaokang Yang

Abstract: Solving combinatorial optimization (CO) on graphs is among the fundamental tasks for upper-stream applications in data mining, machine learning and operations research. Despite the inherent NP-hard challenge for CO, heuristics, branch-and-bound, learning-based solvers are developed to tackle CO problems as accurately as possible given limited time budgets. However, a practical metric for the sensi… ▽ More Solving combinatorial optimization (CO) on graphs is among the fundamental tasks for upper-stream applications in data mining, machine learning and operations research. Despite the inherent NP-hard challenge for CO, heuristics, branch-and-bound, learning-based solvers are developed to tackle CO problems as accurately as possible given limited time budgets. However, a practical metric for the sensitivity of CO solvers remains largely unexplored. Existing theoretical metrics require the optimal solution which is infeasible, and the gradient-based adversarial attack metric from deep learning is not compatible with non-learning solvers that are usually non-differentiable. In this paper, we develop the first practically feasible robustness metric for general combinatorial optimization solvers. We develop a no worse optimal cost guarantee thus do not require optimal solutions, and we tackle the non-differentiable challenge by resorting to black-box adversarial attack methods. Extensive experiments are conducted on 14 unique combinations of solvers and CO problems, and we demonstrate that the performance of state-of-the-art solvers like Gurobi can degenerate by over 20% under the given time limit bound on the hard instances discovered by our robustness metric, raising concerns about the robustness of combinatorial optimization solvers. △ Less

Submitted 4 June, 2022; v1 submitted 28 December, 2021; originally announced January 2022.

arXiv:2112.14379 [pdf, other]

Background-aware Classification Activation Map for Weakly Supervised Object Localization

Authors: Lei Zhu, Qi She, Qian Chen, Xiangxi Meng, Mufeng Geng, Lujia Jin, Zhe Jiang, Bin Qiu, Yunfei You, Yibao Zhang, Qiushi Ren, Yanye Lu

Abstract: Weakly supervised object localization (WSOL) relaxes the requirement of dense annotations for object localization by using image-level classification masks to supervise its learning process. However, current WSOL methods suffer from excessive activation of background locations and need post-processing to obtain the localization mask. This paper attributes these issues to the unawareness of backgro… ▽ More Weakly supervised object localization (WSOL) relaxes the requirement of dense annotations for object localization by using image-level classification masks to supervise its learning process. However, current WSOL methods suffer from excessive activation of background locations and need post-processing to obtain the localization mask. This paper attributes these issues to the unawareness of background cues, and propose the background-aware classification activation map (B-CAM) to simultaneously learn localization scores of both object and background with only image-level labels. In our B-CAM, two image-level features, aggregated by pixel-level features of potential background and object locations, are used to purify the object feature from the object-related background and to represent the feature of the pure-background sample, respectively. Then based on these two features, both the object classifier and the background classifier are learned to determine the binary object localization mask. Our B-CAM can be trained in end-to-end manner based on a proposed stagger classification loss, which not only improves the objects localization but also suppresses the background activation. Experiments show that our B-CAM outperforms one-stage WSOL methods on the CUB-200, OpenImages and VOC2012 datasets. △ Less

Submitted 28 December, 2021; originally announced December 2021.

arXiv:2111.06236 [pdf, other]

Discovering and Explaining the Representation Bottleneck of DNNs

Authors: Huiqi Deng, Qihan Ren, Hao Zhang, Quanshi Zhang

Abstract: This paper explores the bottleneck of feature representations of deep neural networks (DNNs), from the perspective of the complexity of interactions between input variables encoded in DNNs. To this end, we focus on the multi-order interaction between input variables, where the order represents the complexity of interactions. We discover that a DNN is more likely to encode both too simple interacti… ▽ More This paper explores the bottleneck of feature representations of deep neural networks (DNNs), from the perspective of the complexity of interactions between input variables encoded in DNNs. To this end, we focus on the multi-order interaction between input variables, where the order represents the complexity of interactions. We discover that a DNN is more likely to encode both too simple interactions and too complex interactions, but usually fails to learn interactions of intermediate complexity. Such a phenomenon is widely shared by different DNNs for different tasks. This phenomenon indicates a cognition gap between DNNs and human beings, and we call it a representation bottleneck. We theoretically prove the underlying reason for the representation bottleneck. Furthermore, we propose a loss to encourage/penalize the learning of interactions of specific complexities, and analyze the representation capacities of interactions of different complexities. △ Less

Submitted 7 November, 2022; v1 submitted 11 November, 2021; originally announced November 2021.

arXiv:2111.03549 [pdf, other]

Interpreting Representation Quality of DNNs for 3D Point Cloud Processing

Authors: Wen Shen, Qihan Ren, Dongrui Liu, Quanshi Zhang

Abstract: In this paper, we evaluate the quality of knowledge representations encoded in deep neural networks (DNNs) for 3D point cloud processing. We propose a method to disentangle the overall model vulnerability into the sensitivity to the rotation, the translation, the scale, and local 3D structures. Besides, we also propose metrics to evaluate the spatial smoothness of encoding 3D structures, and the r… ▽ More In this paper, we evaluate the quality of knowledge representations encoded in deep neural networks (DNNs) for 3D point cloud processing. We propose a method to disentangle the overall model vulnerability into the sensitivity to the rotation, the translation, the scale, and local 3D structures. Besides, we also propose metrics to evaluate the spatial smoothness of encoding 3D structures, and the representation complexity of the DNN. Based on such analysis, experiments expose representation problems with classic DNNs, and explain the utility of the adversarial training. △ Less

Submitted 5 November, 2021; originally announced November 2021.

arXiv:2109.10750 [pdf, other]

Control of Pneumatic Artificial Muscles with SNN-based Cerebellar-like Model

Authors: Hongbo Zhang, Yunshuang Li, Yipin Guo, Xinyi Chen, Qinyuan Ren

Abstract: Soft robotics technologies have gained growing interest in recent years, which allows various applications from manufacturing to human-robot interaction. Pneumatic artificial muscle (PAM), a typical soft actuator, has been widely applied to soft robots. The compliance and resilience of soft actuators allow soft robots to behave compliant when interacting with unstructured environments, while the u… ▽ More Soft robotics technologies have gained growing interest in recent years, which allows various applications from manufacturing to human-robot interaction. Pneumatic artificial muscle (PAM), a typical soft actuator, has been widely applied to soft robots. The compliance and resilience of soft actuators allow soft robots to behave compliant when interacting with unstructured environments, while the utilization of soft actuators also introduces nonlinearity and uncertainty. Inspired by Cerebellum's vital functions in control of human's physical movement, a neural network model of Cerebellum based on spiking neuron networks (SNNs) is designed. This model is used as a feed-forward controller in controlling a 1-DOF robot arm driven by PAMs. The simulation results show that this Cerebellar-based system achieves good performance and increases the system's response. △ Less

Submitted 22 September, 2021; originally announced September 2021.

arXiv:2109.02396 [pdf, other]

Byzantine-Robust Federated Learning via Credibility Assessment on Non-IID Data

Authors: Kun Zhai, Qiang Ren, Junli Wang, Chungang Yan

Abstract: Federated learning is a novel framework that enables resource-constrained edge devices to jointly learn a model, which solves the problem of data protection and data islands. However, standard federated learning is vulnerable to Byzantine attacks, which will cause the global model to be manipulated by the attacker or fail to converge. On non-iid data, the current methods are not effective in defen… ▽ More Federated learning is a novel framework that enables resource-constrained edge devices to jointly learn a model, which solves the problem of data protection and data islands. However, standard federated learning is vulnerable to Byzantine attacks, which will cause the global model to be manipulated by the attacker or fail to converge. On non-iid data, the current methods are not effective in defensing against Byzantine attacks. In this paper, we propose a Byzantine-robust framework for federated learning via credibility assessment on non-iid data (BRCA). Credibility assessment is designed to detect Byzantine attacks by combing adaptive anomaly detection model and data verification. Specially, an adaptive mechanism is incorporated into the anomaly detection model for the training and prediction of the model. Simultaneously, a unified update algorithm is given to guarantee that the global model has a consistent direction. On non-iid data, our experiments demonstrate that the BRCA is more robust to Byzantine attacks compared with conventional methods △ Less

Submitted 6 September, 2021; originally announced September 2021.

arXiv:2107.02763 [pdf, other]

Predicting Surface Heat Flux on Complex Systems via Conv-LSTM

Authors: Yinpeng Wang, Nianru Wang, Qiang Ren

Abstract: Existing algorithms with iterations as the principle for 3D inverse heat conduction problems (IHCPs) are usually time-consuming. With the recent advancements in deep learning techniques, it is possible to apply the neural network to compute IHCPs. In this paper, a new framework based on Convolutional-LSTM is introduced to predict the transient heat flux via measured temperature. The inverse heat c… ▽ More Existing algorithms with iterations as the principle for 3D inverse heat conduction problems (IHCPs) are usually time-consuming. With the recent advancements in deep learning techniques, it is possible to apply the neural network to compute IHCPs. In this paper, a new framework based on Convolutional-LSTM is introduced to predict the transient heat flux via measured temperature. The inverse heat conduction models concerned in this work have 3D complex structures with non-linear boundary conditions and thermophysical parameters. In order to reach high precision, a forward solver based on the finite element method is utilized to generate sufficient data for training. The fully trained framework can provide accurate predictions efficiently once the measured temperature and models are acquired. It is believed that the proposed framework offers a new pattern for real-time heat flux inversion. △ Less

Submitted 29 June, 2021; originally announced July 2021.

Comments: 11 pages, 9 figures

arXiv:2106.14928 [pdf, ps, other]

Caching and Computation Offloading in High Altitude Platform Station (HAPS) Assisted Intelligent Transportation Systems

Authors: Qiqi Ren, Omid Abbasi, Gunes Karabulut Kurt, Halim Yanikomeroglu, Jian Chen

Abstract: Edge intelligence, a new paradigm to accelerate artificial intelligence (AI) applications by leveraging computing resources on the network edge, can be used to improve intelligent transportation systems (ITS). However, due to physical limitations and energy-supply constraints, the computing powers of edge equipment are usually limited. High altitude platform station (HAPS) computing can be conside… ▽ More Edge intelligence, a new paradigm to accelerate artificial intelligence (AI) applications by leveraging computing resources on the network edge, can be used to improve intelligent transportation systems (ITS). However, due to physical limitations and energy-supply constraints, the computing powers of edge equipment are usually limited. High altitude platform station (HAPS) computing can be considered as a promising extension of edge computing. HAPS is deployed in the stratosphere to provide wide coverage and strong computational capabilities. It is suitable to coordinate terrestrial resources and store the fundamental data associated with ITS-based applications. In this work, three computing layers,i.e., vehicles, terrestrial network edges, and HAPS, are integrated to build a computation framework for ITS, where the HAPS data library stores the fundamental data needed for the applications. In addition, the caching technique is introduced for network edges to store some of the fundamental data from the HAPS so that large propagation delays can be reduced. We aim to minimize the delay of the system by optimizing computation offloading and caching decisions as well as bandwidth and computing resource allocations. The simulation results highlight the benefits of HAPS computing for mitigating delays and the significance of caching at network edges. △ Less

Submitted 13 January, 2022; v1 submitted 28 June, 2021; originally announced June 2021.

arXiv:2106.13926 [pdf, other]

Cloak: Transitioning States on Legacy Blockchains Using Secure and Publicly Verifiable Off-Chain Multi-Party Computation

Authors: Qian Ren, Yingjun Wu, Han Liu, Yue Li, Anne Victor, Hong Lei, Lei Wang, Bangdao Chen

Abstract: In recent years, the confidentiality of smart contracts has become a fundamental requirement for practical applications. While many efforts have been made to develop architectural capabilities for enforcing confidential smart contracts, a few works arise to extend confidential smart contracts to Multi-Party Computation (MPC), i.e., multiple parties jointly evaluate a transaction off-chain and comm… ▽ More In recent years, the confidentiality of smart contracts has become a fundamental requirement for practical applications. While many efforts have been made to develop architectural capabilities for enforcing confidential smart contracts, a few works arise to extend confidential smart contracts to Multi-Party Computation (MPC), i.e., multiple parties jointly evaluate a transaction off-chain and commit the outputs on-chain without revealing their secret inputs/outputs to each other. However, existing solutions lack public verifiability and require O(n) transactions to enable negotiation or resist adversaries, thus suffering from inefficiency and compromised security. In this paper, we propose Cloak, a framework for enabling Multi-Party Transaction (MPT) on existing blockchains. An MPT refers to transitioning blockchain states by an publicly verifiable off-chain MPC. We identify and handle the challenges of securing MPT by harmonizing TEE and blockchain. Consequently, Cloak secures the off-chain nondeterministic negotiation process (a party joins an MPT without knowing identities or the total number of parties until the MPT proposal settles), achieves public verifiability (the public can validate that the MPT correctly handles the secret inputs/outputs from multiple parties and reads/writes states on-chain), and resists Byzantine adversaries. According to our proof, Cloak achieves better security with only 2 transactions, superior to previous works that achieve compromised security at O(n) transactions cost. By evaluating examples and real-world MPTs, the gas cost of Cloak reduces by 32.4% on average. △ Less

Submitted 10 February, 2023; v1 submitted 25 June, 2021; originally announced June 2021.

Comments: accepted by ACSAC'22

Showing 1–50 of 65 results for author: Ren, Q