GPTQT: Quantize Large Language Models Twice to Push the Efficiency

Authors: Yipin Guo, Yilin Lang, Qinyuan Ren

Abstract: Due to their large size, generative Large Language Models (LLMs) require significant computing and storage resources. This paper introduces a new post-training quantization method, GPTQT, to reduce memory usage and enhance processing speed by expressing the weights of LLMs in 3-bit/2-bit. Practice has shown that minimizing the quantization error of weights is ineffective, leading to overfitting. Therefore, GPTQT employs a progressive two-step approach: initially quantizing weights using linear quantization to a relatively high bit width, followed by converting the obtained int weights to lower-bit binary coding. A re-explore strategy is proposed to optimize the initial scaling factor. During inference, these steps are merged into pure binary coding, enabling efficient computation. Testing across various models and datasets confirms GPTQT's effectiveness. Compared to the strong 3-bit quantization baseline, GPTQT further reduces perplexity by 4.01 on OPT-66B and increases speed by 1.24 times on OPT-30B. The results on Llama2 show that GPTQT is currently the best binary coding quantization method for this kind of LLM.

Submitted 3 July, 2024; originally announced July 2024.

Comments: Accepted by 11th IEEE International Conference on Cybernetics and Intelligent Systems
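
The two-step scheme described in the GPTQT abstract above (linear quantization to a higher bit width, then conversion to a lower-bit binary coding, merged at inference time) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 4-bit intermediate width, the three binary codes, and the greedy sign-based fitting are all assumptions chosen for illustration, and the re-explore strategy for the scaling factor is omitted.

```python
def linear_quantize(weights, bits):
    """Step 1 (assumed form): uniform quantization to integers in [0, 2**bits - 1]."""
    lo, hi = min(weights), max(weights)
    levels = 2 ** bits - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    ints = [round((w - lo) / scale) for w in weights]
    return ints, scale, lo


def greedy_binary_coding(values, num_codes):
    """Step 2 (assumed form): fit values ~ offset + sum_i alpha_i * b_i, b_i in {-1, +1}.

    Each pass takes the sign of the residual as the code and the mean absolute
    residual as its coefficient, which never increases the squared error.
    """
    offset = sum(values) / len(values)
    residual = [v - offset for v in values]
    alphas, codes = [], []
    for _ in range(num_codes):
        alpha = sum(abs(r) for r in residual) / len(residual)
        signs = [1 if r >= 0 else -1 for r in residual]
        residual = [r - alpha * s for r, s in zip(residual, signs)]
        alphas.append(alpha)
        codes.append(signs)
    return offset, alphas, codes


def decode(offset, alphas, codes):
    """Evaluate offset + sum_i alpha_i * b_i for every position."""
    n = len(codes[0])
    return [offset + sum(a * c[i] for a, c in zip(alphas, codes)) for i in range(n)]


def merge_for_inference(offset, alphas, scale, lo):
    """Fold the linear scale and zero point into the binary-coding coefficients."""
    return scale * offset + lo, [scale * a for a in alphas]


if __name__ == "__main__":
    weights = [-0.8, -0.3, 0.1, 0.4, 0.9]
    ints, scale, lo = linear_quantize(weights, bits=4)
    offset, alphas, codes = greedy_binary_coding(ints, num_codes=3)
    merged_offset, merged_alphas = merge_for_inference(offset, alphas, scale, lo)
    print(decode(merged_offset, merged_alphas, codes))
```

Because the dequantization step is affine, folding `scale` and `lo` into the coefficients gives the same reconstruction as decoding the integers first and rescaling afterwards, which is one way to read the abstract's statement that the two steps are merged into pure binary coding.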

arXiv:2407.02881 [pdf, other]

ShiftAddAug: Augment Multiplication-Free Tiny Neural Network with Hybrid Computation

Authors: Yipin Guo, Zihao Li, Yilin Lang, Qinyuan Ren

Abstract: Operators devoid of multiplication, such as Shift and Add, have gained prominence for their compatibility with hardware. However, neural networks (NNs) employing these operators typically exhibit lower accuracy compared to conventional NNs with identical structures. ShiftAddAug uses costly multiplication to augment efficient but less powerful multiplication-free operators, improving performance without any inference overhead. It puts a ShiftAdd tiny NN into a large multiplicative model and encourages it to be trained as a sub-model to obtain additional supervision. In order to solve the weight discrepancy problem between hybrid operators, a new weight sharing method is proposed. Additionally, a novel two-stage neural architecture search is used to obtain better augmentation effects for smaller but stronger multiplication-free tiny neural networks. The superiority of ShiftAddAug is validated through experiments in image classification and semantic segmentation, consistently delivering noteworthy enhancements. Remarkably, it secures up to a 4.95% increase in accuracy on CIFAR100 compared to its directly trained counterparts, even surpassing the performance of multiplicative NNs.

Submitted 3 July, 2024; originally announced July 2024.

Comments: Accepted by 2024 CVPR Workshop: Efficient Deep Learning for Computer Vision

arXiv:2407.02878 [pdf, other]

Efficient Fusion and Task Guided Embedding for End-to-end Autonomous Driving

Authors: Yipin Guo, Yilin Lang, Qinyuan Ren

Abstract: To address the challenges of sensor fusion and safety risk prediction, contemporary closed-loop autonomous driving neural networks leveraging imitation learning typically require a substantial volume of parameters and computational resources to run neural networks. Given the constrained computational capacities of onboard vehicular computers, we introduce a compact yet potent solution named EfficientFuser. This approach employs EfficientViT for visual information extraction and integrates feature maps via cross attention. Subsequently, it utilizes a decoder-only transformer for the amalgamation of multiple features. For prediction purposes, learnable vectors are embedded as tokens to probe the association between the task and sensor features through attention. Evaluated on the CARLA simulation platform, EfficientFuser demonstrates remarkable efficiency, utilizing merely 37.6% of the parameters and 8.7% of the computations compared to the state-of-the-art lightweight method with only 0.4% lower driving score, and the safety score neared that of the leading safety-enhanced method, showcasing its efficacy and potential for practical deployment in autonomous driving systems.

Submitted 3 July, 2024; originally announced July 2024.

Comments: Best Paper Award of the IEEE 13th Data-Driven Control and Learning Systems Conference

arXiv:2406.19853 [pdf, other]

YuLan: An Open-source Large Language Model

Authors: Yutao Zhu, Kun Zhou, Kelong Mao, Wentong Chen, Yiding Sun, Zhipeng Chen, Qian Cao, Yihan Wu, Yushuo Chen, Feng Wang, Lei Zhang, Junyi Li, Xiaolei Wang, Lei Wang, Beichen Zhang, Zican Dong, Xiaoxue Cheng, Yuhan Chen, Xinyu Tang, Yupeng Hou, Qiangqiang Ren, Xincheng Pang, Shufang Xie, Wayne Xin Zhao, Zhicheng Dou, et al. (13 additional authors not shown)

Abstract: Large language models (LLMs) have become the foundation of many applications, leveraging their extensive capabilities in processing and understanding natural language. While many open-source LLMs have been released with technical reports, the lack of training details hinders further research and development. This paper presents the development of YuLan, a series of open-source LLMs with $12$ billion parameters. The base model of YuLan is pre-trained on approximately $1.7$T tokens derived from a diverse corpus, including massive English, Chinese, and multilingual texts. We design a three-stage pre-training method to enhance YuLan's overall capabilities. Subsequent phases of training incorporate instruction-tuning and human alignment, employing a substantial volume of high-quality synthesized data. To facilitate the learning of complex and long-tail knowledge, we devise a curriculum-learning framework across these stages, which helps LLMs learn knowledge in an easy-to-hard manner. YuLan's training was finished in January 2024 and has achieved performance on par with state-of-the-art LLMs across various English and Chinese benchmarks. This paper outlines a comprehensive technical roadmap for developing LLMs from scratch. Our model and code are available at https://github.com/RUC-GSAI/YuLan-Chat.

Submitted 28 June, 2024; originally announced June 2024.

arXiv:2406.13583 [pdf, other]

Low-Rank Mixture-of-Experts for Continual Medical Image Segmentation

Authors: Qian Chen, Lei Zhu, Hangzhou He, Xinliang Zhang, Shuang Zeng, Qiushi Ren, Yanye Lu

Abstract: The primary goal of the continual learning (CL) task in the medical image segmentation field is to solve the "catastrophic forgetting" problem, where the model totally forgets previously learned features when it is extended to new categories (class-level) or tasks (task-level). Due to privacy protection, the historical data labels are inaccessible. Prevalent continual learning methods primarily focus on generating pseudo-labels for old datasets to force the model to memorize the learned features. However, the incorrect pseudo-labels may corrupt the learned features and lead to a new problem: the better the model is trained on the old task, the poorer the model performs on the new tasks. To avoid this problem, we propose a network by introducing the data-specific Mixture of Experts (MoE) structure to handle the new tasks or categories, ensuring that the network parameters of previous tasks are unaffected or only minimally impacted. To further overcome the tremendous memory costs caused by introducing additional structures, we propose a Low-Rank strategy which significantly reduces memory cost. We validate our method on both class-level and task-level continual learning challenges. Extensive experiments on multiple datasets show our model outperforms all other methods.

Submitted 19 June, 2024; originally announced June 2024.
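
The abstract above does not say how its Low-Rank strategy is applied. As a rough illustration of why low-rank structure reduces memory, the sketch below assumes the common factorization of a `d_out x d_in` weight matrix into `A` (`d_out x rank`) times `B` (`rank x d_in`). The function names and the layer sizes in the usage note are hypothetical, not taken from the paper.

```python
def dense_param_count(d_in, d_out):
    """Parameters in a full d_out x d_in weight matrix."""
    return d_in * d_out


def low_rank_param_count(d_in, d_out, rank):
    """Parameters in the factors A (d_out x rank) and B (rank x d_in)."""
    return rank * (d_in + d_out)


def low_rank_apply(A, B, x):
    """Compute (A @ B) @ x as A @ (B @ x), never materializing the full matrix."""
    hidden = [sum(b * xi for b, xi in zip(row, x)) for row in B]   # length: rank
    return [sum(a * h for a, h in zip(row, hidden)) for row in A]  # length: d_out


if __name__ == "__main__":
    d_in, d_out, rank = 512, 512, 8  # hypothetical expert size
    print(dense_param_count(d_in, d_out), low_rank_param_count(d_in, d_out, rank))
```

Under these assumed sizes, each added expert stores `rank * (d_in + d_out)` values instead of `d_in * d_out`, so the saving grows as the rank shrinks relative to the layer width.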

arXiv:2406.11921 [pdf, other]

Rethinking Spatio-Temporal Transformer for Traffic Prediction: Multi-level Multi-view Augmented Learning Framework

Authors: Jiaqi Lin, Qianqian Ren

Abstract: Traffic prediction is a challenging spatio-temporal forecasting problem that involves highly complex spatio-temporal correlations. This paper proposes a Multi-level Multi-view Augmented Spatio-temporal Transformer (LVSTformer) for traffic prediction. The model aims to capture spatial dependencies from three different levels: local geographic, global semantic, and pivotal nodes, along with long- and short-term temporal dependencies. Specifically, we design three spatial augmented views to delve into the spatial information from the perspectives of local, global, and pivotal nodes. By combining three spatial augmented views with three parallel spatial self-attention mechanisms, the model can comprehensively capture spatial dependencies at different levels. We design a gated temporal self-attention mechanism to effectively capture long- and short-term temporal dependencies. Furthermore, a spatio-temporal context broadcasting module is introduced between two spatio-temporal layers to ensure a well-distributed allocation of attention scores, alleviating overfitting and information loss, and enhancing the generalization ability and robustness of the model. A comprehensive set of experiments is conducted on six well-known traffic benchmarks; the experimental results demonstrate that LVSTformer achieves state-of-the-art performance compared to competing baselines, with the maximum improvement reaching up to 4.32%.

Submitted 17 June, 2024; originally announced June 2024.

arXiv:2406.05914 [pdf, other]

Soundscape Captioning using Sound Affective Quality Network and Large Language Model

Authors: Yuanbo Hou, Qiaoqiao Ren, Andrew Mitchell, Wenwu Wang, Jian Kang, Tony Belpaeme, Dick Botteldooren

Abstract: We live in a rich and varied acoustic world, which is experienced by individuals or communities as a soundscape. Computational auditory scene analysis, disentangling acoustic scenes by detecting and classifying events, focuses on objective attributes of sounds, such as their category and temporal characteristics, ignoring the effect of sounds on people and failing to explore the relationship between sounds and the emotions they evoke within a context. To fill this gap and to automate soundscape analysis, which traditionally relies on labour-intensive subjective ratings and surveys, we propose the soundscape captioning (SoundSCap) task. SoundSCap generates context-aware soundscape descriptions by capturing the acoustic scene, event information, and the corresponding human affective qualities. To this end, we propose an automatic soundscape captioner (SoundSCaper) composed of an acoustic model, SoundAQnet, and a general large language model (LLM). SoundAQnet simultaneously models multi-scale information about acoustic scenes, events, and perceived affective qualities, while LLM generates soundscape captions by parsing the information captured by SoundAQnet to a common language. The soundscape caption's quality is assessed by a jury of 16 audio/soundscape experts. The average score (out of 5) of SoundSCaper-generated captions is lower than the score of captions generated by two soundscape experts by 0.21 and 0.25, respectively, on the evaluation set and the model-unknown mixed external dataset with varying lengths and acoustic properties, but the differences are not statistically significant. Overall, SoundSCaper-generated captions show promising performance compared to captions annotated by soundscape experts. The models' code, LLM scripts, human assessment data and instructions, and expert evaluation statistics are all publicly available.

Submitted 9 June, 2024; originally announced June 2024.

Comments: Code: https://github.com/Yuanbo2020/SoundSCaper

arXiv:2405.13025 [pdf, other]

A survey on fairness of large language models in e-commerce: progress, application, and challenge

Authors: Qingyang Ren, Zilin Jiang, Jinghan Cao, Sijia Li, Chiqu Li, Yiyang Liu, Shuning Huo, Tiange He, Yuan Chen

Abstract: This survey explores the fairness of large language models (LLMs) in e-commerce, examining their progress, applications, and the challenges they face. LLMs have become pivotal in the e-commerce domain, offering innovative solutions and enhancing customer experiences. This work presents a comprehensive survey on the applications and challenges of LLMs in e-commerce. The paper begins by introducing the key principles underlying the use of LLMs in e-commerce, detailing the processes of pretraining, fine-tuning, and prompting that tailor these models to specific needs. It then explores the varied applications of LLMs in e-commerce, including product reviews, where they synthesize and analyze customer feedback; product recommendations, where they leverage consumer data to suggest relevant items; product information translation, enhancing global accessibility; and product question and answer sections, where they automate customer support. The paper critically addresses the fairness challenges in e-commerce, highlighting how biases in training data and algorithms can lead to unfair outcomes, such as reinforcing stereotypes or discriminating against certain groups. These issues not only undermine consumer trust, but also raise ethical and legal concerns. Finally, the work outlines future research directions, emphasizing the need for more equitable and transparent LLMs in e-commerce. It advocates for ongoing efforts to mitigate biases and improve the fairness of these systems, ensuring they serve diverse global markets effectively and ethically. Through this comprehensive analysis, the survey provides a holistic view of the current landscape of LLMs in e-commerce, offering insights into their potential and limitations, and guiding future endeavors in creating fairer and more inclusive e-commerce environments.

Submitted 21 June, 2024; v1 submitted 15 May, 2024; originally announced May 2024.

Comments: 21 pages, 9 figures

arXiv:2405.09708 [pdf, ps, other]

doi 10.1109/LRA.2024.3401117

No More Mumbles: Enhancing Robot Intelligibility through Speech Adaptation

Authors: Qiaoqiao Ren, Yuanbo Hou, Dick Botteldooren, Tony Belpaeme

Abstract: Spoken language interaction is at the heart of interpersonal communication, and people flexibly adapt their speech to different individuals and environments. It is surprising that robots, and by extension other digital devices, are not equipped to adapt their speech and instead rely on fixed speech parameters, which often hinder comprehension by the user. We conducted a speech comprehension study involving 39 participants who were exposed to different environmental and contextual conditions. During the experiment, the robot articulated words using different vocal parameters, and the participants were tasked with both recognising the spoken words and rating their subjective impression of the robot's speech. The experiment's primary outcome shows that spaces with good acoustic quality positively correlate with intelligibility and user experience. However, increasing the distance between the user and the robot worsened the user experience, while distracting background sounds significantly reduced speech recognition accuracy and user satisfaction. We next built an adaptive voice for the robot. For this, the robot needs to know how difficult it is for a user to understand spoken language in a particular setting. We present a prediction model that rates how annoying the ambient acoustic environment is and, consequently, how hard it is to understand someone in this setting. Then, we develop a convolutional neural network model to adapt the robot's speech parameters to different users and spaces, while taking into account the influence of ambient acoustics on intelligibility. Finally, we present an evaluation with 27 users, demonstrating superior intelligibility and user experience with adaptive voice parameters compared to fixed voice.

Submitted 15 May, 2024; originally announced May 2024.

Comments: IEEE Robotics and Automation Letters (IEEE RAL)

arXiv:2404.17394 [pdf, other]

Child Speech Recognition in Human-Robot Interaction: Problem Solved?

Authors: Ruben Janssens, Eva Verhelst, Giulio Antonio Abbo, Qiaoqiao Ren, Maria Jose Pinto Bernal, Tony Belpaeme

Abstract: Automated Speech Recognition shows superhuman performance for adult English speech on a range of benchmarks, but disappoints when fed children's speech. This has long sat in the way of child-robot interaction. Recent evolutions in data-driven speech recognition, including the availability of Transformer architectures and unprecedented volumes of training data, might mean a breakthrough for child speech recognition and social robot applications aimed at children. We revisit a study on child speech recognition from 2017 and show that indeed performance has increased, with newcomer OpenAI Whisper doing markedly better than leading commercial cloud services. While transcription is not perfect yet, the best model recognises 60.3% of sentences correctly barring small grammatical differences, with sub-second transcription time running on a local GPU, showing potential for usable autonomous child-robot speech interactions.

Submitted 26 April, 2024; originally announced April 2024.

Comments: Presented at 2024 International Symposium on Technological Advances in Human-Robot Interaction

arXiv:2404.00443 [pdf, ps, other]

UDE-based Dynamic Motion Force Control of Mobile Manipulators

Authors: Songqun Gao, Wendi Ding, Qinyuan Ren, Ben M. Chen

Abstract: Mobile manipulators are known for their superior mobility over manipulators on fixed bases, offering promising applications in smart industry and housekeeping scenarios. However, the dynamic coupling nature between the mobile base and the manipulator presents challenges for the physical interactive tasks of the mobile manipulator. Current methods suffer from complex modeling processes and poor transferability. To address this, this article presents a novel dynamic model of the manipulator on the mobile base that requires only the manipulator dynamics and the kinematic information of the mobile base. In addition, embedding the dynamic model, an uncertainty and disturbance estimator-based (UDE-based) dynamic motion/force control scheme is proposed for the mobile manipulator, which compensates for the dynamic coupling and other unmodeled uncertainties. Passivity and stability analyses justify the proposed control law. Simulation and experimental results on our mobile manipulator platform demonstrate the feasibility and effectiveness of our proposed methodology.

Submitted 30 March, 2024; originally announced April 2024.

arXiv:2403.07865 [pdf, other]

CodeAttack: Revealing Safety Generalization Challenges of Large Language Models via Code Completion

Authors: Qibing Ren, Chang Gao, Jing Shao, Junchi Yan, Xin Tan, Wai Lam, Lizhuang Ma

Abstract: The rapid advancement of Large Language Models (LLMs) has brought about remarkable generative capabilities but also raised concerns about their potential misuse. While strategies like supervised fine-tuning and reinforcement learning from human feedback have enhanced their safety, these methods primarily focus on natural languages, which may not generalize to other domains. This paper introduces CodeAttack, a framework that transforms natural language inputs into code inputs, presenting a novel environment for testing the safety generalization of LLMs. Our comprehensive studies on state-of-the-art LLMs including GPT-4, Claude-2, and Llama-2 series reveal a new and universal safety vulnerability of these models against code input: CodeAttack bypasses the safety guardrails of all models more than 80% of the time. We find that a larger distribution gap between CodeAttack and natural language leads to weaker safety generalization, such as encoding natural language input with data structures. Furthermore, we give our hypotheses about the success of CodeAttack: the misaligned bias acquired by LLMs during code training, prioritizing code completion over avoiding the potential safety risk. Finally, we analyze potential mitigation measures. These findings highlight new safety risks in the code domain and the need for more robust safety alignment algorithms to match the code capabilities of LLMs.

Submitted 9 June, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

Comments: ACL Findings 2024, Code is available at https://github.com/renqibing/CodeAttack

arXiv:2402.10452 [pdf, other]

Khovanov homology and exotic $4$-manifolds

Authors: Qiuyu Ren, Michael Willis

Abstract: We show that the Khovanov-Rozansky $\mathfrak{gl}_2$ skein lasagna module distinguishes the exotic pair of knot traces $X_{-1}(-5_2)$ and $X_{-1}(P(3,-3,-8))$, an example first discovered by Akbulut. This gives the first analysis-free proof of the existence of exotic compact orientable $4$-manifolds. We also present a family of exotic knot traces that seem not directly recoverable from gauge/Floer-theoretic methods. Along the way, we present new explicit calculations of the Khovanov skein lasagna modules, and we define lasagna generalizations of the Lee homology and Rasmussen $s$-invariant, which are of independent interest. Other consequences of our work include a slice obstruction of knots in $4$-manifolds with nonvanishing skein lasagna module, a sharp shake genus bound for some knots from the lasagna $s$-invariant, and a construction of induced maps on Khovanov homology for cobordisms in $k\mathbb{CP}^2$.

Submitted 2 May, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

Comments: 53 pages, 2 figures; updated with corrections and new applications

MSC Class: 57K41 (Primary) 57K18; 57R55; 57R56 (Secondary)

arXiv:2402.01163 [pdf, other]

Enhanced Urban Region Profiling with Adversarial Self-Supervised Learning

Authors: Weiliang Chan, Qianqian Ren, Jinbao Li

Abstract: Urban region profiling is pivotal for smart cities, but mining fine-grained semantics from noisy and incomplete urban data remains challenging. In response, we propose a novel self-supervised graph collaborative filtering model for urban region embedding called EUPAS. Specifically, region heterogeneous graphs containing human mobility data, points of interest (POIs) information, and geographic neighborhood details for each region are fed into the model, which generates region embeddings that preserve intra-region and inter-region dependencies through GCNs and multi-head attention. Meanwhile, we introduce spatial perturbation augmentation to generate positive samples that are semantically similar and spatially close to the anchor, preparing for subsequent contrastive learning. Furthermore, adversarial training is employed to construct an effective pretext task by generating strong positive pairs and mining hard negative pairs for the region embeddings. Finally, we jointly optimize supervised and self-supervised learning to encourage the model to capture the high-level semantics of region embeddings while ignoring the noisy and unimportant details. Extensive experiments on real-world datasets demonstrate the superiority of our model over state-of-the-art methods.

Submitted 2 February, 2024; originally announced February 2024.

arXiv:2401.18057 [pdf, other]

Rank Supervised Contrastive Learning for Time Series Classification

Authors: Qianying Ren, Dongsheng Luo, Dongjin Song

Abstract: Recently, various contrastive learning techniques have been developed to categorize time series data and exhibit promising performance. A general paradigm is to utilize appropriate augmentations and construct feasible positive samples such that the encoder can yield robust and discriminative representations by mapping similar data points closer together in the feature space while pushing dissimilar data points farther apart. Despite its efficacy, the fine-grained relative similarity (e.g., rank) information of positive samples is largely ignored, especially when labeled samples are limited. To this end, we present Rank Supervised Contrastive Learning (RankSCL) to perform time series classification. Different from conventional contrastive learning frameworks, RankSCL augments raw data in a targeted way in the embedding space and adopts certain filtering rules to select more informative positive and negative pairs of samples. Moreover, a novel rank loss is developed to assign different weights for different levels of positive samples, enable the encoder to extract the fine-grained information of the same class, and produce a clear boundary among different classes. Thorough empirical studies on 128 UCR datasets and 30 UEA datasets demonstrate that the proposed RankSCL can achieve state-of-the-art performance compared to existing baseline methods.

Submitted 31 January, 2024; originally announced January 2024.

arXiv:2401.17802 [pdf, other]

doi 10.1016/j.ins.2024.120712

Distillation Enhanced Time Series Forecasting Network with Momentum Contrastive Learning

Authors: Haozhi Gao, Qianqian Ren, Jinbao Li

Abstract: Contrastive representation learning is crucial in time series analysis as it alleviates the issue of data noise and incompleteness as well as sparsity of supervision signal. However, existing contrastive learning frameworks usually focus on intra-temporal features, which fails to fully exploit the intricate nature of time series data. To address this issue, we propose DE-TSMCL, an innovative distillation enhanced framework for long sequence time series forecasting. Specifically, we design a learnable data augmentation mechanism which adaptively learns whether to mask a timestamp to obtain optimized sub-sequences. Then, we propose a contrastive learning task with momentum update to explore inter-sample and intra-temporal correlations of time series to learn the underlying structure feature on the unlabeled time series. Meanwhile, we design a supervised task to learn more robust representations and facilitate the contrastive learning process. Finally, we jointly optimize the above two tasks. By developing model loss from multiple tasks, we can learn effective representations for the downstream forecasting task. Extensive experiments, in comparison with state-of-the-art methods, demonstrate the effectiveness of DE-TSMCL, where the maximum improvement reaches 27.3%.

Submitted 25 June, 2024; v1 submitted 31 January, 2024; originally announced January 2024.

arXiv:2401.15071 [pdf, other]

From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities

Authors: Chaochao Lu, Chen Qian, Guodong Zheng, Hongxing Fan, Hongzhi Gao, Jie Zhang, Jing Shao, Jingyi Deng, Jinlan Fu, Kexin Huang, Kunchang Li, Lijun Li, Limin Wang, Lu Sheng, Meiqi Chen, Ming Zhang, Qibing Ren, Sirui Chen, Tao Gui, Wanli Ouyang, Yali Wang, Yan Teng, Yaru Wang, Yi Wang, Yinan He, et al. (11 additional authors not shown)

Abstract: Multi-modal Large Language Models (MLLMs) have shown impressive abilities in generating reasonable responses with respect to multi-modal contents. However, there is still a wide gap between the performance of recent MLLM-based applications and the expectation of the broad public, even though the most powerful models, OpenAI's GPT-4 and Google's Gemini, have been deployed. This paper strives to enhance understanding of the gap through the lens of a qualitative study on the generalizability, trustworthiness, and causal reasoning capabilities of recent proprietary and open-source MLLMs across four modalities, i.e., text, code, image, and video, ultimately aiming to improve the transparency of MLLMs. We believe these properties are several representative factors that define the reliability of MLLMs in supporting various downstream applications. To be specific, we evaluate the closed-source GPT-4 and Gemini and 6 open-source LLMs and MLLMs. Overall, we evaluate 230 manually designed cases, where the qualitative results are then summarized into 12 scores (i.e., 4 modalities times 3 properties). In total, we uncover 14 empirical findings that are useful to understand the capabilities and limitations of both proprietary and open-source MLLMs, towards more reliable downstream multi-modal applications.

Submitted 29 January, 2024; v1 submitted 26 January, 2024; originally announced January 2024.

arXiv:2401.10485 [pdf, ps, other]

Sign-changing concentration phenomena of an anisotropic sinh-Poisson type equation with a Hardy or Hénon term

Authors: Qiang Ren

Abstract: We consider the following anisotropic sinh-Poisson type equation with a Hardy or Hénon term: $$\begin{cases} -\operatorname{div}(a(x)\nabla u) + a(x)u = \varepsilon^2 a(x)|x-q|^{2\alpha}(e^u-e^{-u}) & \text{in } \Omega, \\ \frac{\partial u}{\partial n} = 0 & \text{on } \partial\Omega, \end{cases}$$ where $\varepsilon>0$, $q\in\bar\Omega\subset\mathbb{R}^2$, $\alpha\in(-1,\infty)\setminus\mathbb{N}$, $\Omega\subset\mathbb{R}^2$ is a smooth bounded domain, $n$ is the unit outward normal vector of $\partial\Omega$, and $a(x)$ is a smooth positive function defined on $\bar\Omega$. Using a finite-dimensional reduction method, we prove that this problem has a sequence of sign-changing solutions with arbitrarily many interior spikes accumulating to $q$, provided $q\in\Omega$ is a local maximizer of $a(x)$. However, if $q\in\partial\Omega$ is a strict local maximum point of $a(x)$ and satisfies $\langle\nabla a(q),n\rangle=0$, we prove that the problem has a family of sign-changing solutions with arbitrarily many mixed interior and boundary spikes accumulating to $q$. Under the same condition, we also construct a sequence of blow-up solutions for the following problem: $$\begin{cases} -\operatorname{div}(a(x)\nabla u) + a(x)u = \varepsilon^2 a(x)|x-q|^{2\alpha}e^u & \text{in } \Omega, \\ \frac{\partial u}{\partial n} = 0 & \text{on } \partial\Omega. \end{cases}$$

Submitted 18 January, 2024; originally announced January 2024.

arXiv:2401.09067 [pdf, other]

Towards Continual Learning Desiderata via HSIC-Bottleneck Orthogonalization and Equiangular Embedding

Authors: Depeng Li, Tianqi Wang, Junwei Chen, Qining Ren, Kenji Kawaguchi, Zhigang Zeng

Abstract: Deep neural networks are susceptible to catastrophic forgetting when trained on sequential tasks. Various continual learning (CL) methods often rely on exemplar buffers and/or network expansion for balancing model stability and plasticity, which, however, compromises their practical value due to privacy and memory concerns. Instead, this paper considers a strict yet realistic setting, where the training data from previous tasks is unavailable and the model size remains relatively constant during sequential training. To achieve such desiderata, we propose a conceptually simple yet effective method that attributes forgetting to layer-wise parameter overwriting and the resulting decision boundary distortion. This is achieved by the synergy between two key components: HSIC-Bottleneck Orthogonalization (HBO) implements non-overwritten parameter updates mediated by the Hilbert-Schmidt independence criterion in an orthogonal space, and EquiAngular Embedding (EAE) enhances decision boundary adaptation between old and new tasks with predefined basis vectors. Extensive experiments demonstrate that our method achieves competitive accuracy performance, even with the absolute superiority of zero exemplar buffer and 1.02x the base model.

Submitted 17 January, 2024; originally announced January 2024.

Comments: Accepted to AAAI 2024

arXiv:2401.08844 [pdf]

Wind tunnel actuation movement system

Authors: Qiaoqiao Ren

Abstract: In this dissertation project, an actuation system was designed for the supersonic wind tunnel at the University of Manchester. The aim of this project is to build a remote control actuation system which could adjust the angle of attack for the aerodynamic shape to save researchers' time and improve the experimental efficiency. This project involves the model supporting system, a six component wind tunnel balance, a control system design, a virtual angle of attack adjustment interface and LabVIEW programming implementation. The angle of attack adjustment range is from -20 to 20 degrees. The three-dimensional model of the mechanical part and its engineering drawing were finished in SolidWorks, and the control system includes the sensors and rotary encoder control, the closed-loop control of the stepper motor and the wind tunnel balance feedback. The performance of the wind tunnel balance can be known in advance by finite element analysis. Finally, the virtual operating system was built based on the LabVIEW and Arduino interactive programs.

Submitted 16 January, 2024; originally announced January 2024.

arXiv:2312.09952 [pdf, other]

Multi-level graph learning for audio event classification and human-perceived annoyance rating prediction

Authors: Yuanbo Hou, Qiaoqiao Ren, Siyang Song, Yuxin Song, Wenwu Wang, Dick Botteldooren

Abstract: WHO's report on environmental noise estimates that 22 M people suffer from chronic annoyance related to noise caused by audio events (AEs) from various sources. Annoyance may lead to health issues and adverse effects on metabolic and cognitive systems. In cities, monitoring noise levels does not provide insights into noticeable AEs, let alone their relations to annoyance. To create annoyance-related monitoring, this paper proposes a graph-based model to identify AEs in a soundscape, and explore relations between diverse AEs and human-perceived annoyance rating (AR). Specifically, this paper proposes a lightweight multi-level graph learning (MLGL) based on local and global semantic graphs to simultaneously perform audio event classification (AEC) and human annoyance rating prediction (ARP). Experiments show that: 1) MLGL with 4.1 M parameters improves AEC and ARP results by using semantic node information in local and global context aware graphs; 2) MLGL captures relations between coarse and fine-grained AEs and AR well; 3) Statistical analysis of MLGL results shows that some AEs from different sources significantly correlate with AR, which is consistent with previous research on human perception of these sound sources.

Submitted 15 December, 2023; originally announced December 2023.

Comments: Accepted by ICASSP 2024

arXiv:2311.09940 [pdf, ps, other]

On the Weisfeiler algorithm of depth-$1$ stabilization

Authors: Gang Chen, Qing Ren, Ilia Ponomarenko

Abstract: An origin of the multidimensional Weisfeiler-Leman algorithm goes back to a refinement procedure of deep stabilization, introduced by B. Weisfeiler in a paper included in the collective monograph "On construction and identification of graphs" (1976). This procedure is recursive and the recursion starts from an algorithm of depth-$1$ stabilization, which has never been discussed in the literature. A goal of the present paper is to show that a simplified algorithm of the depth-$1$ stabilization has the same power as the $3$-dimensional Weisfeiler-Leman algorithm. It is proved that the class of coherent configurations obtained at the output of this simplified algorithm coincides with the class introduced earlier by the third author. As an application, we also prove that if there exist at least two nonisomorphic projective planes of order $q$, then the Weisfeiler-Leman dimension of the incidence graph of any projective plane of order $q$ is at least $4$.

Submitted 16 November, 2023; originally announced November 2023.

Comments: 24 pages

MSC Class: 05E16

arXiv:2311.09030 [pdf]

doi 10.1121/10.0022408

AI-based soundscape analysis: Jointly identifying sound sources and predicting annoyance

Authors: Yuanbo Hou, Qiaoqiao Ren, Huizhong Zhang, Andrew Mitchell, Francesco Aletta, Jian Kang, Dick Botteldooren

Abstract: Soundscape studies typically attempt to capture the perception and understanding of sonic environments by surveying users. However, for long-term monitoring or assessing interventions, sound-signal-based approaches are required. To this end, most previous research focused on psycho-acoustic quantities or automatic sound recognition. Few attempts were made to include appraisal (e.g., in circumplex frameworks). This paper proposes an artificial intelligence (AI)-based dual-branch convolutional neural network with cross-attention-based fusion (DCNN-CaF) for automatic soundscape characterization, including sound recognition and appraisal. Using the DeLTA dataset containing human-annotated sound source labels and perceived annoyance, the DCNN-CaF is proposed to perform sound source classification (SSC) and human-perceived annoyance rating prediction (ARP). Experimental findings indicate that (1) the proposed DCNN-CaF using loudness and Mel features outperforms the DCNN-CaF using only one of them. (2) The proposed DCNN-CaF with cross-attention fusion outperforms other typical AI-based models and soundscape-related traditional machine learning methods on the SSC and ARP tasks. (3) Correlation analysis reveals that the relationship between sound sources and annoyance is similar for humans and the proposed AI-based DCNN-CaF model. (4) Generalization tests show that the proposed model's ARP in the presence of model-unknown sound sources is consistent with expert expectations and can explain previous findings from the literature on soundscape augmentation.

Submitted 15 November, 2023; originally announced November 2023.

Comments: The Journal of the Acoustical Society of America, 154 (5), 3145

Journal ref: The Journal of the Acoustical Society of America, 154, 3145 (2023)

arXiv:2310.19846 [pdf, other]

A high-Q metasurface signal isolator for 1.5T surface coil magnetic resonance imaging on the go

Authors: Qun Ren, Yuxin Lang, Yuqi Ja, Xia Xiao, Yu Liu, Xiangzheng Kong, Ruiqi Jin, Yongqing He, Jianwei You, Wei Sha, Yanwei Pang

Abstract: The combination of surface coils and metamaterials remarkably enhances magnetic resonance imaging (MRI) performance for significant local staging flexibility. However, the coupling between them impedes the signal-to-noise ratio (SNR) and low-contrast resolution, further hampering the future growth in clinical MRI. In this paper, we propose a high-Q metasurface decoupling isolator fueled by topological LC loops for a 1.5T surface coil MRI system, increasing the magnetic field up to fivefold at 63.8 MHz. We have employed a polarization conversion mechanism to effectively eliminate the coupling between the MRI metamaterial and the radio frequency (RF) surface transmitter-receiver coils. Furthermore, a high-Q metasurface isolator was achieved by taking advantage of bound states in the continuum (BIC) for extremely high-field MRI and spectroscopy. An equivalent physical model of the miniaturized metasurface design was put forward through LC circuit analysis. This study opens up a promising route for easy-to-use and portable surface coil MRI scanners.

Submitted 30 October, 2023; originally announced October 2023.

arXiv:2310.13347 [pdf, other]

NurViD: A Large Expert-Level Video Database for Nursing Procedure Activity Understanding

Authors: Ming Hu, Lin Wang, Siyuan Yan, Don Ma, Qingli Ren, Peng Xia, Wei Feng, Peibo Duan, Lie Ju, Zongyuan Ge

Abstract: The application of deep learning to nursing procedure activity understanding has the potential to greatly enhance the quality and safety of nurse-patient interactions. By utilizing the technique, we can facilitate training and education, improve quality control, and enable operational compliance monitoring. However, the development of automatic recognition systems in this field is currently hindered by the scarcity of appropriately labeled datasets. The existing video datasets pose several limitations: 1) these datasets are too small in scale to support comprehensive investigations of nursing activity; 2) they primarily focus on single procedures, lacking expert-level annotations for various nursing procedures and action steps; and 3) they lack temporally localized annotations, which prevents the effective localization of targeted actions within longer video sequences. To mitigate these limitations, we propose NurViD, a large video dataset with expert-level annotation for nursing procedure activity understanding. NurViD consists of over 1.5k videos totaling 144 hours, making it approximately four times longer than the existing largest nursing activity datasets. Notably, it encompasses 51 distinct nursing procedures and 177 action steps, providing a much more comprehensive coverage compared to existing datasets that primarily focus on limited procedures.
To evaluate the efficacy of current deep learning methods on nursing activity understanding, we establish three benchmarks on NurViD: procedure recognition on untrimmed videos, procedure and action recognition on trimmed videos, and action detection. Our benchmark and code will be available at https://github.com/minghu0830/NurViD-benchmark.

Submitted 20 October, 2023; originally announced October 2023.

Comments: Accepted by NeurIPS 2023 Datasets and Benchmarks Track

arXiv:2309.11907 [pdf, other]

Learning to Recover for Safe Reinforcement Learning

Authors: Haoyu Wang, Xin Yuan, Qinqing Ren

Abstract: Safety controllers are widely used to achieve safe reinforcement learning. Most methods that apply a safety controller use handcrafted safety constraints to construct it. However, when the environment dynamics are sophisticated, handcrafted safety constraints become unavailable. Therefore, it is worth researching how to construct safety controllers with learning algorithms. We propose a three-stage architecture for safe reinforcement learning, namely the TU-Recovery Architecture. A safety critic and a recovery policy are learned before task training. They form a safety controller to ensure safety in task training. Then a phenomenon induced by disagreement between the task policy and the recovery policy, called the adversarial phenomenon, which reduces learning efficiency and model performance, is described. An auxiliary reward is proposed to mitigate the adversarial phenomenon while helping the task policy learn to recover from high-risk states. A series of experiments are conducted in a robot navigation environment. Experiments demonstrate that TU-Recovery outperforms its unconstrained counterpart in both reward gaining and constraint violations during task training, and that the auxiliary reward further improves TU-Recovery in reward-to-cost ratio by significantly reducing constraint violations.

Submitted 21 September, 2023; originally announced September 2023.

arXiv:2309.11876 [pdf, other]

Multi-level Asymmetric Contrastive Learning for Volumetric Medical Image Segmentation Pre-training

Authors: Shuang Zeng, Lei Zhu, Xinliang Zhang, Qian Chen, Hangzhou He, Lujia Jin, Zifeng Tian, Qiushi Ren, Zhaoheng Xie, Yanye Lu

Abstract: Medical image segmentation is a fundamental yet challenging task due to the arduous process of acquiring large volumes of high-quality labeled data from experts. Contrastive learning offers a promising but still problematic solution to this dilemma. Existing medical contrastive learning strategies focus on extracting image-level representations, ignoring abundant multi-level representations. They also underutilize the decoder, either by random initialization or by pre-training it separately from the encoder, thereby neglecting the potential collaboration between the encoder and decoder. To address these issues, we propose a novel multi-level asymmetric contrastive learning framework named MACL for volumetric medical image segmentation pre-training. Specifically, we design an asymmetric contrastive learning structure to pre-train the encoder and decoder simultaneously to provide better initialization for segmentation models. Moreover, we develop a multi-level contrastive learning strategy that integrates correspondences across feature-level, image-level, and pixel-level representations to ensure the encoder and decoder capture comprehensive details from representations of varying scales and granularities during the pre-training phase. Finally, experiments on 12 volumetric medical image datasets indicate that our MACL framework outperforms 11 existing contrastive learning strategies.
Specifically, our MACL achieves superior performance, with more precise predictions in visualization figures and an Average Dice 2.28%, 1.32%, 1.62% and 1.60% higher than the previous best results on CHD, MMWHS, CHAOS and AMOS, respectively. Our MACL also has a strong generalization ability among 5 variant U-Net backbones. Our code will be available at https://github.com/stevezs315/MACL.

Submitted 13 May, 2024; v1 submitted 21 September, 2023; originally announced September 2023.

arXiv:2309.08854 [pdf, other]

Intention-Aware Planner for Robust and Safe Aerial Tracking

Authors: Qiuyu Ren, Huan Yu, Jiajun Dai, Zhi Zheng, Jun Meng, Li Xu, Chao Xu, Fei Gao, Yanjun Cao

Abstract: Autonomous target tracking with quadrotors has wide applications in many scenarios, such as cinematographic follow-up shooting or suspect chasing. Target motion prediction is necessary when designing the tracking planner. However, the widely used constant velocity or constant rotation assumption cannot fully capture the dynamics of the target. The tracker may fail when the target happens to move aggressively, such as in a sudden turn or deceleration. In this paper, we propose an intention-aware planner by additionally considering the intention of the target to enhance safety and robustness in aerial tracking applications. Firstly, a designated intention prediction method is proposed, which combines a user-defined potential assessment function and a state observation function. A reachable region is generated to specifically evaluate the turning intentions. Then we design an intention-driven hybrid A* method to predict the future possible positions for the target. Finally, an intention-aware optimization approach is designed to generate a spatial-temporal optimal trajectory, allowing the tracker to perceive unexpected situations from the target. Benchmark comparisons and real-world experiments are conducted to validate the performance of our method.

Submitted 20 March, 2024; v1 submitted 15 September, 2023; originally announced September 2023.

Comments: 8 pages, 10 figures, submitted to 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

arXiv:2309.06912 [pdf, other]

Multi-behavior Recommendation with SVD Graph Neural Networks

Authors: Shengxi Fu, Qianqian Ren, Xingfeng Lv, Jinbao Li

Abstract: Graph Neural Networks (GNNs) have been extensively employed in the field of recommendation systems, offering users personalized recommendations and yielding remarkable outcomes. Recently, GNNs incorporating contrastive learning have demonstrated promising performance in handling the sparse data problem of recommendation systems. However, existing contrastive learning methods still have limitations in resisting noise interference, especially for multi-behavior recommendation. To mitigate the aforementioned issues, this paper proposes a GNN-based multi-behavior recommendation model called MB-SVD that utilizes Singular Value Decomposition (SVD) graphs to enhance model performance. In particular, MB-SVD considers user preferences across different behaviors, improving recommendation effectiveness. First, MB-SVD integrates the representation of users and items under different behaviors with learnable weight scores, which efficiently considers the influence of different behaviors. Then, MB-SVD generates augmented graph representation with global collaborative relations. Next, we simplify the contrastive learning framework by directly contrasting original representation with the enhanced representation using the InfoNCE loss. Through extensive experimentation, the remarkable performance of our proposed MB-SVD approach in multi-behavior recommendation endeavors across diverse real-world datasets is exhibited.
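The InfoNCE loss named in the abstract can be illustrated with a minimal sketch that contrasts one original representation against its enhanced counterpart (the positive) and against other representations (negatives). The abstract does not state the similarity function, temperature, or how negatives are chosen; cosine similarity, the temperature value, and all function names below are assumptions for illustration, not MB-SVD's implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.2):
    """InfoNCE loss for one anchor: -log softmax score of the positive pair.

    The positive logit is placed first; the loss is the log-sum-exp of all
    logits minus the positive logit (computed stably by subtracting the max).
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    top = max(logits)
    log_denominator = top + math.log(sum(math.exp(l - top) for l in logits))
    return log_denominator - logits[0]


# Hypothetical usage: an enhanced view that stays close to the original
# representation yields a lower loss than one that resembles a negative.
original = [1.0, 0.0]
enhanced = [0.9, 0.1]
others = [[0.0, 1.0], [-1.0, 0.0]]
loss = info_nce(original, enhanced, others)
```

The loss falls as the positive pair becomes more similar than the negatives, which is the property a contrastive objective of this kind relies on.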

Submitted 9 May, 2024; v1 submitted 13 September, 2023; originally announced September 2023.

arXiv:2309.02741 [pdf, other]

Toroidal Hitomezashi Patterns

Authors: Qiuyu Ren, Shengtong Zhang

Abstract: Extending a proposal of Defant and Kravitz [Discrete Mathematics, 1, 347 (2024)], we define Hitomezashi patterns and loops on a torus and provide several structural results for such loops. For a given pattern, our main theorems give optimal residual information regarding the Hitomezashi loop length, loop count, as well as possible homology classes of such loops. Special attention is paid to toroidal Hitomezashi patterns that are symmetric with respect to the diagonal $x = y$, where we establish a novel connection between Hitomezashi and knot theory.

Submitted 13 January, 2024; v1 submitted 6 September, 2023; originally announced September 2023.

Comments: 20 pages, 11 figures

MSC Class: 00A08; 00A66; 05B50

arXiv:2308.15150 [pdf, other]

Unleashing the Potential of Spiking Neural Networks for Sequential Modeling with Contextual Embedding

Authors: Xinyi Chen, Jibin Wu, Huajin Tang, Qinyuan Ren, Kay Chen Tan

Abstract: The human brain exhibits remarkable abilities in integrating temporally distant sensory inputs for decision-making. However, existing brain-inspired spiking neural networks (SNNs) have struggled to match their biological counterpart in modeling long-term temporal relationships. To address this problem, this paper presents a novel Contextual Embedding Leaky Integrate-and-Fire (CE-LIF) spiking neuron model. Specifically, the CE-LIF model incorporates a meticulously designed contextual embedding component into the adaptive neuronal firing threshold, thereby enhancing the memory storage of spiking neurons and facilitating effective sequential modeling. Additionally, theoretical analysis is provided to elucidate how the CE-LIF model enables long-term temporal credit assignment. Remarkably, when compared to state-of-the-art recurrent SNNs, feedforward SNNs comprising the proposed CE-LIF neurons demonstrate superior performance across extensive sequential modeling tasks in terms of classification accuracy, network convergence speed, and memory capacity.

Submitted 29 August, 2023; originally announced August 2023.

arXiv:2308.11980 [pdf, other]

Joint Prediction of Audio Event and Annoyance Rating in an Urban Soundscape by Hierarchical Graph Representation Learning

Authors: Yuanbo Hou, Siyang Song, Cheng Luo, Andrew Mitchell, Qiaoqiao Ren, Weicheng Xie, Jian Kang, Wenwu Wang, Dick Botteldooren

Abstract: Sound events in daily life carry rich information about the objective world. The composition of these sounds affects the mood of people in a soundscape. Most previous approaches only focus on classifying and detecting audio events and scenes, but may ignore their perceptual quality that may impact humans' listening mood for the environment, e.g. annoyance. To this end, this paper proposes a novel hierarchical graph representation learning (HGRL) approach which links objective audio events (AE) with subjective annoyance ratings (AR) of the soundscape perceived by humans. The hierarchical graph consists of fine-grained event (fAE) embeddings with single-class event semantics, coarse-grained event (cAE) embeddings with multi-class event semantics, and AR embeddings. Experiments show the proposed HGRL successfully integrates AE with AR for AEC and ARP tasks, while coordinating the relations between cAE and fAE and further aligning the two different grains of AE information with the AR.

Submitted 23 August, 2023; originally announced August 2023.

Comments: INTERSPEECH 2023, Code and models: https://github.com/Yuanbo2020/HGRL

arXiv:2308.04949 [pdf, other]

Branches Mutual Promotion for End-to-End Weakly Supervised Semantic Segmentation

Authors: Lei Zhu, Hangzhou He, Xinliang Zhang, Qian Chen, Shuang Zeng, Qiushi Ren, Yanye Lu

Abstract: End-to-end weakly supervised semantic segmentation aims at optimizing a segmentation model in a single-stage training process based on only image annotations. Existing methods adopt an online-trained classification branch to provide pseudo annotations for supervising the segmentation branch. However, this strategy makes the classification branch dominate the whole concurrent training process, hindering these two branches from assisting each other. In our work, we treat these two branches equally by viewing them as diverse ways to generate the segmentation map, and add interactions on both their supervision and operation to achieve mutual promotion. For this purpose, a bidirectional supervision mechanism is elaborated to force the consistency between the outputs of these two branches. Thus, the segmentation branch can also give feedback to the classification branch to enhance the quality of localization seeds. Moreover, our method also designs interaction operations between these two branches to exchange their knowledge to assist each other. Experiments indicate our work outperforms existing end-to-end weakly supervised segmentation methods.

Submitted 9 August, 2023; originally announced August 2023.

arXiv:2307.08848 [pdf]

Microbiome-derived bile acids contribute to elevated antigenic response and bone erosion in rheumatoid arthritis

Authors: Xiuli Su, Xiaona Li, Yanqin Bian, Qing Ren, Leiguang Li, Xiaohao Wu, Hemi Luan, Bing He, Xiaojuan He, Hui Feng, Xingye Cheng, Pan-Jun Kim, Leihan Tang, Aiping Lu, Lianbo Xiao, Liang Tian, Zhu Yang, Zongwei Cai

Abstract: Rheumatoid arthritis (RA) is a chronic, disabling and incurable autoimmune disease. It has been widely recognized that gut microbial dysbiosis is an important contributor to the pathogenesis of RA, although distinct alterations in microbiota have been associated with this disease. Yet, the metabolites that mediate the impacts of the gut microbiome on RA are less well understood. Here, with microbial profiling and non-targeted metabolomics, we revealed profound yet diverse perturbation of the gut microbiome and metabolome in RA patients in a discovery set. In the Bacteroides-dominated RA patients, differentiation of gut microbiome resulted in distinct bile acid profiles compared to healthy subjects. Predominated Bacteroides species expressing BSH and 7a-HSDH increased, leading to elevated secondary bile acid production in this subgroup of RA patients. Reduced serum fibroblast growth factor-19 and dysregulated bile acids were evidence of impaired farnesoid X receptor-mediated signaling in the patients. This gut microbiota-bile acid axis was correlated to ACPA. The patients from the validation sets demonstrated that ACPA-positive patients have more abundant bacteria expressing BSH and 7a-HSDH but less Clostridium scindens expressing 7a-dehydroxylation enzymes, together with dysregulated microbial bile acid metabolism and more severe bone erosion than ACPA-negative ones. Mediation analyses revealed putative causal relationships between the gut microbiome, bile acids, and ACPA-positive RA, supporting a potential causal effect of Bacteroides species in increasing levels of ACPA and bone erosion mediated via disturbing bile acid metabolism. These results provide insights into the role of gut dysbiosis in RA in a manifestation-specific manner, as well as the functions of bile acids in this gut-joint axis, which may be a potential intervention target for precisely controlling RA conditions.

Submitted 14 July, 2023; originally announced July 2023.

Comments: 38 pages, 6 figures

arXiv:2307.03212 [pdf, other]

Attentive Graph Enhanced Region Representation Learning

Authors: Weiliang Chen, Qianqian Ren, Jinbao Li

Abstract: Representing urban regions accurately and comprehensively is essential for various urban planning and analysis tasks. Recently, with the expansion of the city, modeling long-range spatial dependencies with multiple data sources plays an important role in urban region representation. In this paper, we propose the Attentive Graph Enhanced Region Representation Learning (ATGRL) model, which aims to capture comprehensive dependencies from multiple graphs and learn rich semantic representations of urban regions. Specifically, we propose a graph-enhanced learning module to construct regional graphs by incorporating mobility flow patterns, point-of-interest (POI) functions, and check-in semantics with noise filtering. Then, we present a multi-graph aggregation module to capture both local and global spatial dependencies between regions by integrating information from multiple graphs. In addition, we design a dual-stage fusion module to facilitate information sharing between different views and efficiently fuse multi-view representations for urban region embedding using an improved linear attention mechanism. Finally, extensive experiments on real-world datasets for three downstream tasks demonstrate the superior performance of our model compared to state-of-the-art methods.

Submitted 31 May, 2024; v1 submitted 6 July, 2023; originally announced July 2023.

arXiv:2306.17816 [pdf, ps, other]

Slice genus bound in $DTS^2$ from $s$-invariant

Authors: Qiuyu Ren

Abstract: We prove a recent conjecture of Manolescu-Willis which states that the $s$-invariant of a knot in $\mathbb{RP}^3$ (as defined by them) gives a lower bound on its null-homologous slice genus in the unit disk bundle of $TS^2$. We also conjecture a lower bound in the more general case where the slice surface is not necessarily null-homologous, and give its proof in some special cases.

Submitted 30 June, 2023; originally announced June 2023.

Comments: 8 pages

MSC Class: 57K18 (Primary) 57K10; 57K40 (Secondary)

arXiv:2306.09718 [pdf, ps, other]

Label-noise-tolerant medical image classification via self-attention and self-supervised learning

Authors: Hongyang Jiang, Mengdi Gao, Yan Hu, Qiushi Ren, Zhaoheng Xie, Jiang Liu

Abstract: Deep neural networks (DNNs) have been widely applied in medical image classification and achieve remarkable classification performance. These achievements heavily depend on large-scale accurately annotated training data. However, label noise is inevitably introduced in the medical image annotation, as the labeling process heavily relies on the expertise and experience of annotators. Meanwhile, DNNs suffer from overfitting noisy labels, degrading the performance of models. Therefore, in this work, we devise a noise-robust training approach to mitigate the adverse effects of noisy labels in medical image classification. Specifically, we incorporate contrastive learning and intra-group attention mixup strategies into vanilla supervised learning. The contrastive learning for the feature extractor helps to enhance the visual representation of DNNs. The intra-group attention mixup module constructs groups and assigns self-attention weights for group-wise samples, and subsequently interpolates massive noise-suppressed samples through a weighted mixup operation. We conduct comparative experiments on both synthetic and real-world noisy medical datasets under various noise levels. Rigorous experiments validate that our noise-robust method with contrastive learning and attention mixup can effectively handle label noise, and is superior to state-of-the-art methods. An ablation study also shows that both components contribute to boosting model performance. The proposed method demonstrates its capability to curb label noise and has certain potential for real-world clinical applications.

Submitted 16 June, 2023; originally announced June 2023.

Comments: 11 pages, 8 figures

arXiv:2305.16089 [pdf, other]

Lee filtration structure of torus links

Authors: Qiuyu Ren

Abstract: We determine the quantum filtration structure of the Lee homology of all torus links. In particular, this determines the $s$-invariant of a torus link equipped with any orientation. In the special case $T(n,n)$, our result confirms a conjecture of Pardon, as well as a conjecture of Manolescu-Marengon-Sarkar-Willis which establishes an adjunction-type inequality of the $s$-invariant for cobordisms in $k\overline{\mathbb{CP}^2}$. We also give a few applications of this adjunction inequality.

Submitted 20 January, 2024; v1 submitted 25 May, 2023; originally announced May 2023.

Comments: 22 pages, 4 figures in color; final version, to appear in Geometry and Topology

MSC Class: 57K18 (Primary) 57K10; 57K40 (Secondary)

arXiv:2305.08062 [pdf, other]

Off-Policy Evaluation for Large Action Spaces via Conjunct Effect Modeling

Authors: Yuta Saito, Qingyang Ren, Thorsten Joachims

Abstract: We study off-policy evaluation (OPE) of contextual bandit policies for large discrete action spaces where conventional importance-weighting approaches suffer from excessive variance. To circumvent this variance issue, we propose a new estimator, called OffCEM, that is based on the conjunct effect model (CEM), a novel decomposition of the causal effect into a cluster effect and a residual effect. OffCEM applies importance weighting only to action clusters and addresses the residual causal effect through model-based reward estimation. We show that the proposed estimator is unbiased under a new condition, called local correctness, which only requires that the residual-effect model preserves the relative expected reward differences of the actions within each cluster. To best leverage the CEM and local correctness, we also propose a new two-step procedure for performing model-based estimation that minimizes bias in the first step and variance in the second step. We find that the resulting OffCEM estimator substantially improves bias and variance compared to a range of conventional estimators. Experiments demonstrate that OffCEM provides substantial improvements in OPE especially in the presence of many actions.

Submitted 2 June, 2023; v1 submitted 14 May, 2023; originally announced May 2023.

Comments: accepted at ICML2023. arXiv admin note: text overlap with arXiv:2202.06317

arXiv:2305.04196 [pdf, ps, other]

doi 10.1109/TWC.2023.3266344

Handoff-Aware Distributed Computing in High Altitude Platform Station (HAPS)-Assisted Vehicular Networks

Authors: Qiqi Ren, Omid Abbasi, Gunes Karabulut Kurt, Halim Yanikomeroglu, Jian Chen

Abstract: Distributed computing enables Internet of Vehicles (IoV) services by collaboratively utilizing the computing resources from the network edge and the vehicles. However, the computing interruption issue caused by frequent edge network handoffs, and a severe shortage of computing resources are two problems in providing IoV services. High altitude platform station (HAPS) computing can be a promising addition to existing distributed computing frameworks because of its wide coverage and strong computational capabilities. In this regard, this paper proposes an adaptive scheme in a new distributed computing framework that involves HAPS computing to deal with the two problems of the IoV. Based on the diverse demands of vehicles, network dynamics, and the time-sensitivity of handoffs, the proposed scheme flexibly divides each task into three parts and assigns them to the vehicle, roadside units (RSUs), and a HAPS to perform synchronous computing. The scheme also constrains the computing of tasks at RSUs such that they are completed before handoffs to avoid the risk of computing interruptions. On this basis, we formulate a delay minimization problem that considers task-splitting ratio, transmit power, bandwidth allocation, and computing resource allocation. To solve the problem, a method based on variable replacement and successive convex approximation is proposed. The simulation results show that this scheme not only avoids the negative effects caused by handoffs in a flexible manner, but also takes delay performance into account and maintains the delay stability.

Submitted 7 May, 2023; originally announced May 2023.

arXiv:2305.01939 [pdf, other]

Where We Have Arrived in Proving the Emergence of Sparse Symbolic Concepts in AI Models

Authors: Qihan Ren, Jiayang Gao, Wen Shen, Quanshi Zhang

Abstract: This paper aims to prove the emergence of symbolic concepts in well-trained AI models. We prove that if (1) the high-order derivatives of the model output w.r.t. the input variables are all zero, (2) the AI model can be used on occluded samples and will yield higher confidence when the input sample is less occluded, and (3) the confidence of the AI model does not significantly degrade on occluded samples, then the AI model will encode sparse interactive concepts. Each interactive concept represents an interaction between a specific set of input variables, and has a certain numerical effect on the inference score of the model. Specifically, it is proved that the inference score of the model can always be represented as the sum of the interaction effects of all interactive concepts. In fact, we hope to prove that conditions for the emergence of symbolic concepts are quite common. It means that for most AI models, we can usually use a small number of interactive concepts to mimic the model outputs on any arbitrarily masked samples.

Submitted 3 May, 2023; originally announced May 2023.

arXiv:2304.13914 [pdf]

doi 10.1103/PhysRevLett.130.216004

Evidence for Band Renormalizations in Strong-coupling Superconducting Alkali-fulleride Films

Authors: J. S. Zhou, R. Z. Xu, X. Q. Yu, F. J. Cheng, W. X. Zhao, X. Du, S. Z. Wang, Q. Q. Zhang, X. Gu, S. M. He, Y. D. Li, M. Q. Ren, X. C. Ma, Q. K. Xue, Y. L. Chen, C. L. Song, L. X. Yang

Abstract: There has been a long-standing debate about the mechanism of the unusual superconductivity in alkali-intercalated fulleride superconductors. In this work, using high-resolution angle-resolved photoemission spectroscopy, we systematically investigate the electronic structures of superconducting K3C60 thin films. We observe a dispersive energy band crossing the Fermi level with an occupied bandwidth of about 130 meV. The measured band structure shows prominent quasiparticle kinks and a replica band involving high-energy Jahn-Teller active Hg(8) phonon mode, reflecting strong electron-phonon coupling in the system. The electron-phonon coupling constant is estimated to be about 1.2, which dominates the quasiparticle mass renormalization. Moreover, we observe an isotropic nodeless superconducting gap beyond the mean-field estimation. Both the large electron-phonon coupling constant and large reduced superconducting gap suggest a strong-coupling superconductivity in K3C60, while the electronic correlation effect is indicated by the observation of a waterfall-like band dispersion and the small bandwidth compared with the effective Coulomb interaction. Our results not only directly visualize the crucial band structure of superconducting fulleride but also provide important insights into the mechanism of the unusual superconductivity.

Submitted 26 April, 2023; originally announced April 2023.

Comments: Accepted by Phys. Rev. Lett

arXiv:2304.02206 [pdf, other]

A Short Proof to Defant and Kravitz's theorem on the Length of Hitomezashi Loops

Authors: Qiuyu Ren, Shengtong Zhang

Abstract: We provide a shorter proof to Defant and Kravitz's theorem (arXiv:2201.03461, Theorem 1.2) on the length of Hitomezashi loops modulo 8.

Submitted 4 April, 2023; originally announced April 2023.

Comments: 4 pages, 2 figures

MSC Class: 00A08; 00A66; 05B50

arXiv:2303.15182 [pdf, other]

Hybrid Augmented Automated Graph Contrastive Learning

Authors: Yifu Chen, Qianqian Ren, Liu Yong

Abstract: Graph augmentations are essential for graph contrastive learning. Most existing works use pre-defined random augmentations, which are usually unable to adapt to different input graphs and fail to consider the impact of different nodes and edges on graph semantics. To address this issue, we propose a framework called Hybrid Augmented Automated Graph Contrastive Learning (HAGCL). HAGCL consists of a feature-level learnable view generator and an edge-level learnable view generator. The view generators are end-to-end differentiable to learn the probability distribution of views conditioned on the input graph. This ensures that they learn the most semantically meaningful structure in terms of features and topology, respectively. Furthermore, we propose an improved joint training strategy, which can achieve better results than previous works without resorting to any weak label information in the downstream tasks and extensive evaluation of additional work.

Submitted 23 March, 2023; originally announced March 2023.

arXiv:2303.10899 [pdf]

doi 10.1038/s41567-023-02188-z

Giant phonon softening and avoided crossing in aliovalence-doped heavy-band thermoelectrics

Authors: Shen Han, Shengnan Dai, Jie Ma, Qingyong Ren, Chaoliang Hu, Ziheng Gao, Manh Duc Le, Denis Sheptyakov, Ping Miao, Shuki Torii, Takashi Kamiyama, Claudia Felser, Jiong Yang, Chenguang Fu, Tiejun Zhu

Abstract: Aliovalent doping has been adopted to optimize the electrical properties of semiconductors, while its impact on the phonon structure and propagation is seldom paid proper attention to. This work reveals that aliovalent doping can be much more effective in reducing the lattice thermal conductivity of thermoelectric semiconductors than the commonly employed isoelectronic alloying strategy. As demonstrated in the heavy-band NbFeSb system, a large reduction of 65% in the lattice thermal conductivity is achieved through only 10% aliovalent Hf-doping, compared to the 4 times higher isoelectronic Ta-alloying. It is elucidated that aliovalent doping introduces free charge carriers and enhances the screening, leading to the giant softening and deceleration of optical phonons. Moreover, the heavy dopant can induce the avoided-crossing of acoustic and optical phonon branches, further decelerating the acoustic phonons. These results highlight the significant role of aliovalent dopants in regulating the phonon structure and suppressing the phonon propagation of semiconductors.

Submitted 20 March, 2023; originally announced March 2023.

arXiv:2303.07588 [pdf, other]

doi 10.1093/mnras/stad795

Fast transitions of X-ray variability in the black hole transient GX 339-4: comparison with MAXI J1820+070 and MAXI J1348-630

Authors: Zi-Xu Yang, Liang Zhang, S. N. Zhang, M. Méndez, Federico García, Yue Huang, Qingcui Bu, He-Xin Liu, Wei Yu, P. J. Wang, L. Tao, D. Altamirano, Jin-Lu Qu, S. Zhang, X. Ma, L. M. Song, S. M. Jia, M. Y. Ge, Q. Z. Liu, J. Z. Yan, T. M. Li, X. Q. Ren, R. C. Ma, Yuexin Zhang, Y. C. Xu, et al. (8 additional authors not shown)

Abstract: Fast transitions between different types of power density spectra (PDS) happening over timescales of several tens of seconds are rare phenomena in black hole X-ray binaries. In this paper, we report a broadband spectral-timing analysis of the fast transitions observed in the 2021 outburst of GX 339-4 using NICER and HXMT observations. We observe transitions between band-limited noise-dominated PDS and type-B quasi-periodic oscillations (QPOs), and their rapid appearance or disappearance. We also make a detailed comparison between the fast transitions in GX 339-4 and those seen in MAXI J1820+070 and MAXI J1348-630. By comparing the spectra of the periods with and without type-B QPOs, we find that the spectral ratios above 10 keV are nearly constant or slightly decreasing, and the values are different between sources. Below 10 keV, the flux change of the Comptonization component is inversely proportional to the flux change of the thermal component, suggesting that the appearance of type-B QPOs is associated with a redistribution of the accretion power between the disc and the Comptonizing emission region. The spectral ratios between the periods with type-B QPO and those with broadband noise are significantly different from that with type-B QPO and without type-B QPO, where the ratios (type-B QPO/broadband noise) show a maximum at around 4 keV and then decrease gradually towards high energies. Finally, we discuss the possible change of the geometry of the inner accretion flow and/or jet during the transitions.

Submitted 13 March, 2023; originally announced March 2023.

arXiv:2302.13095 [pdf, other]

Bayesian Neural Networks Avoid Encoding Complex and Perturbation-Sensitive Concepts

Authors: Qihan Ren, Huiqi Deng, Yunuo Chen, Siyu Lou, Quanshi Zhang

Abstract: In this paper, we focus on mean-field variational Bayesian Neural Networks (BNNs) and explore the representation capacity of such BNNs by investigating which types of concepts are less likely to be encoded by the BNN. It has been observed and studied that a relatively small set of interactive concepts usually emerge in the knowledge representation of a sufficiently-trained neural network, and such concepts can faithfully explain the network output. Based on this, our study proves that compared to standard deep neural networks (DNNs), it is less likely for BNNs to encode complex concepts. Experiments verify our theoretical proofs. Note that the tendency to encode less complex concepts does not necessarily imply weak representation power, considering that complex concepts exhibit low generalization power and high adversarial vulnerability. The code is available at https://github.com/sjtu-xai-lab/BNN-concepts.

Submitted 1 December, 2023; v1 submitted 25 February, 2023; originally announced February 2023.

arXiv:2302.02167 [pdf, other]

doi 10.1093/mnras/stad614

Timing analysis of EXO 2030+375 during its 2021 giant outburst observed with Insight-HXMT

Authors: Yu-Cong Fu, L. M. Song, G. Q. Ding, M. Y. Ge, Y. L. Tuo, S. Zhang, S. N. Zhang, X. Hou, J. L. Qu, J. Zhang, L. Zhang, Q. C. Bu, Y. Huang, X. Ma, X. Zhou, W. M. Yan, Z. X. Yang, X. F. Lu, T. M. Li, Y. C. Xu, P. J. Wang, S. H. Xiao, H. X. Liu, X. Q. Ren, Y. F. Du, et al. (2 additional authors not shown)

Abstract: We report the evolution of the X-ray pulsations of EXO 2030+375 during its 2021 outburst using the observations from Insight-HXMT. Based on the accretion torque model, we study the correlation between the spin frequency derivatives and the luminosity. Pulsations can be detected in the energy band of 1-160 keV. The pulse profile evolves significantly with luminosity during the outburst, so that the whole outburst can be divided into several parts with different characteristics. The evolution of the pulse profile reveals the transition between the super-critical (fan-beam dominated) and the sub-critical accretion (pencil-beam dominated) mode. From the accretion torque model and the critical luminosity model, based on a distance of 7.1 kpc, the inferred magnetic fields are $(0.41-0.74) \times 10^{12}$ G and $(3.48-3.96) \times 10^{12}$ G, respectively, or based on a distance of 3.6 kpc, the estimated magnetic fields are $(2.4-4.3) \times 10^{13}$ G and $(0.98-1.11)\times 10^{12}$ G, respectively. Two different sets of magnetic fields both support the presence of multipole magnetic fields of the NS.

Submitted 25 February, 2023; v1 submitted 4 February, 2023; originally announced February 2023.

arXiv:2302.01114 [pdf, ps, other]

On multidimensional Schur rings of finite groups

Authors: Gang Chen, Qing Ren, Ilia Ponomarenko

Abstract: For any finite group $G$ and a positive integer $m$, we define and study a Schur ring over the direct power $G^m$, which gives an algebraic interpretation of the partition of $G^m$ obtained by the $m$-dimensional Weisfeiler-Leman algorithm. It is proved that this ring determines the group $G$ up to isomorphism if $m\ge 3$, and approaches the Schur ring associated with the group $Aut(G)$ acting on $G^m$ naturally if $m$ increases. It turns out that the problem of finding this limit ring is polynomial-time equivalent to the group isomorphism problem.

Submitted 2 February, 2023; originally announced February 2023.

Comments: 20 pages

MSC Class: 20-08

arXiv:2301.12344 [pdf, other]

TJ-FlyingFish: Design and Implementation of an Aerial-Aquatic Quadrotor with Tiltable Propulsion Units

Authors: Xuchen Liu, Minghao Dou, Dongyue Huang, Biao Wang, Jinqiang Cui, Qinyuan Ren, Lihua Dou, Zhi Gao, Jie Chen, Ben M. Chen

Abstract: Aerial-aquatic vehicles are capable of moving in the two most dominant fluids, making them more promising for a wide range of applications. We propose a prototype with special designs for propulsion and thruster configuration to cope with the vast differences in the fluid properties of water and air. For propulsion, the operating range is switched for the different mediums by the dual-speed propulsion unit, providing sufficient thrust and also ensuring output efficiency. For thruster configuration, thrust vectoring is realized by the rotation of the propulsion unit around the mount arm, thus enhancing the underwater maneuverability. This paper presents a quadrotor prototype of this concept and the design details and realization in practice.

Submitted 6 February, 2023; v1 submitted 28 January, 2023; originally announced January 2023.

Comments: 6 pages, 9 figures, accepted to 2023 IEEE International Conference on Robotics and Automation (ICRA)

Showing 1–50 of 146 results for author: Ren, Q