
The induced friction on a probe moving in a nonequilibrium medium

Abstract: Using a powerful combination of the projection-operator method and path-space response theory, we derive the fluctuation dynamics of a slow inertial probe coupled to a steady nonequilibrium medium under the assumption of time-scale separation. The nonequilibrium can be realized by external nongradient driving on the medium particles or by their (athermal) active self-propulsion. The resulting friction on the probe is given explicitly as a time correlation of medium observables and is decomposed into two terms: an entropic term, proportional to the noise amplitude as in the Einstein relation for equilibrium media, and a frenetic term that can take either sign. As illustrations, we give exact expressions for the friction and noise of a probe in a rotating run-and-tumble medium and in a sheared overdamped medium. In both examples, we find an interesting transition to absolute negative probe friction when the nonequilibrium medium exhibits a sufficiently strong and persistent rotational current.

Submitted 13 July, 2024; originally announced July 2024.
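For orientation, the equilibrium Einstein relation that the entropic term is compared with can be written for an inertial probe as below. This covers only the textbook equilibrium case (medium at inverse temperature β, Markovian limit); the paper's nonequilibrium entropic and frenetic terms are not reproduced, and the symbols M, V, F, γ, B are our notation rather than the authors'.

```latex
% Slow probe of mass M and velocity V; fast medium exerts force F on it.
M\,\dot V = -\gamma\,V + \xi(t), \qquad
\langle \xi(t)\,\xi(s) \rangle = B\,\delta(t-s)
% Noise amplitude from the medium's force fluctuations (Green-Kubo form):
B = \int_{-\infty}^{\infty} \langle \delta F(t)\,\delta F(0) \rangle \, dt
% Equipartition <V^2> = 1/(\beta M) with <V^2> = B/(2\gamma M) gives
\gamma = \frac{\beta}{2}\,B
       = \beta \int_{0}^{\infty} \langle \delta F(t)\,\delta F(0) \rangle \, dt
```

In equilibrium the friction is fixed entirely by the noise amplitude; the abstract's point is that out of equilibrium an additional frenetic contribution appears, which can take either sign.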

arXiv:2407.09771

Protecting Data Buyer Privacy in Data Markets

Authors: Minxing Zhang, Jian Pei

Abstract: Data markets serve as crucial platforms facilitating data discovery, exchange, sharing, and integration among data users and providers. However, privacy concerns have predominantly centered on protecting the privacy of data owners and third parties, neglecting the challenges associated with protecting the privacy of data buyers. In this article, we address this gap by modeling the intricacies of data buyer privacy protection and investigating the delicate balance between privacy and purchase cost. Through comprehensive experimentation, our results yield valuable insights, shedding light on the efficacy and efficiency of our proposed approaches.

Submitted 13 July, 2024; originally announced July 2024.

arXiv:2407.05100

doi 10.1109/TPAMI.2024.3425222

Ask Questions with Double Hints: Visual Question Generation with Answer-awareness and Region-reference

Authors: Kai Shen, Lingfei Wu, Siliang Tang, Fangli Xu, Bo Long, Yueting Zhuang, Jian Pei

Abstract: The visual question generation (VQG) task aims to generate human-like questions from an image and potentially other side information (e.g., answer type). Previous work on VQG has two main shortcomings: i) it suffers from the one-image-to-many-questions mapping problem, which leads to a failure to generate referential and meaningful questions from an image; ii) it fails to model complex implicit relations among the visual objects in an image and overlooks potential interactions between the side information and the image. To address these limitations, we first propose a novel learning paradigm to generate visual questions with answer-awareness and region-reference. Concretely, we aim to ask the right visual questions with Double Hints (textual answers and visual regions of interest), which can effectively mitigate the existing one-to-many mapping issue. In particular, we develop a simple methodology to self-learn the visual hints without introducing any additional human annotations. Furthermore, to capture these sophisticated relationships, we propose a new double-hints guided Graph-to-Sequence learning framework, which first models them as a dynamic graph and learns the implicit topology end-to-end, and then uses a graph-to-sequence model to generate the questions with double hints. Experimental results demonstrate the superiority of our proposed method.

Submitted 6 July, 2024; originally announced July 2024.

Journal ref: IEEE Transactions on Pattern Analysis and Machine Intelligence 2024

arXiv:2407.00242

EHRmonize: A Framework for Medical Concept Abstraction from Electronic Health Records using Large Language Models

Authors: João Matos, Jack Gallifant, Jian Pei, A. Ian Wong

Abstract: Electronic health records (EHRs) contain vast amounts of complex data, but harmonizing and processing this information remains a challenging and costly task requiring significant clinical expertise. While large language models (LLMs) have shown promise in various healthcare applications, their potential for abstracting medical concepts from EHRs remains largely unexplored. We introduce EHRmonize, a framework leveraging LLMs to abstract medical concepts from EHR data. Our study uses medication data from two real-world EHR databases to evaluate five LLMs on two free-text extraction and six binary classification tasks across various prompting strategies. GPT-4o with 10-shot prompting achieved the highest performance in all tasks, matched by Claude-3.5-Sonnet in a subset of tasks. GPT-4o achieved an accuracy of 97% in identifying generic route names, 82% for generic drug names, and 100% in performing binary classification of antibiotics. While EHRmonize significantly enhances efficiency, reducing annotation time by an estimated 60%, we emphasize that clinician oversight remains essential. Our framework, available as a Python package, offers a promising tool to assist clinicians in EHR data abstraction, potentially accelerating healthcare research and improving data harmonization processes.

Submitted 28 June, 2024; originally announced July 2024.

Comments: submitted for review, total of 10 pages

arXiv:2406.18838

Electric-field control of the perpendicular magnetization switching in ferroelectric/ferrimagnet heterostructures

Authors: Pengfei Liu, Tao Xu, Qi Liu, Juncai Dong, Ting Lin, Qinhua Zhang, Xiukai Lan, Yu Sheng, Chunyu Wang, Jiajing Pei, Hongxin Yang, Lin Gu, Kaiyou Wang

Abstract: Electric-field control of the magnetic state in ferrimagnets holds great promise for developing spintronic devices owing to its low power consumption. Here, we demonstrate a non-volatile reversal of the perpendicular net magnetization in a ferrimagnet by manipulating the electric-field-driven polarization within a Pb(Zr0.2Ti0.8)O3 (PZT)/CoGd heterostructure. Electron energy loss spectroscopy and X-ray absorption spectroscopy directly verify that oxygen-ion migration at the PZT/CoGd interface, associated with reversing the polarization, causes enhanced/reduced oxidation of CoGd. Ab initio calculations further substantiate that the migrated oxygen ions can modulate the relative magnetization of the Co/Gd sublattices, facilitating perpendicular net magnetization switching. Our findings offer an approach to effectively control ferrimagnetic net magnetization, with significant implications for ferrimagnetic spintronic applications.

Submitted 26 June, 2024; originally announced June 2024.

Comments: 21 pages, 4 figures

arXiv:2406.18134

Assessing "Implicit" Retrieval Robustness of Large Language Models

Authors: Xiaoyu Shen, Rexhina Blloshmi, Dawei Zhu, Jiahuan Pei, Wei Zhang

Abstract: Retrieval-augmented generation has gained popularity as a framework to enhance large language models with external knowledge. However, its effectiveness hinges on the retrieval robustness of the model. If the model lacks retrieval robustness, its performance is constrained by the accuracy of the retriever, resulting in significant compromises when the retrieved context is irrelevant. In this paper, we evaluate the "implicit" retrieval robustness of various large language models, instructing them to directly output the final answer without explicitly judging the relevance of the retrieved context. Our findings reveal that fine-tuning on a mix of gold and distracting context significantly enhances the model's robustness to retrieval inaccuracies, while still maintaining its ability to extract correct answers when retrieval is accurate. This suggests that large language models can implicitly handle relevant or irrelevant retrieved context by learning solely from the supervision of the final answer in an end-to-end manner. Introducing an additional process for explicit relevance judgment can be unnecessary and may disrupt the end-to-end approach.

Submitted 26 June, 2024; originally announced June 2024.
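The training recipe above (fine-tuning on a mix of gold and distracting context, supervised only by the final answer) can be sketched as a data-construction step. This is a minimal illustration under stated assumptions: the `Example` record, the function `mix_gold_and_distracting`, and the `distract_prob` parameter are hypothetical, and the paper's actual mixing ratio and data format are not given in the abstract.

```python
import random
from dataclasses import dataclass


@dataclass
class Example:
    question: str
    answer: str
    context: str


def mix_gold_and_distracting(examples, distractors, distract_prob, seed=0):
    """Return a copy of `examples` in which each gold context is replaced by a
    randomly chosen distractor with probability `distract_prob`.

    The target answer is never changed, so the only supervision is the final
    answer; the model has to learn implicitly when to ignore the context.
    """
    if not 0.0 <= distract_prob <= 1.0:
        raise ValueError("distract_prob must be in [0, 1]")
    rng = random.Random(seed)  # fixed seed for a reproducible mix
    mixed = []
    for ex in examples:
        if distractors and rng.random() < distract_prob:
            context = rng.choice(distractors)  # irrelevant (distracting) context
        else:
            context = ex.context  # gold context
        mixed.append(Example(ex.question, ex.answer, context))
    return mixed
```

Keeping the answer fixed while varying only the context is what makes the robustness "implicit": no explicit relevance label is ever produced or supervised.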

arXiv:2406.17858

Depth-Driven Geometric Prompt Learning for Laparoscopic Liver Landmark Detection

Authors: Jialun Pei, Ruize Cui, Yaoqian Li, Weixin Si, Jing Qin, Pheng-Ann Heng

Abstract: Laparoscopic liver surgery poses a complex and dynamic intraoperative environment for surgeons, in which it remains a significant challenge to distinguish critical or even hidden structures inside the liver. Liver anatomical landmarks, e.g., ridges and ligaments, serve as important markers for 2D-3D alignment, which can significantly enhance surgeons' spatial perception for precise surgery. To facilitate the detection of laparoscopic liver landmarks, we collect a novel dataset called L3D, which comprises 1,152 frames with detailed landmark annotations from surgical videos of 39 patients across two medical sites. For benchmarking purposes, 12 mainstream detection methods are selected and comprehensively evaluated on L3D. Further, we propose a depth-driven geometric prompt learning network, namely D2GPLand. Specifically, we design a Depth-aware Prompt Embedding (DPE) module that is guided by self-supervised prompts and generates semantically relevant geometric information with the benefit of global depth cues extracted from SAM-based features. Additionally, a Semantic-specific Geometric Augmentation (SGA) scheme is introduced to efficiently merge RGB-D spatial and geometric information through reverse anatomic perception. The experimental results indicate that D2GPLand obtains state-of-the-art performance on L3D, with 63.52% DICE and 48.68% IoU scores. Together with 2D-3D fusion technology, our method can directly provide the surgeon with intuitive guidance information in laparoscopic scenarios.

Submitted 27 June, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

Comments: This paper has been accepted by MICCAI 2024
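The reported DICE and IoU scores are standard overlap metrics between a predicted and an annotated mask. A minimal pure-Python sketch for one pair of binary masks follows; the function name and the list-of-lists mask representation are our assumptions, and how L3D averages over landmarks and frames is not specified in the abstract.

```python
def dice_and_iou(pred, target):
    """Dice = 2|P & T| / (|P| + |T|) and IoU = |P & T| / |P | T| for binary masks.

    `pred` and `target` are equally shaped 2D lists of 0/1 (or bool) values.
    """
    if len(pred) != len(target) or any(len(a) != len(b) for a, b in zip(pred, target)):
        raise ValueError("masks must have the same shape")
    inter = size_p = size_t = 0
    for row_p, row_t in zip(pred, target):
        for p, t in zip(row_p, row_t):
            p, t = bool(p), bool(t)
            inter += int(p and t)  # pixel in both masks
            size_p += int(p)
            size_t += int(t)
    union = size_p + size_t - inter
    if union == 0:  # both masks empty: treat as perfect agreement
        return 1.0, 1.0
    return 2 * inter / (size_p + size_t), inter / union
```

For a single mask pair, Dice = 2·IoU / (1 + IoU); scores averaged over many frames or landmarks, like the ones reported above, need not satisfy this identity exactly.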

arXiv:2406.15968

ReCaLL: Membership Inference via Relative Conditional Log-Likelihoods

Authors: Roy Xie, Junlin Wang, Ruomin Huang, Minxing Zhang, Rong Ge, Jian Pei, Neil Zhenqiang Gong, Bhuwan Dhingra

Abstract: The rapid scaling of large language models (LLMs) has raised concerns about the transparency and fair use of the data used to pretrain them. Detecting such content is challenging due to the scale of the data and limited exposure of each instance during training. We propose ReCaLL (Relative Conditional Log-Likelihood), a novel membership inference attack (MIA) to detect LLMs' pretraining data by leveraging their conditional language modeling capabilities. ReCaLL examines the relative change in conditional log-likelihoods when prefixing target data points with non-member context. Our empirical findings show that conditioning member data on non-member prefixes induces a larger decrease in log-likelihood compared to non-member data. We conduct comprehensive experiments and show that ReCaLL achieves state-of-the-art performance on the WikiMIA dataset, even with random and synthetic prefixes, and can be further improved using an ensemble approach. Moreover, we conduct an in-depth analysis of LLMs' behavior with different membership contexts, providing insights into how LLMs leverage membership information for effective inference at both the sequence and token level.

Submitted 22 June, 2024; originally announced June 2024.
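One way to turn the "relative change in conditional log-likelihoods" into a score is the ratio of the prefix-conditioned to the unconditioned log-likelihood of the same target text. The sketch below assumes that ratio form and assumes per-token log-probabilities have already been obtained from the model (querying a specific LLM is out of scope); the function names and the decision threshold are hypothetical.

```python
def mean_log_likelihood(token_logprobs):
    """Average per-token log-probability of the target text."""
    if not token_logprobs:
        raise ValueError("need at least one token")
    return sum(token_logprobs) / len(token_logprobs)


def recall_score(uncond_logprobs, cond_logprobs):
    """Ratio LL(x | P) / LL(x) for a target x and a non-member prefix P.

    Both inputs are per-token log-probabilities of the *same* target tokens,
    scored without and with the prefix. Log-likelihoods are negative, so a
    larger drop under the prefix gives a ratio further above 1.
    """
    if len(uncond_logprobs) != len(cond_logprobs):
        raise ValueError("score the same target tokens in both settings")
    ll_x = mean_log_likelihood(uncond_logprobs)
    ll_x_given_p = mean_log_likelihood(cond_logprobs)
    if ll_x == 0.0:
        raise ValueError("LL(x) is zero; ratio undefined")
    return ll_x_given_p / ll_x


def predict_member(score, threshold):
    # Hypothetical decision rule: a larger relative drop suggests membership.
    return score > threshold
```

Because both log-likelihoods are computed over the same target tokens, using per-token means or sums gives the same ratio.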

arXiv:2406.15264

Towards Fine-Grained Citation Evaluation in Generated Text: A Comparative Analysis of Faithfulness Metrics

Authors: Weijia Zhang, Mohammad Aliannejadi, Yifei Yuan, Jiahuan Pei, Jia-Hong Huang, Evangelos Kanoulas

Abstract: Large language models (LLMs) often produce unsupported or unverifiable information, known as "hallucinations." To mitigate this, retrieval-augmented LLMs incorporate citations, grounding the content in verifiable sources. Despite such developments, manually assessing how well a citation supports the associated statement remains a major challenge. Previous studies use faithfulness metrics to estimate citation support automatically but are limited to binary classification, overlooking fine-grained citation support in practical scenarios. To investigate the effectiveness of faithfulness metrics in fine-grained scenarios, we propose a comparative evaluation framework that assesses how effectively metrics distinguish citations among three support levels: full, partial, and no support. Our framework employs correlation analysis, classification evaluation, and retrieval evaluation to comprehensively measure the alignment between metric scores and human judgments. Our results show that no single metric consistently excels across all evaluations, revealing the complexity of assessing fine-grained support. Based on the findings, we provide practical recommendations for developing more effective metrics.

Submitted 21 June, 2024; originally announced June 2024.

Comments: 12 pages, 3 figures
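For the correlation-analysis component, one common choice when human judgments are ordinal (no < partial < full support) is Spearman's rank correlation with tie-aware ranks. The sketch below assumes an integer encoding 0/1/2 of the three support levels; the paper may use other correlation statistics, and all names here are illustrative.

```python
SUPPORT_LEVEL = {"no": 0, "partial": 1, "full": 2}  # hypothetical ordinal encoding


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j -> ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the (tie-averaged) ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long sequences of length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / (vx * vy) ** 0.5
```

With only three label values, ties are unavoidable, which is why the ranks are averaged rather than assigned by sort position.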

arXiv:2406.14797

Camera-Invariant Meta-Learning Network for Single-Camera-Training Person Re-identification

Authors: Jiangbo Pei, Zhuqing Jiang, Aidong Men, Haiying Wang, Haiyong Luo, Shiping Wen

Abstract: Single-camera-training person re-identification (SCT re-ID) aims to train a re-ID model using SCT datasets where each person appears in only one camera. The main challenge of SCT re-ID is to learn camera-invariant feature representations without cross-camera same-person (CCSP) data as supervision. Previous methods address it by assuming that the most similar person should be found in another camera. However, this assumption is not guaranteed to hold. In this paper, we propose a Camera-Invariant Meta-Learning Network (CIMN) for SCT re-ID. CIMN assumes that camera-invariant feature representations should be robust to camera changes. To this end, we split the training data into a meta-train set and a meta-test set based on camera IDs and perform a cross-camera simulation via a meta-learning strategy, aiming to enforce the representations learned from the meta-train set to be robust to the meta-test set. With the cross-camera simulation, CIMN can learn camera-invariant and identity-discriminative representations even when no CCSP data are available. However, this simulation also separates the meta-train set from the meta-test set, which ignores some beneficial relations between them. Thus, we introduce three losses (meta triplet loss, meta classification loss, and meta camera alignment loss) to leverage the ignored relations. Experimental results demonstrate that our method achieves comparable performance with and without CCSP data, and outperforms the state-of-the-art methods on SCT re-ID benchmarks. In addition, it is also effective in improving the domain generalization ability of the model.

Submitted 20 June, 2024; originally announced June 2024.
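The camera-based split behind the cross-camera simulation can be sketched as follows: cameras, not images, are partitioned, so the meta-test set contains only views from cameras absent from the meta-train set of that episode. The `(image_id, person_id, camera_id)` tuple layout, the 50% split fraction, and the function name are our assumptions; the abstract does not say how many cameras go to each side or how often the split is redrawn.

```python
import random


def split_by_camera(samples, meta_test_fraction=0.5, seed=0):
    """Partition (image_id, person_id, camera_id) samples into meta-train and
    meta-test sets such that no camera ID appears in both."""
    cameras = sorted({cam for _, _, cam in samples})
    if len(cameras) < 2:
        raise ValueError("need at least two cameras")
    rng = random.Random(seed)
    rng.shuffle(cameras)
    # Keep at least one camera on each side of the split.
    n_test = min(len(cameras) - 1, max(1, round(len(cameras) * meta_test_fraction)))
    test_cams = set(cameras[:n_test])
    meta_train = [s for s in samples if s[2] not in test_cams]
    meta_test = [s for s in samples if s[2] in test_cams]
    return meta_train, meta_test
```

Since each person appears in only one camera in SCT data, this split also yields disjoint identities between the two sets.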

arXiv:2406.14534

Epicardium Prompt-guided Real-time Cardiac Ultrasound Frame-to-volume Registration

Authors: Long Lei, Jun Zhou, Jialun Pei, Baoliang Zhao, Yueming Jin, Yuen-Chun Jeremy Teoh, Jing Qin, Pheng-Ann Heng

Abstract: A comprehensive guidance view for cardiac interventional surgery can be provided by the real-time fusion of intraoperative 2D images and a preoperative 3D volume based on ultrasound frame-to-volume registration. However, cardiac ultrasound images are characterized by a low signal-to-noise ratio and small differences between adjacent frames, coupled with significant dimensional differences between the 2D frames and the 3D volumes to be registered, which makes real-time, accurate cardiac ultrasound frame-to-volume registration a very challenging task. This paper introduces a lightweight end-to-end Cardiac Ultrasound frame-to-volume Registration network, termed CU-Reg. Specifically, the proposed model leverages epicardium prompt-guided anatomical clues to reinforce the interaction of 2D sparse and 3D dense features, followed by a voxel-wise local-global aggregation of enhanced features, thereby boosting the cross-dimensional matching effectiveness of low-quality ultrasound modalities. We further embed an inter-frame discriminative regularization term within the hybrid supervised learning to increase the distinction between adjacent slices in the same ultrasound volume to ensure registration stability. Experimental results on the reprocessed CAMUS dataset demonstrate that our CU-Reg surpasses existing methods in terms of registration accuracy and efficiency, meeting the guidance requirements of clinical cardiac interventional surgery.

Submitted 27 June, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

Comments: This paper has been accepted by MICCAI 2024

arXiv:2406.14132

Enhancing Monotonic Modeling with Spatio-Temporal Adaptive Awareness in Diverse Marketing

Authors: Bin Li, Jiayan Pei, Feiyang Xiao, Yifan Zhao, Zhixing Zhang, Diwei Liu, HengXu He, Jia Jia

Abstract: In the mobile internet era, the Online Food Ordering Service (OFOS) has emerged as an integral component of inclusive finance owing to the convenience it brings to people. OFOS platforms offer dynamic allocation incentives to users and merchants through diverse marketing campaigns to encourage payments while maintaining the platforms' budget efficiency. Despite significant progress, the marketing domain continues to face two primary challenges: (i) how to allocate a limited budget more efficiently, which demands precision in predicting users' monotonic response (i.e., sensitivity) to incentives, and (ii) how to ensure spatio-temporal adaptability and robustness in diverse marketing campaigns across different times and locations. To address these issues, we propose a Constrained Monotonic Adaptive Network (CoMAN) method for spatio-temporal perception within marketing pricing. Specifically, we capture spatio-temporal preferences within attribute features through two foundational spatio-temporal perception modules. To better capture differences in user sensitivity to incentives across varied times and locations, we design modules for learning spatio-temporal convexity and concavity as well as for expressing sensitivity functions. CoMAN can achieve a more efficient allocation of incentive investments during pricing, thus increasing the conversion rate and orders while maintaining budget efficiency. Extensive offline and online experimental results within our diverse marketing campaigns demonstrate the effectiveness of the proposed approach, which outperforms the state-of-the-art monotonic method.

Submitted 20 June, 2024; originally announced June 2024.

Comments: 7 pages
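Predicting a monotonic response to incentives requires a model whose output cannot decrease as the incentive grows. One generic way to guarantee this, shown below as a minimal sketch, is to force the weights on the incentive input to be positive (here via softplus) and combine increasing activations with positive coefficients. This is not CoMAN's architecture, which the abstract does not detail; the class name and parameterization are hypothetical.

```python
import math


def softplus(z):
    """Numerically stable log(1 + exp(z)); always > 0."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def sigmoid(z):
    """Numerically stable logistic function; increasing in z."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class MonotoneResponse:
    """f(x) = b + sum_k softplus(a_k) * sigmoid(softplus(w_k) * x + c_k).

    softplus keeps every slope and coefficient positive, so each term, and
    hence f, is non-decreasing in the incentive x for any raw parameters.
    """

    def __init__(self, a, w, c, b=0.0):
        if not len(a) == len(w) == len(c):
            raise ValueError("a, w and c must have the same length")
        self.a, self.w, self.c, self.b = list(a), list(w), list(c), b

    def __call__(self, x):
        return self.b + sum(
            softplus(ak) * sigmoid(softplus(wk) * x + ck)
            for ak, wk, ck in zip(self.a, self.w, self.c)
        )
```

Because the constraint is built into the parameterization, the raw parameters can be trained without projection steps and monotonicity still holds by construction.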

arXiv:2406.06644

Latent Diffusion Model-Enabled Real-Time Semantic Communication Considering Semantic Ambiguities and Channel Noises

Authors: Jianhua Pei, Cheng Feng, Ping Wang, Hina Tabassum, Dongyuan Shi

Abstract: Semantic communication (SemCom) has emerged as a new paradigm for 6G communication, with deep learning (DL) models being one of the key drivers of the shift from bit/symbol accuracy to the semantics and pragmatics of data. Nevertheless, DL-based SemCom systems often face performance bottlenecks due to overfitting, poor generalization, and sensitivity to outliers. Furthermore, the varying fading gains and noises with uncertain signal-to-noise ratios (SNRs) commonly present in wireless channels usually restrict the accuracy of semantic information transmission. Consequently, this paper constructs a latent diffusion model-enabled SemCom system and proposes three improvements over existing works: i) To handle potential outliers in the source data, semantic errors obtained by projected gradient descent, which exploits the vulnerabilities of DL models, are used to update the parameters and obtain an outlier-robust encoder. ii) A lightweight single-layer latent space transformation adapter completes one-shot learning at the transmitter and is placed before the decoder at the receiver, enabling adaptation to out-of-distribution data and enhancing human-perceptual quality. iii) An end-to-end consistency distillation (EECD) strategy is used to distill the diffusion models trained in latent space, enabling deterministic single- or few-step real-time denoising in various noisy channels while maintaining high semantic quality. Extensive numerical experiments across different datasets demonstrate the superiority of the proposed SemCom system: it is consistently robust to outliers, can transmit data with unknown distributions, and performs real-time channel denoising while preserving high human-perceptual quality, outperforming existing denoising approaches in semantic metrics.

Submitted 24 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.
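The outlier-robust encoder is trained on semantic errors generated by projected gradient descent. The core PGD step (gradient-sign ascent on a loss, then projection back into an L-infinity ball around the clean input) is sketched below on a toy differentiable loss. The L-infinity variant, the toy linear "encoder", the step sizes, and the function names are illustrative assumptions; the paper's actual loss and norm constraint are not given in the abstract.

```python
def sign(v):
    return 1.0 if v > 0 else -1.0 if v < 0 else 0.0


def pgd_linf(x0, grad_fn, eps, alpha, steps):
    """Projected gradient ascent inside the L-infinity ball of radius eps around x0.

    Each step moves every coordinate by alpha in the direction of the gradient's
    sign (increasing the loss), then clips back to [x0_i - eps, x0_i + eps].
    """
    if eps < 0 or alpha <= 0 or steps < 0:
        raise ValueError("need eps >= 0, alpha > 0, steps >= 0")
    x = list(x0)
    for _ in range(steps):
        g = grad_fn(x)
        x = [xi + alpha * sign(gi) for xi, gi in zip(x, g)]
        x = [min(max(xi, ci - eps), ci + eps) for xi, ci in zip(x, x0)]
    return x


# Toy "encoder" loss (hypothetical): squared error of a linear map, (w . x - y)^2.
W, Y = [1.0, -2.0], 0.0


def toy_loss(x):
    return (sum(wi * xi for wi, xi in zip(W, x)) - Y) ** 2


def toy_grad(x):
    r = sum(wi * xi for wi, xi in zip(W, x)) - Y
    return [2.0 * r * wi for wi in W]
```

For example, `pgd_linf([1.0, 0.0], toy_grad, eps=0.1, alpha=0.05, steps=5)` ends near `[1.1, -0.1]`, raising the toy loss from 1 to about 1.69 while staying within the perturbation budget.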

arXiv:2405.17734

Towards Efficient Disaster Response via Cost-effective Unbiased Class Rate Estimation through Neyman Allocation Stratified Sampling Active Learning

Authors: Yanbing Bai, Xinyi Wu, Lai Xu, Jihan Pei, Erick Mas, Shunichi Koshimura

Abstract: With the rapid development of Earth observation technology, we have entered an era of massively available satellite remote-sensing data. However, much of this data is unlabeled, or labeling it is too costly, which hinders the potential of AI techniques for mining satellite data. This is especially true in emergency response scenarios that use satellite data to evaluate the degree of disaster damage. Disaster damage assessment has encountered bottlenecks due to an excessive focus on the damage to a particular building in a specific geographical space, or to a particular area at a larger scale. In fact, in the early days of a disaster emergency response, government departments are more concerned with the overall damage rate of the disaster area than with single-building damage, because this helps the government decide the level of emergency response. We present an innovative algorithm that constructs Neyman stratified random sampling trees for binary classification and extends this approach to multiclass problems. Through extensive experimentation on various datasets and model structures, our findings demonstrate that our method surpasses both passive and conventional active learning techniques in terms of class rate estimation and model enhancement, with only 30%-60% of the annotation cost of simple sampling. It effectively addresses the 'sampling bias' challenge in traditional active learning strategies and mitigates the 'cold start' dilemma. The efficacy of our approach is further substantiated through application to disaster evaluation tasks using Xview2 satellite imagery, showcasing its practical utility in real-world contexts.

Submitted 27 May, 2024; originally announced May 2024.
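Neyman allocation, the classical rule named in this abstract, assigns the labeling budget n to strata in proportion to N_h·S_h (stratum size times within-stratum standard deviation), and the class rate is then estimated as the size-weighted mean of per-stratum rates. The sketch below assumes the strata and pilot estimates of each S_h are already available (for a binary label, S_h = sqrt(p_h(1 - p_h))). How the paper builds the strata as trees and extends them to multiclass problems is not reproduced, and the function names are ours.

```python
import math


def neyman_allocation(stratum_sizes, stratum_stds, n):
    """n_h = n * N_h * S_h / sum_k N_k * S_k, rounded by largest remainders so
    the allocations sum exactly to n. (Does not cap n_h at N_h.)"""
    weights = [size * std for size, std in zip(stratum_sizes, stratum_stds)]
    total = sum(weights)
    if total <= 0:
        raise ValueError("need at least one stratum with positive N_h * S_h")
    raw = [n * w / total for w in weights]
    alloc = [math.floor(r) for r in raw]
    leftover = n - sum(alloc)
    # Hand the remaining labels to the strata with the largest fractional parts.
    by_remainder = sorted(range(len(raw)), key=lambda h: raw[h] - alloc[h], reverse=True)
    for h in by_remainder[:leftover]:
        alloc[h] += 1
    return alloc


def binary_std(p):
    """Standard deviation of a Bernoulli(p) label within a stratum."""
    return math.sqrt(p * (1.0 - p))


def stratified_rate(stratum_sizes, stratum_rates):
    """Class-rate estimate: sum_h (N_h / N) * p_h."""
    total = sum(stratum_sizes)
    return sum(size * p for size, p in zip(stratum_sizes, stratum_rates)) / total
```

For example, strata of sizes 100, 200 and 700 with standard deviations 0.5, 0.3 and 0.1 and a budget of 100 labels receive 28, 33 and 39 labels: the large but homogeneous third stratum gets proportionally far fewer labels than simple random sampling would give it.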

arXiv:2405.13034

Autonomous Workflow for Multimodal Fine-Grained Training Assistants Towards Mixed Reality

Authors: Jiahuan Pei, Irene Viola, Haochen Huang, Junxiao Wang, Moonisa Ahsan, Fanghua Ye, Jiang Yiming, Yao Sai, Di Wang, Zhumin Chen, Pengjie Ren, Pablo Cesar

Abstract: Autonomous artificial intelligence (AI) agents have emerged as promising protocols for automatically understanding language-based environments, particularly with the exponential development of large language models (LLMs). However, a fine-grained, comprehensive understanding of multimodal environments remains under-explored. This work designs an autonomous workflow tailored for seamlessly integrating AI agents into extended reality (XR) applications for fine-grained training. We present a demonstration of a multimodal fine-grained training assistant for LEGO brick assembly in a pilot XR environment. Specifically, we design a cerebral language agent that integrates an LLM with memory, planning, and interaction with XR tools and a vision-language agent, enabling agents to decide their actions based on past experiences. Furthermore, we introduce LEGO-MRTA, a multimodal fine-grained assembly dialogue dataset synthesized automatically in the workflow served by a commercial LLM. This dataset comprises multimodal instruction manuals, conversations, XR responses, and visual question answering. Finally, we benchmark several prevailing open-source LLMs, assessing their performance with and without fine-tuning on the proposed dataset. We anticipate that the broader impact of this workflow will advance the development of smarter assistants for seamless user interaction in XR environments, fostering research in both the AI and HCI communities.

Submitted 5 June, 2024; v1 submitted 16 May, 2024; originally announced May 2024.

Comments: Accepted by ACL 2024

arXiv:2405.09591

A Comprehensive Survey on Data Augmentation

Authors: Zaitian Wang, Pengfei Wang, Kunpeng Liu, Pengyang Wang, Yanjie Fu, Chang-Tien Lu, Charu C. Aggarwal, Jian Pei, Yuanchun Zhou

Abstract: Data augmentation is a series of techniques that generate high-quality artificial data by manipulating existing data samples. By leveraging data augmentation techniques, AI models can achieve significantly improved applicability in tasks involving scarce or imbalanced datasets, thereby substantially enhancing AI models' generalization capabilities. Existing literature surveys focus only on data of a specific modality and categorize methods from modality-specific and operation-centric perspectives; as a result, they lack a consistent summary of data augmentation methods across multiple modalities and limit the comprehension of how existing data samples serve the data augmentation process. To bridge this gap, we propose a more enlightening taxonomy that encompasses data augmentation techniques for different common data modalities. Specifically, from a data-centric perspective, this survey proposes a modality-independent taxonomy by investigating how to take advantage of the intrinsic relationship between data samples, including single-wise, pair-wise, and population-wise sample data augmentation methods. Additionally, we categorize data augmentation methods across five data modalities through a unified inductive approach.

Submitted 17 May, 2024; v1 submitted 15 May, 2024; originally announced May 2024.
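To illustrate what a pair-wise sample augmentation can look like, the sketch below implements mixup, a well-known technique that blends two samples and their one-hot labels with a Beta-distributed weight. It is our illustration, not code from the survey, and the function name and parameters (`alpha`, `lam`) are our own choices.

```python
import random
from typing import List, Optional, Sequence, Tuple


def mixup(x1: Sequence[float], y1: Sequence[float],
          x2: Sequence[float], y2: Sequence[float],
          alpha: float = 0.2, lam: Optional[float] = None,
          rng: Optional[random.Random] = None) -> Tuple[List[float], List[float], float]:
    """Pair-wise augmentation: convex combination of two samples and their labels.

    If `lam` is not given, it is drawn from Beta(alpha, alpha).
    """
    if lam is None:
        rng = rng or random.Random()
        lam = rng.betavariate(alpha, alpha)  # mixing weight in [0, 1]
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam
```

Because the same weight mixes the labels, a mixed one-hot label still sums to one, so it can be used directly with a soft-label cross-entropy loss.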

arXiv:2405.00568 [pdf, other]

Powering In-Database Dynamic Model Slicing for Structured Data Analytics

Authors: Lingze Zeng, Naili Xing, Shaofeng Cai, Gang Chen, Beng Chin Ooi, Jian Pei, Yuncheng Wu

Abstract: Relational database management systems (RDBMS) are widely used for the storage and retrieval of structured data. To derive insights beyond statistical aggregation, we typically have to extract specific subdatasets from the database using conventional database operations, and then apply deep neural network (DNN) training and inference on these respective subdatasets in a separate machine learning system. The process can be prohibitively expensive, especially when a combinatorial number of subdatasets are extracted for different analytical purposes. This calls for efficient in-database support of advanced analytical methods. In this paper, we introduce LEADS, a novel SQL-aware dynamic model slicing technique to customize models for subdatasets specified by SQL queries. LEADS improves the predictive modeling of structured data via the mixture of experts (MoE) technique and maintains inference efficiency by a SQL-aware gating network. At the core of LEADS is the construction of a general model with multiple expert sub-models via MoE trained over the entire database. This SQL-aware MoE technique scales up the modeling capacity, enhances effectiveness, and preserves efficiency by activating only necessary experts via the gating network during inference. Additionally, we introduce two regularization terms during the training process of LEADS to strike a balance between effectiveness and efficiency.
We also design and build an in-database inference system, called INDICES, to support end-to-end advanced structured data analytics by non-intrusively incorporating LEADS into PostgreSQL. Our extensive experiments on real-world datasets demonstrate that LEADS consistently outperforms baseline models, and INDICES delivers effective in-database analytics with a considerable reduction in inference latency compared to traditional solutions.

Submitted 1 May, 2024; originally announced May 2024.
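The efficiency argument in this abstract rests on sparse expert activation: the gating network scores all experts, but only the highest-scoring ones are evaluated. The sketch below shows generic top-k gating over scalar-output experts. The linear `gate_logits` over "query features" is a hypothetical stand-in for LEADS's SQL-aware gating network, whose actual inputs and architecture are not described here.

```python
import math
from typing import Callable, List, Sequence, Tuple


def gate_logits(query_features: Sequence[float],
                gate_weights: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical linear gate: one logit per expert from query-derived features."""
    return [sum(w * f for w, f in zip(row, query_features)) for row in gate_weights]


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_moe(x: float, experts: Sequence[Callable[[float], float]],
               logits: Sequence[float], k: int = 2) -> Tuple[float, List[int]]:
    """Evaluate only the top-k experts and mix their outputs with renormalised gate weights."""
    top = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:k]
    weights = softmax([logits[i] for i in top])
    output = sum(w * experts[i](x) for w, i in zip(weights, top))
    return output, top
```

Experts outside the top-k are never called, which is where the inference saving of sparse MoE comes from.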

arXiv:2404.17288 [pdf, other]

ExcluIR: Exclusionary Neural Information Retrieval

Authors: Wenhao Zhang, Mengqi Zhang, Shiguang Wu, Jiahuan Pei, Zhaochun Ren, Maarten de Rijke, Zhumin Chen, Pengjie Ren

Abstract: Exclusion is an important and universal linguistic skill that humans use to express what they do not want. However, in the information retrieval community, there is little research on exclusionary retrieval, where users express what they do not want in their queries. In this work, we investigate the scenario of exclusionary retrieval in document retrieval for the first time. We present ExcluIR, a set of resources for exclusionary retrieval, consisting of an evaluation benchmark and a training set for helping retrieval models to comprehend exclusionary queries. The evaluation benchmark includes 3,452 high-quality exclusionary queries, each of which has been manually annotated. The training set contains 70,293 exclusionary queries, each paired with a positive document and a negative document. We conduct detailed experiments and analyses, obtaining three main observations: (1) existing retrieval models with different architectures struggle to effectively comprehend exclusionary queries; (2) although integrating our training data can improve the performance of retrieval models on exclusionary retrieval, there is still a gap compared to human performance; (3) generative retrieval models have a natural advantage in handling exclusionary queries. To facilitate future research on exclusionary retrieval, we share the benchmark and evaluation scripts on \url{https://github.com/zwh-sdu/ExcluIR}.

Submitted 26 April, 2024; originally announced April 2024.

arXiv:2404.16271 [pdf]

True random number generation using metastable 1T' molybdenum ditelluride

Authors: Yang Liu, Pengyu Liu, Yingyi Wen, Zihan Liang, Songwei Liu, Lekai Song, Jingfang Pei, Xiaoyue Fan, Teng Ma, Gang Wang, Shuo Gao, Kong-Pang Pun, Xiaolong Chen, Guohua Hu

Abstract: True random numbers play a critical role in secure cryptography. Their generation relies on a stable and readily extractable entropy source. Here, from solution-processed, structurally metastable 1T' MoTe2, we demonstrate the stable output of featureless, stochastic conductance noise over a broad temperature range (down to 15 K) with minimal power consumption (down to 0.05 µW). Our characterizations and statistical analysis of the conductance noise suggest that the noise arises from the volatility of the stochastic polarization of the underlying ferroelectric dipoles in the 1T' MoTe2. Further, as shown by our experiments and indicated by our Monte Carlo simulation, the ferroelectric dipole polarization is a reliable entropy source, with the stochastic polarization persistent and stable over time. Exploiting the conductance noise, we achieve the generation of true random numbers and demonstrate their use in common cryptographic applications, for example, password generation and data encryption. In particular, we show a privacy-safeguarding approach for sensitive data that can be critical for the cryptography of neural networks. We believe our work will bring insights into the understanding of metastable 1T' MoTe2 and, more importantly, underpin its great potential in secure cryptography.

Submitted 24 April, 2024; originally announced April 2024.
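Turning an analog noise signal into usable random bits generally involves a digitisation step and a post-processing step. As a sketch only, the code below thresholds samples at their median and then applies von Neumann debiasing. Both steps are assumptions chosen for illustration, not the extraction pipeline reported in the paper, and simulated Gaussian samples (with a hypothetical baseline and spread) stand in for measured conductance noise.

```python
import random
import statistics
from typing import List, Sequence


def noise_to_bits(samples: Sequence[float]) -> List[int]:
    """Digitise noise by comparing each sample with the median (1 above, 0 otherwise)."""
    threshold = statistics.median(samples)
    return [1 if s > threshold else 0 for s in samples]


def von_neumann(bits: Sequence[int]) -> List[int]:
    """Von Neumann debiasing over non-overlapping pairs: 01 -> 0, 10 -> 1, 00/11 discarded.

    This removes bias only if the input bits are independent and identically distributed.
    """
    return [a for a, b in zip(bits[0::2], bits[1::2]) if a != b]


def simulated_noise(n: int, seed: int = 0) -> List[float]:
    """Stand-in for measured conductance samples: Gaussian noise around a hypothetical baseline."""
    rng = random.Random(seed)
    return [rng.gauss(1.0, 0.05) for _ in range(n)]
```

On average the debiasing step keeps about a quarter of the input bits, trading throughput for reduced bias.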

arXiv:2404.13814 [pdf, other]

doi 10.1007/JHEP06(2024)197

Discovering Quirks through Timing at FASER and Future Forward Experiments at the LHC

Authors: Jonathan L. Feng, Jinmian Li, Xufei Liao, Jian Ni, Junle Pei

Abstract: Quirks are generic predictions of strongly-coupled dark sectors. For weak-scale masses and a broad range of confining scales in the dark sector, quirks can be discovered only at the energy frontier, but quirk--anti-quirk pairs are produced with unusual signatures at low $p_T$, making them difficult to detect at the large LHC detectors. We determine the prospects for discovering quirks using timing information at FASER, FASER2, and an "ultimate detector" in the far-forward region at the LHC. NLO QCD corrections, which can significantly increase the production rate, are incorporated in the simulation of quirk production. To accurately propagate quirk pairs from the ATLAS interaction point to the forward detectors, the ionization energy loss of charged quirks traveling through matter, the radiation of infracolor glueballs and QCD hadrons during quirk pair oscillations, and the annihilation of quirkonium are properly considered. The quirk signal is separated from the large muon background using timing information from scintillator detectors by requiring either two coincident delayed tracks, based on arrival times at the detector, or two coincident slow tracks, based on time differences between hits in the front and back scintillators. We find that simple cuts preserve much of the signal while reducing the muon background to negligible levels. With the data already collected, FASER can discover quirks in currently unconstrained parameter space.
FASER2, running at the Forward Physics Facility during the HL-LHC era, will greatly extend this reach, probing the TeV-scale quirk masses motivated by the gauge hierarchy problem for the broad range of dark-sector confining scales between 100 eV and 100 keV.

Submitted 20 June, 2024; v1 submitted 21 April, 2024; originally announced April 2024.

Comments: 29 pages, 11 figures, version to appear in JHEP
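Both timing criteria reduce to time-of-flight arithmetic. The sketch below computes how late a particle moving at speed $\beta c$ arrives relative to a light-speed muon over a baseline, and how long it takes to cross between two scintillator planes. It assumes a constant speed along the whole path, ignoring the energy loss and pair oscillations the paper models; the roughly 480 m baseline and 1 m plane separation mentioned below are illustrative assumptions, not values taken from the paper.

```python
C_M_PER_S = 299_792_458.0  # speed of light in vacuum (exact SI value)


def arrival_delay_ns(baseline_m: float, beta: float) -> float:
    """Delay (ns) of a particle at beta*c relative to one at c over the same baseline."""
    if not 0.0 < beta <= 1.0:
        raise ValueError("beta must be in (0, 1]")
    return baseline_m / C_M_PER_S * (1.0 / beta - 1.0) * 1e9


def transit_time_ns(separation_m: float, beta: float) -> float:
    """Time (ns) to travel between two detector planes separated by separation_m at beta*c."""
    if not 0.0 < beta <= 1.0:
        raise ValueError("beta must be in (0, 1]")
    return separation_m / (beta * C_M_PER_S) * 1e9
```

For example, `arrival_delay_ns(480.0, 0.9)` gives a delay of roughly 178 ns, far larger than the few-nanosecond transit across a 1 m gap, which is why arrival-time delays can separate slow heavy particles from relativistic muons.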

arXiv:2404.10252 [pdf, other]

Learning from Offline and Online Experiences: A Hybrid Adaptive Operator Selection Framework

Authors: Jiyuan Pei, Jialin Liu, Yi Mei

Abstract: In many practical applications, similar optimisation problems or scenarios appear repeatedly. Learning from previous problem-solving experiences can help adjust algorithm components of meta-heuristics, e.g., adaptively selecting promising search operators, to achieve better optimisation performance. However, experiences obtained from previously solved problems, namely offline experiences, may sometimes provide misleading perceptions when solving a new problem if the characteristics of the previous problems and the new one differ considerably. Learning from online experiences obtained during the ongoing problem-solving process is more instructive but highly restricted by limited computational resources. This paper focuses on the effective combination of offline and online experiences. A novel hybrid framework that learns to dynamically and adaptively select promising search operators is proposed. Two adaptive operator selection modules with complementary paradigms cooperate in the framework to learn from offline and online experiences and make decisions. An adaptive decision policy is maintained to balance the use of those two modules in an online manner. Extensive experiments on 170 widely studied real-value benchmark optimisation problems and a benchmark set with 34 instances for combinatorial optimisation show that the proposed hybrid framework outperforms the state-of-the-art methods. An ablation study verifies the effectiveness of each component of the framework.

Submitted 15 April, 2024; originally announced April 2024.

arXiv:2404.09231 [pdf, other]

Tri-modal Confluence with Temporal Dynamics for Scene Graph Generation in Operating Rooms

Authors: Diandian Guo, Manxi Lin, Jialun Pei, He Tang, Yueming Jin, Pheng-Ann Heng

Abstract: A comprehensive understanding of surgical scenes allows for monitoring of the surgical process, reducing the occurrence of accidents and enhancing efficiency for medical professionals. Semantic modeling within operating rooms, as a scene graph generation (SGG) task, is challenging since it involves consecutive recognition of subtle surgical actions over prolonged periods. To address this challenge, we propose a Tri-modal (i.e., images, point clouds, and language) confluence with Temporal dynamics framework, termed TriTemp-OR. Diverging from previous approaches that integrated temporal information via memory graphs, our method offers two advantages: 1) we directly exploit bi-modal temporal information from video streams for hierarchical feature interaction, and 2) prior knowledge from Large Language Models (LLMs) is embedded to alleviate the class-imbalance problem in the operating theatre. Specifically, our model performs temporal interactions across 2D frames and 3D point clouds, including a scale-adaptive multi-view temporal interaction (ViewTemp) and a geometric-temporal point aggregation (PointTemp). Furthermore, we transfer knowledge from the biomedical LLM, LLaVA-Med, to deepen the comprehension of intraoperative relations. The proposed TriTemp-OR aggregates tri-modal features through relation-aware unification to predict relations and thereby generate scene graphs. Experimental results on the 4D-OR benchmark demonstrate the superior performance of our model for long-term OR streaming.

Submitted 14 April, 2024; originally announced April 2024.

Comments: 10 pages, 4 figures, 3 tables

arXiv:2404.08908 [pdf, other]

Reference Model Based Learning in Expectation Formation: Experimental Evidence

Authors: Jiaoying Pei

Abstract: How do people form expectations about future prices in financial markets? One of the dominant learning rules that explains forecasting behavior is the Adaptive Expectation Rule (ADA), which suggests that people adjust their predictions by adapting to the most recent prediction error with a constant weight. However, this rule also implies that they will continually learn and adapt until the prediction error is zero, which contradicts recent experimental evidence showing that people usually stop learning long before reaching zero prediction error. A more recent learning rule, Reference Model Based Learning (RMBL), extends and generalizes ADA, hypothesizing that: i) people apply ADA but dynamically adjust the adaptive coefficient with regard to the autocorrelation of the prediction error in the most recent two periods; ii) meanwhile, they also use a satisficing rule, so that they adjust their adaptive coefficient only when the prediction error is higher than they anticipate. This paper uses a rich set of experimental data with observations of 41,490 predictions from 801 subjects in the Learning-to-Forecast Experiments (LtFEs), a class of experiments used to study expectation formation. Our results show that RMBL fits the data better than ADA in all the experiments.

Submitted 5 May, 2024; v1 submitted 13 April, 2024; originally announced April 2024.
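The ADA rule described above can be written as one update: the new forecast moves a fixed fraction $w$ of the way from the old forecast toward the latest observed price. The second function is a hypothetical stylisation of the two RMBL ingredients named in the abstract (an error tolerance for satisficing, and a coefficient nudged by the sign agreement of the last two errors); the step size and update form are our own assumptions, not the specification used in the paper.

```python
def ada_forecast(prev_forecast: float, observed: float, w: float) -> float:
    """Adaptive expectations: correct the forecast by a constant fraction w of the last error."""
    return prev_forecast + w * (observed - prev_forecast)


def rmbl_style_weight(w: float, err_now: float, err_prev: float,
                      tolerance: float, step: float = 0.1) -> float:
    """Hypothetical RMBL-style update of the adaptive coefficient (illustrative only).

    Satisficing: leave w unchanged while the latest error is within tolerance.
    Otherwise raise w when the last two errors share a sign (positively
    autocorrelated errors, i.e. the forecast keeps lagging) and lower it when
    they alternate. The result is clipped to [0, 1].
    """
    if abs(err_now) <= tolerance:
        return w
    if err_now * err_prev > 0:
        w += step
    elif err_now * err_prev < 0:
        w -= step
    return min(1.0, max(0.0, w))
```

With `w` held fixed, the ADA update keeps correcting until the error reaches zero; the tolerance check is what lets a learner stop adjusting earlier, which is the behaviour the abstract attributes to RMBL.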

arXiv:2403.17676 [pdf]

Analysis on reservoir activation with the nonlinearity harnessed from solution-processed MoS2 devices

Authors: Songwei Liu, Yang Liu, Yingyi Wen, Jingfang Pei, Pengyu Liu, Lekai Song, Xiaoyue Fan, Wenchen Yang, Danmei Pan, Teng Ma, Yue Lin, Gang Wang, Guohua Hu

Abstract: Reservoir computing is a recurrent neural network framework that has been applied across various domains in machine learning. The implementation of reservoir computing, however, often demands heavy computation to activate the reservoir. Configuring physical reservoir networks and harnessing the nonlinearity of the underlying devices for activation is an emerging solution to this computational challenge. Herein, we analyze the feasibility of employing the nonlinearity of solution-processed molybdenum disulfide (MoS2) devices for reservoir activation. The devices, fabricated using liquid-phase exfoliated MoS2, exhibit a high-order nonlinearity achieved by Stark modulation of the MoS2 material. We demonstrate that this nonlinearity can be fitted and employed as the activation function to facilitate reservoir computing implementation. Notably, owing to the high-order nonlinearity, the network exhibits long-term synchronization and robust generalization abilities for approximating complex dynamical systems. Given this remarkable reservoir activation capability, coupled with the scalability of the device fabrication, our findings open the possibility of physically realizing lightweight, efficient reservoir computing for applications such as signal classification, motion tracking, and pattern recognition of complex time series, as well as secure cryptography. As an example, we show that the network can be used to generate chaotic random numbers for secure data encryption.

Submitted 26 March, 2024; originally announced March 2024.

arXiv:2403.17040 [pdf]

Enhancing Graph Representation Learning with Attention-Driven Spiking Neural Networks

Authors: Huifeng Yin, Mingkun Xu, Jing Pei, Lei Deng

Abstract: Graph representation learning has become a crucial task in machine learning and data mining due to its potential for modeling complex structures such as social networks, chemical compounds, and biological systems. Spiking neural networks (SNNs) have recently emerged as a promising alternative to traditional neural networks for graph learning tasks, benefiting from their ability to efficiently encode and process temporal and spatial information. In this paper, we propose a novel approach that integrates attention mechanisms with SNNs to improve graph representation learning. Specifically, we introduce an attention mechanism for SNNs that can selectively focus on important nodes and the corresponding features in a graph during the learning process. We evaluate our proposed method on several benchmark datasets and show that it achieves performance comparable to existing graph learning techniques.

Submitted 25 March, 2024; originally announced March 2024.

arXiv:2403.16674 [pdf, other]

Understanding the Functional Roles of Modelling Components in Spiking Neural Networks

Authors: Huifeng Yin, Hanle Zheng, Jiayi Mao, Siyuan Ding, Xing Liu, Mingkun Xu, Yifan Hu, Jing Pei, Lei Deng

Abstract: Spiking neural networks (SNNs), inspired by the neural circuits of the brain, are promising for achieving high computational efficiency with biological fidelity. Nevertheless, SNNs are difficult to optimize because the functional roles of their modelling components remain unclear. By designing and evaluating several variants of the classic model, we systematically investigate the functional roles of key modelling components (leakage, reset, and recurrence) in leaky integrate-and-fire (LIF) based SNNs. Through extensive experiments, we demonstrate how these components influence the accuracy, generalization, and robustness of SNNs. Specifically, we find that leakage plays a crucial role in balancing memory retention and robustness, the reset mechanism is essential for uninterrupted temporal processing and computational efficiency, and recurrence enriches the capability to model complex dynamics at the cost of robustness degradation. Based on these observations, we provide optimization suggestions for enhancing the performance of SNNs in different scenarios. This work deepens the understanding of how SNNs work and offers valuable guidance for the development of more effective and robust neuromorphic models.

Submitted 25 March, 2024; originally announced March 2024.
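To make the three components concrete, here is a minimal discrete-time LIF neuron in which each one can be switched off independently: leakage through the decay factor `beta` (set it to 1.0 for no leak), reset through the `reset` mode, and recurrence through a self-feedback weight `w_rec`. This common textbook discretisation is chosen for illustration; the model variants studied in the paper may be defined differently.

```python
from typing import List, Optional, Sequence


def lif_run(inputs: Sequence[float], beta: float = 0.9, threshold: float = 1.0,
            reset: Optional[str] = "zero", w_rec: float = 0.0) -> List[int]:
    """Simulate one discrete-time leaky integrate-and-fire neuron.

    beta:   membrane decay per step (1.0 means no leakage).
    reset:  "zero" (hard reset), "subtract" (soft reset), or None (no reset).
    w_rec:  weight of the neuron's own previous spike fed back as input (recurrence).
    """
    v, spike, spikes = 0.0, 0, []
    for x in inputs:
        v = beta * v + x + w_rec * spike   # leak, integrate input, add recurrent feedback
        spike = 1 if v >= threshold else 0
        if spike:
            if reset == "zero":
                v = 0.0
            elif reset == "subtract":
                v -= threshold
        spikes.append(spike)
    return spikes
```

With a constant input of 0.5 and no leak, the neuron fires every second step; with `beta=0.5` the membrane potential saturates below threshold and it never fires; and without a reset it fires on every step once the threshold is crossed, which loosely echoes the abstract's point that reset matters for sustained temporal processing.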

arXiv:2403.10318 [pdf, other]

Anytime Neural Architecture Search on Tabular Data

Authors: Naili Xing, Shaofeng Cai, Zhaojing Luo, Beng Chin Ooi, Jian Pei

Abstract: The increasing demand for tabular data analysis calls for transitioning from manual architecture design to Neural Architecture Search (NAS). This transition demands an efficient and responsive anytime NAS approach that is capable of returning the current optimal architecture within any given time budget while progressively enhancing architecture quality as more budget is allocated. However, anytime NAS for tabular data remains unexplored. To this end, we introduce ATLAS, the first anytime NAS approach tailored for tabular data. ATLAS introduces a novel two-phase filtering-and-refinement optimization scheme with joint optimization, combining the strengths of both training-free and training-based architecture evaluation paradigms. Specifically, in the filtering phase, ATLAS employs a new zero-cost proxy specifically designed for tabular data to efficiently estimate the performance of candidate architectures, thereby obtaining a set of promising architectures. Subsequently, in the refinement phase, ATLAS leverages a fixed-budget search algorithm to schedule the training of the promising candidates, so as to accurately identify the optimal architecture. To jointly optimize the two phases for anytime NAS, we also devise a budget-aware coordinator that delivers high NAS performance within constraints. Experimental evaluations demonstrate that ATLAS can obtain a well-performing architecture within any predefined time budget and return better architectures as and when a new time budget is made available.
Overall, it reduces the search time on tabular data by up to 82.75x compared to existing NAS approaches.

Submitted 6 May, 2024; v1 submitted 15 March, 2024; originally announced March 2024.
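A filtering-and-refinement search can be sketched in a few lines. Below, a cheap proxy score prunes the candidate pool, and successive halving (a well-known fixed-budget scheduling algorithm, used here as a stand-in because the abstract does not name the algorithm ATLAS uses) allocates training epochs to the survivors. `proxy_score` and `train_eval` are hypothetical user-supplied callables, and the budget-aware coordinator is not modelled.

```python
from typing import Callable, Hashable, List, Sequence


def filter_then_refine(candidates: Sequence[Hashable],
                       proxy_score: Callable[[Hashable], float],
                       train_eval: Callable[[Hashable, int], float],
                       keep: int = 8,
                       start_epochs: int = 1) -> Hashable:
    """Two-phase search sketch: proxy-based filtering, then successive halving."""
    if not candidates:
        raise ValueError("no candidates to search")
    # Phase 1 (filtering): rank by a cheap, training-free proxy and keep the top `keep`.
    survivors: List[Hashable] = sorted(candidates, key=proxy_score, reverse=True)[:keep]
    epochs = start_epochs
    # Phase 2 (refinement): train every survivor for `epochs`, keep the better half,
    # then double the per-candidate training budget for the next round.
    while len(survivors) > 1:
        scores = {c: train_eval(c, epochs) for c in survivors}
        survivors = sorted(survivors, key=lambda c: scores[c], reverse=True)
        survivors = survivors[: max(1, len(survivors) // 2)]
        epochs *= 2
    return survivors[0]
```

The proxy only needs to rank the true best architecture inside the kept set; the more expensive training-based rounds then separate the finalists, which is the division of labour the two phases are meant to exploit.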

arXiv:2403.07333 [pdf, other]

Development of generic no-scale inflation

Authors: Lina Wu, Jin-Ke Shen, Tianjun Li, Junle Pei

Abstract: We develop generalized no-scale supergravity models of inflation, and then study the corresponding cosmological predictions as well as the formation of primordial black holes (PBHs) and scalar-induced gravitational waves (SIGWs). With a new parameter $0<α\leq 1$, the $α$-generalized no-scale supergravity continuously connects the generic no-scale supergravity models arising from string theory compactifications. The resulting predictions for the CMB observables, the spectral index $n_s$ and the tensor-to-scalar ratio $r$, can be highly consistent with the latest Planck/BICEP/Keck Array observations. Notably, the models with $α\neq 1$ give a smaller ratio $r\leq 10^{-3}$, which remains viable even under the tighter observational constraints anticipated from future experiments. Additionally, these models have the potential to generate a broad-band stochastic gravitational wave background, and thus to explain the NANOGrav 15yr signal. Furthermore, they predict the formation of PBHs with various mass scales, which could account for a significant portion of the dark matter relic density in the Universe.

Submitted 2 June, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

Comments: v2: minor modifications with updated references and benchmark points for PBH production; 19 pages, 12 figures, 2 tables. Comments are welcome

arXiv:2403.06356 [pdf, other]

Video Generation with Consistency Tuning

Authors: Chaoyi Wang, Yaozhe Song, Yafeng Zhang, Jun Pei, Lijie Xia, Jianpo Liu

Abstract: Currently, various studies have been exploring the generation of long videos. However, the generated frames in these videos often exhibit jitter and noise. Therefore, to generate videos without such noise, we propose a novel framework composed of four modules: a separate tuning module, an average fusion module, a combined tuning module, and an inter-frame consistency module. By applying our newly proposed modules in sequence, the consistency of the background and foreground in each video frame is optimized. Experimental results demonstrate that videos generated by our method exhibit high quality compared with state-of-the-art methods.

Submitted 10 March, 2024; originally announced March 2024.

arXiv:2403.05782 [pdf, ps, other]

Symmetric cumulant $sc_{2,4} \left \{ 4 \right \}$ and asymmetric cumulant $ac_{2} \left \{ 3 \right \}$ from transverse momentum conservation and flow

Authors: Jia-Lin Pei, Guo-Liang Ma, Adam Bzdak

Abstract: The multi-particle cumulant method can be used to reveal long-range collectivity in small and large colliding systems. The four-particle symmetric cumulant $sc_{2,4} \left \{ 4 \right \}$, three-particle asymmetric cumulant $ac_{2} \left \{ 3 \right \}$, and the normalised cumulants $nsc_{2,4} \left \{ 4 \right \}$ and $nac_{2} \left \{ 3 \right \}$ arising from transverse momentum conservation and flow are calculated. The interplay between the two effects is also investigated. Our results are in good agreement with recent ATLAS measurements of multi-particle azimuthal correlations with the subevent cumulant method, which provides insight into the origin of collective flow in small systems.

Submitted 8 March, 2024; originally announced March 2024.

Comments: 17 pages, 4 figures
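For reference, the cumulants named in this abstract are conventionally built from multi-particle azimuthal correlators, where $\langle\langle\cdot\rangle\rangle$ averages over particle tuples within an event and then over events, and $v_n$, $\Psi_n$ denote the flow magnitude and symmetry-plane angle of harmonic $n$. The definitions below are the standard ones from the flow literature, recalled as a reminder; the paper's subevent conventions may differ in detail, and the approximate equalities hold only when flow alone (no non-flow) contributes. The abstract's quantities correspond to $(n,m)=(2,4)$ and $n=2$.

```latex
\begin{align}
sc_{n,m}\{4\} &= \langle\langle \cos(n\varphi_1 + m\varphi_2 - n\varphi_3 - m\varphi_4) \rangle\rangle
  - \langle\langle \cos n(\varphi_1 - \varphi_2) \rangle\rangle \,
    \langle\langle \cos m(\varphi_1 - \varphi_2) \rangle\rangle
  \approx \langle v_n^2 v_m^2 \rangle - \langle v_n^2 \rangle \langle v_m^2 \rangle , \\
nsc_{n,m}\{4\} &= \frac{sc_{n,m}\{4\}}{\langle v_n^2 \rangle \langle v_m^2 \rangle} , \\
ac_{n}\{3\} &= \langle\langle \cos(n\varphi_1 + n\varphi_2 - 2n\varphi_3) \rangle\rangle
  \approx \langle v_n^2 \, v_{2n} \cos 2n(\Psi_n - \Psi_{2n}) \rangle .
\end{align}
```

No lower-order subtraction appears in $ac_n\{3\}$ because the single-particle averages $\langle e^{in\varphi} \rangle$ that would enter it vanish for an azimuthally symmetric acceptance.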

arXiv:2403.01582 [pdf, other]

Selection, Ensemble, and Adaptation: Advancing Multi-Source-Free Domain Adaptation via Architecture Zoo

Authors: Jiangbo Pei, Ruizhe Li, Aidong Men, Yang Liu, Xiahai Zhuang, Qingchao Chen

Abstract: Conventional Multi-Source Free Domain Adaptation (MSFDA) assumes that each source domain provides a single source model, and all source models adopt a uniform architecture. This paper introduces Zoo-MSFDA, a more general setting that allows each source domain to offer a zoo of multiple source models with different architectures. While it enriches the source knowledge, Zoo-MSFDA risks being dominated by suboptimal/harmful models. To address this issue, we theoretically analyze the model selection problem in Zoo-MSFDA and introduce two principles: the transferability principle and the diversity principle. Recognizing the challenge of measuring transferability, we subsequently propose a novel Source-Free Unsupervised Transferability Estimation (SUTE). It enables assessing and comparing transferability across multiple source models with different architectures under domain shift, without requiring target labels or source data. Based on the above, we introduce a Selection, Ensemble, and Adaptation (SEA) framework to address Zoo-MSFDA, which consists of: 1) source model selection based on the proposed principles and SUTE; 2) ensemble construction based on SUTE-estimated transferability; 3) target-domain adaptation of the ensemble model. Evaluations demonstrate that our SEA framework, with the introduced Zoo-MSFDA setting, significantly improves adaptation performance (e.g., 13.5% on DomainNet). Additionally, our SUTE achieves state-of-the-art performance in transferability estimation.

Submitted 23 May, 2024; v1 submitted 3 March, 2024; originally announced March 2024.

arXiv:2402.17263 [pdf, other]

MELoRA: Mini-Ensemble Low-Rank Adapters for Parameter-Efficient Fine-Tuning

Authors: Pengjie Ren, Chengshun Shi, Shiguang Wu, Mengqi Zhang, Zhaochun Ren, Maarten de Rijke, Zhumin Chen, Jiahuan Pei

Abstract: Parameter-efficient fine-tuning (PEFT) is a popular method for tailoring pre-trained large language models (LLMs), especially as the models' scale and the diversity of tasks increase. Low-rank adaptation (LoRA) is based on the idea that the adaptation process is intrinsically low-dimensional, i.e., significant model changes can be represented with relatively few parameters. However, decreasing the rank can lead to generalization errors on specific tasks compared to full-parameter fine-tuning. We present MELoRA, a mini-ensemble of low-rank adapters that uses fewer trainable parameters while maintaining a higher rank, thereby offering improved performance potential. The core idea is to freeze the original pretrained weights and train a group of mini LoRAs, each with only a small number of parameters. This can capture a significant degree of diversity among the mini LoRAs, thus promoting better generalization ability. We conduct a theoretical analysis and empirical studies on various NLP tasks. Our experimental results show that, compared to LoRA, MELoRA achieves better performance with 8 times fewer trainable parameters on natural language understanding tasks and 36 times fewer trainable parameters on instruction-following tasks, which demonstrates the effectiveness of MELoRA.

Submitted 24 June, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

Comments: ACL2024

MSC Class: 68T50 ACM Class: I.2.7
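The abstract says a group of mini LoRAs can keep a higher rank with fewer trainable parameters. A minimal pure-Python sketch of one way this can work, under an assumption not stated in the abstract: each of n mini LoRAs adapts an equal d/n slice of the hidden dimension, so the combined update is block-diagonal. All function names are hypothetical and this is not the authors' code.

```python
from fractions import Fraction
import random

def mini_lora_pair(block, r, rng):
    """Hypothetical mini LoRA on one slice: A is r x block, B is block x r.
    A = [I_r | M] and B = [I_r ; N] guarantee rank(B @ A) == r in this demo."""
    A = [[1 if j == i else 0 for j in range(r)] + [rng.randint(-3, 3) for _ in range(block - r)]
         for i in range(r)]
    B = ([[1 if j == i else 0 for j in range(r)] for i in range(r)]
         + [[rng.randint(-3, 3) for _ in range(r)] for _ in range(block - r)])
    return A, B

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def block_diag(blocks):
    """Place square blocks on the diagonal of a larger zero matrix."""
    size = sum(len(b) for b in blocks)
    out = [[0] * size for _ in range(size)]
    off = 0
    for b in blocks:
        for i, row in enumerate(b):
            for j, v in enumerate(row):
                out[off + i][off + j] = v
        off += len(b)
    return out

def rank(M):
    """Exact matrix rank via Gauss-Jordan elimination over the rationals."""
    M = [[Fraction(v) for v in row] for row in M]
    rows, cols = len(M), len(M[0])
    rk = 0
    for c in range(cols):
        pivot = next((i for i in range(rk, rows) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[rk], M[pivot] = M[pivot], M[rk]
        for i in range(rows):
            if i != rk and M[i][c] != 0:
                f = M[i][c] / M[rk][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[rk])]
        rk += 1
    return rk

def ensemble_update(d, n, r, seed=0):
    """Build the block-diagonal update from n mini LoRAs and count their parameters."""
    rng = random.Random(seed)
    block = d // n
    blocks, params = [], 0
    for _ in range(n):
        A, B = mini_lora_pair(block, r, rng)
        blocks.append(matmul(B, A))  # block x block update for this slice
        params += len(A) * len(A[0]) + len(B) * len(B[0])
    return block_diag(blocks), params

d, n, r = 16, 4, 2
delta, params = ensemble_update(d, n, r)
print(rank(delta), params, 2 * d * (n * r))  # 8 64 256
```

Under this layout the combined update reaches rank n*r while training 2*d*r parameters, n times fewer than a single plain LoRA of the same rank (64 vs 256 here); the trade-off is that the update is restricted to a block-diagonal form.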

arXiv:2402.16908 [pdf]

Lightweight, error-tolerant edge detection using memristor-enabled stochastic logics

Authors: Lekai Song, Pengyu Liu, Jingfang Pei, Yang Liu, Songwei Liu, Shengbo Wang, Leonard W. T. Ng, Tawfique Hasan, Kong-Pang Pun, Shuo Gao, Guohua Hu

Abstract: The demand for efficient edge vision has spurred interest in developing stochastic computing approaches for image processing tasks. Memristors with inherent stochasticity readily introduce probability into computations and thus enable stochastic image processing. Here, we present a stochastic computing approach for edge detection, a fundamental image processing technique, facilitated by memristor-enabled stochastic logics. Specifically, we integrate memristors with logic circuits and harness their stochasticity to realize compact stochastic logics for stochastic number encoding and processing. The stochastic numbers, exhibiting well-regulated probabilities and correlations, can be processed to perform logic operations with statistical probabilities. This can facilitate lightweight stochastic edge detection in edge visual scenarios characterized by high levels of noise errors. As a practical demonstration, we implement a hardware stochastic Roberts cross operator using the stochastic logics and demonstrate its exceptional edge detection performance, remarkably, with 95% less computational cost while withstanding 50% bit-flip errors. The results underscore the great potential of our stochastic edge detection approach in developing lightweight, error-tolerant edge vision hardware and systems for autonomous driving, virtual/augmented reality, medical imaging diagnosis, industrial automation, and beyond.

Submitted 20 March, 2024; v1 submitted 25 February, 2024; originally announced February 2024.
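As a software illustration of the general stochastic-computing idea behind a stochastic Roberts cross operator (not the memristor circuit itself), the sketch below relies on textbook SC conventions that are our assumptions: pixel intensities in [0, 1] become Bernoulli bitstreams, XOR of two maximally correlated streams yields |a - b|, and a multiplexer with a probability-0.5 select line performs scaled addition. The output therefore estimates half the Roberts cross gradient magnitude for one 2x2 window.

```python
import random

def correlated_streams(p, q, n_bits, rng):
    """Encode p and q as bitstreams driven by one shared random source (maximal correlation)."""
    a, b = [], []
    for _ in range(n_bits):
        u = rng.random()
        a.append(1 if u < p else 0)
        b.append(1 if u < q else 0)
    return a, b

def sc_abs_diff(p, q, n_bits, rng):
    """XOR of maximally correlated streams is 1 with probability |p - q|."""
    a, b = correlated_streams(p, q, n_bits, rng)
    return [x ^ y for x, y in zip(a, b)]

def sc_roberts_cross(x00, x01, x10, x11, n_bits=10_000, seed=0):
    """Stochastic estimate of 0.5 * (|x00 - x11| + |x01 - x10|) for a 2x2 pixel window."""
    rng = random.Random(seed)
    d1 = sc_abs_diff(x00, x11, n_bits, rng)
    d2 = sc_abs_diff(x01, x10, n_bits, rng)
    # Multiplexer with an independent p = 0.5 select stream: scaled addition of d1 and d2.
    out = [s1 if rng.random() < 0.5 else s2 for s1, s2 in zip(d1, d2)]
    return sum(out) / n_bits

estimate = sc_roberts_cross(0.9, 0.2, 0.7, 0.1)
exact = 0.5 * (abs(0.9 - 0.1) + abs(0.2 - 0.7))
print(round(estimate, 3), exact)
```

Because every bit in a stream carries equal weight, a single flipped bit shifts the estimate by only 1/n_bits, which is the usual intuition for why stochastic encodings tolerate bit-flip errors.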

arXiv:2402.14461 [pdf, other]

S^2Former-OR: Single-Stage Bimodal Transformer for Scene Graph Generation in OR

Authors: Jialun Pei, Diandian Guo, Jingyang Zhang, Manxi Lin, Yueming Jin, Pheng-Ann Heng

Abstract: Scene graph generation (SGG) of surgical procedures is crucial for enhancing holistic cognitive intelligence in the operating room (OR). However, previous works have primarily relied on multi-stage learning that generates semantic scene graphs through intermediate processes such as pose estimation and object detection, which may compromise model efficiency and efficacy and also impose an extra annotation burden. In this study, we introduce a novel single-stage bimodal transformer framework for SGG in the OR, termed S^2Former-OR, which aims to leverage multi-view 2D scenes and 3D point clouds in a complementary manner for end-to-end SGG. Concretely, our model embraces a View-Sync Transfusion scheme to encourage multi-view visual information interaction. Concurrently, a Geometry-Visual Cohesion operation is designed to integrate the synergistic 2D semantic features into 3D point cloud features. Moreover, based on the augmented feature, we propose a novel relation-sensitive transformer decoder that embeds dynamic entity-pair queries and relational trait priors, which enables the direct prediction of entity-pair relations for graph generation without intermediate steps. Extensive experiments on the 4D-OR benchmark validate the superior SGG performance and lower computational cost of S^2Former-OR compared with current OR-SGG methods, e.g., a 3% increase in precision and a 24.2M reduction in model parameters. We further compared our method with generic single-stage SGG methods using broader metrics for a comprehensive evaluation and achieved consistently better performance. The code will be made available.

Submitted 22 February, 2024; originally announced February 2024.

arXiv:2402.13973 [pdf, other]

Linear-Time Graph Neural Networks for Scalable Recommendations

Authors: Jiahao Zhang, Rui Xue, Wenqi Fan, Xin Xu, Qing Li, Jian Pei, Xiaorui Liu

Abstract: In an era of information explosion, recommender systems are vital tools for delivering personalized recommendations to users. The key task of recommender systems is to forecast users' future behaviors based on previous user-item interactions. Owing to their strong expressive power in capturing high-order connectivities in user-item interaction data, Graph Neural Networks (GNNs) have attracted rising interest in recent years as a way to boost the prediction performance of recommender systems. Nonetheless, classic Matrix Factorization (MF) and Deep Neural Network (DNN) approaches still play an important role in real-world large-scale recommender systems due to their scalability advantages. Despite the existence of GNN-acceleration solutions, it remains an open question whether GNN-based recommender systems can scale as efficiently as classic MF and DNN methods. In this paper, we propose a Linear-Time Graph Neural Network (LTGNN) that scales up GNN-based recommender systems to achieve scalability comparable to classic MF approaches while maintaining GNNs' powerful expressiveness for superior prediction accuracy. Extensive experiments and ablation studies validate the effectiveness and scalability of the proposed algorithm. Our implementation based on PyTorch is available.

Submitted 21 February, 2024; originally announced February 2024.

Comments: 12 pages, 5 figures, accepted by The Web Conference 2024

arXiv:2402.11514 [pdf, other]

High Quality Microscopic Nuclear Masses of Superheavy Nuclei

Authors: Dawei Guan, Junchen Pei

Abstract: To synthesize new superheavy elements, accurate predictions of the nuclear masses of superheavy nuclei are essential for calculating reaction $Q$ values, neutron separation energies, and $α$-decay energies, which are important for estimating beam energies and survival probabilities and for identification. In this work, we include existing $α$-decay energies of superheavy nuclei in the fitting procedure of extended Skyrme density functionals, since the corresponding nuclear masses are not available. Systematic $α$-decay energies are well reproduced, with deviations smaller than 0.2 MeV. The high-quality $α$-decay energies make direct identification of new elements and new isotopes feasible. The resulting binding energies in the heaviest region are surprisingly close to the values inferred by AME2020. Our work should be useful for guiding the experimental synthesis of new elements 119 and 120.

Submitted 18 February, 2024; originally announced February 2024.

Comments: 5 pages, 3 figures
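For reference, the textbook relation (not specific to this paper) between an α-decay energy and the binding energies $B(Z, N)$ that a density functional predicts for the parent nucleus, the daughter nucleus, and the α particle:

```latex
Q_\alpha(Z,N) = B(Z-2,\,N-2) + B(2,2) - B(Z,N),
\qquad B(2,2) \equiv B(^{4}\mathrm{He}) \approx 28.30\ \mathrm{MeV}
```

Because the nucleon rest masses cancel between parent and decay products, fitting to measured $Q_\alpha$ values constrains binding-energy differences along α-decay chains rather than absolute masses.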

arXiv:2402.03483 [pdf, other]

SWAG: Storytelling With Action Guidance

Authors: Zeeshan Patel, Karim El-Refai, Jonathan Pei, Tianle Li

Abstract: Automated long-form story generation typically employs long-context large language models (LLMs) for one-shot creation, which can produce cohesive but not necessarily engaging content. We introduce Storytelling With Action Guidance (SWAG), a novel approach to storytelling with LLMs. Our approach reduces story writing to a search problem through a two-model feedback loop: one LLM generates story content, and another auxiliary LLM is used to choose the next best "action" to steer the story's future direction. Our results show that SWAG can substantially outperform previous end-to-end story generation techniques when evaluated by GPT-4 and through human evaluation, and our SWAG pipeline using only open-source models surpasses GPT-3.5-Turbo.

Submitted 5 February, 2024; originally announced February 2024.
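The two-model feedback loop described in the SWAG abstract can be pictured as alternating "choose an action, then generate content conditioned on it". The sketch below shows only that control flow: both LLMs are replaced by hypothetical callables, and the candidate action list, function names, and step count are illustrative assumptions rather than SWAG's actual prompts or action space.

```python
from typing import Callable, List

def action_guided_story(
    premise: str,
    generate: Callable[[str, str], str],              # hypothetical content model: (story, action) -> continuation
    choose_action: Callable[[str, List[str]], str],   # hypothetical auxiliary model: (story, candidates) -> action
    candidate_actions: List[str],
    steps: int,
) -> str:
    """Alternate between selecting the next action and generating story text that follows it."""
    story = premise
    for _ in range(steps):
        action = choose_action(story, candidate_actions)
        story += generate(story, action)
    return story

def toy_choose(story: str, candidates: List[str]) -> str:
    # Deterministic stand-in for the auxiliary LLM: pick by current sentence count.
    return candidates[story.count(".") % len(candidates)]

def toy_generate(story: str, action: str) -> str:
    # Stand-in for the story LLM: emit a placeholder sentence tagged with the action.
    return f" [{action}]."

result = action_guided_story("Once upon a time.", toy_generate, toy_choose,
                             ["conflict", "twist", "resolution"], steps=3)
print(result)  # Once upon a time. [twist]. [resolution]. [conflict].
```

Swapping the toy callables for real model calls changes nothing in the loop itself, which is the point of framing generation as a search over actions.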

arXiv:2401.16696 [pdf, ps, other]

Properties of chiral nucleon-nucleon interaction at N$^3$LO with high cutoffs studied by local projection

Authors: Haoyu Shang, Rongzhe Hu, Junchen Pei, Furong Xu

Abstract: The chiral nucleon-nucleon ($NN$) interaction at high cutoffs has been plagued by the presence of spurious bound states. In this work, the chiral $NN$ interaction at N$^3$LO is studied with the local projection method as the cutoff increases. The evolution of the short-range behavior of pion-exchange interactions and contact interactions is intuitively demonstrated. Toward high cutoffs, the $P$-channel potentials appear erratic at short ranges as a compromise to reproduce phase shifts, while such erratic behavior can be avoided in the $S$ and $D$ channels. Furthermore, a chiral $NN$ interaction at N$^3$LO is studied at a cutoff of 700 MeV. The properties of the deuteron and triton are tested with this interaction. Such a hard interaction is expected to provide an alternative choice for studies of short-range correlations and high-density nuclear matter.

Submitted 29 January, 2024; originally announced January 2024.

Comments: 16 pages, 12 figures

arXiv:2401.16194 [pdf]

Macroscopic electro-optical modulation of solution-processed molybdenum disulfide

Authors: Songwei Liu, Yingyi Wen, Jingfang Pei, Xiaoyue Fan, Yongheng Zhou, Yang Liu, Ling-Kiu Ng, Yue Lin, Teng Ma, Panpan Zhang, Xiaolong Chen, Gang Wang, Guohua Hu

Abstract: Molybdenum disulfide (MoS2) has drawn great interest for advancing tunable photonics and optoelectronics. Its solution processing, though scalable, results in randomly networked ensembles of discrete nanosheets whose properties are compromised for tunable device fabrication. Here, we show via density-functional theory calculations that the electronic structure of the individual solution-processed nanosheets can be collectively modulated by external electric fields. In particular, the nanosheets can form Stark ladders, leading to variations in the underlying optical transition processes and thus tunable macroscopic optical properties of the ensembles. We experimentally confirm the macroscopic electro-optical modulation using solution-processed thin films of MoS2 and ferroelectric P(VDF-TrFE), and show that the localized polarization fields of P(VDF-TrFE) can modulate the optical properties of MoS2, specifically the optical absorption and photoluminescence, on a macroscopic scale. Given the scalability of solution processing, our results underpin the potential of electro-optical modulation of solution-processed MoS2 for scalable tunable photonics and optoelectronics. As an illustrative example, we successfully demonstrate solution-processed electro-absorption modulators.

Submitted 29 January, 2024; originally announced January 2024.

Comments: Manuscript 14 pages, 5 figures. Supplementary Materials 10 pages, 5 figures

arXiv:2401.14702 [pdf, other]

doi 10.1109/TKDE.2023.3306378

FairSample: Training Fair and Accurate Graph Convolutional Neural Networks Efficiently

Authors: Zicun Cong, Shi Baoxu, Shan Li, Jaewon Yang, Qi He, Jian Pei

Abstract: Fairness in Graph Convolutional Neural Networks (GCNs) is an increasingly important concern as GCNs are adopted in many crucial applications. Societal biases against sensitive groups may exist in many real-world graphs, and GCNs trained on those graphs may be affected by such biases. In this paper, we adopt the well-known fairness notion of demographic parity and tackle the challenge of training fair and accurate GCNs efficiently. We present an in-depth analysis of how graph structure bias, node attribute bias, and model parameters may affect the demographic parity of GCNs. Our insights lead to FairSample, a framework that jointly mitigates the three types of biases. We employ two intuitive strategies to rectify graph structures. First, we inject edges across nodes that are in different sensitive groups but similar in node features. Second, to enhance model fairness and retain model quality, we develop a learnable neighbor sampling policy using reinforcement learning. To address the bias in node features and model parameters, FairSample is complemented by a regularization objective to optimize fairness.

Submitted 26 January, 2024; originally announced January 2024.

Comments: Accepted by TKDE 2023
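Demographic parity, the fairness notion the FairSample abstract adopts, compares positive-prediction rates across sensitive groups. The sketch below computes that gap and illustrates the first structural strategy (injecting edges between similar nodes from different groups) with a plain cosine-similarity threshold. The threshold, the similarity measure, and the function names are our assumptions; the learnable RL-based neighbor sampler and the regularizer are not shown.

```python
from itertools import combinations
from math import sqrt
from typing import List, Sequence, Set, Tuple

def demographic_parity_gap(preds: Sequence[int], groups: Sequence[int]) -> float:
    """|P(y_hat = 1 | s = 0) - P(y_hat = 1 | s = 1)| for binary predictions and binary groups."""
    rates = []
    for g in (0, 1):
        members = [p for p, s in zip(preds, groups) if s == g]
        rates.append(sum(members) / len(members))
    return abs(rates[0] - rates[1])

def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

def inject_cross_group_edges(features: List[List[float]], groups: Sequence[int],
                             edges: Set[Tuple[int, int]], threshold: float) -> Set[Tuple[int, int]]:
    """Add edge (i, j), i < j, when i and j are in different groups but have similar features."""
    new_edges = set(edges)
    for i, j in combinations(range(len(features)), 2):
        if groups[i] != groups[j] and cosine(features[i], features[j]) >= threshold:
            new_edges.add((i, j))
    return new_edges

gap = demographic_parity_gap([1, 1, 0, 1, 0, 0], [0, 0, 0, 1, 1, 1])
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.1], [0.0, 1.0]]
groups = [0, 0, 1, 1]
augmented = inject_cross_group_edges(features, groups, {(0, 1)}, threshold=0.9)
print(round(gap, 3), sorted(augmented))
```

In this toy graph, nodes 0 and 2 (and nodes 1 and 3) share nearly identical features but sit in different groups, so the heuristic links them, giving message passing a path across the sensitive attribute.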

arXiv:2401.10806 [pdf, ps, other]

DeepRLI: A Multi-objective Framework for Universal Protein--Ligand Interaction Prediction

Authors: Haoyu Lin, Shiwei Wang, Jintao Zhu, Yibo Li, Jianfeng Pei, Luhua Lai

Abstract: Protein (receptor)--ligand interaction prediction is a critical component of computer-aided drug design, significantly influencing molecular docking and virtual screening. Despite the development of numerous scoring functions in recent years, particularly those employing machine learning, accurately and efficiently predicting binding affinities for protein--ligand complexes remains a formidable challenge. Most contemporary methods are tailored for specific tasks, such as binding affinity prediction, binding pose prediction, or virtual screening, and often fail to encompass all aspects. In this study, we put forward DeepRLI, a novel protein--ligand interaction prediction architecture. It encodes each protein--ligand complex into a fully connected graph, retaining the integrity of the topological and spatial structure, and leverages improved graph transformer layers with a cosine envelope as the central module of the neural network, thus exhibiting superior scoring power. To enable the model to generalize to conformations beyond crystal structures and to adapt to molecular docking and virtual screening tasks, we propose a multi-objective strategy in which the model outputs three scores (for scoring and ranking, docking, and screening) and the training process optimizes these three objectives simultaneously. For the latter two objectives, we augment the dataset through a docking procedure, incorporate suitable physics-informed blocks, and employ an effective contrastive learning approach. As a result, our model shows balanced performance across scoring, ranking, docking, and screening, demonstrating its ability to handle a range of tasks. Overall, this research contributes a multi-objective framework for universal protein--ligand interaction prediction, augmenting the landscape of structure-based drug design.

Submitted 19 January, 2024; originally announced January 2024.

arXiv:2401.05561 [pdf, other]

TrustLLM: Trustworthiness in Large Language Models

Authors: Lichao Sun, Yue Huang, Haoran Wang, Siyuan Wu, Qihui Zhang, Yuan Li, Chujie Gao, Yixin Huang, Wenhan Lyu, Yixuan Zhang, Xiner Li, Zhengliang Liu, Yixin Liu, Yijue Wang, Zhikun Zhang, Bertie Vidgen, Bhavya Kailkhura, Caiming Xiong, Chaowei Xiao, Chunyuan Li, Eric Xing, Furong Huang, Hao Liu, Heng Ji, Hongyi Wang, et al. (45 additional authors not shown)

Abstract: Large language models (LLMs), exemplified by ChatGPT, have gained considerable attention for their excellent natural language processing capabilities. Nonetheless, these LLMs present many challenges, particularly in the realm of trustworthiness. Ensuring the trustworthiness of LLMs has therefore become an important topic. This paper introduces TrustLLM, a comprehensive study of trustworthiness in LLMs, including principles for different dimensions of trustworthiness, an established benchmark, evaluation and analysis of trustworthiness for mainstream LLMs, and a discussion of open challenges and future directions. Specifically, we first propose a set of principles for trustworthy LLMs that span eight different dimensions. Based on these principles, we further establish a benchmark across six dimensions: truthfulness, safety, fairness, robustness, privacy, and machine ethics. We then present a study evaluating 16 mainstream LLMs on TrustLLM, which consists of over 30 datasets. First, our findings show that trustworthiness and utility (i.e., functional effectiveness) are in general positively related. Second, our observations reveal that proprietary LLMs generally outperform most open-source counterparts in terms of trustworthiness, raising concerns about the potential risks of widely accessible open-source LLMs. However, a few open-source LLMs come very close to proprietary ones. Third, some LLMs may be overly calibrated toward exhibiting trustworthiness, to the extent that they compromise their utility by mistakenly treating benign prompts as harmful and consequently not responding. Finally, we emphasize the importance of ensuring transparency not only in the models themselves but also in the technologies that underpin trustworthiness. Knowing the specific trustworthy technologies that have been employed is crucial for analyzing their effectiveness.

Submitted 17 March, 2024; v1 submitted 10 January, 2024; originally announced January 2024.

Comments: This work is still in progress and we welcome your contributions

arXiv:2401.03538 [pdf, other]

Transfer the linguistic representations from TTS to accent conversion with non-parallel data

Authors: Xi Chen, Jiakun Pei, Liumeng Xue, Mingyang Zhang

Abstract: Accent conversion aims to convert the accent of source speech to a target accent while preserving the speaker's identity. This paper introduces a novel non-autoregressive framework for accent conversion that learns accent-agnostic linguistic representations and employs them to convert the accent in the source speech. Specifically, the proposed system aligns speech representations with linguistic representations obtained from Text-to-Speech (TTS) systems, enabling the accent voice conversion model to be trained on non-parallel data. Furthermore, we investigate the effectiveness of a pretraining strategy on native data and of different acoustic features within our proposed framework. We conduct a comprehensive evaluation using both subjective and objective metrics to assess the performance of our approach. The evaluation results highlight the benefits of the pretraining strategy and the incorporation of richer semantic features, resulting in significantly enhanced audio quality and intelligibility.

Submitted 7 January, 2024; originally announced January 2024.

arXiv:2401.01059 [pdf, other]

Accelerating Discovery of Novel and Bioactive Ligands With Pharmacophore-Informed Generative Models

Authors: Weixin Xie, Jianhang Zhang, Qin Xie, Chaojun Gong, Youjun Xu, Luhua Lai, Jianfeng Pei

Abstract: Deep generative models have made significant advances in accelerating drug discovery by generating bioactive chemicals against desired targets. Nevertheless, most generated compounds that have been validated for potent bioactivity exhibit unsatisfactory levels of structural novelty, thereby providing limited inspiration to human medicinal chemists. The challenge for generative models lies in producing compounds that are both bioactive and novel, rather than merely making minor modifications to known actives in the training set. Recognizing the utility of pharmacophores in facilitating scaffold hopping, we developed TransPharmer, an innovative generative model that integrates ligand-based interpretable pharmacophore fingerprints with a generative pre-trained transformer (GPT) for de novo molecule generation. TransPharmer demonstrates superior performance across tasks involving unconditioned distribution learning, de novo generation, and scaffold elaboration under pharmacophoric constraints. Its distinct exploration mode within the local chemical space makes it particularly useful for scaffold hopping, producing compounds that are structurally novel yet pharmaceutically related. The efficacy of TransPharmer is validated through two case studies involving the dopamine receptor D2 (DRD2) and polo-like kinase 1 (PLK1). Notably, in the case of PLK1, three of the four synthesized designed compounds exhibit submicromolar activities, with the most potent one, IIP0943, demonstrating a potency of 5.1 nM. Featuring a new scaffold of 4-(benzo[b]thiophen-7-yloxy)pyrimidine, IIP0943 also exhibits high selectivity for PLK1. These results demonstrate that TransPharmer is a powerful tool for the discovery of novel and bioactive ligands.

Submitted 2 January, 2024; originally announced January 2024.

arXiv:2312.17621 [pdf, other]

Special Relativistic Covariant Fluctuation Theorems

Authors: Ji-Hui Pei, Jin-Fu Chen, H. T. Quan

Abstract: Fluctuation theorems establish connections between fluctuations and irreversibility by considering stochastic thermodynamic quantities. In this study, we derive special relativistic covariant fluctuation theorems by defining covariant work, heat, and entropy. We focus on a driven scalar field in contact with a Markovian heat bath. For inertial observers moving relative to the heat bath, both the energy components and the momentum components of work and heat must be included to formulate the corresponding fluctuation theorems, and the four-velocity of the heat bath plays an important role. It turns out that the irreversibility is characterized by the conventional thermodynamic quantities in the rest frame of the heat bath, regardless of the reference frame of the observer. Even in the nonrelativistic case, this identification is nontrivial. We study the work statistics for a Klein-Gordon field in a driving process measured by a moving inertial observer to explicitly verify the covariant version of the Jarzynski equality.

Submitted 10 January, 2024; v1 submitted 29 December, 2023; originally announced December 2023.
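For orientation, the standard nonrelativistic Jarzynski equality ($\beta$ the inverse temperature of the bath, $W$ the work, $\Delta F$ the free-energy difference), which the covariant version in this abstract generalizes, together with a worked case under the illustrative assumption of Gaussian work statistics:

```latex
\langle e^{-\beta W} \rangle = e^{-\beta \Delta F};
\qquad
W \sim \mathcal{N}(\mu, \sigma^2) \;\Rightarrow\;
\langle e^{-\beta W} \rangle = e^{-\beta\mu + \beta^2\sigma^2/2}
\;\Rightarrow\;
\Delta F = \mu - \tfrac{1}{2}\beta\sigma^2,
\qquad
\langle W \rangle - \Delta F = \tfrac{1}{2}\beta\sigma^2 \ge 0
```

The last expression is the mean dissipated work, which links the width of the work distribution to irreversibility in this simple case.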

arXiv:2312.15809 [pdf, other]

doi 10.1109/ROBIO58561.2023.10354958

A Closed-Loop Multi-perspective Visual Servoing Approach with Reinforcement Learning

Authors: Lei Zhang, Jiacheng Pei, Kaixin Bai, Zhaopeng Chen, Jianwei Zhang

Abstract: Traditional visual servoing methods struggle to servo between scenes viewed from multiple perspectives, a task humans can complete with visual signals alone. In this paper, we investigated how multi-perspective visual servoing could be solved under robot-specific constraints, including self-collision and singularity problems. We presented a novel learning-based multi-perspective visual servoing framework, which iteratively estimates robot actions from latent-space representations of visual states using reinforcement learning. Furthermore, our approach was trained and validated in a Gazebo simulation environment connected to OpenAI Gym. Through simulation experiments, we showed that our method can successfully learn an optimal control policy given initial images from different perspectives, and it outperformed the Direct Visual Servoing algorithm with a mean success rate of 97.0%.

Submitted 25 December, 2023; originally announced December 2023.

Comments: 2023 IEEE International Conference on Robotics and Biomimetics (ROBIO)

arXiv:2312.09563 [pdf, other]

Variational Quantum Domain Adaptation

Authors: Chunhui Wu, Junhao Pei, Yihua Wu, Shengmei Zhao

Abstract: Quantum machine learning is an important application of quantum computing in the era of noisy intermediate-scale quantum devices. Domain adaptation is an effective method for addressing the distribution discrepancy between the training data and the real data when a neural network model is deployed. In this paper, we propose a variational quantum domain adaptation (VQDA) method that combines a quantum convolutional neural network, a gradient reversal module, and two quantum fully connected layers. Simulations on a local computer and on the IBM Quantum Experience (IBM Q) platform using Qiskit show the effectiveness of the proposed method. The results demonstrate that, compared with its classical domain adaptation counterpart, VQDA achieves an average accuracy improvement of 4% for MNIST-to-USPS domain transfer at the same parameter scale. Similarly, for SYNDigits-to-SVHN domain transfer, VQDA achieves an average accuracy improvement of 2% at the same parameter scale.

Submitted 15 December, 2023; originally announced December 2023.

Comments: 9 pages, 9 figures
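In classical domain-adversarial training, a gradient reversal module like the one the VQDA abstract mentions is the identity in the forward pass and multiplies gradients by -λ in the backward pass, so the feature extractor learns to confuse the domain classifier while the classifier itself trains normally. The scalar, hand-differentiated sketch below shows only that classical mechanism; the toy model and names are illustrative assumptions, and the quantum circuits are not modeled.

```python
def grl_forward(x: float) -> float:
    """Gradient reversal layer: identity in the forward pass."""
    return x

def grl_backward(grad_out: float, lam: float) -> float:
    """Backward pass: scale the incoming gradient by -lam."""
    return -lam * grad_out

def domain_branch_grads(w: float, v: float, x: float, y: float, lam: float):
    """Toy model: h = w*x (feature extractor), logit = v*GRL(h), loss = 0.5*(logit - y)**2.

    Returns (dloss/dv, gradient that reaches w through the GRL).
    """
    h = w * x
    logit = v * grl_forward(h)
    dlogit = logit - y                        # d loss / d logit
    grad_v = dlogit * h                       # domain head receives the ordinary gradient
    grad_h = grl_backward(dlogit * v, lam)    # reversed before reaching the feature extractor
    grad_w = grad_h * x
    return grad_v, grad_w

grad_v, grad_w = domain_branch_grads(w=2.0, v=0.5, x=1.5, y=1.0, lam=1.0)
plain_grad_w = (2.0 * 1.5 * 0.5 - 1.0) * 0.5 * 1.5   # what w would receive without the GRL
print(grad_v, grad_w, plain_grad_w)  # 1.5 -0.375 0.375
```

Gradient descent on w therefore moves it to increase, not decrease, the domain loss, while v still descends; λ controls how strongly the adversarial signal is weighted.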

arXiv:2312.04346 [pdf, other]

Improved Efficient Two-Stage Denoising Diffusion Power System Measurement Recovery Against False Data Injection Attacks and Data Losses

Authors: Jianhua Pei, Jingyu Wang, Dongyuan Shi, Ping Wang

Abstract: Measurement uncertainties, such as cyber-attacks and data losses, seriously degrade the quality of power system measurements. Fortunately, the powerful generation ability of denoising diffusion models can enable more precise measurement generation for power system data recovery. However, controllable data generation and efficient computing methods of denoising diffusion models for deterministic trajectories still need further investigation. To this end, this paper proposes an improved two-stage denoising diffusion model (TSDM) to identify and reconstruct measurements affected by various measurement uncertainties. The first stage of the model comprises a classifier-guided conditional anomaly detection component, while the second stage involves a diffusion-based measurement imputation component. Moreover, the proposed TSDM adopts precise means and optimal variances to accelerate the diffusion generation process with subsequence sampling. Extensive numerical case studies demonstrate that the proposed TSDM can accurately recover power system measurements despite strong randomness under renewable energy integration and highly nonlinear dynamics under complex cyber-physical contingencies. Additionally, the proposed TSDM is more robust than existing reconstruction networks and has lower computational complexity than general denoising diffusion models.

Submitted 7 December, 2023; originally announced December 2023.

arXiv:2311.15486

Detection prospects of long-lived quirk pairs at the LHC far detectors

Authors: Jinmian Li, Xufei Liao, Jian Ni, Junle Pei

Abstract: We examine the sensitivity reach of several LHC far detectors, such as FASER2, MATHUSLA, ANUBIS, SND@LHC, and FACET, to five simplified quirk scenarios. We include next-to-leading-order QCD corrections in our simulation of quirk events, which enhance the total production rate and increase the fraction of events in the forward direction for most cases. We calculate the time scales for the quirk pair to lose energy through radiation and for the quirk pair to annihilate. Our results show that these far detectors can offer promising probes of the quirk scenario, complementing the searches at the main detectors. In particular, the FACET and FASER2 detectors can surpass the majority of searches conducted at the LHC main detectors, with the exception of the HSCP search, for the color-neutral quirk $\mathcal{E}$.

Submitted 29 April, 2024; v1 submitted 26 November, 2023; originally announced November 2023.

Comments: 21 pages, 11 figures

arXiv:2311.15201

DiffBindFR: An SE(3) Equivariant Network for Flexible Protein-Ligand Docking

Authors: Jintao Zhu, Zhonghui Gu, Jianfeng Pei, Luhua Lai

Abstract: Molecular docking, a key technique in structure-based drug design, plays a pivotal role in protein-ligand interaction modeling and in hit identification and optimization, where accurate prediction of the protein-ligand binding mode is essential. Conventional docking approaches perform well in redocking tasks where the protein binding-pocket conformation in the complex state is known. However, in real-world docking scenarios where the protein binding conformation for a new ligand is unknown, accurately modeling the binding complex structure remains challenging because flexible docking is computationally expensive and inaccurate. Typical deep learning-based docking methods do not explicitly consider protein side-chain conformations and fail to ensure physical plausibility and detailed atomic interactions. In this study, we present DiffBindFR, a full-atom diffusion-based flexible docking model that operates over the product space of ligand overall movements and flexibility and pocket side-chain torsion changes. We show that DiffBindFR achieves higher accuracy than available docking methods in producing native-like binding structures with physically plausible and detailed interactions. Furthermore, on Apo and AlphaFold2-modeled structures, DiffBindFR shows superior accuracy in predicting ligand binding poses and protein binding conformations, making it suitable for Apo and AlphaFold2 structure-based drug design. DiffBindFR provides a powerful flexible docking tool for modeling accurate protein-ligand binding structures.

Submitted 19 December, 2023; v1 submitted 26 November, 2023; originally announced November 2023.

Showing 1–50 of 305 results for author: Pei, J