arXiv:2407.10458 [pdf, other]

Predicting doping strategies for ternary nickel-cobalt-manganese cathode materials to enhance battery performance using graph neural networks

Authors: Zirui Zhao, Dong Luo, Shuxing Wu, Kaitong Sun, Zhan Lin, Hai-Feng Li

Abstract: The exceptional electrochemical performance of lithium-ion batteries has spurred considerable interest in advanced battery technologies, particularly those utilizing ternary nickel-cobalt-manganese (NCM) cathode materials, which are renowned for their robust electrochemical performance and structural stability. Building upon this research, investigators have explored doping additional elements into NCM cathode materials to further enhance their electrochemical performance and structural integrity. However, the multitude of doping strategies available for NCM battery systems presents a challenge in determining the most effective approach. In this study, we elucidate the potential of ternary NCM systems as cathode materials for lithium-ion batteries. We compile a comprehensive database of lithium-ion batteries employing NCM systems from various sources of prior research and develop a corresponding data-driven model utilizing graph neural networks to predict optimal doping strategies. Our aim is to provide insights into the NCM-based battery systems for both fundamental understanding and practical applications.

Submitted 15 July, 2024; originally announced July 2024.

arXiv:2407.08554 [pdf, other]

Establishing Rigorous and Cost-effective Clinical Trials for Artificial Intelligence Models

Authors: Wanling Gao, Yunyou Huang, Dandan Cui, Zhuoming Yu, Wenjing Liu, Xiaoshuang Liang, Jiahui Zhao, Jiyue Xie, Hao Li, Li Ma, Ning Ye, Yumiao Kang, Dingfeng Luo, Peng Pan, Wei Huang, Zhongmou Liu, Jizhong Hu, Gangyuan Zhao, Chongrong Jiang, Fan Huang, Tianyi Wei, Suqin Tang, Bingjie Xia, Zhifei Zhang, Jianfeng Zhan

Abstract: A profound gap persists between artificial intelligence (AI) and clinical practice in medicine, primarily due to the lack of rigorous and cost-effective evaluation methodologies. State-of-the-art and state-of-the-practice AI model evaluations are limited to laboratory studies on medical datasets or direct clinical trials with no or solely patient-centered controls. Moreover, the crucial role of clinicians in collaborating with AI, pivotal for determining its impact on clinical practice, is often overlooked. For the first time, we emphasize the critical necessity for rigorous and cost-effective evaluation methodologies for AI models in clinical practice, featuring patient/clinician-centered (dual-centered) AI randomized controlled trials (DC-AI RCTs) and virtual clinician-based in-silico trials (VC-MedAI) as an effective proxy for DC-AI RCTs. Leveraging 7500 diagnosis records from two-phase inaugural DC-AI RCTs across 14 medical centers with 125 clinicians, our results demonstrate the necessity of DC-AI RCTs and the effectiveness of VC-MedAI. Notably, VC-MedAI performs comparably to human clinicians, replicating insights and conclusions from prospective DC-AI RCTs. We envision DC-AI RCTs and VC-MedAI as pivotal advancements, presenting innovative and transformative evaluation methodologies for AI models in clinical practice, offering a preclinical-like setting mirroring conventional medicine, and reshaping development paradigms in a cost-effective and fast-iterative manner. Chinese Clinical Trial Registration: ChiCTR2400086816.

Submitted 11 July, 2024; originally announced July 2024.

Comments: 23 pages

arXiv:2407.07289 [pdf, other]

Deformable Feature Alignment and Refinement for Moving Infrared Dim-small Target Detection

Authors: Dengyan Luo, Yanping Xiang, Hu Wang, Luping Ji, Shuai Li, Mao Ye

Abstract: The detection of moving infrared dim-small targets has been a challenging and prevalent research topic. The current state-of-the-art methods are mainly based on ConvLSTM to aggregate information from adjacent frames to facilitate the detection of the current frame. However, these methods implicitly utilize motion information only in the training stage and fail to explicitly explore motion compensation, resulting in poor performance in the case of a video sequence including large motion. In this paper, we propose a Deformable Feature Alignment and Refinement (DFAR) method based on deformable convolution to explicitly use motion context in both the training and inference stages. Specifically, a Temporal Deformable Alignment (TDA) module based on the designed Dilated Convolution Attention Fusion (DCAF) block is developed to explicitly align the adjacent frames with the current frame at the feature level. Then, the feature refinement module adaptively fuses the aligned features and further aggregates useful spatio-temporal information by means of the proposed Attention-guided Deformable Fusion (AGDF) block. In addition, to improve the alignment of adjacent frames with the current frame, we extend the traditional loss function by introducing a new motion compensation loss. Extensive experimental results demonstrate that the proposed DFAR method achieves state-of-the-art performance on two benchmark datasets, DAUB and IRDST.

Submitted 9 July, 2024; originally announced July 2024.

arXiv:2407.03980 [pdf, other]

Practical asynchronous measurement-device-independent quantum key distribution with advantage distillation

Authors: Di Luo, Xin Liu, Kaibiao Qin, Zhenrong Zhang, Kejin Wei

Abstract: The advantage distillation (AD) method has proven effective in improving the performance of quantum key distribution (QKD). In this paper, we introduce the AD method into a recently proposed asynchronous measurement-device-independent (AMDI) QKD protocol, taking finite-key effects into account. Simulation results show that the AD method significantly enhances AMDI-QKD, e.g., extending the transmission distance by 16 km with a total pulse count of $N = 7.24 \times 10^{13}$, and enables AMDI-QKD, previously unable to generate keys, to generate keys with a misalignment error rate of 10%. As the AD method can be directly integrated into the current system through refined post-processing, our results facilitate the practical implementation of AMDI-QKD in various applications, particularly in scenarios with high channel losses and misalignment errors.

Submitted 4 July, 2024; originally announced July 2024.

Comments: 13 pages, 5 figures

arXiv:2407.03900 [pdf, other]

Oracle Bone Inscriptions Multi-modal Dataset

Authors: Bang Li, Donghao Luo, Yujie Liang, Jing Yang, Zengmao Ding, Xu Peng, Boyuan Jiang, Shengwei Han, Dan Sui, Peichao Qin, Pian Wu, Chaoyang Wang, Yun Qi, Taisong Jin, Chengjie Wang, Xiaoming Huang, Zhan Shu, Rongrong Ji, Yongge Liu, Yunsheng Wu

Abstract: Oracle bone inscriptions (OBI) is the earliest developed writing system in China, bearing invaluable written exemplifications of early Shang history and paleography. However, the task of deciphering OBI, in the current climate of the scholarship, can prove extremely challenging. Out of the 4,500 oracle bone characters excavated, only a third have been successfully identified. Therefore, leveraging the advantages of advanced AI technology to assist in the decipherment of OBI is a highly essential research topic. However, fully utilizing AI's capabilities in these matters relies on having a comprehensive and high-quality annotated OBI dataset at hand, whereas most existing datasets are annotated in only a single or a few dimensions, limiting the value of their potential application. For instance, the Oracle-MNIST dataset only offers 30k images classified into 10 categories. Therefore, this paper proposes an Oracle Bone Inscriptions Multi-modal Dataset (OBIMD), which includes annotation information for 10,077 pieces of oracle bones. Each piece has two modalities: pixel-level aligned rubbings and facsimiles. The dataset annotates the detection boxes, character categories, transcriptions, corresponding inscription groups, and reading sequences in the groups of each oracle bone character, providing a comprehensive and high-quality level of annotations. This dataset can be used for a variety of AI-related research tasks relevant to the field of OBI, such as OBI Character Detection and Recognition, Rubbing Denoising, Character Matching, Character Generation, Reading Sequence Prediction, Missing Characters Completion, and so on. We believe that the creation and publication of a dataset like this will help significantly advance the application of AI algorithms in the field of OBI research.

Submitted 4 July, 2024; originally announced July 2024.

arXiv:2407.02328 [pdf, other]

Efficient Sparse Attention needs Adaptive Token Release

Authors: Chaoran Zhang, Lixin Zou, Dan Luo, Min Tang, Xiangyang Luo, Zihao Li, Chenliang Li

Abstract: In recent years, Large Language Models (LLMs) have demonstrated remarkable capabilities across a wide array of text-centric tasks. However, their 'large' scale introduces significant computational and storage challenges, particularly in managing the key-value states of the transformer, which limits their wider applicability. Therefore, we propose to adaptively release resources from caches and rebuild the necessary key-value states. Particularly, we accomplish this by a lightweight controller module to approximate an ideal top-$K$ sparse attention. This module retains the tokens with the highest top-$K$ attention weights and simultaneously rebuilds the discarded but necessary tokens, which may become essential for future decoding. Comprehensive experiments in natural language generation and modeling reveal that our method is not only competitive with full attention in terms of performance but also achieves a significant throughput improvement of up to 221.8%. The code for replication is available at https://github.com/WHUIR/ADORE.

Submitted 2 July, 2024; originally announced July 2024.

Comments: Accepted at ACL 2024 (Findings)

arXiv:2407.00614 [pdf, other]

Learning Granularity-Aware Affordances from Human-Object Interaction for Tool-Based Functional Grasping in Dexterous Robotics

Authors: Fan Yang, Wenrui Chen, Kailun Yang, Haoran Lin, DongSheng Luo, Conghui Tang, Zhiyong Li, Yaonan Wang

Abstract: To enable robots to use tools, the initial step is teaching robots to employ dexterous gestures for touching specific areas precisely where tasks are performed. Affordance features of objects serve as a bridge in the functional interaction between agents and objects. However, leveraging these affordance cues to help robots achieve functional tool grasping remains unresolved. To address this, we propose a granularity-aware affordance feature extraction method for locating functional affordance areas and predicting dexterous coarse gestures. We study the intrinsic mechanisms of human tool use. On one hand, we use fine-grained affordance features of object-functional finger contact areas to locate functional affordance regions. On the other hand, we use highly activated coarse-grained affordance features in hand-object interaction regions to predict grasp gestures. Additionally, we introduce a model-based post-processing module that includes functional finger coordinate localization, finger-to-end coordinate transformation, and force feedback-based coarse-to-fine grasping. This forms a complete dexterous robotic functional grasping framework, GAAF-Dex, which learns Granularity-Aware Affordances from human-object interaction for tool-based Functional grasping in Dexterous Robotics. Unlike fully-supervised methods that require extensive data annotation, we employ a weakly supervised approach to extract relevant cues from exocentric (Exo) images of hand-object interactions to supervise feature extraction in egocentric (Ego) images. We have constructed a small-scale dataset, FAH, which includes nearly 6K Exo and Ego images of functional hand-object interaction, covering 18 commonly used tools performing 6 tasks. Extensive experiments on the dataset demonstrate that our method outperforms state-of-the-art methods. The code will be made publicly available at https://github.com/yangfan293/GAAF-DEX.

Submitted 30 June, 2024; originally announced July 2024.

Comments: The source code and the established dataset will be made publicly available at https://github.com/yangfan293/GAAF-DEX

arXiv:2406.18284 [pdf, other]

RealTalk: Real-time and Realistic Audio-driven Face Generation with 3D Facial Prior-guided Identity Alignment Network

Authors: Xiaozhong Ji, Chuming Lin, Zhonggan Ding, Ying Tai, Jian Yang, Junwei Zhu, Xiaobin Hu, Jiangning Zhang, Donghao Luo, Chengjie Wang

Abstract: Person-generic audio-driven face generation is a challenging task in computer vision. Previous methods have achieved remarkable progress in audio-visual synchronization, but there is still a significant gap between current results and practical applications. The challenges are two-fold: 1) Preserving unique individual traits for achieving high-precision lip synchronization. 2) Generating high-quality facial renderings in real-time performance. In this paper, we propose a novel generalized audio-driven framework RealTalk, which consists of an audio-to-expression transformer and a high-fidelity expression-to-face renderer. In the first component, we consider both identity and intra-personal variation features related to speaking lip movements. By incorporating cross-modal attention on the enriched facial priors, we can effectively align lip movements with audio, thus attaining greater precision in expression prediction. In the second component, we design a lightweight facial identity alignment (FIA) module which includes a lip-shape control structure and a face texture reference structure. This novel design allows us to generate fine details in real-time, without depending on sophisticated and inefficient feature alignment modules. Our experimental results, both quantitative and qualitative, on public datasets demonstrate the clear advantages of our method in terms of lip-speech synchronization and generation quality. Furthermore, our method is efficient and requires fewer computational resources, making it well-suited to meet the needs of practical applications.

Submitted 26 June, 2024; originally announced June 2024.

arXiv:2406.17645 [pdf, other]

Simulating moiré quantum matter with neural network

Authors: Di Luo, David D. Dai, Liang Fu

Abstract: Moiré materials provide an ideal platform for exploring quantum phases of matter. However, solving the many-electron problem in moiré systems is challenging due to strong correlation effects. We introduce a powerful variational representation of quantum states, the many-body neural Bloch wavefunction, to solve many-electron problems in moiré materials accurately and efficiently. Applying our method to the semiconductor heterobilayer WSe2/WS2, we obtain a generalized Wigner crystal at filling factor n = 1/3, a Mott insulator at n = 1, and a correlated insulator with local magnetic moments and antiferromagnetic spin correlation at n = 2. Our neural network approach improves the simulation accuracy of strongly interacting moiré materials and paves the way for discovery of new quantum phases with variational learning principle in a unified framework.

Submitted 25 June, 2024; originally announced June 2024.

arXiv:2406.13495 [pdf, other]

DF40: Toward Next-Generation Deepfake Detection

Authors: Zhiyuan Yan, Taiping Yao, Shen Chen, Yandan Zhao, Xinghe Fu, Junwei Zhu, Donghao Luo, Li Yuan, Chengjie Wang, Shouhong Ding, Yunsheng Wu

Abstract: We propose a new comprehensive benchmark to revolutionize the current deepfake detection field to the next generation. Predominantly, existing works identify top-notch detection algorithms and models by adhering to the common practice: training detectors on one specific dataset (e.g., FF++) and testing them on other prevalent deepfake datasets. This protocol is often regarded as a "golden compass" for navigating SoTA detectors. But can these stand-out "winners" be truly applied to tackle the myriad of realistic and diverse deepfakes lurking in the real world? If not, what underlying factors contribute to this gap? In this work, we found the dataset (both train and test) can be the "primary culprit" due to: (1) forgery diversity: Deepfake techniques are commonly referred to as both face forgery (face-swapping and face-reenactment) and entire image synthesis (AIGC). Most existing datasets only contain partial types, with limited forgery methods implemented; (2) forgery realism: The dominant training dataset, FF++, contains old forgery techniques from the past five years. "Honing skills" on these forgeries makes it difficult to guarantee effective detection of today's SoTA deepfakes; (3) evaluation protocol: Most detection works perform evaluations on one type, e.g., train and test on face-swapping only, which hinders the development of universal deepfake detectors. To address this dilemma, we construct a highly diverse and large-scale deepfake dataset called DF40, which comprises 40 distinct deepfake techniques. We then conduct comprehensive evaluations using 4 standard evaluation protocols and 7 representative detectors, resulting in over 2,000 evaluations. Through these evaluations, we analyze from various perspectives, leading to 12 new insightful findings contributing to the field. We also open up 5 valuable yet previously underexplored research questions to inspire future works.

Submitted 19 June, 2024; originally announced June 2024.

arXiv:2406.11781 [pdf, other]

DiffMM: Multi-Modal Diffusion Model for Recommendation

Authors: Yangqin Jiang, Lianghao Xia, Wei Wei, Da Luo, Kangyi Lin, Chao Huang

Abstract: The rise of online multi-modal sharing platforms like TikTok and YouTube has enabled personalized recommender systems to incorporate multiple modalities (such as visual, textual, and acoustic) into user representations. However, addressing the challenge of data sparsity in these systems remains a key issue. To address this limitation, recent research has introduced self-supervised learning techniques to enhance recommender systems. However, these methods often rely on simplistic random augmentation or intuitive cross-view information, which can introduce irrelevant noise and fail to accurately align the multi-modal context with user-item interaction modeling. To fill this research gap, we propose a novel multi-modal graph diffusion model for recommendation called DiffMM. Our framework integrates a modality-aware graph diffusion model with a cross-modal contrastive learning paradigm to improve modality-aware user representation learning. This integration facilitates better alignment between multi-modal feature information and collaborative relation modeling. Our approach leverages diffusion models' generative capabilities to automatically generate a user-item graph that is aware of different modalities, facilitating the incorporation of useful multi-modal knowledge in modeling user-item interactions. We conduct extensive experiments on three public datasets, consistently demonstrating the superiority of our DiffMM over various competitive baselines. The source code of the proposed framework and model implementation details are available at https://github.com/HKUDS/DiffMM.

Submitted 17 June, 2024; originally announced June 2024.

arXiv:2406.11643 [pdf, other]

AnyMaker: Zero-shot General Object Customization via Decoupled Dual-Level ID Injection

Authors: Lingjie Kong, Kai Wu, Xiaobin Hu, Wenhui Han, Jinlong Peng, Chengming Xu, Donghao Luo, Jiangning Zhang, Chengjie Wang, Yanwei Fu

Abstract: Text-to-image based object customization, aiming to generate images with the same identity (ID) as objects of interest in accordance with text prompts and reference images, has made significant progress. However, recent customizing research is dominated by specialized tasks, such as human customization or virtual try-on, leaving a gap in general object customization. To this end, we introduce AnyMaker, an innovative zero-shot object customization framework capable of generating general objects with high ID fidelity and flexible text editability. The efficacy of AnyMaker stems from its novel general ID extraction, dual-level ID injection, and ID-aware decoupling. Specifically, the general ID extraction module extracts sufficient ID information with an ensemble of self-supervised models to tackle the diverse customization tasks for general objects. Then, to provide the diffusion UNet with as much of the extracted ID as possible without damaging the text editability in the generation process, we design a global-local dual-level ID injection module, in which the global-level semantic ID is injected into text descriptions while the local-level ID details are injected directly into the model through newly added cross-attention modules. In addition, we propose an ID-aware decoupling module to disentangle ID-related information from non-ID elements in the extracted representations for high-fidelity generation of both identity and text descriptions. To validate our approach and boost the research of general object customization, we create the first large-scale general ID dataset, the Multi-Category ID-Consistent (MC-IDC) dataset, with 315k text-image samples and 10k categories. Experiments show that AnyMaker presents remarkable performance in general object customization and outperforms specialized methods in corresponding tasks. Code and dataset will be released soon.

Submitted 5 July, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

arXiv:2406.10425 [pdf, other]

doi 10.1145/3637528.3671829

Multi-source Unsupervised Domain Adaptation on Graphs with Transferability Modeling

Authors: Tianxiang Zhao, Dongsheng Luo, Xiang Zhang, Suhang Wang

Abstract: In this paper, we tackle a new problem of multi-source unsupervised domain adaptation (MSUDA) for graphs, where models trained on annotated source domains need to be transferred to the unsupervised target graph for node classification. Due to the discrepancy in distribution across domains, the key challenge is how to select good source instances and how to adapt the model. Diverse graph structures further complicate this problem, rendering previous MSUDA approaches less effective. In this work, we present the framework Selective Multi-source Adaptation for Graph, with a graph-modeling-based domain selector, a sub-graph node selector, and a bi-level alignment objective for the adaptation. Concretely, to facilitate the identification of informative source data, the similarity across graphs is disentangled and measured with the transferability of a graph-modeling task set, and we use it as evidence for source domain selection. A node selector is further incorporated to capture the variation in transferability of nodes within the same source domain. To learn invariant features for adaptation, we align the target domain to selected source data both at the embedding space by minimizing the optimal transport distance and at the classification level by distilling the label function. Modules are explicitly learned to select informative source data and conduct the alignment in virtual training splits with a meta-learning strategy. Experimental results on five graph datasets show the effectiveness of the proposed method.

Submitted 22 June, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

Journal ref: Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '24), August 25–29, 2024, Barcelona, Spain

arXiv:2406.07362 [pdf, other]

AI.vs.Clinician: Unveiling Intricate Interactions Between AI and Clinicians through an Open-Access Database

Authors: Wanling Gao, Yuan Liu, Zhuoming Yu, Dandan Cui, Wenjing Liu, Xiaoshuang Liang, Jiahui Zhao, Jiyue Xie, Hao Li, Li Ma, Ning Ye, Yumiao Kang, Dingfeng Luo, Peng Pan, Wei Huang, Zhongmou Liu, Jizhong Hu, Fan Huang, Gangyuan Zhao, Chongrong Jiang, Tianyi Wei, Zhifei Zhang, Yunyou Huang, Jianfeng Zhan

Abstract: Artificial Intelligence (AI) plays a crucial role in the medical field and has the potential to revolutionize healthcare practices. However, the success of AI models and their impacts hinge on the synergy between AI and medical specialists, with clinicians assuming a dominant role. Unfortunately, the intricate dynamics and interactions between AI and clinicians remain undiscovered and thus hinder AI from being translated into medical practice. To address this gap, we have curated a groundbreaking database called AI.vs.Clinician. This database is the first of its kind for studying the interactions between AI and clinicians. It derives from 7,500 collaborative diagnosis records on a life-threatening medical emergency, sepsis, from 14 medical centers across China. For the patient cohorts well-chosen from MIMIC databases, the AI-related information comprises the model property, feature input, diagnosis decision, and inferred probabilities of sepsis onset presently and within the next three hours. The clinician-related information includes the viewed examination data and sequence, viewed time, preliminary and final diagnosis decisions with or without AI assistance, and recommended treatment.

Submitted 15 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

Comments: 12 pages

arXiv:2406.07167 [pdf, ps, other]

On the pathwise uniqueness of stochastic 2D Euler equations with Kraichnan noise and $L^p$-data

Authors: Shuaijie Jiao, Dejun Luo

Abstract: In the recent work [arXiv:2308.03216], Coghi and Maurelli proved pathwise uniqueness of solutions to the vorticity form of stochastic 2D Euler equation, with Kraichnan transport noise and initial data in $L^1\cap L^p$ for $p>3/2$. The aim of this note is to remove the constraint on $p$, showing that pathwise uniqueness holds for all $L^1\cap L^p$ initial data with arbitrary $p>1$.

Submitted 11 June, 2024; originally announced June 2024.

Comments: 10 pages

arXiv:2406.04000 [pdf, other]

Stochastic logic in biased coupled photonic probabilistic bits

Authors: Michael Horodynski, Charles Roques-Carmes, Yannick Salamin, Seou Choi, Jamison Sloan, Di Luo, Marin Soljačić

Abstract: Optical computing often employs tailor-made hardware to implement specific algorithms, trading generality for improved performance in key aspects like speed and power efficiency. An important computing approach that is still missing its corresponding optical hardware is probabilistic computing, used e.g. for solving difficult combinatorial optimization problems. In this study, we propose an experimentally viable photonic approach to solve arbitrary probabilistic computing problems. Our method relies on the insight that coherent Ising machines composed of coupled and biased optical parametric oscillators can emulate stochastic logic. We demonstrate the feasibility of our approach by using numerical simulations equivalent to the full density matrix formulation of coupled optical parametric oscillators.

Submitted 6 June, 2024; originally announced June 2024.

arXiv:2406.00132 [pdf, other]

QuanTA: Efficient High-Rank Fine-Tuning of LLMs with Quantum-Informed Tensor Adaptation

Authors: Zhuo Chen, Rumen Dangovski, Charlotte Loh, Owen Dugan, Di Luo, Marin Soljačić

Abstract: We propose Quantum-informed Tensor Adaptation (QuanTA), a novel, easy-to-implement, fine-tuning method with no inference overhead for large-scale pre-trained language models. By leveraging quantum-inspired methods derived from quantum circuit structures, QuanTA enables efficient high-rank fine-tuning, surpassing the limitations of Low-Rank Adaptation (LoRA)--low-rank approximation may fail for complicated downstream tasks. Our approach is theoretically supported by the universality theorem and the rank representation theorem to achieve efficient high-rank adaptations. Experiments demonstrate that QuanTA significantly enhances commonsense reasoning, arithmetic reasoning, and scalability compared to traditional methods. Furthermore, QuanTA shows superior performance with fewer trainable parameters compared to other approaches and can be designed to integrate with existing fine-tuning algorithms for further improvement, providing a scalable and efficient solution for fine-tuning large language models and advancing state-of-the-art in natural language processing.

Submitted 31 May, 2024; originally announced June 2024.

arXiv:2405.20081 [pdf, other]

NoiseBoost: Alleviating Hallucination with Noise Perturbation for Multimodal Large Language Models

Authors: Kai Wu, Boyuan Jiang, Zhengkai Jiang, Qingdong He, Donghao Luo, Shengzhi Wang, Qingwen Liu, Chengjie Wang

Abstract: Multimodal large language models (MLLMs) contribute a powerful mechanism to understanding visual information building on large language models. However, MLLMs are notorious for suffering from hallucinations, especially when generating lengthy, detailed descriptions for images. Our analysis reveals that hallucinations stem from the inherent summarization mechanism of large language models, leading to excessive dependence on linguistic tokens while neglecting vision information. In this paper, we propose NoiseBoost, a broadly applicable and simple method for alleviating hallucinations for MLLMs through the integration of noise feature perturbations. Noise perturbation acts as a regularizer, facilitating a balanced distribution of attention weights among visual and linguistic tokens. Despite its simplicity, NoiseBoost consistently enhances the performance of MLLMs across common training strategies, including supervised fine-tuning and reinforcement learning. Further, NoiseBoost pioneerly enables semi-supervised learning for MLLMs, unleashing the power of unlabeled data. Comprehensive experiments demonstrate that NoiseBoost improves dense caption accuracy by 8.1% with human evaluation and achieves comparable results with 50% of the data by mining unlabeled data. Code and models are available at https://kaiwu5.github.io/noiseboost.

Submitted 31 May, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

Comments: 14 pages, 5 figures with supplementary material

arXiv:2405.19925 [pdf, other]

Integrated Sensing and Communications Framework for 6G Networks

Authors: Hongliang Luo, Tengyu Zhang, Chuanbin Zhao, Yucong Wang, Bo Lin, Yuhua Jiang, Dongqi Luo, Feifei Gao

Abstract: In this paper, we propose a novel integrated sensing and communications (ISAC) framework for the sixth generation (6G) mobile networks, in which we decompose the real physical world into static environment, dynamic targets, and various object materials. The ubiquitous static environment occupies the vast majority of the physical world, for which we design a static environment reconstruction (SER) scheme to obtain the layout and point cloud information of static buildings. The dynamic targets floating in static environments create the spatiotemporal transition of the physical world, for which we design a comprehensive dynamic target sensing (DTS) scheme to detect, estimate, track, image and recognize the dynamic targets in real-time. The object materials enrich the electromagnetic laws of the physical world, for which we develop an object material recognition (OMR) scheme to estimate the electromagnetic coefficient of the objects. Besides, to integrate these sensing functions into existing communications systems, we discuss the interference issues and corresponding solutions for ISAC cellular networks. Furthermore, we develop an ISAC hardware prototype platform that can reconstruct the environmental maps and sense the dynamic targets while maintaining communications services. With all these designs, the proposed ISAC framework can support multifarious emerging applications, such as digital twins, low altitude economy, internet of vehicles, marine management, deformation monitoring, etc.

Submitted 30 May, 2024; originally announced May 2024.

arXiv:2405.16558 [pdf, other]

Experimental Reference-Frame-Independent Quantum Key Distribution over 250 km of Optical Fiber

Authors: Xin Liu, Di Luo, Zhicheng Luo, Shizhuo Li, Zhenrong Zhang, Kejin Wei

Abstract: The reference-frame-independent quantum key distribution (RFI-QKD) protocol enables QKD systems to function effectively despite slowly varying reference frames, offering a distinct advantage in practical scenarios, particularly in mobile platforms. In this study, we successfully distribute secure key bits over a 250 km optical fiber distance by developing an RFI-QKD system with a repetition rate of 150 MHz. Benefiting from the high repetition rate, we achieve a finite-key secret key rate of 49.65 bit/s at a distance of 200 km, which is more than three times higher than state-of-the-art systems. Our work dramatically extends the transmission distance and enhances the secret key rate of RFI-QKD, significantly promoting its practical application.

Submitted 26 May, 2024; originally announced May 2024.

Comments: 9 pages, 4 figures

arXiv:2405.15287 [pdf, other]

StyleMaster: Towards Flexible Stylized Image Generation with Diffusion Models

Authors: Chengming Xu, Kai Hu, Donghao Luo, Jiangning Zhang, Wei Li, Yanhao Ge, Chengjie Wang

Abstract: Stylized Text-to-Image Generation (STIG) aims to generate images based on text prompts and style reference images. In this paper, we propose a novel framework dubbed StyleMaster for this task by leveraging pretrained Stable Diffusion (SD), which tries to solve previous problems such as insufficient style and inconsistent semantics. The enhancement lies in two novel modules, namely the multi-source style embedder and the dynamic attention adapter. In order to provide SD with better style embeddings, the proposed multi-source style embedder considers both global and local level visual information along with textual information, which provide both complementary style-related and semantic-related knowledge. Additionally, aiming for a better balance between the adapter capacity and semantic control, the proposed dynamic attention adapter is applied to the diffusion UNet, in which adaptation weights are dynamically calculated based on the style embeddings. Two objective functions are introduced to optimize the model together with the denoising loss, which can further enhance semantic and style consistency. Extensive experiments demonstrate the superiority of StyleMaster over existing methods, rendering images with variable target styles while successfully maintaining the semantic information from the text prompts.

Submitted 24 May, 2024; originally announced May 2024.

arXiv:2405.14646 [pdf, other]

Unveiling the Achilles' Heel of NLG Evaluators: A Unified Adversarial Framework Driven by Large Language Models

Authors: Yiming Chen, Chen Zhang, Danqing Luo, Luis Fernando D'Haro, Robby T. Tan, Haizhou Li

Abstract: The automatic evaluation of natural language generation (NLG) systems presents a long-lasting challenge. Recent studies have highlighted various neural metrics that align well with human evaluations. Yet, the robustness of these evaluators against adversarial perturbations remains largely under-explored due to the unique challenges in obtaining adversarial data for different NLG evaluation tasks. To address the problem, we introduce AdvEval, a novel black-box adversarial framework against NLG evaluators. AdvEval is specially tailored to generate data that yield strong disagreements between human and victim evaluators. Specifically, inspired by the recent success of large language models (LLMs) in text generation and evaluation, we adopt strong LLMs as both the data generator and gold evaluator. Adversarial data are automatically optimized with feedback from the gold and victim evaluator. We conduct experiments on 12 victim evaluators and 11 NLG datasets, spanning tasks including dialogue, summarization, and question evaluation. The results show that AdvEval can lead to significant performance degradation of various victim metrics, thereby validating its efficacy.

Submitted 23 May, 2024; originally announced May 2024.

Comments: ACL 2024 Findings

arXiv:2405.13810 [pdf, other]

Leveraging 2D Information for Long-term Time Series Forecasting with Vanilla Transformers

Authors: Xin Cheng, Xiuying Chen, Shuqi Li, Di Luo, Xun Wang, Dongyan Zhao, Rui Yan

Abstract: Time series prediction is crucial for understanding and forecasting complex dynamics in various domains, ranging from finance and economics to climate and healthcare. Based on Transformer architecture, one approach involves encoding multiple variables from the same timestamp into a single temporal token to model global dependencies. In contrast, another approach embeds the time points of individual series into separate variate tokens. The former method faces challenges in learning variate-centric representations, while the latter risks missing essential temporal information critical for accurate forecasting. In our work, we introduce GridTST, a model that combines the benefits of two approaches using innovative multi-directional attentions based on a vanilla Transformer. We regard the input time series data as a grid, where the $x$-axis represents the time steps and the $y$-axis represents the variates. A vertical slicing of this grid combines the variates at each time step into a time token, while a horizontal slicing embeds the individual series across all time steps into a variate token. Correspondingly, a horizontal attention mechanism focuses on time tokens to comprehend the correlations between data at various time steps, while a vertical, variate-aware attention is employed to grasp multivariate correlations. This combination enables efficient processing of information across both time and variate dimensions, thereby enhancing the model's analytical strength. We also integrate the patch technique, segmenting time tokens into subseries-level patches, ensuring that local semantic information is retained in the embedding. The GridTST model consistently delivers state-of-the-art performance across various real-world datasets.

Submitted 22 May, 2024; originally announced May 2024.

arXiv:2405.09308 [pdf, other]

TimeX++: Learning Time-Series Explanations with Information Bottleneck

Authors: Zichuan Liu, Tianchun Wang, Jimeng Shi, Xu Zheng, Zhuomin Chen, Lei Song, Wenqian Dong, Jayantha Obeysekera, Farhad Shirani, Dongsheng Luo

Abstract: Explaining deep learning models operating on time series data is crucial in various applications of interest which require interpretable and transparent insights from time series signals. In this work, we investigate this problem from an information theoretic perspective and show that most existing measures of explainability may suffer from trivial solutions and distributional shift issues. To address these issues, we introduce a simple yet practical objective function for time series explainable learning. The design of the objective function builds upon the principle of information bottleneck (IB), and modifies the IB objective function to avoid trivial solutions and distributional shift issues. We further present TimeX++, a novel explanation framework that leverages a parametric network to produce explanation-embedded instances that are both in-distributed and label-preserving. We evaluate TimeX++ on both synthetic and real-world datasets comparing its performance against leading baselines, and validate its practical efficacy through case studies in a real-world environmental application. Quantitative and qualitative evaluations show that TimeX++ outperforms baselines across all datasets, demonstrating a substantial improvement in explanation quality for time series data. The source code is available at https://github.com/zichuan-liu/TimeXplusplus.

Submitted 15 May, 2024; originally announced May 2024.

Comments: Accepted by International Conference on Machine Learning (ICML 2024)

arXiv:2405.01045 [pdf, ps, other]

Well-posedness of stochastic mSQG equations with Kraichnan noise and $L^p$ data

Authors: Shuaijie Jiao, Dejun Luo

Abstract: We consider stochastic mSQG (modified Surface Quasi-Geostrophic) equations with multiplicative transport noise of Kraichnan type, and $L^p$-initial conditions. Inspired by the recent work of Coghi and Maurelli [arXiv:2308.03216], we show weak existence and pathwise uniqueness of solutions to the equations for suitable choices of parameters in the nonlinearity, the noise and the integrability of initial data.

Submitted 30 May, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

Comments: 33 pages. We have updated the relation of $\beta_N$ and $\beta_L$ in Lemma 2.2, following Proposition 2.7 in arXiv:2308.03216v2. Moreover, we have simplified the statements of Theorem 1.4, covering a slightly wider range of parameters

arXiv:2404.18884 [pdf, ps, other]

Reputation in Repeated Global Games of Regime Change with Exit

Authors: Daniel Luo

Abstract: I study a repeated binary-action supermodular game with endogenous exit where many short-lived agents attempt to coordinate a revolt against a regime. The regime undertakes costly actions to increase the short-run players' coordination frictions, though it acts only after the revolt proves unsuccessful, inducing a lack-of-commitment problem. In the complete-information repeated game, a folk theorem holds, with payoff multiplicity arising due to both the regime's dynamic incentives and agents' stage-game strategic complementarities. Neither the regime's reputational incentives nor belief dispersion among agents (via global-games type uncertainty) alone meaningfully refine the equilibrium payoff set. Together, though, the interaction between these two forces uniquely selects the regime's highest payoff in equilibrium. Furthermore, under a Markov refinement, they select a unique equilibrium where the regime plays their optimal commitment action. Methodologically, I develop tools to analyze repeated games with endogenous exit where the regime's commitment action flexibly varies with their discount rate.

Submitted 29 April, 2024; originally announced April 2024.

arXiv:2404.16826 [pdf, other]

Successive Convexification for Trajectory Optimization with Continuous-Time Constraint Satisfaction

Authors: Purnanand Elango, Dayou Luo, Abhinav G. Kamath, Samet Uzun, Taewan Kim, Behçet Açıkmeşe

Abstract: We present successive convexification, a real-time-capable solution method for nonconvex trajectory optimization, with continuous-time constraint satisfaction and guaranteed convergence, that only requires first-order information. The proposed framework combines several key methods to solve a large class of nonlinear optimal control problems: (i) exterior penalty-based reformulation of the path constraints; (ii) generalized time-dilation; (iii) multiple-shooting discretization; (iv) $\ell_1$ exact penalization of the nonconvex constraints; and (v) the prox-linear method, a sequential convex programming (SCP) algorithm for convex-composite minimization. The reformulation of the path constraints enables continuous-time constraint satisfaction even on sparse discretization grids and obviates the need for mesh refinement heuristics. Through the prox-linear method, we guarantee convergence of the solution method to stationary points of the penalized problem and guarantee that the converged solutions that are feasible with respect to the discretized and control-parameterized optimal control problem are also Karush-Kuhn-Tucker (KKT) points. Furthermore, we highlight the specialization of this property to global minimizers of convex optimal control problems, wherein the reformulated path constraints cannot be represented by canonical cones, i.e., in the form required by existing convex optimization solvers. In addition to theoretical analysis, we demonstrate the effectiveness and real-time capability of the proposed framework with numerical examples based on popular optimal control applications: dynamic obstacle avoidance and rocket landing.

Submitted 25 April, 2024; originally announced April 2024.

arXiv:2404.16740 [pdf, other]

Calculable neutrino Dirac mass matrix and one-loop $\bar\theta$ in the minimal left-right symmetric model

Authors: Gang Li, Ding-Yi Luo, Xiang Zhao

Abstract: We revisit the contribution to the strong CP parameter $\bar\theta$ from leptonic CP violation at one-loop level in the minimal left-right symmetric model in the case of parity as the left-right symmetry. The Hermitian neutrino Dirac mass matrix $M_D$ can be calculated using the light and heavy neutrino masses and mixings. We propose a parameterization of the right-handed neutrino mixing matrix $V_R$ and construct the heavy neutrino mass that maintains the Hermiticity of $M_D$. We further apply it to evaluate the one-loop $\bar\theta$, denoted as $\bar\theta_{loop}$, as a function of the sterile neutrino masses for explicit examples of $V_R$. By requiring the magnitude of $\bar\theta_{loop}\lesssim 10^{-10}$, we derive the upper limits on the sterile neutrino masses, which are within reach of direct searches at the Large Hadron Collider and neutrinoless double beta decay experiments. Furthermore, our parameterization is applicable to other phenomenological studies.

Submitted 25 April, 2024; originally announced April 2024.

Comments: 6 pages, 1 figure

arXiv:2404.16227 [pdf, other]

Optimal entanglement generation in optomechanical systems via Krotov control of covariance matrix dynamics

Authors: Peng-Ju Chen, Da-Wei Luo, Ting Yu

Abstract: We investigated the optimal control of a continuous variable system, focusing on entanglement generation in an optomechanical system without utilizing Fock basis cutoffs. Using the Krotov algorithm to optimize the dynamics of the covariance matrix, we illustrated how to design a control objective function to manipulate the dynamics of the system to generate a desirable target state. We showed that entanglement between the macroscopic mechanical mirror and the quantum optical cavity can be reliably generated through imposing the control on the detuning of the external laser field. It has been shown that the control may still be achieved when imposing spectral constraints on the external field to restrict it to low-frequency components. In addition, we systematically studied the effects of quantum control on non-Markovian open system dynamics. We observed that memory effects can play a beneficial role in mitigating the detrimental impact of environmental noises. Specifically, the entanglement generated shows reduced decay in the presence of these memory effects.

Submitted 24 April, 2024; originally announced April 2024.

Comments: 10 pages, 5 figures

arXiv:2404.15354 [pdf, other]

Elevating Spectral GNNs through Enhanced Band-pass Filter Approximation

Authors: Guoming Li, Jian Yang, Shangsong Liang, Dongsheng Luo

Abstract: Spectral Graph Neural Networks (GNNs) have attracted great attention due to their capacity to capture patterns in the frequency domains with essential graph filters. Polynomial-based ones (namely poly-GNNs), which approximately construct graph filters with conventional or rational polynomials, are routinely adopted in practice for their substantial performances on graph learning tasks. However, previous poly-GNNs aim at achieving overall lower approximation error on different types of filters, e.g., low-pass and high-pass, but ignore a key question: which type of filter warrants greater attention for poly-GNNs? In this paper, we first show that poly-GNN with a better approximation for band-pass graph filters performs better on graph learning tasks. This insight further sheds light on critical issues of existing poly-GNNs, i.e., those poly-GNNs achieve trivial performance in approximating band-pass graph filters, hindering the great potential of poly-GNNs. To tackle the issues, we propose a novel poly-GNN named TrigoNet. TrigoNet constructs different graph filters with novel trigonometric polynomial, and achieves leading performance in approximating band-pass graph filters against other polynomials. By applying Taylor expansion and deserting nonlinearity, TrigoNet achieves noticeable efficiency among baselines. Extensive experiments show the advantages of TrigoNet in both accuracy performances and efficiency.

Submitted 15 April, 2024; originally announced April 2024.

Comments: Preprint

arXiv:2404.13762 [pdf, other]

doi 10.1364/JOSAB.522819

Jaynes-Cummings atoms coupled to a structured environment: Leakage elimination operators and the Petz recovery maps

Authors: Da-Wei Luo, Ting Yu

Abstract: We consider the Jaynes-Cummings (JC) model embedded in a structured environment, where the atom inside an optical cavity will be affected by a hierarchical environment consisting of the cavity and its environment. We propose several effective strategies to control and suppress the decoherence effects to protect the quantum coherence of the JC atom. We study the non-perturbative control of the system dynamics by means of the leakage elimination operators. We also investigate a full quantum state reversal scheme by engineering the system and its coupling to the bath via the Petz recovery map. Our findings conclude that, with the Petz recovery map, the dynamics of the JC atom can be fully recovered regardless of Markov or non-Markovian noises. Finally, we show that our quantum control and recovery methods are effective at protecting different aspects of the system coherence.

Submitted 3 June, 2024; v1 submitted 21 April, 2024; originally announced April 2024.

Comments: 8 pages, 3 figures

Journal ref: Journal of the Optical Society of America B Vol. 41, Issue 8, pp. C112-C119 (2024)

arXiv:2404.12659 [pdf, ps, other]

SOS-1K: A Fine-grained Suicide Risk Classification Dataset for Chinese Social Media Analysis

Authors: Hongzhi Qi, Hanfei Liu, Jianqiang Li, Qing Zhao, Wei Zhai, Dan Luo, Tian Yu He, Shuo Liu, Bing Xiang Yang, Guanghui Fu

Abstract: On social media, users frequently express personal emotions, a subset of which may indicate potential suicidal tendencies. The implicit and varied forms of expression in internet language complicate accurate and rapid identification of suicidal intent on social media, thus creating challenges for timely intervention efforts. The development of deep learning models for suicide risk detection is a promising solution, but there is a notable lack of relevant datasets, especially in the Chinese context. To address this gap, this study presents a Chinese social media dataset designed for fine-grained suicide risk classification, focusing on indicators such as expressions of suicide intent, methods of suicide, and urgency of timing. Seven pre-trained models were evaluated in two tasks: high and low suicide risk, and fine-grained suicide risk classification on a level of 0 to 10. In our experiments, deep learning models show good performance in distinguishing between high and low suicide risk, with the best model achieving an F1 score of 88.39%. However, the results for fine-grained suicide risk classification were still unsatisfactory, with a weighted F1 score of 50.89%. To address the issues of data imbalance and limited dataset size, we investigated both traditional and advanced, large language model based data augmentation techniques, demonstrating that data augmentation can enhance model performance by up to 4.65 percentage points in F1-score. Notably, the Chinese MentalBERT model, which was pre-trained on psychological domain data, shows superior performance in both tasks. This study provides valuable insights for automatic identification of suicidal individuals, facilitating timely psychological intervention on social media platforms. The source code and data are publicly available.

Submitted 19 April, 2024; originally announced April 2024.

arXiv:2404.12322 [pdf, other]

Generalizable Face Landmarking Guided by Conditional Face Warping

Authors: Jiayi Liang, Haotian Liu, Hongteng Xu, Dixin Luo

Abstract: As a significant step for human face modeling, editing, and generation, face landmarking aims at extracting facial keypoints from images. A generalizable face landmarker is required in practice because real-world facial images, e.g., the avatars in animations and games, are often stylized in various ways. However, achieving generalizable face landmarking is challenging due to the diversity of facial styles and the scarcity of labeled stylized faces. In this study, we propose a simple but effective paradigm to learn a generalizable face landmarker based on labeled real human faces and unlabeled stylized faces. Our method learns the face landmarker as the key module of a conditional face warper. Given a pair of real and stylized facial images, the conditional face warper predicts a warping field from the real face to the stylized one, in which the face landmarker predicts the ending points of the warping field and provides us with high-quality pseudo landmarks for the corresponding stylized facial images. Applying an alternating optimization strategy, we learn the face landmarker to minimize (i) the discrepancy between the stylized faces and the warped real ones and (ii) the prediction errors of both real and pseudo landmarks. Experiments on various datasets show that our method outperforms existing state-of-the-art domain adaptation methods in face landmarking tasks, leading to a face landmarker with better generalizability. Code is available at https://plustwo0.github.io/project-face-landmarker.

Submitted 21 April, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

Comments: Accepted in CVPR 2024

arXiv:2404.11449 [pdf, other]

AI-Enhanced Cognitive Behavioral Therapy: Deep Learning and Large Language Models for Extracting Cognitive Pathways from Social Media Texts

Authors: Meng Jiang, Yi Jing Yu, Qing Zhao, Jianqiang Li, Changwei Song, Hongzhi Qi, Wei Zhai, Dan Luo, Xiaoqin Wang, Guanghui Fu, Bing Xiang Yang

Abstract: Cognitive Behavioral Therapy (CBT) is an effective technique for addressing the irrational thoughts stemming from mental illnesses, but it necessitates precise identification of cognitive pathways to be successfully implemented in patient care. In current society, individuals frequently express negative emotions on social media on specific topics, often exhibiting cognitive distortions, including suicidal behaviors in extreme cases. Yet, there is a notable absence of methodologies for analyzing cognitive pathways that could aid psychotherapists in conducting effective interventions online. In this study, we gathered data from social media and established the task of extracting cognitive pathways, annotating the data based on a cognitive theoretical framework. We initially categorized the task of extracting cognitive pathways as a hierarchical text classification with four main categories and nineteen subcategories. Following this, we structured a text summarization task to help psychotherapists quickly grasp the essential information. Our experiments evaluate the performance of deep learning and large language models (LLMs) on these tasks. The results demonstrate that our deep learning method achieved a micro-F1 score of 62.34% in the hierarchical text classification task. Meanwhile, in the text summarization task, GPT-4 attained a Rouge-1 score of 54.92 and a Rouge-2 score of 30.86, surpassing the experimental deep learning model's performance. However, GPT-4 may suffer from hallucination. We have made all models and code publicly available to support further research in this field.

Submitted 17 April, 2024; originally announced April 2024.

arXiv:2404.10771 [pdf, other]

TENG: Time-Evolving Natural Gradient for Solving PDEs With Deep Neural Nets Toward Machine Precision

Authors: Zhuo Chen, Jacob McCarran, Esteban Vizcaino, Marin Soljačić, Di Luo

Abstract: Partial differential equations (PDEs) are instrumental for modeling dynamical systems in science and engineering. The advent of neural networks has initiated a significant shift in tackling these complexities, though challenges in accuracy persist, especially for initial value problems. In this paper, we introduce the Time-Evolving Natural Gradient (TENG), generalizing time-dependent variational principles and optimization-based time integration, leveraging natural gradient optimization to obtain high accuracy in neural-network-based PDE solutions. Our comprehensive development includes algorithms like TENG-Euler and its high-order variants, such as TENG-Heun, tailored for enhanced precision and efficiency. TENG's effectiveness is further validated through its performance, surpassing current leading methods and achieving machine precision in step-by-step optimizations across a spectrum of PDEs, including the heat equation, Allen-Cahn equation, and Burgers' equation.

Submitted 3 June, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

Report number: MIT-CTP/5706

arXiv:2404.08288 [pdf, other]

Istanbul Flower Auction: The Need for Speed

Authors: Isa Hafalir, Donglai Luo, Cong Tao

Abstract: We examine the unique format of the Istanbul Flower Auction and compare it to traditional Dutch and English auctions, emphasizing the need to auction large volumes rapidly. In a model with time costs, we study how this auction format, which cleverly combines Dutch and English auction mechanisms, manages time costs by dynamically adapting to initial bidding behaviors. Our numerical analysis considers specific time cost functions and reveals the high performance of the Istanbul Flower Auction in comparison to standard auction formats, in terms of both auctioneer and bidder utilities. This work highlights the critical role of auction design in improving social welfare, particularly in scenarios demanding the quick sale of numerous lots.

Submitted 12 April, 2024; originally announced April 2024.

Comments: 35 pages, 8 figures, working paper

arXiv:2404.04559 [pdf, ps, other]

Spectral GNN via Two-dimensional (2-D) Graph Convolution

Authors: Guoming Li, Jian Yang, Shangsong Liang, Dongsheng Luo

Abstract: Spectral Graph Neural Networks (GNNs) have achieved tremendous success in graph learning. As an essential part of spectral GNNs, spectral graph convolution extracts crucial frequency information in graph data, leading to superior performance of spectral GNNs in downstream tasks. However, in this paper, we show that existing spectral GNNs retain critical drawbacks in performing the spectral graph convolution. Specifically, considering the spectral graph convolution as a construction operation towards target output, we prove that existing popular convolution paradigms cannot construct the target output with mild conditions on input graph signals, causing spectral GNNs to fall into suboptimal solutions. To address these issues, we rethink the spectral graph convolution from a more general two-dimensional (2-D) signal convolution perspective and propose a new convolution paradigm, named 2-D graph convolution. We prove that 2-D graph convolution unifies existing graph convolution paradigms and is capable of constructing arbitrary target output. Based on the proposed 2-D graph convolution, we further propose ChebNet2D, an efficient and effective GNN implementation of 2-D graph convolution through applying Chebyshev interpolation. Extensive experiments on benchmark datasets demonstrate both the effectiveness and efficiency of ChebNet2D.

Submitted 6 April, 2024; originally announced April 2024.

Comments: Preprint

arXiv:2404.02465 [pdf, other]

DiffFit: Visually-Guided Differentiable Fitting of Molecule Structures to a Cryo-EM Map

Authors: Deng Luo, Zainab Alsuwaykit, Dawar Khan, Ondřej Strnad, Tobias Isenberg, Ivan Viola

Abstract: We introduce DiffFit, a differentiable algorithm for fitting protein atomistic structures into an experimentally reconstructed Cryo-Electron Microscopy (cryo-EM) volume map. This process is essential in structural biology to semi-automatically reconstruct large meso-scale models of complex protein assemblies and complete cellular structures that are based on measured cryo-EM data. Current approaches require manual fitting in 3D that already results in approximately aligned structures followed by an automated fine-tuning of the alignment. With our DiffFit approach, we enable domain scientists to automatically fit new structures and visualize the fitting results for inspection and interactive revision. Our fitting begins with differentiable 3D rigid transformations of the protein atom coordinates, followed by sampling the density values at its atom coordinates from the target cryo-EM volume. To ensure a meaningful correlation between the sampled densities and the protein structure, we propose a novel loss function based on a multi-resolution volume-array approach and the exploitation of the negative space. Such a loss function serves as a critical metric for assessing the fitting quality, ensuring both fitting accuracy and improved visualization of the results. We assessed the placement quality of DiffFit with several large, realistic datasets and found its quality to be superior to that of previous methods. We further evaluated our method in two use cases. First, we demonstrate its use in the process of automating the integration of known composite structures into larger protein complexes. Second, we show that it facilitates the fitting of predicted protein domains into volume densities to aid researchers in the identification of unknown proteins. We open-sourced (github.com/nanovis/DiffFitViewer) DiffFit as a plugin in ChimeraX. All supplemental materials are available at osf.io/5tx4q.

Submitted 3 July, 2024; v1 submitted 3 April, 2024; originally announced April 2024.

Comments: 16 pages, 7 figures, 3 tables, submitted to IEEE VIS 2024

arXiv:2404.02268 [pdf, other]

doi 10.1038/s41467-024-48923-9

Multi-Objective Bayesian Active Learning for MeV-ultrafast electron diffraction

Authors: Fuhao Ji, Auralee Edelen, Ryan Roussel, Xiaozhe Shen, Sara Miskovich, Stephen Weathersby, Duan Luo, Mianzhen Mo, Patrick Kramer, Christopher Mayes, Mohamed A. K. Othman, Emilio Nanni, Xijie Wang, Alexander Reid, Michael Minitti, Robert Joel England

Abstract: Ultrafast electron diffraction using MeV energy beams (MeV-UED) has enabled unprecedented scientific opportunities in the study of ultrafast structural dynamics in a variety of gas, liquid and solid state systems. Broad scientific applications usually pose different requirements for electron probe properties. Due to the complex, nonlinear and correlated nature of accelerator systems, electron beam property optimization is a time-consuming process and often relies on extensive hand-tuning by experienced human operators. Efficient algorithm-based online tuning strategies are highly desired. Here, we demonstrate multi-objective Bayesian active learning for speeding up online beam tuning at the SLAC MeV-UED facility. The multi-objective Bayesian optimization algorithm was used for efficiently searching the parameter space and mapping out the Pareto fronts, which give the trade-offs between key beam properties. Such a scheme enables an unprecedented overview of the global behavior of the experimental system and takes a significantly smaller number of measurements compared with traditional methods such as a grid scan. This methodology can be applied in other experimental scenarios that require simultaneously optimizing multiple objectives by explorations in high dimensional, nonlinear and correlated systems.

Submitted 3 May, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

Journal ref: Nat Commun 15, 4726 (2024)

arXiv:2403.17712 [pdf, other]

Invisible Gas Detection: An RGB-Thermal Cross Attention Network and A New Benchmark

Authors: Jue Wang, Yuxiang Lin, Qi Zhao, Dong Luo, Shuaibao Chen, Wei Chen, Xiaojiang Peng

Abstract: The widespread use of various chemical gases in industrial processes necessitates effective measures to prevent their leakage during transportation and storage, given their high toxicity. Thermal infrared-based computer vision detection techniques provide a straightforward approach to identify gas leakage areas. However, the development of high-quality algorithms has been challenging due to the low texture in thermal images and the lack of open-source datasets. In this paper, we present the RGB-Thermal Cross Attention Network (RT-CAN), which employs an RGB-assisted two-stream network architecture to integrate texture information from RGB images and gas area information from thermal images. Additionally, to facilitate the research of invisible gas detection, we introduce Gas-DB, an extensive open-source gas detection database including about 1.3K well-annotated RGB-thermal images with eight variant collection scenes. Experimental results demonstrate that our method successfully leverages the advantages of both modalities, achieving state-of-the-art (SOTA) performance among RGB-thermal methods, surpassing single-stream SOTA models in terms of accuracy, Intersection over Union (IoU), and F2 metrics by 4.86%, 5.65%, and 4.88%, respectively. The code and data will be made available soon.

Submitted 26 March, 2024; originally announced March 2024.

arXiv:2403.12468 [pdf, other]

CrossTune: Black-Box Few-Shot Classification with Label Enhancement

Authors: Danqing Luo, Chen Zhang, Yan Zhang, Haizhou Li

Abstract: Training or finetuning large-scale language models (LLMs) requires substantial computation resources, motivating recent efforts to explore parameter-efficient adaptation to downstream tasks. One approach is to treat these models as black boxes and use forward passes (Inference APIs) to interact with them. Current research focuses on adapting these black-box models to downstream tasks using gradient-free prompt optimization, but this often involves an expensive process of searching task-specific prompts. Therefore, we are motivated to study black-box language model adaptation without prompt search. Specifically, we introduce a label-enhanced cross-attention network called CrossTune, which models the semantic relatedness between the input text sequence and task-specific label descriptions. Its effectiveness is examined in the context of few-shot text classification. To improve the generalization of CrossTune, we utilize ChatGPT to generate additional training data through in-context learning. A switch mechanism is implemented to exclude low-quality ChatGPT-generated data. Through extensive experiments on seven benchmark text classification datasets, we demonstrate that our proposed approach outperforms the previous state-of-the-art gradient-free black-box tuning method by 5.7% on average. Even without using ChatGPT-augmented data, CrossTune performs better than or comparably to previous black-box tuning methods, suggesting the effectiveness of our approach.

Submitted 19 March, 2024; originally announced March 2024.

Comments: Accepted by LREC-Coling 2024

arXiv:2403.07061 [pdf, other]

Simulating Meson Scattering on Spin Quantum Simulators

Authors: Elizabeth R. Bennewitz, Brayden Ware, Alexander Schuckert, Alessio Lerose, Federica M. Surace, Ron Belyansky, William Morong, De Luo, Arinjoy De, Kate S. Collins, Or Katz, Christopher Monroe, Zohreh Davoudi, Alexey V. Gorshkov

Abstract: Studying high-energy collisions of composite particles, such as hadrons and nuclei, is an outstanding goal for quantum simulators. However, preparation of hadronic wave packets has posed a significant challenge, due to the complexity of hadrons and the precise structure of wave packets. This has limited demonstrations of hadron scattering on quantum simulators to date. Observations of confinement and composite excitations in quantum spin systems have opened up the possibility to explore scattering dynamics in spin models. In this article, we develop two methods to create entangled spin states corresponding to wave packets of composite particles in analog quantum simulators of Ising spin Hamiltonians. One wave-packet preparation method uses the blockade effect enabled by beyond-nearest-neighbor Ising spin interactions. The other method utilizes a quantum-bus-mediated exchange, such as the native spin-phonon coupling in trapped-ion arrays. With a focus on trapped-ion simulators, we numerically benchmark both methods and show that high-fidelity wave packets can be achieved in near-term experiments. We numerically study scattering of wave packets for experimentally realizable parameters in the Ising model and find inelastic-scattering regimes, corresponding to particle production in the scattering event, with prominent and distinct experimental signals. Our proposal, therefore, demonstrates the potential of observing inelastic scattering in near-term quantum simulators.

Submitted 13 March, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

Comments: 18 pages, 4 main figures, 2 supplementary figures

arXiv:2403.06168 [pdf, other]

DiffuMatting: Synthesizing Arbitrary Objects with Matting-level Annotation

Authors: Xiaobin Hu, Xu Peng, Donghao Luo, Xiaozhong Ji, Jinlong Peng, Zhengkai Jiang, Jiangning Zhang, Taisong Jin, Chengjie Wang, Rongrong Ji

Abstract: Due to the difficulty and labor-consuming nature of getting highly accurate or matting annotations, there only exists a limited amount of highly accurate labels available to the public. To tackle this challenge, we propose DiffuMatting, which inherits the strong "everything" generation ability of diffusion and endows it with the power of "matting anything". Our DiffuMatting can (1) act as an anything-matting factory with highly accurate annotations, and (2) be well-compatible with community LoRAs or various conditional control approaches to achieve community-friendly art design and controllable generation. Specifically, inspired by green-screen matting, we aim to teach the diffusion model to paint on a fixed green screen canvas. To this end, a large-scale greenscreen dataset (Green100K) is collected as a training dataset for DiffuMatting. Secondly, a green background control loss is proposed to keep the drawing board as a pure green color to distinguish the foreground and background. To ensure the synthesized object has more edge details, a detailed-enhancement of transition boundary loss is proposed as a guideline to generate objects with more complicated edge structures. Aiming to simultaneously generate the object and its matting annotation, we build a matting head to make a green color removal in the latent space of the VAE decoder. Our DiffuMatting shows several potential applications (e.g., matting-data generator, community-friendly art design and controllable generation). As a matting-data generator, DiffuMatting synthesizes general object and portrait matting sets, effectively reducing the relative MSE error by 15.4% in General Object Matting and 11.4% in Portrait Matting tasks.

Submitted 10 March, 2024; originally announced March 2024.

arXiv:2403.06013 [pdf, other]

Are Classification Robustness and Explanation Robustness Really Strongly Correlated? An Analysis Through Input Loss Landscape

Authors: Tiejin Chen, Wenwang Huang, Linsey Pang, Dongsheng Luo, Hua Wei

Abstract: This paper delves into the critical area of deep learning robustness, challenging the conventional belief that classification robustness and explanation robustness in image classification systems are inherently correlated. Through a novel evaluation approach leveraging clustering for efficient assessment of explanation robustness, we demonstrate that enhancing explanation robustness does not necessarily flatten the input loss landscape with respect to explanation loss, in contrast to classification, where a flattened loss landscape indicates better robustness. To investigate this contradiction in depth, we propose a training method designed to adjust the loss landscape with respect to explanation loss. Through the new training method, we uncover that although such adjustments can impact the robustness of explanations, they do not have an influence on the robustness of classification. These findings not only challenge the prevailing assumption of a strong correlation between the two forms of robustness but also pave new pathways for understanding the relationship between loss landscape and explanation loss.

Submitted 9 March, 2024; originally announced March 2024.

arXiv:2403.04785 [pdf, other]

Large Language Multimodal Models for 5-Year Chronic Disease Cohort Prediction Using EHR Data

Authors: Jun-En Ding, Phan Nguyen Minh Thao, Wen-Chih Peng, Jian-Zhe Wang, Chun-Cheng Chug, Min-Chen Hsieh, Yun-Chien Tseng, Ling Chen, Dongsheng Luo, Chi-Te Wang, Pei-fu Chen, Feng Liu, Fang-Ming Hung

Abstract: Chronic diseases such as diabetes are the leading causes of morbidity and mortality worldwide. Numerous research studies have been attempted with various deep learning models in diagnosis. However, most previous studies had certain limitations, including using publicly available datasets (e.g. MIMIC), and imbalanced data. In this study, we collected five-year electronic health records (EHRs) from the Taiwan hospital database, including 1,420,596 clinical notes, 387,392 laboratory test results, and more than 1,505 laboratory test items, focusing on research pre-training large language models. We proposed a novel Large Language Multimodal Models (LLMMs) framework incorporating multimodal data from clinical notes and laboratory test results for the prediction of chronic disease risk. Our method combined a text embedding encoder and multi-head attention layer to learn laboratory test values, utilizing a deep neural network (DNN) module to merge blood features with chronic disease semantics into a latent space. In our experiments, we observe that clinicalBERT and PubMed-BERT, when combined with attention fusion, can achieve an accuracy of 73% in multiclass chronic diseases and diabetes prediction. By transforming laboratory test values into textual descriptions and employing the Flan T-5 model, we achieved a 76% Area Under the ROC Curve (AUROC), demonstrating the effectiveness of leveraging numerical text data for training and inference in language models. This approach significantly improves the accuracy of early-stage diabetes prediction.

Submitted 2 March, 2024; originally announced March 2024.

arXiv:2403.04731 [pdf, other]

Photonic probabilistic machine learning using quantum vacuum noise

Authors: Seou Choi, Yannick Salamin, Charles Roques-Carmes, Rumen Dangovski, Di Luo, Zhuo Chen, Michael Horodynski, Jamison Sloan, Shiekh Zia Uddin, Marin Soljacic

Abstract: Probabilistic machine learning utilizes controllable sources of randomness to encode uncertainty and enable statistical modeling. Harnessing the pure randomness of quantum vacuum noise, which stems from fluctuating electromagnetic fields, has shown promise for high speed and energy-efficient stochastic photonic elements. Nevertheless, photonic computing hardware which can control these stochastic elements to program probabilistic machine learning algorithms has been limited. Here, we implement a photonic probabilistic computer consisting of a controllable stochastic photonic element - a photonic probabilistic neuron (PPN). Our PPN is implemented in a bistable optical parametric oscillator (OPO) with vacuum-level injected bias fields. We then program a measurement-and-feedback loop for time-multiplexed PPNs with electronic processors (FPGA or GPU) to solve certain probabilistic machine learning tasks. We showcase probabilistic inference and image generation of MNIST-handwritten digits, which are representative examples of discriminative and generative models. In both implementations, quantum vacuum noise is used as a random seed to encode classification uncertainty or probabilistic generation of samples. In addition, we propose a path towards an all-optical probabilistic computing platform, with an estimated sampling rate of ~ 1 Gbps and energy consumption of ~ 5 fJ/MAC. Our work paves the way for scalable, ultrafast, and energy-efficient probabilistic machine learning hardware.

Submitted 7 March, 2024; originally announced March 2024.

arXiv:2403.03199 [pdf, other]

Operator Learning Renormalization Group

Authors: Xiu-Zhe Luo, Di Luo, Roger G. Melko

Abstract: In this paper, we present a general framework for quantum many-body simulations called the operator learning renormalization group (OLRG). Inspired by machine learning perspectives, OLRG is a generalization of Wilson's numerical renormalization group and White's density matrix renormalization group, which recursively builds a simulatable system to approximate a target system of the same number of sites via operator maps. OLRG uses a loss function to minimize the error of a target property directly by learning the operator map in lieu of a state ansatz. This loss function is designed by a scaling consistency condition that also provides a provable bound for real-time evolution. We implement two versions of the operator maps for classical and quantum simulations. The former, which we call the Operator Matrix Map, can be implemented via neural networks on classical computers. The latter, which we call the Hamiltonian Expression Map, generates device pulse sequences to leverage the capabilities of quantum computing hardware. We illustrate the performance of both maps for calculating time-dependent quantities in the quantum Ising model Hamiltonian.

Submitted 28 May, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

Comments: 18 pages, 14 figures

Report number: MIT-CTP/5676

arXiv:2403.00733 [pdf, ps, other]

Remarks on "Successive Convexification: A Superlinearly Convergent Algorithm for Non-convex Optimal Control Problems"

Authors: Dayou Luo, Purnanand Elango, Behcet Acikmese

Abstract: The purpose of this note is to highlight and address inaccuracies in the convergence guarantees of SCvx, a nonconvex trajectory optimization algorithm proposed by Mao et al. (arXiv:1804.06539), and make connections to relevant prior work. Specifically, we identify errors in the convergence proof within Mao et al. (arXiv:1804.06539) and reestablish the proof of convergence by employing a new method under stricter assumptions.

Submitted 13 March, 2024; v1 submitted 1 March, 2024; originally announced March 2024.

arXiv:2402.10434 [pdf, other]

Parametric Augmentation for Time Series Contrastive Learning

Authors: Xu Zheng, Tianchun Wang, Wei Cheng, Aitian Ma, Haifeng Chen, Mo Sha, Dongsheng Luo

Abstract: Modern techniques like contrastive learning have been effectively used in many areas, including computer vision, natural language processing, and graph-structured data. Creating positive examples that assist the model in learning robust and discriminative representations is a crucial stage in contrastive learning approaches. Usually, preset human intuition directs the selection of relevant data augmentations. Due to patterns that are easily recognized by humans, this rule of thumb works well in the vision and language domains. However, it is impractical to visually inspect the temporal structures in time series. The diversity of time series augmentations at both the dataset and instance levels makes it difficult to choose meaningful augmentations on the fly. In this study, we address this gap by analyzing time series data augmentation using information theory and summarizing the most commonly adopted augmentations in a unified format. We then propose a contrastive learning framework with parametric augmentation, AutoTCL, which can be adaptively employed to support time series representation learning. The proposed approach is encoder-agnostic, allowing it to be seamlessly integrated with different backbone encoders. Experiments on univariate forecasting tasks demonstrate the highly competitive results of our method, with an average 6.5% reduction in MSE and 4.7% in MAE over the leading baselines. In classification tasks, AutoTCL achieves a 1.2% increase in average accuracy.

Submitted 15 February, 2024; originally announced February 2024.

Comments: Accepted by International Conference on Learning Representations (ICLR 2024)

arXiv:2402.07484 [pdf, other]

An elementary approach to mixing and dissipation enhancement by transport noise

Authors: Dejun Luo, Bin Tang, Guohuan Zhao

Abstract: We investigate the mixing properties of solutions to the stochastic transport equation $d u= \circ d W \cdot\nabla u$, where the driving noise $W(t,x)$ is white in time, colored and divergence-free in space. Furthermore, we prove the dissipation enhancement in the presence of a small viscous term. Applying our results, we also derive the mixing properties for a regularized stochastic 2D Euler equation.

Submitted 12 February, 2024; originally announced February 2024.

Comments: 34 pages

Showing 1–50 of 375 results for author: Luo, D