
Flow Perturbation to Accelerate Unbiased Sampling of Boltzmann distribution

Abstract: Flow-based generative models have been employed for sampling the Boltzmann distribution, but their application to high-dimensional systems is hindered by the significant computational cost of obtaining the Jacobian of the flow. To overcome this challenge, we introduce the flow perturbation method, which incorporates optimized stochastic perturbations into the flow. By reweighting trajectories generated by the perturbed flow, our method achieves unbiased sampling of the Boltzmann distribution with orders of magnitude speedup compared to both brute force Jacobian calculations and the Hutchinson estimator. Notably, it accurately sampled the Chignolin protein with all atomic Cartesian coordinates explicitly represented, which, to our best knowledge, is the largest molecule ever Boltzmann sampled in such detail using generative models.

Submitted 15 July, 2024; originally announced July 2024.

arXiv:2407.10481 [pdf, other]

doi 10.1145/3641519.3657492

SuperPADL: Scaling Language-Directed Physics-Based Control with Progressive Supervised Distillation

Authors: Jordan Juravsky, Yunrong Guo, Sanja Fidler, Xue Bin Peng

Abstract: Physically-simulated models for human motion can generate high-quality responsive character animations, often in real-time. Natural language serves as a flexible interface for controlling these models, allowing expert and non-expert users to quickly create and edit their animations. Many recent physics-based animation methods, including those that use text interfaces, train control policies using reinforcement learning (RL). However, scaling these methods beyond several hundred motions has remained challenging. Meanwhile, kinematic animation models are able to successfully learn from thousands of diverse motions by leveraging supervised learning methods. Inspired by these successes, in this work we introduce SuperPADL, a scalable framework for physics-based text-to-motion that leverages both RL and supervised learning to train controllers on thousands of diverse motion clips. SuperPADL is trained in stages using progressive distillation, starting with a large number of specialized experts using RL. These experts are then iteratively distilled into larger, more robust policies using a combination of reinforcement learning and supervised learning. Our final SuperPADL controller is trained on a dataset containing over 5000 skills and runs in real time on a consumer GPU. Moreover, our policy can naturally transition between skills, allowing for users to interactively craft multi-stage animations. We experimentally demonstrate that SuperPADL significantly outperforms RL-based baselines at this large data scale.

Submitted 15 July, 2024; originally announced July 2024.

arXiv:2407.10215 [pdf, other]

DMRIntTk: integrating different DMR sets based on density peak clustering

Authors: Wenjin Zhang, Wenlong Jie, Wanxin Cui, Guihua Duan, You Zou, Xiaoqing Peng

Abstract: Background: Identifying differentially methylated regions (DMRs) is a basic task in DNA methylation analysis. However, due to the different strategies adopted, different DMR sets will be predicted on the same dataset, which poses a challenge in selecting a reliable and comprehensive DMR set for downstream analysis. Results: Here, we develop DMRIntTk, a toolkit for integrating DMR sets predicted by different methods on the same dataset. In DMRIntTk, the genome is segmented into bins and the reliability of each DMR set at different methylation thresholds is evaluated. Then, the bins are weighted based on the covered DMR sets and integrated into DMRs by using a density peak clustering algorithm. To demonstrate its practicality, DMRIntTk was applied to different scenarios, including different tissues with relatively large methylation differences, cancer tissues versus normal tissues with medium methylation differences, and disease tissues versus normal tissues with subtle methylation differences. The results show that DMRIntTk can effectively trim the regions with small methylation differences in the original DMR sets and therefore can increase the proportion of DMRs with higher methylation differences. In addition, the overlap analysis suggests that the integrated DMR sets are quite comprehensive, and the functional analysis indicates that the integrated disease-related DMR sets are significantly enriched in biological pathways associated with the pathological mechanisms of the diseases. Conclusions: DMRIntTk can help researchers obtain a reliable and comprehensive DMR set from many prediction methods. Keywords: Differentially methylated regions, Methylation array, Cancer-related differentially methylated regions, Tissue-specific differentially methylated regions, Density peak clustering.

Submitted 14 July, 2024; originally announced July 2024.

Comments: 21 pages, 9 figures

arXiv:2407.08931 [pdf, other]

Global-Local Collaborative Inference with LLM for Lidar-Based Open-Vocabulary Detection

Authors: Xingyu Peng, Yan Bai, Chen Gao, Lirong Yang, Fei Xia, Beipeng Mu, Xiaofei Wang, Si Liu

Abstract: Open-Vocabulary Detection (OVD) is the task of detecting all interesting objects in a given scene without predefined object classes. Extensive work has been done to address OVD for 2D RGB images, but the exploration of 3D OVD is still limited. Intuitively, lidar point clouds provide 3D information, at both the object level and the scene level, to generate trustworthy detection results. However, previous lidar-based OVD methods focus only on the usage of object-level features, ignoring the importance of scene-level information. In this paper, we propose a Global-Local Collaborative Scheme (GLIS) for the lidar-based OVD task, which contains a local branch to generate object-level detection results and a global branch to obtain scene-level global features. With the global-local information, a Large Language Model (LLM) is applied for chain-of-thought inference, and the detection results can be refined accordingly. We further propose Reflected Pseudo Labels Generation (RPLG) to generate high-quality pseudo labels for supervision and Background-Aware Object Localization (BAOL) to select precise object proposals. Extensive experiments on ScanNetV2 and SUN RGB-D demonstrate the superiority of our methods. Code is released at https://github.com/GradiusTwinbee/GLIS.

Submitted 11 July, 2024; originally announced July 2024.

Comments: accepted by ECCV 2024

arXiv:2407.08353 [pdf]

One-dimensional flat bands in phosphorene nanoribbons with pentagonal nature

Authors: Shuo Sun, Jing-Yang You, Zhihao Cai, Jie Su, Tong Yang, Xinnan Peng, Yihe Wang, Daiyu Geng, Jian Gou, Yuli Huang, Sisheng Duan, Lan Chen, Kehui Wu, Andrew T. S. Wee, Yuan Ping Feng, Jia Lin Zhang, Jiong Lu, Baojie Feng, Wei Chen

Abstract: Materials with topological flat bands can serve as a promising platform to investigate strongly interacting phenomena. However, experimental realization of ideal flat bands is mostly limited to artificial lattices or moiré systems. Here we report a general way to construct one-dimensional (1D) flat bands in phosphorene nanoribbons (PNRs) with pentagonal nature: penta-hexa-PNRs and penta-dodeca-PNRs, wherein the corresponding flat bands are directly verified by using angle-resolved photoemission spectroscopy. We confirm that the observed 1D flat bands originate from the electronic 1D sawtooth and Lieb lattices, respectively, as revealed by the combination of bond-resolved scanning tunneling microscopy, scanning tunneling spectroscopy, tight-binding models, and first-principles calculations. Our study demonstrates a general way to construct 1D flat bands in 1D solid material systems, which provides a robust platform to explore strongly interacting phases of matter.

Submitted 11 July, 2024; originally announced July 2024.

Comments: 13 pages, 4 figures

arXiv:2407.08195 [pdf]

A Text-to-Game Engine for UGC-Based Role-Playing Games

Authors: Lei Zhang, Xuezheng Peng, Shuyi Yang, Feiyang Wang

Abstract: The shift from professionally generated content (PGC) to user-generated content (UGC) has revolutionized various media formats, from text to video. With the rapid advancements in generative AI, a similar shift is set to transform the game industry, particularly in the realm of role-playing games (RPGs). This paper introduces a new framework for a text-to-game engine that utilizes foundation models to convert simple textual inputs into complex, interactive RPG experiences. The engine dynamically renders the game story in a multi-modal format and adjusts the game character, environment, and mechanics in real-time in response to player actions. Using this framework, we developed the "Zagii" game engine, which has successfully supported hundreds of RPG games across a diverse range of genres and facilitated tens of thousands of online user gameplay instances. This validates the effectiveness of our framework. Our work showcases the potential for a more open and democratized gaming paradigm, highlighting the transformative impact of generative AI on the game life cycle.

Submitted 11 July, 2024; originally announced July 2024.

Comments: 13 pages, 11 figures

arXiv:2407.07200 [pdf, ps, other]

Measuring Trust for Exoskeleton Systems

Authors: Leia Stirling, Man I Wu, Xiangyu Peng

Abstract: Wearable robotic systems are a class of robots that have a tight coupling between human and robot movements. Similar to non-wearable robots, it is important to measure the trust a person has that the robot can support achieving the desired goals. While some measures of trust may apply to all potential robotic roles, there are key distinctions between wearable and non-wearable robotic systems. In this paper, we considered the dimensions and sub-dimensions of trust, with example attributes defined for exoskeleton applications. As the research community comes together to discuss measures of trust, it will be important to consider how the selected measures support interpreting trust along different dimensions for the variety of robotic systems that are emerging in the field in a way that leads to actionable outcomes.

Submitted 9 July, 2024; originally announced July 2024.

Comments: Taking a Closer Look: Refining Trust and Its Impact in HRI Workshop, HRI '24, March 11, 2024

arXiv:2407.06584 [pdf, other]

HiLMa-Res: A General Hierarchical Framework via Residual RL for Combining Quadrupedal Locomotion and Manipulation

Authors: Xiaoyu Huang, Qiayuan Liao, Yiming Ni, Zhongyu Li, Laura Smith, Sergey Levine, Xue Bin Peng, Koushil Sreenath

Abstract: This work presents HiLMa-Res, a hierarchical framework leveraging reinforcement learning to tackle manipulation tasks while performing continuous locomotion using quadrupedal robots. Unlike most previous efforts that focus on solving a specific task, HiLMa-Res is designed to be general for various loco-manipulation tasks that require quadrupedal robots to maintain sustained mobility. The novel design of this framework tackles the challenges of integrating continuous locomotion control and manipulation using legs. It develops an operational space locomotion controller that can track arbitrary robot end-effector (toe) trajectories while walking at different velocities. This controller is designed to be general to different downstream tasks, and therefore, can be utilized in high-level manipulation planning policy to address specific tasks. To demonstrate the versatility of this framework, we utilize HiLMa-Res to tackle several challenging loco-manipulation tasks using a quadrupedal robot in the real world. These tasks span from leveraging state-based policy to vision-based policy, from training purely from the simulation data to learning from real-world data. In these tasks, HiLMa-Res shows better performance than other methods.

Submitted 9 July, 2024; originally announced July 2024.

Comments: IROS 2024

arXiv:2407.04949 [pdf, other]

Beyond the Federation: Topology-aware Federated Learning for Generalization to Unseen Clients

Authors: Mengmeng Ma, Tang Li, Xi Peng

Abstract: Federated Learning is widely employed to tackle distributed sensitive data. Existing methods primarily focus on addressing in-federation data heterogeneity. However, we observed that they suffer from significant performance degradation when applied to unseen clients for out-of-federation (OOF) generalization. The recent attempts to address generalization to unseen clients generally struggle to scale up to large-scale distributed settings due to high communication or computation costs. Moreover, methods that scale well often demonstrate poor generalization capability. To achieve OOF-resiliency in a scalable manner, we propose Topology-aware Federated Learning (TFL) that leverages client topology - a graph representing client relationships - to effectively train robust models against OOF data. We formulate a novel optimization problem for TFL, consisting of two key modules: Client Topology Learning, which infers the client relationships in a privacy-preserving manner, and Learning on Client Topology, which leverages the learned topology to identify influential clients and harness this information into the FL optimization process to efficiently build robust models. Empirical evaluation on a variety of real-world datasets verifies TFL's superior OOF robustness and scalability.

Submitted 5 July, 2024; originally announced July 2024.

Comments: ICML 2024

arXiv:2407.03900 [pdf, other]

Oracle Bone Inscriptions Multi-modal Dataset

Authors: Bang Li, Donghao Luo, Yujie Liang, Jing Yang, Zengmao Ding, Xu Peng, Boyuan Jiang, Shengwei Han, Dan Sui, Peichao Qin, Pian Wu, Chaoyang Wang, Yun Qi, Taisong Jin, Chengjie Wang, Xiaoming Huang, Zhan Shu, Rongrong Ji, Yongge Liu, Yunsheng Wu

Abstract: Oracle bone inscriptions (OBI) is the earliest developed writing system in China, bearing invaluable written exemplifications of early Shang history and paleography. However, the task of deciphering OBI, in the current climate of the scholarship, can prove extremely challenging. Out of the 4,500 oracle bone characters excavated, only a third have been successfully identified. Therefore, leveraging the advantages of advanced AI technology to assist in the decipherment of OBI is a highly essential research topic. However, fully utilizing AI's capabilities in these matters relies on having a comprehensive and high-quality annotated OBI dataset at hand, whereas most existing datasets are annotated in only a single or a few dimensions, limiting the value of their potential application. For instance, the Oracle-MNIST dataset only offers 30k images classified into 10 categories. Therefore, this paper proposes an Oracle Bone Inscriptions Multi-modal Dataset (OBIMD), which includes annotation information for 10,077 pieces of oracle bones. Each piece has two modalities: pixel-level aligned rubbings and facsimiles. The dataset annotates the detection boxes, character categories, transcriptions, corresponding inscription groups, and reading sequences in the groups of each oracle bone character, providing a comprehensive and high-quality level of annotations. This dataset can be used for a variety of AI-related research tasks relevant to the field of OBI, such as OBI Character Detection and Recognition, Rubbing Denoising, Character Matching, Character Generation, Reading Sequence Prediction, and Missing Characters Completion. We believe that the creation and publication of a dataset like this will help significantly advance the application of AI algorithms in the field of OBI research.

Submitted 4 July, 2024; originally announced July 2024.

arXiv:2407.03886 [pdf, other]

DSMix: Distortion-Induced Sensitivity Map Based Pre-training for No-Reference Image Quality Assessment

Authors: Jinsong Shi, Pan Gao, Xiaojiang Peng, Jie Qin

Abstract: Image quality assessment (IQA) has long been a fundamental challenge in image understanding. In recent years, deep learning-based IQA methods have shown promising performance. However, the lack of large amounts of labeled data in the IQA field has hindered further advancements in these methods. This paper introduces DSMix, a novel data augmentation technique specifically designed for IQA tasks, aiming to overcome this limitation. DSMix leverages the distortion-induced sensitivity map (DSM) of an image as prior knowledge. It applies cut and mix operations to diverse categories of synthetic distorted images, assigning confidence scores to class labels based on the aforementioned prior knowledge. In the pre-training phase using DSMix-augmented data, knowledge distillation is employed to enhance the model's ability to extract semantic features. Experimental results on both synthetic and authentic IQA datasets demonstrate the significant predictive and generalization performance achieved by DSMix, without requiring fine-tuning of the full model. Code is available at https://github.com/I2-Multimedia-Lab/DSMix.

Submitted 4 July, 2024; originally announced July 2024.

Comments: Accepted by ECCV 2024

arXiv:2407.02095 [pdf, other]

TIGER: A Generating-Then-Ranking Framework for Practical Python Type Inference

Authors: Chong Wang, Jian Zhang, Yiling Lou, Mingwei Liu, Weisong Sun, Yang Liu, Xin Peng

Abstract: Python's dynamic typing system offers flexibility and expressiveness but can lead to type-related errors, prompting the need for automated type inference to enhance type hinting. While existing learning-based approaches show promising inference accuracy, they struggle with practical challenges in comprehensively handling various types, including complex generic types and (unseen) user-defined types. In this paper, we introduce TIGER, a two-stage generating-then-ranking (GTR) framework, designed to effectively handle Python's diverse type categories. TIGER leverages fine-tuned pre-trained code models to train a generative model with a span masking objective and a similarity model with a contrastive training objective. This approach allows TIGER to generate a wide range of type candidates, including complex generics in the generating stage, and accurately rank them with user-defined types in the ranking stage. Our evaluation on the ManyTypes4Py dataset shows TIGER's advantage over existing methods in various type categories, notably improving accuracy in inferring user-defined and unseen types by 11.2% and 20.1% respectively in Top-5 Exact Match. Moreover, the experimental results not only demonstrate TIGER's superior performance and efficiency, but also underscore the significance of its generating and ranking stages in enhancing automated type inference.

Submitted 16 July, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

Comments: Accepted by ICSE'25

arXiv:2406.19602 [pdf, other]

A Survey on Deep Clustering: From the Prior Perspective

Authors: Yiding Lu, Haobin Li, Yunfan Li, Yijie Lin, Xi Peng

Abstract: Facilitated by the powerful feature extraction ability of neural networks, deep clustering has achieved great success in analyzing high-dimensional and complex real-world data. The performance of deep clustering methods is affected by various factors such as network structures and learning objectives. However, as pointed out in this survey, the essence of deep clustering lies in the incorporation and utilization of prior knowledge, which is largely ignored by existing works. From pioneering deep clustering methods based on data structure assumptions to recent contrastive clustering methods based on data augmentation invariances, the development of deep clustering intrinsically corresponds to the evolution of prior knowledge. In this survey, we provide a comprehensive review of deep clustering methods by categorizing them into six types of prior knowledge. We find that in general the prior innovation follows two trends, namely, i) from mining to constructing, and ii) from internal to external. Besides, we provide a benchmark on five widely-used datasets and analyze the performance of methods with diverse priors. By providing a novel prior knowledge perspective, we hope this survey could provide some novel insights and inspire future research in the deep clustering community.

Submitted 30 June, 2024; v1 submitted 27 June, 2024; originally announced June 2024.

arXiv:2406.18629 [pdf, other]

Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs

Authors: Xin Lai, Zhuotao Tian, Yukang Chen, Senqiao Yang, Xiangru Peng, Jiaya Jia

Abstract: Mathematical reasoning presents a significant challenge for Large Language Models (LLMs) due to the extensive and precise chain of reasoning required for accuracy. Ensuring the correctness of each reasoning step is critical. To address this, we aim to enhance the robustness and factuality of LLMs by learning from human feedback. However, Direct Preference Optimization (DPO) has shown limited benefits for long-chain mathematical reasoning, as models employing DPO struggle to identify detailed errors in incorrect answers. This limitation stems from a lack of fine-grained process supervision. We propose a simple, effective, and data-efficient method called Step-DPO, which treats individual reasoning steps as units for preference optimization rather than evaluating answers holistically. Additionally, we have developed a data construction pipeline for Step-DPO, enabling the creation of a high-quality dataset containing 10K step-wise preference pairs. We also observe that in DPO, self-generated data is more effective than data generated by humans or GPT-4, due to the latter's out-of-distribution nature. Our findings demonstrate that as few as 10K preference data pairs and fewer than 500 Step-DPO training steps can yield a nearly 3% gain in accuracy on MATH for models with over 70B parameters. Notably, Step-DPO, when applied to Qwen2-72B-Instruct, achieves scores of 70.8% and 94.0% on the test sets of MATH and GSM8K, respectively, surpassing a series of closed-source models, including GPT-4-1106, Claude-3-Opus, and Gemini-1.5-Pro. Our code, data, and models are available at https://github.com/dvlab-research/Step-DPO.

Submitted 26 June, 2024; originally announced June 2024.

Comments: Code, data, and models are available at https://github.com/dvlab-research/Step-DPO

arXiv:2406.17304 [pdf, other]

Leveraging LLMs for Dialogue Quality Measurement

Authors: Jinghan Jia, Abi Komma, Timothy Leffel, Xujun Peng, Ajay Nagesh, Tamer Soliman, Aram Galstyan, Anoop Kumar

Abstract: In task-oriented conversational AI evaluation, unsupervised methods poorly correlate with human judgments, and supervised approaches lack generalization. Recent advances in large language models (LLMs) show robust zero-shot and few-shot capabilities across NLP tasks. This paper explores using LLMs for automated dialogue quality evaluation, experimenting with various configurations on public and proprietary datasets. Manipulating factors such as model size, in-context examples, and selection techniques, we examine "chain-of-thought" (CoT) reasoning and label extraction procedures. Our results show that (1) larger models yield more accurate dialogue labels; (2) algorithmic selection of in-context examples outperforms random selection; (3) CoT reasoning where an LLM is asked to provide justifications before outputting final labels improves performance; and (4) fine-tuned LLMs outperform out-of-the-box ones. Our results indicate that LLMs that are suitably fine-tuned and have sufficient reasoning capabilities can be leveraged for automated dialogue evaluation.

Submitted 25 June, 2024; originally announced June 2024.

arXiv:2406.16262 [pdf, ps, other]

Large deviations for 2D Stochastic Chemotaxis-Navier-Stokes System

Authors: Yunfeng Chen, Xuhui Peng, Jianliang Zhai

Abstract: In this paper, we establish a large deviation principle for the 2D stochastic Chemotaxis-Navier-Stokes equation perturbed by a small multiplicative noise. The main difficulties come from the lack of a suitable compact embedding into the space occupied by the solutions and the inherent complexity of the equation. Finite-dimensional projection arguments and the introduction of suitable stopping times play important roles.

Submitted 23 June, 2024; originally announced June 2024.

arXiv:2406.14800 [pdf, ps, other]

Multi-quasisymmetric functions with semigroup exponents, Hopf algebras and Rota-Baxter algebras

Authors: Xing Gao, Li Guo, Xiao-Song Peng

Abstract: Many years ago, G.-C. Rota discovered a close connection between symmetric functions and Rota-Baxter algebras, and proposed to study generalizations of symmetric functions in the framework of Rota-Baxter algebras. Guided by this proposal, quasisymmetric functions from weak compositions (instead of just compositions) were obtained from free Rota-Baxter algebras on one generator. This paper aims to generalize this approach to free Rota-Baxter algebras on multiple generators in order to obtain further generalizations of quasisymmetric functions. For this purpose and also for its independent interest, the space $\mathrm{MQSym}$ of quasisymmetric functions on multiple sequences of variables is defined, generalizing quasisymmetric functions and diagonally quasisymmetric functions of Aval, Bergeron and Bergeron. Linear bases of such multi-quasisymmetric functions are given by monomial multi-quasisymmetric functions and fundamental multi-quasisymmetric functions; the latter recover the fundamental $G^m$-quasisymmetric functions of Aval and Chapoton. Next introduced is the even more general notion of multi-quasisymmetric functions $\mathrm{MQSym}^E$ with exponents in a semigroup $E$, which also generalizes the quasisymmetric functions with semigroup exponents in a recent work. Through this approach, a natural Hopf algebraic structure is obtained on $\mathrm{MQSym}^E$. Finally, in support of Rota's proposal, the free commutative unitary Rota-Baxter algebra on a finite set is shown to be isomorphic to a scalar extension of $\mathrm{MQSym}^E$, a fact which in turn equips the free Rota-Baxter algebra with a Hopf algebra structure.

Submitted 20 June, 2024; originally announced June 2024.

Comments: 27 pages

MSC Class: 05E05; 16W99; 16S100; 17B38; 08B20; 16T30

arXiv:2406.14185 [pdf, other]

Failure-Resilient Distributed Inference with Model Compression over Heterogeneous Edge Devices

Authors: Li Wang, Liang Li, Lianming Xu, Xian Peng, Aiguo Fei

Abstract: The distributed inference paradigm enables the computation workload to be distributed across multiple devices, facilitating the implementation of deep learning-based intelligent services in extremely resource-constrained Internet of Things (IoT) scenarios. Yet it raises great challenges to perform complicated inference tasks relying on a cluster of IoT devices that are heterogeneous in their computing/communication capacity and prone to crash or timeout failures. In this paper, we present RoCoIn, a robust cooperative inference mechanism for locally distributed execution of deep neural network-based inference tasks over heterogeneous edge devices. It creates a set of independent and compact student models that are learned from a large model using knowledge distillation for distributed deployment. In particular, the devices are strategically grouped to redundantly deploy and execute the same student model such that the inference process is resilient to any local failures, while a joint knowledge partition and student model assignment scheme is designed to minimize the response latency of the distributed inference system in the presence of devices with diverse capacities. Extensive simulations are conducted to corroborate the superior performance of our RoCoIn for distributed inference compared to several baselines, and the results demonstrate its efficacy in timely inference and failure resiliency.

Submitted 20 June, 2024; originally announced June 2024.

arXiv:2406.12379 [pdf, other]

The projected sensitivity of SCEP experiment to Magnetic Monopole

Authors: Changqing Ye, Beige Liu, Zhe Cao, Lingzhi Han, Xinming Huang, Min Jiang, Dong Liu, Qing Lin, Shitian Wan, Yusheng Wu, Lei Zhao, Yue Zhang, Xinhua Peng, Zhengguo Zhao

Abstract: The investigation of beyond-Standard-Model particles is a compelling direction in the pursuit of new physics. One such hypothetical particle, the magnetic monopole, has garnered considerable attention due to its strong theoretical motivation and potential to unveil profound physical phenomena. The magnetic monopole is intricately linked to the long-standing enigma surrounding the quantization of electric charge. In this manuscript, we propose a novel detection scenario for magnetic monopoles by employing a coincidence measurement technique that combines a room-temperature magnetometer with plastic scintillators. This setup allows for the collection of both the induction and scintillation signals generated by the passage of a monopole. The estimation of the sensitivity using a simple benchmark setup is given.

Submitted 19 June, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

arXiv:2406.11161 [pdf, other]

Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning

Authors: Zebang Cheng, Zhi-Qi Cheng, Jun-Yan He, Jingdong Sun, Kai Wang, Yuxiang Lin, Zheng Lian, Xiaojiang Peng, Alexander Hauptmann

Abstract: Accurate emotion perception is crucial for various applications, including human-computer interaction, education, and counseling. However, traditional single-modality approaches often fail to capture the complexity of real-world emotional expressions, which are inherently multimodal. Moreover, existing Multimodal Large Language Models (MLLMs) face challenges in integrating audio and recognizing subtle facial micro-expressions. To address this, we introduce the MERR dataset, containing 28,618 coarse-grained and 4,487 fine-grained annotated samples across diverse emotional categories. This dataset enables models to learn from varied scenarios and generalize to real-world applications. Furthermore, we propose Emotion-LLaMA, a model that seamlessly integrates audio, visual, and textual inputs through emotion-specific encoders. By aligning features into a shared space and employing a modified LLaMA model with instruction tuning, Emotion-LLaMA significantly enhances both emotional recognition and reasoning capabilities. Extensive evaluations show Emotion-LLaMA outperforms other MLLMs, achieving top scores in Clue Overlap (7.83) and Label Overlap (6.25) on EMER, an F1 score of 0.9036 on MER2023 challenge, and the highest UAR (45.59) and WAR (59.37) in zero-shot evaluations on DFEW dataset.

Submitted 16 June, 2024; originally announced June 2024.

Comments: 37 pages, 12 figures, Project: https://github.com/ZebangCheng/Emotion-LLaMA, Demo: https://huggingface.co/spaces/ZebangCheng/Emotion-LLaMA

arXiv:2406.11147 [pdf, other]

Vul-RAG: Enhancing LLM-based Vulnerability Detection via Knowledge-level RAG

Authors: Xueying Du, Geng Zheng, Kaixin Wang, Jiayi Feng, Wentai Deng, Mingwei Liu, Bihuan Chen, Xin Peng, Tao Ma, Yiling Lou

Abstract: Vulnerability detection is essential for software quality assurance. In recent years, deep learning models (especially large language models) have shown promise in vulnerability detection. In this work, we propose a novel LLM-based vulnerability detection technique, Vul-RAG, which leverages a knowledge-level retrieval-augmented generation (RAG) framework to detect vulnerabilities in the given code in three phases. First, Vul-RAG constructs a vulnerability knowledge base by extracting multi-dimension knowledge via LLMs from existing CVE instances; second, for a given code snippet, Vul-RAG retrieves the relevant vulnerability knowledge from the constructed knowledge base based on functional semantics; third, Vul-RAG leverages LLMs to check the vulnerability of the given code snippet by reasoning about the presence of vulnerability causes and fixing solutions of the retrieved vulnerability knowledge. Our evaluation of Vul-RAG on our constructed benchmark PairVul shows that Vul-RAG substantially outperforms all baselines by 12.96%/110% relative improvement in accuracy/pairwise-accuracy. In addition, our user study shows that the vulnerability knowledge generated by Vul-RAG can serve as high-quality explanations which can improve the manual detection accuracy from 0.60 to 0.77.

Submitted 19 June, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

arXiv:2406.11087 [pdf, other]

MemDPT: Differential Privacy for Memory Efficient Language Models

Authors: Yanming Liu, Xinyue Peng, Jiannan Cao, Yuwei Zhang, Chen Ma, Songhang Deng, Mengchen Fu, Xuhong Zhang, Sheng Cheng, Xun Wang, Jianwei Yin, Tianyu Du

Abstract: Large language models have consistently demonstrated remarkable performance across a wide spectrum of applications. Nonetheless, the deployment of these models can inadvertently expose user privacy to potential risks. The substantial memory demands of these models during training represent a significant resource consumption challenge. The sheer size of these models imposes a considerable burden on memory resources, which is a matter of significant concern in practice. In this paper, we present an innovative training framework MemDPT that not only reduces the memory cost of large language models but also places a strong emphasis on safeguarding user data privacy. MemDPT provides edge network and reverse network designs to accommodate various differential privacy memory-efficient fine-tuning schemes. Our approach not only achieves $2 \sim 3 \times$ memory optimization but also provides robust privacy protection, ensuring that user data remains secure and confidential. Extensive experiments have demonstrated that MemDPT can effectively provide differential privacy efficient fine-tuning across various task scenarios.

Submitted 20 June, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

Comments: 12 pages, first version

arXiv:2406.10018 [pdf, other]

STALL+: Boosting LLM-based Repository-level Code Completion with Static Analysis

Authors: Junwei Liu, Yixuan Chen, Mingwei Liu, Xin Peng, Yiling Lou

Abstract: Repository-level code completion is challenging as it involves complicated contexts from multiple files in the repository. To date, researchers have proposed two technical categories to enhance LLM-based repository-level code completion, i.e., retrieval-augmented generation (RAG) and static analysis integration. This work performs the first study on static analysis integration in LLM-based repository-level code completion by investigating both the effectiveness and efficiency of static analysis integration strategies across different phases of code completion. We first implement a framework, STALL+, which supports an extendable and customizable integration of multiple static analysis strategies into the complete pipeline of LLM-based repository-level code completion; and based on STALL+, we perform extensive experiments by including different code LLMs on the latest repository-level code completion benchmark CrossCodeEval. Our findings show that integrating file-level dependencies in the prompting phase performs the best while integration in the post-processing phase performs the worst. Additionally, we observe different improvements from static analysis between dynamic languages and static languages, i.e., the best combination is prompting-phase with decoding-phase integration for Java while the best combination is prompting-phase with post-processing-phase integration for Python given the limitations of statically analyzing dynamic languages. Additionally, we find that RAG and static analysis integration are complementary and remain cost-effective when combined.

Submitted 14 June, 2024; originally announced June 2024.

Comments: 12 pages, 5 figures

arXiv:2406.09834 [pdf, other]

How and Why LLMs Use Deprecated APIs in Code Completion? An Empirical Study

Authors: Chong Wang, Kaifeng Huang, Jian Zhang, Yebo Feng, Lyuye Zhang, Yang Liu, Xin Peng

Abstract: Large language models (LLMs), pre-trained or fine-tuned on large code corpora, have shown effectiveness in generating code completions. However, in LLM-based code completion, LLMs may struggle to use correct and up-to-date Application Programming Interfaces (APIs) due to the rapid and continuous evolution of libraries. While existing studies have highlighted issues with predicting incorrect APIs, the specific problem of deprecated API usage in LLM-based code completion has not been thoroughly investigated. To address this gap, we conducted the first evaluation study on deprecated API usage in LLM-based code completion. This study involved seven advanced LLMs, 145 API mappings from eight popular Python libraries, and 28,125 completion prompts. The study results reveal the status quo and root causes of deprecated API usage in LLM-based code completion from the perspectives of model, prompt, and library. Based on these findings, we propose two lightweight fixing approaches, ReplaceAPI and InsertPrompt, which can serve as baseline approaches for future research on mitigating deprecated API usage in LLM-based completion. Additionally, we provide implications for future research on integrating library evolution with LLM-driven software development.

Submitted 3 July, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

arXiv:2406.09730 [pdf]

Ultra-bright and energy-efficient quantum-dot LEDs by idealizing charge injection

Authors: Yizhen Zheng, Xing Lin, Jiongzhao Li, Jianan Chen, Zixuan Song, Yuan Gao, Huifeng Wang, Zikang Ye, Haiyan Qin, Xiaogang Peng

Abstract: Lighting and display, relying on electric and optical down-conversion emission with sluggish power efficiency, account for >15% of global electricity consumption [1,2]. In 2014, quantum-dot (QD) LEDs (QLEDs) with near-optimal external quantum efficiency emerged [3] and promised a pathway to avoid the vast down-conversion energy loss [4,5]. Despite a decade of progress [4-22], fabrication of energy-efficient QLEDs with application-relevant brightness remains elusive. Here, the main roadblock is identified as the oxidative species adsorbed in the nanocrystalline electron-injection layer of QLEDs, which is then addressed by a simple reductive treatment to simultaneously boost electron conductivity and hole blockage of the electron-injection layer. The resulting sub-bandgap-driven QLEDs with optimal efficiency achieve ultra-high brightness across the entire visible spectrum at least 2.6-fold higher than existing benchmarks. The brightness fully satisfies the demands of various forms of lighting and display, and surges to a remarkable level sufficient for QD laser diodes with a moderate bias (~9 V). Optimized electron injection further enables new types of QD-blend LEDs for diffuse white-light sources surpassing the 2035 R&D targets set by the U.S. Department of Energy. Our findings open a door for understanding and optimizing carrier transport in nanocrystalline semiconductors shared by various types of solution-processed optoelectronic devices.

Submitted 14 June, 2024; originally announced June 2024.

arXiv:2406.07625 [pdf, other]

Emergent Universal Quench Dynamics in Randomly Interacting Spin Models

Authors: Yuchen Li, Tian-Gang Zhou, Ze Wu, Pai Peng, Shengyu Zhang, Riqiang Fu, Ren Zhang, Wei Zheng, Pengfei Zhang, Hui Zhai, Xinhua Peng, Jiangfeng Du

Abstract: Universality often emerges in low-energy equilibrium physics of quantum many-body systems, despite their microscopic complexity and variety. Recently, there has been a growing interest in studying far-from-equilibrium dynamics of quantum many-body systems. Such dynamics usually involves highly excited states beyond the traditional low-energy theory description. Whether universal behaviors can also emerge in such non-equilibrium dynamics is a central issue at the frontier of quantum dynamics. Here we report the experimental observation of universal dynamics by monitoring the spin depolarization process in a solid-state NMR system described by an ensemble of randomly interacting spins. The spin depolarization can be related to temporal spin-spin correlation functions at high temperatures. We discover a remarkable phenomenon that these correlation functions obey a universal functional form. This experimental fact helps us identify the dominant interacting processes in the spin depolarization dynamics that lead to this universality. Our observation demonstrates the existence of universality even in non-equilibrium dynamics at high temperatures, thereby complementing the well-established universality in low-energy physics.

Submitted 11 June, 2024; originally announced June 2024.

Comments: 10 pages, 4 figures; Supplementary Information 26 pages, 11 figures, 2 tables

arXiv:2406.06615 [pdf, other]

Language Guided Skill Discovery

Authors: Seungeun Rho, Laura Smith, Tianyu Li, Sergey Levine, Xue Bin Peng, Sehoon Ha

Abstract: Skill discovery methods enable agents to learn diverse emergent behaviors without explicit rewards. To make learned skills useful for unknown downstream tasks, obtaining a semantically diverse repertoire of skills is essential. While some approaches introduce a discriminator to distinguish skills and others aim to increase state coverage, no existing work directly addresses the "semantic diversity" of skills. We hypothesize that leveraging the semantic knowledge of large language models (LLMs) can lead us to improve semantic diversity of resulting behaviors. In this sense, we introduce Language Guided Skill Discovery (LGSD), a skill discovery framework that aims to directly maximize the semantic diversity between skills. LGSD takes user prompts as input and outputs a set of semantically distinctive skills. The prompts serve as a means to constrain the search space into a semantically desired subspace, and the generated LLM outputs guide the agent to visit semantically diverse states within the subspace. We demonstrate that LGSD enables legged robots to visit different user-intended areas on a plane by simply changing the prompt. Furthermore, we show that language guidance aids in discovering more diverse skills compared to five existing skill discovery methods in robot-arm manipulation environments. Lastly, LGSD provides a simple way of utilizing learned skills via natural language.

Submitted 7 June, 2024; originally announced June 2024.

arXiv:2406.04677 [pdf, other]

Electric leakage suppression of phase-transforming ferroelectrics with donor impurities

Authors: Chenbo Zhang, Xiaotong Peng, Bo Liu, Kai Zhang, Xian Chen

Abstract: Phase-transforming ferroelectric materials are widely used in energy harvesting and conversion devices. However, the functionality of these devices is significantly impeded by electrical leakage at high temperatures. In this study, we investigate the fundamental mechanism of electrical leakage suppression due to phase transformation in a series of donor-doped ferroelectric oxides, $\mathrm{Ba_{0.955}Eu_{0.03}Ti_{1-x}Zr_{x}O_{3}}$ with $0 \le x \le 0.15$. Our experiments clearly demonstrate that the symmetry-breaking phase transformations result in a reduction in the electrical conductivity of the donor-doped ferroelectric oxides. The DFT calculation suggests that the donor energy level undergoes a shallow-to-deep transition at the phase transformation temperature. By analyzing the constitutive model of the leakage current density function, we propose a leakage suppression coefficient that rationalizes the development of ferroelectrics with low electrical leakage at elevated temperatures.

Submitted 7 June, 2024; originally announced June 2024.

Comments: 16 pages, 6 figures

arXiv:2406.04482 [pdf, other]

Automatic Bug Detection in LLM-Powered Text-Based Games Using LLMs

Authors: Claire Jin, Sudha Rao, Xiangyu Peng, Portia Botchway, Jessica Quaye, Chris Brockett, Bill Dolan

Abstract: Advancements in large language models (LLMs) are revolutionizing interactive game design, enabling dynamic plotlines and interactions between players and non-player characters (NPCs). However, LLMs may exhibit flaws such as hallucinations, forgetfulness, or misinterpretations of prompts, causing logical inconsistencies and unexpected deviations from intended designs. Automated techniques for detecting such game bugs are still lacking. To address this, we propose a systematic LLM-based method for automatically identifying such bugs from player game logs, eliminating the need for collecting additional data such as post-play surveys. Applied to a text-based game DejaBoom!, our approach effectively identifies bugs inherent in LLM-powered interactive games, surpassing unstructured LLM-powered bug-catching methods and filling the gap in automated detection of logical and design flaws.

Submitted 6 June, 2024; originally announced June 2024.

Comments: Accepted for publication in Findings of the Association for Computational Linguistics: ACL 2024

arXiv:2406.03807 [pdf, other]

Tool-Planner: Dynamic Solution Tree Planning for Large Language Model with Tool Clustering

Authors: Yanming Liu, Xinyue Peng, Yuwei Zhang, Jiannan Cao, Xuhong Zhang, Sheng Cheng, Xun Wang, Jianwei Yin, Tianyu Du

Abstract: Large language models (LLMs) have demonstrated exceptional reasoning capabilities, enabling them to solve various complex problems. Recently, this ability has been applied to the paradigm of tool learning. Tool learning involves providing examples of tool usage and their corresponding functions, allowing LLMs to formulate plans and demonstrate the process of invoking and executing each tool. LLMs can address tasks that they cannot complete independently, thereby enhancing their potential across different tasks. However, this approach faces two key challenges. First, redundant error correction leads to unstable planning and long execution time. Additionally, designing a correct plan among multiple tools is also a challenge in tool learning. To address these issues, we propose Tool-Planner, a task-processing framework based on toolkits. Tool-Planner groups tools based on the API functions with the same function into a toolkit and allows LLMs to implement planning across the various toolkits. When a tool error occurs, the language model can reselect and adjust tools based on the toolkit. Experiments show that our approach demonstrates a high pass and win rate across different datasets and optimizes the planning scheme for tool learning in models such as GPT-4 and Claude 3, showcasing the potential of our method.

Submitted 6 June, 2024; originally announced June 2024.

Comments: 46 pages, first version

arXiv:2405.18741 [pdf, other]

Genshin: General Shield for Natural Language Processing with Large Language Models

Authors: Xiao Peng, Tao Liu, Ying Wang

Abstract: Large language models (LLMs) like ChatGPT, Gemini, or LLaMA have been trending recently, demonstrating considerable advancement and generalizability power in countless domains. However, LLMs create an even bigger black box exacerbating opacity, with interpretability limited to a few approaches. The uncertainty and opacity embedded in LLMs' nature restrict their application in high-stakes domains like financial fraud, phishing, etc. Current approaches mainly rely on traditional textual classification with posterior interpretable algorithms, suffering from attackers who may create versatile adversarial samples to break the system's defense, forcing users to make trade-offs between efficiency and robustness. To address this issue, we propose a novel cascading framework called Genshin (General Shield for Natural Language Processing with Large Language Models), utilizing LLMs as defensive one-time plug-ins. Unlike most applications of LLMs that try to transform text into something new or structural, Genshin uses LLMs to recover text to its original state. Genshin aims to combine the generalizability of the LLM, the discrimination of the median model, and the interpretability of the simple model. Our experiments on the tasks of sentiment analysis and spam detection have shown fatal flaws of the current median models and exhilarating results on LLMs' recovery ability, demonstrating that Genshin is both effective and efficient. In our ablation study, we unearth several intriguing observations. Utilizing the LLM defender, a tool derived from the 4th paradigm, we have reproduced BERT's 15% optimal mask rate results in the 3rd paradigm of NLP. Additionally, when employing the LLM as a potential adversarial tool, attackers are capable of executing effective attacks that are nearly semantically lossless.

Submitted 3 June, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

arXiv:2405.18396 [pdf, other]

What can machine learning help with microstructure-informed materials modeling and design?

Authors: Xiang-Long Peng, Mozhdeh Fathidoost, Binbin Lin, Yangyiwei Yang, Bai-Xiang Xu

Abstract: Machine learning techniques have been widely employed as effective tools in addressing various engineering challenges in recent years, particularly for the challenging task of microstructure-informed materials modeling. This work provides a comprehensive review of the current machine learning-assisted and data-driven advancements in this field, including microstructure characterization and reconstruction, multiscale simulation, correlations among process, microstructure, and properties, as well as microstructure optimization and inverse design. It outlines the achievements of existing research through best practices and suggests potential avenues for future investigations. Moreover, it prepares the readers with educative instructions of basic knowledge and an overview on machine learning, microstructure descriptors and machine learning-assisted material modeling, lowering the interdisciplinary hurdles. It should help to stimulate and attract more research attention to the rapidly growing field of machine learning-based modeling and design of microstructured materials.

Submitted 28 May, 2024; originally announced May 2024.

arXiv:2405.18347 [pdf, other]

Dataset Growth

Authors: Ziheng Qin, Zhaopan Xu, Yukun Zhou, Zangwei Zheng, Zebang Cheng, Hao Tang, Lei Shang, Baigui Sun, Xiaojiang Peng, Radu Timofte, Hongxun Yao, Kai Wang, Yang You

Abstract: Deep learning benefits from the growing abundance of available data. Meanwhile, efficiently dealing with the growing data scale has become a challenge. Data publicly available are from different sources with various qualities, and it is impractical to do manual cleaning against noise and redundancy given today's data scale. There are existing techniques for cleaning/selecting the collected data. However, these methods are mainly proposed for offline settings that target one of the cleanness and redundancy problems. In practice, data are growing exponentially with both problems. This leads to repeated data curation with sub-optimal efficiency. To tackle this challenge, we propose InfoGrowth, an efficient online algorithm for data cleaning and selection, resulting in a growing dataset that keeps up to date with awareness of cleanliness and diversity. InfoGrowth can improve data quality/efficiency on both single-modal and multi-modal tasks, with an efficient and scalable design. Its framework makes it practical for real-world data engines.

Submitted 28 May, 2024; originally announced May 2024.

arXiv:2405.17403 [pdf, other]

A Closer Look at Time Steps is Worthy of Triple Speed-Up for Diffusion Model Training

Authors: Kai Wang, Yukun Zhou, Mingjia Shi, Zhihang Yuan, Yuzhang Shang, Xiaojiang Peng, Hanwang Zhang, Yang You

Abstract: Training diffusion models is always a computation-intensive task. In this paper, we introduce a novel speed-up method for diffusion model training, called SpeeD, which is based on a closer look at time steps. Our key findings are: i) Time steps can be empirically divided into acceleration, deceleration, and convergence areas based on the process increment. ii) These time steps are imbalanced, with many concentrated in the convergence area. iii) The concentrated steps provide limited benefits for diffusion training. To address this, we design an asymmetric sampling strategy that reduces the frequency of steps from the convergence area while increasing the sampling probability for steps from other areas. Additionally, we propose a weighting strategy to emphasize the importance of time steps with rapid-change process increments. As a plug-and-play and architecture-agnostic approach, SpeeD consistently achieves 3-times acceleration across various diffusion architectures, datasets, and tasks. Notably, due to its simple design, our approach significantly reduces the cost of diffusion model training with minimal overhead. Our research enables more researchers to train diffusion models at a lower cost.

Submitted 27 May, 2024; originally announced May 2024.

ACM Class: I.2

arXiv:2405.16780 [pdf, other]

Analysis of Broken Randomized Experiments by Principal Stratification

Authors: Qinqing Liu, Xiang Peng, Tao Zhang, Yuhao Deng

Abstract: Although randomized controlled trials have long been regarded as the "gold standard" for evaluating treatment effects, there is no natural prevention from post-treatment events. For example, non-compliance makes the actual treatment different from the assigned treatment, truncation-by-death renders the outcome undefined or ill-defined, and missingness prevents the outcomes from being measured. In this paper, we develop a statistical analysis framework using principal stratification to investigate the treatment effect in broken randomized experiments. The average treatment effect in compliers and always-survivors is adopted as the target causal estimand. We establish the asymptotic property for the estimator. We apply the framework to study the effect of training on earnings in the Job Corps Study and find that the training program does not have an effect on employment but possibly has an effect on improving the earnings after employment.

Submitted 26 May, 2024; originally announced May 2024.

arXiv:2405.16074 [pdf, other]

Global existence and Rayleigh-Taylor instability for the semi-dissipative Boussinesq system with Naiver boundary conditions

Authors: Huafei Di, Liang Li, Xiaoming Peng, Quan Wang

Abstract: Considered herein is the global existence of weak and strong solutions and Rayleigh-Taylor (RT) instability for 2D semi-dissipative Boussinesq equations in an infinite strip domain $Ω_{\infty}$ subject to Navier boundary conditions with non-positive slip coefficients. We first prove the global existence of weak and strong solutions on bounded domain $Ω_{R}$ via the Galerkin method, characteristic analyzing technique and Stokes estimates etc. Based on the above results, we further derive the uniform estimates, independent of the length of horizontal direction of $Ω_{R}$, ensuring the global existence of weak and strong solutions in unbounded case $Ω_{\infty}$ by utilizing the domain expansion method. Moreover, when the steady temperature is higher with decreasing height (i.e., RT steady-state) on certain region, we demonstrate that the steady-state is linearly unstable through the construction of energy functional and the settlement of a family of modified variational problems. Furthermore, with the help of unstable solutions constructed in linear instability and global existence theorems, we confirm the instability of nonlinear problem in a Lipschitz structural sense. Finally, we give a series of rigorous verifications (see Appendix) including the spectra of Stokes equations with Navier boundary conditions, Sobolev embedding inequalities, trace inequalities, and Stokes estimates under Navier boundary conditions etc., used in the proof of main conclusions.

Submitted 25 May, 2024; originally announced May 2024.

arXiv:2405.15261 [pdf]

doi 10.1016/j.jallcom.2023.170685

Electric Polarization and Magnetic Properties of (NH$_4$)$_{1-x}$K$_x$I (x = 0.05-0.17)

Authors: Yi Yang Xu, Lei Meng, Miao Miao Zhao, Chu Xin Peng, Fei Yen

Abstract: While all of the polymorphs of pure NH$_4$I and KI are non-polar, we identify that (NH$_4$)$_{0.95}$K$_{0.05}$I is ferroelectric and (NH$_4$)$_{0.87}$K$_{0.13}$I and (NH$_4$)$_{0.83}$K$_{0.17}$I are pyroelectric through measurements of their pyroelectric current and complex dielectric constant. The order to disorder phase transitions occur near 245 K. Magnetic susceptibility measurements indicate that the proton orbitals of the NH$_4$$^+$ continue to become ordered in the ground state in the (NH$_4$)$_{1-x}$K$_x$I system up to x <= 0.17. The polar phases are proposed to stem from K$^+$ ions disrupting the symmetry of proton-orbital-lattice interactions between the NH$_4$$^+$ and I$^-$ ions. Our work introduces a new pathway for the ordered phases of ammonium-based compounds to potentially become ferroelectric.

Submitted 24 May, 2024; originally announced May 2024.

Comments: 13 pages, 5 figures

Journal ref: Journal of Alloys and Compounds 960, 170685 (2023)

arXiv:2405.12144 [pdf]

Alterations of electrocortical activity during hand movements induced by motor cortex glioma

Authors: Yihan Wu, Tao Chang, Siliang Chen, Xiaodong Niu, Yu Li, Yuan Fang, Lei Yang, Yixuan Zong, Yaoxin Yang, Yuehua Li, Mengsong Wang, Wen Yang, Yixuan Wu, Chen Fu, Xia Fang, Yuxin Quan, Xilin Peng, Qiang Sun, Marc M. Van Hulle, Yanhui Liu, Ning Jiang, Dario Farina, Yuan Yang, Jiayuan He, Qing Mao

Abstract: Glioma cells can reshape functional neuronal networks by hijacking neuronal synapses, leading to partial or complete neurological dysfunction. These mechanisms have been previously explored for language functions. However, the impact of glioma on sensorimotor functions is still unknown. Therefore, we recruited a control group of patients with unaffected motor cortex and a group of patients with glioma-infiltrated motor cortex, and recorded high-density electrocortical signals during finger movement tasks. The results showed that glioma suppresses task-related synchronization in the high-gamma band and reduces the power across all frequency bands. The resulting atypical motor information transmission model with discrete signaling pathways and delayed responses disrupts the stability of neuronal encoding patterns for finger movement kinematics across various temporal-spatial scales. These findings demonstrate that gliomas functionally invade neural circuits within the motor cortex. This result advances our understanding of motor function processing in chronic disease states, which is important to advance the surgical strategies and neurorehabilitation approaches for patients with malignant gliomas.

Submitted 20 May, 2024; originally announced May 2024.

arXiv:2405.11126 [pdf, other]

Flexible Motion In-betweening with Diffusion Models

Authors: Setareh Cohan, Guy Tevet, Daniele Reda, Xue Bin Peng, Michiel van de Panne

Abstract: Motion in-betweening, a fundamental task in character animation, consists of generating motion sequences that plausibly interpolate user-provided keyframe constraints. It has long been recognized as a labor-intensive and challenging process. We investigate the potential of diffusion models in generating diverse human motions guided by keyframes. Unlike previous inbetweening methods, we propose a simple unified model capable of generating precise and diverse motions that conform to a flexible range of user-specified spatial constraints, as well as text conditioning. To this end, we propose Conditional Motion Diffusion In-betweening (CondMDI) which allows for arbitrary dense-or-sparse keyframe placement and partial keyframe constraints while generating high-quality motions that are diverse and coherent with the given keyframes. We evaluate the performance of CondMDI on the text-conditioned HumanML3D dataset and demonstrate the versatility and efficacy of diffusion models for keyframe in-betweening. We further explore the use of guidance and imputation-based approaches for inference-time keyframing and compare CondMDI against these methods.

Submitted 23 May, 2024; v1 submitted 17 May, 2024; originally announced May 2024.

Comments: SIGGRAPH 2024. For project page and code, see https://setarehc.github.io/CondMDI/

arXiv:2405.10492 [pdf]

Automatic News Generation and Fact-Checking System Based on Language Processing

Authors: Xirui Peng, Qiming Xu, Zheng Feng, Haopeng Zhao, Lianghao Tan, Yan Zhou, Zecheng Zhang, Chenwei Gong, Yingqiao Zheng

Abstract: This paper explores an automatic news generation and fact-checking system based on language processing, aimed at enhancing the efficiency and quality of news production while ensuring the authenticity and reliability of the news content. With the rapid development of Natural Language Processing (NLP) and deep learning technologies, automatic news generation systems are capable of extracting key information from massive data and generating well-structured, fluent news articles. Meanwhile, by integrating fact-checking technology, the system can effectively prevent the spread of false news and improve the accuracy and credibility of news. This study details the key technologies involved in automatic news generation and fact-checking, including text generation, information extraction, and the application of knowledge graphs, and validates the effectiveness of these technologies through experiments. Additionally, the paper discusses the future development directions of automatic news generation and fact-checking systems, emphasizing the importance of further integration and innovation of technologies. The results show that with continuous technological optimization and practical application, these systems will play an increasingly important role in the future news industry, providing more efficient and reliable news services.

Submitted 20 May, 2024; v1 submitted 16 May, 2024; originally announced May 2024.

ACM Class: I.5; H.4

arXiv:2405.09759 [pdf]

doi 10.1039/d1tc04718c

Ferroelectricity Driven by Orbital Resonance of Protons in CH$_3$NH$_3$Cl and CH$_3$NH$_3$Br

Authors: Chu Xin Peng, Lei Meng, Yi Yang Xu, Tian Tian Xing, Miao Miao Zhao, Peng Ren, Fei Yen

Abstract: The $β$ and $γ$ phases of methylammonium chloride CH$_3$NH$_3$Cl and methylammonium bromide CH$_3$NH$_3$Br are identified to be ferroelectric $via$ pyroelectric current and dielectric constant measurements. The magnetic susceptibility also exhibits pronounced discontinuities at the Curie temperatures. We attribute the origin of spontaneous polarization to the emergence of two groups of proton orbital magnetic moments from the uncorrelated motion of the CH$_3$ and NH$_3$ groups in the $β$ and $γ$ phases. The two inequivalent frameworks of intermolecular orbital resonances interact with each other to distort the lattice in a non-centrosymmetric fashion. Our findings indicate that the structural instabilities in molecular frameworks are magnetic in origin as well as provide a new pathway toward uncovering new organic ferroelectrics.

Submitted 15 May, 2024; originally announced May 2024.

Comments: 5 pages, 5 figures

Journal ref: J. Mater. Chem. C, 10, 1334-1338 (2022)

arXiv:2405.09227 [pdf, other]

Unraveling impacts of polycrystalline microstructures on ionic conductivity of ceramic electrolytes by computational homogenization and machine learning

Authors: Xiang-Long Peng, Bai-Xiang Xu

Abstract: The ionic conductivity at the grain boundaries (GBs) in oxide ceramics is typically several orders of magnitude lower than that within the grain interior. This detrimental GB effect is the main bottleneck for designing high-performance ceramic electrolytes intended for use in solid-state Lithium-ion batteries, fuel cells, and electrolyzer cells. The macroscopic ionic conductivity in oxide ceramics is essentially governed by the underlying polycrystalline microstructures where GBs and grain morphology go hand in hand. This provides the possibility to enhance the ion conductivity by microstructure engineering. To this end, a thorough understanding of microstructure-property correlation is highly desirable. In this work, we investigate numerous polycrystalline microstructure samples with varying grain and grain boundary features. Their macroscopic ionic conductivities are numerically evaluated by the finite element homogenization method, whereby the GB resistance is explicitly regarded. The influence of different microstructural features on the effective ionic conductivity is systematically studied. The microstructure-property relationships are revealed. Additionally, a graph neural network-based machine learning model is constructed and trained. It can accurately predict the effective ionic conductivity for a given polycrystalline microstructure. This work provides crucial quantitative guidelines for optimizing the ionic conducting performance of oxide ceramics by tailoring microstructures.

Submitted 15 May, 2024; originally announced May 2024.

arXiv:2405.09094 [pdf]

doi 10.1016/j.scriptamat.2022.115229

Magnetic interactions based on proton orbital motion in CH$_3$NH$_3$PbI$_3$ and CH$_3$NH$_3$PbBr$_3$

Authors: Lei Meng, Miao Miao Zhao, Yi Yang Xu, Chu Xin Peng, Yang Yang, Tian Tian Xing, Peng Ren, Fei Yen

Abstract: The microscopic origin of the remarkable optoelectronic properties of one of the most studied contemporary materials remains unclear. Here, we identify the existence of magnetic interactions between intermolecular proton orbitals in CH$_3$NH$_3$PbI$_3$ and CH$_3$NH$_3$PbBr$_3$. In particular, a unique sharp drop and a pronounced step-up discontinuity in the magnetic susceptibility at the tetragonal-to-cubic phase transitions are identified in CH$_3$NH$_3$PbI$_3$ and CH$_3$NH$_3$PbBr$_3$, respectively. The magnetic interactions in the orthorhombic and tetragonal phases are dependent on thermal history and lattice orientation while nearly independent of the applied external magnetic field. In CH$_3$NH$_3$PbBr$_3$, the CH$_3$ and NH$_3$$^+$ components reorient in an uncorrelated fashion, causing the cubic phase to also exhibit magnetic anisotropy. Our findings provide a potential link connecting the highly light-absorbing CH$_3$NH$_3$$^+$ and the exceptional properties of the charge carriers of the inorganic framework in hybrid perovskite solar cells.

Submitted 15 May, 2024; originally announced May 2024.

Comments: Manuscript + Supplementary Material file (17 + 6 pages, 4 + 2 figures)

Journal ref: Scripta Mater. 226, 115229 (2023)

arXiv:2405.09054 [pdf, other]

Dim Small Target Detection and Tracking: A Novel Method Based on Temporal Energy Selective Scaling and Trajectory Association

Authors: Weihua Gao, Wenlong Niu, Wenlong Lu, Pengcheng Wang, Zhaoyuan Qi, Xiaodong Peng, Zhen Yang

Abstract: The detection and tracking of small targets in passive optical remote sensing (PORS) has broad applications. However, most of the previously proposed methods seldom utilize the abundant temporal features formed by target motion, resulting in poor detection and tracking performance for low signal-to-clutter ratio (SCR) targets. In this article, we analyze the difficulty based on spatial features and the feasibility based on temporal features of realizing effective detection. According to this analysis, we use a multi-frame as a detection unit and propose a detection method based on temporal energy selective scaling (TESS). Specifically, we investigated the composition of intensity temporal profiles (ITPs) formed by pixels on a multi-frame detection unit. For the target-present pixel, the target passing through the pixel will bring a weak transient disturbance on the ITP and introduce a change in the statistical properties of ITP. We use a well-designed function to amplify the transient disturbance, suppress the background and noise components, and output the trajectory of the target on the multi-frame detection unit. Subsequently, to solve the contradiction between the detection rate and the false alarm rate brought by the traditional threshold segmentation, we associate the temporal and spatial features of the output trajectory and propose a trajectory extraction method based on the 3D Hough transform. Finally, we model the trajectory of the target and propose a trajectory-based multi-target tracking method. Compared with the various state-of-the-art detection and tracking methods, experiments in multiple scenarios prove the superiority of our proposed methods.

Submitted 14 May, 2024; originally announced May 2024.

arXiv:2405.06059 [pdf, other]

A Mixture-of-Experts Approach to Few-Shot Task Transfer in Open-Ended Text Worlds

Authors: Christopher Z. Cui, Xiangyu Peng, Mark O. Riedl

Abstract: Open-ended worlds are those in which there are no pre-specified goals or environmental reward signal. As a consequence, an agent must know how to perform a multitude of tasks. However, when a new task is presented to an agent, we expect it to be able to reuse some of what it knows from previous tasks to rapidly learn that new task. We introduce a novel technique whereby policies for different a priori known tasks are combined into a Mixture-of-Experts model with an attention mechanism across a mix of frozen and unfrozen experts. The model learns when to attend to frozen task-specific experts when appropriate and learns new experts to handle novel situations. We work in an open-ended text-based environment in which the agent is tasked with behaving like different types of character roles and must rapidly learn behaviors associated with new character role types. We show that our agent both obtains more rewards in the zero-shot setting, and discovers these rewards with greater sample efficiency in the few-shot learning settings.

Submitted 9 May, 2024; originally announced May 2024.

arXiv:2405.02886 [pdf]

The best whistler: a cavitating tip vortex

Authors: Zhaohui Qian, Weixiang Ye, Yongshun Zeng, Xiaoxing Peng, Xianwu Luo

Abstract: The discrete tone radiated from a cavitating tip vortex, known as "vortex singing", was first recognized in 1989, but its sound generation mechanism has remained a mystery for over thirty years. In this letter, by means of the correction for the cavitation bubble dynamics and the dispersion relation of cavity interfacial waves, we found that after the far-end disturbances propagate upstream, the whistling vortex should be triggered by near-end sound sources, the breathing mode waves. Further utilizing the theoretical solutions for singing lines and the potential singing cavitation number with frequency, we accurately identified all available tests for seeking the vortex singing over the past three decades, answering a long-standing perplexity: why such a best whistler is able to appear only within a narrow range of the cavitation number.

Submitted 5 May, 2024; originally announced May 2024.

arXiv:2405.02883 [pdf]

Trigger mechanism for a singing cavitating tip vortex

Authors: Zhaohui Qian, Yongshun Zeng, Xiaoxing Peng, Xianwu Luo

Abstract: The discrete tone radiated from tip vortex cavitation (TVC), known as 'vortex singing', was recognized in 1989, but its triggering has remained unclear for over thirty years. In this study, the desinent cavitation number and viscous correction are applied to describe the dynamics of cavitation bubbles and the dispersion relation of cavity interfacial waves. The wavenumber-frequency spectrum of the cavity radius from the experiment in CSSRC indicates that singing waves predominantly consist of the stationary double helical modes (kθ = 2- and -2+) and the breathing mode (kθ = 0-), rather than standing waves as assumed in previous literature. Moreover, two trigger mechanisms, expressed by two triggering lines, are proposed: the twisted TVC, initially at rest, is driven into motion through the corrected natural frequency (fn) due to the step change of the far-field pressure. Subsequently, the frequency associated with the zero-group-velocity point (fzgv) at kθ = 0- is excited through fi, the frequency at the intersection of dispersion curves at kθ = 0- and -2+, or fj, the frequency at the intersection of dispersion curves at kθ = 0- and 2-, corresponding to two types of the vortex singing triggering. These solutions, without empirical parameters, are validated using singing conditions provided by CSSRC and G.T.H., respectively. Furthermore, the coherence and the cross-power spectral density spectrum indicate a large-scale breathing wave propagating along the singing cavity surface and travelling from downstream to the hydrofoil tip, providing us a comprehensive understanding of the triggering of vortex singing.

Submitted 5 May, 2024; originally announced May 2024.

arXiv:2405.00414 [pdf, ps, other]

Ergodicity for 2D Navier-Stokes equations with a degenerate pure jump noise

Authors: Xuhui Peng, Jianliang Zhai, Tusheng Zhang

Abstract: In this paper, we establish the ergodicity for stochastic 2D Navier-Stokes equations driven by a highly degenerate pure jump Lévy noise. The noise could appear in as few as four directions. This gives an affirmative answer to a longstanding problem. The case of Gaussian noise was treated in Hairer and Mattingly [\emph{Ann. of Math.}, 164(3):993--1032, 2006]. To obtain the uniqueness of invariant measure, we use Malliavin calculus and anticipating stochastic calculus to establish the equi-continuity of the semigroup, the so-called {\em e-property}, and prove some weak irreducibility of the solution process.

Submitted 1 May, 2024; originally announced May 2024.

arXiv:2404.19264 [pdf, other]

DiffuseLoco: Real-Time Legged Locomotion Control with Diffusion from Offline Datasets

Authors: Xiaoyu Huang, Yufeng Chi, Ruofeng Wang, Zhongyu Li, Xue Bin Peng, Sophia Shao, Borivoje Nikolic, Koushil Sreenath

Abstract: This work introduces DiffuseLoco, a framework for training multi-skill diffusion-based policies for dynamic legged locomotion from offline datasets, enabling real-time control of diverse skills on robots in the real world. Offline learning at scale has led to breakthroughs in computer vision, natural language processing, and robotic manipulation domains. However, scaling up learning for legged robot locomotion, especially with multiple skills in a single policy, presents significant challenges for prior online reinforcement learning methods. To address this challenge, we propose a novel, scalable framework that leverages diffusion models to directly learn from offline multimodal datasets with a diverse set of locomotion skills. With design choices tailored for real-time control in dynamical systems, including receding horizon control and delayed inputs, DiffuseLoco is capable of reproducing multimodality in performing various locomotion skills, zero-shot transfer to real quadrupedal robots, and it can be deployed on edge computing devices. Furthermore, DiffuseLoco demonstrates free transitions between skills and robustness against environmental variations. Through extensive benchmarking in real-world experiments, DiffuseLoco exhibits better stability and velocity tracking performance compared to prior reinforcement learning and non-diffusion-based behavior cloning baselines. The design choices are validated via comprehensive ablation studies. This work opens new possibilities for scaling up learning-based legged locomotion controllers through the scaling of large, expressive models and diverse offline datasets.

Submitted 30 April, 2024; originally announced April 2024.

arXiv:2404.18947 [pdf, other]

Multimodal Fusion on Low-quality Data: A Comprehensive Survey

Authors: Qingyang Zhang, Yake Wei, Zongbo Han, Huazhu Fu, Xi Peng, Cheng Deng, Qinghua Hu, Cai Xu, Jie Wen, Di Hu, Changqing Zhang

Abstract: Multimodal fusion focuses on integrating information from multiple modalities with the goal of more accurate prediction, which has achieved remarkable progress in a wide range of scenarios, including autonomous driving and medical diagnosis. However, the reliability of multimodal fusion remains largely unexplored especially under low-quality data settings. This paper surveys the common challenges and recent advances of multimodal fusion in the wild and presents them in a comprehensive taxonomy. From a data-centric view, we identify four main challenges that are faced by multimodal fusion on low-quality data, namely (1) noisy multimodal data that are contaminated with heterogeneous noises, (2) incomplete multimodal data that some modalities are missing, (3) imbalanced multimodal data that the qualities or properties of different modalities are significantly different and (4) quality-varying multimodal data that the quality of each modality dynamically changes with respect to different samples. This new taxonomy will enable researchers to understand the state of the field and identify several potential directions. We also provide discussion for the open problems in this field together with interesting future research directions.

Submitted 5 May, 2024; v1 submitted 27 April, 2024; originally announced April 2024.

Comments: Feel free to comment on our manuscript: qingyangzhang@tju.edu.cn

Showing 1–50 of 942 results for author: Peng, X