Machine learning accelerated prediction of Ce-based ternary compounds involving antagonistic pairs

Authors: Weiyi Xia, Wei-Shen Tee, Paul C. Canfield, Fernando Assis Garcia, Raquel D Ribeiro, Yongbin Lee, Liqin Ke, Rebecca Flint, Cai-Zhuang Wang

Abstract: The discovery of novel quantum materials within ternary phase spaces containing antagonistic pairs, such as Fe with Bi, Pb, In, and Ag, presents significant challenges yet holds great potential. In this work, we investigate the stabilization of these immiscible pairs through the integration of cerium (Ce), an abundant and cost-effective rare-earth element. By employing a machine learning (ML)-guided framework, particularly crystal graph convolutional neural networks (CGCNN), combined with first-principles calculations, we efficiently explore the composition/structure space and predict 9 stable and 37 metastable Ce-Fe-X (X=Bi, Pb, In and Ag) ternary compounds. Our findings include the identification of multiple new stable and metastable phases, which are evaluated for their structural and energetic properties. These discoveries not only contribute to the advancement of quantum materials but also offer viable alternatives to critical rare earth elements, underscoring the importance of Ce-based intermetallic compounds in technological applications.

Submitted 15 July, 2024; originally announced July 2024.

arXiv:2407.07871 [pdf, other]

Enhancing HNSW Index for Real-Time Updates: Addressing Unreachable Points and Performance Degradation

Authors: Wentao Xiao, Yueyang Zhan, Rui Xi, Mengshu Hou, Jianming Liao

Abstract: The approximate nearest neighbor search (ANNS) is a fundamental and essential component in data mining and information retrieval, with graph-based methodologies demonstrating superior performance compared to alternative approaches. Extensive research efforts have been dedicated to improving search efficiency by developing various graph-based indices, such as HNSW (Hierarchical Navigable Small World). However, the performance of HNSW and most graph-based indices becomes unacceptable when faced with a large number of real-time deletions, insertions, and updates. Furthermore, during update operations, HNSW can result in some data points becoming unreachable, a situation we refer to as the "unreachable points phenomenon". This phenomenon could significantly affect the search accuracy of the graph in certain situations. To address these issues, we present efficient measures to overcome the shortcomings of HNSW, specifically addressing poor performance over long periods of delete and update operations and resolving the issues caused by the unreachable points phenomenon. Our proposed MN-RU algorithm effectively improves update efficiency and suppresses the growth rate of unreachable points, ensuring better overall performance and maintaining the integrity of the graph. Our results demonstrate that our methods outperform existing approaches. Furthermore, since our methods are based on HNSW, they can be easily integrated with existing indices widely used in the industrial field, making them practical for future real-world applications. Code is available at https://github.com/xwt1/MN-RU.git

Submitted 15 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

arXiv:2407.07614 [pdf, other]

MARS: Mixture of Auto-Regressive Models for Fine-grained Text-to-image Synthesis

Authors: Wanggui He, Siming Fu, Mushui Liu, Xierui Wang, Wenyi Xiao, Fangxun Shu, Yi Wang, Lei Zhang, Zhelun Yu, Haoyuan Li, Ziwei Huang, LeiLei Gan, Hao Jiang

Abstract: Auto-regressive models have made significant progress in the realm of language generation, yet they do not perform on par with diffusion models in the domain of image synthesis. In this work, we introduce MARS, a novel framework for T2I generation that incorporates a specially designed Semantic Vision-Language Integration Expert (SemVIE). This innovative component integrates pre-trained LLMs by independently processing linguistic and visual information, freezing the textual component while fine-tuning the visual component. This methodology preserves the NLP capabilities of LLMs while imbuing them with exceptional visual understanding. Building upon the powerful base of the pre-trained Qwen-7B, MARS stands out with its bilingual generative capabilities corresponding to both English and Chinese language prompts and the capacity for joint image and text generation. The flexibility of this framework lends itself to migration towards any-to-any task adaptability. Furthermore, MARS employs a multi-stage training strategy that first establishes robust image-text alignment through complementary bidirectional tasks and subsequently concentrates on refining the T2I generation process, significantly augmenting text-image synchrony and the granularity of image details. Notably, MARS requires only 9% of the GPU days needed by SD1.5, yet it achieves remarkable results across a variety of benchmarks, illustrating the training efficiency and the potential for swift deployment in various applications.

Submitted 11 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

Comments: 14 pages, 9 figures

arXiv:2407.05984 [pdf, other]

MBA-Net: SAM-driven Bidirectional Aggregation Network for Ovarian Tumor Segmentation

Authors: Yifan Gao, Wei Xia, Wenkui Wang, Xin Gao

Abstract: Accurate segmentation of ovarian tumors from medical images is crucial for early diagnosis, treatment planning, and patient management. However, the diverse morphological characteristics and heterogeneous appearances of ovarian tumors pose significant challenges to automated segmentation methods. In this paper, we propose MBA-Net, a novel architecture that integrates the powerful segmentation capabilities of the Segment Anything Model (SAM) with domain-specific knowledge for accurate and robust ovarian tumor segmentation. MBA-Net employs a hybrid encoder architecture, where the encoder consists of a prior branch, which inherits the SAM encoder to capture robust segmentation priors, and a domain branch, specifically designed to extract domain-specific features. The bidirectional flow of information between the two branches is facilitated by the robust feature injection network (RFIN) and the domain knowledge integration network (DKIN), enabling MBA-Net to leverage the complementary strengths of both branches. We extensively evaluate MBA-Net on the public multi-modality ovarian tumor ultrasound dataset and the in-house multi-site ovarian tumor MRI dataset. Our proposed method consistently outperforms state-of-the-art segmentation approaches. Moreover, MBA-Net demonstrates superior generalization capability across different imaging modalities and clinical sites.

Submitted 8 July, 2024; originally announced July 2024.

Comments: MICCAI 2024

arXiv:2407.02883 [pdf, other]

CoIR: A Comprehensive Benchmark for Code Information Retrieval Models

Authors: Xiangyang Li, Kuicai Dong, Yi Quan Lee, Wei Xia, Yichun Yin, Hao Zhang, Yong Liu, Yasheng Wang, Ruiming Tang

Abstract: Despite the substantial success of Information Retrieval (IR) in various NLP tasks, most IR systems predominantly handle queries and corpora in natural language, neglecting the domain of code retrieval. Code retrieval is critically important yet remains under-explored, with existing methods and benchmarks inadequately representing the diversity of code in various domains and tasks. Addressing this gap, we present CoIR (Code Information Retrieval Benchmark), a robust and comprehensive benchmark specifically designed to assess code retrieval capabilities. CoIR comprises ten meticulously curated code datasets, spanning eight distinctive retrieval tasks across seven diverse domains. We first discuss the construction of CoIR and its diverse dataset composition. Further, we evaluate nine widely used retrieval models using CoIR, uncovering significant difficulties in performing code retrieval tasks even with state-of-the-art systems. To facilitate easy adoption and integration within existing research workflows, CoIR has been developed as a user-friendly Python framework, readily installable via pip. It shares the same data schema as other popular benchmarks like MTEB and BEIR, enabling seamless cross-benchmark evaluations. Through CoIR, we aim to invigorate research in the code retrieval domain, providing a versatile benchmarking tool that encourages further development and exploration of code retrieval systems (https://github.com/CoIR-team/coir).

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2407.02875 [pdf, ps, other]

The structure of deformed double complexes on the Iwasawa manifold

Authors: Yan Hu, Wei Xia

Abstract: The Kuranishi family of the Iwasawa manifold gives rise naturally to a family of (deformed) double complexes. By using the structure theorem of double complexes due to Stelzig and Qi-Khovanov, we show there are exactly $3$ isomorphism types in this family and explicitly determine the structures of these $3$ types. As an application, we compute the Frölicher spectral sequence for each fiber in the Kuranishi family of the Iwasawa manifold.

Submitted 3 July, 2024; originally announced July 2024.

Comments: 26 pages

MSC Class: 57T15; 32Q99; 32C35; 18G40

arXiv:2407.01245 [pdf, other]

SINKT: A Structure-Aware Inductive Knowledge Tracing Model with Large Language Model

Authors: Lingyue Fu, Hao Guan, Kounianhua Du, Jianghao Lin, Wei Xia, Weinan Zhang, Ruiming Tang, Yasheng Wang, Yong Yu

Abstract: Knowledge Tracing (KT) aims to determine whether students will respond correctly to the next question, which is a crucial task in intelligent tutoring systems (ITS). In educational KT scenarios, transductive ID-based methods often face severe data sparsity and cold start problems, where interactions between individual students and questions are sparse, and new questions and concepts consistently arrive in the database. In addition, existing KT models only implicitly consider the correlation between concepts and questions, lacking direct modeling of the more complex relationships in the heterogeneous graph of concepts and questions. In this paper, we propose a Structure-aware Inductive Knowledge Tracing model with large language model (dubbed SINKT), which, for the first time, introduces large language models (LLMs) and realizes inductive knowledge tracing. Firstly, SINKT utilizes LLMs to introduce structural relationships between concepts and constructs a heterogeneous graph for concepts and questions. Secondly, by encoding concepts and questions with LLMs, SINKT incorporates semantic information to aid prediction. Finally, SINKT predicts the student's response to the target question by interacting with the student's knowledge state and the question representation. Experiments on four real-world datasets demonstrate that SINKT achieves state-of-the-art performance among 12 existing transductive KT models. Additionally, we explore the performance of SINKT on the inductive KT task and provide insights into various modules.

Submitted 1 July, 2024; originally announced July 2024.

arXiv:2407.00772 [pdf, other]

Core-level signature of long-range density-wave order and short-range excitonic correlations probed by attosecond broadband spectroscopy

Authors: Alfred Zong, Sheng-Chih Lin, Shunsuke A. Sato, Emma Berger, Bailey R. Nebgen, Marcus Hui, B. Q. Lv, Yun Cheng, Wei Xia, Yanfeng Guo, Dao Xiang, Michael W. Zuerch

Abstract: Advances in attosecond core-level spectroscopies have successfully unlocked the fastest dynamics involving high-energy electrons. Yet, these techniques are not conventionally regarded as an appropriate probe for low-energy quasiparticle interactions that govern the ground state of quantum materials, nor for studying long-range order because of their limited sensitivity to local charge environments. Here, by employing a unique cryogenic attosecond beamline, we identified clear core-level signatures of long-range charge-density-wave (CDW) formation in a quasi-2D excitonic insulator candidate, even though equilibrium photoemission and absorption measurements of the same core levels showed no spectroscopic singularity at the phase transition. Leveraging the high time resolution and intrinsic sensitivity to short-range charge excitations in attosecond core-level absorption, we observed compelling time-domain evidence for excitonic correlations in the normal state of the material, whose presence has been the subject of a long-standing debate in equilibrium experiments because of interfering phonon fluctuations in a similar part of the phase space. Our findings support the scenario that short-range excitonic fluctuations prelude long-range order formation in the ground state, providing important insights into the mechanism of exciton condensation in a quasi-low-dimensional system. These results further demonstrate the importance of simultaneous access to long- and short-range order with underlying dynamical processes spanning a multitude of time- and energy-scales, making attosecond spectroscopy an indispensable tool both for understanding the equilibrium phase diagram and for discovering novel, nonequilibrium states in strongly correlated materials.

Submitted 16 July, 2024; v1 submitted 30 June, 2024; originally announced July 2024.

arXiv:2406.19646 [pdf, other]

Time-optimal Flight in Cluttered Environments via Safe Reinforcement Learning

Authors: Wei Xiao, Zhaohan Feng, Ziyu Zhou, Jian Sun, Gang Wang, Jie Chen

Abstract: This paper addresses the problem of guiding a quadrotor through a predefined sequence of waypoints in cluttered environments, aiming to minimize the flight time while avoiding collisions. Previous approaches either suffer from prolonged computational time caused by solving complex non-convex optimization problems or are limited by the inherent smoothness of polynomial trajectory representations, thereby restricting the flexibility of movement. In this work, we present a safe reinforcement learning approach for autonomous drone racing with time-optimal flight in cluttered environments. The reinforcement learning policy, trained using safety and terminal rewards specifically designed to enforce near time-optimal and collision-free flight, outperforms current state-of-the-art algorithms. Additionally, experimental results demonstrate the efficacy of the proposed approach in achieving both minimum flight time and obstacle avoidance objectives in complex environments, with a commendable 66.7% success rate in unseen, challenging settings.

Submitted 28 June, 2024; originally announced June 2024.

Comments: 7 pages, 3 figures

arXiv:2406.16935 [pdf, other]

Benchmarking Out-of-Distribution Generalization Capabilities of DNN-based Encoding Models for the Ventral Visual Cortex

Authors: Spandan Madan, Will Xiao, Mingran Cao, Hanspeter Pfister, Margaret Livingstone, Gabriel Kreiman

Abstract: We characterized the generalization capabilities of DNN-based encoding models when predicting neuronal responses from the visual cortex. We collected MacaqueITBench, a large-scale dataset of neural population responses from the macaque inferior temporal (IT) cortex to over 300,000 images, comprising 8,233 unique natural images presented to seven monkeys over 109 sessions. Using MacaqueITBench, we investigated the impact of distribution shifts on models predicting neural activity by dividing the images into Out-Of-Distribution (OOD) train and test splits. The OOD splits included several different image-computable types including image contrast, hue, intensity, temperature, and saturation. Compared to the performance on in-distribution test images -- the conventional way these models have been evaluated -- models performed worse at predicting neuronal responses to out-of-distribution images, retaining as little as 20% of the performance on in-distribution test images. The generalization performance under OOD shifts can be well accounted for by a simple image similarity metric -- the cosine distance between image representations extracted from a pre-trained object recognition model is a strong predictor of neural predictivity under different distribution shifts. The dataset of images, neuronal firing rate recordings, and computational benchmarks are hosted publicly at: https://bit.ly/3zeutVd.

Submitted 16 June, 2024; originally announced June 2024.

arXiv:2406.14024 [pdf, other]

LLM Critics Help Catch Bugs in Mathematics: Towards a Better Mathematical Verifier with Natural Language Feedback

Authors: Bofei Gao, Zefan Cai, Runxin Xu, Peiyi Wang, Ce Zheng, Runji Lin, Keming Lu, Dayiheng Liu, Chang Zhou, Wen Xiao, Junjie Hu, Tianyu Liu, Baobao Chang

Abstract: Mathematical verifiers achieve success in mathematical reasoning tasks by validating the correctness of solutions. However, existing verifiers are trained with binary classification labels, which are not informative enough for the model to accurately assess the solutions. To mitigate the aforementioned insufficiency of binary labels, we introduce step-wise natural language feedback as rationale labels (i.e., the correctness of the current step and the explanations). In this paper, we propose Math-Minos, a natural language feedback enhanced verifier built by constructing automatically generated training data and a two-stage training paradigm for effective training and efficient inference. Our experiments reveal that a small set (30k) of natural language feedback can significantly boost the accuracy of the verifier, by 1.6% (86.6% → 88.2%) on GSM8K and 0.8% (37.8% → 38.6%) on MATH. We have released our code and data for further exploration.

Submitted 8 July, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

Comments: 9 pages

arXiv:2406.13702 [pdf]

doi 10.1038/s41563-024-01914-z

Van-Hove annihilation and nematic instability on a Kagome lattice

Authors: Yu-Xiao Jiang, Sen Shao, Wei Xia, M. Michael Denner, Julian Ingham, Md Shafayat Hossain, Qingzheng Qiu, Xiquan Zheng, Hongyu Chen, Zi-Jia Cheng, Xian P. Yang, Byunghoon Kim, Jia-Xin Yin, Songbo Zhang, Maksim Litskevich, Qi Zhang, Tyler A. Cochran, Yingying Peng, Guoqing Chang, Yanfeng Guo, Ronny Thomale, Titus Neupert, M. Zahid Hasan

Abstract: Novel states of matter arise in quantum materials due to strong interactions among electrons. A nematic phase breaks the point group symmetry of the crystal lattice and is known to emerge in correlated materials. Here we report the observation of an intra-unit-cell nematic order and signatures of Pomeranchuk instability in the Kagome metal ScV6Sn6. Using scanning tunneling microscopy and spectroscopy, we reveal a stripe-like nematic order breaking the crystal rotational symmetry within the Kagome lattice itself. Moreover, we identify a set of van Hove singularities adhering to the Kagome layer electrons, which appear along one direction of the Brillouin zone while being annihilated along other high-symmetry directions, revealing a rotational symmetry breaking. Via detailed spectroscopic maps, we further observe an elliptical deformation of the Fermi surface, which provides direct evidence for an electronically mediated nematic order. Our work not only bridges the gap between electronic nematicity and Kagome physics, but also sheds light on the potential mechanism for realizing symmetry-broken phases in correlated electron systems.

Submitted 19 June, 2024; originally announced June 2024.

Comments: 19 pages, 5 figures, accepted for publication in Nature Materials

Journal ref: Nat. Mater. (2024)

arXiv:2406.13025 [pdf, other]

ABNet: Attention BarrierNet for Safe and Scalable Robot Learning

Authors: Wei Xiao, Tsun-Hsuan Wang, Daniela Rus

Abstract: Safe learning is central to AI-enabled robots where a single failure may lead to catastrophic results. Barrier-based methods are among the dominant approaches for safe robot learning. However, these methods are not scalable, are hard to train, and tend to generate unstable signals under noisy inputs, which makes them challenging to deploy on robots. To address these challenges, we propose a novel Attention BarrierNet (ABNet) that is scalable to build larger foundational safe models in an incremental manner. Each head of BarrierNet in the ABNet could learn safe robot control policies from different features and focus on a specific part of the observation. In this way, we do not need to construct a large model for complex tasks in one shot, which significantly facilitates the training of the model while ensuring its stable output. Most importantly, we can still formally prove the safety guarantees of the ABNet. We demonstrate the strength of ABNet in 2D robot obstacle avoidance, safe robot manipulation, and vision-based end-to-end autonomous driving, with results showing much better robustness and guarantees over existing models.

Submitted 18 June, 2024; originally announced June 2024.

Comments: 18 pages

arXiv:2406.12463 [pdf, other]

LFMamba: Light Field Image Super-Resolution with State Space Model

Authors: Wang xia, Yao Lu, Shunzhou Wang, Ziqi Wang, Peiqi Xia, Tianfei Zhou

Abstract: Recent years have witnessed significant advancements in light field image super-resolution (LFSR) owing to the progress of modern neural networks. However, these methods often face challenges in capturing long-range dependencies (CNN-based) or encounter quadratic computational complexities (Transformer-based), which limit their performance. Recently, the State Space Model (SSM) with selective scanning mechanism (S6), exemplified by Mamba, has emerged as a superior alternative in various vision tasks compared to traditional CNN- and Transformer-based approaches, benefiting from its effective long-range sequence modeling capability and linear-time complexity. Therefore, integrating S6 into LFSR becomes compelling, especially considering the vast data volume of 4D light fields. However, the primary challenge lies in designing an appropriate scanning method for 4D light fields that effectively models light field features. To tackle this, we employ SSMs on the informative 2D slices of 4D LFs to fully explore spatial contextual information, complementary angular information, and structure information. To achieve this, we carefully devise a basic SSM block characterized by an efficient SS2D mechanism that facilitates more effective and efficient feature learning on these 2D slices. Based on the above two designs, we further introduce an SSM-based network for LFSR termed LFMamba. Experimental results on LF benchmarks demonstrate the superior performance of LFMamba. Furthermore, extensive ablation studies are conducted to validate the efficacy and generalization ability of our proposed method. We expect that our LFMamba will shed light on effective representation learning of LFs with state space models.

Submitted 18 June, 2024; originally announced June 2024.

arXiv:2406.11162 [pdf, other]

How Good are LLMs at Relation Extraction under Low-Resource Scenario? Comprehensive Evaluation

Authors: Dawulie Jinensibieke, Mieradilijiang Maimaiti, Wentao Xiao, Yuanhang Zheng, Xiaobo Wang

Abstract: Relation Extraction (RE) serves as a crucial technology for transforming unstructured text into structured information, especially within the framework of Knowledge Graph development. Its importance is emphasized by its essential role in various downstream tasks. Besides the conventional RE methods which are based on neural networks and pre-trained language models, large language models (LLMs) are also utilized in the research field of RE. However, on low-resource languages (LRLs), both conventional RE methods and LLM-based methods perform poorly on RE due to data scarcity. To this end, this paper constructs low-resource relation extraction datasets in 10 LRLs in three regions (Central Asia, Southeast Asia and Middle East). The corpora are constructed by translating the original publicly available English RE datasets (NYT10, FewRel and CrossRE) using effective multilingual machine translation. Then, we use the language perplexity (PPL) to filter out the low-quality data from the translated datasets. Finally, we conduct an empirical study and validate the performance of several open-source LLMs on these generated LRL RE datasets.

Submitted 25 June, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

arXiv:2406.10943 [pdf, other]

Rectified Iterative Disparity for Stereo Matching

Authors: Weiqing Xiao

Abstract: Both uncertainty-assisted and iteration-based methods have achieved great success in stereo matching. However, existing uncertainty estimation methods take a single image and the corresponding disparity as input, which imposes higher demands on the estimation network. In this paper, we propose Cost volume-based disparity Uncertainty Estimation (UEC). Based on the rich similarity information in the cost volume coming from the image pairs, the proposed UEC can achieve competitive performance with low computational cost. Secondly, we propose two methods of uncertainty-assisted disparity estimation, Uncertainty-based Disparity Rectification (UDR) and Uncertainty-based Disparity update Conditioning (UDC). These two methods optimise the disparity update process of the iteration-based approach without adding extra parameters. In addition, we propose a Disparity Rectification loss that significantly improves the accuracy of a small amount of disparity updates. We present a high-performance stereo architecture, DR-Stereo, which is a combination of the proposed methods. Experimental results from SceneFlow, KITTI, Middlebury 2014, and ETH3D show that DR-Stereo achieves very competitive disparity estimation performance.

Submitted 2 July, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

arXiv:2406.08858 [pdf, other]

OmniH2O: Universal and Dexterous Human-to-Humanoid Whole-Body Teleoperation and Learning

Authors: Tairan He, Zhengyi Luo, Xialin He, Wenli Xiao, Chong Zhang, Weinan Zhang, Kris Kitani, Changliu Liu, Guanya Shi

Abstract: We present OmniH2O (Omni Human-to-Humanoid), a learning-based system for whole-body humanoid teleoperation and autonomy. Using kinematic pose as a universal control interface, OmniH2O enables various ways for a human to control a full-sized humanoid with dexterous hands, including using real-time teleoperation through VR headset, verbal instruction, and RGB camera. OmniH2O also enables full autonomy by learning from teleoperated demonstrations or integrating with frontier models such as GPT-4. OmniH2O demonstrates versatility and dexterity in various real-world whole-body tasks through teleoperation or autonomy, such as playing multiple sports, moving and manipulating objects, and interacting with humans. We develop an RL-based sim-to-real pipeline, which involves large-scale retargeting and augmentation of human motion datasets, learning a real-world deployable policy with sparse sensor input by imitating a privileged teacher policy, and reward designs to enhance robustness and stability. We release the first humanoid whole-body control dataset, OmniH2O-6, containing six everyday tasks, and demonstrate humanoid whole-body skill learning from teleoperated datasets.

Submitted 13 June, 2024; originally announced June 2024.

Comments: Project page: https://omni.human2humanoid.com/

arXiv:2406.08839 [pdf, other]

NeRF Director: Revisiting View Selection in Neural Volume Rendering

Authors: Wenhui Xiao, Rodrigo Santa Cruz, David Ahmedt-Aristizabal, Olivier Salvado, Clinton Fookes, Leo Lebrat

Abstract: Neural Rendering representations have significantly contributed to the field of 3D computer vision. Given their potential, considerable efforts have been invested to improve their performance. Nonetheless, the essential question of selecting training views is yet to be thoroughly investigated. This key aspect plays a vital role in achieving high-quality results and aligns with the well-known tenet of deep learning: "garbage in, garbage out". In this paper, we first illustrate the importance of view selection by demonstrating how a simple rotation of the test views within the most pervasive NeRF dataset can lead to consequential shifts in the performance rankings of state-of-the-art techniques. To address this challenge, we introduce a unified framework for view selection methods and devise a thorough benchmark to assess its impact. Significant improvements can be achieved without leveraging error or uncertainty estimation but focusing on uniform view coverage of the reconstructed object, resulting in a training-free approach. Using this technique, we show that high-quality renderings can be achieved faster by using fewer views. We conduct extensive experiments on both synthetic datasets and realistic data to demonstrate the effectiveness of our proposed method compared with random, conventional error-based, and uncertainty-guided view selection.

Submitted 13 June, 2024; originally announced June 2024.

Comments: CVPR2024

arXiv:2406.08313 [pdf, other]

Searching for bound states in the open strangeness systems

Authors: C. W. Xiao, J. J. Wu

Abstract: Inspired by the recent findings of $Z_{cs}$ and $P_{cs}$ states, we investigate the strong interactions of the systems with open strangeness(es) from the light sector to the heavy sector (no beauty quark), where the interaction potential is derived from the vector meson exchange mechanism in $t$- and $u$-channels. In the current work, we discuss all of the single-channel cases for the open strangeness in the systemic framework, where the resonances $X_0(2866)$, $D^*_{s0}(2317)$ and $D_{s1}(2460)$ are dynamically generated. Furthermore, many new exotics are predicted. In addition, the left-hand cut problem in $t$- and $u$-channels is discussed in detail.

Submitted 19 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

Comments: More comments added

arXiv:2406.06953 [pdf, other]

Stepwise Regression and Pre-trained Edge for Robust Stereo Matching

Authors: Weiqing Xiao, Wei Zhao

Abstract: Due to the difficulty in obtaining real samples and ground truth, the generalization performance and the fine-tuned performance are critical for the feasibility of stereo matching methods in real-world applications. However, the presence of substantial disparity distributions and density variations across different datasets presents significant challenges for the generalization and fine-tuning of the model. In this paper, we propose a novel stereo matching method, called SR-Stereo, which mitigates the distributional differences across different datasets by predicting the disparity clips and uses a loss weight related to the regression target scale to improve the accuracy of the disparity clips. Moreover, this stepwise regression architecture can be easily extended to existing iteration-based methods to improve the performance without changing the structure. In addition, to mitigate the edge blurring of the fine-tuned model on sparse ground truth, we propose Domain Adaptation Based on Pre-trained Edges (DAPE). Specifically, we use the predicted disparity and RGB image to estimate the edge map of the target domain image. The edge map is filtered to generate edge map background pseudo-labels, which together with the sparse ground truth disparity on the target domain are used as a supervision to jointly fine-tune the pre-trained stereo matching model. These proposed methods are extensively evaluated on SceneFlow, KITTI, Middlebury 2014 and ETH3D. The SR-Stereo achieves competitive disparity estimation performance and state-of-the-art cross-domain generalisation performance. Meanwhile, the proposed DAPE significantly improves the disparity estimation performance of fine-tuned models, especially in the textureless and detail regions.

Submitted 16 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

arXiv:2406.06005 [pdf, other]

WoCoCo: Learning Whole-Body Humanoid Control with Sequential Contacts

Authors: Chong Zhang, Wenli Xiao, Tairan He, Guanya Shi

Abstract: Humanoid activities involving sequential contacts are crucial for complex robotic interactions and operations in the real world and are traditionally solved by model-based motion planning, which is time-consuming and often relies on simplified dynamics models. Although model-free reinforcement learning (RL) has become a powerful tool for versatile and robust whole-body humanoid control, it still requires tedious task-specific tuning and state machine design and suffers from long-horizon exploration issues in tasks involving contact sequences. In this work, we propose WoCoCo (Whole-Body Control with Sequential Contacts), a unified framework to learn whole-body humanoid control with sequential contacts by naturally decomposing the tasks into separate contact stages. Such decomposition facilitates simple and general policy learning pipelines through task-agnostic reward and sim-to-real designs, requiring only one or two task-related terms to be specified for each task. We demonstrated that end-to-end RL-based controllers trained with WoCoCo enable four challenging whole-body humanoid tasks involving diverse contact sequences in the real world without any motion priors: 1) versatile parkour jumping, 2) box loco-manipulation, 3) dynamic clap-and-tap dancing, and 4) cliffside climbing. We further show that WoCoCo is a general framework beyond humanoid by applying it in 22-DoF dinosaur robot loco-manipulation tasks.

Submitted 10 June, 2024; originally announced June 2024.

Comments: Website and Videos: https://lecar-lab.github.io/wococo/

arXiv:2406.04594 [pdf, other]

Boosting Large-scale Parallel Training Efficiency with C4: A Communication-Driven Approach

Authors: Jianbo Dong, Bin Luo, Jun Zhang, Pengcheng Zhang, Fei Feng, Yikai Zhu, Ang Liu, Zian Chen, Yi Shi, Hairong Jiao, Gang Lu, Yu Guan, Ennan Zhai, Wencong Xiao, Hanyu Zhao, Man Yuan, Siran Yang, Xiang Li, Jiamang Wang, Rui Men, Jianwei Zhang, Huang Zhong, Dennis Cai, Yuan Xie, Binzhang Fu

Abstract: The emergence of Large Language Models (LLMs) has necessitated the adoption of parallel training techniques, involving the deployment of thousands of GPUs to train a single model. Unfortunately, we have found that the efficiency of current parallel training is often suboptimal, largely due to the following two main issues. Firstly, hardware failures are inevitable, leading to interruptions in the training tasks. The inability to quickly identify the faulty components results in a substantial waste of GPU resources. Secondly, since GPUs must wait for parameter synchronization to complete before proceeding to the next round of computation, network congestion can greatly increase the waiting time for GPUs. To address these challenges, this paper introduces a communication-driven solution, namely the C4. The key insights of C4 are twofold. First, in parallel training, collective communication exhibits periodic and homogeneous characteristics, so any anomalies are certainly due to some form of hardware malfunction. By leveraging this feature, C4 can rapidly identify the faulty components, swiftly isolate the anomaly, and restart the task, thereby avoiding resource wastage caused by delays in anomaly detection. Second, the predictable communication model of collective communication, involving few large flows, allows C4 to efficiently execute traffic planning, substantially reducing network congestion. C4 has been extensively implemented across our production systems, cutting error-induced overhead by roughly 30% and enhancing runtime performance by about 15% for certain applications with moderate communication costs.

Submitted 6 June, 2024; originally announced June 2024.

arXiv:2406.03243 [pdf, other]

Llumnix: Dynamic Scheduling for Large Language Model Serving

Authors: Biao Sun, Ziming Huang, Hanyu Zhao, Wencong Xiao, Xinyi Zhang, Yong Li, Wei Lin

Abstract: Inference serving for large language models (LLMs) is the key to unleashing their potential in people's daily lives. However, efficient LLM serving remains challenging today because the requests are inherently heterogeneous and unpredictable in terms of resource and latency requirements, as a result of the diverse applications and the dynamic execution nature of LLMs. Existing systems are fundamentally limited in handling these characteristics and cause problems such as severe queuing delays, poor tail latencies, and SLO violations. We introduce Llumnix, an LLM serving system that reacts to such heterogeneous and unpredictable requests by runtime rescheduling across multiple model instances. Similar to context switching across CPU cores in modern operating systems, Llumnix reschedules requests to improve load balancing and isolation, mitigate resource fragmentation, and differentiate request priorities and SLOs. Llumnix implements the rescheduling with an efficient and scalable live migration mechanism for requests and their in-memory states, and exploits it in a dynamic scheduling policy that unifies the multiple rescheduling scenarios elegantly. Our evaluations show that Llumnix improves tail latencies by an order of magnitude, accelerates high-priority requests by up to 1.5x, and delivers up to 36% cost savings while achieving similar tail latencies, compared against state-of-the-art LLM serving systems. Llumnix is publicly available at https://github.com/AlibabaPAI/llumnix.

Submitted 5 June, 2024; originally announced June 2024.

Comments: To appear at OSDI '24; open-source repo will be available in June 2024

arXiv:2406.02069 [pdf, other]

PyramidKV: Dynamic KV Cache Compression based on Pyramidal Information Funneling

Authors: Zefan Cai, Yichi Zhang, Bofei Gao, Yuliang Liu, Tianyu Liu, Keming Lu, Wayne Xiong, Yue Dong, Baobao Chang, Junjie Hu, Wen Xiao

Abstract: In this study, we investigate whether attention-based information flow inside large language models (LLMs) is aggregated through noticeable patterns for long context processing. Our observations reveal that LLMs aggregate information through Pyramidal Information Funneling, where attention scatters widely in lower layers, progressively consolidates within specific contexts, and ultimately focuses on critical tokens (a.k.a. massive activation or attention sink) in higher layers. Motivated by these insights, we developed PyramidKV, a novel and effective KV cache compression method. This approach dynamically adjusts the KV cache size across different layers, allocating more cache in lower layers and less in higher ones, diverging from traditional methods that maintain a uniform KV cache size. Our experimental evaluations, utilizing the LongBench benchmark, show that PyramidKV matches the performance of models with a full KV cache while retaining only 12% of the KV cache, thus significantly reducing memory usage. In scenarios emphasizing memory efficiency, where only 0.7% of the KV cache is maintained, PyramidKV surpasses other KV cache compression techniques, achieving up to a 20.5 absolute accuracy improvement on TREC.

Submitted 16 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.
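The layer-wise allocation idea in the abstract above (more KV cache in lower layers, less in higher ones) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's allocation rule: the linear schedule, the `ratio` parameter (bottom-layer weight divided by top-layer weight), and the function name `pyramid_budgets` are all hypothetical.

```python
def pyramid_budgets(num_layers, total_budget, ratio=4.0):
    """Split total_budget cache slots across layers with a linearly decreasing share.

    Layer 0 (lowest) gets the largest share. `ratio` is an assumed knob:
    the weight of the lowest layer relative to the highest layer.
    """
    if num_layers < 1:
        raise ValueError("num_layers must be at least 1")
    if num_layers == 1:
        return [total_budget]
    # Weights fall linearly from `ratio` at the bottom layer to 1.0 at the top.
    weights = [ratio - (ratio - 1.0) * i / (num_layers - 1) for i in range(num_layers)]
    scale = total_budget / sum(weights)
    budgets = [int(w * scale) for w in weights]
    # Hand slots lost to truncation back to the lowest layers first.
    for i in range(total_budget - sum(budgets)):
        budgets[i % num_layers] += 1
    return budgets


budgets = pyramid_budgets(num_layers=4, total_budget=100)
```

A uniform baseline would give each of the four layers 25 slots; the sketch keeps the same total but shifts it toward the lower layers.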

arXiv:2406.00439 [pdf, other]

Learning Manipulation by Predicting Interaction

Authors: Jia Zeng, Qingwen Bu, Bangjun Wang, Wenke Xia, Li Chen, Hao Dong, Haoming Song, Dong Wang, Di Hu, Ping Luo, Heming Cui, Bin Zhao, Xuelong Li, Yu Qiao, Hongyang Li

Abstract: Representation learning approaches for robotic manipulation have boomed in recent years. Due to the scarcity of in-domain robot data, prevailing methodologies tend to leverage large-scale human video datasets to extract generalizable features for visuomotor policy learning. Despite the progress achieved, prior endeavors disregard the interactive dynamics that capture behavior patterns and physical interaction during the manipulation process, resulting in an inadequate understanding of the relationship between objects and the environment. To this end, we propose a general pre-training pipeline that learns Manipulation by Predicting the Interaction (MPI) and enhances the visual representation. Given a pair of keyframes representing the initial and final states, along with language instructions, our algorithm predicts the transition frame and detects the interaction object, respectively. These two learning objectives achieve superior comprehension towards "how-to-interact" and "where-to-interact". We conduct a comprehensive evaluation of several challenging robotic tasks. The experimental results demonstrate that MPI exhibits remarkable improvement by 10% to 64% compared with previous state-of-the-art in real-world robot platforms as well as simulation environments. Code and checkpoints are publicly shared at https://github.com/OpenDriveLab/MPI.

Submitted 1 June, 2024; originally announced June 2024.

Comments: Accepted to RSS 2024. Project page: https://github.com/OpenDriveLab/MPI

arXiv:2405.19728 [pdf, ps, other]

Legendre symbols related to $D_p(b,1)$

Authors: Xin-Qi Luo, Wei Xia

Abstract: Let $p$ be an odd prime. For any $b,c\in\mathbb{Z}$, Z.-W. Sun introduced the new-type determinant $$D_p(b,c)=|(i^2+bij+cj^2)^{p-2}|_{1\leqslant i,j\leqslant p-1},$$ and studied its arithmetic properties. In this paper we mainly prove that $$\left(\frac{D_p(b,1)}{p}\right)=\left(\frac{2b}{p}\right)$$ when $(\frac{b^2-4}{p})=-1$ and $p\equiv1\pmod 4$. As an application of our result, we confirm several conjectures of Sun.

Submitted 31 May, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

Comments: 8 pages
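To make the statement in the abstract above concrete, the following sketch evaluates $D_p(b,c)$ modulo $p$ and the Legendre symbols involved for a small case. It is a numerical illustration only, not part of the paper: the helper names `legendre`, `det_mod_p`, and `D_mod_p` are made up here, and reducing the determinant modulo $p$ is enough because the Legendre symbol depends only on the residue class.

```python
def legendre(a, p):
    """Legendre symbol (a/p) for an odd prime p, via Euler's criterion."""
    r = pow(a % p, (p - 1) // 2, p)
    return -1 if r == p - 1 else r  # r is 0, 1, or p - 1


def det_mod_p(matrix, p):
    """Determinant of a square integer matrix modulo a prime p (Gaussian elimination)."""
    m = [[x % p for x in row] for row in matrix]
    n = len(m)
    det = 1
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if m[r][col]), None)
        if pivot is None:
            return 0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap flips the sign
        det = det * m[col][col] % p
        inv = pow(m[col][col], p - 2, p)  # modular inverse by Fermat's little theorem
        for r in range(col + 1, n):
            factor = m[r][col] * inv % p
            if factor:
                m[r] = [(x - factor * y) % p for x, y in zip(m[r], m[col])]
    return det % p


def D_mod_p(b, c, p):
    """D_p(b, c) mod p: entries (i^2 + b*i*j + c*j^2)^(p-2) for 1 <= i, j <= p - 1."""
    rows = [
        [pow(i * i + b * i * j + c * j * j, p - 2, p) for j in range(1, p)]
        for i in range(1, p)
    ]
    return det_mod_p(rows, p)


p, b = 5, 1  # p = 5 satisfies p ≡ 1 (mod 4), and (b^2 - 4 / p) = (2/5) = -1
d = D_mod_p(b, 1, p)
```

For $p=5$, $b=1$ the determinant reduces to 3 modulo 5, a quadratic non-residue, and $2b=2$ is also a non-residue modulo 5, so both Legendre symbols equal $-1$, in agreement with the stated identity.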

arXiv:2405.19586 [pdf, other]

SAM-E: Leveraging Visual Foundation Model with Sequence Imitation for Embodied Manipulation

Authors: Junjie Zhang, Chenjia Bai, Haoran He, Wenke Xia, Zhigang Wang, Bin Zhao, Xiu Li, Xuelong Li

Abstract: Acquiring a multi-task imitation policy in 3D manipulation poses challenges in terms of scene understanding and action prediction. Current methods employ both 3D representation and multi-view 2D representation to predict the poses of the robot's end-effector. However, they still require a considerable amount of high-quality robot trajectories, and suffer from limited generalization in unseen tasks and inefficient execution in long-horizon reasoning. In this paper, we propose SAM-E, a novel architecture for robot manipulation by leveraging a vision-foundation model for generalizable scene understanding and sequence imitation for long-term action reasoning. Specifically, we adopt Segment Anything (SAM) pre-trained on a huge number of images and promptable masks as the foundation model for extracting task-relevant features, and employ parameter-efficient fine-tuning on robot data for a better understanding of embodied scenarios. To address long-horizon reasoning, we develop a novel multi-channel heatmap that enables the prediction of the action sequence in a single pass, notably enhancing execution efficiency. Experimental results from various instruction-following tasks demonstrate that SAM-E achieves superior performance with higher execution efficiency compared to the baselines, and also significantly improves generalization in few-shot adaptation to new tasks.

Submitted 29 May, 2024; originally announced May 2024.

Comments: ICML 2024. Project page: https://sam-embodied.github.io

arXiv:2405.19487 [pdf, other]

A Full-duplex Speech Dialogue Scheme Based On Large Language Models

Authors: Peng Wang, Songshuo Lu, Yaohua Tang, Sijie Yan, Yuanjun Xiong, Wei Xia

Abstract: We present a generative dialogue system capable of operating in a full-duplex manner, allowing for seamless interaction. It is based on a large language model (LLM) carefully aligned to be aware of a perception module, a motor function module, and the concept of a simple finite state machine (called neural FSM) with two states. The perception and motor function modules operate simultaneously, allowing the system to simultaneously speak and listen to the user. The LLM generates textual tokens for inquiry responses and makes autonomous decisions to start responding to, wait for, or interrupt the user by emitting control tokens to the neural FSM. All these tasks of the LLM are carried out as next token prediction on a serialized view of the dialogue in real-time. In automatic quality evaluations simulating real-life interaction, the proposed system reduces the average conversation response latency by more than threefold compared with LLM-based half-duplex dialogue systems while responding within less than 500 milliseconds in more than 50% of evaluated interactions. Running an LLM with only 8 billion parameters, our system exhibits an 8% higher interruption precision rate than the best available commercial LLM for voice-based dialogue.

Submitted 29 May, 2024; originally announced May 2024.

arXiv:2405.17627 [pdf, other]

Salutary Labeling with Zero Human Annotation

Authors: Wenxiao Xiao, Hongfu Liu

Abstract: Active learning strategically selects informative unlabeled data points and queries their ground truth labels for model training. The prevailing assumption underlying this machine learning paradigm is that acquiring these ground truth labels will optimally enhance model performance. However, this assumption may not always hold true or maximize learning capacity, particularly considering the costly labor annotations required for ground truth labels. In contrast to traditional ground truth labeling, this paper proposes salutary labeling, which automatically assigns the most beneficial labels to the most informative samples without human annotation. Specifically, we utilize the influence function, a tool for estimating sample influence, to select newly added samples and assign their salutary labels by choosing the category that maximizes their positive influence. This process eliminates the need for human annotation. Extensive experiments conducted on nine benchmark datasets demonstrate the superior performance of our salutary labeling approach over traditional active learning strategies. Additionally, we provide several in-depth explorations and practical applications of large language model (LLM) fine-tuning.

Submitted 27 May, 2024; originally announced May 2024.

arXiv:2405.16765 [pdf, ps, other]

Study of Robust Direction Finding Based on Joint Sparse Representation

Authors: Y. Li, W. Xiao, L. Zhao, Z. Huang, Q. Li, L. Li, R. C. de Lamare

Abstract: Standard Direction of Arrival (DOA) estimation methods are typically derived based on the Gaussian noise assumption, making them highly sensitive to outliers. Therefore, in the presence of impulsive noise, the performance of these methods may significantly deteriorate. In this paper, we model impulsive noise as Gaussian noise mixed with sparse outliers. By exploiting their statistical differences, we propose a novel DOA estimation method based on sparse signal recovery (SSR). Furthermore, to address the issue of grid mismatch, we utilize an alternating optimization approach that relies on the estimated outlier matrix and the on-grid DOA estimates to obtain the off-grid DOA estimates. Simulation results demonstrate that the proposed method exhibits robustness against large outliers.

Submitted 26 May, 2024; originally announced May 2024.

Comments: 6 pages, 4 figures

arXiv:2405.15210 [pdf]

Spin chirality engineering induced giant topological Hall effect in a kagome magnet

Authors: Wei Xia, Shihao Zhang, Jian Yuan, Yurui Wei, Haonan Wang, Hong Du, Xiangqi Liu, Jiangteng Guo, Zicheng Tao, Ke Qu, Xia Wang, Xuerong Liu, Wenbo Wang, Jinguang Cheng, Yulin Chen, Jianpeng Liu, Ruidan Zhong, Xuewen Fu, Zhenzhong Yang, Yanfeng Guo

Abstract: The ferrimagnet TbMn6Sn6 has attracted vast attention, because its pristine Mn kagome lattice with strong spin-orbit coupling and out-of-plane Tb-Mn exchange supports quantum-limit Chern topological magnetism which can be described by the simple spinless Haldane model. We unveil herein that engineering the pristine kagome lattice through partial replacement of Mn by nonmagnetic Cr which tends to concentrate into the single Mn1 layer in a unit cell breaks the collinear configuration of Mn spins and reduces the D6h point group symmetry to the C2 one. The nearly isolated Tb networks result in easily polarized Tb spins even under a weak magnetic field, and simultaneously, different spin chirality of the Tb-Mn1-Mn1 and Mn1-Mn1-Mn1. Such a peculiar spin structure leads to a plateau-like topological Hall effect with a record resistivity of 19.1 μOhm cm among bulk systems. Our direct visualization of the domain-wall structure and its evolution under external magnetic field fully support the picture, thus highlighting the pivotal role of broken kagome lattice symmetry in generating the peculiar spin chirality in real space. Our results set a paradigm for exploration of exotic properties in kagome topological magnets and would be a proof-of-principle strategy for investigating the correlation between magnetism and exotic topological properties in kagome lattice.

Submitted 24 May, 2024; originally announced May 2024.

Comments: 33 pages, 4 main figures and 16 SI figures

arXiv:2405.15202 [pdf, other]

Cross-Task Defense: Instruction-Tuning LLMs for Content Safety

Authors: Yu Fu, Wen Xiao, Jia Chen, Jiachen Li, Evangelos Papalexakis, Aichi Chien, Yue Dong

Abstract: Recent studies reveal that Large Language Models (LLMs) face challenges in balancing safety with utility, particularly when processing long texts for NLP tasks like summarization and translation. Despite defenses against malicious short questions, the ability of LLMs to safely handle dangerous long content, such as manuals teaching illicit activities, remains unclear. Our work aims to develop robust defenses for LLMs in processing malicious documents alongside benign NLP task queries. We introduce a defense dataset comprised of safety-related examples and propose single-task and mixed-task losses for instruction tuning. Our empirical results demonstrate that LLMs can significantly enhance their capacity to safely manage dangerous content with appropriate instruction tuning. Additionally, strengthening the defenses of tasks most susceptible to misuse is effective in protecting LLMs against processing harmful information. We also observe that trade-offs between utility and safety exist in defense strategies, where Llama2, utilizing our proposed approach, displays a significantly better balance compared to Llama1.

Submitted 24 May, 2024; originally announced May 2024.

Comments: Accepted to the NAACL 2024 TrustNLP workshop

arXiv:2405.12442 [pdf, other]

Learning Structure and Knowledge Aware Representation with Large Language Models for Concept Recommendation

Authors: Qingyao Li, Wei Xia, Kounianhua Du, Qiji Zhang, Weinan Zhang, Ruiming Tang, Yong Yu

Abstract: Concept recommendation aims to suggest the next concept for learners to study based on their knowledge states and the human knowledge system. While knowledge states can be predicted using knowledge tracing models, previous approaches have not effectively integrated the human knowledge system into the process of designing these educational models. In the era of rapidly evolving Large Language Models (LLMs), many fields have begun using LLMs to generate and encode text, introducing external knowledge. However, integrating LLMs into concept recommendation presents two urgent challenges: 1) How to construct text for concepts that effectively incorporate the human knowledge system? 2) How to adapt non-smooth, anisotropic text encodings effectively for concept recommendation? In this paper, we propose a novel Structure and Knowledge Aware Representation learning framework for concept Recommendation (SKarREC). We leverage factual knowledge from LLMs as well as the precedence and succession relationships between concepts obtained from the knowledge graph to construct textual representations of concepts. Furthermore, we propose a graph-based adapter to adapt anisotropic text embeddings to the concept recommendation task. This adapter is pre-trained through contrastive learning on the knowledge graph to get a smooth and structure-aware concept representation. Then, it's fine-tuned through the recommendation task, forming a text-to-knowledge-to-recommendation adaptation pipeline, which effectively constructs a structure and knowledge-aware concept representation. Our method does a better job than previous adapters in transforming text encodings for application in concept recommendation. Extensive experiments on real-world datasets demonstrate the effectiveness of the proposed approach.

Submitted 20 May, 2024; originally announced May 2024.

Comments: 11 pages, 8 figures

arXiv:2405.11024 [pdf, other]

GraSS: Combining Graph Neural Networks with Expert Knowledge for SAT Solver Selection

Authors: Zhanguang Zhang, Didier Chetelat, Joseph Cotnareanu, Amur Ghose, Wenyi Xiao, Hui-Ling Zhen, Yingxue Zhang, Jianye Hao, Mark Coates, Mingxuan Yuan

Abstract: Boolean satisfiability (SAT) problems are routinely solved by SAT solvers in real-life applications, yet solving time can vary drastically between solvers for the same instance. This has motivated research into machine learning models that can predict, for a given SAT instance, which solver to select among several options. Existing SAT solver selection methods all rely on some hand-picked instance features, which are costly to compute and ignore the structural information in SAT graphs. In this paper we present GraSS, a novel approach for automatic SAT solver selection based on tripartite graph representations of instances and a heterogeneous graph neural network (GNN) model. While GNNs have been previously adopted in other SAT-related tasks, they do not incorporate any domain-specific knowledge and ignore the runtime variation introduced by different clause orders. We enrich the graph representation with domain-specific decisions, such as novel node feature design, positional encodings for clauses in the graph, a GNN architecture tailored to our tripartite graphs and a runtime-sensitive loss function. Through extensive experiments, we demonstrate that this combination of raw representations and domain-specific choices leads to improvements in runtime for a pool of seven state-of-the-art solvers on both an industrial circuit design benchmark, and on instances from the 20-year Anniversary Track of the 2022 SAT Competition.

Submitted 17 May, 2024; originally announced May 2024.

Comments: Accepted by KDD 2024

arXiv:2405.07156 [pdf]

Direct visualization of the impurity occupancy roadmap in Ni-substituted van der Waals ferromagnet Fe3GaTe2

Authors: Jian Yuan, Haonan Wang, Xiaofei Hou, Binshuo Zhang, Yurui Wei, Jiangteng Guo, Lu Sun, Zhenhai Yu, Zhikai Li, Xiangqi Liu, Wei Xia, Xia Wang, Xuerong Liu, Yulin Chen, Shihao Zhang, Xuewen Fu, Ke Qu, Zhenzhong Yang, Yanfeng Guo

Abstract: Impurity substitution is a general strategy to study the intrinsic properties of a quantum material. However, when the target element has more than one Wyckoff position in the lattice, it is challenging yet essential to know the exact position and order of the occupancy of impurity atoms. Via comprehensive experimental and theoretical investigations, we establish herein the roadmap for Ni substitution in Fe3GaTe2, a van der Waals ferromagnet with the Curie temperature TC even reaching ~ 380 K. The results unambiguously reveal that in (Fe1-xNix)3GaTe2, Ni atoms initially form van der Waals interlayer gap Ni3 sites when x < 0.1, and then gradually occupy the Fe2 sites. After replacing the Fe2 sites at x of ~ 0.75, they start to substitute for the Fe1 sites and eventually realize a full occupation at x = 1.0. Accordingly, TC and saturation magnetic moments of (Fe1-xNix)3GaTe2 both show a nonlinear decrease, which is tightly tied to the Ni occupancy order as well as the different roles of Ni3, Fe1 and Fe2 sites in the spin Hamiltonian. The results not only yield fruitful insights into the essential roles of different Fe sites in producing the above-room-temperature TC, but also set a paradigm for future impurity substitution studies on other quantum materials.

Submitted 12 May, 2024; originally announced May 2024.

Comments: 24 pages, 5 main figures + 4 SI figures + 2 SI tables

arXiv:2405.05739 [pdf]

Preliminary Exploration on the Low-Pressure Ar-O2 Plasma Generated by Low-Frequency Alternating Current (AC) Power Supply

Authors: Niaz Wali, W. W. Xiao, Q. U. Din, N. U. Rehman, C. Y. Wang, J. T. Ma, W. J. Zhong, Q. W. Yang

Abstract: This study reports a low-frequency alternating current (AC) power supply as a novel approach for generating low-pressure capacitively coupled Ar-O2 plasma, offering advantages in cost, compactness, and operational simplicity, which are crucial for both material science and biological applications. The effectiveness of low-frequency AC-generated plasma relative to traditional RF systems is investigated by examining key plasma parameters such as electron density, electron temperature, and electron energy distribution function (EEDF). Experimental results revealed that the AC power supply could effectively produce low-pressure Ar-O2 plasma with properties comparable to those of RF systems. Most notably, the AC-generated plasma achieved a significant reduction in bacterial growth, suggesting its potential as a more economical and flexible alternative for enhancing plasma-assisted applications in sterilization and material processing.

Submitted 9 May, 2024; originally announced May 2024.

Comments: 16 pages, 7 figures

arXiv:2405.04434 [pdf, other]

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

Authors: DeepSeek-AI, Aixin Liu, Bei Feng, Bin Wang, Bingxuan Wang, Bo Liu, Chenggang Zhao, Chengqi Dengr, Chong Ruan, Damai Dai, Daya Guo, Dejian Yang, Deli Chen, Dongjie Ji, Erhang Li, Fangyun Lin, Fuli Luo, Guangbo Hao, Guanting Chen, Guowei Li, H. Zhang, Hanwei Xu, Hao Yang, Haowei Zhang, Honghui Ding, et al. (132 additional authors not shown)

Abstract: We present DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of which 21B are activated for each token, and supports a context length of 128K tokens. DeepSeek-V2 adopts innovative architectures including Multi-head Latent Attention (MLA) and DeepSeekMoE. MLA guarantees efficient inference through significantly compressing the Key-Value (KV) cache into a latent vector, while DeepSeekMoE enables training strong models at an economical cost through sparse computation. Compared with DeepSeek 67B, DeepSeek-V2 achieves significantly stronger performance, and meanwhile saves 42.5% of training costs, reduces the KV cache by 93.3%, and boosts the maximum generation throughput to 5.76 times. We pretrain DeepSeek-V2 on a high-quality and multi-source corpus consisting of 8.1T tokens, and further perform Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) to fully unlock its potential. Evaluation results show that, even with only 21B activated parameters, DeepSeek-V2 and its chat versions still achieve top-tier performance among open-source models.

Submitted 19 June, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

arXiv:2405.02355 [pdf, other]

CodeGRAG: Extracting Composed Syntax Graphs for Retrieval Augmented Cross-Lingual Code Generation

Authors: Kounianhua Du, Renting Rui, Huacan Chai, Lingyue Fu, Wei Xia, Yasheng Wang, Ruiming Tang, Yong Yu, Weinan Zhang

Abstract: Utilizing large language models to generate code has shown promise for revolutionizing software development. Despite the intelligence shown by general large language models, their specificity in code generation can still be improved due to the syntactic gap and mismatched vocabulary existing among natural language and different programming languages. In addition, programming languages are inherently logical and complex, making them hard to generate correctly. Existing methods rely on multiple prompts to the large language model to explore better solutions, which is expensive. In this paper, we propose Syntax Graph Retrieval Augmented Code Generation (CodeGRAG) to enhance the performance of LLMs in single-round code generation tasks. CodeGRAG extracts and summarizes the control flow and data flow of code blocks to fill the gap between programming languages and natural language. The extracted external structural knowledge models the inherent flows of code blocks, which can help LLMs better understand code syntax and serve as a bridge among different programming languages. CodeGRAG significantly improves the code generation ability of LLMs and can even offer performance gain for cross-lingual code generation, e.g., C++ for Python.

Submitted 2 May, 2024; originally announced May 2024.

arXiv:2405.02180 [pdf, other]

A Flow-Based Model for Conditional and Probabilistic Electricity Consumption Profile Generation and Prediction

Authors: Weijie Xia, Chenguang Wang, Peter Palensky, Pedro P. Vergara

Abstract: Residential Load Profile (RLP) generation and prediction are critical for the operation and planning of distribution networks, especially as diverse low-carbon technologies (e.g., photovoltaic and electric vehicles) are increasingly adopted. This paper introduces a novel flow-based generative model, termed Full Convolutional Profile Flow (FCPFlow), which is uniquely designed for both conditional and unconditional RLP generation, and for probabilistic load forecasting. By introducing two new layers--the invertible linear layer and the invertible normalization layer--the proposed FCPFlow architecture shows three main advantages compared to traditional statistical and contemporary deep generative models: 1) it is well-suited for RLP generation under continuous conditions, such as varying weather and annual electricity consumption, 2) it demonstrates superior scalability in different datasets compared to traditional statistical models, and 3) it also demonstrates better modeling capabilities in capturing the complex correlation of RLPs compared with deep generative models.

Submitted 9 May, 2024; v1 submitted 3 May, 2024; originally announced May 2024.

arXiv:2404.16147 [pdf, other]

Chat2Scenario: Scenario Extraction From Dataset Through Utilization of Large Language Model

Authors: Yongqi Zhao, Wenbo Xiao, Tomislav Mihalj, Jia Hu, Arno Eichberger

Abstract: The advent of Large Language Models (LLM) provides new insights to validate Automated Driving Systems (ADS). In the herein-introduced work, a novel approach to extracting scenarios from naturalistic driving datasets is presented. A framework called Chat2Scenario is proposed leveraging the advanced Natural Language Processing (NLP) capabilities of LLM to understand and identify different driving scenarios. By inputting descriptive texts of driving conditions and specifying the criticality metric thresholds, the framework efficiently searches for desired scenarios and converts them into ASAM OpenSCENARIO and IPG CarMaker text files. This methodology streamlines the scenario extraction process and enhances efficiency. Simulations are executed to validate the efficiency of the approach. The framework is presented based on a user-friendly web app and is accessible via the following link: https://github.com/ftgTUGraz/Chat2Scenario.

Submitted 26 April, 2024; v1 submitted 24 April, 2024; originally announced April 2024.

Comments: IEEE Intelligent Vehicles Symposium (IV 2024)

arXiv:2404.14233 [pdf, other]

Detecting and Mitigating Hallucination in Large Vision Language Models via Fine-Grained AI Feedback

Authors: Wenyi Xiao, Ziwei Huang, Leilei Gan, Wanggui He, Haoyuan Li, Zhelun Yu, Hao Jiang, Fei Wu, Linchao Zhu

Abstract: The rapidly developing Large Vision Language Models (LVLMs) have shown notable capabilities on a range of multi-modal tasks, but still face the hallucination phenomena where the generated texts do not align with the given contexts, significantly restricting the usages of LVLMs. Most previous work detects and mitigates hallucination at the coarse-grained level or requires expensive annotation (e.g., labeling by proprietary models or human experts). To address these issues, we propose detecting and mitigating hallucinations in LVLMs via fine-grained AI feedback. The basic idea is that we generate a small-size sentence-level hallucination annotation dataset by proprietary models, whereby we train a hallucination detection model which can perform sentence-level hallucination detection, covering primary hallucination types (i.e., object, attribute, and relationship). Then, we propose a detect-then-rewrite pipeline to automatically construct preference dataset for training hallucination mitigating model. Furthermore, we propose differentiating the severity of hallucinations, and introducing a Hallucination Severity-Aware Direct Preference Optimization (HSA-DPO) for mitigating hallucination in LVLMs by incorporating the severity of hallucinations into preference learning. Extensive experiments demonstrate the effectiveness of our method.

Submitted 22 April, 2024; originally announced April 2024.

arXiv:2404.13804 [pdf, other]

Adaptive Heterogeneous Client Sampling for Federated Learning over Wireless Networks

Authors: Bing Luo, Wenli Xiao, Shiqiang Wang, Jianwei Huang, Leandros Tassiulas

Abstract: Federated learning (FL) algorithms usually sample a fraction of clients in each round (partial participation) when the number of participants is large and the server's communication bandwidth is limited. Recent works on the convergence analysis of FL have focused on unbiased client sampling, e.g., sampling uniformly at random, which suffers from slow wall-clock time for convergence due to high degrees of system heterogeneity and statistical heterogeneity. This paper aims to design an adaptive client sampling algorithm for FL over wireless networks that tackles both system and statistical heterogeneity to minimize the wall-clock convergence time. We obtain a new tractable convergence bound for FL algorithms with arbitrary client sampling probability. Based on the bound, we analytically establish the relationship between the total learning time and sampling probability with an adaptive bandwidth allocation scheme, which results in a non-convex optimization problem. We design an efficient algorithm for learning the unknown parameters in the convergence bound and develop a low-complexity algorithm to approximately solve the non-convex problem. Our solution reveals the impact of system and statistical heterogeneity parameters on the optimal client sampling design. Moreover, our solution shows that as the number of sampled clients increases, the total convergence time first decreases and then increases because a larger sampling number reduces the number of rounds for convergence but results in a longer expected time per-round due to limited wireless bandwidth. Experimental results from both hardware prototype and simulation demonstrate that our proposed sampling scheme significantly reduces the convergence time compared to several baseline sampling schemes.

Submitted 21 April, 2024; originally announced April 2024.

Comments: Published in IEEE Transactions on Mobile Computing (TMC). arXiv admin note: substantial text overlap with arXiv:2112.11256

arXiv:2404.13033 [pdf, other]

Sample Design Engineering: An Empirical Study of What Makes Good Downstream Fine-Tuning Samples for LLMs

Authors: Biyang Guo, He Wang, Wenyilin Xiao, Hong Chen, Zhuxin Lee, Songqiao Han, Hailiang Huang

Abstract: In the burgeoning field of Large Language Models (LLMs) like ChatGPT and LLaMA, Prompt Engineering (PE) is renowned for boosting zero-shot or in-context learning (ICL) through prompt modifications. Yet, the realm of the sample design for downstream fine-tuning, crucial for task-specific LLM adaptation, is largely unexplored. This paper introduces Sample Design Engineering (SDE), a methodical approach to enhancing LLMs' post-tuning performance by refining input, output, and reasoning designs. We conduct a series of in-domain (ID) and out-of-domain (OOD) experiments to assess the impact of various design options on LLMs' downstream performance, revealing several intriguing patterns that hold consistently across different LLMs. Based on these insights, we propose an integrated SDE strategy, combining the most effective options, and validate its consistent superiority over heuristic sample designs in complex downstream tasks like multi-aspect sentiment analysis, event extraction, and nested entity recognition. Additionally, analyses of LLMs' inherent prompt/output perplexity, zero-shot, and ICL abilities illustrate that good PE strategies may not always translate to good SDE strategies. Code available at https://github.com/beyondguo/LLM-Tuning.

Submitted 19 April, 2024; originally announced April 2024.

Comments: 23 pages, 12 figures, 14 tables

arXiv:2404.12728 [pdf, other]

Relevant or Random: Can LLMs Truly Perform Analogical Reasoning?

Authors: Chengwei Qin, Wenhan Xia, Tan Wang, Fangkai Jiao, Yuchen Hu, Bosheng Ding, Ruirui Chen, Shafiq Joty

Abstract: Analogical reasoning is a unique ability of humans to address unfamiliar challenges by transferring strategies from relevant past experiences. One key finding in psychology is that compared with irrelevant past experiences, recalling relevant ones can help humans better handle new tasks. Coincidentally, the NLP community has also recently found that self-generating relevant examples in the context can help large language models (LLMs) better solve a given problem than hand-crafted prompts. However, it is yet not clear whether relevance is the key factor eliciting such capability, i.e., can LLMs benefit more from self-generated relevant examples than irrelevant ones? In this work, we systematically explore whether LLMs can truly perform analogical reasoning on a diverse set of reasoning tasks. With extensive experiments and analysis, we show that self-generated random examples can surprisingly achieve comparable or even better performance, e.g., 4% performance boost on GSM8K with random biological examples. We find that the accuracy of self-generated examples is the key factor and subsequently design two improved methods with significantly reduced inference costs. Overall, we aim to advance a deeper understanding of LLM analogical reasoning and hope this work stimulates further research in the design of self-generated contexts.

Submitted 23 June, 2024; v1 submitted 19 April, 2024; originally announced April 2024.

arXiv:2404.08055 [pdf, other]

Complexity enriched dynamical phases for fermions on graphs

Authors: Wei Xia, Jie Zou, Xiaopeng Li

Abstract: Dynamical quantum phase transitions, encompassing phenomena like many-body localization transitions and measurement-induced phase transitions, are often characterized and identified through the analysis of quantum entanglement. Here, we highlight that the dynamical phases defined by entanglement are further enriched by complexity. We investigate both the entanglement and Krylov complexity for fermions on regular graphs, which can be implemented by systems like $^6$Li atoms confined by optical tweezers. Our investigations unveil that while entanglement follows volume laws on both types of regular graphs with degree $d = 2$ and $d = 3$, the Krylov complexity exhibits distinctive behaviors. We analyze both free fermions and interacting fermions models. In the absence of interaction, both numerical results and theoretical analysis confirm that the dimension of the Krylov space scales as $D\sim N$ for regular graphs of degree $d = 2$ with $N$ sites, and we have $D\sim N^2$ for $d = 3$. The qualitative distinction also persists in interacting fermions on regular graphs. For interacting fermions, our theoretical analyses find the dimension scales as $D\sim 4^{N^\alpha}$ for regular graphs of $d = 2$ with $0.38\leq\alpha\leq0.59$, whereas it scales as $D\sim 4^N$ for $d = 3$. The distinction in the complexity of quantum dynamics for fermions on graphs with different connectivity can be probed in experiments by measuring the out-of-time-order correlators.

Submitted 11 April, 2024; originally announced April 2024.

arXiv:2404.07202 [pdf, other]

UMBRAE: Unified Multimodal Decoding of Brain Signals

Authors: Weihao Xia, Raoul de Charette, Cengiz Öztireli, Jing-Hao Xue

Abstract: We address prevailing challenges of the brain-powered research, departing from the observation that the literature hardly recovers accurate spatial information and requires subject-specific models. To address these challenges, we propose UMBRAE, a unified multimodal decoding of brain signals. First, to extract instance-level conceptual and spatial details from neural signals, we introduce an efficient universal brain encoder for multimodal-brain alignment and recover object descriptions at multiple levels of granularity from subsequent multimodal large language model (MLLM). Second, we introduce a cross-subject training strategy mapping subject-specific features to a common feature space. This allows a model to be trained on multiple subjects without extra resources, even yielding superior results compared to subject-specific models. Further, we demonstrate this supports weakly-supervised adaptation to new subjects, with only a fraction of the total training data. Experiments demonstrate that UMBRAE not only achieves superior results in the newly introduced tasks but also outperforms methods in well established tasks. To assess our method, we construct and share with the community a comprehensive brain understanding benchmark BrainHub. Our code and benchmark are available at https://weihaox.github.io/UMBRAE.

Submitted 10 April, 2024; originally announced April 2024.

Comments: Project Page: https://weihaox.github.io/UMBRAE

arXiv:2404.02507 [pdf, other]

Lifelong Event Detection with Embedding Space Separation and Compaction

Authors: Chengwei Qin, Ruirui Chen, Ruochen Zhao, Wenhan Xia, Shafiq Joty

Abstract: To mitigate forgetting, existing lifelong event detection methods typically maintain a memory module and replay the stored memory data during the learning of a new task. However, the simple combination of memory data and new-task samples can still result in substantial forgetting of previously acquired knowledge, which may occur due to the potential overlap between the feature distribution of new data and the previously learned embedding space. Moreover, the model suffers from overfitting on the few memory samples rather than effectively remembering learned patterns. To address the challenges of forgetting and overfitting, we propose a novel method based on embedding space separation and compaction. Our method alleviates forgetting of previously learned tasks by forcing the feature distribution of new data away from the previous embedding space. It also mitigates overfitting by a memory calibration mechanism that encourages memory data to be close to its prototype to enhance intra-class compactness. In addition, the learnable parameters of the new task are initialized by drawing upon acquired knowledge from the previously learned task to facilitate forward knowledge transfer. With extensive experiments, we demonstrate that our method can significantly outperform previous state-of-the-art approaches.

Submitted 3 April, 2024; originally announced April 2024.

Comments: NAACL 2024 main conference

arXiv:2404.00881 [pdf, other]

Auxiliary-Variable Adaptive Control Lyapunov Barrier Functions for Spatio-Temporally Constrained Safety-Critical Applications

Authors: Shuo Liu, Wei Xiao, Calin A. Belta

Abstract: Recent work has shown that stabilizing an affine control system while optimizing a quadratic cost subject to state and control constraints can be mapped to a sequence of Quadratic Programs (QPs) using Control Barrier Functions (CBFs) and Control Lyapunov Functions (CLFs). One of the main challenges in this method is that the QPs could easily become infeasible under safety and spatio-temporal constraints with tight control bounds. In our own recent work, we defined Auxiliary-Variable Adaptive CBFs (AVCBFs) to improve the feasibility of the CBF-based QP, while avoiding extensive parameter tuning. In this paper, we consider spatio-temporal constraints as finite-time reachability requirements. In order to satisfy these requirements, we generalize AVCBFs to Auxiliary-Variable Adaptive Control Lyapunov Barrier Functions (AVCLBFs) that work for systems and constraints with arbitrary relative degrees. We show that our method has fewer conflicts with safety and input constraints, and outperforms the state of the art in terms of adaptivity and feasibility in solving the QP. We illustrate our approach on an optimal control problem for a unicycle.

Submitted 31 March, 2024; originally announced April 2024.

Comments: 8 pages, 4 figures. arXiv admin note: text overlap with arXiv:2310.00238

arXiv:2403.16006 [pdf, other]

Crypto Inverse-Power Options and Fractional Stochastic Volatility

Authors: Boyi Li, Weixuan Xia

Abstract: Recent empirical evidence has highlighted the crucial role of jumps in both price and volatility within the cryptocurrency market. In this paper, we introduce an analytical model framework featuring fractional stochastic volatility, accommodating price--volatility co-jumps and volatility short-term dependency concurrently. We particularly focus on inverse options, including the emerging Quanto inverse options and their power-type generalizations, aimed at mitigating cryptocurrency exchange rate risk and adjusting inherent risk exposure. Characteristic function-based pricing--hedging formulas are derived for these inverse options. The general model framework is then applied to asymmetric Laplace jump-diffusions and Gaussian-mixed tempered stable-type processes, employing three types of fractional kernels, for an extensive empirical analysis involving model calibration on two independent Bitcoin options data sets, during and after the COVID-19 pandemic. Key insights from our theoretical analysis and empirical findings include: (1) the superior performance of fractional stochastic-volatility models compared to various benchmark models, including those incorporating jumps and stochastic volatility, (2) the practical necessity of jumps in both price and volatility, along with their co-jumps and rough volatility, in the cryptocurrency market, (3) stability of calibrated parameter values in line with stylized facts, and (4) the suggestion that a piecewise kernel offers much higher computational efficiency relative to the commonly used Riemann--Liouville kernel in constructing fractional models, yet maintaining the same accuracy level, thanks to its potential for obtaining explicit model characteristic functions.

Submitted 24 March, 2024; originally announced March 2024.

Comments: 42 pages, 2 tables, 5 figures

MSC Class: 60G22; 60G51; 60E10

arXiv:2403.12959 [pdf, other]

WHAC: World-grounded Humans and Cameras

Authors: Wanqi Yin, Zhongang Cai, Ruisi Wang, Fanzhou Wang, Chen Wei, Haiyi Mei, Weiye Xiao, Zhitao Yang, Qingping Sun, Atsushi Yamashita, Ziwei Liu, Lei Yang

Abstract: Estimating human and camera trajectories with accurate scale in the world coordinate system from a monocular video is a highly desirable yet challenging and ill-posed problem. In this study, we aim to recover expressive parametric human models (i.e., SMPL-X) and corresponding camera poses jointly, by leveraging the synergy between three critical players: the world, the human, and the camera. Our approach is founded on two key observations. Firstly, camera-frame SMPL-X estimation methods readily recover absolute human depth. Secondly, human motions inherently provide absolute spatial cues. By integrating these insights, we introduce a novel framework, referred to as WHAC, to facilitate world-grounded expressive human pose and shape estimation (EHPS) alongside camera pose estimation, without relying on traditional optimization techniques. Additionally, we present a new synthetic dataset, WHAC-A-Mole, which includes accurately annotated humans and cameras, and features diverse interactive human motions as well as realistic camera trajectories. Extensive experiments on both standard and newly established benchmarks highlight the superiority and efficacy of our framework. We will make the code and dataset publicly available.

Submitted 19 March, 2024; originally announced March 2024.

Comments: Homepage: https://wqyin.github.io/projects/WHAC/

Showing 1–50 of 606 results for author: Xiao, W