-
Search for the rare $Λ_c^+ \to p μ^+ μ^-$ decay
Authors:
LHCb collaboration,
R. Aaij,
A. S. W. Abdelmotteleb,
C. Abellan Beteta,
F. Abudinén,
T. Ackernley,
A. A. Adefisoye,
B. Adeva,
M. Adinolfi,
P. Adlarson,
C. Agapopoulou,
C. A. Aidala,
Z. Ajaltouni,
S. Akar,
K. Akiba,
P. Albicocco,
J. Albrecht,
F. Alessio,
M. Alexander,
Z. Aliouche,
P. Alvarez Cartelle,
R. Amalric,
S. Amato,
J. L. Amey,
Y. Amhis
, et al. (1062 additional authors not shown)
Abstract:
A search for the nonresonant $Λ_c^+ \to p μ^+ μ^-$ decay is performed using proton-proton collision data recorded at a centre-of-mass energy of 13 TeV by the LHCb experiment, corresponding to an integrated luminosity of 5.4 fb$^{-1}$. No evidence for the decay is found in the dimuon invariant-mass regions where the expected contribution of resonances is subdominant. The upper limit on the branching fraction of the $Λ_c^+ \to p μ^+ μ^-$ decay is determined to be $2.9~(3.2) \times 10^{-8}$ at 90% (95%) confidence level. The branching fractions in the dimuon invariant-mass regions dominated by the $η$, $ρ$ and $ω$ resonances are also determined.
Submitted 16 July, 2024;
originally announced July 2024.
-
DeepGate3: Towards Scalable Circuit Representation Learning
Authors:
Zhengyuan Shi,
Ziyang Zheng,
Sadaf Khan,
Jianyuan Zhong,
Min Li,
Qiang Xu
Abstract:
Circuit representation learning has shown promising results in advancing the field of Electronic Design Automation (EDA). Existing models, such as DeepGate Family, primarily utilize Graph Neural Networks (GNNs) to encode circuit netlists into gate-level embeddings. However, the scalability of GNN-based models is fundamentally constrained by architectural limitations, impacting their ability to generalize across diverse and complex circuit designs. To address these challenges, we introduce DeepGate3, an enhanced architecture that integrates Transformer modules following the initial GNN processing. This novel architecture not only retains the robust gate-level representation capabilities of its predecessor, DeepGate2, but also enhances them with the ability to model subcircuits through a novel pooling transformer mechanism. DeepGate3 is further refined with multiple innovative supervision tasks, significantly enhancing its learning process and enabling superior representation of both gate-level and subcircuit structures. Our experiments demonstrate marked improvements in scalability and generalizability over traditional GNN-based approaches, establishing a significant step forward in circuit representation learning technology.
Submitted 14 July, 2024;
originally announced July 2024.
-
SEMINAR: Search Enhanced Multi-modal Interest Network and Approximate Retrieval for Lifelong Sequential Recommendation
Authors:
Kaiming Shen,
Xichen Ding,
Zixiang Zheng,
Yuqi Gong,
Qianqian Li,
Zhongyi Liu,
Guannan Zhang
Abstract:
The modeling of users' behaviors is crucial in modern recommendation systems. A lot of research focuses on modeling users' lifelong sequences, which can be extremely long and sometimes exceed thousands of items. These models use the target item to search for the most relevant items from the historical sequence. However, training lifelong sequences in click-through rate (CTR) prediction or personalized search ranking (PSR) is extremely difficult due to the insufficient learning problem of ID embedding, especially when the IDs in the lifelong sequence features do not exist in the samples of the training dataset. Additionally, existing target attention mechanisms struggle to learn the multi-modal representations of items in the sequence well. The distributions of the multi-modal embedding outputs (text, image and attributes) of a user's interacted items are not properly aligned, and divergence exists across modalities. We also observe that users' search query sequences and item browsing sequences can fully depict users' intents and benefit from each other. To address these challenges, we propose a unified lifelong multi-modal sequence model called SEMINAR: Search Enhanced Multi-Modal Interest Network and Approximate Retrieval. Specifically, a network called Pretraining Search Unit (PSU) learns the lifelong sequences of multi-modal query-item pairs in a pretraining-finetuning manner with multiple objectives: multi-modal alignment, next query-item pair prediction, query-item relevance prediction, etc. After pretraining, the downstream model restores the pretrained embedding as initialization and finetunes the network. To accelerate the online retrieval speed of multi-modal embedding, we propose a multi-modal codebook-based product quantization strategy to approximate the exact attention calculati
Submitted 15 July, 2024;
originally announced July 2024.
-
Uncertainty Quantification in Reduced-Order Gas-Phase Atmospheric Chemistry Modeling using Ensemble SINDy
Authors:
Lin Guo,
Xiaokai Yang,
Zhonghua Zheng,
Nicole Riemer,
Christopher W. Tessum
Abstract:
Uncertainty quantification during atmospheric chemistry modeling is computationally expensive as it typically requires a large number of simulations using complex models. As large-scale modeling is typically performed with simplified chemical mechanisms for computational tractability, we describe a probabilistic surrogate modeling method using principal components analysis (PCA) and Ensemble Sparse Identification of Nonlinear Dynamics (E-SINDy) to both automatically simplify a gas-phase chemistry mechanism and to quantify the uncertainty introduced when doing so. We demonstrate the application of this method on a small photochemical box model for ozone formation. With 100 ensemble members, the calibration $R$-squared value is 0.96 among the three latent species on average and 0.98 for ozone, demonstrating that predicted model uncertainty aligns well with actual model error. In addition to uncertainty quantification, this probabilistic method also improves accuracy as compared to an equivalent deterministic version, by $\sim$60% for the ensemble prediction mean or $\sim$50% for deterministic prediction by the best-performing single ensemble member. Overall, the ozone testing root mean square error (RMSE) is 15.1% of its root mean square (RMS) concentration. Although our probabilistic ensemble simulation ends up being slower than the reference model it emulates, we expect that use of a more complex reference model in future work will result in additional opportunities for acceleration. Versions of this approach applied to full-scale chemical mechanisms may result in improved uncertainty quantification in models of atmospheric composition, leading to enhanced atmospheric understanding and improved support for air quality control and regulation.
Submitted 12 July, 2024;
originally announced July 2024.
-
Navi2Gaze: Leveraging Foundation Models for Navigation and Target Gazing
Authors:
Jun Zhu,
Zihao Du,
Haotian Xu,
Fengbo Lan,
Zilong Zheng,
Bo Ma,
Shengjie Wang,
Tao Zhang
Abstract:
Task-aware navigation continues to be a challenging area of research, especially in scenarios involving open vocabulary. Previous studies primarily focus on finding suitable locations for task completion, often overlooking the importance of the robot's pose. However, the robot's orientation is crucial for successfully completing tasks because of how objects are arranged (e.g., to open a refrigerator door). Humans intuitively navigate to objects with the right orientation using semantics and common sense. For instance, when opening a refrigerator, we naturally stand in front of it rather than to the side. Recent advances suggest that Vision-Language Models (VLMs) can provide robots with similar common sense. Therefore, we develop a VLM-driven method called Navigation-to-Gaze (Navi2Gaze) for efficient navigation and object gazing based on task descriptions. This method uses the VLM to score and select the best pose from numerous candidates automatically. In evaluations on multiple photorealistic simulation benchmarks, Navi2Gaze significantly outperforms existing approaches and precisely determines the optimal orientation relative to target objects.
Submitted 12 July, 2024;
originally announced July 2024.
-
HPC: Hierarchical Progressive Coding Framework for Volumetric Video
Authors:
Zihan Zheng,
Houqiang Zhong,
Qiang Hu,
Xiaoyun Zhang,
Li Song,
Ya Zhang,
Yanfeng Wang
Abstract:
Volumetric video based on Neural Radiance Field (NeRF) holds vast potential for various 3D applications, but its substantial data volume poses significant challenges for compression and transmission. Current NeRF compression lacks the flexibility to adjust video quality and bitrate within a single model for various network and device capacities. To address these issues, we propose HPC, a novel hierarchical progressive volumetric video coding framework achieving variable bitrate using a single model. Specifically, HPC introduces a hierarchical representation with a multi-resolution residual radiance field to reduce temporal redundancy in long-duration sequences while simultaneously generating various levels of detail. Then, we propose an end-to-end progressive learning approach with a multi-rate-distortion loss function to jointly optimize both hierarchical representation and compression. Our HPC trained only once can realize multiple compression levels, while the current methods need to train multiple fixed-bitrate models for different rate-distortion (RD) tradeoffs. Extensive experiments demonstrate that HPC achieves flexible quality levels with variable bitrate by a single model and exhibits competitive RD performance, even outperforming fixed-bitrate models across various datasets.
Submitted 12 July, 2024;
originally announced July 2024.
-
Machine Learning in High Volume Media Manufacturing
Authors:
Siddarth Reddy Karuka,
Abhinav Sunderrajan,
Zheng Zheng,
Yong Woon Tiean,
Ganesh Nagappan,
Allan Luk
Abstract:
Errors or failures in a high-volume manufacturing environment can have significant impact that can result in both the loss of time and money. Identifying such failures early has been a top priority for manufacturing industries and various rule-based algorithms have been developed over the years. However, catching these failures is time consuming and such algorithms cannot adapt well to changes in designs, and sometimes variations in everyday behavior. More importantly, the number of units to monitor in a high-volume manufacturing environment is too big for manual monitoring or for a simple program. Here we develop a novel program that combines both rule-based decisions and machine learning models that can not only learn and adapt to such day-to-day variations or long-term design changes, but also can be applied at scale to the high number of manufacturing units in use today. Using the current state-of-the-art technologies, we then deploy this program at-scale to handle the needs of ever-increasing demand from the manufacturing environment.
Submitted 11 July, 2024;
originally announced July 2024.
-
Approaching Outside: Scaling Unsupervised 3D Object Detection from 2D Scene
Authors:
Ruiyang Zhang,
Hu Zhang,
Hang Yu,
Zhedong Zheng
Abstract:
Unsupervised 3D object detection aims to accurately detect objects in unstructured environments with no explicit supervisory signals. This task, given sparse LiDAR point clouds, often results in compromised performance for detecting distant or small objects due to the inherent sparsity and limited spatial resolution. In this paper, we make one of the early attempts to integrate LiDAR data with 2D images for unsupervised 3D detection and introduce a new method, dubbed LiDAR-2D Self-paced Learning (LiSe). We argue that RGB images serve as a valuable complement to LiDAR data, offering precise 2D localization cues, particularly when only scarce LiDAR points are available for certain objects. Considering the unique characteristics of both modalities, our framework devises a self-paced learning pipeline that incorporates adaptive sampling and weak model aggregation strategies. The adaptive sampling strategy dynamically tunes the distribution of pseudo labels during training, countering the tendency of models to overfit easily detected samples, such as nearby and large-sized objects. By doing so, it ensures a balanced learning trajectory across varying object scales and distances. The weak model aggregation component consolidates the strengths of models trained under different pseudo label distributions, culminating in a robust and powerful final model. Experimental evaluations validate the efficacy of our proposed LiSe method, showing significant improvements of +7.1% AP$_{BEV}$ and +3.4% AP$_{3D}$ on nuScenes, and +8.3% AP$_{BEV}$ and +7.4% AP$_{3D}$ on Lyft compared to existing techniques.
Submitted 11 July, 2024;
originally announced July 2024.
-
MeshAvatar: Learning High-quality Triangular Human Avatars from Multi-view Videos
Authors:
Yushuo Chen,
Zerong Zheng,
Zhe Li,
Chao Xu,
Yebin Liu
Abstract:
We present a novel pipeline for learning high-quality triangular human avatars from multi-view videos. Recent methods for avatar learning are typically based on neural radiance fields (NeRF), which are not compatible with the traditional graphics pipeline and pose great challenges for operations like editing or synthesizing under different environments. To overcome these limitations, our method represents the avatar with an explicit triangular mesh extracted from an implicit SDF field, complemented by an implicit material field conditioned on given poses. Leveraging this triangular avatar representation, we incorporate physics-based rendering to accurately decompose geometry and texture. To enhance both the geometric and appearance details, we further employ a 2D UNet as the network backbone and introduce pseudo normal ground-truth as additional supervision. Experiments show that our method can learn triangular avatars with high-quality geometry reconstruction and plausible material decomposition, inherently supporting editing, manipulation or relighting operations.
Submitted 11 July, 2024;
originally announced July 2024.
-
SGLC: Semantic Graph-Guided Coarse-Fine-Refine Full Loop Closing for LiDAR SLAM
Authors:
Neng Wang,
Xieyuanli Chen,
Chenghao Shi,
Zhiqiang Zheng,
Hongshan Yu,
Huimin Lu
Abstract:
Loop closing is a crucial component in SLAM that helps eliminate accumulated errors through two main steps: loop detection and loop pose correction. The first step determines whether loop closing should be performed, while the second estimates the 6-DoF pose to correct odometry drift. Current methods mostly focus on developing robust descriptors for loop closure detection, often neglecting loop pose estimation. A few methods that do include pose estimation either suffer from low accuracy or incur high computational costs. To tackle this problem, we introduce SGLC, a real-time semantic graph-guided full loop closing method, with robust loop closure detection and 6-DoF pose estimation capabilities. SGLC takes into account the distinct characteristics of foreground and background points. For foreground instances, it builds a semantic graph that not only abstracts point cloud representation for fast descriptor generation and matching but also guides the subsequent loop verification and initial pose estimation. Background points, meanwhile, are exploited to provide more geometric features for scan-wise descriptor construction and stable planar information for further pose refinement. Loop pose estimation employs a coarse-fine-refine registration scheme that considers the alignment of both instance points and background points, offering high efficiency and accuracy. We evaluate the loop closing performance of SGLC through extensive experiments on the KITTI and KITTI-360 datasets, demonstrating its superiority over existing state-of-the-art methods. Additionally, we integrate SGLC into a SLAM system, eliminating accumulated errors and improving overall SLAM performance. The implementation of SGLC will be released at https://github.com/nubot-nudt/SGLC.
Submitted 10 July, 2024;
originally announced July 2024.
-
GeoNLF: Geometry guided Pose-Free Neural LiDAR Fields
Authors:
Weiyi Xue,
Zehan Zheng,
Fan Lu,
Haiyun Wei,
Guang Chen,
Changjun Jiang
Abstract:
Although recent efforts have extended Neural Radiance Fields (NeRF) into LiDAR point cloud synthesis, the majority of existing works exhibit a strong dependence on precomputed poses. However, point cloud registration methods struggle to achieve precise global pose estimation, whereas previous pose-free NeRFs overlook geometric consistency in global reconstruction. In light of this, we explore the geometric insights of point clouds, which provide explicit registration priors for reconstruction. Based on this, we propose Geometry guided Neural LiDAR Fields (GeoNLF), a hybrid framework that alternates between global neural reconstruction and pure geometric pose optimization. Furthermore, NeRFs tend to overfit individual frames and easily get stuck in local minima under sparse-view inputs. To tackle this issue, we develop a selective-reweighting strategy and introduce geometric constraints for robust optimization. Extensive experiments on NuScenes and KITTI-360 datasets demonstrate the superiority of GeoNLF in both novel view synthesis and multi-view registration of low-frequency large-scale point clouds.
Submitted 8 July, 2024;
originally announced July 2024.
-
Fine-grained Dynamic Network for Generic Event Boundary Detection
Authors:
Ziwei Zheng,
Lijun He,
Le Yang,
Fan Li
Abstract:
Generic event boundary detection (GEBD) aims at pinpointing event boundaries naturally perceived by humans, playing a crucial role in understanding long-form videos. Given the diverse nature of generic boundaries, spanning different video appearances, objects, and actions, this task remains challenging. Existing methods usually detect various boundaries by the same protocol, regardless of their distinctive characteristics and detection difficulties, resulting in suboptimal performance. Intuitively, a more intelligent and reasonable way is to adaptively detect boundaries by considering their special properties. In light of this, we propose a novel dynamic pipeline for generic event boundaries named DyBDet. By introducing a multi-exit network architecture, DyBDet automatically learns the subnet allocation to different video snippets, enabling fine-grained detection for various boundaries. Besides, a multi-order difference detector is also proposed to ensure generic boundaries can be effectively identified and adaptively processed. Extensive experiments on the challenging Kinetics-GEBD and TAPOS datasets demonstrate that adopting the dynamic strategy significantly benefits GEBD tasks, leading to obvious improvements in both performance and efficiency compared to the current state-of-the-art.
Submitted 5 July, 2024;
originally announced July 2024.
-
DyFADet: Dynamic Feature Aggregation for Temporal Action Detection
Authors:
Le Yang,
Ziwei Zheng,
Yizeng Han,
Hao Cheng,
Shiji Song,
Gao Huang,
Fan Li
Abstract:
Recently proposed neural network-based Temporal Action Detection (TAD) models are inherently limited in extracting discriminative representations and modeling action instances of various lengths from complex scenes by their shared-weights detection heads. Inspired by the successes in dynamic neural networks, in this paper, we build a novel dynamic feature aggregation (DFA) module that can simultaneously adapt kernel weights and receptive fields at different timestamps. Based on DFA, the proposed dynamic encoder layer aggregates the temporal features within the action time ranges and guarantees the discriminability of the extracted representations. Moreover, using DFA helps to develop a Dynamic TAD head (DyHead), which adaptively aggregates the multi-scale features with adjusted parameters and learned receptive fields to better detect the action instances with diverse ranges from videos. With the proposed encoder layer and DyHead, a new dynamic TAD model, DyFADet, achieves promising performance on a series of challenging TAD benchmarks, including HACS-Segment, THUMOS14, ActivityNet-1.3, Epic-Kitchen 100, Ego4D-Moment QueriesV1.0, and FineAction. Code is released at https://github.com/yangle15/DyFADet-pytorch.
Submitted 3 July, 2024;
originally announced July 2024.
-
Normalization and effective learning rates in reinforcement learning
Authors:
Clare Lyle,
Zeyu Zheng,
Khimya Khetarpal,
James Martens,
Hado van Hasselt,
Razvan Pascanu,
Will Dabney
Abstract:
Normalization layers have recently experienced a renaissance in the deep reinforcement learning and continual learning literature, with several works highlighting diverse benefits such as improving loss landscape conditioning and combatting overestimation bias. However, normalization brings with it a subtle but important side effect: an equivalence between growth in the norm of the network parameters and decay in the effective learning rate. This becomes problematic in continual learning settings, where the resulting effective learning rate schedule may decay to near zero too quickly relative to the timescale of the learning problem. We propose to make the learning rate schedule explicit with a simple re-parameterization which we call Normalize-and-Project (NaP), which couples the insertion of normalization layers with weight projection, ensuring that the effective learning rate remains constant throughout training. This technique reveals itself as a powerful analytical tool to better understand learning rate schedules in deep reinforcement learning, and as a means of improving robustness to nonstationarity in synthetic plasticity loss benchmarks along with both the single-task and sequential variants of the Arcade Learning Environment. We also show that our approach can be easily applied to popular architectures such as ResNets and transformers while recovering and in some cases even slightly improving the performance of the base model in common stationary benchmarks.
Submitted 1 July, 2024;
originally announced July 2024.
-
Beyond Functional Correctness: Investigating Coding Style Inconsistencies in Large Language Models
Authors:
Yanlin Wang,
Tianyue Jiang,
Mingwei Liu,
Jiachi Chen,
Zibin Zheng
Abstract:
Large language models (LLMs) have brought a paradigm shift to the field of code generation, offering the potential to enhance the software development process. However, previous research mainly focuses on the accuracy of code generation, while coding style differences between LLMs and human developers remain under-explored. In this paper, we empirically analyze the differences in coding style between the code generated by mainstream Code LLMs and the code written by human developers, and summarize a taxonomy of coding style inconsistencies. Specifically, we first summarize the types of coding style inconsistencies by manually analyzing a large number of generation results. We then compare the code generated by Code LLMs with the code written by human programmers in terms of readability, conciseness, and robustness. The results reveal that LLMs and developers have different coding styles. Additionally, we study the possible causes of these inconsistencies and provide some solutions to alleviate the problem.
Submitted 29 June, 2024;
originally announced July 2024.
-
Language-Guided Object-Centric Diffusion Policy for Collision-Aware Robotic Manipulation
Authors:
Hang Li,
Qian Feng,
Zhi Zheng,
Jianxiang Feng,
Alois Knoll
Abstract:
Learning from demonstrations faces challenges in generalizing beyond the training data and is fragile even to slight visual variations. To tackle this problem, we introduce Lan-o3dp, a language-guided object-centric diffusion policy that takes a 3D representation of task-relevant objects as conditional input and can be guided by a cost function for safety constraints at inference time. Lan-o3dp enables strong generalization in various aspects, such as background changes and visual ambiguity, and can avoid novel obstacles that are unseen during the demonstration process. Specifically, we first train a diffusion policy conditioned on point clouds of target objects and then harness a large language model to decompose the user instruction into task-related units consisting of target objects and obstacles, which can be used as visual observation for the policy network or converted to a cost function, guiding the generation of trajectories towards collision-free regions at test time. Our proposed method shows training efficiency and higher success rates compared with the baselines in simulation experiments. In real-world experiments, our method exhibits strong generalization performance towards unseen instances, cluttered scenes, and scenes of multiple similar objects, and demonstrates training-free obstacle avoidance capability.
Submitted 4 July, 2024; v1 submitted 29 June, 2024;
originally announced July 2024.
-
UDC: A Unified Neural Divide-and-Conquer Framework for Large-Scale Combinatorial Optimization Problems
Authors:
Zhi Zheng,
Changliang Zhou,
Tong Xialiang,
Mingxuan Yuan,
Zhenkun Wang
Abstract:
Single-stage neural combinatorial optimization solvers have achieved near-optimal results on various small-scale combinatorial optimization (CO) problems without needing expert knowledge. However, these solvers exhibit significant performance degradation when applied to large-scale CO problems. Recently, two-stage neural methods with divide-and-conquer strategies have shown superiorities in addressing large-scale CO problems. Nevertheless, the efficiency of these methods highly relies on problem-specific heuristics in either the divide or the conquer procedure, which limits their applicability to general CO problems. Moreover, these methods employ separate training schemes and ignore the interdependencies between the dividing and conquering strategies, which often leads to sub-optimal solutions. To tackle these drawbacks, this article develops a unified neural divide-and-conquer framework (i.e., UDC) for solving general large-scale CO problems. UDC offers a Divide-Conquer-Reunion (DCR) training method to eliminate the negative impact of a sub-optimal dividing policy. Employing a high-efficiency Graph Neural Network (GNN) for global dividing and a fixed-length sub-path solver for conquering sub-problems, the proposed UDC framework demonstrates extensive applicability, achieving superior performance in 10 representative large-scale CO problems.
△ Less
Submitted 29 June, 2024;
originally announced July 2024.
-
A Survey on Failure Analysis and Fault Injection in AI Systems
Authors:
Guangba Yu,
Gou Tan,
Haojia Huang,
Zhenyu Zhang,
Pengfei Chen,
Roberto Natella,
Zibin Zheng
Abstract:
The rapid advancement of Artificial Intelligence (AI) has led to its integration into various areas, especially with Large Language Models (LLMs) significantly enhancing capabilities in Artificial Intelligence Generated Content (AIGC). However, the complexity of AI systems has also exposed their vulnerabilities, necessitating robust methods for failure analysis (FA) and fault injection (FI) to ensure resilience and reliability. Despite the importance of these techniques, a comprehensive review of FA and FI methodologies in AI systems is lacking. This study fills this gap by presenting a detailed survey of existing FA and FI approaches across six layers of AI systems. We systematically analyze 160 papers and repositories to answer three research questions: (1) what are the prevalent failures in AI systems, (2) what types of faults can current FI tools simulate, and (3) what gaps exist between the simulated faults and real-world failures. Our findings reveal a taxonomy of AI system failures, assess the capabilities of existing FI tools, and highlight discrepancies between real-world and simulated failures. Moreover, this survey contributes to the field by providing a framework for fault diagnosis, evaluating the state-of-the-art in FI, and identifying areas for improvement in FI techniques to enhance the resilience of AI systems.
Submitted 27 June, 2024;
originally announced July 2024.
-
Multiple Kronecker RLS fusion-based link propagation for drug-side effect prediction
Authors:
Yuqing Qian,
Ziyu Zheng,
Prayag Tiwari,
Yijie Ding,
Quan Zou
Abstract:
Drug-side effect prediction has become an essential area of research in the field of pharmacology. As the use of medications continues to rise, so does the importance of understanding and mitigating the potential risks associated with them. At present, researchers have turned to data-driven methods to predict drug-side effects. Drug-side effect prediction is a link prediction problem, and the related data can be described from various perspectives. To process these kinds of data, a multi-view method, called Multiple Kronecker RLS fusion-based link propagation (MKronRLSF-LP), is proposed. MKronRLSF-LP extends the Kron-RLS by finding the consensus partitions and multiple graph Laplacian constraints in the multi-view setting. Both of these multi-view settings contribute to a higher quality result. Extensive experiments have been conducted on drug-side effect datasets, and our empirical results provide evidence that our approach is effective and robust.
Submitted 27 June, 2024;
originally announced July 2024.
-
FAST survey of H I and OH absorption towards extragalactic radio sources
Authors:
Yogesh Chandola,
D. J. Saikia,
Yin-Zhe Ma,
Zheng Zheng,
Chao-Wei Tsai,
Di Li,
Denis Tramonte,
Hengxing Pan
Abstract:
Neutral atomic hydrogen and molecular gas in the host galaxies of radio active galactic nuclei (AGN) can be traced using H I 21-cm and OH-1667 MHz absorption lines to understand the fueling and feedback processes. We present the results of an H I and OH absorption survey with the Five-hundred-meter Aperture Spherical radio Telescope (FAST) towards 40 radio sources of low-intermediate radio luminosity ($\sim$10$^{23}$-10$^{26}$ W Hz$^{-1}$ at 1.4 GHz), red mid-infrared color (W2[4.6 $μ$m]$-$W3[12 $μ$m] $>$ 2.5 mag) and redshift up to 0.35. From 13 sources with good data at H I observing frequencies, we report the detection of H I absorption towards 8 sources, 5 of which are new detections including 4 in the redshift range 0.25 to 0.35. Our detection rates are consistent with our previous results with dependence on the star-formation history of the host galaxy reflected in the mid-infrared \textit{WISE} W2$-$W3 colors and the compactness of the radio source. We find no significant dependence of detection rates on radio luminosity or redshift. We also find that H I column densities are anti-correlated with the low-frequency spectral indices ($α_{\rm 150 MHz}^{\rm 1.4 GHz}$, $S_ν\propto ν^{-α}$). We do not have any detection from 23 sources with good data at OH observing frequencies. However, by stacking the spectra we estimate the 3$σ$ upper limit of OH column density to be 2.27$\times$10$^{14}$$T_{\rm ex}$/10 K $\times$1/$f_{\rm c}$ cm$^{-2}$. By stacking the OH spectra for 7 associated H I absorbers, we get a 3$σ$ upper limit of 3.47$\times$10$^{14}$ $T_{\rm ex}$/10 K $\times$1/$f_{\rm c}$ cm$^{-2}$ on OH column density and 1.78$\times$10$^{-7}$ on [OH]/[H I] ratio.
Submitted 28 June, 2024;
originally announced June 2024.
-
Chandra detects low-luminosity AGN with $M_\mathrm{BH}=10^{4}-10^{6}~M_\mathrm{\odot}$ in nearby ($z<0.5$), dwarf and star-forming galaxies
Authors:
Mainak Singha,
Julissa Sarmiento,
Sangeeta Malhotra,
James E. Rhoads,
L. Y. Aaron Yung,
Junxian Wang,
Zhen-Ya Zheng,
Ruqiu Lin,
Keunho Kim,
Jialai Kang,
Santosh Harish
Abstract:
We searched the Chandra and XMM archives for observations of 900 green pea galaxies to find AGN signatures. Green peas are low-mass galaxies with prominent emission lines, similar in size and star formation rate to high-redshift dwarf galaxies. Of the 29 observations found, 9 show X-ray detections with $S/N>3$. The 2-10 keV X-ray luminosity for these 9 sources exceeds $10^{40}~\mathrm{erg~s}^{-1}$, with 2 sources exceeding $10^{41}~\mathrm{erg~s}^{-1}$, suggesting the presence of intermediate-mass black holes (IMBH) or low-luminosity AGN (LLAGN) with BH masses between $100-10^6M_\mathrm{\odot}$. All X-ray detected sources (plus 6 additional sources) show He~II$\lambda4686$ emission and a broad component of the H$α$ emission line, indicating winds. The line widths of the broad H$α$ and He II$\lambda4686$ emitting gas clouds are weakly correlated ($R^{2}=0.15$), suggesting He II$\lambda4686$ emission is inconsistent with winds from super-Eddington accretors. However, the ratio of X-ray luminosity to star formation rate shows an anti-correlation with metallicity in 5 out of 9 X-ray detected sources, implying ultraluminous X-ray sources are key contributors to the observed X-ray luminosity. This could be due to super-Eddington accretors or IMBH. The X-ray emission is much higher than that produced by Wolf-Rayet stars and supernovae-driven winds. Thus, the X-ray luminosity in these 9 sources can only be explained by black holes with masses over $100~M_\mathrm{\odot}$. Our findings suggest the presence of LLAGN in these galaxies, with broad H$α$ line widths implying BH masses of $10^4-10^6M_\mathrm{\odot}$. Given Green Peas' role as significant Lyman Continuum leakers, LLAGN in these galaxies could have contributed significantly to cosmic reionization.
Submitted 26 June, 2024;
originally announced June 2024.
-
AI-native Memory: A Pathway from LLMs Towards AGI
Authors:
Jingbo Shang,
Zai Zheng,
Xiang Ying,
Felix Tao,
Mindverse Team
Abstract:
Large language models (LLMs) have shown the world the sparks of artificial general intelligence (AGI). One opinion, especially from some startups working on LLMs, argues that an LLM with nearly unlimited context length can realize AGI. However, they might be too optimistic about the long-context capability of (existing) LLMs -- (1) recent literature has shown that their effective context length is significantly smaller than their claimed context length; and (2) our reasoning-in-a-haystack experiments further demonstrate that simultaneously finding the relevant information from a long context and conducting (simple) reasoning is nearly impossible. In this paper, we envision a pathway from LLMs to AGI through the integration of \emph{memory}. We believe that AGI should be a system where LLMs serve as core processors. In addition to raw data, the memory in this system would store a large number of important conclusions derived from reasoning processes. Compared with retrieval-augmented generation (RAG), which merely processes raw data, this approach not only connects semantically related information more closely, but also simplifies complex inferences at the time of querying. As an intermediate stage, the memory will likely be in the form of natural language descriptions, which can be directly consumed by users too. Ultimately, every agent/person should have its own large personal model, a deep neural network model (thus \emph{AI-native}) that parameterizes and compresses all types of memory, even those that cannot be described in natural language. Finally, we discuss the significant potential of AI-native memory as the transformative infrastructure for (proactive) engagement, personalization, distribution, and social in the AGI era, as well as the incurred privacy and security challenges with preliminary solutions.
Submitted 26 June, 2024;
originally announced June 2024.
-
Changen2: Multi-Temporal Remote Sensing Generative Change Foundation Model
Authors:
Zhuo Zheng,
Stefano Ermon,
Dongjun Kim,
Liangpei Zhang,
Yanfei Zhong
Abstract:
Our understanding of the temporal dynamics of the Earth's surface has been advanced by deep vision models, which often require lots of labeled multi-temporal images for training. However, collecting, preprocessing, and annotating multi-temporal remote sensing images at scale is non-trivial since it is expensive and knowledge-intensive. In this paper, we present change data generators based on generative models, which are cheap and automatic, alleviating these data problems. Our main idea is to simulate a stochastic change process over time. We describe the stochastic change process as a probabilistic graphical model (GPCM), which factorizes the complex simulation problem into two more tractable sub-problems, i.e., change event simulation and semantic change synthesis. To solve these two problems, we present Changen2, a GPCM with a resolution-scalable diffusion transformer which can generate time series of images and their semantic and change labels from labeled or unlabeled single-temporal images. Changen2 is a generative change foundation model that can be trained at scale via self-supervision, and can produce change supervisory signals from unlabeled single-temporal images. Unlike existing foundation models, Changen2 synthesizes change data to train task-specific foundation models for change detection. The resulting model possesses inherent zero-shot change detection capabilities and excellent transferability. Experiments suggest Changen2 has superior spatiotemporal scalability, e.g., Changen2 model trained on 256$^2$ pixel single-temporal images can yield time series of any length and resolutions of 1,024$^2$ pixels. Changen2 pre-trained models exhibit superior zero-shot performance (narrowing the performance gap to 3% on LEVIR-CD and approximately 10% on both S2Looking and SECOND, compared to fully supervised counterparts) and transferability across multiple types of change tasks.
Submitted 25 June, 2024;
originally announced June 2024.
-
PANDA: A self-driving lab for studying electrodeposited polymer films
Authors:
Harley Quinn,
Gregory A. Robben,
Zhaoyi Zheng,
Alan L. Gardner,
Jörg G. Werner,
Keith A. Brown
Abstract:
We introduce the polymer analysis and discovery array (PANDA), an automated system for high-throughput electrodeposition and functional characterization of polymer films. The PANDA is a custom, modular, and low-cost system based on a CNC gantry that we have modified to include a syringe pump, potentiostat, and camera with a telecentric lens. This system can perform fluid handling, electrochemistry, and transmission optical measurements on samples in custom 96-well plates that feature transparent and conducting bottoms. We begin by validating this platform through a series of control fluid handling and electrochemistry experiments to quantify the repeatability, lack of cross-contamination, and accuracy of the system. As a proof-of-concept experimental campaign to study the functional properties of a model polymer film, we optimize the electrochromic switching of electrodeposited poly(3,4-ethylenedioxythiophene):poly(styrene sulfonate) (PEDOT:PSS) films. In particular, we explore the monomer concentration, deposition time, and deposition voltage using an array of experiments selected by Latin hypercube sampling. Subsequently, we run an active learning campaign based upon Bayesian optimization to find the processing conditions that lead to the highest electrochromic switching of PEDOT:PSS. This self-driving lab integrates optical and electrochemical characterization to constitute a novel, automated approach for studying functional polymer films.
Submitted 25 June, 2024;
originally announced June 2024.
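The PANDA abstract above mentions selecting an initial array of experiments by Latin hypercube sampling over monomer concentration, deposition time, and deposition voltage. A minimal sketch of that sampling step follows; the function name, the parameter ranges, and the sample count are illustrative assumptions, not values taken from the paper.

```python
import numpy as np


def latin_hypercube(n_samples, bounds, seed=0):
    """Draw n_samples points so that each dimension has one point per stratum.

    bounds is a sequence of (low, high) pairs, one per dimension.
    """
    rng = np.random.default_rng(seed)
    bounds = np.asarray(bounds, dtype=float)  # shape (n_dims, 2)
    n_dims = bounds.shape[0]
    # One uniformly random point inside each of n_samples equal-width strata.
    unit = (np.arange(n_samples)[:, None] + rng.random((n_samples, n_dims))) / n_samples
    # Shuffle the strata independently in every dimension.
    for j in range(n_dims):
        unit[:, j] = rng.permutation(unit[:, j])
    low, high = bounds[:, 0], bounds[:, 1]
    return low + unit * (high - low)


# Hypothetical ranges: concentration (M), deposition time (s), voltage (V).
bounds = [(0.01, 0.10), (1.0, 60.0), (0.8, 1.6)]
samples = latin_hypercube(8, bounds)
```

Each row of `samples` is one candidate experiment. Because every dimension is stratified, a small batch still covers the full range of each parameter, which is the usual reason to prefer this design over independent uniform draws for an initial campaign.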
-
Probing the nature of the $χ_{c1}(3872)$ state using radiative decays
Authors:
LHCb collaboration,
R. Aaij,
A. S. W. Abdelmotteleb,
C. Abellan Beteta,
F. Abudinén,
T. Ackernley,
A. A. Adefisoye,
B. Adeva,
M. Adinolfi,
P. Adlarson,
C. Agapopoulou,
C. A. Aidala,
Z. Ajaltouni,
S. Akar,
K. Akiba,
P. Albicocco,
J. Albrecht,
F. Alessio,
M. Alexander,
Z. Aliouche,
P. Alvarez Cartelle,
R. Amalric,
S. Amato,
J. L. Amey,
Y. Amhis
, et al. (1094 additional authors not shown)
Abstract:
The radiative decays $χ_{c1}(3872)\rightarrow ψ(2S)γ$ and $χ_{c1}(3872)\rightarrow J/ψγ$ are used to probe the nature of the $χ_{c1}(3872)$ state using proton-proton collision data collected with the LHCb detector, corresponding to an integrated luminosity of 9 fb$^{-1}$. Using the $B^+\rightarrow χ_{c1}(3872)K^+$ decay, the $χ_{c1}(3872)\rightarrow ψ(2S)γ$ process is observed for the first time and the ratio of its partial width to that of the $χ_{c1}(3872)\rightarrow J/ψγ$ decay is measured to be $$ \frac{Γ_{χ_{c1}(3872)\rightarrow ψ(2S)γ}}{Γ_{χ_{c1}(3872)\rightarrow J/ψγ}} = 1.67 \pm 0.21 \pm 0.12 \pm 0.04 , $$ where the first uncertainty is statistical, the second systematic and the third is due to the uncertainties on the branching fractions of the $ψ(2S)$ and $J/ψ$ mesons. The measured ratio makes the interpretation of the $χ_{c1}(3872)$ state as a pure $D^0\bar{D}^{*0}+\bar{D}^0D^{*0}$ molecule questionable and strongly indicates a sizeable compact charmonium or tetraquark component within the $χ_{c1}(3872)$ state.
Submitted 24 June, 2024;
originally announced June 2024.
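For a rough sense of the overall precision of the ratio quoted in the abstract above, the three uncertainties can be combined in quadrature. This assumes they are uncorrelated, which the abstract does not state, and the combined value below is an illustration, not a number reported by the collaboration.

```latex
\sigma_{\text{tot}}
  = \sqrt{0.21^{2} + 0.12^{2} + 0.04^{2}}
  = \sqrt{0.0441 + 0.0144 + 0.0016}
  = \sqrt{0.0601}
  \approx 0.25
```

Under that assumption the ratio would read $1.67 \pm 0.25$, with the statistical term dominating the total.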
-
Sparser is Faster and Less is More: Efficient Sparse Attention for Long-Range Transformers
Authors:
Chao Lou,
Zixia Jia,
Zilong Zheng,
Kewei Tu
Abstract:
Accommodating long sequences efficiently in autoregressive Transformers, especially within an extended context window, poses significant challenges due to the quadratic computational complexity and substantial KV memory requirements inherent in self-attention mechanisms. In this work, we introduce SPARSEK Attention, a novel sparse attention mechanism designed to overcome these computational and memory obstacles while maintaining performance. Our approach integrates a scoring network and a differentiable top-k mask operator, SPARSEK, to select a constant number of KV pairs for each query, thereby enabling gradient-based optimization. As a result, SPARSEK Attention offers linear time complexity and constant memory footprint during generation. Experimental results reveal that SPARSEK Attention outperforms previous sparse attention methods and provides significant speed improvements during both training and inference, particularly in language modeling and downstream tasks. Furthermore, our method can be seamlessly integrated into pre-trained Large Language Models (LLMs) with minimal fine-tuning, offering a practical solution for effectively managing long-range dependencies in diverse applications.
△ Less
Submitted 24 June, 2024;
originally announced June 2024.
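The abstract above describes selecting a constant number of KV pairs for each query. The sketch below illustrates only that general idea, using a plain hard top-k over the attention scores in NumPy. It is not the paper's SPARSEK operator: the scoring network and the differentiable mask are omitted, and the function name and shapes are assumptions made for illustration.

```python
import numpy as np


def topk_sparse_attention(q, k, v, top_k):
    """Attend from each query to only its top_k highest-scoring keys.

    q: (n_q, d), k: (n_k, d), v: (n_k, d_v); returns (n_q, d_v).
    """
    scores = q @ k.T / np.sqrt(q.shape[-1])              # (n_q, n_k)
    # Indices of the top_k largest scores per query (order within the set is arbitrary).
    idx = np.argpartition(scores, -top_k, axis=-1)[:, -top_k:]
    selected = np.take_along_axis(scores, idx, axis=-1)  # (n_q, top_k)
    # Numerically stable softmax over the selected scores only.
    selected = selected - selected.max(axis=-1, keepdims=True)
    weights = np.exp(selected)
    weights /= weights.sum(axis=-1, keepdims=True)
    # v[idx] has shape (n_q, top_k, d_v); take the weighted sum per query.
    return np.einsum("qk,qkd->qd", weights, v[idx])


rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8))
k = rng.normal(size=(16, 8))
v = rng.normal(size=(16, 8))
out = topk_sparse_attention(q, k, v, top_k=4)
```

With `top_k` fixed, each query mixes a constant number of value vectors regardless of sequence length. This sketch still computes the full score matrix before selecting, so it shows the selection semantics rather than the efficiency gains the abstract reports.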
-
VideoHallucer: Evaluating Intrinsic and Extrinsic Hallucinations in Large Video-Language Models
Authors:
Yuxuan Wang,
Yueqian Wang,
Dongyan Zhao,
Cihang Xie,
Zilong Zheng
Abstract:
Recent advancements in Multimodal Large Language Models (MLLMs) have extended their capabilities to video understanding. Yet, these models are often plagued by "hallucinations", where irrelevant or nonsensical content is generated, deviating from the actual video context. This work introduces VideoHallucer, the first comprehensive benchmark for hallucination detection in large video-language models (LVLMs). VideoHallucer categorizes hallucinations into two main types: intrinsic and extrinsic, offering further subcategories for detailed analysis, including object-relation, temporal, semantic detail, extrinsic factual, and extrinsic non-factual hallucinations. We adopt an adversarial binary VideoQA method for comprehensive evaluation, where pairs of basic and hallucinated questions are crafted strategically. By evaluating eleven LVLMs on VideoHallucer, we reveal that i) the majority of current models exhibit significant issues with hallucinations; ii) while scaling datasets and parameters improves models' ability to detect basic visual cues and counterfactuals, it provides limited benefit for detecting extrinsic factual hallucinations; iii) existing models are more adept at detecting facts than identifying hallucinations. As a byproduct, these analyses further instruct the development of our self-PEP framework, achieving an average of 5.38% improvement in hallucination resistance across all model architectures.
Submitted 24 June, 2024;
originally announced June 2024.
-
Relaxing Continuous Constraints of Equivariant Graph Neural Networks for Physical Dynamics Learning
Authors:
Zinan Zheng,
Yang Liu,
Jia Li,
Jianhua Yao,
Yu Rong
Abstract:
Incorporating Euclidean symmetries (e.g. rotation equivariance) as inductive biases into graph neural networks has improved their generalization ability and data efficiency in unbounded physical dynamics modeling. However, in various scientific and engineering applications, the symmetries of dynamics are frequently discrete due to the boundary conditions. Thus, existing GNNs either overlook necessary symmetry, resulting in suboptimal representation ability, or impose excessive equivariance, which fails to generalize to unobserved symmetric dynamics. In this work, we propose a general Discrete Equivariant Graph Neural Network (DEGNN) that guarantees equivariance to a given discrete point group. Specifically, we show that such discrete equivariant message passing could be constructed by transforming geometric features into permutation-invariant embeddings. Through relaxing continuous equivariant constraints, DEGNN can employ more geometric feature combinations to approximate unobserved physical object interaction functions. Two implementation approaches of DEGNN are proposed based on ranking or pooling permutation-invariant functions. We apply DEGNN to various physical dynamics, ranging from particle, molecular, crowd to vehicle dynamics. In twenty scenarios, DEGNN significantly outperforms existing state-of-the-art approaches. Moreover, we show that DEGNN is data efficient, learning with less data, and can generalize across scenarios such as unobserved orientation.
Submitted 23 June, 2024;
originally announced June 2024.
-
LangSuitE: Planning, Controlling and Interacting with Large Language Models in Embodied Text Environments
Authors:
Zixia Jia,
Mengmeng Wang,
Baichen Tong,
Song-Chun Zhu,
Zilong Zheng
Abstract:
Recent advances in Large Language Models (LLMs) have shown inspiring achievements in constructing autonomous agents that rely on language descriptions as inputs. However, it remains unclear how well LLMs can function as few-shot or zero-shot embodied agents in dynamic interactive environments. To address this gap, we introduce LangSuitE, a versatile and simulation-free testbed featuring 6 representative embodied tasks in textual embodied worlds. Compared with previous LLM-based testbeds, LangSuitE (i) offers adaptability to diverse environments without multiple simulation engines, (ii) evaluates agents' capacity to develop ``internalized world knowledge'' with embodied observations, and (iii) allows easy customization of communication and action strategies. To address the embodiment challenge, we devise a novel chain-of-thought (CoT) schema, EmMem, which summarizes embodied states w.r.t. history information. Comprehensive benchmark results illustrate challenges and insights of embodied planning. LangSuitE represents a significant step toward building embodied generalists in the context of language models.
Submitted 23 June, 2024;
originally announced June 2024.
-
Combining Supervised Learning and Reinforcement Learning for Multi-Label Classification Tasks with Partial Labels
Authors:
Zixia Jia,
Junpeng Li,
Shichuan Zhang,
Anji Liu,
Zilong Zheng
Abstract:
Traditional supervised learning heavily relies on human-annotated datasets, especially in data-hungry neural approaches. However, various tasks, especially multi-label tasks like document-level relation extraction, pose challenges in fully manual annotation due to the specific domain knowledge and large class sets. Therefore, we address the multi-label positive-unlabelled learning (MLPUL) problem, where only a subset of positive classes is annotated. We propose Mixture Learner for Partially Annotated Classification (MLPAC), an RL-based framework combining the exploration ability of reinforcement learning and the exploitation ability of supervised learning. Experimental results across various tasks, including document-level relation extraction, multi-label image classification, and binary PU learning, demonstrate the generalization and effectiveness of our framework.
Submitted 23 June, 2024;
originally announced June 2024.
-
SegNet4D: Effective and Efficient 4D LiDAR Semantic Segmentation in Autonomous Driving Environments
Authors:
Neng Wang,
Ruibin Guo,
Chenghao Shi,
Hui Zhang,
Huimin Lu,
Zhiqiang Zheng,
Xieyuanli Chen
Abstract:
4D LiDAR semantic segmentation, also referred to as multi-scan semantic segmentation, plays a crucial role in enhancing the environmental understanding capabilities of autonomous vehicles. It entails identifying the semantic category of each point in the LiDAR scan and distinguishing whether it is dynamic, a critical aspect in downstream tasks such as path planning and autonomous navigation. Existing methods for 4D semantic segmentation often rely on computationally intensive 4D convolutions for multi-scan input, resulting in poor real-time performance. In this article, we introduce SegNet4D, a novel real-time multi-scan semantic segmentation method leveraging a projection-based approach for fast motion feature encoding, showcasing outstanding performance. SegNet4D treats 4D semantic segmentation as two distinct tasks: single-scan semantic segmentation and moving object segmentation, each addressed by a dedicated head. These results are then fused in the proposed motion-semantic fusion module to achieve comprehensive multi-scan semantic segmentation. In addition, we propose extracting instance information from the current scan and incorporating it into the network for instance-aware segmentation. Our approach exhibits state-of-the-art performance across multiple datasets and stands out as a real-time multi-scan semantic segmentation method. The implementation of SegNet4D will be made available at \url{https://github.com/nubot-nudt/SegNet4D}.
Submitted 23 June, 2024;
originally announced June 2024.
-
SmartAxe: Detecting Cross-Chain Vulnerabilities in Bridge Smart Contracts via Fine-Grained Static Analysis
Authors:
Zeqin Liao,
Yuhong Nan,
Henglong Liang,
Sicheng Hao,
Juan Zhai,
Jiajing Wu,
Zibin Zheng
Abstract:
With the increasing popularity of blockchain, different blockchain platforms coexist in the ecosystem (e.g., Ethereum, BNB, EOSIO, etc.), which prompts the high demand for cross-chain communication. Cross-chain bridge is a specific type of decentralized application for asset exchange across different blockchain platforms. Securing the smart contracts of cross-chain bridges is in urgent need, as th…
▽ More
With the increasing popularity of blockchain, different blockchain platforms coexist in the ecosystem (e.g., Ethereum, BNB, and EOSIO), which prompts the high demand for cross-chain communication. A cross-chain bridge is a specific type of decentralized application for asset exchange across different blockchain platforms. Securing the smart contracts of cross-chain bridges is urgently needed, as a number of recent security incidents with heavy financial losses were caused by vulnerabilities in bridge smart contracts, which we call Cross-Chain Vulnerabilities (CCVs). However, automatically identifying CCVs in smart contracts poses several unique challenges. In particular, it is non-trivial to (1) identify application-specific access control constraints needed for cross-bridge asset exchange, and (2) identify inconsistent cross-chain semantics between the two sides of the bridge.
In this paper, we propose SmartAxe, a new framework to identify vulnerabilities in cross-chain bridge smart contracts. In particular, to locate vulnerable functions that have access control incompleteness, SmartAxe models the heterogeneous implementations of access control and finds necessary security checks in smart contracts through probabilistic pattern inference. Besides, SmartAxe constructs a cross-chain control-flow graph (xCFG) and data-flow graph (xDFG), which help to find semantic inconsistency during cross-chain data communication. To evaluate SmartAxe, we collect and label a dataset of 88 CCVs from cross-chain bridge contracts involved in real attacks. Evaluation results show that SmartAxe achieves a precision of 84.95% and a recall of 89.77%. In addition, SmartAxe successfully identifies 232 new/unknown CCVs from 129 real-world cross-chain bridge applications (i.e., from 1,703 smart contracts). These identified CCVs affect digital assets worth a total of 1,885,250 USD.
Submitted 22 June, 2024;
originally announced June 2024.
-
SmartState: Detecting State-Reverting Vulnerabilities in Smart Contracts via Fine-Grained State-Dependency Analysis
Authors:
Zeqin Liao,
Sicheng Hao,
Yuhong Nan,
Zibin Zheng
Abstract:
Smart contracts written in Solidity are widely used in different blockchain platforms such as Ethereum, TRON and BNB Chain. One of the unique designs in Solidity smart contracts is its state-reverting mechanism for error handling and access control. Unfortunately, a number of recent security incidents showed that adversaries also utilize this mechanism to manipulate critical states of smart contracts, and hence bring security consequences such as illegal profit-gain and Denial-of-Service (DoS). In this paper, we call such vulnerabilities State-Reverting Vulnerabilities (SRVs). Automatically identifying SRVs poses unique challenges, as it requires an in-depth analysis and understanding of the state-dependency relations in smart contracts.
This paper presents SmartState, a new framework for detecting state-reverting vulnerabilities in Solidity smart contracts via fine-grained state-dependency analysis. SmartState integrates a set of novel mechanisms to ensure its effectiveness. In particular, SmartState extracts state dependencies from both contract bytecode and historical transactions. Both of them are critical for inferring dependencies related to SRVs. Further, SmartState models the generic patterns of SRVs (i.e., profit-gain and DoS) as SRV indicators, and hence effectively identifies SRVs based on the constructed state-dependency graph. To evaluate SmartState, we manually annotated a ground-truth dataset which contains 91 SRVs in the real world. Evaluation results showed that SmartState achieves a precision of 87.23% and a recall of 89.13%. In addition, SmartState successfully identifies 406 new SRVs from 47,351 real-world smart contracts. 11 of these SRVs are from popular smart contracts with high transaction amounts (i.e., top 2000). In total, our reported SRVs affect digital assets worth 428,600 USD.
Submitted 22 June, 2024;
originally announced June 2024.
-
Single-Temporal Supervised Learning for Universal Remote Sensing Change Detection
Authors:
Zhuo Zheng,
Yanfei Zhong,
Ailong Ma,
Liangpei Zhang
Abstract:
The bitemporal supervised learning paradigm, which uses numerous labeled bitemporal image pairs, dominates remote sensing change detection, especially for high spatial resolution (HSR) remote sensing imagery. However, it is very expensive and labor-intensive to label change regions in large-scale bitemporal HSR remote sensing image pairs. In this paper, we propose single-temporal supervised learning (STAR) for universal remote sensing change detection from a new perspective of exploiting changes between unpaired images as supervisory signals. STAR enables us to train a high-accuracy change detector using only unpaired labeled images, and the detector can generalize to real-world bitemporal image pairs. To demonstrate the flexibility and scalability of STAR, we design a simple yet unified change detector, termed ChangeStar2, capable of addressing binary change detection, object change detection, and semantic change detection in one architecture. ChangeStar2 achieves state-of-the-art performance on eight public remote sensing change detection datasets, covering the two supervised settings above, multiple change types, and multiple scenarios. The code is available at https://github.com/Z-Zheng/pytorch-change-models.
Submitted 21 June, 2024;
originally announced June 2024.
-
SimXRD-4M: Big Simulated X-ray Diffraction Data Accelerate the Crystalline Symmetry Classification
Authors:
Bin Cao,
Yang Liu,
Zinan Zheng,
Ruifeng Tan,
Jia Li,
Tong-yi Zhang
Abstract:
Spectroscopic data, particularly diffraction data, contain detailed crystal and microstructure information and thus are crucial for materials discovery. Powder X-ray diffraction (XRD) patterns are greatly effective in identifying crystals. Although machine learning (ML) has significantly advanced the analysis of powder XRD patterns, the progress is hindered by a lack of training data. To address this, we introduce SimXRD, the largest open-source simulated XRD pattern dataset so far, to accelerate the development of crystallographic informatics. SimXRD comprises 4,065,346 simulated powder X-ray diffraction patterns, representing 119,569 distinct crystal structures under 33 simulated conditions that mimic real-world variations. We find that the crystal symmetry inherently follows a long-tailed distribution and evaluate 21 sequence learning models on SimXRD. The results indicate that existing neural networks struggle with low-frequency crystal classifications. The present work highlights the academic significance and the engineering novelty of simulated XRD patterns in this interdisciplinary field.
Submitted 15 June, 2024;
originally announced June 2024.
-
Safely Learning with Private Data: A Federated Learning Framework for Large Language Model
Authors:
JiaYing Zheng,
HaiNan Zhang,
LingXiang Wang,
WangJie Qiu,
HongWei Zheng,
ZhiMing Zheng
Abstract:
Private data, being larger and of higher quality than public data, can greatly improve large language models (LLMs). However, due to privacy concerns, this data is often dispersed in multiple silos, making its secure utilization for LLM training a challenge. Federated learning (FL) is an ideal solution for training models with distributed private data, but traditional frameworks like FedAvg are unsuitable for LLMs due to their high computational demands on clients. An alternative, split learning, offloads most training parameters to the server while training embedding and output layers locally, making it more suitable for LLMs. Nonetheless, it faces significant challenges in security and efficiency. Firstly, the gradients of embeddings are prone to attacks, leading to potential reverse engineering of private data. Furthermore, the server's limitation of handling only one client's training request at a time hinders parallel training, severely impacting training efficiency. In this paper, we propose a Federated Learning framework for LLMs, named FL-GLM, which prevents data leakage caused by both server-side and peer-client attacks while improving training efficiency. Specifically, we first place the input block and output block on the local client to prevent embedding gradient attacks from the server. Secondly, we employ key-encryption during client-server communication to prevent reverse engineering attacks from peer-clients. Lastly, we employ optimization methods like client-batching or server-hierarchical, adopting different acceleration methods based on the actual computational capabilities of the server. Experimental results on NLU and generation tasks demonstrate that FL-GLM achieves comparable metrics to the centralized ChatGLM model, validating the effectiveness of our federated learning framework.
Submitted 26 June, 2024; v1 submitted 21 June, 2024;
originally announced June 2024.
-
Finite Groups of Symplectic Automorphisms of Supersingular K3 surfaces in Odd Characteristics
Authors:
Bin Wang,
Zhiwei Zheng
Abstract:
In 2009, Dolgachev-Keum classify finite groups of tame symplectic automorphisms of K3 surfaces in positive characteristics. They show all such groups are subgroups of the Mathieu group of degree 23. In this paper, we utilize lattice-theoretic methods to investigate symplectic actions of finite groups G on K3 surfaces in odd characteristics. In the tame cases (i.e., the order of G is coprime with p) and all the superspecial cases, one can associate with the action a Leech pair which can be detected via Höhn-Mason's list. Notice that the superspecial case has been recently resolved by Ohashi-Schütt. If the K3 surface is supersingular with Artin invariant at least two, we develop a new machinery called p-root pairs to detect possible symplectic finite group actions (without the assumption of tameness). The concept of p-root pair is closely related to root systems and Weyl groups. In particular, we recover many results by Dolgachev-Keum with a different method and give an upper bound for the exponent of p in |G|.
△ Less
Submitted 20 June, 2024;
originally announced June 2024.
-
Communication with Quantum Catalysts
Authors:
Yuqi Li,
Junjing Xing,
Dengke Qu,
Lei Xiao,
Zhaobing Fan,
Zhu-Jun Zheng,
Haitao Ma,
Peng Xue,
Kishor Bharti,
Dax Enshan Koh,
Yunlong Xiao
Abstract:
Communication is essential for advancing science and technology. Quantum communication, in particular, benefits from the use of catalysts. During the communication process, these catalysts enhance performance while remaining unchanged. Although chemical catalysts that undergo deactivation typically perform worse than those that remain unaffected, quantum catalysts, referred to as embezzling catalysts, can surprisingly outperform their non-deactivating counterparts despite experiencing slight alterations. In this work, we employ embezzling quantum catalysts to enhance the transmission of both quantum and classical information. Our results reveal that using embezzling catalysts augments the efficiency of information transmission across noisy quantum channels, ensuring a non-zero catalytic channel capacity. Furthermore, we introduce catalytic superdense coding, demonstrating how embezzling catalysts can enhance the transmission of classical information. Finally, we explore methods to reduce the dimensionality of catalysts, a step toward making quantum catalysis a practical reality.
Submitted 20 June, 2024;
originally announced June 2024.
-
Synthetic spin-orbit coupling for the multi-spin models in optical lattices
Authors:
Zhen Zheng,
Yan-Qing Zhu,
Shanchao Zhang,
Shi-Liang Zhu,
Z. D. Wang
Abstract:
The essential role of synthetic spin-orbit coupling in discovering new topological matter phases with cold atoms is widely acknowledged. However, the engineering of spin-orbit coupling remains unclear for arbitrary-spin models due to the complexity of spin matrices. In this work, we develop a more general but relatively straightforward method to achieve spin-orbit coupling for multi-spin models. Our approach hinges on controlling the coupling between distinct pseudo-spins through two intermediary states, resulting in tunneling with spin flips that have direction-dependent strength. The engineered spin-orbit coupling can facilitate topological phase transitions with Chern numbers over 1, a unique characteristic of multi-spin models compared to spin-1/2 models. By utilizing existing cold atom techniques, our proposed method provides an ideal platform for investigating topological properties related to large Chern numbers.
Submitted 20 June, 2024;
originally announced June 2024.
-
One Fits All: Learning Fair Graph Neural Networks for Various Sensitive Attributes
Authors:
Yuchang Zhu,
Jintang Li,
Yatao Bian,
Zibin Zheng,
Liang Chen
Abstract:
Recent studies have highlighted fairness issues in Graph Neural Networks (GNNs), where they produce discriminatory predictions against specific protected groups categorized by sensitive attributes such as race and age. While various efforts to enhance GNN fairness have made significant progress, these approaches are often tailored to specific sensitive attributes. Consequently, they necessitate retraining the model from scratch to accommodate changes in the sensitive attribute requirement, resulting in high computational costs. To gain deeper insights into this issue, we approach the graph fairness problem from a causal modeling perspective, where we identify the confounding effect induced by the sensitive attribute as the underlying reason. Motivated by this observation, we formulate the fairness problem in graphs from an invariant learning perspective, which aims to learn invariant representations across environments. Accordingly, we propose a graph fairness framework based on invariant learning, namely FairINV, which enables the training of fair GNNs to accommodate various sensitive attributes within a single training session. Specifically, FairINV incorporates sensitive attribute partition and trains fair GNNs by eliminating spurious correlations between the label and various sensitive attributes. Experimental results on several real-world datasets demonstrate that FairINV significantly outperforms state-of-the-art fairness approaches, underscoring its effectiveness. Our code is available via: https://github.com/ZzoomD/FairINV/.
Submitted 2 July, 2024; v1 submitted 19 June, 2024;
originally announced June 2024.
-
Precision measurement of the $Ξ^-_b$ baryon lifetime
Authors:
LHCb collaboration,
R. Aaij,
A. S. W. Abdelmotteleb,
C. Abellan Beteta,
F. Abudinén,
T. Ackernley,
A. A. Adefisoye,
B. Adeva,
M. Adinolfi,
P. Adlarson,
C. Agapopoulou,
C. A. Aidala,
Z. Ajaltouni,
S. Akar,
K. Akiba,
P. Albicocco,
J. Albrecht,
F. Alessio,
M. Alexander,
Z. Aliouche,
P. Alvarez Cartelle,
R. Amalric,
S. Amato,
J. L. Amey,
Y. Amhis
, et al. (1064 additional authors not shown)
Abstract:
A sample of $pp$ collision data, corresponding to an integrated luminosity of 5.5 fb$^{-1}$ and collected by the LHCb experiment during Run 2, is used to measure the ratio of the lifetime of the $Ξ^-_b$ baryon to that of the $Λ^0_b$ baryon, $r_τ\equivτ_{Ξ^-_b}/τ_{Λ^0_b}$. The value ${r_τ^{\rm Run\,2}=1.076\pm0.013\pm0.006}$ is obtained, where the first uncertainty is statistical and the second systematic. This value is averaged with the corresponding value from Run 1 to obtain ${r_τ^{\rm Run\,1,2} = 1.078\pm0.012\pm0.007}$. Multiplying by the world-average value of the $Λ^0_b$ lifetime yields $τ_{Ξ^-_b}^{\rm Run~1,2} = 1.578\pm0.018\pm0.010\pm0.011$ ps, where the uncertainties are statistical, systematic, and due to the limited knowledge of the $Λ^0_b$ lifetime. This measurement improves the precision of the current world average of the $Ξ^-_b$ lifetime by about a factor of two, and is in good agreement with the most recent theoretical predictions.
Submitted 17 June, 2024;
originally announced June 2024.
-
CoSQA+: Enhancing Code Search Dataset with Matching Code
Authors:
Jing Gong,
Yanghui Wu,
Linxi Liang,
Zibin Zheng,
Yanlin Wang
Abstract:
Semantic code search, retrieving code that matches a given natural language query, is an important task to improve productivity in software engineering. Existing code search datasets are problematic: either using unrealistic queries, or with mismatched codes, and typically using one-to-one query-code pairing, which fails to reflect the reality that a query might have multiple valid code matches. This paper introduces CoSQA+, pairing high-quality queries (reused from CoSQA) with multiple suitable codes. We collect code candidates from diverse sources and form candidate pairs by pairing queries with these codes. Utilizing the power of large language models (LLMs), we automate pair annotation, filtering, and code generation for queries without suitable matches. Through extensive experiments, CoSQA+ has demonstrated superior quality over CoSQA. Models trained on CoSQA+ exhibit improved performance. Furthermore, we propose a new metric Mean Multi-choice Reciprocal Rank (MMRR), to assess one-to-N code search performance. We provide the code and data at https://github.com/DeepSoftwareAnalytics/CoSQA_Plus.
Submitted 17 June, 2024;
originally announced June 2024.
-
In-Context Editing: Learning Knowledge from Self-Induced Distributions
Authors:
Siyuan Qi,
Bangcheng Yang,
Kailin Jiang,
Xiaobo Wang,
Jiaqi Li,
Yifan Zhong,
Yaodong Yang,
Zilong Zheng
Abstract:
The existing fine-tuning paradigm for language models is brittle in knowledge editing scenarios, where the model must incorporate new information without extensive retraining. This brittleness often results in overfitting, reduced performance, and unnatural language generation. To address this, we propose Consistent In-Context Editing (ICE), a novel approach that leverages the model's in-context learning capability to tune toward a contextual distribution rather than a one-hot target. ICE introduces a straightforward optimization framework that includes both a target and a procedure, enhancing the robustness and effectiveness of gradient-based tuning methods. We provide analytical insights into ICE across four critical aspects of knowledge editing: accuracy, locality, generalization, and linguistic quality, showing its advantages. Experimental results across four datasets confirm the effectiveness of ICE and demonstrate its potential for continual editing, ensuring that updated information is incorporated while preserving the integrity of the model.
Submitted 17 June, 2024;
originally announced June 2024.
-
SmartOracle: Generating Smart Contract Oracle via Fine-Grained Invariant Detection
Authors:
Jianzhong Su,
Jiachi Chen,
Zhiyuan Fang,
Xingwei Lin,
Yutian Tang,
Zibin Zheng
Abstract:
As decentralized applications (DApps) proliferate, the increased complexity and usage of smart contracts have heightened their susceptibility to security incidents and financial losses. Although various vulnerability detection tools have been developed to mitigate these issues, they often suffer from poor performance in detecting vulnerabilities, as they either rely on simplistic and general-purpose oracles that may be inadequate for vulnerability detection, or require user-specified oracles, which are labor-intensive to create. In this paper, we introduce SmartOracle, a dynamic invariant detector that automatically generates fine-grained invariants as application-specific oracles for vulnerability detection. From historical transactions, SmartOracle uses pattern-based detection and advanced inference to construct comprehensive properties, and mines multi-layer likely invariants to accommodate the complicated contract functionalities. After that, SmartOracle identifies smart contract vulnerabilities by hunting for violated invariants in new transactions. In the field of invariant detection, SmartOracle detects 50% more ERC20 invariants than existing dynamic invariant detection and achieves a 96% precision rate. Furthermore, we build a dataset that contains vulnerable contracts from real-world security incidents. SmartOracle successfully detects 466 abnormal transactions with an acceptable precision rate of 96%, involving 31 vulnerable contracts. The experimental results demonstrate its effectiveness in detecting smart contract vulnerabilities, especially those related to complicated contract functionalities.
Submitted 14 June, 2024;
originally announced June 2024.
-
EmoBox: Multilingual Multi-corpus Speech Emotion Recognition Toolkit and Benchmark
Authors:
Ziyang Ma,
Mingjie Chen,
Hezhao Zhang,
Zhisheng Zheng,
Wenxi Chen,
Xiquan Li,
Jiaxin Ye,
Xie Chen,
Thomas Hain
Abstract:
Speech emotion recognition (SER) is an important part of human-computer interaction, receiving extensive attention from both industry and academia. However, the current research field of SER has long suffered from the following problems: 1) There are few reasonable and universal splits of the datasets, making comparing different models and methods difficult. 2) No commonly used benchmark covers numerous corpora and languages for researchers to refer to, making reproduction a burden. In this paper, we propose EmoBox, an out-of-the-box multilingual multi-corpus speech emotion recognition toolkit, along with a benchmark for both intra-corpus and cross-corpus settings. For intra-corpus settings, we carefully designed the data partitioning for different datasets. For cross-corpus settings, we employ a foundation SER model, emotion2vec, to mitigate annotation errors and obtain a test set that is fully balanced in speaker and emotion distributions. Based on EmoBox, we present the intra-corpus SER results of 10 pre-trained speech models on 32 emotion datasets with 14 languages, and the cross-corpus SER results on 4 datasets with the fully balanced test sets. To the best of our knowledge, this is the largest SER benchmark, across language scopes and quantity scales. We hope that our toolkit and benchmark can facilitate the research of SER in the community.
Submitted 11 June, 2024;
originally announced June 2024.
-
Never Miss A Beat: An Efficient Recipe for Context Window Extension of Large Language Models with Consistent "Middle" Enhancement
Authors:
Tong Wu,
Yanpeng Zhao,
Zilong Zheng
Abstract:
Recently, many methods have been developed to extend the context length of pre-trained large language models (LLMs), but they often require fine-tuning at the target length ($\gg4K$) and struggle to effectively utilize information from the middle part of the context. To address these issues, we propose $\textbf{C}$ontinuity-$\textbf{R}$elativity ind$\textbf{E}$xing with g$\textbf{A}$ussian $\textbf{M}$iddle (CREAM), which interpolates positional encodings by manipulating position indices. Apart from being simple, CREAM is training-efficient: it only requires fine-tuning at the pre-trained context window (e.g., Llama 2-4K) and can extend LLMs to a much longer target context length (e.g., 256K). To ensure that the model focuses more on the information in the middle, we introduce a truncated Gaussian to encourage sampling from the middle part of the context during fine-tuning, thus alleviating the "Lost-in-the-Middle" problem faced by long-context LLMs. Experimental results show that CREAM successfully extends LLMs to the target length for both Base and Chat versions of $\texttt{Llama2-7B}$ with "Never Miss A Beat". Our code will be publicly available soon.
Submitted 11 June, 2024;
originally announced June 2024.
-
TraceMesh: Scalable and Streaming Sampling for Distributed Traces
Authors:
Zhuangbin Chen,
Zhihan Jiang,
Yuxin Su,
Michael R. Lyu,
Zibin Zheng
Abstract:
Distributed tracing serves as a fundamental element in the monitoring of cloud-based and datacenter systems. It provides visibility into the full lifecycle of a request or operation across multiple services, which is essential for understanding system dependencies and performance bottlenecks. To mitigate computational and storage overheads, most tracing frameworks adopt a uniform sampling strategy, which inevitably captures overlapping and redundant information. More advanced methods employ learning-based approaches to bias the sampling toward more informative traces. However, existing methods fall short of considering the high-dimensional and dynamic nature of trace data, which is essential for the production deployment of trace sampling. To address these practical challenges, in this paper we present TraceMesh, a scalable and streaming sampler for distributed traces. TraceMesh employs Locality-Sensitivity Hashing (LSH) to improve sampling efficiency by projecting traces into a low-dimensional space while preserving their similarity. In this process, TraceMesh accommodates previously unseen trace features in a unified and streamlined way. Subsequently, TraceMesh samples traces through evolving clustering, which dynamically adjusts the sampling decision to avoid over-sampling of recurring traces. The proposed method is evaluated with trace data collected from both open-source microservice benchmarks and production service systems. Experimental results demonstrate that TraceMesh outperforms state-of-the-art methods by a significant margin in both sampling accuracy and efficiency.
Submitted 11 June, 2024;
originally announced June 2024.
-
Towards more realistic evaluation of LLM-based code generation: an experimental study and beyond
Authors:
Dewu Zheng,
Yanlin Wang,
Ensheng Shi,
Ruikai Zhang,
Yuchi Ma,
Hongyu Zhang,
Zibin Zheng
Abstract:
To evaluate the code generation capabilities of Large Language Models (LLMs) in complex real-world software development scenarios, many evaluation approaches have been developed. They typically leverage contextual code from the latest version of a project to help LLMs accurately generate the desired function. However, such evaluation approaches fail to consider the dynamic evolution of software projects over time, which we refer to as the evolving-ignored situation, leading to issues of future context leakage and missing useful context. This in turn results in inaccurate evaluation of LLMs' performance. In this paper, we conduct an empirical study to deeply understand LLMs' code generation performance within settings that reflect the evolving nature of software development. To achieve this, we first construct an evolving-aware repository-level code generation dataset, namely HumanEvo, equipped with an automated execution-based evaluation tool. Second, we manually categorize HumanEvo according to dependency levels to more comprehensively analyze the model's performance in generating functions with different dependency levels. Third, we conduct extensive experiments on HumanEvo with seven representative and diverse LLMs to verify the effectiveness of the proposed benchmark. We obtain many important findings through our experimental study. For example, we find that previous evolving-ignored evaluation approaches lead to inflated performance of the LLMs, ranging from 10.0% to 61.1%. Based on the findings, we give actionable suggestions on more realistic evaluation of LLMs on code generation. We also build a shared evolving-aware code generation toolbox to facilitate future research. The replication package, including source code, datasets, and appendix, is available at https://github.com/DeepSoftwareAnalytics/EvoEval.
Submitted 10 June, 2024;
originally announced June 2024.
-
How is the Pilot Doing: VTOL Pilot Workload Estimation by Multimodal Machine Learning on Psycho-physiological Signals
Authors:
Jong Hoon Park,
Lawrence Chen,
Ian Higgins,
Zhaobo Zheng,
Shashank Mehrotra,
Kevin Salubre,
Mohammadreza Mousaei,
Steven Willits,
Blain Levedahl,
Timothy Buker,
Eliot Xing,
Teruhisa Misu,
Sebastian Scherer,
Jean Oh
Abstract:
Vertical take-off and landing (VTOL) aircraft do not require a prolonged runway, thus allowing them to land almost anywhere. In recent years, their flexibility has made them popular in development, research, and operation. When compared to traditional fixed-wing aircraft and rotorcraft, VTOLs bring unique challenges as they combine many maneuvers from both types of aircraft. Pilot workload is a critical factor for safe and efficient operation of VTOLs. In this work, we conduct a user study to collect multimodal data from 28 pilots while they perform a variety of VTOL flight tasks. We analyze and interpolate behavioral patterns related to their performance and perceived workload. Finally, we build machine learning models to estimate their workload from the collected data. Our results are promising, suggesting that quantitative and accurate VTOL pilot workload monitoring is viable. Such assistive tools would help the research field understand VTOL operations and serve as a stepping stone for the industry to ensure safe VTOL operations and, further, remote operations.
Submitted 10 June, 2024;
originally announced June 2024.
-
II-Bench: An Image Implication Understanding Benchmark for Multimodal Large Language Models
Authors:
Ziqiang Liu,
Feiteng Fang,
Xi Feng,
Xinrun Du,
Chenhao Zhang,
Zekun Wang,
Yuelin Bai,
Qixuan Zhao,
Liyang Fan,
Chengguang Gan,
Hongquan Lin,
Jiaming Li,
Yuansheng Ni,
Haihong Wu,
Yaswanth Narsupalli,
Zhigang Zheng,
Chengming Li,
Xiping Hu,
Ruifeng Xu,
Xiaojun Chen,
Min Yang,
Jiaheng Liu,
Ruibo Liu,
Wenhao Huang,
Ge Zhang
, et al. (1 additional author not shown)
Abstract:
The rapid advancements in the development of multimodal large language models (MLLMs) have consistently led to new breakthroughs on various benchmarks. In response, numerous challenging and comprehensive benchmarks have been proposed to more accurately assess the capabilities of MLLMs. However, there is a dearth of exploration of the higher-order perceptual capabilities of MLLMs. To fill this gap, we propose the Image Implication understanding Benchmark, II-Bench, which aims to evaluate the model's higher-order perception of images. Through extensive experiments on II-Bench across multiple MLLMs, we have made significant findings. First, a substantial gap is observed between the performance of MLLMs and humans on II-Bench. The best MLLM accuracy reaches 74.8%, whereas human accuracy averages 90%, peaking at 98%. Second, MLLMs perform worse on abstract and complex images, suggesting limitations in their ability to understand high-level semantics and capture image details. Finally, most models exhibit enhanced accuracy when image sentiment polarity hints are incorporated into the prompts. This observation underscores a notable deficiency in their inherent understanding of image sentiment. We believe that II-Bench will inspire the community to develop the next generation of MLLMs, advancing the journey towards expert artificial general intelligence (AGI). II-Bench is publicly available at https://huggingface.co/datasets/m-a-p/II-Bench.
Submitted 11 June, 2024; v1 submitted 9 June, 2024;
originally announced June 2024.