SLoRD: Structural Low-Rank Descriptors for Shape Consistency in Vertebrae Segmentation

Authors: Xin You, Yixin Lou, Minghui Zhang, Chuyan Zhang, Jie Yang, Yun Gu

Abstract: Automatic and precise segmentation of vertebrae from CT images is crucial for various clinical applications. However, due to a lack of explicit and strict constraints, existing methods, especially single-stage methods, still suffer from intra-vertebrae segmentation inconsistency, which refers to multiple label predictions inside a single vertebra. For multi-stage methods, vertebrae detection, which serves as the first step, is affected by pathology and metal implants. Incorrect detections thus produce biased patches before segmentation, which then lead to inconsistent labeling and segmentation. In our work, motivated by the perspective of instance segmentation, we try to label individual and complete binary masks to address this limitation. Specifically, a contour-based network is proposed based on Structural Low-Rank Descriptors for shape consistency, termed SLoRD. These contour descriptors are acquired in a data-driven manner in advance. For a more precise representation of contour descriptors, we adopt the spherical coordinate system and devise the spherical centroid. In addition, the contour loss is designed to impose explicit consistency constraints, keeping regressed contour points close to vertebral boundaries. Quantitative and qualitative evaluations on VerSe 2019 demonstrate the superior performance of our framework over other single-stage and multi-stage state-of-the-art (SOTA) methods.

Submitted 11 July, 2024; originally announced July 2024.

Comments: Under review

arXiv:2407.02817 [pdf]

Operando monitoring of strain field distribution in lithium battery anode via ultra-high spatial resolution optical frequency domain reflectometer

Authors: Kaijun Liu, Zhijuan Zou, Guolu Yin, Yingze Song, Zeheng Zhang, Yuyang Lou, Zixuan Zhong, Huafeng Lu, Duidui Li, Tao Zhu

Abstract: The cycling performance of lithium-ion batteries is closely related to the expansion effect of anode materials during charge and discharge processes. Studying the mechanical field evolution of anode materials is crucial for evaluating battery performance. Here, we propose a phase-sensitive ultra-high spatial resolution optical frequency domain reflectometry technique, in which the test fiber is embedded into the anode of a lithium-ion battery to monitor the mechanical evolution of the anode material during cycling. We investigated the strain evolution of the anode material under different loading levels and used this method to infer the morphological changes of the material. Furthermore, combining this with battery capacity information provides a new approach for assessing the performance of lithium-ion batteries.

Submitted 3 July, 2024; originally announced July 2024.

Comments: 8 pages, 6 figures

arXiv:2407.02095 [pdf, other]

TIGER: A Generating-Then-Ranking Framework for Practical Python Type Inference

Authors: Chong Wang, Jian Zhang, Yiling Lou, Mingwei Liu, Weisong Sun, Yang Liu, Xin Peng

Abstract: Python's dynamic typing system offers flexibility and expressiveness but can lead to type-related errors, prompting the need for automated type inference to enhance type hinting. While existing learning-based approaches show promising inference accuracy, they struggle with practical challenges in comprehensively handling various types, including complex generic types and (unseen) user-defined types. In this paper, we introduce TIGER, a two-stage generating-then-ranking (GTR) framework, designed to effectively handle Python's diverse type categories. TIGER leverages fine-tuned pre-trained code models to train a generative model with a span masking objective and a similarity model with a contrastive training objective. This approach allows TIGER to generate a wide range of type candidates, including complex generics in the generating stage, and accurately rank them with user-defined types in the ranking stage. Our evaluation on the ManyTypes4Py dataset shows TIGER's advantage over existing methods in various type categories, notably improving accuracy in inferring user-defined and unseen types by 11.2% and 20.1% respectively in Top-5 Exact Match. Moreover, the experimental results not only demonstrate TIGER's superior performance and efficiency, but also underscore the significance of its generating and ranking stages in enhancing automated type inference.

Submitted 16 July, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

Comments: Accepted by ICSE'25

arXiv:2406.15806 [pdf, other]

Robust Dynamic Control Barrier Function Based Trajectory Planning for Mobile Manipulator

Authors: Lihao Xu, Xiaogang Xiong, Bai Yang, Yunjiang Lou

Abstract: High-dimensional robot dynamic trajectory planning poses many challenges for traditional planning algorithms. Existing planning methods suffer from issues such as long computation times, limited capacity to address intricate obstacle models, and lack of consideration for external disturbances and measurement inaccuracies in these high-dimensional systems. To tackle these challenges, this paper proposes a novel trajectory planning approach that combines Dynamic Control Barrier Function (DCBF) with a disturbance observer to create a Robust Dynamic Control Barrier Function (RDCBF) planner. This approach successfully plans trajectories in environments with complex dynamic obstacles while accounting for external disturbances and measurement uncertainties, ensuring system safety and enabling precise obstacle avoidance. Experimental results on a mobile manipulator demonstrate outstanding performance of the proposed approach.

Submitted 22 June, 2024; originally announced June 2024.

arXiv:2406.11707 [pdf, other]

A First Physical-World Trajectory Prediction Attack via LiDAR-induced Deceptions in Autonomous Driving

Authors: Yang Lou, Yi Zhu, Qun Song, Rui Tan, Chunming Qiao, Wei-Bin Lee, Jianping Wang

Abstract: Trajectory prediction forecasts nearby agents' moves based on their historical trajectories. Accurate trajectory prediction is crucial for autonomous vehicles. Existing attacks compromise the prediction model of a victim AV by directly manipulating the historical trajectory of an attacker AV, which has limited real-world applicability. This paper, for the first time, explores an indirect attack approach that induces prediction errors via attacks against the perception module of a victim AV. Although it has been shown that physically realizable attacks against LiDAR-based perception are possible by placing a few objects at strategic locations, it is still an open challenge to find an object location from the vast search space in order to launch effective attacks against prediction under varying victim AV velocities. Through analysis, we observe that a prediction model is prone to an attack focusing on a single point in the scene. Consequently, we propose a novel two-stage attack framework to realize the single-point attack. The first stage of prediction-side attack efficiently identifies, guided by the distribution of detection results under object-based attacks against perception, the state perturbations for the prediction model that are effective and velocity-insensitive. In the second stage of location matching, we match the feasible object locations with the found state perturbations. Our evaluation using a public autonomous driving dataset shows that our attack causes a collision rate of up to 63% and various hazardous responses of the victim AV. The effectiveness of our attack is also demonstrated on a real testbed car. To the best of our knowledge, this study is the first security analysis spanning from LiDAR-based perception to prediction in autonomous driving, leading to a realistic attack on prediction. To counteract the proposed attack, potential defenses are discussed.

Submitted 17 June, 2024; originally announced June 2024.

Comments: In Proceedings of the 33rd USENIX Security Symposium 2024

arXiv:2406.11147 [pdf, other]

Vul-RAG: Enhancing LLM-based Vulnerability Detection via Knowledge-level RAG

Authors: Xueying Du, Geng Zheng, Kaixin Wang, Jiayi Feng, Wentai Deng, Mingwei Liu, Bihuan Chen, Xin Peng, Tao Ma, Yiling Lou

Abstract: Vulnerability detection is essential for software quality assurance. In recent years, deep learning models (especially large language models) have shown promise in vulnerability detection. In this work, we propose a novel LLM-based vulnerability detection technique, Vul-RAG, which leverages a knowledge-level retrieval-augmented generation (RAG) framework to detect vulnerabilities in the given code in three phases. First, Vul-RAG constructs a vulnerability knowledge base by extracting multi-dimension knowledge via LLMs from existing CVE instances; second, for a given code snippet, Vul-RAG retrieves the relevant vulnerability knowledge from the constructed knowledge base based on functional semantics; third, Vul-RAG leverages LLMs to check the vulnerability of the given code snippet by reasoning about the presence of vulnerability causes and fixing solutions of the retrieved vulnerability knowledge. Our evaluation of Vul-RAG on our constructed benchmark PairVul shows that Vul-RAG substantially outperforms all baselines by 12.96%/110% relative improvement in accuracy/pairwise-accuracy. In addition, our user study shows that the vulnerability knowledge generated by Vul-RAG can serve as high-quality explanations which can improve the manual detection accuracy from 0.60 to 0.77.

Submitted 19 June, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

arXiv:2406.10018 [pdf, other]

STALL+: Boosting LLM-based Repository-level Code Completion with Static Analysis

Authors: Junwei Liu, Yixuan Chen, Mingwei Liu, Xin Peng, Yiling Lou

Abstract: Repository-level code completion is challenging as it involves complicated contexts from multiple files in the repository. To date, researchers have proposed two technical categories to enhance LLM-based repository-level code completion, i.e., retrieval-augmented generation (RAG) and static analysis integration. This work performs the first study on the static analysis integration in LLM-based repository-level code completion by investigating both the effectiveness and efficiency of static analysis integration strategies across different phases of code completion. We first implement a framework STALL+, which supports an extendable and customizable integration of multiple static analysis strategies into the complete pipeline of LLM-based repository-level code completion; and based on STALL+, we perform extensive experiments by including different code LLMs on the latest repository-level code completion benchmark CrossCodeEval. Our findings show that integrating file-level dependencies in the prompting phase performs the best while the integration in the post-processing phase performs the worst. Additionally, we observe different improvements from static analysis between dynamic languages and static languages, i.e., the best combination is prompting-phase with decoding-phase integration for Java while the best combination is prompting-phase with post-processing-phase integration for Python given the limitations of statically analyzing dynamic languages. Additionally, we find the complementarity between RAG and static analysis integration as well as their cost-effectiveness after combination.

Submitted 14 June, 2024; originally announced June 2024.

Comments: 12 pages, 5 figures

arXiv:2406.03803 [pdf, ps, other]

Determining the Weight Spectrum of the Reed--Muller Codes RM(m-6,m)

Authors: Yueying Lou, Qichun Wang

Abstract: The weight spectra of the Reed--Muller codes $RM(r,m)$ were unknown for $r=3,...,m-5$. In IEEE Trans. Inform. Theory 2024, Carlet determined the weight spectrum of $RM(m-5,m)$ for $m\ge10$ using the Maiorana-McFarland construction; an attempt to extend the result to $RM(m-6,m)$ ran into many problems and left much work to be done. In this paper, we propose a novel way of constructing Reed--Muller codewords and determine the weight spectrum of $RM(m-6,m)$ for $m\ge12$, which gives a positive answer to an open question on the weight spectrum of $RM(m-c,m)$ for $c=6$. Moreover, we put forward a conjecture and verify it for some cases. If the conjecture is true, then that open question can be completely solved.

Submitted 6 June, 2024; originally announced June 2024.

arXiv:2406.01030 [pdf, other]

Neutrino observables in gauged $U(1)_{L_α-L_β}$ models with two Higgs doublet and one singlet scalars

Authors: Yuanchao Lou, Takaaki Nomura

Abstract: We discuss the neutrino sector in models with two Higgs doublet and one singlet scalar fields under a local $U(1)_{L_α- L_β}$ symmetry. A neutrino mass matrix is formulated for these models, where the matrix is generated via the type-I seesaw mechanism by introducing right-handed neutrinos. The neutrino mass matrix has more degrees of freedom compared to minimal scenarios which have only one new scalar field, but its structure is still restricted by the symmetry. It is then found that the sum of neutrino masses can be lower than in minimal scenarios and that it is easier to satisfy observed constraints. In addition, we can fit neutrino data for the $U(1)_{L_e - L_{μ(τ)}}$ cases, which are disfavored in minimal models. Furthermore, some correlations among the sum of neutrino masses and CP violating phases are still found although we have more free parameters.

Submitted 3 June, 2024; originally announced June 2024.

Comments: 17 pages, 5 figures

arXiv:2406.00208 [pdf, ps, other]

Progresses on some open problems related to infinitely many symmetries

Authors: S. Y. Lou

Abstract: The quest to reveal the physical essence of the infinitely many symmetries and conservation laws that are intrinsic to integrable systems has historically posed a significant challenge at the confluence of physics and mathematics. This scholarly investigation delves into five open problems related to these boundless symmetries within integrable systems by scrutinizing their multi-wave solutions, employing a fresh analytical methodology. For a specified integrable system, there exist various categories of $n$-wave solutions. Each sub-wave comprising the $n$-wave solution may possess free parameters, including center, width, and periodic parameters. It is evident that these solutions are translation invariant with respect to all these free parameters. We postulate that the entirety of the recognized infinite symmetries merely constitute linear combinations of these finite wave parameter translation symmetries. The conjecture intimates that the currently known infinitely many symmetries are not exhaustive, and an indeterminate number of symmetries remain to be discovered. This conjecture further indicates that by imposing an infinite array of symmetry constraints, it becomes feasible to derive exact multi-wave solutions. By considering the renowned KdV equation and the Burgers equation as simple examples, the conjecture is substantiated for the $n$-soliton solutions. It is unequivocal that any linear combination of the wave parameter translation symmetries retains its status as a symmetry associated with the particular solution. This observation suggests that by introducing a ren-variable and a ren-symmetric derivative which serve as generalizations of the Grassmann variable and the super derivative, it may be feasible to unify classical integrable systems, supersymmetric integrable systems, and ren-symmetric integrable systems within a cohesive hierarchical framework.

Submitted 3 July, 2024; v1 submitted 31 May, 2024; originally announced June 2024.

arXiv:2405.09752 [pdf, other]

Time-Varying Graph Signal Recovery Using High-Order Smoothness and Adaptive Low-rankness

Authors: Weihong Guo, Yifei Lou, Jing Qin, Ming Yan

Abstract: Time-varying graph signal recovery has been widely used in many applications, including climate change, environmental hazard monitoring, and epidemic studies. It is crucial to choose appropriate regularizations to describe the characteristics of the underlying signals, such as the smoothness of the signal over the graph domain and the low-rank structure of the spatial-temporal signal modeled in a matrix form. As one of the most popular options, the graph Laplacian is commonly adopted in designing graph regularizations for reconstructing signals defined on a graph from partially observed data. In this work, we propose a time-varying graph signal recovery method based on the high-order Sobolev smoothness and an error-function weighted nuclear norm regularization to enforce the low-rankness. Two efficient algorithms based on the alternating direction method of multipliers and iterative reweighting are proposed, and convergence of one algorithm is shown in detail. We conduct various numerical experiments on synthetic and real-world data sets to demonstrate the proposed method's effectiveness compared to the state-of-the-art in graph signal recovery.

Submitted 15 May, 2024; originally announced May 2024.

arXiv:2405.09031 [pdf, other]

Principal eigenvalue for some elliptic operators with large drift: Neumann boundary conditions

Authors: Shuang Liu, Yuan Lou, Maolin Zhou

Abstract: The paper is concerned with the principal eigenvalue of some linear elliptic operators with drift in two dimensional space. We provide a refined description of the asymptotic behavior for the principal eigenvalue as the drift rate approaches infinity. Under some non-degeneracy assumptions, our results illustrate that these asymptotic behaviors are completely determined by some connected components in the omega-limit set of the system of ordinary differential equations associated with the drift term, which includes stable fixed points, stable limit cycles, hyperbolic saddles connecting homoclinic orbits, and families of closed orbits. Some discussions on degenerate cases are also included.

Submitted 15 May, 2024; v1 submitted 14 May, 2024; originally announced May 2024.

Comments: 53 pages, 9 figures

MSC Class: 35P15; 35P20; 34C25

arXiv:2405.07915 [pdf, other]

Discovery of highly anisotropic dielectric crystals with equivariant graph neural networks

Authors: Yuchen Lou, Alex M. Ganose

Abstract: Anisotropy in crystals plays a pivotal role in many technological applications. For example, anisotropic electronic and thermal transport are thought to be beneficial for thermoelectric applications, while anisotropic mechanical properties are of interest for emerging metamaterials, and anisotropic dielectric materials have been suggested as a novel platform for dark matter detection. Understanding and tailoring anisotropy in crystals is therefore essential for the design of next-generation functional materials. To date, however, most data-driven approaches have focused on the prediction of scalar crystal properties, such as the spherically averaged dielectric tensor or the bulk and shear elastic moduli. Here, we adopt the latest approaches in equivariant graph neural networks to develop a model that can predict the full dielectric tensor of crystals. Our model, trained on the Materials Project dataset of ca. 6,700 dielectric tensors, achieves state-of-the-art accuracy in scalar dielectric prediction in addition to capturing the directional response. We showcase the performance of the model by discovering crystals with almost isotropic connectivity but highly anisotropic dielectric tensors, thereby broadening our knowledge of the structure-property relationships in dielectric crystals.

Submitted 13 May, 2024; originally announced May 2024.

arXiv:2404.14294 [pdf, other]

A Survey on Efficient Inference for Large Language Models

Authors: Zixuan Zhou, Xuefei Ning, Ke Hong, Tianyu Fu, Jiaming Xu, Shiyao Li, Yuming Lou, Luning Wang, Zhihang Yuan, Xiuhong Li, Shengen Yan, Guohao Dai, Xiao-Ping Zhang, Yuhan Dong, Yu Wang

Abstract: Large Language Models (LLMs) have attracted extensive attention due to their remarkable performance across various tasks. However, the substantial computational and memory requirements of LLM inference pose challenges for deployment in resource-constrained scenarios. Efforts within the field have been directed towards developing techniques aimed at enhancing the efficiency of LLM inference. This paper presents a comprehensive survey of the existing literature on efficient LLM inference. We start by analyzing the primary causes of the inefficient LLM inference, i.e., the large model size, the quadratic-complexity attention operation, and the auto-regressive decoding approach. Then, we introduce a comprehensive taxonomy that organizes the current literature into data-level, model-level, and system-level optimization. Moreover, the paper includes comparative experiments on representative methods within critical sub-fields to provide quantitative insights. Last but not least, we provide some knowledge summary and discuss future research directions.

Submitted 8 June, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

arXiv:2404.11978 [pdf, other]

EVIT: Event-Oriented Instruction Tuning for Event Reasoning

Authors: Zhengwei Tao, Xiancai Chen, Zhi Jin, Xiaoying Bai, Haiyan Zhao, Yiwei Lou

Abstract: Events refer to specific occurrences, incidents, or happenings that take place under a particular background. Event reasoning aims to infer events according to certain relations and predict future events. The cutting-edge techniques for event reasoning play a crucial role in various natural language processing applications. Large language models (LLMs) have made significant advancements in event reasoning owing to their wealth of knowledge and reasoning capabilities. However, smaller instruction-tuned models currently in use do not consistently demonstrate exceptional proficiency in managing these tasks. This discrepancy arises from the absence of explicit modeling of events and the interconnections of them within their instruction data. Consequently, these models face challenges in comprehending event structures and semantics while struggling to bridge the gap between their interpretations and human understanding of events. Additionally, their limitations in grasping event relations lead to constrained event reasoning abilities to effectively deduce and incorporate pertinent event knowledge. In this paper, we propose Event-Oriented Instruction Tuning (EvIT) to train our LLM. Specifically, we first propose a novel structure named event quadruple which contains the structure and semantics of events and is complete in the event representation. We then design event-relation learning based on the structures. We encapsulate the learning into the instruction-tuning formulation to better stimulate the event reasoning capacity of our model. We design a heuristic unsupervised method to mine event quadruple from a large-scale corpus. At last, we finetune a Llama model on our Event-Oriented Instruction Tuning. We conduct extensive experiments on event reasoning tasks on several datasets. Automatic and human evaluations demonstrate EvIT achieves competitive performances on event reasoning.

Submitted 18 April, 2024; originally announced April 2024.

arXiv:2404.05952 [pdf, other]

Robot Safe Planning In Dynamic Environments Based On Model Predictive Control Using Control Barrier Function

Authors: Zetao Lu, Kaijun Feng, Jun Xu, Haoyao Chen, Yunjiang Lou

Abstract: Implementing obstacle avoidance in dynamic environments is a challenging problem for robots. Model predictive control (MPC) is a popular strategy for dealing with this type of problem, and recent work mainly uses control barrier function (CBF) as hard constraints to ensure that the system state remains in the safe set. However, in crowded scenarios, effective solutions may not be obtained due to infeasibility problems, resulting in degraded controller performance. We propose a new MPC framework that integrates CBF to tackle the issue of obstacle avoidance in dynamic environments, in which the infeasibility problem induced by hard constraints operating over the whole prediction horizon is solved by softening the constraints and introducing exact penalty, prompting the robot to actively seek out new paths. At the same time, generalized CBF is extended as a single-step safety constraint of the controller to enhance the safety of the robot during navigation. The efficacy of the proposed method is first shown through simulation experiments, in which a double-integrator system and a unicycle system are employed, and the proposed method outperforms other controllers in terms of safety, feasibility, and navigation efficiency. Furthermore, real-world experiment on an MR1000 robot is implemented to demonstrate the effectiveness of the proposed method.

Submitted 8 April, 2024; originally announced April 2024.

arXiv:2403.16362 [pdf, other]

AgentFL: Scaling LLM-based Fault Localization to Project-Level Context

Authors: Yihao Qin, Shangwen Wang, Yiling Lou, Jinhao Dong, Kaixin Wang, Xiaoling Li, Xiaoguang Mao

Abstract: Fault Localization (FL) is an essential step during the debugging process. With the strong capabilities of code comprehension, the recent Large Language Models (LLMs) have demonstrated promising performance in diagnosing bugs in the code. Nevertheless, due to LLMs' limited performance in handling long contexts, existing LLM-based fault localization remains on localizing bugs within a small code scope (i.e., a method or a class), which struggles to diagnose bugs for a large code scope (i.e., an entire software system). To address the limitation, this paper presents AgentFL, a multi-agent system based on ChatGPT for automated fault localization. By simulating the behavior of a human developer, AgentFL models the FL task as a three-step process, which involves comprehension, navigation, and confirmation. Within each step, AgentFL hires agents with diversified expertise, each of which utilizes different tools to handle specific tasks. Particularly, we adopt a series of auxiliary strategies such as Test Behavior Tracking, Document-Guided Search, and Multi-Round Dialogue to overcome the challenges in each step. The evaluation on the widely used Defects4J-V1.2.0 benchmark shows that AgentFL can localize 157 out of 395 bugs within Top-1, which outperforms the other LLM-based approaches and exhibits complementarity to the state-of-the-art learning-based techniques. Additionally, we confirm the indispensability of the components in AgentFL with the ablation study and demonstrate the usability of AgentFL through a user study. Finally, the cost analysis shows that AgentFL spends an average of only 0.074 dollars and 97 seconds for a single bug.

Submitted 24 March, 2024; originally announced March 2024.

arXiv:2403.15282 [pdf, other]

A story of viral co-infection, co-transmission and co-feeding in ticks: how to compute an invasion reproduction number

Authors: Giulia Belluccini, Qianying Lin, Bevelynn Williams, Yijun Lou, Zati Vatansever, Martín López-García, Grant Lythe, Thomas Leitner, Ethan Romero-Severson, Carmen Molina-París

Abstract: With a single circulating vector-borne virus, the basic reproduction number incorporates contributions from tick-to-tick (co-feeding), tick-to-host and host-to-tick transmission routes. With two different circulating vector-borne viral strains, resident and invasive, and under the assumption that co-feeding is the only transmission route in a tick population, the invasion reproduction number depends on whether the model system of ordinary differential equations possesses the property of neutrality. We show that a simple model, with two populations of ticks infected with one strain, resident or invasive, and one population of co-infected ticks, does not have Alizon's neutrality property. We present model alternatives that are capable of representing the invasion potential of a novel strain by including populations of ticks dually infected with the same strain. The invasion reproduction number is analysed with the next-generation method and via numerical simulations.

Submitted 22 March, 2024; originally announced March 2024.

Comments: 37 pages and 4 figures

arXiv:2402.03610 [pdf, other]

RAP: Retrieval-Augmented Planning with Contextual Memory for Multimodal LLM Agents

Authors: Tomoyuki Kagaya, Thong Jing Yuan, Yuxuan Lou, Jayashree Karlekar, Sugiri Pranata, Akira Kinose, Koki Oguri, Felix Wick, Yang You

Abstract: Owing to recent advancements, Large Language Models (LLMs) can now be deployed as agents for increasingly complex decision-making applications in areas including robotics, gaming, and API integration. However, reflecting past experiences in current decision-making processes, an innate human behavior, continues to pose significant challenges. Addressing this, we propose the Retrieval-Augmented Planning (RAP) framework, designed to dynamically leverage past experiences corresponding to the current situation and context, thereby enhancing agents' planning capabilities. RAP distinguishes itself by being versatile: it excels in both text-only and multimodal environments, making it suitable for a wide range of tasks. Empirical evaluations demonstrate RAP's effectiveness, where it achieves SOTA performance in textual scenarios and notably enhances multimodal LLM agents' performance for embodied tasks. These results highlight RAP's potential in advancing the functionality and applicability of LLM agents in complex, real-world applications.

Submitted 5 February, 2024; originally announced February 2024.

arXiv:2401.17786 [pdf, other]

A Graph-Native Query Optimization Framework

Authors: Bingqing Lyu, Xiaoli Zhou, Longbin Lai, Yufan Yang, Yunkai Lou, Wenyuan Yu, Jingren Zhou

Abstract: Graph queries that combine pattern matching with relational operations, referred to as PatRelQuery, are widely used in many real-world applications. They allow users to identify arbitrary patterns in a graph and further perform in-depth relational analysis on the results. To effectively support PatRelQuery, two key challenges need to be addressed: (1) how to optimize PatRelQuery in a unified framework, and (2) how to handle the arbitrary type constraints in patterns in PatRelQuery. In this paper, we present a graph-native query optimization framework named GOpt to tackle these issues. GOpt is built on top of a unified intermediate representation (IR) that is capable of capturing both graph and relational operations, thereby streamlining the optimization of PatRelQuery. To handle the arbitrary type constraints, GOpt employs an automatic type inference approach to identify implicit type constraints. Additionally, GOpt introduces a graph-native optimizer, which encompasses an extensive collection of optimization rules along with cost-based techniques tailored for arbitrary patterns, to optimize PatRelQuery. Through comprehensive experiments, we demonstrate that GOpt can achieve significant query performance improvements, in both crafted benchmarks and real-world applications.

Submitted 5 February, 2024; v1 submitted 31 January, 2024; originally announced January 2024.

arXiv:2401.15007 [pdf, other]

Noise-Tolerant Optimization Methods for the Solution of a Robust Design Problem

Authors: Yuchen Lou, Shigeng Sun, Jorge Nocedal

Abstract: The development of nonlinear optimization algorithms capable of performing reliably in the presence of noise has garnered considerable attention lately. This paper advocates for strategies to create noise-tolerant nonlinear optimization algorithms by adapting classical deterministic methods. These adaptations follow certain design guidelines described here, which make use of estimates of the noise level in the problem. The application of our methodology is illustrated by the development of a line search gradient projection method, which is tested on an engineering design problem. It is shown that a new self-calibrated line search and noise-aware finite-difference techniques are effective even in the high noise regime. Numerical experiments investigate the resiliency of key algorithmic components. A convergence analysis of the line search gradient projection method establishes convergence to a neighborhood of the solution.

Submitted 26 January, 2024; originally announced January 2024.

MSC Class: 90C30; 90C15; 93B51; 65K05

arXiv:2401.01738 [pdf, other]

doi 10.1109/TWC.2024.3351856

Integrated Sensing and Communication with Massive MIMO: A Unified Tensor Approach for Channel and Target Parameter Estimation

Authors: Ruoyu Zhang, Lei Cheng, Shuai Wang, Yi Lou, Yulong Gao, Wen Wu, Derrick Wing Kwan Ng

Abstract: Benefitting from the vast spatial degrees of freedom, the amalgamation of integrated sensing and communication (ISAC) and massive multiple-input multiple-output (MIMO) is expected to simultaneously improve spectral and energy efficiencies as well as the sensing capability. However, a large number of antennas deployed in massive MIMO-ISAC raises critical challenges in acquiring both accurate channel state information and target parameter information. To overcome these two challenges with a unified framework, we first analyze their underlying system models and then propose a novel tensor-based approach that addresses both the channel estimation and target sensing problems. Specifically, by parameterizing the high-dimensional communication channel exploiting a small number of physical parameters, we associate the channel state information with the sensing parameters of targets in terms of angular, delay, and Doppler dimensions. Then, we propose a shared training pattern adopting the same time-frequency resources such that both the channel estimation and target parameter estimation can be formulated as a canonical polyadic decomposition problem with a similar mathematical expression. On this basis, we first investigate the uniqueness condition of the tensor factorization and the maximum number of resolvable targets by utilizing the specific Vandermonde

Submitted 3 January, 2024; originally announced January 2024.

Journal ref: IEEE Transactions on Wireless Communications, 2024

arXiv:2312.14538 [pdf, ps, other]

doi 10.1016/j.physd.2024.134199

Extensions of dark KdV equations: nonhomogeneous classifications, bosonizations of fermionic systems and supersymmetric dark systems

Authors: S. Y. Lou

Abstract: Dark equations are defined as some kinds of integrable couplings with some fields being homogeneously and linearly coupled to others. In this paper, dark equations are extended in several aspects. Taking the Korteweg-de Vries (KdV) equation as an example, the dark KdV systems are extended to nonhomogeneous forms, nonlinear couplings and graded linear cases. The two-component nonhomogeneous linear coupled dark KdV systems are completely classified. The nonlinear coupled dark KdV systems may be obtained through the decompositions from higher dimensional integrable systems like the B-type KP equation. Graded linear coupled dark KdV systems may be produced by introducing dark parameters (including the Grassmann parameters) to usual integrable systems. Especially, applying the bosonization approach to the integrable systems with fermion fields such as the supersymmetric integrable systems and super-integrable models, infinitely many graded linear dark systems can be generated. Finally, the dark KdV systems are extended to supersymmetric ones. The full classifications for the supersymmetric dark KdV systems are obtained related to two types of usual supersymmetric KdV equations.

Submitted 22 December, 2023; originally announced December 2023.

Comments: 10 pages, 0 figures

Journal ref: Physica D 464(2024)134199

arXiv:2312.14361 [pdf, ps, other]

A Gradient-Based Optimization Method Using the Koopman Operator

Authors: Mengqi Hu, Bian Li, Yi-An Ma, Yifei Lou, Xiu Yang

Abstract: In this paper, we propose a novel approach to solving optimization problems by reformulating the optimization problem into a dynamical system, followed by the adaptive spectral Koopman (ASK) method. The Koopman operator, employed in our approach, approximates the evolution of an ordinary differential equation (ODE) using a finite number of eigenfunctions and eigenvalues. We begin by providing a brief overview of the Koopman operator and the ASK method. Subsequently, we adapt the ASK method for solving a general optimization problem. Moreover, we provide an error bound to aid in understanding the performance of the proposed approach, marking the initial step in a more comprehensive numerical analysis. Experimentally, we demonstrate the applicability and accuracy of our method across a diverse range of optimization problems, including min-max problems. Our approach consistently yields smaller gradient norms and higher success rates in finding critical points compared to state-of-the-art gradient-based methods. We also observe the proposed method works particularly well when the dynamical properties of the system can be effectively modeled by the system's behaviors in a neighborhood of critical points.

Submitted 21 December, 2023; originally announced December 2023.

MSC Class: 37N30; 37N40; 37Mxx; 46N10; 47N10

arXiv:2312.10448 [pdf, other]

Resolving Crash Bugs via Large Language Models: An Empirical Study

Authors: Xueying Du, Mingwei Liu, Juntao Li, Hanlin Wang, Xin Peng, Yiling Lou

Abstract: Crash bugs cause unexpected program behaviors or even termination, requiring high-priority resolution. However, manually resolving crash bugs is challenging and labor-intensive, and researchers have proposed various techniques for their automated localization and repair. ChatGPT, a recent large language model (LLM), has garnered significant attention due to its exceptional performance across various domains. This work performs the first investigation into ChatGPT's capability in resolving real-world crash bugs, focusing on its effectiveness in both localizing and repairing code-related and environment-related crash bugs. Specifically, we initially assess ChatGPT's fundamental ability to resolve crash bugs with basic prompts in a single iteration. We observe that ChatGPT performs better at resolving code-related crash bugs compared to environment-related ones, and its primary challenge in resolution lies in inaccurate localization. Additionally, we explore ChatGPT's potential with various advanced prompts. Furthermore, by stimulating ChatGPT's self-planning, it methodically investigates each potential crash-causing environmental factor through proactive inquiry, ultimately identifying the root cause of the crash. Based on our findings, we propose IntDiagSolver, an interaction methodology designed to facilitate precise crash bug resolution through continuous interaction with LLMs. Evaluating IntDiagSolver on multiple LLMs, including ChatGPT, Claude, and CodeLlama, reveals consistent enhancement in the accuracy of crash bug resolution.

Submitted 16 December, 2023; originally announced December 2023.

arXiv:2312.08367 [pdf, other]

ViLA: Efficient Video-Language Alignment for Video Question Answering

Authors: Xijun Wang, Junbang Liang, Chun-Kai Wang, Kenan Deng, Yu Lou, Ming Lin, Shan Yang

Abstract: In this work, we propose an efficient Video-Language Alignment (ViLA) network. Our ViLA model addresses both efficient frame sampling and effective cross-modal alignment in a unified way. In our ViLA network, we design a new learnable text-guided Frame-Prompter together with a new cross-modal distillation (QFormer-Distiller) module. Pre-trained large image-language models have shown promising results on problems such as visual question answering (VQA). However, how to efficiently and effectively sample video frames when adapting a pre-trained large image-language model to video-language alignment is still the major challenge. Compared with prior work, our ViLA model demonstrates the capability of selecting key frames with critical contents, thus improving the video-language alignment accuracy while reducing the inference latency (+3.3% on NExT-QA Temporal with 3.0X speed up). Overall, our ViLA network outperforms the state-of-the-art methods on the video question-answering benchmarks: +4.6% on STAR Interaction and +2.2% on STAR average with 3.0X speed up; our 2-frame model outperforms the 4-frame SeViLA on the VLEP dataset with 4.2X speed-up.

Submitted 29 April, 2024; v1 submitted 13 December, 2023; originally announced December 2023.

arXiv:2311.05795 [pdf, other]

Improvements on Uncertainty Quantification for Node Classification via Distance-Based Regularization

Authors: Russell Alan Hart, Linlin Yu, Yifei Lou, Feng Chen

Abstract: Deep neural networks have achieved significant success in the last decades, but they are not well-calibrated and often produce unreliable predictions. A large body of literature relies on uncertainty quantification to evaluate the reliability of a learning model, which is particularly important for applications of out-of-distribution (OOD) detection and misclassification detection. We are interested in uncertainty quantification for interdependent node-level classification. We start our analysis based on graph posterior networks (GPNs) that optimize the uncertainty cross-entropy (UCE)-based loss function. We describe the theoretical limitations of the widely-used UCE loss. To alleviate the identified drawbacks, we propose a distance-based regularization that encourages clustered OOD nodes to remain clustered in the latent space. We conduct extensive comparison experiments on eight standard datasets and demonstrate that the proposed regularization outperforms the state-of-the-art in both OOD detection and misclassification detection.

Submitted 9 November, 2023; originally announced November 2023.

Comments: NeurIPS 2023

arXiv:2311.04726 [pdf, other]

Social Motion Prediction with Cognitive Hierarchies

Authors: Wentao Zhu, Jason Qin, Yuke Lou, Hang Ye, Xiaoxuan Ma, Hai Ci, Yizhou Wang

Abstract: Humans exhibit a remarkable capacity for anticipating the actions of others and planning their own actions accordingly. In this study, we strive to replicate this ability by addressing the social motion prediction problem. We introduce a new benchmark, a novel formulation, and a cognition-inspired framework. We present Wusi, a 3D multi-person motion dataset under the context of team sports, which features intense and strategic human interactions and diverse pose distributions. By reformulating the problem from a multi-agent reinforcement learning perspective, we incorporate behavioral cloning and generative adversarial imitation learning to boost learning efficiency and generalization. Furthermore, we take into account the cognitive aspects of the human social action planning process and develop a cognitive hierarchy framework to predict strategic human social interactions. We conduct comprehensive experiments to validate the effectiveness of our proposed dataset and approach. Code and data are available at https://walter0807.github.io/Social-CH/.

Submitted 8 November, 2023; originally announced November 2023.

Comments: NeurIPS 2023

arXiv:2311.04448 [pdf, other]

Inferring Resource-Oriented Intentions using LLMs for Static Resource Leak Detection

Authors: Chong Wang, Jianan Liu, Xin Peng, Yang Liu, Yiling Lou

Abstract: Resource leaks, caused by resources not being released after acquisition, often lead to performance issues and system crashes. Existing static detection techniques rely on mechanical matching of predefined resource acquisition/release APIs and null-checking conditions to find unreleased resources, suffering from both (1) false negatives caused by the incompleteness of predefined resource acquisition/release APIs and (2) false positives caused by the incompleteness of resource reachability validation identification. To overcome these challenges, we propose InferROI, a novel approach that leverages the exceptional code comprehension capability of large language models (LLMs) to directly infer resource-oriented intentions (acquisition, release, and reachability validation) in code. InferROI first prompts the LLM to infer involved intentions for a given code snippet, and then incorporates a two-stage static analysis approach to check control-flow paths for resource leak detection based on the inferred intentions. We evaluate the effectiveness of InferROI in both resource-oriented intention inference and resource leak detection. Experimental results on the DroidLeaks and JLeaks datasets demonstrate that InferROI achieves a promising bug detection rate (59.3% and 64.8%) and false alarm rate (18.6% and 24.0%). Compared to three industrial static detectors, InferROI detects 14~45 and 167~503 more bugs in DroidLeaks and JLeaks, respectively. When applied to real-world open-source projects, InferROI identifies 26 unknown resource leak bugs, with 7 of them being confirmed by developers. Finally, manual annotation indicated that InferROI achieved a precision of 74.6% and a recall of 81.8% in intention inference, covering more than 60% of the resource types involved in the datasets. The results of an ablation study underscore the importance of combining LLM-based inference with static analysis.

Submitted 2 July, 2024; v1 submitted 7 November, 2023; originally announced November 2023.

arXiv:2311.00964 [pdf, other]

doi 10.1145/3637528.3671521

On Finding Bi-objective Pareto-optimal Fraud Prevention Rule Sets for Fintech Applications

Authors: Chengyao Wen, Yin Lou

Abstract: Rules are widely used in Fintech institutions to make fraud prevention decisions, since rules are highly interpretable thanks to their intuitive if-then structure. In practice, a two-stage framework of fraud prevention decision rule set mining is usually employed in large Fintech institutions; Stage 1 generates a potentially large pool of rules and Stage 2 aims to produce a refined rule subset according to some criteria (typically based on precision and recall). This paper focuses on improving the flexibility and efficacy of this two-stage framework, and is concerned with finding high-quality rule subsets in a bi-objective space (such as precision and recall). To this end, we first introduce a novel algorithm called SpectralRules that directly generates a compact pool of rules in Stage 1 with high diversity. We empirically find such diversity improves the quality of the final rule subset. In addition, we introduce an intermediate stage between Stage 1 and 2 that adopts the concept of Pareto optimality and aims to find a set of non-dominated rule subsets, which constitutes a Pareto front. This intermediate stage greatly simplifies the selection criteria and increases the flexibility of Stage 2. For this intermediate stage, we propose a heuristic-based framework called PORS and we identify that the core of PORS is the problem of solution selection on the front (SSF). We provide a systematic categorization of the SSF problem and a thorough empirical evaluation of various SSF methods on both public and proprietary datasets. On two real application scenarios within Alipay, we demonstrate the advantages of our proposed methodology over existing work.

Submitted 27 June, 2024; v1 submitted 1 November, 2023; originally announced November 2023.

arXiv:2310.19586 [pdf, other]

doi 10.1109/TAC.2023.3321368

Generalized Multi-kernel Maximum Correntropy Kalman Filter for Disturbance Estimation

Authors: Shilei Li, Dawei Shi, Yunjiang Lou, Wulin Zou, Ling Shi

Abstract: Disturbance observers have been attracting continuing research efforts and are widely used in many applications. Among them, the Kalman filter-based disturbance observer is an attractive one since it estimates both the state and the disturbance simultaneously, and is optimal for a linear system with Gaussian noises. Unfortunately, the noise in the disturbance channel typically exhibits a heavy-tailed distribution because the nominal disturbance dynamics usually do not align with the practical ones. To handle this issue, we propose a generalized multi-kernel maximum correntropy Kalman filter for disturbance estimation, which is less conservative by adopting different kernel bandwidths for different channels and exhibits excellent performance both with and without external disturbance. The convergence of the fixed point iteration and the complexity of the proposed algorithm are given. Simulations on a robotic manipulator reveal that the proposed algorithm is very efficient in disturbance estimation with moderate algorithm complexity.

Submitted 30 October, 2023; originally announced October 2023.

Comments: in IEEE Transactions on Automatic Control (2023)

arXiv:2310.05372 [pdf]

doi 10.1016/j.scitotenv.2023.166714

The role of hydrodynamics for the spatial distribution of high-temperature hydrothermal vent-endemic fauna in the deep ocean environment

Authors: Zhiguo He, Yingzhong Lou, Haoyang Zhang, Xiqiu Han, Thomas Pähtz, Pengcheng Jiao, Peng Hu, Yadong Zhou, Yejian Wang, Zhongyan Qiu

Abstract: Active hydrothermal vents provide the surrounding submarine environment with substantial amounts of matter and energy, thus serving as important habitats for diverse megabenthic communities in the deep ocean and constituting a unique, highly productive chemosynthetic ecosystem on Earth. Vent-endemic biological communities gather near the venting site and are usually not found beyond a distance of the order of 100 m from the vent. This is surprising because one would actually expect matter ejected from high-temperature vents, which generate highly turbulent buoyancy plumes, to be suspended and carried far away by the plume flows and deep-sea currents. Here, we study this problem from a fluid dynamics perspective by simulating the vent hydrodynamics using a numerical model that couples the plume flow with induced matter and energy transport. We find that both low- and high-temperature vents deposit most vent matter relatively close to the plume. In particular, the tendency of turbulent buoyancy plumes to carry matter far away is strongly counteracted by generated entrainment flows back into the plume stem. The deposition ranges of organic and inorganic hydrothermal particles obtained from the simulations for various natural high-temperature vents are consistent with the observed maximum spatial extent of biological communities, evidencing that plume hydrodynamics exercises strong control over the spatial distribution of vent-endemic fauna. While other factors affecting the spatial distribution of vent-endemic fauna, such as geology and geochemistry, are site-specific, the main physical features of plume hydrodynamics unraveled in this study are largely site-unspecific and therefore universal across vent sites on Earth.

Submitted 8 October, 2023; originally announced October 2023.

Journal ref: Science of the Total Environment 904, 166714 (2023)

arXiv:2310.04551 [pdf, other]

MeSa: Masked, Geometric, and Supervised Pre-training for Monocular Depth Estimation

Authors: Muhammad Osama Khan, Junbang Liang, Chun-Kai Wang, Shan Yang, Yu Lou

Abstract: Pre-training has been an important ingredient in developing strong monocular depth estimation models in recent years. For instance, self-supervised learning (SSL) is particularly effective by alleviating the need for large datasets with dense ground-truth depth maps. However, despite these improvements, our study reveals that the later layers of the SOTA SSL method are actually suboptimal. By examining the layer-wise representations, we demonstrate significant changes in these later layers during fine-tuning, indicating the ineffectiveness of their pre-trained features for depth estimation. To address these limitations, we propose MeSa, a comprehensive framework that leverages the complementary strengths of masked, geometric, and supervised pre-training. Hence, MeSa benefits from not only general-purpose representations learnt via masked pre-training but also specialized depth-specific features acquired via geometric and supervised pre-training. Our CKA layer-wise analysis confirms that our pre-training strategy indeed produces improved representations for the later layers, overcoming the drawbacks of the SOTA SSL method. Furthermore, via experiments on the NYUv2 and IBims-1 datasets, we demonstrate that these enhanced representations translate to performance improvements in both the in-distribution and out-of-distribution settings. We also investigate the influence of the pre-training dataset and demonstrate the efficacy of pre-training on LSUN, which yields significantly better pre-trained representations. Overall, our approach surpasses the masked pre-training SSL method by a substantial margin of 17.1% on the RMSE. Moreover, even without utilizing any recently proposed techniques, MeSa also outperforms the most recent methods and establishes a new state-of-the-art for monocular depth estimation on the challenging NYUv2 dataset.

Submitted 6 October, 2023; originally announced October 2023.

arXiv:2309.06729 [pdf, ps, other]

From one to infinity: symmetries of integrable systems

Authors: S. Y. Lou, M. Jia

Abstract: Integrable systems constitute an essential part of modern physics. Traditionally, to prove that a model is integrable, one has to find its infinitely many symmetries or conserved quantities. In this letter, taking the well-known Korteweg-de Vries and Boussinesq equations as examples, we show that it is enough to find only one nonlocal key-symmetry to guarantee the integrability. Starting from the nonlocal key-symmetry, recursion operator(s) and then infinitely many symmetries and Lax pairs can be successfully found.

Submitted 10 January, 2024; v1 submitted 13 September, 2023; originally announced September 2023.

Comments: 6 pages

arXiv:2309.01372 [pdf, other]

DiverseMotion: Towards Diverse Human Motion Generation via Discrete Diffusion

Authors: Yunhong Lou, Linchao Zhu, Yaxiong Wang, Xiaohan Wang, Yi Yang

Abstract: We present DiverseMotion, a new approach for synthesizing high-quality human motions conditioned on textual descriptions while preserving motion diversity. Despite the recent significant progress in text-based human motion generation, existing methods often prioritize fitting training motions at the expense of action diversity. Consequently, striking a balance between motion quality and diversity remains an unresolved challenge. This problem is compounded by two key factors: 1) the lack of diversity in motion-caption pairs in existing benchmarks and 2) the unilateral and biased semantic understanding of the text prompt, focusing primarily on the verb component while neglecting the nuanced distinctions indicated by other words. In response to the first issue, we construct a large-scale Wild Motion-Caption dataset (WMC) to extend the restricted action boundary of existing well-annotated datasets, enabling the learning of diverse motions through a more extensive range of actions. To this end, a motion BLIP is trained upon a pretrained vision-language model, then we automatically generate diverse motion captions for the collected motion sequences. As a result, we finally build a dataset comprising 8,888 motions coupled with 141k text. To comprehensively understand the text command, we propose a Hierarchical Semantic Aggregation (HSA) module to capture the fine-grained semantics. Finally, we integrate the above two designs into an effective Motion Discrete Diffusion (MDD) framework to strike a balance between motion quality and diversity. Extensive experiments on HumanML3D and KIT-ML show that our DiverseMotion achieves the state-of-the-art motion quality and competitive motion diversity. Dataset, code, and pretrained models will be released to reproduce all of our results.

Submitted 4 September, 2023; originally announced September 2023.

Comments: 12 pages, 7 figures

arXiv:2308.13780 [pdf, other]

doi 10.3847/1538-4357/ac6ef4

Measurement of the Gamma-Ray Energy Spectrum beyond 100 TeV from the HESS J1843$-$033 Region

Authors: M. Amenomori, S. Asano, Y. W. Bao, X. J. Bi, D. Chen, T. L. Chen, W. Y. Chen, Xu Chen, Y. Chen, Cirennima, S. W. Cui, Danzengluobu, L. K. Ding, J. H. Fang, K. Fang, C. F. Feng, Zhaoyang Feng, Z. Y. Feng, Qi Gao, A. Gomi, Q. B. Gou, Y. Q. Guo, Y. Y. Guo, H. H. He, Z. T. He , et al. (91 additional authors not shown)

Abstract: HESS J1843$-$033 is a very-high-energy gamma-ray source whose origin remains unidentified. This work presents, for the first time, the energy spectrum of gamma rays beyond $100\, {\rm TeV}$ from the HESS J1843$-$033 region using the data recorded by the Tibet air shower array and its underground muon detector array. A gamma-ray source with an extension of $0.34^{\circ} \pm 0.12^{\circ}$ is successfully detected above $25\, {\rm TeV}$ at $(α,\, δ) = (281.09^{\circ}\pm 0.10^{\circ},\, -3.76^{\circ}\pm 0.09^{\circ})$ near HESS J1843$-$033 with a statistical significance of $6.2\, σ$, and the source is named TASG J1844$-$038. The position of TASG J1844$-$038 is consistent with those of HESS J1843$-$033, eHWC J1842$-$035, and LHAASO J1843$-$0338. The measured gamma-ray energy spectrum in $25\, {\rm TeV} < E < 130\, {\rm TeV}$ is described with ${\rm d}N/{\rm d}E = (9.70\pm 1.89)\times 10^{-16} (E/40\, {\rm TeV})^{-3.26\pm 0.30}\, {\rm TeV}^{-1} {\rm cm}^{-2} {\rm s}^{-1}$, and the spectral fit to the combined spectra of HESS J1843$-$033, LHAASO J1843$-$0338, and TASG J1844$-$038 implies the existence of a cutoff at $49.5\pm 9.0\, {\rm TeV}$. Associations of TASG J1844$-$038 with SNR G28.6$-$0.1 and PSR J1844$-$0346 are also discussed in detail for the first time.

Submitted 26 August, 2023; originally announced August 2023.

Comments: 11 pages, 4 figures, 1 table

arXiv:2308.13561 [pdf, other]

Project Aria: A New Tool for Egocentric Multi-Modal AI Research

Authors: Jakob Engel, Kiran Somasundaram, Michael Goesele, Albert Sun, Alexander Gamino, Andrew Turner, Arjang Talattof, Arnie Yuan, Bilal Souti, Brighid Meredith, Cheng Peng, Chris Sweeney, Cole Wilson, Dan Barnes, Daniel DeTone, David Caruso, Derek Valleroy, Dinesh Ginjupalli, Duncan Frost, Edward Miller, Elias Mueggler, Evgeniy Oleinik, Fan Zhang, Guruprasad Somasundaram, Gustavo Solaira , et al. (49 additional authors not shown)

Abstract: Egocentric, multi-modal data as available on future augmented reality (AR) devices provides unique challenges and opportunities for machine perception. These future devices will need to be all-day wearable in a socially acceptable form-factor to support always available, context-aware and personalized AI applications. Our team at Meta Reality Labs Research built the Aria device, an egocentric, multi-modal data recording and streaming device with the goal to foster and accelerate research in this area. In this paper, we describe the Aria device hardware including its sensor configuration and the corresponding software tools that enable recording and processing of such data.

Submitted 1 October, 2023; v1 submitted 24 August, 2023; originally announced August 2023.

arXiv:2308.11501 [pdf]

doi 10.1002/rob.22256

Four years of multi-modal odometry and mapping on the rail vehicles

Authors: Yusheng Wang, Weiwei Song, Yi Zhang, Fei Huang, Zhiyong Tu, Ruoying Li, Shimin Zhang, Yidong Lou

Abstract: Precise, seamless, and efficient train localization as well as long-term railway environment monitoring are essential to reliability, availability, maintainability, and safety (RAMS) engineering for railroad systems. Simultaneous localization and mapping (SLAM) is right at the core of solving the two problems concurrently. To this end, we propose a high-performance and versatile multi-modal framework in this paper, targeted for the odometry and mapping task for various rail vehicles. Our system is built atop an inertial-centric state estimator that tightly couples light detection and ranging (LiDAR), visual, optionally satellite navigation and map-based localization information with the convenience and extendibility of loosely coupled methods. The inertial sensors, IMU and wheel encoder, are treated as the primary sensors, which use the observations from the subsystems to constrain the accelerometer and gyroscope biases. Compared to point-only LiDAR-inertial methods, our approach leverages more geometry information by introducing both track plane and electric power pillars into state estimation. The visual-inertial subsystem also utilizes the environmental structure information by employing both lines and points. Besides, the method is capable of handling sensor failures by automatic reconfiguration that bypasses failed modules. Our proposed method has been extensively tested in long-duration railway environments over four years, including general-speed, high-speed, and metro lines, with both passenger and freight traffic investigated. Further, we aim to share, in an open way, the experience, problems, and successes of our group with the robotics community so that those that work in such environments can avoid these errors. In this view, we open source some of the datasets to benefit the research community.

Submitted 22 August, 2023; originally announced August 2023.

arXiv:2308.11492 [pdf]

A LiDAR-Inertial SLAM Tightly-Coupled with Dropout-Tolerant GNSS Fusion for Autonomous Mine Service Vehicles

Authors: Yusheng Wang, Yidong Lou, Weiwei Song, Bing Zhan, Feihuang Xia, Qigeng Duan

Abstract: Multi-modal sensor integration has become a crucial prerequisite for real-world navigation systems. Recent studies have reported successful deployment of such systems in many fields. However, navigation tasks in mine scenes remain challenging due to satellite signal dropouts, degraded perception, and observation degeneracy. To solve this problem, we propose a LiDAR-inertial odometry method in this paper, utilizing both Kalman filter and graph optimization. The front-end consists of multiple parallel running LiDAR-inertial odometries, where the laser points, IMU, and wheel odometer information are tightly fused in an error-state Kalman filter. Instead of the commonly used feature points, we employ surface elements for registration. The back-end constructs a pose graph and jointly optimizes the pose estimation results from inertial, LiDAR odometry, and global navigation satellite system (GNSS). Since the vehicle has a long operation time inside the tunnel, the largely accumulated drift may not be fully corrected by the GNSS measurements. We hereby leverage a loop closure based re-initialization process to achieve full alignment. In addition, the system robustness is improved through handling data loss, stream consistency, and estimation error. The experimental results show that our system has a good tolerance to long-period degeneracy with the cooperation of different LiDARs and surfel registration, achieving meter-level accuracy even for tens of minutes running during GNSS dropouts.

Submitted 22 August, 2023; originally announced August 2023.

arXiv:2308.11422 [pdf, other]

Recommending Analogical APIs via Knowledge Graph Embedding

Authors: Mingwei Liu, Yanjun Yang, Yiling Lou, Xin Peng, Zhong Zhou, Xueying Du, Tianyong Yang

Abstract: Library migration, which re-implements the same software behavior by using a different library instead of using the current one, has been widely observed in software evolution. One essential part of library migration is to find an analogical API that could provide the same functionality as current ones. However, given the large number of libraries/APIs, manually finding an analogical API could be very time-consuming and error-prone. Researchers have developed multiple automated analogical API recommendation techniques. Documentation-based methods have particularly attracted significant interest. Despite their potential, these methods have limitations, such as a lack of comprehensive semantic understanding in documentation and scalability challenges. In this work, we propose KGE4AR, a novel documentation-based approach that leverages knowledge graph (KG) embedding to recommend analogical APIs during library migration. Specifically, KGE4AR proposes a novel unified API KG to comprehensively and structurally represent three types of knowledge in documentation, which can better capture the high-level semantics. Moreover, KGE4AR then proposes to embed the unified API KG into vectors, enabling more effective and scalable similarity calculation. We build KGE4AR's unified API KG for 35,773 Java libraries and assess it in two API recommendation scenarios: with and without target libraries. Our results show that KGE4AR substantially outperforms state-of-the-art documentation-based techniques in both evaluation scenarios in terms of all metrics (e.g., 47.1%-143.0% and 11.7%-80.6% MRR improvements in each scenario). Additionally, we explore KGE4AR's scalability, confirming its effective scaling with the growing number of libraries.

Submitted 22 August, 2023; originally announced August 2023.

Comments: Accepted by FSE 2023

arXiv:2308.09119 [pdf, other]

ICAR: Image-based Complementary Auto Reasoning

Authors: Xijun Wang, Anqi Liang, Junbang Liang, Ming Lin, Yu Lou, Shan Yang

Abstract: Scene-aware Complementary Item Retrieval (CIR) is a challenging task that requires generating a set of compatible items across domains. Due to the subjectivity, it is difficult to set up a rigorous standard for both data collection and learning objectives. To address this challenging task, we propose a visual compatibility concept, composed of similarity (resembling in color, geometry, texture, etc.) and complementarity (different items like table vs chair completing a group). Based on this notion, we propose a compatibility learning framework, a category-aware Flexible Bidirectional Transformer (FBT), for visual "scene-based set compatibility reasoning" with the cross-domain visual similarity input and auto-regressive complementary item generation. We introduce a "Flexible Bidirectional Transformer (FBT)" consisting of an encoder with flexible masking, a category prediction arm, and an auto-regressive visual embedding prediction arm. The inputs for FBT are cross-domain visual similarity invariant embeddings, making this framework quite generalizable. Furthermore, our proposed FBT model learns the inter-object compatibility from a large set of scene images in a self-supervised way. Compared with the SOTA methods, this approach achieves up to 5.3% and 9.6% in FITB score and 22.3% and 31.8% SFID improvement on fashion and furniture, respectively.

Submitted 17 August, 2023; originally announced August 2023.

arXiv:2308.04095 [pdf, ps, other]

Minimizing Quotient Regularization Model

Authors: Chao Wang, Jean-Francois Aujol, Guy Gilboa, Yifei Lou

Abstract: Quotient regularization models (QRMs) are a class of powerful regularization techniques that have gained considerable attention in recent years, due to their ability to handle complex and highly nonlinear data sets. However, the nonconvex nature of QRM poses a significant challenge in finding its optimal solution. We are interested in scenarios where both the numerator and the denominator of QRM are absolutely one-homogeneous functions, which is widely applicable in the fields of signal processing and image processing. In this paper, we utilize a gradient flow to minimize such QRM in combination with a quadratic data fidelity term. Our scheme involves solving a convex problem iteratively. The convergence analysis is conducted on a modified scheme in a continuous formulation, showing the convergence to a stationary point. Numerical experiments demonstrate the effectiveness of the proposed algorithm in terms of accuracy, outperforming the state-of-the-art QRM solvers.

Submitted 8 August, 2023; originally announced August 2023.

Comments: 20 pages

MSC Class: 49N45; 65K10; 90C05; 90C26

arXiv:2308.01861 [pdf, other]

ClassEval: A Manually-Crafted Benchmark for Evaluating LLMs on Class-level Code Generation

Authors: Xueying Du, Mingwei Liu, Kaixin Wang, Hanlin Wang, Junwei Liu, Yixuan Chen, Jiayi Feng, Chaofeng Sha, Xin Peng, Yiling Lou

Abstract: In this work, we make the first attempt to evaluate LLMs in a more challenging code generation scenario, i.e. class-level code generation. We first manually construct the first class-level code generation benchmark ClassEval of 100 class-level Python code generation tasks with approximately 500 person-hours. Based on it, we then perform the first study of 11 state-of-the-art LLMs on class-level code generation. Based on our results, we have the following main findings. First, we find that all existing LLMs show much worse performance on class-level code generation compared to standalone method-level code generation benchmarks like HumanEval; and the method-level coding ability cannot equivalently reflect the class-level coding ability among LLMs. Second, we find that GPT-4 and GPT-3.5 still exhibit dominant superiority over other LLMs on class-level code generation, and the second-tier models include Instruct-Starcoder, Instruct-Codegen, and Wizardcoder with very similar performance. Third, we find that generating the entire class all at once (i.e. holistic generation strategy) is the best generation strategy only for GPT-4 and GPT-3.5, while method-by-method generation (i.e. incremental and compositional) is the better strategy for the other models with limited ability of understanding long instructions and utilizing the middle information. Lastly, we find the limited model ability of generating method-dependent code and discuss the frequent error types in generated classes. Our benchmark is available at https://github.com/FudanSELab/ClassEval.

Submitted 14 August, 2023; v1 submitted 3 August, 2023; originally announced August 2023.

arXiv:2308.01240 [pdf, other]

Evaluating Instruction-Tuned Large Language Models on Code Comprehension and Generation

Authors: Zhiqiang Yuan, Junwei Liu, Qiancheng Zi, Mingwei Liu, Xin Peng, Yiling Lou

Abstract: In this work, we evaluate 10 open-source instructed LLMs on four representative code comprehension and generation tasks. We have the following main findings. First, for the zero-shot setting, instructed LLMs are very competitive on code comprehension and generation tasks and sometimes even better than small SOTA models specifically fine-tuned on each downstream task. We also find that larger instructed LLMs are not always better on code-related tasks. Second, for the few-shot setting, we find that adding demonstration examples substantially helps instructed LLMs perform better on most code comprehension and generation tasks; however, the examples would sometimes induce unstable or even worse performance. Furthermore, we find that the widely used BM25-based shot selection strategy significantly outperforms the basic random selection or fixed selection only on generation problems. Third, for the fine-tuning setting, we find that fine-tuning could further improve the model performance on downstream code comprehension and generation tasks compared to the zero-shot/one-shot performance. In addition, after being fine-tuned on the same downstream task dataset, instructed LLMs outperform both the small SOTA models and similar-scaled LLMs without instruction tuning. Based on our findings, we further present practical implications on model and usage recommendation, performance and cost trade-offs, and future direction.

Submitted 2 August, 2023; originally announced August 2023.

arXiv:2307.16121 [pdf, other]

doi 10.3233/FAIA230441

Uncertainty-Encoded Multi-Modal Fusion for Robust Object Detection in Autonomous Driving

Authors: Yang Lou, Qun Song, Qian Xu, Rui Tan, Jianping Wang

Abstract: Multi-modal fusion has shown promising initial results for object detection in autonomous driving perception. However, many existing fusion schemes do not consider the quality of each fusion input and may suffer from adverse conditions on one or more sensors. While predictive uncertainty has been applied to characterize single-modal object detection performance at run time, incorporating uncertainties into the multi-modal fusion still lacks effective solutions due primarily to the uncertainty's cross-modal incomparability and distinct sensitivities to various adverse conditions. To fill this gap, this paper proposes Uncertainty-Encoded Mixture-of-Experts (UMoE) that explicitly incorporates single-modal uncertainties into LiDAR-camera fusion. UMoE uses an individual expert network to process each sensor's detection result together with encoded uncertainty. Then, the expert networks' outputs are analyzed by a gating network to determine the fusion weights. The proposed UMoE module can be integrated into any proposal fusion pipeline. Evaluation shows that UMoE achieves a maximum of 10.67%, 3.17%, and 5.40% performance gain compared with the state-of-the-art proposal-level multi-modal object detectors under extreme weather, adversarial, and blinding attack scenarios.

Submitted 30 July, 2023; originally announced July 2023.

Comments: In proceedings of the 26th European Conference on Artificial Intelligence ECAI 2023. 8 pages + 2 appendix pages

arXiv:2307.08838 [pdf, other]

doi 10.1109/IROS55552.2023.10341608

Dynamic Object Tracking for Quadruped Manipulator with Spherical Image-Based Approach

Authors: Tianlin Zhang, Sikai Guo, Xiaogang Xiong, Wanlei Li, Zezheng Qi, Yunjiang Lou

Abstract: Accurately estimating and tracking the motion of surrounding dynamic objects is one of the important tasks for the autonomy of a quadruped manipulator. However, with only an onboard RGB camera, it is still a challenging task for a quadruped manipulator to track the motion of a dynamic object moving with unknown and changing velocities. To address this problem, this manuscript proposes a novel image-based visual servoing (IBVS) approach consisting of three elements: a spherical projection model, a robust super-twisting observer, and a model predictive controller (MPC). The spherical projection model decouples the visual error of the dynamic target into linear and angular ones. Then, with the presence of the visual error, the robustness of the observer is exploited to estimate the unknown and changing velocities of the dynamic target without depth estimation. Finally, the estimated velocity is fed into the model predictive controller (MPC) to generate joint torques for the quadruped manipulator to track the motion of the dynamic target. The proposed approach is validated through hardware experiments and the experimental results illustrate the approach's effectiveness in improving the autonomy of the quadruped manipulator.

Submitted 14 July, 2023; originally announced July 2023.

Journal ref: 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Detroit, MI, USA, 2023, pp. 727-734

arXiv:2307.06729 [pdf, ps, other]

Landscape of wave focusing and localisation at low frequencies

Authors: Bryn Davies, Yiqi Lou

Abstract: High-contrast scattering problems are special among classical wave systems as they allow for strong wave focusing and localisation at low frequencies. We use an asymptotic framework to develop a landscape theory for high-contrast systems that resonate in a subwavelength regime. Our from-first-principles asymptotic analysis yields a characterisation in terms of the generalised capacitance matrix, giving a discrete approximation of the three-dimensional scattering problem. We develop landscape theory for the generalised capacitance matrix and use it to predict the positions of three-dimensional wave focusing and localisation in random and non-periodic systems of subwavelength resonators.

Submitted 31 October, 2023; v1 submitted 13 July, 2023; originally announced July 2023.

arXiv:2307.00439 [pdf, other]

Weighted Anisotropic-Isotropic Total Variation for Poisson Denoising

Authors: Kevin Bui, Yifei Lou, Fredrick Park, Jack Xin

Abstract: Poisson noise commonly occurs in images captured by photon-limited imaging systems such as in astronomy and medicine. As the distribution of Poisson noise depends on the pixel intensity value, noise levels vary from pixel to pixel. Hence, denoising a Poisson-corrupted image while preserving important details can be challenging. In this paper, we propose a Poisson denoising model by incorporating the weighted anisotropic-isotropic total variation (AITV) as a regularization. We then develop an alternating direction method of multipliers with a combination of a proximal operator for an efficient implementation. Lastly, numerical experiments demonstrate that our algorithm outperforms other Poisson denoising methods in terms of image quality and computational efficiency.

Submitted 1 July, 2023; originally announced July 2023.

Comments: accepted to ICIP 2023

arXiv:2306.03174 [pdf, other]

doi 10.1145/3528223.3530162

Computational Design of Passive Grippers

Authors: Milin Kodnongbua, Ian Good, Yu Lou, Jeffrey Lipton, Adriana Schulz

Abstract: This work proposes a novel generative design tool for passive grippers -- robot end effectors that have no additional actuation and instead leverage the existing degrees of freedom in a robotic arm to perform grasping tasks. Passive grippers are used because they offer interesting trade-offs between cost and capabilities. However, existing designs are limited in the types of shapes that can be grasped. This work proposes to use rapid-manufacturing and design optimization to expand the space of shapes that can be passively grasped. Our novel generative design algorithm takes in an object and its positioning with respect to a robotic arm and generates a 3D printable passive gripper that can stably pick the object up. To achieve this, we address the key challenge of jointly optimizing the shape and the insert trajectory to ensure a passively stable grasp. We evaluate our method on a testing suite of 22 objects (23 experiments), all of which were evaluated with physical experiments to bridge the virtual-to-real gap. Code and data are at https://homes.cs.washington.edu/~milink/passive-gripper/

Submitted 5 June, 2023; originally announced June 2023.

Journal ref: ACM Transactions on Graphics, Volume 41, Issue 4, July 2022, Article No.: 149, pp 2-12

arXiv:2306.00772 [pdf, other]

Manipulating spatial structure of high-order quantum coherence with entangled photons

Authors: Shuang-Yin Huang, Jing Gao, Zhi-Cheng Ren, Zi-Mo Cheng, Wen-Zheng Zhu, Shu-Tian Xue, Yan-Chao Lou, Zhi-Feng Liu, Chao Chen, Fei Zhu, Li-Ping Yang, Xi-Lin Wang, Hui-Tian Wang

Abstract: High-order quantum coherence reveals the statistical correlation of quantum particles. Manipulation of the quantum coherence of light in the temporal domain enables the production of single-photon sources, which have become one of the most important quantum resources. High-order quantum coherence in the spatial domain plays a crucial role in a variety of applications, such as quantum imaging, holography and microscopy. However, the active control of high-order spatial quantum coherence remains a challenging task. Here we predict theoretically and demonstrate experimentally the first active manipulation of high-order spatial quantum coherence by mapping the entanglement of spatially structured photons. Our results not only inject new strength into current applications, but also open new possibilities for wider applications of high-order quantum coherence.

Submitted 1 June, 2023; originally announced June 2023.

Comments: 11 pages, 5 figures

Showing 1–50 of 313 results for author: Lou, Y