
Supernova Pointing Capabilities of DUNE

Authors: DUNE Collaboration, A. Abed Abud, B. Abi, R. Acciarri, M. A. Acero, M. R. Adames, G. Adamov, M. Adamowski, D. Adams, M. Adinolfi, C. Adriano, A. Aduszkiewicz, J. Aguilar, B. Aimard, F. Akbar, K. Allison, S. Alonso Monsalve, M. Alrashed, A. Alton, R. Alvarez, T. Alves, H. Amar, P. Amedo, J. Anderson, D. A. Andrade, et al. (1340 additional authors not shown)

Abstract: The determination of the direction of a stellar core collapse via its neutrino emission is crucial for the identification of the progenitor for a multimessenger follow-up. A highly effective method of reconstructing supernova directions within the Deep Underground Neutrino Experiment (DUNE) is introduced. The supernova neutrino pointing resolution is studied by simulating and reconstructing electron-neutrino charged-current absorption on $^{40}$Ar and elastic scattering of neutrinos on electrons. Procedures to reconstruct individual interactions, including a newly developed technique called "brems flipping", as well as the burst direction from an ensemble of interactions are described. Performance of the burst direction reconstruction is evaluated for supernovae occurring at a distance of 10 kpc for a specific supernova burst flux model. The pointing resolution is found to be 3.4 degrees at 68% coverage for a perfect interaction-channel classification and a fiducial mass of 40 kton, and 6.6 degrees for a 10 kton fiducial mass. Assuming a 4% rate of charged-current interactions being misidentified as elastic scattering, DUNE's burst pointing resolution is found to be 4.3 degrees (8.7 degrees) at 68% coverage.

Submitted 14 July, 2024; originally announced July 2024.

Comments: 25 pages, 16 figures

Report number: FERMILAB-PUB-24-0319-LBNF

arXiv:2407.07805 [pdf, other]

SUMix: Mixup with Semantic and Uncertain Information

Authors: Huafeng Qin, Xin Jin, Hongyu Zhu, Hongchao Liao, Mounîm A. El-Yacoubi, Xinbo Gao

Abstract: Mixup data augmentation approaches have been applied to various deep learning tasks to improve the generalization ability of deep neural networks. Some existing approaches, such as CutMix and SaliencyMix, randomly replace a patch in one image with a patch from another to generate the mixed image. Similarly, the corresponding labels are linearly combined by a fixed ratio $λ$. The objects in the two images may overlap during the mixing process, so some semantic information is corrupted in the mixed samples. In this case, the mixed image does not match the mixed label information. Besides, such a label may mislead the deep learning model training, which results in poor performance. To solve this problem, we propose a novel approach named SUMix to learn the mixing ratio as well as the uncertainty for the mixed samples during the training process. First, we design a learnable similarity function to compute an accurate mix ratio. Second, an approach is investigated as a regularized term to model the uncertainty of the mixed samples. We conduct experiments on five image benchmarks, and extensive experimental results indicate that our method is capable of improving the performance of classifiers with different cutting-based mixup approaches. The source code is available at https://github.com/JinXins/SUMix.

Submitted 10 July, 2024; originally announced July 2024.

Comments: Accepted by ECCV2024 [Camera Ready] (16 pages, 5 figures) with the source code at https://github.com/JinXins/SUMix
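
Illustrative note (not part of the listing): the abstract above refers to cutting-based mixup, where a patch from one image replaces a patch in another and the labels are combined linearly by a ratio $λ$. The following is a minimal sketch of that baseline idea only, assuming $λ$ is set to the fraction of the first image that survives the paste (the usual CutMix convention). The function name `cutmix` and its parameters are hypothetical, and the sketch does not implement SUMix's learned ratio or uncertainty term.

```python
def cutmix(image_a, image_b, label_a, label_b, top, left, height, width):
    """Paste a rectangular patch of image_b into image_a and mix the labels.

    Images are equal-sized 2D lists; labels are one-hot lists.
    Returns (mixed_image, mixed_label, lam), where lam is the fraction
    of image_a's area that is kept (an assumed, area-based ratio).
    """
    rows, cols = len(image_a), len(image_a[0])
    # Clip the patch to the image bounds.
    bottom = min(top + height, rows)
    right = min(left + width, cols)
    top, left = max(top, 0), max(left, 0)

    # Copy image_a, then overwrite the patch region with pixels of image_b.
    mixed = [row[:] for row in image_a]
    for r in range(top, bottom):
        for c in range(left, right):
            mixed[r][c] = image_b[r][c]

    # Labels are combined linearly by the same area-based ratio.
    patch_area = max(bottom - top, 0) * max(right - left, 0)
    lam = 1.0 - patch_area / (rows * cols)
    mixed_label = [lam * a + (1.0 - lam) * b for a, b in zip(label_a, label_b)]
    return mixed, mixed_label, lam


# Usage: a 4x4 image of zeros receives a 2x2 patch from an image of ones.
zeros = [[0] * 4 for _ in range(4)]
ones = [[1] * 4 for _ in range(4)]
mixed, mixed_label, lam = cutmix(zeros, ones, [1.0, 0.0], [0.0, 1.0], 0, 0, 2, 2)
print(lam, mixed_label)  # 0.75 [0.75, 0.25]
```

Because the ratio here depends only on patch area, a patch that happens to cover the object in the first image still leaves most of the label weight on that object's class. That is the image-label mismatch the abstract describes.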

arXiv:2407.07020 [pdf, other]

Less is More: Efficient Brain-Inspired Learning for Autonomous Driving Trajectory Prediction

Authors: Haicheng Liao, Yongkang Li, Zhenning Li, Chengyue Wang, Chunlin Tian, Yuming Huang, Zilin Bian, Kaiqun Zhu, Guofa Li, Ziyuan Pu, Jia Hu, Zhiyong Cui, Chengzhong Xu

Abstract: Accurately and safely predicting the trajectories of surrounding vehicles is essential for fully realizing autonomous driving (AD). This paper presents the Human-Like Trajectory Prediction model (HLTP++), which emulates human cognitive processes to improve trajectory prediction in AD. HLTP++ incorporates a novel teacher-student knowledge distillation framework. The "teacher" model, equipped with an adaptive visual sector, mimics the dynamic allocation of attention human drivers exhibit based on factors like spatial orientation, proximity, and driving speed. On the other hand, the "student" model focuses on real-time interaction and human decision-making, drawing parallels to the human memory storage mechanism. Furthermore, we improve the model's efficiency by introducing a new Fourier Adaptive Spike Neural Network (FA-SNN), allowing for faster and more precise predictions with fewer parameters. Evaluated using the NGSIM, HighD, and MoCAD benchmarks, HLTP++ demonstrates superior performance compared to existing models, reducing the predicted trajectory error by over 11% on the NGSIM dataset and 25% on the HighD dataset. Moreover, HLTP++ demonstrates strong adaptability in challenging environments with incomplete input data. This marks a significant stride in the journey towards fully autonomous driving systems.

Submitted 9 July, 2024; originally announced July 2024.

Comments: arXiv admin note: substantial text overlap with arXiv:2402.19251

arXiv:2407.05554 [pdf, other]

PANS: Probabilistic Airway Navigation System for Real-time Robust Bronchoscope Localization

Authors: Qingyao Tian, Zhen Chen, Huai Liao, Xinyan Huang, Bingyu Yang, Lujie Li, Hongbin Liu

Abstract: Accurate bronchoscope localization is essential for pulmonary interventions, by providing six degrees of freedom (DOF) in airway navigation. However, the robustness of current vision-based methods is often compromised in clinical practice, and they struggle to perform in real-time and to generalize across cases unseen during training. To overcome these challenges, we propose a novel Probabilistic Airway Navigation System (PANS), leveraging a Monte Carlo method with pose hypotheses and likelihoods to achieve robust and real-time bronchoscope localization. Specifically, our PANS incorporates diverse visual representations (e.g., odometry and landmarks) by leveraging two key modules, including the Depth-based Motion Inference (DMI) and the Bronchial Semantic Analysis (BSA). To generate the pose hypotheses of bronchoscope for PANS, we devise the DMI to accurately propagate the estimation of pose hypotheses over time. Moreover, to estimate the accurate pose likelihood, we devise the BSA module by effectively distinguishing between similar bronchial regions in endoscopic images, along with a novel metric to assess the congruence between estimated depth maps and the segmented airway structure. Under this probabilistic formulation, our PANS is capable of achieving the 6-DOF bronchoscope localization with superior accuracy and robustness. Extensive experiments on the collected pulmonary intervention dataset comprising 10 clinical cases confirm the advantage of our PANS over state-of-the-art methods, in terms of both robustness and generalization in localizing deeper airway branches and the efficiency of real-time inference. The proposed PANS reveals its potential to be a reliable tool in the operating room, promising to enhance the quality and safety of pulmonary interventions.

Submitted 7 July, 2024; originally announced July 2024.

arXiv:2407.04206 [pdf, other]

Computational Graph Representation of Equations System Constructors in Hierarchical Circuit Simulation

Authors: Zichao Long, Lin Li, Lei Han, Xianglong Meng, Chongjun Ding, Ruiyan Li, Wu Jiang, Fuchen Ding, Jiaqing Yue, Zhichao Li, Yisheng Hu, Ding Li, Heng Liao

Abstract: Equations system constructors of hierarchical circuits play a central role in device modeling, nonlinear equations solving, and circuit design automation. However, existing constructors present limitations in applications to different extents. For example, the costs of developing and reusing device models -- especially coarse-grained equivalent models of circuit modules -- remain high while parameter sensitivity analysis is complex and inefficient. Inspired by differentiable programming and leveraging the ecosystem benefits of open-source software, we propose an equations system constructor using the computational graph representation, along with its JSON format netlist, to address these limitations. This representation allows for runtime dependencies between signals and subcircuit/device parameters. The proposed method streamlines the model development process and facilitates end-to-end computation of gradients of equations remainders with respect to parameters. This paper discusses in detail the overarching concept of hierarchical subcircuit/device decomposition and nested invocation by drawing parallels to functions in programming languages, and introduces rules for parameter passing and gradient propagation across hierarchical circuit modules. The presented numerical examples, including (1) an uncoupled CMOS model representation using "equivalent circuit decomposition+dynamic parameters" and (2) operational amplifier (OpAmp) auto device sizing, have demonstrated that the proposed method supports circuit simulation and design and particularly subcircuit modeling with improved efficiency, simplicity, and decoupling compared to existing techniques.

Submitted 4 July, 2024; originally announced July 2024.

arXiv:2406.19660 [pdf, other]

Chow rings and augmented Chow rings of uniform matroids and their $q$-analogs

Authors: Hsin-Chieh Liao

Abstract: We study the natural representations of $\mathfrak{S}_n$ and $GL_n(\mathbb{F}_q)$ on the (augmented) Chow rings of uniform matroids and $q$-uniform matroids. The Frobenius series for uniform matroids and their $q$-analogs are computed. As a byproduct, we recover Hameister, Rao, and Simpson's formula of Hilbert series of Chow rings of $q$-uniform matroids in terms of permutations and further obtain their augmented counterpart in terms of decorated permutations. We also show that equivariant Charney-Davis quantities of (augmented) Chow rings of general matroids are nonnegative (i.e. genuine representations of the automorphism group of the matroid). When the matroid is a uniform matroid, the representations either vanish or are Specht modules of some skew hook shapes. When descending to the usual Charney-Davis quantities, we obtain an elegant combinatorial interpretation of Hameister, Rao, and Simpson's formula for Chow rings of $q$-uniform matroids and its augmented counterpart.

Submitted 28 June, 2024; originally announced June 2024.

Comments: 36 pages, 3 figures

MSC Class: 05B35; 05E18; 05E05; 05E14

arXiv:2406.18807 [pdf, other]

ML-Powered FPGA-based Real-Time Quantum State Discrimination Enabling Mid-circuit Measurements

Authors: Neel R. Vora, Yilun Xu, Akel Hashim, Neelay Fruitwala, Ho Nam Nguyen, Haoran Liao, Jan Balewski, Abhi Rajagopala, Kasra Nowrouzi, Qing Ji, K. Birgitta Whaley, Irfan Siddiqi, Phuc Nguyen, Gang Huang

Abstract: Similar to reading the transistor state in classical computers, identifying the quantum bit (qubit) state is a fundamental operation to translate quantum information. However, identifying the quantum state has been the slowest and most error-prone operation on superconducting quantum processors. Most existing state discrimination algorithms have only been implemented and optimized "after the fact" - using offline data transferred from control circuits to host computers. Real-time state discrimination is not possible because a superconducting quantum state only survives for a few hundred µs, which is much shorter than the communication delay between the readout circuit and the host computer (i.e., tens of ms). Mid-circuit measurement (MCM), where measurements are conducted on qubits at intermediate stages within a quantum circuit rather than solely at the end, represents an advanced technique for qubit reuse. For MCM necessitating single-shot readout, it is imperative to employ an in-situ technique for state discrimination with low latency and high accuracy. This paper introduces QubiCML, a field-programmable gate array (FPGA) based system for real-time state discrimination enabling MCM - the ability to measure the state at the control circuit before/without transferring data to a host computer. A multi-layer neural network has been designed and deployed on an FPGA to ensure accurate in-situ state discrimination. For the first time, ML-powered quantum state discrimination has been implemented on a radio frequency system-on-chip FPGA platform. The deployed lightweight network on the FPGA only takes 54 ns to complete each inference. We evaluated QubiCML's performance on superconducting quantum processors and obtained an average accuracy of 98.5% with only 500 ns readout. QubiCML has the potential to be the standard real-time state discrimination method for the quantum community.

Submitted 28 June, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

arXiv:2406.16987 [pdf]

AI for Equitable Tennis Training: Leveraging AI for Equitable and Accurate Classification of Tennis Skill Levels and Training Phases

Authors: Gyanna Gao, Hao-Yu Liao, Zhenhong Hu

Abstract: Numerous studies have demonstrated the manifold benefits of tennis, such as increasing overall physical and mental health. Unfortunately, many children and youth from low-income families are unable to engage in this sport mainly due to financial constraints such as private lesson expenses as well as the logistical concerns of traveling to and from such lessons and clinics. While several tennis self-training systems exist, they are often tailored for professionals and are prohibitively expensive. The present study aims to classify tennis players' skill levels and classify tennis strokes into phases characterized by motion attributes for the future development of an AI-based tennis self-training model for affordable and convenient applications running on devices used in daily life such as an iPhone or an Apple Watch for tennis skill improvement. We collected motion data, including Motion Yaw, Roll and Pitch from inertial measurement units (IMUs) worn by participating junior tennis players. For this pilot study, data from twelve participants were processed using Support Vector Machine (SVM) algorithms. The SVM models demonstrated an overall accuracy of 77% in classifying players as beginners or intermediates, with low rates of false positives and false negatives, effectively distinguishing skill levels. Additionally, the tennis swings were successfully classified into five phases based on the collected motion data. These findings indicate that SVM-based classification can be a reliable foundation for developing an equitable and accessible AI-driven tennis training system.

Submitted 23 June, 2024; originally announced June 2024.

Comments: 21 pages, 9 figures, 1 table

arXiv:2406.16210 [pdf, ps, other]

Received Power Maximization Using Nonuniform Discrete Phase Shifts for RISs With a Limited Phase Range

Authors: Dogan Kutay Pekcan, Hongyi Liao, Ender Ayanoglu

Abstract: To maximize the received power at a user equipment, the problem of optimizing a reconfigurable intelligent surface (RIS) with a limited phase range R and nonuniform discrete phase shifts with adjustable gains is addressed. Necessary and sufficient conditions to achieve this maximization are given. These conditions are employed in two algorithms to achieve the global optimum in linear time for R ≥ π and R < π, where R is the RIS phase range. With a total number of N(2K+1) complex vector additions, it is shown for R ≥ π and R < π that the global optimality is achieved in NK or fewer and N(K+1) or fewer steps, respectively, where N is the number of RIS elements and K is the number of discrete phase shifts which may be placed nonuniformly over the phase range R. In addition, we define an intuitive quantization algorithm that we call the nonuniform polar quantization (NPQ) algorithm. With NPQ, we provide a closed form solution for the approximation ratio with which an arbitrary set of nonuniform discrete phase shifts can approximate the continuous solution. We also show that with a phase range limitation, equal separation among the nonuniform discrete phase shifts maximizes the normalized performance. Furthermore, we show that the gain of using K ≥ 3 with R < π/2 and K ≥ 4 with R < π is only marginal. Finally, we reveal that when R < 2π/3, ON/OFF selection for the RIS elements brings significant performance gains compared to the case when the RIS elements are strictly ON.

Submitted 23 June, 2024; originally announced June 2024.

Comments: 27 pages, 19 figures

arXiv:2406.12382 [pdf, other]

From Instance Training to Instruction Learning: Task Adapters Generation from Instructions

Authors: Huanxuan Liao, Yao Xu, Shizhu He, Yuanzhe Zhang, Yanchao Hao, Shengping Liu, Kang Liu, Jun Zhao

Abstract: Large language models (LLMs) have acquired the ability to solve general tasks by utilizing instruction finetuning (IFT). However, IFT still relies heavily on instance training of extensive task data, which greatly limits the adaptability of LLMs to real-world scenarios where labeled task instances are scarce and broader task generalization becomes paramount. Contrary to LLMs, humans acquire skills and complete tasks not merely through repeated practice but also by understanding and following instructional guidelines. This paper is dedicated to simulating human learning to address the shortcomings of instance training, focusing on instruction learning to enhance cross-task generalization. Within this context, we introduce Task Adapters Generation from Instructions (TAGI), which automatically constructs the task-specific model in a parameter generation manner based on the given task instructions without retraining for unseen tasks. Specifically, we utilize knowledge distillation to enhance the consistency between TAGI developed through Learning with Instruction and task-specific models developed through Training with Instance, by aligning the labels, output logits, and adapter parameters between them. TAGI is endowed with cross-task generalization capabilities through a two-stage training process that includes hypernetwork pretraining and finetuning. We evaluate TAGI on the Super-Natural Instructions and P3 datasets. The experimental results demonstrate that TAGI can match or even outperform traditional meta-trained models and other hypernetwork models, while significantly reducing computational requirements.

Submitted 18 June, 2024; originally announced June 2024.

arXiv:2406.11937 [pdf, other]

Using graph neural networks to reconstruct charged pion showers in the CMS High Granularity Calorimeter

Authors: M. Aamir, B. Acar, G. Adamov, T. Adams, C. Adloff, S. Afanasiev, C. Agrawal, C. Agrawal, A. Ahmad, H. A. Ahmed, S. Akbar, N. Akchurin, B. Akgul, B. Akgun, R. O. Akpinar, E. Aktas, A. AlKadhim, V. Alexakhin, J. Alimena, J. Alison, A. Alpana, W. Alshehri, P. Alvarez Dominguez, M. Alyari, C. Amendola, et al. (550 additional authors not shown)

Abstract: A novel method to reconstruct the energy of hadronic showers in the CMS High Granularity Calorimeter (HGCAL) is presented. The HGCAL is a sampling calorimeter with very fine transverse and longitudinal granularity. The active media are silicon sensors and scintillator tiles read out by SiPMs, and the absorbers are a combination of lead and Cu/CuW in the electromagnetic section, and steel in the hadronic section. The shower reconstruction method is based on graph neural networks and it makes use of a dynamic reduction network architecture. It is shown that the algorithm is able to capture and mitigate the main effects that normally hinder the reconstruction of hadronic showers using classical reconstruction methods, by compensating for fluctuations in the multiplicity, energy, and spatial distributions of the shower's constituents. The performance of the algorithm is evaluated using test beam data collected in 2018 with a prototype of the CMS HGCAL accompanied by a section of the CALICE AHCAL prototype. The capability of the method to mitigate the impact of energy leakage from the calorimeter is also demonstrated.

Submitted 30 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

Comments: Prepared for submission to JINST

arXiv:2406.09944 [pdf, other]

Universal scaling behavior of resistivity under two-dimensional superconducting phase fluctuations

Authors: Zongsheng Zhou, Kang Wang, Hai-Jun Liao, Zi-Xiang Li, Tao Xiang

Abstract: In superconductors with relatively low superfluid density, such as cuprate high-$T_c$ superconductors, the phase fluctuations of the superconducting order parameter are remarkable, presumably playing a nonnegligible role in shaping many distinctive physical properties. This work systematically investigates the electrical transport properties arising from thermal superconducting phase fluctuations in two-dimensional superconductors. Employing the Monte Carlo procedure, we access the numerically exact properties of a microscopic model of superconductivity, in which the classical XY model governs the thermal phase fluctuations of the superconducting order parameter. For both $s$-wave and $d_{x^2-y^2}$-wave pairings, the electrical resistivity exhibits a universal scaling behavior in the temperature range above $T_c$. Our numerical results demonstrate that the scaling behavior of the quasiparticle lifetime is associated with the correlation length of the superconducting order parameter, yielding the universal scaling behavior of electrical resistivity determined by the Berezinskii-Kosterlitz-Thouless critical scaling of the correlation length. Furthermore, we discuss the dependence of the electrical resistivity coefficient on the pairing amplitude and the possible implications for recent transport experiments.

Submitted 14 June, 2024; originally announced June 2024.

arXiv:2406.09808 [pdf, ps, other]

Uniform property $Γ$ and the small boundary property

Authors: Grigoris Kopsacheilis, Hung-Chang Liao, Aaron Tikuisis, Andrea Vaccaro

Abstract: We prove that, for an action $α\colon G \curvearrowright X$ of a countably infinite discrete amenable group on a compact metric space, the small boundary property is implied by uniform property $Γ$ of the Cartan subalgebra $(C(X) \subseteq C(X) \rtimes_α G)$. The reverse implication has been demonstrated by Kerr and Szabó for free actions, from which we obtain the equivalence of the two conditions in the free case. We moreover show that, if $α$ is free and minimal, then almost finiteness of $α$ is implied by tracial $\mathcal{Z}$-stability of the subalgebra $(C(X) \subseteq C(X) \rtimes_α G)$. The reverse implication is due to Kerr, resulting in the equivalence of these two properties as well. As an application, we prove that if $α\colon G \curvearrowright X$ and $β\colon H \curvearrowright Y$ are free actions and $α$ has the small boundary property, then $α\times β\colon G \times H \curvearrowright X \times Y$ has the small boundary property. An analogous permanence property is obtained for almost finiteness in case $α$ and $β$ are free minimal actions.

Submitted 21 June, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

Comments: 23 pages; added remark that the implication (uniform property Gamma of the pair => SBP of the action) does not require freeness of the action

arXiv:2405.18525 [pdf, other]

REPARO: Compositional 3D Assets Generation with Differentiable 3D Layout Alignment

Authors: Haonan Han, Rui Yang, Huan Liao, Jiankai Xing, Zunnan Xu, Xiaoming Yu, Junwei Zha, Xiu Li, Wanhua Li

Abstract: Traditional image-to-3D models often struggle with scenes containing multiple objects due to biases and occlusion complexities. To address this challenge, we present REPARO, a novel approach for compositional 3D asset generation from single images. REPARO employs a two-step process: first, it extracts individual objects from the scene and reconstructs their 3D meshes using off-the-shelf image-to-3D models; then, it optimizes the layout of these meshes through differentiable rendering techniques, ensuring coherent scene composition. By integrating an optimal transport-based long-range appearance loss term and a high-level semantic loss term in the differentiable rendering, REPARO can effectively recover the layout of 3D assets. The proposed method can significantly enhance object independence, detail accuracy, and overall scene coherence. Extensive evaluation of multi-object scenes demonstrates that our REPARO offers a comprehensive approach to address the complexities of multi-object 3D scene generation from single images.

Submitted 28 May, 2024; originally announced May 2024.

arXiv:2405.12721 [pdf, other]

StarLKNet: Star Mixup with Large Kernel Networks for Palm Vein Identification

Authors: Xin Jin, Hongyu Zhu, Mounîm A. El Yacoubi, Hongchao Liao, Huafeng Qin, Yun Jiang

Abstract: As a representative of a new generation of biometrics, vein identification technology offers a high level of security and convenience. Convolutional neural networks (CNNs), a prominent class of deep learning architectures, have been extensively utilized for vein identification. However, since their performance and robustness are limited by small Effective Receptive Fields (e.g. 3$\times$3 kernels) and insufficient training samples, they are unable to extract global feature representations from vein images effectively. To address these issues, we propose StarLKNet, a large kernel convolution-based palm-vein identification network, with the Mixup approach. Our StarMix effectively learns the distribution of vein features to expand samples. To enable CNNs to capture comprehensive feature representations from palm-vein images, we explored the effect of convolutional kernel size on the performance of palm-vein identification networks and designed LaKNet, a network leveraging large kernel convolution and a gating mechanism. To the best of our knowledge, this is the first deployment of a CNN with large kernels in the domain of vein identification. Extensive experiments were conducted to validate the performance of StarLKNet on two public palm-vein datasets. The results demonstrated that StarMix provided superior augmentation, and LaKNet exhibited more stable performance gains compared to mainstream approaches, resulting in the highest recognition accuracy and lowest identification error.

Submitted 16 June, 2024; v1 submitted 21 May, 2024; originally announced May 2024.

Comments: 7 pages, 6 figures
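
The abstract above builds on the Mixup approach. As a rough illustration only, the sketch below shows the standard Mixup blend of two samples and their one-hot labels; it is not the StarMix method, whose details are not given in this listing. The function name `mixup`, the default `alpha`, and the list-based sample format are assumptions made for the example.

```python
import random


def mixup(x_a, y_a, x_b, y_b, alpha=0.2, rng=random):
    """Standard Mixup: blend two samples and their one-hot labels.

    The mixing weight is drawn from Beta(alpha, alpha). The value of
    alpha here is an illustrative assumption, not taken from the paper.
    """
    lam = rng.betavariate(alpha, alpha)
    # Convex combination of the inputs, element by element.
    x_mix = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    # The same weight is applied to the labels.
    y_mix = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x_mix, y_mix, lam
```

Because the labels are mixed with the same weight as the inputs, the mixed label still sums to one when both inputs are one-hot.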

arXiv:2405.09132 [pdf, other]

EFACT: an External Function Auto-Completion Tool to Strengthen Static Binary Lifting

Authors: Yilei Zhang, Haoyu Liao, Zekun Wang, Bo Huang, Jianmei Guo

Abstract: Static binary lifting is essential in binary rewriting frameworks. Existing tools overlook the impact of External Function Completion (EXFC) in static binary lifting. EXFC recovers the prototypes of External Functions (EXFs, functions defined in standard shared libraries) using only the function symbols available. Incorrect EXFC can misinterpret the source binary, or cause memory overflows in static binary translation, which eventually results in program crashes. Notably, existing tools struggle to recover the prototypes of mangled EXFs originating from binaries compiled from C++. Moreover, they require time-consuming manual processing to support new libraries. This paper presents EFACT, an External Function Auto-Completion Tool for static binary lifting. Our EXF recovery algorithm better recovers the prototypes of mangled EXFs, particularly addressing the template specialization mechanism in C++. EFACT is designed as a lightweight plugin to strengthen other static binary rewriting frameworks in EXFC. Our evaluation shows that EFACT outperforms RetDec and McSema in mangled EXF recovery by 96.4% and 97.3% on SPEC CPU 2017. Furthermore, we delve deeper into static binary translation and address several cross-ISA EXFC problems. When integrated with McSema, EFACT correctly translates 36.7% more benchmarks from x86-64 to x86-64 and 93.6% more from x86-64 to AArch64 than McSema alone on EEMBC.

Submitted 15 May, 2024; originally announced May 2024.

arXiv:2405.04489 [pdf, other]

S3Former: Self-supervised High-resolution Transformer for Solar PV Profiling

Authors: Minh Tran, Adrian De Luis, Haitao Liao, Ying Huang, Roy McCann, Alan Mantooth, Jack Cothren, Ngan Le

Abstract: As the impact of climate change escalates, the global necessity to transition to sustainable energy sources becomes increasingly evident. Renewable energies have emerged as a viable solution for users, with Photovoltaic energy being a favored choice for small installations due to its reliability and efficiency. Accurate mapping of PV installations is crucial for understanding the extent of their adoption and informing energy policy. To meet this need, we introduce S3Former, designed to segment solar panels from aerial imagery and provide size and location information critical for analyzing the impact of such installations on the grid. Solar panel identification is challenging due to factors such as varying weather conditions, roof characteristics, Ground Sampling Distance variations and lack of appropriate initialization weights for optimized training. To tackle these complexities, S3Former features a Masked Attention Mask Transformer incorporating a self-supervised learning pretrained backbone. Specifically, our model leverages low-level and high-level features extracted from the backbone and adds an instance query mechanism to the Transformer architecture to enhance the localization of solar PV installations. We introduce a self-supervised learning phase (pretext task) to improve the initialization weights on the backbone of S3Former. We evaluate S3Former on diverse datasets and demonstrate improvement over state-of-the-art models.

Submitted 7 May, 2024; originally announced May 2024.

Comments: Preprint

arXiv:2405.02145 [pdf, other]

Characterized Diffusion and Spatial-Temporal Interaction Network for Trajectory Prediction in Autonomous Driving

Authors: Haicheng Liao, Xuelin Li, Yongkang Li, Hanlin Kong, Chengyue Wang, Bonan Wang, Yanchen Guan, KaHou Tam, Zhenning Li, Chengzhong Xu

Abstract: Trajectory prediction is a cornerstone in autonomous driving (AD), playing a critical role in enabling vehicles to navigate safely and efficiently in dynamic environments. To address this task, this paper presents a novel trajectory prediction model tailored for accuracy in the face of heterogeneous and uncertain traffic scenarios. At the heart of this model lies the Characterized Diffusion Module, an innovative module designed to simulate traffic scenarios with inherent uncertainty. This module enriches the predictive process by infusing it with detailed semantic information, thereby enhancing trajectory prediction accuracy. Complementing this, our Spatio-Temporal (ST) Interaction Module captures the nuanced effects of traffic scenarios on vehicle dynamics across both spatial and temporal dimensions with remarkable effectiveness. Demonstrated through exhaustive evaluations, our model sets a new standard in trajectory prediction, achieving state-of-the-art (SOTA) results on the Next Generation Simulation (NGSIM), Highway Drone (HighD), and Macao Connected Autonomous Driving (MoCAD) datasets across both short and extended temporal spans. This performance underscores the model's unparalleled adaptability and efficacy in navigating complex traffic scenarios, including highways, urban streets, and intersections.

Submitted 3 May, 2024; originally announced May 2024.

Comments: Accepted by IJCAI 2024

arXiv:2405.01266 [pdf, other]

MFTraj: Map-Free, Behavior-Driven Trajectory Prediction for Autonomous Driving

Authors: Haicheng Liao, Zhenning Li, Chengyue Wang, Huanming Shen, Bonan Wang, Dongping Liao, Guofa Li, Chengzhong Xu

Abstract: This paper introduces a trajectory prediction model tailored for autonomous driving, focusing on capturing complex interactions in dynamic traffic scenarios without reliance on high-definition maps. The model, termed MFTraj, harnesses historical trajectory data combined with a novel dynamic geometric graph-based behavior-aware module. At its core, an adaptive structure-aware interactive graph convolutional network captures both positional and behavioral features of road users, preserving spatial-temporal intricacies. Enhanced by a linear attention mechanism, the model achieves computational efficiency and reduced parameter overhead. Evaluations on the Argoverse, NGSIM, HighD, and MoCAD datasets underscore MFTraj's robustness and adaptability, outperforming numerous benchmarks even in data-challenged scenarios without the need for additional information such as HD maps or vectorized maps. Importantly, it maintains competitive performance even in scenarios with substantial missing data, on par with most existing state-of-the-art models. The results and methodology suggest a significant advancement in autonomous driving trajectory prediction, paving the way for safer and more efficient autonomous systems.

Submitted 2 May, 2024; originally announced May 2024.

Comments: Accepted by IJCAI 2024

arXiv:2404.17873 [pdf]

Bacterial stress granule protects mRNA through ribonucleases exclusion

Authors: Linsen Pei, Yujia Xian, Xiaodan Yan, Charley Schaefer, Aisha H. Syeda, Jamieson Howard, Hebin Liao, Fan Bai, Mark C. Leake, Yingying Pu

Abstract: Membraneless droplets formed through liquid-liquid phase separation (LLPS) play a crucial role in mRNA storage, enabling organisms to swiftly respond to environmental changes. However, the mechanisms underlying mRNA integration and protection within droplets remain unclear. Here, we unravel the role of bacterial aggresomes as stress granules (SGs) in safeguarding mRNA during stress. We discovered that upon stress onset, mobile mRNA molecules selectively incorporate into individual proteinaceous SGs based on length-dependent enthalpic gain over entropic loss. As stress prolongs, SGs undergo compaction facilitated by stronger non-specific RNA-protein interactions, thereby promoting recruitment of shorter RNA chains. Remarkably, mRNA ribonucleases are repelled from bacterial SGs, due to the influence of protein surface charge. This exclusion mechanism ensures the integrity and preservation of mRNA within SGs during stress conditions, explaining how mRNA can be stored and protected from degradation. Following stress removal, SGs facilitate mRNA translation, thereby enhancing cell fitness in changing environments. These droplets maintain mRNA physiological activity during storage, making them an intriguing new candidate for mRNA therapeutics manufacturing.

Submitted 27 April, 2024; originally announced April 2024.

arXiv:2404.17520 [pdf, other]

A Cognitive-Driven Trajectory Prediction Model for Autonomous Driving in Mixed Autonomy Environment

Authors: Haicheng Liao, Zhenning Li, Chengyue Wang, Bonan Wang, Hanlin Kong, Yanchen Guan, Guofa Li, Zhiyong Cui, Chengzhong Xu

Abstract: As autonomous driving technology progresses, the need for precise trajectory prediction models becomes paramount. This paper introduces an innovative model that infuses cognitive insights into trajectory prediction, focusing on perceived safety and dynamic decision-making. Distinct from traditional approaches, our model excels in analyzing interactions and behavior patterns in mixed autonomy traffic scenarios. It represents a significant leap forward, achieving marked performance improvements on several key datasets. Specifically, it surpasses existing benchmarks with gains of 16.2% on the Next Generation Simulation (NGSIM), 27.4% on the Highway Drone (HighD), and 19.8% on the Macao Connected Autonomous Driving (MoCAD) dataset. Our proposed model shows exceptional proficiency in handling corner cases, essential for real-world applications. Moreover, its robustness is evident in scenarios with missing or limited data, outperforming most of the state-of-the-art baselines. This adaptability and resilience position our model as a viable tool for real-world autonomous driving systems, heralding a new standard in vehicle trajectory prediction for enhanced safety and efficiency.

Submitted 26 April, 2024; originally announced April 2024.

Comments: Accepted by IJCAI 2024

arXiv:2404.14893 [pdf, other]

Average energy dissipation rates of explicit exponential Runge-Kutta methods for gradient flow problems

Authors: Hong-lin Liao, Xuping Wang

Abstract: We propose a unified theoretical framework to examine the energy dissipation properties at all stages of explicit exponential Runge-Kutta (EERK) methods for gradient flow problems. The main part of the novel framework is to construct the differential form of the EERK method by using the difference coefficients of the method and the so-called discrete orthogonal convolution kernels. As the main result, we prove that an EERK method can preserve the original energy dissipation law unconditionally if the associated differentiation matrix is positive semi-definite. A simple indicator, namely the average dissipation rate, is also introduced for these multi-stage methods to evaluate the overall energy dissipation rate of an EERK method, so that one can choose proper parameters in some parameterized EERK methods or compare different kinds of EERK methods. Some existing EERK methods in the literature are evaluated from the perspective of preserving the original energy dissipation law and the energy dissipation rate. Some numerical examples are also included to support our theory.

Submitted 23 April, 2024; originally announced April 2024.

Comments: 35 pages, 44 figures

MSC Class: 35K58; 65L20; 65M06; 65M12

arXiv:2404.14163 [pdf, other]

Dynamical Spectra of Spin Supersolid States in Triangular Antiferromagnets

Authors: Runze Chi, Jiahang Hu, Hai-Jun Liao, T. Xiang

Abstract: We employ tensor network renormalization to explore the dynamical spectra of the easy-axis triangular-lattice antiferromagnet (TLAF) in a magnetic field. Our analysis identifies two distinct low-energy magnon excitations: a gapless Goldstone mode and a gapped mode. At zero field, the spectra display two nearly degenerate roton modes near the M point. With the increase of the magnetic field within the Y-shape superfluid phase, these modes diverge, with the roton excitation vanishing from the Goldstone mode branch, suggesting that the roton dip in this mode may just result from the energy-level repulsion imposed by the roton excitation in the gapped mode. Moreover, the in-plane spectral function shows substantial weight in high energies in the same spin excitation channel where the low-energy roton excitation appears. However, these roton excitations are absent in the V-shape supersolid phase.

Submitted 22 April, 2024; originally announced April 2024.

arXiv:2404.05185 [pdf, other]

Convergence analysis of controlled particle systems arising in deep learning: from finite to infinite sample size

Authors: Huafu Liao, Alpár R. Mészáros, Chenchen Mou, Chao Zhou

Abstract: This paper deals with a class of neural SDEs and studies the limiting behavior of the associated sampled optimal control problems as the sample size grows to infinity. The neural SDEs with N samples can be linked to the N-particle systems with centralized control. We analyze the Hamilton--Jacobi--Bellman equation corresponding to the N-particle system and establish regularity results which are uniform in N. The uniform regularity estimates are obtained by the stochastic maximum principle and the analysis of a backward stochastic Riccati equation. Using these uniform regularity results, we show the convergence of the minima of objective functionals and optimal parameters of the neural SDEs as the sample size N tends to infinity. The limiting objects can be identified with suitable functions defined on the Wasserstein space of Borel probability measures. Furthermore, quantitative algebraic convergence rates are also obtained.

Submitted 8 April, 2024; originally announced April 2024.

Comments: 45 pages, 2 figures

MSC Class: 49N80; 65C35; 49L12; 62M45

arXiv:2403.18270 [pdf, other]

Image Deraining via Self-supervised Reinforcement Learning

Authors: He-Hao Liao, Yan-Tsung Peng, Wen-Tao Chu, Ping-Chun Hsieh, Chung-Chi Tsai

Abstract: The quality of images captured outdoors is often affected by the weather. One factor that interferes with sight is rain, which can obstruct the view of observers and computer vision applications that rely on those images. This work aims to recover rain images by removing rain streaks via Self-supervised Reinforcement Learning (RL) for image deraining (SRL-Derain). We locate rain streak pixels from the input rain image via dictionary learning and use pixel-wise RL agents to take multiple inpainting actions to remove rain progressively. To our knowledge, this work is the first attempt where self-supervised RL is applied to image deraining. Experimental results on several benchmark image-deraining datasets show that the proposed SRL-Derain performs favorably against state-of-the-art few-shot and self-supervised deraining and denoising methods.

Submitted 27 March, 2024; originally announced March 2024.

arXiv:2403.15474 [pdf, other]

EC-IoU: Orienting Safety for Object Detectors via Ego-Centric Intersection-over-Union

Authors: Brian Hsuan-Cheng Liao, Chih-Hong Cheng, Hasan Esen, Alois Knoll

Abstract: This paper presents safety-oriented object detection via a novel Ego-Centric Intersection-over-Union (EC-IoU) measure, addressing practical concerns when applying state-of-the-art learning-based perception models in safety-critical domains such as autonomous driving. Concretely, we propose a weighting mechanism to refine the widely used IoU measure, allowing it to assign a higher score to a prediction that covers closer points of a ground-truth object from the ego agent's perspective. The proposed EC-IoU measure can be used in typical evaluation processes to select object detectors with higher safety-related performance for downstream tasks. It can also be integrated into common loss functions for model fine-tuning. While geared towards safety, our experiment with the KITTI dataset demonstrates that the performance of a model trained on EC-IoU can be better than that of a variant trained on IoU in terms of mean Average Precision as well.

Submitted 20 March, 2024; originally announced March 2024.

Comments: 8 pages (IEEE double column format), 7 figures, 2 tables, submitted to IROS 2024
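
For reference, the plain IoU measure that the abstract above refines can be sketched as follows for axis-aligned 2D boxes. This is only the unweighted baseline; the ego-centric weighting of EC-IoU is not described in enough detail here to reproduce. The function name `iou` and the `(x1, y1, x2, y2)` box format are assumptions made for the example.

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two axis-aligned boxes (x1, y1, x2, y2).

    Assumes x1 <= x2 and y1 <= y2 for each box. Returns 0.0 when the
    union has zero area.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap rectangle, clamped at zero.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0
```

Plain IoU treats every overlapping point equally, which is the property the proposed weighting mechanism is said to change.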

arXiv:2403.15268 [pdf, other]

Imagination Augmented Generation: Learning to Imagine Richer Context for Question Answering over Large Language Models

Authors: Huanxuan Liao, Shizhu He, Yao Xu, Yuanzhe Zhang, Kang Liu, Shengping Liu, Jun Zhao

Abstract: Retrieval-Augmented-Generation and Generation-Augmented-Generation have been proposed to enhance the knowledge required for question answering over Large Language Models (LLMs). However, the former relies on external resources, and both require incorporating explicit documents into the context, which increases execution costs and susceptibility to noise data. Recent works indicate that LLMs have modeled rich knowledge, albeit not effectively triggered or awakened. Inspired by this, we propose a novel knowledge-augmented framework, Imagination-Augmented-Generation (IAG), which simulates the human capacity to compensate for knowledge deficits while answering questions solely through imagination, thereby awakening relevant knowledge in LLMs without relying on external resources. Guided by IAG, we propose an imagine richer context method for question answering (IMcQA). IMcQA consists of two modules: explicit imagination, which generates a short dummy document by learning from long context compression, and implicit imagination, which creates flexible adapters by distilling from a teacher model with a long context. Experimental results on three datasets demonstrate that IMcQA exhibits significant advantages in both open-domain and closed-book settings, as well as in out-of-distribution generalization. Our code will be available at https://github.com/Xnhyacinth/IAG.

Submitted 18 June, 2024; v1 submitted 22 March, 2024; originally announced March 2024.

arXiv:2403.12141 [pdf, other]

Fractionalization Signatures in the Dynamics of Quantum Spin Liquids

Authors: Kang Wang, Shi Feng, Penghao Zhu, Runze Chi, Hai-Jun Liao, Nandini Trivedi, Tao Xiang

Abstract: We investigate the signatures of fractionalization in quantum spin liquids by studying different phases of the Kitaev honeycomb model in the presence of an out-of-plane magnetic field through which the model becomes non-integrable. Using the infinite Projected Entangled Pair States (iPEPS) ansatz, along with analytical calculations and exact diagonalization, we calculate dynamical signatures of fractionalized particles through spin-spin and dimer-dimer correlations. Our analysis demonstrates the ability of these correlations to discern distinct fractionalized quantum sectors, namely Majorana fermions and the emergent $Z_2$ fluxes, in both the chiral spin liquid (CSL) phase under weak field and the emergent intermediate gapless phase (IGP) under moderate field. Importantly, our calculation reveals the nature of IGP observed at moderate fields, a region of ongoing debate, indicating that this phase is a Majorana metal induced by strong flux fluctuations.

Submitted 20 March, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

Comments: 5+8 pages, 4+9 figures

arXiv:2403.10577 [pdf, other]

Stembridge codes, permutahedral varieties, and their extensions

Authors: Hsin-Chieh Liao

Abstract: It is well known that the Eulerian polynomial is the Hilbert series of the cohomology of the permutahedral variety. Stanley obtained a formula showing that the cohomology carries a permutation representation of $\mathfrak{S}_n$. We answer a question of Stembridge on finding an explicit permutation basis of this cohomology. We observe that the Feichtner-Yuzvinsky basis for the Chow ring of the Boolean matroid is such a permutation basis, and then we construct an $\mathfrak{S}_n$-equivariant bijection between this basis and codes introduced by Stembridge, thereby giving a combinatorial proof of Stanley's formula. We obtain an analogous result for the stellahedral variety. We find a permutation basis of the permutation representation its cohomology carries. This involves the augmented Chow ring of a matroid introduced by Braden, Huh, Matherne, Proudfoot and Wang. Along the way, we obtain a general result on augmented Chow rings (which was also independently obtained by Eur) asserting that augmented Chow rings of matroids are actually Chow rings in the sense of Feichtner and Yuzvinsky. In the last part of the paper, we study enumerative aspects of the permutahedra and the stellohedra related to these permutation bases.

Submitted 15 April, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

Comments: 35 pages, 18 figures. This is the full-length version of the extended abstract arXiv:2212.05362. Correct Thm 6.1, add Thm 6.12, 6.13 and Remark 6.22 and some minor changes

MSC Class: 05B35; 05E18; 05E14; 05E05; 05A05; 05A19
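
The Eulerian polynomial mentioned in the abstract above has the Eulerian numbers as coefficients. As a small illustration, the sketch below computes one row of Eulerian numbers with the standard recurrence $A(n,k) = (k+1)A(n-1,k) + (n-k)A(n-1,k-1)$, where $A(n,k)$ counts permutations of $n$ elements with $k$ descents. The function name `eulerian_row` and this indexing convention are assumptions; the paper may index differently.

```python
def eulerian_row(n):
    """Return [A(n, 0), ..., A(n, n-1)], the Eulerian numbers for n >= 1.

    A(n, k) counts permutations of n elements with exactly k descents.
    """
    row = [1]  # n = 1: the single permutation has no descents
    for m in range(2, n + 1):
        new_row = []
        for k in range(m):
            same_k = row[k] if k < len(row) else 0    # A(m-1, k)
            prev_k = row[k - 1] if k >= 1 else 0      # A(m-1, k-1)
            new_row.append((k + 1) * same_k + (m - k) * prev_k)
        row = new_row
    return row
```

Each row is symmetric and sums to $n!$, the total number of permutations.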

arXiv:2403.09747 [pdf, other]

Re-Search for The Truth: Multi-round Retrieval-augmented Large Language Models are Strong Fake News Detectors

Authors: Guanghua Li, Wensheng Lu, Wei Zhang, Defu Lian, Kezhong Lu, Rui Mao, Kai Shu, Hao Liao

Abstract: The proliferation of fake news has had far-reaching implications on politics, the economy, and society at large. While fake news detection methods have been employed to mitigate this issue, they primarily depend on two essential elements: the quality and relevance of the evidence, and the effectiveness of the verdict prediction mechanism. Traditional methods, which often source information from static repositories like Wikipedia, are limited by outdated or incomplete data, particularly for emerging or rare claims. Large Language Models (LLMs), known for their remarkable reasoning and generative capabilities, introduce a new frontier for fake news detection. However, like traditional methods, LLM-based solutions also grapple with the limitations of stale and long-tail knowledge. Additionally, retrieval-enhanced LLMs frequently struggle with issues such as low-quality evidence retrieval and context length constraints. To address these challenges, we introduce a novel, retrieval-augmented LLM framework--the first of its kind to automatically and strategically extract key evidence from web sources for claim verification. Employing a multi-round retrieval strategy, our framework ensures the acquisition of sufficient, relevant evidence, thereby enhancing performance. Comprehensive experiments across three real-world datasets validate the framework's superiority over existing methods. Importantly, our model not only delivers accurate verdicts but also offers human-readable explanations to improve result interpretability.

Submitted 13 March, 2024; originally announced March 2024.

arXiv:2403.07387 [pdf, ps, other]

Combinatorics of generalized parking-function polytopes

Authors: Margaret M. Bayer, Steffen Borgwardt, Teressa Chambers, Spencer Daugherty, Aleyah Dawkins, Danai Deligeorgaki, Hsin-Chieh Liao, Tyrrell McAllister, Angela Morrison, Garrett Nelson, Andrés R. Vindas-Meléndez

Abstract: For $\mathbf{b}=(b_1,\dots,b_n)\in \mathbb{Z}_{>0}^n$, a $\mathbf{b}$-parking function is defined to be a sequence $(\beta_1,\dots,\beta_n)$ of positive integers whose nondecreasing rearrangement $\beta'_1\leq \beta'_2\leq \cdots \leq \beta'_n$ satisfies $\beta'_i\leq b_1+\cdots + b_i$. The $\mathbf{b}$-parking-function polytope $\mathfrak{X}_n(\mathbf{b})$ is the convex hull of all $\mathbf{b}$-parking functions of length $n$ in $\mathbb{R}^n$. Geometric properties of $\mathfrak{X}_n(\mathbf{b})$ were previously explored in the specific case where $\mathbf{b}=(a,b,b,\dots,b)$ and were shown to generalize those of the classical parking-function polytope. In this work, we study $\mathfrak{X}_n(\mathbf{b})$ in full generality. We present a minimal inequality and vertex description for $\mathfrak{X}_n(\mathbf{b})$, prove it is a generalized permutahedron, and study its $h$-polynomial. Furthermore, we investigate $\mathfrak{X}_n(\mathbf{b})$ through the perspectives of building sets and polymatroids, allowing us to identify its combinatorial types and obtain bounds on its combinatorial and circuit diameters.

Submitted 12 March, 2024; originally announced March 2024.

Comments: 27 pages, 4 figures, Comments welcomed!

MSC Class: 05A10; 05A15; 05A19; 52B05; 52B11; 52B15; 52B20; 52B40
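
The definition of a $\mathbf{b}$-parking function in the abstract above translates directly into a membership test: sort the sequence and compare each entry against the running sum of $\mathbf{b}$. The sketch below is a minimal illustration of that definition only; the function name `is_b_parking_function` is an assumption and does not come from the paper.

```python
from itertools import accumulate, product


def is_b_parking_function(seq, b):
    """Check the b-parking condition for a sequence of positive integers.

    The nondecreasing rearrangement of seq must satisfy
    seq_sorted[i] <= b[0] + ... + b[i] for every position i.
    """
    if len(seq) != len(b) or any(x < 1 for x in seq):
        return False
    partial_sums = accumulate(b)  # b_1, b_1 + b_2, ...
    return all(x <= bound for x, bound in zip(sorted(seq), partial_sums))


# Usage: b = (1, 1, 1) gives the classical parking functions of length 3.
classical = [s for s in product(range(1, 4), repeat=3)
             if is_b_parking_function(s, (1, 1, 1))]
```

With $\mathbf{b}=(1,\dots,1)$ the partial sums are $1, 2, \dots, n$, so the test reduces to the classical parking-function condition.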

arXiv:2403.06852 [pdf, other]

Suppressing Correlated Noise in Quantum Computers via Context-Aware Compiling

Authors: Alireza Seif, Haoran Liao, Vinay Tripathi, Kevin Krsulich, Moein Malekakhlagh, Mirko Amico, Petar Jurcevic, Ali Javadi-Abhari

Abstract: Coherent errors, and especially those that occur in correlation among a set of qubits, are detrimental to large-scale quantum computing. Correlations in noise can occur as a result of spatial and temporal configurations of instructions executing on the quantum processor. In this paper, we perform a detailed experimental characterization of many of these error sources, and theoretically connect them to the physics of superconducting qubits and gate operations. Equipped with this knowledge, we devise compiler strategies to suppress these errors using dynamical decoupling or error compensation into the rest of the circuit. Importantly, these strategies are successful when the context at each layer of computation is taken into account: how qubits are connected, what crosstalk terms exist on the device, and what gates or idle periods occur in that layer. Our context-aware compiler thus suppresses some dominant sources of error, making further error mitigation or error correction substantially less expensive. For example, our experiments show an increase of 18.5% in layer fidelity for a candidate 10-qubit circuit layer compared to context-unaware suppression. Owing to the exponential nature of error mitigation, these improvements due to error suppression translate to several orders of magnitude reduction of sampling overhead for a circuit consisting of a moderate number of layers.

Submitted 11 March, 2024; originally announced March 2024.

Comments: 16 pages, 10 figures

arXiv:2403.05063 [pdf, other]

Aligning Large Language Models for Controllable Recommendations

Authors: Wensheng Lu, Jianxun Lian, Wei Zhang, Guanghua Li, Mingyang Zhou, Hao Liao, Xing Xie

Abstract: Inspired by the exceptional general intelligence of Large Language Models (LLMs), researchers have begun to explore their application in pioneering the next generation of recommender systems - systems that are conversational, explainable, and controllable. However, existing literature primarily concentrates on integrating domain-specific knowledge into LLMs to enhance accuracy, often neglecting the ability to follow instructions. To address this gap, we initially introduce a collection of supervised learning tasks, augmented with labels derived from a conventional recommender model, aimed at explicitly improving LLMs' proficiency in adhering to recommendation-specific instructions. Subsequently, we develop a reinforcement learning-based alignment procedure to further strengthen LLMs' aptitude in responding to users' intentions and mitigating formatting errors. Through extensive experiments on two real-world datasets, our method markedly advances the capability of LLMs to comply with instructions within recommender systems, while sustaining a high level of accuracy performance.

Submitted 8 March, 2024; originally announced March 2024.

Comments: 13 pages

MSC Class: 68T50

arXiv:2403.03212 [pdf, other]

Performance of a modular ton-scale pixel-readout liquid argon time projection chamber

Authors: DUNE Collaboration, A. Abed Abud, B. Abi, R. Acciarri, M. A. Acero, M. R. Adames, G. Adamov, M. Adamowski, D. Adams, M. Adinolfi, C. Adriano, A. Aduszkiewicz, J. Aguilar, B. Aimard, F. Akbar, K. Allison, S. Alonso Monsalve, M. Alrashed, A. Alton, R. Alvarez, T. Alves, H. Amar, P. Amedo, J. Anderson, D. A. Andrade, et al. (1340 additional authors not shown)

Abstract: The Module-0 Demonstrator is a single-phase 600 kg liquid argon time projection chamber operated as a prototype for the DUNE liquid argon near detector. Based on the ArgonCube design concept, Module-0 features a novel 80k-channel pixelated charge readout and advanced high-coverage photon detection system. In this paper, we present an analysis of an eight-day data set consisting of 25 million cosmic ray events collected in the spring of 2021. We use this sample to demonstrate the imaging performance of the charge and light readout systems as well as the signal correlations between the two. We also report argon purity and detector uniformity measurements, and provide comparisons to detector simulations.

Submitted 5 March, 2024; originally announced March 2024.

Comments: 47 pages, 41 figures

Report number: FERMILAB-PUB-24-0073-LBNF

arXiv:2403.02622 [pdf, other]

World Models for Autonomous Driving: An Initial Survey

Authors: Yanchen Guan, Haicheng Liao, Zhenning Li, Jia Hu, Runze Yuan, Yunjian Li, Guohui Zhang, Chengzhong Xu

Abstract: In the rapidly evolving landscape of autonomous driving, the capability to accurately predict future events and assess their implications is paramount for both safety and efficiency, critically aiding the decision-making process. World models have emerged as a transformative approach, enabling autonomous driving systems to synthesize and interpret vast amounts of sensor data, thereby predicting potential future scenarios and compensating for information gaps. This paper provides an initial review of the current state and prospective advancements of world models in autonomous driving, spanning their theoretical underpinnings, practical applications, and the ongoing research efforts aimed at overcoming existing limitations. Highlighting the significant role of world models in advancing autonomous driving technologies, this survey aspires to serve as a foundational reference for the research community, facilitating swift access to and comprehension of this burgeoning field, and inspiring continued innovation and exploration.

Submitted 7 May, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

arXiv:2403.02589 [pdf, ps, other]

MUSIC: Accelerated Convergence for Distributed Optimization With Inexact and Exact Methods

Authors: Mou Wu, Haibin Liao, Zhengtao Ding, Yonggang Xiao

Abstract: Gradient-type distributed optimization methods have blossomed into one of the most important tools for solving a minimization learning task over a networked agent system. However, with only one gradient update per iteration, it is difficult to achieve a substantive acceleration of convergence. In this paper, we propose an accelerated framework named MUSIC that allows each agent to perform multiple local updates and a single combination in each iteration. More importantly, we incorporate inexact and exact distributed optimization methods into this framework, thereby developing two new algorithms that exhibit accelerated linear convergence and high communication efficiency. Our rigorous convergence analysis reveals the sources of steady-state errors arising from inexact policies and offers effective solutions. Numerical results based on synthetic and real datasets demonstrate both our theoretical motivations and analysis, as well as performance advantages.

Submitted 4 March, 2024; originally announced March 2024.

arXiv:2403.01918 [pdf, other]

Towards Continuous Assurance Case Creation for ADS with the Evidential Tool Bus

Authors: Lev Sorokin, Radouane Bouchekir, Tewodros A. Beyene, Brian Hsuan-Cheng Liao, Adam Molin

Abstract: An assurance case has become an integral component for the certification of safety-critical systems. While manually defining assurance case patterns cannot be avoided, system-specific instantiations of assurance case patterns are both costly and time-consuming. It becomes especially complex to maintain an assurance case for a system when the requirements of the System-Under-Assurance change, or an assurance claim becomes invalid due to, e.g., degradation of a system component, as is common when deploying learning-enabled components. In this paper, we report on our preliminary experience leveraging the tool integration framework Evidential Tool Bus (ETB) for the construction and continuous maintenance of an assurance case from a predefined assurance case pattern. Specifically, we demonstrate the assurance process on an industrial Automated Valet Parking system from the automotive domain. We present the formalization of the provided assurance case pattern in the ETB processable logical specification language of workflows. Our findings show that ETB is able to create and maintain evidence required for the construction of an assurance case.

Submitted 4 March, 2024; originally announced March 2024.

Comments: Accepted at International SafeAutonomy Workshop at EDCC '24

arXiv:2403.01683 [pdf, other]

DD-VNB: A Depth-based Dual-Loop Framework for Real-time Visually Navigated Bronchoscopy

Authors: Qingyao Tian, Huai Liao, Xinyan Huang, Jian Chen, Zihui Zhang, Bingyu Yang, Sebastien Ourselin, Hongbin Liu

Abstract: Real-time 6 DOF localization of bronchoscopes is crucial for enhancing intervention quality. However, current vision-based technologies struggle to balance between generalization to unseen data and computational speed. In this study, we propose a Depth-based Dual-Loop framework for real-time Visually Navigated Bronchoscopy (DD-VNB) that can generalize across patient cases without the need of re-training. The DD-VNB framework integrates two key modules: depth estimation and dual-loop localization. To address the domain gap among patients, we propose a knowledge-embedded depth estimation network that maps endoscope frames to depth, ensuring generalization by eliminating patient-specific textures. The network embeds view synthesis knowledge into a cycle adversarial architecture for scale-constrained monocular depth estimation. For real-time performance, our localization module embeds a fast ego-motion estimation network into the loop of depth registration. The ego-motion inference network estimates the pose change of the bronchoscope in high frequency while depth registration against the pre-operative 3D model provides absolute pose periodically. Specifically, the relative pose changes are fed into the registration process as the initial guess to boost its accuracy and speed.
Experiments on phantom and in-vivo data from patients demonstrate the effectiveness of our framework: 1) monocular depth estimation outperforms SOTA, 2) localization achieves an accuracy of Absolute Tracking Error (ATE) of 4.7 $\pm$ 3.17 mm in phantom and 6.49 $\pm$ 3.88 mm in patient data, 3) with a frame-rate approaching video capture speed, 4) without the necessity of case-wise network retraining. The framework's superior speed and accuracy demonstrate its promising clinical potential for real-time bronchoscopic navigation.

Submitted 15 March, 2024; v1 submitted 3 March, 2024; originally announced March 2024.

arXiv:2402.19251 [pdf, other]

A Cognitive-Based Trajectory Prediction Approach for Autonomous Driving

Authors: Haicheng Liao, Yongkang Li, Zhenning Li, Chengyue Wang, Zhiyong Cui, Shengbo Eben Li, Chengzhong Xu

Abstract: In autonomous vehicle (AV) technology, the ability to accurately predict the movements of surrounding vehicles is paramount for ensuring safety and operational efficiency. Incorporating human decision-making insights enables AVs to more effectively anticipate the potential actions of other vehicles, significantly improving prediction accuracy and responsiveness in dynamic environments. This paper introduces the Human-Like Trajectory Prediction (HLTP) model, which adopts a teacher-student knowledge distillation framework inspired by human cognitive processes. The HLTP model incorporates a sophisticated teacher-student knowledge distillation framework. The "teacher" model, equipped with an adaptive visual sector, mimics the visual processing of the human brain, particularly the functions of the occipital and temporal lobes. The "student" model focuses on real-time interaction and decision-making, drawing parallels to prefrontal and parietal cortex functions. This approach allows for dynamic adaptation to changing driving scenarios, capturing essential perceptual cues for accurate prediction. Evaluated using the Macao Connected and Autonomous Driving (MoCAD) dataset, along with the NGSIM and HighD benchmarks, HLTP demonstrates superior performance compared to existing models, particularly in challenging environments with incomplete data. The project page is available at GitHub.

Submitted 29 February, 2024; originally announced February 2024.

arXiv:2402.15764 [pdf, other]

Look Before You Leap: Problem Elaboration Prompting Improves Mathematical Reasoning in Large Language Models

Authors: Haoran Liao, Jidong Tian, Shaohua Hu, Hao He, Yaohui Jin

Abstract: Large language models (LLMs) still grapple with complex tasks like mathematical reasoning. Despite significant efforts invested in improving prefix prompts or reasoning process, the crucial role of problem context might have been neglected. Accurate recognition of inputs is fundamental for solving mathematical tasks, as ill-formed problems could potentially mislead LLM's reasoning. In this study, we propose a new approach named Problem Elaboration Prompting (PEP) to enhance the mathematical capacities of LLMs. Specifically, PEP decomposes and elucidates the problem context before reasoning, therefore enhancing the context modeling and parsing efficiency. Experiments across datasets and models demonstrate promising performances: (1) PEP demonstrates an overall enhancement in various mathematical tasks. For instance, with the GPT-3.5 model, PEP exhibits improvements of 9.93% and 8.80% on GSM8k through greedy decoding and self-consistency, respectively. (2) PEP can be easily implemented and integrated with other prompting methods. (3) PEP shows particular strength in handling distraction problems.

Submitted 26 March, 2024; v1 submitted 24 February, 2024; originally announced February 2024.

arXiv:2402.13616 [pdf, other]

YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information

Authors: Chien-Yao Wang, I-Hau Yeh, Hong-Yuan Mark Liao

Abstract: Today's deep learning methods focus on how to design the most appropriate objective functions so that the prediction results of the model can be closest to the ground truth. Meanwhile, an appropriate architecture that can facilitate acquisition of enough information for prediction has to be designed. Existing methods ignore the fact that when input data undergoes layer-by-layer feature extraction and spatial transformation, a large amount of information will be lost. This paper will delve into the important issues of data loss when data is transmitted through deep networks, namely information bottleneck and reversible functions. We proposed the concept of programmable gradient information (PGI) to cope with the various changes required by deep networks to achieve multiple objectives. PGI can provide complete input information for the target task to calculate objective function, so that reliable gradient information can be obtained to update network weights. In addition, a new lightweight network architecture -- Generalized Efficient Layer Aggregation Network (GELAN), based on gradient path planning is designed. GELAN's architecture confirms that PGI has gained superior results on lightweight models. We verified the proposed GELAN and PGI on MS COCO dataset based object detection. The results show that GELAN only uses conventional convolution operators to achieve better parameter utilization than the state-of-the-art methods developed based on depth-wise convolution. PGI can be used for a variety of models from lightweight to large.
It can be used to obtain complete information, so that train-from-scratch models can achieve better results than state-of-the-art models pre-trained using large datasets; the comparison results are shown in Figure 1. The source codes are at: https://github.com/WongKinYiu/yolov9.

Submitted 28 February, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

arXiv:2402.12763 [pdf, other]

BronchoTrack: Airway Lumen Tracking for Branch-Level Bronchoscopic Localization

Authors: Qingyao Tian, Huai Liao, Xinyan Huang, Bingyu Yang, Jinlin Wu, Jian Chen, Lujie Li, Hongbin Liu

Abstract: Localizing the bronchoscope in real time is essential for ensuring intervention quality. However, most existing methods struggle to balance between speed and generalization. To address these challenges, we present BronchoTrack, an innovative real-time framework for accurate branch-level localization, encompassing lumen detection, tracking, and airway association. To achieve real-time performance, we employ a benchmark lightweight detector for efficient lumen detection. We are the first to introduce multi-object tracking to bronchoscopic localization, mitigating temporal confusion in lumen identification caused by rapid bronchoscope movement and complex airway structures. To ensure generalization across patient cases, we propose a training-free detection-airway association method based on a semantic airway graph that encodes the hierarchy of bronchial tree structures. Experiments on nine patient datasets demonstrate BronchoTrack's localization accuracy of 85.64%, while accessing up to the 4th generation of airways. Furthermore, we tested BronchoTrack in an in-vivo animal study using a porcine model, where it successfully localized the bronchoscope into the 8th generation airway. Experimental evaluation underscores BronchoTrack's real-time performance in both satisfying accuracy and generalization, demonstrating its potential for clinical applications.

Submitted 20 February, 2024; originally announced February 2024.

arXiv:2402.06129 [pdf, other]

Mesh-robust stability and convergence of variable-step deferred correction methods based on the BDF2 formula

Authors: Jiahe Yue, Hong-lin Liao, Nan Liu

Abstract: We provide a new theoretical framework for the variable-step deferred correction (DC) methods based on the well-known BDF2 formula. By using the discrete orthogonal convolution kernels, some high-order BDF2-DC methods are proven to be stable on arbitrary time grids according to the recent definition of stability (SINUM, 60: 2253-2272). It significantly relaxes the existing step-ratio restrictions for the BDF2-DC methods (BIT, 62: 1789-1822). The associated sharp error estimates are established by taking the numerical effects of the starting approximations into account, and they suggest that the BDF2-DC methods have no aftereffect, that is, the lower-order starting scheme for the BDF2 scheme will not cause a loss in the accuracy of the high-order BDF2-DC methods. Extensive tests on the graded and random time meshes are presented to support the new theory.

Submitted 8 February, 2024; originally announced February 2024.

Comments: 27 pages, 12 tables, 8 figures

MSC Class: 65M06; 65M12

arXiv:2402.04318 [pdf, other]

Human Observation-Inspired Trajectory Prediction for Autonomous Driving in Mixed-Autonomy Traffic Environments

Authors: Haicheng Liao, Shangqian Liu, Yongkang Li, Zhenning Li, Chengyue Wang, Yunjian Li, Shengbo Eben Li, Chengzhong Xu

Abstract: In the burgeoning field of autonomous vehicles (AVs), trajectory prediction remains a formidable challenge, especially in mixed autonomy environments. Traditional approaches often rely on computational methods such as time-series analysis. Our research diverges significantly by adopting an interdisciplinary approach that integrates principles of human cognition and observational behavior into trajectory prediction models for AVs. We introduce a novel "adaptive visual sector" mechanism that mimics the dynamic allocation of attention human drivers exhibit based on factors like spatial orientation, proximity, and driving speed. Additionally, we develop a "dynamic traffic graph" using Convolutional Neural Networks (CNN) and Graph Attention Networks (GAT) to capture spatio-temporal dependencies among agents. Benchmark tests on the NGSIM, HighD, and MoCAD datasets reveal that our model (GAVA) outperforms state-of-the-art baselines by at least 15.2%, 19.4%, and 12.0%, respectively. Our findings underscore the potential of leveraging human cognition principles to enhance the proficiency and adaptability of trajectory prediction algorithms in AVs. The code for the proposed model is available at our GitHub.

Submitted 8 March, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

arXiv:2402.01568 [pdf, other]

Doping Liquid Argon with Xenon in ProtoDUNE Single-Phase: Effects on Scintillation Light

Authors: DUNE Collaboration, A. Abed Abud, B. Abi, R. Acciarri, M. A. Acero, M. R. Adames, G. Adamov, M. Adamowski, D. Adams, M. Adinolfi, C. Adriano, A. Aduszkiewicz, J. Aguilar, B. Aimard, F. Akbar, K. Allison, S. Alonso Monsalve, M. Alrashed, A. Alton, R. Alvarez, H. Amar Es-sghir, P. Amedo, J. Anderson, D. A. Andrade, C. Andreopoulos, et al. (1300 additional authors not shown)

Abstract: Doping of liquid argon TPCs (LArTPCs) with a small concentration of xenon is a technique for light-shifting and facilitates the detection of the liquid argon scintillation light. In this paper, we present the results of the first doping test ever performed in a kiloton-scale LArTPC. From February to May 2020, we carried out this special run in the single-phase DUNE Far Detector prototype (ProtoDUNE-SP) at CERN, featuring 770 t of total liquid argon mass with 410 t of fiducial mass. The goal of the run was to measure the light and charge response of the detector to the addition of xenon, up to a concentration of 18.8 ppm. The main purpose was to test the possibility for reduction of non-uniformities in light collection, caused by deployment of photon detectors only within the anode planes. Light collection was analysed as a function of the xenon concentration, by using the pre-existing photon detection system (PDS) of ProtoDUNE-SP and an additional smaller set-up installed specifically for this run. In this paper we first summarize our current understanding of the argon-xenon energy transfer process and the impact of the presence of nitrogen in argon with and without xenon dopant. We then describe the key elements of ProtoDUNE-SP and the injection method deployed. Two dedicated photon detectors were able to collect the light produced by xenon and the total light. The ratio of these components was measured to be about 0.65 as 18.8 ppm of xenon were injected.
We performed studies of the collection efficiency as a function of the distance between tracks and light detectors, demonstrating enhanced uniformity of response for the anode-mounted PDS. We also show that xenon doping can substantially recover light losses due to contamination of the liquid argon by nitrogen.

Submitted 9 February, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

Comments: 35 pages, 20 figures

Report number: CERN-EP-2024-024; FERMILAB-PUB-23-0819-LBNF

arXiv:2402.00744 [pdf, other]

BATON: Aligning Text-to-Audio Model with Human Preference Feedback

Authors: Huan Liao, Haonan Han, Kai Yang, Tianjiao Du, Rui Yang, Zunnan Xu, Qinmei Xu, Jingquan Liu, Jiasheng Lu, Xiu Li

Abstract: With the development of AI-Generated Content (AIGC), text-to-audio models are gaining widespread attention. However, it is challenging for these models to generate audio aligned with human preference due to the inherent information density of natural language and limited model understanding ability. To alleviate this issue, we formulate the BATON, a framework designed to enhance the alignment between generated audio and text prompt using human preference feedback. Our BATON comprises three key stages: Firstly, we curated a dataset containing both prompts and the corresponding generated audio, which was then annotated based on human feedback. Secondly, we introduced a reward model using the constructed dataset, which can mimic human preference by assigning rewards to input text-audio pairs. Finally, we employed the reward model to fine-tune an off-the-shelf text-to-audio model. The experiment results demonstrate that our BATON can significantly improve the generation quality of the original text-to-audio models, concerning audio integrity, temporal relationship, and alignment with human preference.

Submitted 1 February, 2024; originally announced February 2024.

arXiv:2401.05246 [pdf, other]

Loophole-free test of macroscopic realism via high-order correlations of measurement

Authors: Ping Wang, Chong Chen, Hao Liao, Vadim V. Vorobyov, Joerg Wrachtrup, and Ren-Bao Liu

Abstract: Test of macroscopic realism (MR) is key to understanding the foundation of quantum mechanics. Due to the existence of the non-invasive measurability loophole and other interpretation loopholes, however, such test remains an open question. Here we propose a general inequality based on high-order correlations of measurements for a loophole-free test of MR at the weak signal limit. Importantly, the inequality is established using the statistics of raw data recorded by classical devices, without requiring a specific model for the measurement process, so its violation would falsify MR without the interpretation loophole. The non-invasive measurability loophole is also closed, since the weak signal limit can be verified solely by measurement data (using the relative scaling behaviors of different orders of correlations). We demonstrate that the inequality can be broken by a quantum spin model. The inequality proposed here provides an unambiguous test of the MR principle and is also useful to characterizing quantum coherence.

Submitted 15 January, 2024; v1 submitted 10 January, 2024; originally announced January 2024.

arXiv:2401.03506 [pdf, other]

DiarizationLM: Speaker Diarization Post-Processing with Large Language Models

Authors: Quan Wang, Yiling Huang, Guanlong Zhao, Evan Clark, Wei Xia, Hank Liao

Abstract: In this paper, we introduce DiarizationLM, a framework to leverage large language models (LLM) to post-process the outputs from a speaker diarization system. Various goals can be achieved with the proposed framework, such as improving the readability of the diarized transcript, or reducing the word diarization error rate (WDER). In this framework, the outputs of the automatic speech recognition (ASR) and speaker diarization systems are represented as a compact textual format, which is included in the prompt to an optionally finetuned LLM. The outputs of the LLM can be used as the refined diarization results with the desired enhancement. As a post-processing step, this framework can be easily applied to any off-the-shelf ASR and speaker diarization systems without retraining existing components. Our experiments show that a finetuned PaLM 2-S model can reduce the WDER by rel. 55.5% on the Fisher telephone conversation dataset, and rel. 44.9% on the Callhome English dataset.

Submitted 15 July, 2024; v1 submitted 7 January, 2024; originally announced January 2024.

arXiv:2312.11010 [pdf, other]

Endogenous preference for non-market goods in carbon abatement decision

Authors: Fangzhi Wang, Hua Liao, Richard S. J. Tol, Changjing Ji

Abstract: Carbon abatement decisions are usually based on the implausible assumption of constant social preference. This paper focuses on a specific case of market and non-market goods, and investigates the optimal climate policy when social preference for them is also changed by climate policy in the DICE model. The relative price of non-market goods grows over time due to increases in both relative scarcity and appreciation of it. Therefore, climbing relative price brings upward the social cost of carbon denominated in terms of market goods. Because abatement decision affects the valuation of non-market goods in the utility function, unlike previous climate-economy models, we solve the model iteratively by taking the obtained abatement rates from the last run as inputs in the current run. The results in baseline calibration advocate a more stringent climate policy, where endogenous social preference to climate policy raises the social cost of carbon further by roughly 12%-18% this century. Moreover, neglecting changing social preference leads to an underestimate of non-market goods damages by 15%. Our results support that climate policy is self-reinforced if it favors more expensive consumption type.

Submitted 18 December, 2023; originally announced December 2023.

arXiv:2312.10088 [pdf, ps, other]

On Robustness to Missing Video for Audiovisual Speech Recognition

Authors: Oscar Chang, Otavio Braga, Hank Liao, Dmitriy Serdyuk, Olivier Siohan

Abstract: It has been shown that learning audiovisual features can lead to improved speech recognition performance over audio-only features, especially for noisy speech. However, in many common applications, the visual features are partially or entirely missing, e.g.~the speaker might move off screen. Multi-modal models need to be robust: missing video frames should not degrade the performance of an audiovisual model to be worse than that of a single-modality audio-only model. While there have been many attempts at building robust models, there is little consensus on how robustness should be evaluated. To address this, we introduce a framework that allows claims about robustness to be evaluated in a precise and testable way. We also conduct a systematic empirical study of the robustness of common audiovisual speech recognition architectures on a range of acoustic noise conditions and test suites. Finally, we show that an architecture-agnostic solution based on cascades can consistently achieve robustness to missing video, even in settings where existing techniques for robustness like dropout fall short.

Submitted 18 December, 2023; v1 submitted 13 December, 2023; originally announced December 2023.

Showing 1–50 of 375 results for author: Liao, H