SeamPose: Repurposing Seams as Capacitive Sensors in a Shirt for Upper-Body Pose Tracking

Authors: Tianhong Catherine Yu, Manru Zhang, Peter He, Chi-Jung Lee, Cassidy Cheesman, Saif Mahmud, Ruidong Zhang, François Guimbretière, Cheng Zhang

Abstract: Seams are areas of overlapping fabric formed by stitching two or more pieces of fabric together in the cut-and-sew apparel manufacturing process. In SeamPose, we repurposed seams as capacitive sensors in a shirt for continuous upper-body pose estimation. Compared to previous all-textile motion-capturing garments that place the electrodes on the surface of clothing, our solution leverages existing seams inside of a shirt by machine-sewing insulated conductive threads over the seams. The unique invisibilities and placements of the seams afford the sensing shirt to look and wear the same as a conventional shirt while providing exciting pose-tracking capabilities. To validate this approach, we implemented a proof-of-concept untethered shirt. With eight capacitive sensing seams, our customized deep-learning pipeline accurately estimates the upper-body 3D joint positions relative to the pelvis. With a 12-participant user study, we demonstrated promising cross-user and cross-session tracking performance. SeamPose represents a step towards unobtrusive integration of smart clothing for everyday pose estimation.

Submitted 17 June, 2024; originally announced June 2024.

arXiv:2406.11441 [pdf, other]

SWCF-Net: Similarity-weighted Convolution and Local-global Fusion for Efficient Large-scale Point Cloud Semantic Segmentation

Authors: Zhenchao Lin, Li He, Hongqiang Yang, Xiaoqun Sun, Cuojin Zhang, Weinan Chen, Yisheng Guan, Hong Zhang

Abstract: A large-scale point cloud consists of a multitude of individual objects, thereby encompassing rich structural and underlying semantic contextual information, which makes efficient segmentation of the point cloud a challenging problem. Most existing research focuses mainly on capturing intricate local features without giving due consideration to global ones, thus failing to leverage semantic context. In this paper, we propose a Similarity-Weighted Convolution and local-global Fusion Network, named SWCF-Net, which takes into account both local and global features. We propose a Similarity-Weighted Convolution (SWConv) to effectively extract local features, where similarity weights are incorporated into the convolution operation to enhance the generalization capabilities. Then, we employ a downsampling operation on the K and V channels within the attention module, thereby reducing the quadratic complexity to linear, enabling the Transformer to deal with large-scale point clouds. Finally, orthogonal components are extracted in the global features and then aggregated with local features, thereby eliminating redundant information between local and global features and consequently promoting efficiency. We evaluate SWCF-Net on large-scale outdoor datasets SemanticKITTI and Toronto3D. Our experimental results demonstrate the effectiveness of the proposed network. Our method achieves a competitive result with less computational cost, and is able to handle large-scale point clouds efficiently.

Submitted 17 June, 2024; originally announced June 2024.
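
Note: the abstract above says that downsampling the K and V channels reduces attention cost from quadratic to linear. A minimal sketch of that idea follows, assuming a simple evenly spaced subsampling of keys and values; the function name `downsampled_attention` and the subsampling rule are hypothetical and are not taken from the paper. With n queries and a fixed number m of kept keys, the score matrix is n x m instead of n x n.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def downsampled_attention(q, k, v, m):
    """Attention where only m keys/values are kept (hypothetical rule).

    q, k, v have shape (n, d). Keeping m evenly spaced rows of k and v
    makes the score matrix (n, m), so cost grows linearly in n for fixed m.
    """
    n = k.shape[0]
    idx = np.linspace(0, n - 1, num=m).astype(int)  # assumed downsampling
    k_small, v_small = k[idx], v[idx]
    scores = q @ k_small.T / np.sqrt(q.shape[1])
    weights = softmax(scores, axis=-1)
    return weights @ v_small, weights


rng = np.random.default_rng(0)
points = rng.standard_normal((1000, 16))  # stand-in per-point features
out, weights = downsampled_attention(points, points, points, m=32)
print(out.shape, weights.shape)  # (1000, 16) (1000, 32)
```

Doubling the number of points doubles the size of `weights` here, whereas full self-attention would quadruple it.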

arXiv:2406.11274 [pdf, other]

Skip-Layer Attention: Bridging Abstract and Detailed Dependencies in Transformers

Authors: Qian Chen, Wen Wang, Qinglin Zhang, Siqi Zheng, Shiliang Zhang, Chong Deng, Hai Yu, Jiaqing Liu, Yukun Ma, Chong Zhang

Abstract: The Transformer architecture has significantly advanced deep learning, particularly in natural language processing, by effectively managing long-range dependencies. However, as the demand for understanding complex relationships grows, refining the Transformer's architecture becomes critical. This paper introduces Skip-Layer Attention (SLA) to enhance Transformer models by enabling direct attention between non-adjacent layers. This method improves the model's ability to capture dependencies between high-level abstract features and low-level details. By facilitating direct attention between these diverse feature levels, our approach overcomes the limitations of current Transformers, which often rely on suboptimal intra-layer attention. Our implementation extends the Transformer's functionality by enabling queries in a given layer to interact with keys and values from both the current layer and one preceding layer, thus enhancing the diversity of multi-head attention without additional computational burden. Extensive experiments demonstrate that our enhanced Transformer model achieves superior performance in language modeling tasks, highlighting the effectiveness of our skip-layer attention mechanism.

Submitted 17 June, 2024; originally announced June 2024.

Comments: 7 pages, 1 figure
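
Note: the abstract above describes queries in one layer attending to keys and values from both the current layer and one preceding layer, at no extra cost. One way this could be arranged is sketched below, under the assumption that the heads are split so that even-numbered heads read the current layer and odd-numbered heads read the earlier layer. The head assignment, the absence of learned projections, and the name `skip_layer_attention` are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def skip_layer_attention(q, kv_current, kv_previous, num_heads):
    """Multi-head attention whose heads read from two different layers.

    q, kv_current, kv_previous have shape (n, d). Even heads use the
    current layer's states as keys/values; odd heads use the states of a
    preceding layer (assumed split). The head count is unchanged, so the
    cost matches ordinary multi-head attention.
    """
    n, d = q.shape
    head_dim = d // num_heads
    out = np.empty_like(q)
    for h in range(num_heads):
        cols = slice(h * head_dim, (h + 1) * head_dim)
        source = kv_current if h % 2 == 0 else kv_previous
        scores = q[:, cols] @ source[:, cols].T / np.sqrt(head_dim)
        out[:, cols] = softmax(scores) @ source[:, cols]
    return out


rng = np.random.default_rng(0)
hidden_current = rng.standard_normal((6, 8))   # states of the current layer
hidden_previous = rng.standard_normal((6, 8))  # states of an earlier layer
mixed = skip_layer_attention(hidden_current, hidden_current,
                             hidden_previous, num_heads=4)
print(mixed.shape)  # (6, 8)
```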

arXiv:2406.10976 [pdf, other]

Promoting Data and Model Privacy in Federated Learning through Quantized LoRA

Authors: JianHao Zhu, Changze Lv, Xiaohua Wang, Muling Wu, Wenhao Liu, Tianlong Li, Zixuan Ling, Cenyuan Zhang, Xiaoqing Zheng, Xuanjing Huang

Abstract: Conventional federated learning primarily aims to secure the privacy of data distributed across multiple edge devices, with the global model dispatched to edge devices for parameter updates during the learning process. However, the development of large language models (LLMs) requires substantial data and computational resources, rendering them valuable intellectual properties for their developers and owners. To establish a mechanism that protects both data and model privacy in a federated learning context, we introduce a method that just needs to distribute a quantized version of the model's parameters during training. This method enables accurate gradient estimations for parameter updates while preventing clients from accessing a model whose performance is comparable to the centrally hosted one. Moreover, we combine this quantization strategy with LoRA, a popular and parameter-efficient fine-tuning method, to significantly reduce communication costs in federated learning. The proposed framework, named FedLPP, successfully ensures both data and model privacy in the federated learning context. Additionally, the learned central model exhibits good generalization and can be trained in a resource-efficient manner.

Submitted 16 June, 2024; originally announced June 2024.
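
Note: the abstract above combines two ideas: clients only ever see a quantized copy of the weights, and only low-rank LoRA factors are trained and communicated. The sketch below illustrates just those two ingredients on a single weight matrix. The uniform quantizer, the 4-bit setting, the matrix sizes, and every name here are assumptions made for illustration; it does not reproduce FedLPP's gradient estimation or training procedure.

```python
import numpy as np


def quantize_uniform(w, num_bits):
    """Round w onto a uniform grid of 2**num_bits values (assumed scheme).

    Returns the quantized matrix and the grid spacing.
    """
    levels = 2 ** num_bits - 1
    lo, hi = float(w.min()), float(w.max())
    scale = (hi - lo) / levels
    return np.round((w - lo) / scale) * scale + lo, scale


rng = np.random.default_rng(0)
d_out, d_in, rank = 64, 64, 4

w_full = rng.standard_normal((d_out, d_in))            # stays on the server
w_quant, scale = quantize_uniform(w_full, num_bits=4)  # what a client receives

# Standard LoRA initialisation: B starts at zero, so the client's effective
# weight initially equals the quantized weight.
lora_a = rng.standard_normal((rank, d_in)) * 0.01
lora_b = np.zeros((d_out, rank))
w_client = w_quant + lora_b @ lora_a

full_params = d_out * d_in
lora_params = rank * (d_in + d_out)  # only these would be communicated
print(full_params, lora_params)  # 4096 512
```

In this toy setting each round would exchange 512 numbers per matrix instead of 4096, and the client never holds `w_full`.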

arXiv:2406.10869 [pdf, other]

Geometric Distortion Guided Transformer for Omnidirectional Image Super-Resolution

Authors: Cuixin Yang, Rongkang Dong, Jun Xiao, Cong Zhang, Kin-Man Lam, Fei Zhou, Guoping Qiu

Abstract: As virtual and augmented reality applications gain popularity, omnidirectional image (ODI) super-resolution has become increasingly important. Unlike 2D plain images that are formed on a plane, ODIs are projected onto spherical surfaces. Applying established image super-resolution methods to ODIs, therefore, requires performing equirectangular projection (ERP) to map the ODIs onto a plane. ODI super-resolution needs to take into account geometric distortion resulting from ERP. However, without considering such geometric distortion of ERP images, previous deep-learning-based methods only utilize a limited range of pixels and may easily miss self-similar textures for reconstruction. In this paper, we introduce a novel Geometric Distortion Guided Transformer for Omnidirectional image Super-Resolution (GDGT-OSR). Specifically, a distortion modulated rectangle-window self-attention mechanism, integrated with deformable self-attention, is proposed to better perceive the distortion and thus involve more self-similar textures. Distortion modulation is achieved through a newly devised distortion guidance generator that produces guidance by exploiting the variability of distortion across latitudes. Furthermore, we propose a dynamic feature aggregation scheme to adaptively fuse the features from different self-attention modules. We present extensive experimental results on public datasets and show that the new GDGT-OSR outperforms methods in existing literature.

Submitted 16 June, 2024; originally announced June 2024.

Comments: 13 pages, 12 figures, journal

arXiv:2406.10862 [pdf, other]

OpenCAEPoro: A Parallel Simulation Framework for Multiphase and Multicomponent Porous Media Flows

Authors: Shizhe Li, Chen-Song Zhang

Abstract: OpenCAEPoro is a parallel numerical simulation software developed in C++ for simulating multiphase and multicomponent flows in porous media. The software utilizes a set of general-purpose compositional model equations, enabling it to handle a diverse range of fluid dynamics, including the black oil model, compositional model, and thermal recovery models. OpenCAEPoro establishes a unified solving framework that integrates many widely used methods, such as IMPEC, FIM, and AIM. This framework allows dynamic collaboration between different methods. Specifically, based on this framework, we have developed an adaptively coupled domain decomposition method, which can provide initial solutions for global methods to accelerate the simulation. The reliability of OpenCAEPoro has been validated through benchmark testing with the SPE comparative solution project. Furthermore, its robust parallel efficiency has been tested in distributed parallel environments, demonstrating its suitability for large-scale simulation problems.

Submitted 16 June, 2024; originally announced June 2024.

Comments: 29 pages, 19 figures

ACM Class: G.4

arXiv:2406.10750 [pdf, other]

EchoGuide: Active Acoustic Guidance for LLM-Based Eating Event Analysis from Egocentric Videos

Authors: Vineet Parikh, Saif Mahmud, Devansh Agarwal, Ke Li, François Guimbretière, Cheng Zhang

Abstract: Self-recording eating behaviors is a step towards a healthy lifestyle recommended by many health professionals. However, the current practice of manually recording eating activities using paper records or smartphone apps is often unsustainable and inaccurate. Smart glasses have emerged as a promising wearable form factor for tracking eating behaviors, but existing systems primarily identify when eating occurs without capturing details of the eating activities (e.g., what is being eaten). In this paper, we present EchoGuide, an application and system pipeline that leverages low-power active acoustic sensing to guide head-mounted cameras to capture egocentric videos, enabling efficient and detailed analysis of eating activities. By combining active acoustic sensing for eating detection with video captioning models and large-scale language models for retrieval augmentation, EchoGuide intelligently clips and analyzes videos to create concise, relevant activity records on eating. We evaluated EchoGuide with 9 participants in naturalistic settings involving eating activities, demonstrating high-quality summarization and significant reductions in video data needed, paving the way for practical, scalable eating activity tracking.

Submitted 15 June, 2024; originally announced June 2024.

arXiv:2406.10650 [pdf, other]

The Implicit Bias of Adam on Separable Data

Authors: Chenyang Zhang, Difan Zou, Yuan Cao

Abstract: Adam has become one of the most favored optimizers in deep learning problems. Despite its success in practice, numerous mysteries persist regarding its theoretical understanding. In this paper, we study the implicit bias of Adam in linear logistic regression. Specifically, we show that when the training data are linearly separable, Adam converges towards a linear classifier that achieves the maximum $\ell_\infty$-margin. Notably, for a general class of diminishing learning rates, this convergence occurs within polynomial time. Our results shed light on the difference between Adam and (stochastic) gradient descent from a theoretical perspective.

Submitted 15 June, 2024; originally announced June 2024.

Comments: 33 pages, 2 figures
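
Note: for readers unfamiliar with the term, the maximum $\ell_\infty$-margin classifier mentioned in the abstract above can be written as follows. The notation (training pairs $(x_i, y_i)$ with labels $y_i \in \{-1, +1\}$, weight vector $w$) is assumed here for illustration and may differ from the paper's. The second line gives the $\ell_2$ counterpart, the direction that gradient descent on the logistic loss is known to approach on separable data, which is the contrast the abstract draws.

```latex
% Maximum l_infinity-margin direction (the limit the abstract attributes to Adam)
w_{\infty}^{\star} \in \arg\max_{\|w\|_{\infty} \le 1} \; \min_{1 \le i \le n} \; y_i \langle w, x_i \rangle

% Maximum l_2-margin direction (the classical limit for gradient descent)
w_{2}^{\star} \in \arg\max_{\|w\|_{2} \le 1} \; \min_{1 \le i \le n} \; y_i \langle w, x_i \rangle
```

The two problems differ only in the norm constraint on $w$.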

arXiv:2406.10583 [pdf, other]

Demonstration of neutron identification in neutrino interactions in the MicroBooNE liquid argon time projection chamber

Authors: MicroBooNE collaboration, P. Abratenko, O. Alterkait, D. Andrade Aldana, L. Arellano, J. Asaadi, A. Ashkenazi, S. Balasubramanian, B. Baller, A. Barnard, G. Barr, D. Barrow, J. Barrow, V. Basque, J. Bateman, O. Benevides Rodrigues, S. Berkman, A. Bhanderi, A. Bhat, M. Bhattacharya, M. Bishai, A. Blake, B. Bogart, T. Bolton, J. Y. Book, et al. (165 additional authors not shown)

Abstract: A significant challenge in measurements of neutrino oscillations is reconstructing the incoming neutrino energies. While modern fully-active tracking calorimeters such as liquid argon time projection chambers in principle allow the measurement of all final state particles above some detection threshold, undetected neutrons remain a considerable source of missing energy with little to no data constraining their production rates and kinematics. We present the first demonstration of tagging neutrino-induced neutrons in liquid argon time projection chambers using secondary protons emitted from neutron-argon interactions in the MicroBooNE detector. We describe the method developed to identify neutrino-induced neutrons and demonstrate its performance using neutrons produced in muon-neutrino charged current interactions. The method is validated using a small subset of MicroBooNE's total dataset. The selection yields a sample with $60\%$ of selected tracks corresponding to neutron-induced secondary protons.

Submitted 15 June, 2024; originally announced June 2024.

Report number: FERMILAB-PUB-24-0301

arXiv:2406.10408 [pdf, other]

Final Search for Short-Baseline Neutrino Oscillations with the PROSPECT-I Detector at HFIR

Authors: M. Andriamirado, B. Balantekin, C. D. Bass, O. Benevides Rodrigues, E. P. Bernard, N. S. Bowden, C. D. Bryan, R. Carr, T. Classen, A. J. Conant, G. Deichert, M. J. Dolinski, A. Erickson, A. Galindo-Uribarri, S. Gokhale, C. Grant, S. Hans, A. B. Hansell, K. M. Heeger, B. Heffron, D. E. Jaffe, S. Jayakumar, J. R. Koblanski, P. Kunkle, C. E. Lane, et al. (22 additional authors not shown)

Abstract: The PROSPECT experiment is designed to perform precise searches for antineutrino disappearance at short distances (7 - 9~m) from compact nuclear reactor cores. This Letter reports results from a new neutrino oscillation analysis performed using the complete data sample from the PROSPECT-I detector operated at the High Flux Isotope Reactor in 2018. The analysis uses a multi-period selection of inverse beta decay neutrino interactions with reduced backgrounds and enhanced statistical power to set limits on electron-flavor disappearance caused by mixing with sterile neutrinos with 0.2 - 20 eV$^2$ mass splittings. Inverse beta decay positron energy spectra from six different reactor-detector distance ranges are found to be statistically consistent with one another, as would be expected in the absence of sterile neutrino oscillations. The data excludes at 95% confidence level the existence of sterile neutrinos in regions above 3~eV$^2$ previously unexplored by terrestrial experiments, including all space below 10~eV$^2$ suggested by the recently strengthened Gallium Anomaly. The best-fit point of the Neutrino-4 reactor experiment's claimed observation of short-baseline oscillation is ruled out at more than five standard deviations.

Submitted 14 June, 2024; originally announced June 2024.

Comments: 6 pages, 4 figures
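
Note: the disappearance search described above looks for a baseline-dependent deficit of electron antineutrinos. A minimal sketch using the standard two-flavor survival probability, P = 1 - sin^2(2θ) sin^2(1.267 Δm² L / E) with Δm² in eV², L in metres and E in MeV, is given below. The 7 to 9 m baselines come from the abstract; the mixing strength, mass splitting, and 4 MeV energy are illustrative assumptions, not values from the analysis.

```python
import math


def survival_probability(sin2_2theta, delta_m2_ev2, baseline_m, energy_mev):
    """Two-flavor electron-antineutrino survival probability.

    Uses the standard short-baseline approximation with the phase
    1.267 * delta_m2 [eV^2] * L [m] / E [MeV].
    """
    phase = 1.267 * delta_m2_ev2 * baseline_m / energy_mev
    return 1.0 - sin2_2theta * math.sin(phase) ** 2


# Illustrative (assumed) oscillation parameters and antineutrino energy.
assumed_sin2_2theta = 0.1
assumed_delta_m2_ev2 = 1.0
assumed_energy_mev = 4.0

for baseline_m in (7.0, 8.0, 9.0):
    p = survival_probability(assumed_sin2_2theta, assumed_delta_m2_ev2,
                             baseline_m, assumed_energy_mev)
    print(f"baseline {baseline_m:.0f} m: survival probability {p:.4f}")
```

If no sterile-neutrino mixing exists (zero mixing strength), the probability is 1 at every baseline, which is why consistent spectra across distance ranges translate into exclusion limits.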

arXiv:2406.10344 [pdf, other]

Phases and phase transition in Grover's algorithm with systematic noise

Authors: Sasanka Dowarah, Chuanwei Zhang, Vedika Khemani, Michael H. Kolodrubetz

Abstract: While limitations on quantum computation by Markovian environmental noise are well-understood in generality, their behavior for different quantum circuits and noise realizations can be less universal. Here we consider a canonical quantum algorithm - Grover's algorithm for unordered search on $L$ qubits - in the presence of systematic noise. This allows us to write the behavior as a random Floquet unitary, which we show is well-characterized by random matrix theory (RMT). The RMT analysis enables analytical predictions for phases and phase transitions of the many-body dynamics. We find two separate transitions. At moderate disorder $δ_{c,\mathrm{gap}}\sim L^{-1}$, there is an ergodicity-breaking transition such that a finite-dimensional manifold remains non-ergodic for $δ< δ_{c,\mathrm{gap}}$. Computational power is lost at a much smaller disorder, $δ_{c,\mathrm{comp}} \sim L^{-1/2}2^{-L/2}$. We comment on relevance to non-systematic noise in realistic quantum computers, including cold atom, trapped ion, and superconducting platforms.

Submitted 14 June, 2024; originally announced June 2024.

Comments: 14 pages, 11 figures
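
Note: to give a feel for how far apart the two disorder scales in the abstract above are, the sketch below evaluates $L^{-1}$ and $L^{-1/2}2^{-L/2}$ for a few qubit counts. The abstract gives only the scaling, so the prefactors are set to 1 here purely as an assumption; the function names are hypothetical.

```python
def gap_threshold(num_qubits):
    """Scaling of the ergodicity-breaking threshold, L**-1 (prefactor assumed 1)."""
    return 1.0 / num_qubits


def comp_threshold(num_qubits):
    """Scaling of the computational threshold, L**-0.5 * 2**(-L/2) (prefactor assumed 1)."""
    return num_qubits ** -0.5 * 2.0 ** (-num_qubits / 2)


for num_qubits in (4, 8, 16, 32):
    gap = gap_threshold(num_qubits)
    comp = comp_threshold(num_qubits)
    print(f"L={num_qubits:2d}  gap~{gap:.3e}  comp~{comp:.3e}  ratio~{gap / comp:.3e}")
```

The ratio between the two scales grows exponentially with the number of qubits, consistent with the abstract's statement that computational power is lost at a much smaller disorder.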

arXiv:2406.10163 [pdf, other]

MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers

Authors: Yiwen Chen, Tong He, Di Huang, Weicai Ye, Sijin Chen, Jiaxiang Tang, Xin Chen, Zhongang Cai, Lei Yang, Gang Yu, Guosheng Lin, Chi Zhang

Abstract: Recently, 3D assets created via reconstruction and generation have matched the quality of manually crafted assets, highlighting their potential for replacement. However, this potential is largely unrealized because these assets always need to be converted to meshes for 3D industry applications, and the meshes produced by current mesh extraction methods are significantly inferior to Artist-Created Meshes (AMs), i.e., meshes created by human artists. Specifically, current mesh extraction methods rely on dense faces and ignore geometric features, leading to inefficiencies, complicated post-processing, and lower representation quality. To address these issues, we introduce MeshAnything, a model that treats mesh extraction as a generation problem, producing AMs aligned with specified shapes. By converting 3D assets in any 3D representation into AMs, MeshAnything can be integrated with various 3D asset production methods, thereby enhancing their application across the 3D industry. The architecture of MeshAnything comprises a VQ-VAE and a shape-conditioned decoder-only transformer. We first learn a mesh vocabulary using the VQ-VAE, then train the shape-conditioned decoder-only transformer on this vocabulary for shape-conditioned autoregressive mesh generation. Our extensive experiments show that our method generates AMs with hundreds of times fewer faces, significantly improving storage, rendering, and simulation efficiencies, while achieving precision comparable to previous methods.

Submitted 14 June, 2024; originally announced June 2024.

Comments: Project Page: https://buaacyw.github.io/mesh-anything/ Code: https://github.com/buaacyw/MeshAnything

arXiv:2406.10123 [pdf, other]

Improving neutrino energy estimation of charged-current interaction events with recurrent neural networks in MicroBooNE

Authors: MicroBooNE collaboration, P. Abratenko, O. Alterkait, D. Andrade Aldana, L. Arellano, J. Asaadi, A. Ashkenazi, S. Balasubramanian, B. Baller, A. Barnard, G. Barr, D. Barrow, J. Barrow, V. Basque, J. Bateman, O. Benevides Rodrigues, S. Berkman, A. Bhanderi, A. Bhat, M. Bhattacharya, M. Bishai, A. Blake, B. Bogart, T. Bolton, J. Y. Book, et al. (164 additional authors not shown)

Abstract: We present a deep learning-based method for estimating the neutrino energy of charged-current neutrino-argon interactions. We employ a recurrent neural network (RNN) architecture for neutrino energy estimation in the MicroBooNE experiment, utilizing liquid argon time projection chamber (LArTPC) detector technology. Traditional energy estimation approaches in LArTPCs, which largely rely on reconstructing and summing visible energies, often experience sizable biases and resolution smearing because of the complex nature of neutrino interactions and the detector response. The estimation of neutrino energy can be improved after considering the kinematics information of reconstructed final-state particles. Utilizing kinematic information of reconstructed particles, the deep learning-based approach shows improved resolution and reduced bias for the muon neutrino Monte Carlo simulation sample compared to the traditional approach. In order to address the common concern about the effectiveness of this method on experimental data, the RNN-based energy estimator is further examined and validated with dedicated data-simulation consistency tests using MicroBooNE data. We also assess its potential impact on a neutrino oscillation study after accounting for all statistical and systematic uncertainties and show that it enhances physics sensitivity. This method has good potential to improve the performance of other physics analyses.

Submitted 14 June, 2024; originally announced June 2024.

Report number: FERMILAB-PUB-24-0287

arXiv:2406.09931 [pdf, other]

SCKansformer: Fine-Grained Classification of Bone Marrow Cells via Kansformer Backbone and Hierarchical Attention Mechanisms

Authors: Yifei Chen, Zhu Zhu, Shenghao Zhu, Linwei Qiu, Binfeng Zou, Fan Jia, Yunpeng Zhu, Chenyan Zhang, Zhaojie Fang, Feiwei Qin, Jin Fan, Changmiao Wang, Yu Gao, Gang Yu

Abstract: The incidence and mortality rates of malignant tumors, such as acute leukemia, have risen significantly. Clinically, hospitals rely on cytological examination of peripheral blood and bone marrow smears to diagnose malignant tumors, with accurate blood cell counting being crucial. Existing automated methods face challenges such as low feature expression capability, poor interpretability, and redundant feature extraction when processing high-dimensional microimage data. We propose a novel fine-grained classification model, SCKansformer, for bone marrow blood cells, which addresses these challenges and enhances classification accuracy and efficiency. The model integrates the Kansformer Encoder, SCConv Encoder, and Global-Local Attention Encoder. The Kansformer Encoder replaces the traditional MLP layer with the KAN, improving nonlinear feature representation and interpretability. The SCConv Encoder, with its Spatial and Channel Reconstruction Units, enhances feature representation and reduces redundancy. The Global-Local Attention Encoder combines Multi-head Self-Attention with a Local Part module to capture both global and local features. We validated our model using the Bone Marrow Blood Cell Fine-Grained Classification Dataset (BMCD-FGCD), comprising over 10,000 samples and nearly 40 classifications, developed with a partner hospital. Comparative experiments on our private dataset, as well as the publicly available PBC and ALL-IDB datasets, demonstrate that SCKansformer outperforms both typical and advanced microcell classification methods across all datasets. Our source code and private BMCD-FGCD dataset are available at https://github.com/JustlfC03/SCKansformer.

Submitted 14 June, 2024; originally announced June 2024.

Comments: 15 pages, 6 figures

arXiv:2406.09475 [pdf, other]

Search for $X(1870)$ via the decay $J/ψ\to ωK^+ K^-η$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (644 additional authors not shown)

Abstract: Using a sample of $(10087\pm 44)\times10^{6}$ $J/ψ$ events collected by the BESIII detector at the BEPCII collider, we search for the decay $X(1870)\to K^+ K^-η$ via the $J/ψ\to ωK^+ K^- η$ process for the first time. No significant $X(1870)$ signal is observed. The upper limit on the branching fraction of the decay $ J/ψ\to ωX(1870) \toωK^+ K^- η$ is determined to be $9.55\times 10^{-7}$ at the $90\%$ confidence level. In addition, the branching fraction $B(J/ψ\toωK^+ K^- η)$ is measured to be $(3.33\pm0.02(\rm{stat.})\pm 0.12(\rm{syst.}))\times 10^{-4}$.

Submitted 13 June, 2024; originally announced June 2024.

arXiv:2406.09466 [pdf, other]

Diffuse optical tomography in time domain with the inverse Rytov series

Authors: Chi Zhang, Manabu Machida

Abstract: The Rytov approximation has been commonly used to obtain reconstructed images for optical tomography. However, the method requires linearization of the nonlinear inverse problem. Here, we demonstrate nonlinear Rytov approximations by developing the inverse Rytov series for the time-dependent diffusion equation. The method is experimentally verified.

Submitted 13 June, 2024; originally announced June 2024.

arXiv:2406.09317 [pdf, other]

Common and Rare Fundus Diseases Identification Using Vision-Language Foundation Model with Knowledge of Over 400 Diseases

Authors: Meng Wang, Tian Lin, Aidi Lin, Kai Yu, Yuanyuan Peng, Lianyu Wang, Cheng Chen, Ke Zou, Huiyu Liang, Man Chen, Xue Yao, Meiqin Zhang, Binwei Huang, Chaoxin Zheng, Peixin Zhang, Wei Chen, Yilong Luo, Yifan Chen, Honghe Xia, Tingkun Shi, Qi Zhang, Jinming Guo, Xiaolin Chen, Jingcheng Wang, Yih Chung Tham, et al. (24 additional authors not shown)

Abstract: Previous foundation models for retinal images were pre-trained with limited disease categories and knowledge base. Here we introduce RetiZero, a vision-language foundation model that leverages knowledge from over 400 fundus diseases. For RetiZero's pre-training, we compiled 341,896 fundus images paired with text descriptions, sourced from public datasets, ophthalmic literature, and online resources, encompassing a diverse range of diseases across multiple ethnicities and countries. RetiZero exhibits superior performance in several downstream tasks, including zero-shot disease recognition, image-to-image retrieval, and internal- and cross-domain disease identification. In zero-shot scenarios, RetiZero achieves Top5 accuracy scores of 0.8430 for 15 fundus diseases and 0.7561 for 52 fundus diseases. For image retrieval, it achieves Top5 scores of 0.9500 and 0.8860 for the same disease sets, respectively. Clinical evaluations show that RetiZero's Top3 zero-shot performance surpasses the average of 19 ophthalmologists from Singapore, China and the United States. Furthermore, RetiZero significantly enhances clinicians' accuracy in diagnosing fundus disease. These findings underscore the value of integrating the RetiZero foundation model into clinical settings, where a variety of fundus diseases are encountered.

Submitted 30 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

arXiv:2406.09162 [pdf, other]

EMMA: Your Text-to-Image Diffusion Model Can Secretly Accept Multi-Modal Prompts

Authors: Yucheng Han, Rui Wang, Chi Zhang, Juntao Hu, Pei Cheng, Bin Fu, Hanwang Zhang

Abstract: Recent advancements in image generation have enabled the creation of high-quality images from text conditions. However, when facing multi-modal conditions, such as text combined with reference appearances, existing methods struggle to balance multiple conditions effectively, typically showing a preference for one modality over others. To address this challenge, we introduce EMMA, a novel image generation model accepting multi-modal prompts built upon the state-of-the-art text-to-image (T2I) diffusion model, ELLA. EMMA seamlessly incorporates additional modalities alongside text to guide image generation through an innovative Multi-modal Feature Connector design, which effectively integrates textual and supplementary modal information using a special attention mechanism. By freezing all parameters in the original T2I diffusion model and only adjusting some additional layers, we reveal an interesting finding that the pre-trained T2I diffusion model can secretly accept multi-modal prompts. This interesting property facilitates easy adaptation to different existing frameworks, making EMMA a flexible and effective tool for producing personalized and context-aware images and even videos. Additionally, we introduce a strategy to assemble learned EMMA modules to produce images conditioned on multiple modalities simultaneously, eliminating the need for additional training with mixed multi-modal prompts. Extensive experiments demonstrate the effectiveness of EMMA in maintaining high fidelity and detail in generated images, showcasing its potential as a robust solution for advanced multi-modal conditional image generation tasks.

Submitted 13 June, 2024; originally announced June 2024.

Comments: https://tencentqqgylab.github.io/EMMA

arXiv:2406.09130 [pdf, other]

Time-Series Forecasting for Out-of-Distribution Generalization Using Invariant Learning

Authors: Haoxin Liu, Harshavardhan Kamarthi, Lingkai Kong, Zhiyuan Zhao, Chao Zhang, B. Aditya Prakash

Abstract: Time-series forecasting (TSF) finds broad applications in real-world scenarios. Due to the dynamic nature of time-series data, it is crucial to equip TSF models with out-of-distribution (OOD) generalization abilities, as historical training data and future test data can have different distributions. In this paper, we aim to alleviate the inherent OOD problem in TSF via invariant learning. We identify fundamental challenges of invariant learning for TSF. First, the target variables in TSF may not be sufficiently determined by the input due to unobserved core variables in TSF, breaking the conventional assumption of invariant learning. Second, time-series datasets lack adequate environment labels, while existing environmental inference methods are not suitable for TSF. To address these challenges, we propose FOIL, a model-agnostic framework that enables time-series Forecasting for Out-of-distribution generalization via Invariant Learning. FOIL employs a novel surrogate loss to mitigate the impact of unobserved variables. Further, FOIL implements a joint optimization by alternately inferring environments effectively with a multi-head network while preserving the temporal adjacency structure, and learning invariant representations across inferred environments for OOD generalized TSF. We demonstrate that the proposed FOIL significantly improves the performance of various TSF models, achieving gains of up to 85%.

Submitted 13 June, 2024; originally announced June 2024.

Comments: 14 pages

ACM Class: H.0

arXiv:2406.08858 [pdf, other]

OmniH2O: Universal and Dexterous Human-to-Humanoid Whole-Body Teleoperation and Learning

Authors: Tairan He, Zhengyi Luo, Xialin He, Wenli Xiao, Chong Zhang, Weinan Zhang, Kris Kitani, Changliu Liu, Guanya Shi

Abstract: We present OmniH2O (Omni Human-to-Humanoid), a learning-based system for whole-body humanoid teleoperation and autonomy. Using kinematic pose as a universal control interface, OmniH2O enables various ways for a human to control a full-sized humanoid with dexterous hands, including real-time teleoperation through a VR headset, verbal instruction, and an RGB camera. OmniH2O also enables full autonomy by learning from teleoperated demonstrations or integrating with frontier models such as GPT-4. OmniH2O demonstrates versatility and dexterity in various real-world whole-body tasks through teleoperation or autonomy, such as playing multiple sports, moving and manipulating objects, and interacting with humans. We develop an RL-based sim-to-real pipeline, which involves large-scale retargeting and augmentation of human motion datasets, learning a real-world deployable policy with sparse sensor input by imitating a privileged teacher policy, and reward designs to enhance robustness and stability. We release the first humanoid whole-body control dataset, OmniH2O-6, containing six everyday tasks, and demonstrate humanoid whole-body skill learning from teleoperated datasets.

Submitted 13 June, 2024; originally announced June 2024.

Comments: Project page: https://omni.human2humanoid.com/

arXiv:2406.08729 [pdf, other]

Structure Phase Change Induced by Nonequilibrium Effects in Molecular Scale Junctions

Authors: Hao Wang, Kah-Meng Yam, Zhuoling Jiang, Na Guo, Chun Zhang

Abstract: The interrelationship between a material's structure and its properties lies at the heart of materials-related research. Finding how the changes of one affect the other is of primary importance in theoretical and computational materials studies. In this work, based on Hershfield nonequilibrium quantum statistics and the mean-field approach with steady-state density functional theory, we derive a first-principles method to calculate the forces acting on atoms that are induced by nonequilibrium effects, enabling structure optimizations and molecular dynamics simulations for molecular junctions under external biases. By applying the method to a few molecular devices, we found that in general, the external bias can induce profound nonequilibrium effects on both electronic/transport properties and the geometric structure of these devices, and consequent changes in electronic properties and geometric structure are closely interrelated. Particularly, when the bias voltage is above 1.0 V, significant structure phase changes could occur, causing dramatic changes in I-V characteristics and vibrational spectra. These findings greatly broaden our understanding of quantum electronic devices and provide a new avenue for discovering novel transport phenomena at molecular scale.

Submitted 12 June, 2024; originally announced June 2024.

Comments: 17 pages, 11 figures

arXiv:2406.08627 [pdf, other]

Time-MMD: A New Multi-Domain Multimodal Dataset for Time Series Analysis

Authors: Haoxin Liu, Shangqing Xu, Zhiyuan Zhao, Lingkai Kong, Harshavardhan Kamarthi, Aditya B. Sasanur, Megha Sharma, Jiaming Cui, Qingsong Wen, Chao Zhang, B. Aditya Prakash

Abstract: Time series data are ubiquitous across a wide range of real-world domains. While real-world time series analysis (TSA) requires human experts to integrate numerical series data with multimodal domain-specific knowledge, most existing TSA models rely solely on numerical data, overlooking the significance of information beyond numerical series. This oversight is due to the untapped potential of textual series data and the absence of a comprehensive, high-quality multimodal dataset. To overcome this obstacle, we introduce Time-MMD, the first multi-domain, multimodal time series dataset covering 9 primary data domains. Time-MMD ensures fine-grained modality alignment, eliminates data contamination, and provides high usability. Additionally, we develop MM-TSFlib, the first multimodal time-series forecasting (TSF) library, seamlessly pipelining multimodal TSF evaluations based on Time-MMD for in-depth analyses. Extensive experiments conducted on Time-MMD through MM-TSFlib demonstrate significant performance enhancements by extending unimodal TSF to multimodality, evidenced by over 15% mean squared error reduction in general, and up to 40% in domains with rich textual data. More importantly, our datasets and library revolutionize broader applications, impacts, and research topics to advance TSA. The dataset and library are available at https://github.com/AdityaLab/Time-MMD and https://github.com/AdityaLab/MM-TSFlib.

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2406.08416 [pdf, other]

TokSing: Singing Voice Synthesis based on Discrete Tokens

Authors: Yuning Wu, Chunlei Zhang, Jiatong Shi, Yuxun Tang, Shan Yang, Qin Jin

Abstract: Recent advancements in speech synthesis witness significant benefits by leveraging discrete tokens extracted from self-supervised learning (SSL) models. Discrete tokens offer higher storage efficiency and greater operability in intermediate representations compared to traditional continuous Mel spectrograms. However, when it comes to singing voice synthesis (SVS), achieving higher levels of melody expression poses a great challenge for utilizing discrete tokens. In this paper, we introduce TokSing, a discrete-based SVS system equipped with a token formulator that offers flexible token blendings. We observe a melody degradation during discretization, prompting us to integrate a melody signal with the discrete token and incorporate a specially-designed melody enhancement strategy in the musical encoder. Extensive experiments demonstrate that our TokSing achieves better performance against the Mel spectrogram baselines while offering advantages in intermediate representation space cost and convergence speed.

Submitted 20 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

Comments: Accepted by Interspeech 2024

arXiv:2406.08359 [pdf, other]

Reactor Antineutrino Directionality Measurement with the PROSPECT-I Detector

Authors: M. Andriamirado, B. Balantekin, C. D. Bass, O. Benevides Rodrigues, E. P. Bernard, N. S. Bowden, C. D. Bryan, R. Carr, T. Classen, A. J. Conant, G. Deichert, M. J. Dolinski, A. Erickson, A. Galindo-Uribarri, S. Gokhale, C. Grant, S. Hans, A. B. Hansell, K. M. Heeger, B. Heffron, D. E. Jaffe, S. Jayakumar, D. C. Jones, J. R. Koblanski, P. Kunkle, et al. (24 additional authors not shown)

Abstract: The PROSPECT-I detector has several features that enable measurement of the direction of a compact neutrino source. In this paper, a detailed report on the directional measurements made on electron antineutrinos emitted from the High Flux Isotope Reactor is presented. With an estimated true neutrino (reactor to detector) direction of $φ= 40.8^{\circ} \pm 0.7^{\circ}$ and $θ= 98.6^{\circ} \pm 0.4^{\circ}$, the PROSPECT-I detector is able to reconstruct an average neutrino direction of $φ= 39.4^{\circ} \pm 2.9^{\circ}$ and $θ= 97.6^{\circ} \pm 1.6^{\circ}$. This measurement is made with approximately 48000 Inverse Beta Decay signal events and is the most precise directional reconstruction of reactor antineutrinos to date.

Submitted 11 July, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

arXiv:2406.08358 [pdf, other]

From a Social Cognitive Perspective: Context-aware Visual Social Relationship Recognition

Authors: Shiwei Wu, Chao Zhang, Joya Chen, Tong Xu, Likang Wu, Yao Hu, Enhong Chen

Abstract: People's social relationships are often manifested through their surroundings, with certain objects or interactions acting as symbols for specific relationships, e.g., wedding rings, roses, hugs, or holding hands. This brings unique challenges to recognizing social relationships, requiring understanding and capturing the essence of these contexts from visual appearances. However, current methods of social relationship understanding rely on the basic classification paradigm of detected persons and objects, which fails to understand the comprehensive context and often overlooks decisive social factors, especially subtle visual cues. To highlight the social-aware context and intricate details, we propose a novel approach that recognizes Contextual Social Relationships (ConSoR) from a social cognitive perspective. Specifically, to incorporate social-aware semantics, we build a lightweight adapter upon the frozen CLIP to learn social concepts via our novel multi-modal side adapter tuning mechanism. Further, we construct social-aware descriptive language prompts (e.g., scene, activity, objects, emotions) with social relationships for each image, and then compel ConSoR to concentrate more intensively on the decisive visual social factors via visual-linguistic contrasting. Impressively, ConSoR outperforms previous methods with a 12.2% gain on the People-in-Social-Context (PISC) dataset and a 9.8% increase on the People-in-Photo-Album (PIPA) benchmark. Furthermore, we observe that ConSoR excels at finding critical visual evidence to reveal social relationships.

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2406.08311 [pdf, other]

Causality for Tabular Data Synthesis: A High-Order Structure Causal Benchmark Framework

Authors: Ruibo Tu, Zineb Senane, Lele Cao, Cheng Zhang, Hedvig Kjellström, Gustav Eje Henter

Abstract: Tabular synthesis models remain ineffective at capturing complex dependencies, and the quality of synthetic data is still insufficient for comprehensive downstream tasks, such as prediction under distribution shifts, automated decision-making, and cross-table understanding. A major challenge is the lack of prior knowledge about underlying structures and high-order relationships in tabular data. We argue that a systematic evaluation on high-order structural information for tabular data synthesis is the first step towards solving the problem. In this paper, we introduce high-order structural causal information as natural prior knowledge and provide a benchmark framework for the evaluation of tabular synthesis models. The framework allows us to generate benchmark datasets with a flexible range of data generation processes and to train tabular synthesis models using these datasets for further evaluation. We propose multiple benchmark tasks, high-order metrics, and causal inference tasks as downstream tasks for evaluating the quality of synthetic data generated by the trained models. Our experiments demonstrate how to leverage the benchmark framework for evaluating the model capability of capturing high-order structural causal information. Furthermore, our benchmarking results provide an initial assessment of state-of-the-art tabular synthesis models. They have clearly revealed significant gaps between ideal and actual performance and how baseline methods differ. Our benchmark framework is available at https://github.com/TURuibo/CauTabBench.

Submitted 5 July, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

arXiv:2406.08301 [pdf, other]

Jet modification via $π^0$-hadron correlations in Au$+$Au collisions at $\sqrt{s_{_{NN}}}=200$ GeV

Authors: PHENIX Collaboration, N. J. Abdulameer, U. Acharya, A. Adare, S. Afanasiev, C. Aidala, N. N. Ajitanand, Y. Akiba, H. Al-Bataineh, J. Alexander, M. Alfred, K. Aoki, N. Apadula, L. Aphecetche, J. Asai, H. Asano, E. T. Atomssa, R. Averbeck, T. C. Awes, B. Azmoun, V. Babintsev, M. Bai, G. Baksay, L. Baksay, A. Baldisseri, et al. (510 additional authors not shown)

Abstract: High-momentum two-particle correlations are a useful tool for studying jet-quenching effects in the quark-gluon plasma. Angular correlations between neutral-pion triggers and charged hadrons with transverse momenta in the range 4-12 GeV/$c$ and 0.5-7 GeV/$c$, respectively, have been measured by the PHENIX experiment in 2014 for Au$+$Au collisions at $\sqrt{s_{_{NN}}}=200$ GeV. Suppression is observed in the yield of high-momentum jet fragments opposite the trigger particle, which indicates jet suppression stemming from in-medium partonic energy loss, while enhancement is observed for low-momentum particles. The ratio and differences between the yield in Au$+$Au collisions and $p+p$ collisions, $I_{AA}$ and $Δ_{AA}$, as a function of the trigger-hadron azimuthal separation, $Δφ$, are measured for the first time at the Relativistic Heavy Ion Collider. These results better quantify how the yield of low-$p_T$ associated hadrons is enhanced at wide angle, which is crucial for studying energy loss as well as medium-response effects.

Submitted 12 June, 2024; originally announced June 2024.

Comments: 534 authors from 83 institutions, 12 pages, 7 figures. v1 is version submitted to Physical Review C. HEPdata tables for the points plotted in figures for this and previous PHENIX publications are (or will be) publicly available at http://www.phenix.bnl.gov/papers.html

arXiv:2406.08273 [pdf, other]

SonicID: User Identification on Smart Glasses with Acoustic Sensing

Authors: Ke Li, Devansh Agarwal, Ruidong Zhang, Vipin Gunda, Tianjun Mo, Saif Mahmud, Boao Chen, François Guimbretière, Cheng Zhang

Abstract: Smart glasses have become more prevalent as they provide an increasing number of applications for users. They store various types of private information or can access it via connections established with other devices. Therefore, there is a growing need for user identification on smart glasses. In this paper, we introduce a low-power and minimally-obtrusive system called SonicID, designed to authenticate users on glasses. SonicID extracts unique biometric information from users by scanning their faces with ultrasonic waves and utilizes this information to distinguish between different users, powered by a customized binary classifier with the ResNet-18 architecture. SonicID can authenticate users within 0.12 seconds, with an energy consumption of 19.8 mAs per trial. A user study involving 24 participants confirms that SonicID achieves a true positive rate of 96.5%, a false positive rate of 4.1%, and a balanced accuracy of 96.2% using just 4 minutes of training data collected for each new user. This performance is relatively consistent across different remounting sessions and days. Given this promising performance, we further discuss the potential applications of SonicID and methods to improve its performance in the future.

Submitted 12 June, 2024; originally announced June 2024.

Comments: 20 pages, 2 tables, 6 figures

arXiv:2406.08225 [pdf, ps, other]

Observation of $η_{c}$(1S, 2S) and $χ_{cJ}$ decays to 2$(π^{+}π^{-})η$ via $ψ$(3686) radiative transitions

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (636 additional authors not shown)

Abstract: Based on $2.7 \times 10^9~ψ(3686)$ decays collected with the BESIII detector, the radiative decay $ψ(3686)\to\gamma2(π^{+}π^{-})η$ is investigated to measure properties of S- and P-wave charmonium states. The branching fraction of the decay $η_{c}(1S) \to 2(π^{+}π^{-})η$, which is found to have a strong dependence on the interference pattern between $η_c(1S)$ and non-$η_c(1S)$ processes, is measured in both destructive and constructive interference scenarios for the first time. The mass and width of the $η_{c}(1S)$ are measured to be $M=(2984.14 \pm 0.13 \pm 0.38)$ MeV/$c^{2}$ and $Γ=(28.82 \pm 0.11 \pm 0.82)$ MeV, respectively. Clear signals for the decays of the $χ_{cJ}(J=0,1,2)$ and the $η_{c}(2S)$ to $2(π^{+}π^{-})η$ are also observed for the first time, and the corresponding branching fractions are measured. The ratio of the branching fractions between the $η_{c}(2S)$ and $η_{c}(1S)$ decays is significantly lower than the theoretical prediction, which might suggest different dynamics in their decays.

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2406.07982 [pdf, ps, other]

Quantitative analysis and its applications for Keller-Segel type systems

Authors: Mengyao Ding, Yuzhou Fang, Chao Zhang

Abstract: In this paper, we utilize the De Giorgi iteration to quantitatively analyze the upper bound of solutions for Keller-Segel type systems. The refined upper bound estimate presented here has broad applications in determining large time behaviours of weak solutions and improving the regularity for models involving the $p$-Laplace operator. To demonstrate the applicability of our findings, we investigate the asymptotic stability of a chemotaxis model with nonlinear signal production and a chemotaxis-Navier-Stokes model with a logistic source. Additionally, within the context of $p$-Laplacian diffusion, we establish Hölder continuity for a chemotaxis-haptotaxis model and a chemotaxis-Stokes model.

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2406.07914 [pdf, other]

Can Large Language Models Understand Spatial Audio?

Authors: Changli Tang, Wenyi Yu, Guangzhi Sun, Xianzhao Chen, Tian Tan, Wei Li, Jun Zhang, Lu Lu, Zejun Ma, Yuxuan Wang, Chao Zhang

Abstract: This paper explores enabling large language models (LLMs) to understand spatial information from multichannel audio, a skill currently lacking in auditory LLMs. By leveraging LLMs' advanced cognitive and inferential abilities, the aim is to enhance understanding of 3D environments via audio. We study 3 spatial audio tasks: sound source localization (SSL), far-field speech recognition (FSR), and localisation-informed speech extraction (LSE), achieving notable progress in each task. For SSL, our approach achieves an MAE of $2.70^{\circ}$ on the Spatial LibriSpeech dataset, substantially surpassing the prior benchmark of about $6.60^{\circ}$. Moreover, our model can employ spatial cues to improve FSR accuracy and execute LSE by selectively attending to sounds originating from a specified direction via text prompts, even amidst overlapping speech. These findings highlight the potential of adapting LLMs to grasp physical audio concepts, paving the way for LLM-based agents in 3D environments.

Submitted 14 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

Comments: Accepted at Interspeech 2024

arXiv:2406.07894 [pdf, other]

100 Drivers, 2200 km: A Natural Dataset of Driving Style toward Human-centered Intelligent Driving Systems

Authors: Chaopeng Zhang, Wenshuo Wang, Zhaokun Chen, Junqiang Xi

Abstract: Effective driving style analysis is critical to developing human-centered intelligent driving systems that consider drivers' preferences. However, the approaches and conclusions of most related studies are diverse and inconsistent because no unified datasets tagged with driving styles exist as a reliable benchmark. The absence of explicit driving style labels makes verifying different approaches and algorithms difficult. This paper provides a new benchmark by constructing a natural dataset of Driving Style (100-DrivingStyle) tagged with the subjective evaluation of 100 drivers' driving styles. In this dataset, the subjective quantification of each driver's driving style is from themselves and an expert according to the Likert-scale questionnaire. The testing routes are selected to cover various driving scenarios, including highways, urban, highway ramps, and signalized traffic. The collected driving data consists of lateral and longitudinal manipulation information, including steering angle, steering speed, lateral acceleration, throttle position, throttle rate, brake pressure, etc. This dataset is the first to provide detailed manipulation data with driving-style tags, and we demonstrate its benchmark function using six classifiers. The 100-DrivingStyle dataset is available via https://github.com/chaopengzhang/100-DrivingStyle-Dataset

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2406.07885 [pdf, other]

GENIU: A Restricted Data Access Unlearning for Imbalanced Data

Authors: Chenhao Zhang, Shaofei Shen, Yawen Zhao, Weitong Tony Chen, Miao Xu

Abstract: With the increasing emphasis on data privacy, the significance of machine unlearning has grown substantially. Class unlearning, which involves enabling a trained model to forget data belonging to a specific class learned before, is important as classification tasks account for the majority of today's machine learning as a service (MLaaS). Retraining the model on the original data, excluding the data to be forgotten (a.k.a. forgetting data), is a common approach to class unlearning. However, the availability of original data during the unlearning phase is not always guaranteed, leading to the exploration of class unlearning with restricted data access. While current unlearning methods with restricted data access usually generate proxy samples via the trained neural network classifier, they typically focus on training and forgetting balanced data. However, the imbalanced original data can cause trouble for these proxies and unlearning, particularly when the forgetting data consists predominantly of the majority class. To address this issue, we propose the GENerative Imbalanced Unlearning (GENIU) framework. GENIU utilizes a Variational Autoencoder (VAE) to concurrently train a proxy generator alongside the original model. These generated proxies accurately represent each class and are leveraged in the unlearning phase, eliminating the reliance on the original training data. To further mitigate the performance degradation resulting from forgetting the majority class, we introduce an in-batch tuning strategy that works with the generated proxies. GENIU is the first practical framework for class unlearning in imbalanced data settings and restricted data access, ensuring the preservation of essential information for future unlearning. Experimental results confirm the superiority of GENIU over existing methods, establishing its effectiveness in empirical scenarios.

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2406.07514 [pdf, other]

Scintillation Light in SBND: Simulation, Reconstruction, and Expected Performance of the Photon Detection System

Authors: SBND Collaboration, P. Abratenko, R. Acciarri, C. Adams, L. Aliaga-Soplin, O. Alterkait, R. Alvarez-Garrote, C. Andreopoulos, A. Antonakis, L. Arellano, J. Asaadi, W. Badgett, S. Balasubramanian, V. Basque, A. Beever, B. Behera, E. Belchior, M. Betancourt, A. Bhat, M. Bishai, A. Blake, B. Bogart, J. Bogenschuetz, D. Brailsford, A. Brandt, et al. (158 additional authors not shown)

Abstract: SBND is the near detector of the Short-Baseline Neutrino program at Fermilab. Its location near to the Booster Neutrino Beam source and relatively large mass will allow the study of neutrino interactions on argon with unprecedented statistics. This paper describes the expected performance of the SBND photon detection system, using a simulated sample of beam neutrinos and cosmogenic particles. Its design is a dual readout concept combining a system of 120 photomultiplier tubes, used for triggering, with a system of 192 X-ARAPUCA devices, located behind the anode wire planes. Furthermore, covering the cathode plane with highly-reflective panels coated with a wavelength-shifting compound recovers part of the light emitted towards the cathode, where no optical detectors exist. We show how this new design provides a high light yield and a more uniform detection efficiency, an excellent timing resolution and an independent 3D-position reconstruction using only the scintillation light. Finally, the whole reconstruction chain is applied to recover the temporal structure of the beam spill, which is resolved with a resolution on the order of nanoseconds.

Submitted 11 June, 2024; originally announced June 2024.

Comments: 21 pages, 17 figures

Report number: FERMILAB-PUB-24-0303-PPD

arXiv:2406.07296 [pdf, other]

Instruct Large Language Models to Drive like Humans

Authors: Ruijun Zhang, Xianda Guo, Wenzhao Zheng, Chenming Zhang, Kurt Keutzer, Long Chen

Abstract: Motion planning in complex scenarios is the core challenge in autonomous driving. Conventional methods apply predefined rules or learn from driving data to plan the future trajectory. Recent methods seek the knowledge preserved in large language models (LLMs) and apply them in the driving scenarios. Despite the promising results, it is still unclear whether the LLM learns the underlying human logic to drive. In this paper, we propose an InstructDriver method to transform LLM into a motion planner with explicit instruction tuning to align its behavior with humans. We derive driving instruction data based on human logic (e.g., do not cause collisions) and traffic rules (e.g., proceed only when green lights). We then employ an interpretable InstructChain module to further reason the final planning reflecting the instructions. Our InstructDriver allows the injection of human rules and learning from driving data, enabling both interpretability and data scalability. Different from existing methods that experimented on closed-loop or simulated settings, we adopt the real-world closed-loop motion planning nuPlan benchmark for better evaluation. InstructDriver demonstrates the effectiveness of the LLM planner in a real-world closed-loop setting. Our code is publicly available at https://github.com/bonbon-rj/InstructDriver.

Submitted 11 June, 2024; originally announced June 2024.

Comments: project page: https://github.com/bonbon-rj/InstructDriver

arXiv:2406.07067 [pdf, other]

TIM: Temporal Interaction Model in Notification System

Authors: Huxiao Ji, Haitao Yang, Linchuan Li, Shunyu Zhang, Cunyi Zhang, Xuanping Li, Wenwu Ou

Abstract: Modern mobile applications heavily rely on the notification system to acquire daily active users and enhance user engagement. Being able to proactively reach users, the system has to decide when to send notifications to users. Although many researchers have studied optimizing the timing of sending notifications, they only utilized users' contextual features, without modeling users' behavior patterns. Additionally, these efforts only focus on individual notifications, and there is a lack of studies on optimizing the holistic timing of multiple notifications within a period. To bridge these gaps, we propose the Temporal Interaction Model (TIM), which models users' behavior patterns by estimating CTR in every time slot over a day in our short video application Kuaishou. TIM leverages long-term user historical interaction sequence features such as notification receipts, clicks, watch time and effective views, and employs a temporal attention unit (TAU) to extract user behavior patterns. Moreover, we provide an elegant strategy of holistic notifications send time control to improve user engagement while minimizing disruption. We evaluate the effectiveness of TIM through offline experiments and online A/B tests. The results indicate that TIM is a reliable tool for forecasting user behavior, leading to a remarkable enhancement in user engagement without causing undue disturbance.

Submitted 11 June, 2024; originally announced June 2024.

arXiv:2406.07011 [pdf, ps, other]

Breaking Free: Efficient Multi-Party Private Set Union Without Non-Collusion Assumptions

Authors: Minglang Dong, Yu Chen, Cong Zhang, Yujie Bai

Abstract: Multi-party private set union (MPSU) protocol enables $m$ $(m > 2)$ parties, each holding a set, to collectively compute the union of their sets without revealing any additional information to other parties. There are two main categories of MPSU protocols: The first builds on public-key techniques. All existing works in this category involve a super-linear number of public-key operations, resulting in poor practical efficiency. The second builds on oblivious transfer and symmetric-key techniques. The only existing work in this category is proposed by Liu and Gao (ASIACRYPT 2023), which features the best concrete performance among all existing protocols, despite its super-linear computation and communication. Unfortunately, it does not achieve the standard semi-honest security, as it inherently relies on a non-collusion assumption, which is unlikely to hold in practice. Therefore, the problem of constructing a practical MPSU protocol based on oblivious transfer and symmetric-key techniques in standard semi-honest model remains open. Furthermore, there is no MPSU protocol achieving both linear computation and linear communication complexity, which leaves another unresolved problem. In this work, we resolve these two open problems. We propose the first MPSU protocol based on oblivious transfer and symmetric-key techniques in the standard semi-honest model. This protocol is $4.9-9.3 \times$ faster than Liu and Gao in the LAN setting. Concretely, our protocol requires only $3.6$ seconds in online phase for 3 parties with sets of $2^{20}$ items each.
We propose the first MPSU protocol achieving both linear computation and linear communication complexity, based on public-key operations. This protocol has the lowest overall communication costs and shows a factor of $3.0-36.5\times$ improvement in terms of overall communication compared to Liu and Gao.

Submitted 1 July, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

arXiv:2406.06754 [pdf, other]

Incremental Sliding Window Connectivity over Streaming Graphs

Authors: Chao Zhang, Angela Bonifati, M. Tamer Özsu

Abstract: We study index-based processing for connectivity queries within sliding windows on streaming graphs. These queries, which determine whether two vertices belong to the same connected component, are fundamental operations in real-time graph data processing and demand high throughput and low latency. While indexing methods that leverage data structures for fully dynamic connectivity can facilitate efficient query processing, they encounter significant challenges with deleting expired edges from the window during window updates. We introduce a novel indexing approach that eliminates the need for physically performing edge deletions. This is achieved through a unique bidirectional incremental computation framework, referred to as the BIC model. The BIC model implements two distinct incremental computations to compute connected components within the window, operating along and against the timeline, respectively. These computations are then merged to efficiently compute queries in the window. We propose techniques for optimized index storage, incremental index updates, and efficient query processing to improve BIC effectiveness. Empirically, BIC achieves a 14$\times$ increase in throughput and a reduction in P95 latency by up to 3900$\times$ when compared to state-of-the-art indexes.

Submitted 12 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

Comments: To appear in VLDB 2024

arXiv:2406.06701 [pdf, other]

The XMM-SERVS X-ray eXtended Galaxy Cluster (XVXGC) catalog

Authors: Weiwei Xu, Linhua Jiang, Ran Li, Bin Luo, W. Nielsen Brandt, Chaoli Zhang, Thomas Erben

Abstract: To explain the well-known tension between cosmological parameter constraints obtained from the primary CMB and those drawn from galaxy cluster samples, we propose as a possible explanation that the incompleteness of detected clusters is higher than estimated. We aim to search for galaxy groups and clusters with particularly extended surface brightness distributions by creating a new X-ray-selected catalog of extended galaxy clusters from the XMM-SERVS data, based on a dedicated source detection and characterization algorithm that is optimized for extended sources. Our state-of-the-art algorithm is composed of wavelet filtering, source detection, and characterization. We make a visual inspection of the optical image and the spatial distribution of galaxies within the same redshift layer to confirm the existence of clusters and estimate the cluster redshift with the spectroscopic and photometric redshifts of galaxies. The growth curve analysis is used to characterize the detections. We report a catalog of extended X-ray galaxy clusters detected from the XMM-SERVS data, named the XMM-SERVS X-ray eXtended Galaxy Cluster (XVXGC) catalog. It includes 141 cluster candidates. Specifically, there are 52 clusters previously identified as clusters with the intra-cluster medium (ICM) emission (class 3), 37 previously known as optical or infrared clusters but detected as X-ray clusters for the first time (class 2), and 52 identified as clusters for the first time (class 1).
Compared with the class 3 sample, the 'class1+2' sample is systematically fainter and exhibits a flatter surface brightness profile. The median fluxes in the [0.1-2.4] keV band for the 'class1+2' and class 3 samples are 2.336e-14 and 3.163e-14 erg/s/cm2, respectively. The median slopes of the surface brightness profiles are 0.502 and 0.577 for the 'class1+2' and class 3 samples, respectively.

Submitted 10 June, 2024; originally announced June 2024.

Comments: 16 pages, 11 figures, 5 tables, submitted to A&A. The entire sample is available at https://github.com/wwxu/xvxgc.github.io together with the paper publication

arXiv:2406.06693 [pdf, other]

The measurement of the splashback radius of dark matter halo

Authors: Weiwei Xu, Huanyuan Shan, Ran Li, Ji Yao, Chunxiang Wang, Nan Li, Chaoli Zhang

Abstract: In the hierarchical evolution framework of cosmology, larger halos grow through matter accretion and halo mergers. To clarify the halo evolution, we need to define the halo mass and radius physically. However, the pseudo-evolution problem makes the process difficult. Thus, we aim to measure the splashback radius, a physically defined halo radius, for a large number of halos with various masses and redshifts, and to determine the most important parameters that affect it. We use the typical definition of the splashback radius (Rsp) as the radius with the steepest radial density profile. In this work, we measure Rsp of dark matter halos within the mass range of 1e13-3e15 Msun and redshifts spanning 0.08-0.65. This is the measurement of Rsp in the largest range of halo mass and redshift. Using the shear catalog of the DECaLS DR8, we investigate Rsp of halos associated with galaxies and galaxy clusters identified in the various catalogs. Our findings reveal a trend wherein massive halos demonstrate a larger Rsp, and the normalized splashback radius (Rsp/R200m) shows a U-shaped mass evolution. The upturn in these relations mainly comes from the contribution of massive halos with low redshifts. We further find Rsp increases with the peak height, while Rsp/R200m has a negative relation with the peak height. We also find Rsp >~ R200m for most halos, indicating their low accretion rates. Our result is consistent with previous literature across a wide range of mass, redshift, and peak height, as well as the simulation work from More et al. (2015).

Submitted 10 June, 2024; originally announced June 2024.

Comments: 15 pages, 7 figures, submitted to ApJ

arXiv:2406.06420 [pdf, other]

An Improved Empirical Fisher Approximation for Natural Gradient Descent

Authors: Xiaodong Wu, Wenyi Yu, Chao Zhang, Philip Woodland

Abstract: Approximate Natural Gradient Descent (NGD) methods are an important family of optimisers for deep learning models, which use approximate Fisher information matrices to pre-condition gradients during training. The empirical Fisher (EF) method approximates the Fisher information matrix empirically by reusing the per-sample gradients collected during back-propagation. Despite its ease of implementation, the EF approximation has its theoretical and practical limitations. This paper first investigates the inversely-scaled projection issue of EF, which is shown to be a major cause of the poor empirical approximation quality. An improved empirical Fisher (iEF) method, motivated as a generalised NGD method from a loss reduction perspective, is proposed to address this issue while retaining the practical convenience of EF. The exact iEF and EF methods are experimentally evaluated using practical deep learning setups, including widely-used setups for parameter-efficient fine-tuning of pre-trained models (T5-base with LoRA and Prompt-Tuning on GLUE tasks, and ViT with LoRA for CIFAR100). Optimisation experiments show that applying exact iEF as an optimiser provides strong convergence and generalisation. It achieves the best test performance and the lowest training loss for the majority of the tasks, even when compared with well-tuned AdamW/Adafactor baselines.
Additionally, under a novel empirical evaluation framework, the proposed iEF method shows consistently better approximation quality to the exact Natural Gradient updates than both EF and the more expensive sampled Fisher (SF). Further investigation also shows that the superior approximation quality of iEF is robust to damping across tasks and training stages. Improving existing approximate NGD optimisers with iEF is expected to lead to better convergence ability and stronger robustness to choice of damping.

Submitted 10 June, 2024; originally announced June 2024.

Comments: 33 pages, 11 figures, 7 tables

arXiv:2406.06230 [pdf, other]

UEMM-Air: A Synthetic Multi-modal Dataset for Unmanned Aerial Vehicle Object Detection

Authors: Fan Liu, Liang Yao, Shengxiang Xu, Chuanyi Zhang, Xinlei Zhang, Ting Wu

Abstract: The development of multi-modal object detection for Unmanned Aerial Vehicles (UAVs) typically relies on a large amount of pixel-aligned multi-modal image data. However, existing datasets face challenges such as limited modalities, high construction costs, and imprecise annotations. To this end, we propose a synthetic multi-modal UAV-based object detection dataset, UEMM-Air. Specifically, we simulate various UAV flight scenarios and object types using the Unreal Engine (UE). Then we design the UAV's flight logic to automatically collect data from different scenarios, perspectives, and altitudes. Finally, we propose a novel heuristic automatic annotation algorithm to generate accurate object detection labels. In total, our UEMM-Air consists of 20k pairs of images with 5 modalities and precise annotations. Moreover, we conduct numerous experiments and establish new benchmark results on our dataset. We found that models pre-trained on UEMM-Air exhibit better performance on downstream tasks compared to other similar datasets. The dataset is publicly available (https://github.com/1e12Leon/UEMM-Air) to support the research of multi-modal UAV object detection models.

Submitted 10 June, 2024; originally announced June 2024.

arXiv:2406.06132 [pdf, other]

Instantaneous optical singularities and duality-protected dark directions

Authors: Chunchao Wen, Jianfa Zhang, Chaofan Zhang, Shiqiao Qin, Zhihong Zhu, Wei Liu

Abstract: Electromagnetic waves are described by not only polarization ellipses but also cyclically rotating vectors tracing them out. The corresponding fields are respectively directionless steady line fields and directional instantaneous vector fields. Here we study the seminal topic of electromagnetic scattering from the perspective of instantaneous vector fields and uncover how the global topology of the momentum sphere regulates local distributions of tangent scattered fields. Structurally-stable generic singularities of vector fields move cyclically along lines of linear polarizations and at any instant their index sum has to be the Euler characteristic $χ=2$. This contrasts sharply with steady line fields, of which generic singularities constrained by the Euler characteristic locate on points of circular polarizations. From such a unique perspective of instantaneous singularities, we discovered that for circularly-polarized waves scattered by electromagnetic duality-symmetric particles, since linearly-polarized scatterings are prohibited by helicity conservation, there must exist at least one dark direction along which the scattering is strictly zero. Two such dark directions can be tuned to overlap, along which the scattering would remain zero for arbitrary incident polarizations. We have essentially revealed that \textit{polarizations underdescribe vectorial electromagnetic waves and the instantaneous perspective is indispensable}.
The complementarity we discover provides broader and deeper insights into not only electromagnetism, but also other branches of wave physics where singularities are generic and ubiquitous.

Submitted 10 June, 2024; originally announced June 2024.

Comments: Wei Liu acknowledges many illuminating correspondences with Sir Michael Berry, whose monumental paper with J. F. Nye on phase singularities was published 50 years ago

arXiv:2406.06118 [pdf, other]

Strong and weak $CP$ tests in sequential decays of polarized $Σ^0$ hyperons

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (644 additional authors not shown)

Abstract: The $J/ψ, ψ(3686) \to Σ^0 \barΣ^{0}$ processes and subsequent decays are studied using the world's largest $J/ψ$ and $ψ(3686)$ data samples collected with the BESIII detector. The strong-$CP$ symmetry is tested in the decays of the $Σ^0$ hyperons for the first time by measuring the decay parameters, $α_{Σ^0} = -0.0017 \pm 0.0021 \pm 0.0018$ and $\barα_{Σ^0} = 0.0021 \pm 0.0020 \pm 0.0022$. The weak-$CP$ test is performed in the subsequent decays of their daughter particles $Λ$ and $\barΛ$. Also for the first time, the transverse polarizations of the $Σ^0$ hyperons in $J/ψ$ and $ψ(3686)$ decays are observed with opposite directions, and the ratios between the S-wave and D-wave contributions of the $J/ψ, ψ(3686) \to Σ^0 \barΣ^{0}$ decays are obtained. These results are crucial to understand the decay dynamics of the charmonium states and the production mechanism of the $Σ^0-\barΣ^0$ pairs.

Submitted 16 July, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

arXiv:2406.06063 [pdf, other]

Enabling Large-Scale and High-Precision Fluid Simulations on Near-Term Quantum Computers

Authors: Zhao-Yun Chen, Teng-Yang Ma, Chuang-Chao Ye, Liang Xu, Ming-Yang Tan, Xi-Ning Zhuang, Xiao-Fan Xu, Yun-Jie Wang, Tai-Ping Sun, Yong Chen, Lei Du, Liang-Liang Guo, Hai-Feng Zhang, Hao-Ran Tao, Tian-Le Wang, Xiao-Yan Yang, Ze-An Zhao, Peng Wang, Sheng Zhang, Chi Zhang, Ren-Ze Zhao, Zhi-Long Jia, Wei-Cheng Kong, Meng-Han Dou, Jun-Chao Wang, et al. (7 additional authors not shown)

Abstract: Quantum computational fluid dynamics (QCFD) offers a promising alternative to classical computational fluid dynamics (CFD) by leveraging quantum algorithms for higher efficiency. This paper introduces a comprehensive QCFD method, including an iterative method "Iterative-QLS" that suppresses error in quantum linear solver, and a subspace method to scale the solution to a larger size. We implement our method on a superconducting quantum computer, demonstrating successful simulations of steady Poiseuille flow and unsteady acoustic wave propagation. The Poiseuille flow simulation achieved a relative error of less than $0.2\%$, and the unsteady acoustic wave simulation solved a 5043-dimensional matrix. We emphasize the utilization of the quantum-classical hybrid approach in applications of near-term quantum computers. By adapting to quantum hardware constraints and offering scalable solutions for large-scale CFD problems, our method paves the way for practical applications of near-term quantum computers in computational science.

Submitted 19 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

Comments: 31 pages, 10 figures

arXiv:2406.06005 [pdf, other]

WoCoCo: Learning Whole-Body Humanoid Control with Sequential Contacts

Authors: Chong Zhang, Wenli Xiao, Tairan He, Guanya Shi

Abstract: Humanoid activities involving sequential contacts are crucial for complex robotic interactions and operations in the real world and are traditionally solved by model-based motion planning, which is time-consuming and often relies on simplified dynamics models. Although model-free reinforcement learning (RL) has become a powerful tool for versatile and robust whole-body humanoid control, it still requires tedious task-specific tuning and state machine design and suffers from long-horizon exploration issues in tasks involving contact sequences. In this work, we propose WoCoCo (Whole-Body Control with Sequential Contacts), a unified framework to learn whole-body humanoid control with sequential contacts by naturally decomposing the tasks into separate contact stages. Such decomposition facilitates simple and general policy learning pipelines through task-agnostic reward and sim-to-real designs, requiring only one or two task-related terms to be specified for each task. We demonstrated that end-to-end RL-based controllers trained with WoCoCo enable four challenging whole-body humanoid tasks involving diverse contact sequences in the real world without any motion priors: 1) versatile parkour jumping, 2) box loco-manipulation, 3) dynamic clap-and-tap dancing, and 4) cliffside climbing. We further show that WoCoCo is a general framework beyond humanoid by applying it in 22-DoF dinosaur robot loco-manipulation tasks.

Submitted 10 June, 2024; originally announced June 2024.

Comments: Website and Videos: https://lecar-lab.github.io/wococo/

arXiv:2406.05954 [pdf, other]

Aligning Large Language Models with Representation Editing: A Control Perspective

Authors: Lingkai Kong, Haorui Wang, Wenhao Mu, Yuanqi Du, Yuchen Zhuang, Yifei Zhou, Yue Song, Rongzhi Zhang, Kai Wang, Chao Zhang

Abstract: Aligning large language models (LLMs) with human objectives is crucial for real-world applications. However, fine-tuning LLMs for alignment often suffers from unstable training and requires substantial computing resources. Test-time alignment techniques, such as prompting and guided decoding, do not modify the underlying model, and their performance remains dependent on the original model's capabilities. To address these challenges, we propose aligning LLMs through representation editing. The core of our method is to view a pre-trained autoregressive LLM as a discrete-time stochastic dynamical system. To achieve alignment for specific objectives, we introduce external control signals into the state space of this language dynamical system. We train a value function directly on the hidden states according to the Bellman equation, enabling gradient-based optimization to obtain the optimal control signals at test time. Our experiments demonstrate that our method outperforms existing test-time alignment techniques while requiring significantly fewer resources compared to fine-tuning methods.

Submitted 11 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

Comments: fix typos

arXiv:2406.05862 [pdf, other]

II-Bench: An Image Implication Understanding Benchmark for Multimodal Large Language Models

Authors: Ziqiang Liu, Feiteng Fang, Xi Feng, Xinrun Du, Chenhao Zhang, Zekun Wang, Yuelin Bai, Qixuan Zhao, Liyang Fan, Chengguang Gan, Hongquan Lin, Jiaming Li, Yuansheng Ni, Haihong Wu, Yaswanth Narsupalli, Zhigang Zheng, Chengming Li, Xiping Hu, Ruifeng Xu, Xiaojun Chen, Min Yang, Jiaheng Liu, Ruibo Liu, Wenhao Huang, Ge Zhang, et al. (1 additional author not shown)

Abstract: The rapid advancements in the development of multimodal large language models (MLLMs) have consistently led to new breakthroughs on various benchmarks. In response, numerous challenging and comprehensive benchmarks have been proposed to more accurately assess the capabilities of MLLMs. However, there is a dearth of exploration of the higher-order perceptual capabilities of MLLMs. To fill this gap, we propose the Image Implication understanding Benchmark, II-Bench, which aims to evaluate the model's higher-order perception of images. Through extensive experiments on II-Bench across multiple MLLMs, we have made significant findings. Initially, a substantial gap is observed between the performance of MLLMs and humans on II-Bench. The pinnacle accuracy of MLLMs attains 74.8%, whereas human accuracy averages 90%, peaking at an impressive 98%. Subsequently, MLLMs perform worse on abstract and complex images, suggesting limitations in their ability to understand high-level semantics and capture image details. Finally, it is observed that most models exhibit enhanced accuracy when image sentiment polarity hints are incorporated into the prompts. This observation underscores a notable deficiency in their inherent understanding of image sentiment. We believe that II-Bench will inspire the community to develop the next generation of MLLMs, advancing the journey towards expert artificial general intelligence (AGI). II-Bench is publicly available at https://huggingface.co/datasets/m-a-p/II-Bench.

Submitted 11 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

Comments: 100 pages, 82 figures, add citations

arXiv:2406.05827 [pdf, ps, other]

Measurement of the integrated luminosity of the data collected at 3.773 GeV by BESIII from 2021 to 2024

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (634 additional authors not shown)

Abstract: We present a measurement of the integrated luminosity of $e^+e^-$ collision data collected with the BESIII detector at the BEPCII collider at a center-of-mass energy of $E_{\rm cm} = 3.773$~GeV. The integrated luminosities of the data sets taken from December 2021 to June 2022, from November 2022 to June 2023, and from October 2023 to February 2024 are determined to be $4.995 \pm 0.019$~fb$^{-1}$, $8.157 \pm 0.031$~fb$^{-1}$, and $4.191 \pm 0.016$~fb$^{-1}$, respectively, by analyzing large angle Bhabha scattering events. The uncertainties are dominated by systematic effects and the statistical uncertainties are negligible. Our results provide essential input for future analyses and precision measurements.

Submitted 9 June, 2024; originally announced June 2024.

arXiv:2406.05618 [pdf, other]

Photon induced proton and anti-proton pair production with ultraperipheral heavy ion collisions at RHIC

Authors: Cheng Zhang, Li-Mao Zhang, Ding Yu Shao

Abstract: We investigate proton-antiproton ($p\bar{p}$) pair production via photon-photon fusion in the ultra-peripheral collisions at RHIC, employing a joint impact parameter and transverse momentum dependent formalism. We consider proton exchange, $s$-channel resonance and hand-bag mechanisms, predicting differential distributions of $p\bar p$ production. Our theoretical predictions can be tested against future measurements at RHIC, to enhance our understanding of photon-photon interactions in strong electromagnetic fields.

Submitted 8 June, 2024; originally announced June 2024.

Comments: 6 pages, 4 figures

Showing 151–200 of 7,366 results for author: zhang, C