Measurement of the branching fraction of $D^+_s\to \ell^+ν_\ell$ via $e^+e^-\to D^{*+}_{s} D^{*-}_{s}$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (634 additional authors not shown)

Abstract: Based on $10.64~\mathrm{fb}^{-1}$ of $e^+e^-$ collision data taken at center-of-mass energies between 4.237 and 4.699 GeV with the BESIII detector, we study the leptonic $D^+_s$ decays using the $e^+e^-\to D^{*+}_{s} D^{*-}_{s}$ process. The branching fractions of $D_s^+\to\ell^+ν_{\ell}\,(\ell=μ,τ)$ are measured to be $\mathcal{B}(D_s^+\toμ^+ν_μ)=(\bfmuv)\%$ and $\mathcal{B}(D_s^+\toτ^+ν_τ)=(\bftauv)\%$, respectively. The product of the decay constant and Cabibbo-Kobayashi-Maskawa matrix element $|V_{cs}|$ is determined to be $f_{D_s^+}|V_{cs}|=(\mufdsxvcsresult)_{μν}~\mathrm{MeV}$ and $f_{D_s^+}|V_{cs}|=(\taufdsxvcsresult)_{τν}~\mathrm{MeV}$, respectively. Taking the value of $|V_{cs}|$ from a global fit in the Standard Model, we obtain ${f_{D^+_s}}=(\mufdsresult)_{μν}$\,MeV and ${f_{D^+_s}}=(\taufdsresult)_{τν}$\,MeV, respectively. Conversely, taking the value for $f_{D_s^+}$ from the latest lattice quantum chromodynamics calculation, we obtain $|V_{cs}| =(\muvcsresult)_{μν}$ and $|V_{cs}| = (\tauvcsresult)_{τν}$, respectively.

Submitted 16 July, 2024; originally announced July 2024.

Comments: 27 pages, 13 figures

arXiv:2407.11556 [pdf, other]

LITS: An Optimized Learned Index for Strings (An Extended Version)

Authors: Yifan Yang, Shimin Chen

Abstract: Index is an important component in database systems. Learned indexes have been shown to outperform traditional tree-based index structures for fixed-sized integer or floating point keys. However, the application of the learned solution to variable-length string keys is under-researched. Our experiments show that existing learned indexes for strings fail to outperform traditional string indexes, such as HOT and ART. String keys are long and variable sized, and often contain skewed prefixes, which make the last-mile search expensive, and adversely impact the capability of learned models to capture the skewed distribution of string keys. In this paper, we propose a novel learned index for string keys, LITS (Learned Index with Hash-enhanced Prefix Table and Sub-tries). We propose an optimized learned model, combining a global Hash-enhanced Prefix Table (HPT) and a per-node local linear model to better distinguish string keys. Moreover, LITS exploits compact leaf nodes and hybrid structures with a PMSS model for efficient point and range operations. Our experimental results using eleven string data sets show that LITS achieves up to 2.43x and 2.27x improvement over HOT and ART for point operations, and attains comparable scan performance.

Submitted 16 July, 2024; originally announced July 2024.

arXiv:2407.11474 [pdf, other]

Search for the rare $Λ_c^+ \to p μ^+ μ^-$ decay

Authors: LHCb collaboration, R. Aaij, A. S. W. Abdelmotteleb, C. Abellan Beteta, F. Abudinén, T. Ackernley, A. A. Adefisoye, B. Adeva, M. Adinolfi, P. Adlarson, C. Agapopoulou, C. A. Aidala, Z. Ajaltouni, S. Akar, K. Akiba, P. Albicocco, J. Albrecht, F. Alessio, M. Alexander, Z. Aliouche, P. Alvarez Cartelle, R. Amalric, S. Amato, J. L. Amey, Y. Amhis, et al. (1062 additional authors not shown)

Abstract: A search for the nonresonant $Λ_c^+ \to p μ^+ μ^-$ decay is performed using proton-proton collision data recorded at a centre-of-mass energy of 13 TeV by the LHCb experiment, corresponding to an integrated luminosity of 5.4 fb$^{-1}$. No evidence for the decay is found in the dimuon invariant-mass regions where the expected contribution of resonances is subdominant. The upper limit on the branching fraction of the $Λ_c^+ \to p μ^+ μ^-$ decay is determined to be $2.9~(3.2) \times 10^{-8}$ at 90% (95%) confidence level. The branching fractions in the dimuon invariant-mass regions dominated by the $η$, $ρ$ and $ω$ resonances are also determined.

Submitted 16 July, 2024; originally announced July 2024.

Comments: All figures and tables, along with any supplementary material and additional information, are available at https://cern.ch/lhcbproject/Publications/p/LHCb-PAPER-2024-005.html (LHCb public pages)

Report number: LHCb-PAPER-2024-005, CERN-EP-2024-158

arXiv:2407.11420 [pdf, other]

iKalibr: Unified Targetless Spatiotemporal Calibration for Resilient Integrated Inertial Systems

Authors: Shuolong Chen, Xingxing Li, Shengyu Li, Yuxuan Zhou, Xiaoteng Yang

Abstract: The integrated inertial system, typically integrating an IMU and an exteroceptive sensor such as radar, LiDAR, and camera, has been widely accepted and applied in modern robotic applications for ego-motion estimation, motion control, or autonomous exploration. To improve system accuracy, robustness, and further usability, both multiple and various sensors are generally resiliently integrated, which benefits the system performance regarding failure tolerance, perception capability, and environment compatibility. For such systems, accurate and consistent spatiotemporal calibration is required to maintain a unique spatiotemporal framework for multi-sensor fusion. Considering most existing calibration methods (i) are generally oriented to specific integrated inertial systems, (ii) often only focus on spatial determination, (iii) usually require artificial targets, lacking convenience and usability, we propose iKalibr: a unified targetless spatiotemporal calibration framework for resilient integrated inertial systems, which overcomes the above issues, and enables both accurate and consistent calibration. Altogether four commonly employed sensors are supported in iKalibr currently, namely IMU, radar, LiDAR, and camera. The proposed method starts with a rigorous and efficient dynamic initialization, where all parameters in the estimator would be accurately recovered. Following that, several continuous-time-based batch optimizations would be carried out to refine initialized parameters to global optimal ones. Sufficient real-world experiments were conducted to verify the feasibility and evaluate the calibration performance of iKalibr. The results demonstrate that iKalibr can achieve accurate resilient spatiotemporal calibration. We open-source our implementations at (https://github.com/Unsigned-Long/iKalibr) to benefit the research community.

Submitted 16 July, 2024; originally announced July 2024.

arXiv:2407.10691 [pdf, other]

$\texttt{MixGR}$: Enhancing Retriever Generalization for Scientific Domain through Complementary Granularity

Authors: Fengyu Cai, Xinran Zhao, Tong Chen, Sihao Chen, Hongming Zhang, Iryna Gurevych, Heinz Koeppl

Abstract: Recent studies show the growing significance of document retrieval in the generation of LLMs, i.e., RAG, within the scientific domain by bridging their knowledge gap. However, dense retrievers often struggle with domain-specific retrieval and complex query-document relationships, particularly when query segments correspond to various parts of a document. To alleviate such prevalent challenges, this paper introduces $\texttt{MixGR}$, which improves dense retrievers' awareness of query-document matching across various levels of granularity in queries and documents using a zero-shot approach. $\texttt{MixGR}$ fuses various metrics based on these granularities into a united score that reflects a comprehensive query-document similarity. Our experiments demonstrate that $\texttt{MixGR}$ outperforms previous document retrieval by 24.7% and 9.8% on nDCG@5 with unsupervised and supervised retrievers, respectively, averaged on queries containing multiple subqueries from five scientific retrieval datasets. Moreover, the efficacy of two downstream scientific question-answering tasks highlights the advantage of $\texttt{MixGR}$ to boost the application of LLMs in the scientific domain.

Submitted 15 July, 2024; originally announced July 2024.

arXiv:2407.10627 [pdf, other]

Arena Learning: Build Data Flywheel for LLMs Post-training via Simulated Chatbot Arena

Authors: Haipeng Luo, Qingfeng Sun, Can Xu, Pu Zhao, Qingwei Lin, Jianguang Lou, Shifeng Chen, Yansong Tang, Weizhu Chen

Abstract: Assessing the effectiveness of large language models (LLMs) presents substantial challenges. The method of conducting human-annotated battles in an online Chatbot Arena is a highly effective evaluative technique. However, this approach is limited by the costs and time required for human annotation. In this paper, we introduce Arena Learning, an innovative offline strategy designed to simulate these arena battles using AI-driven annotations to evaluate battle outcomes, thus facilitating the continuous improvement of the target model through both supervised fine-tuning and reinforcement learning. Arena Learning comprises two key elements. First, it ensures precise evaluations and maintains consistency between offline simulations and online competitions via WizardArena, a pipeline developed to accurately predict the Elo rankings of various models using a meticulously designed offline test set. Our results demonstrate that WizardArena's predictions closely align with those from the online Arena. Second, it involves the continuous improvement of training data based on the battle results and the refined model. We establish a data flywheel to iteratively update the training data by highlighting the weaknesses of the target model based on its battle results, enabling it to learn from the strengths of multiple different models. We apply Arena Learning to train our target model, WizardLM-$β$, and demonstrate significant performance enhancements across various metrics. This fully automated training and evaluation pipeline sets the stage for continuous advancements in various LLMs via post-training. Notably, Arena Learning plays a pivotal role in the success of WizardLM-2, and this paper serves both as an exploration of its efficacy and a foundational study for future discussions related to WizardLM-2 and its derivatives.

Submitted 15 July, 2024; originally announced July 2024.

arXiv:2407.10379 [pdf]

doi 10.1038/s41586-024-07076-x

Room temperature operation of germanium-silicon single-photon avalanche diode

Authors: Neil Na, Yen-Cheng Lu, Yu-Hsuan Liu, Po-Wei Chen, Ying-Chen Lai, You-Ru Lin, Chung-Chih Lin, Tim Shia, Chih-Hao Cheng, Shu-Lu Chen

Abstract: The ability to detect single photons has led to the advancement of numerous research fields. Although various types of single-photon detector have been developed, because of two main factors - that is, (1) the need for operating at cryogenic temperature and (2) the incompatibility with complementary metal-oxide-semiconductor (CMOS) fabrication processes - so far, to our knowledge, only Si-based single-photon avalanche diode (SPAD) has gained mainstream success and has been used in consumer electronics. With the growing demand to shift the operation wavelength from near-infrared to short-wavelength infrared (SWIR) for better safety and performance, an alternative solution is required because Si has negligible optical absorption for wavelengths beyond 1 μm. Here we report a CMOS-compatible, high-performing germanium-silicon SPAD operated at room temperature, featuring a noise-equivalent power improvement over the previous Ge-based SPADs by 2-3.5 orders of magnitude. Key parameters such as dark count rate, single-photon detection probability at 1,310 nm, timing jitter, after-pulsing characteristic time and after-pulsing probability are, respectively, measured as 19 kHz μm^2, 12%, 188 ps, ~90 ns and <1%, with a low breakdown voltage of 10.26 V and a small excess bias of 0.75 V. Three-dimensional point-cloud images are captured with direct time-of-flight technique as proof of concept. This work paves the way towards using single-photon-sensitive SWIR sensors, imagers and photonic integrated circuits in everyday life.

Submitted 14 July, 2024; originally announced July 2024.

Comments: original manuscript

Journal ref: Nature 627, 295 (2024)

arXiv:2407.10062 [pdf, other]

SpikeGS: 3D Gaussian Splatting from Spike Streams with High-Speed Camera Motion

Authors: Jiyuan Zhang, Kang Chen, Shiyan Chen, Yajing Zheng, Tiejun Huang, Zhaofei Yu

Abstract: Novel View Synthesis plays a crucial role by generating new 2D renderings from multi-view images of 3D scenes. However, capturing high-speed scenes with conventional cameras often leads to motion blur, hindering the effectiveness of 3D reconstruction. To address this challenge, high-frame-rate dense 3D reconstruction emerges as a vital technique, enabling detailed and accurate modeling of real-world objects or scenes in various fields, including Virtual Reality or embodied AI. Spike cameras, a novel type of neuromorphic sensor, continuously record scenes with an ultra-high temporal resolution, showing potential for accurate 3D reconstruction. Despite their promise, existing approaches, such as applying Neural Radiance Fields (NeRF) to spike cameras, encounter challenges due to the time-consuming rendering process. To address this issue, we make the first attempt to introduce 3D Gaussian Splatting (3DGS) into spike cameras in high-speed capture, providing 3DGS as dense and continuous clues of views, then constructing SpikeGS. Specifically, to train SpikeGS, we establish computational equations between the rendering process of 3DGS and the processes of instantaneous imaging and exposing-like imaging of the continuous spike stream. Besides, we build a very lightweight but effective mapping process from spikes to instant images to support training. Furthermore, we introduce a new spike-based 3D rendering dataset for validation. Extensive experiments demonstrate that our method achieves high-quality novel view rendering, proving the tremendous potential of spike cameras in modeling 3D scenes.

Submitted 13 July, 2024; originally announced July 2024.

arXiv:2407.09802 [pdf, ps, other]

Chaos, entanglement and Husimi Q function in quantum Rabi model

Authors: Shangyun Wang, Songbai Chen, Jiliang Jing

Abstract: As one of the famous effects in the quantum Rabi model (QRM), Rabi oscillation may lead to the occurrence of quantum dynamical behaviors without classical dynamic counterparts, such as quantum collapse and revival effects. In this paper, we focus on studying whether the entanglement entropy and Husimi Q function, as diagnostic tools for quantum chaos in quantum systems, are invalidated by quantum collapse and revival. It is shown that the saturation values of entanglement entropy for initial states located in the chaotic sea of the QRM are higher than those in the regular regions. When the system reaches dynamic equilibrium, the Husimi Q functions for initial states located in the chaotic sea are more dispersed than those in the regular regions. Moreover, we observe a good correspondence between the time-average entanglement entropy and classical phase space structures. Our results imply that entanglement entropy and the Husimi Q function retain their ability to diagnose chaos in the QRM and that the correspondence principle is not invalidated by quantum collapse and revival effects in this system.

Submitted 13 July, 2024; originally announced July 2024.

Comments: 6 pages, 5 figures

arXiv:2407.09370 [pdf, other]

Learning High-Frequency Functions Made Easy with Sinusoidal Positional Encoding

Authors: Chuanhao Sun, Zhihang Yuan, Kai Xu, Luo Mai, Siddharth N, Shuo Chen, Mahesh K. Marina

Abstract: Fourier features based positional encoding (PE) is commonly used in machine learning tasks that involve learning high-frequency features from low-dimensional inputs, such as 3D view synthesis and time series regression with neural tangent kernels. Despite their effectiveness, existing PEs require manual, empirical adjustment of crucial hyperparameters, specifically the Fourier features, tailored to each unique task. Further, PEs face challenges in efficiently learning high-frequency functions, particularly in tasks with limited data. In this paper, we introduce sinusoidal PE (SPE), designed to efficiently learn adaptive frequency features closely aligned with the true underlying function. Our experiments demonstrate that SPE, without hyperparameter tuning, consistently achieves enhanced fidelity and faster training across various tasks, including 3D view synthesis, Text-to-Speech generation, and 1D regression. SPE is implemented as a direct replacement for existing PEs. Its plug-and-play nature lets numerous tasks easily adopt and benefit from SPE.

Submitted 12 July, 2024; originally announced July 2024.

Comments: 16 pages, Conference, Accepted by ICML 2024

arXiv:2407.08582 [pdf, other]

On the Universal Truthfulness Hyperplane Inside LLMs

Authors: Junteng Liu, Shiqi Chen, Yu Cheng, Junxian He

Abstract: While large language models (LLMs) have demonstrated remarkable abilities across various fields, hallucination remains a significant challenge. Recent studies have explored hallucinations through the lens of internal representations, proposing mechanisms to decipher LLMs' adherence to facts. However, these approaches often fail to generalize to out-of-distribution data, leading to concerns about whether internal representation patterns reflect fundamental factual awareness, or only overfit spurious correlations on the specific datasets. In this work, we investigate whether a universal truthfulness hyperplane that distinguishes the model's factually correct and incorrect outputs exists within the model. To this end, we scale up the number of training datasets and conduct an extensive evaluation -- we train the truthfulness hyperplane on a diverse collection of over 40 datasets and examine its cross-task, cross-domain, and in-domain generalization. Our results indicate that increasing the diversity of the training datasets significantly enhances the performance in all scenarios, while the volume of data samples plays a less critical role. This finding supports the optimistic hypothesis that a universal truthfulness hyperplane may indeed exist within the model, offering promising directions for future research.

Submitted 11 July, 2024; originally announced July 2024.

arXiv:2407.08551 [pdf, other]

Autoregressive Speech Synthesis without Vector Quantization

Authors: Lingwei Meng, Long Zhou, Shujie Liu, Sanyuan Chen, Bing Han, Shujie Hu, Yanqing Liu, Jinyu Li, Sheng Zhao, Xixin Wu, Helen Meng, Furu Wei

Abstract: We present MELLE, a novel continuous-valued tokens based language modeling approach for text to speech synthesis (TTS). MELLE autoregressively generates continuous mel-spectrogram frames directly from text condition, bypassing the need for vector quantization, which was originally designed for audio compression and sacrifices fidelity compared to mel-spectrograms. Specifically, (i) instead of cross-entropy loss, we apply regression loss with a proposed spectrogram flux loss function to model the probability distribution of the continuous-valued tokens; (ii) we incorporate variational inference into MELLE to facilitate sampling mechanisms, thereby enhancing the output diversity and model robustness. Experiments demonstrate that, compared to the two-stage codec language models VALL-E and its variants, the single-stage MELLE mitigates robustness issues by avoiding the inherent flaws of sampling discrete codes, achieves superior performance across multiple metrics, and, most importantly, offers a more streamlined paradigm. See https://aka.ms/melle for demos of our work.

Submitted 11 July, 2024; originally announced July 2024.

arXiv:2407.08398 [pdf, other]

Delocalization of skin steady states

Authors: Xu Feng, Shu Chen

Abstract: The skin effect, characterized by the tendency of particles to accumulate at the boundaries, has been extensively studied in non-Hermitian systems. In this work, we propose an intuitive Lindbladian composed of two chains with reversed skin localization. The skin steady state is gradually delocalized as the interchain coupling increases. In the single-body scenario, it corresponds to a shift in the scaling of the Liouvillian gap $Δ$ from $Δ\propto N^0$ to $Δ\propto N^{-2}$. Notably, exact diagonalization results reveal a system-size sensitivity of the single-particle Liouvillian spectrum, inherited from the non-Hermitian effective Hamiltonian's system-size sensitivity. We predict that even an arbitrarily small coupling will induce dramatic changes in the Liouvillian spectrum and steady state in the thermodynamic limit, a phenomenon we term the critical Liouvillian skin effect. Additionally, in the many-body scenario, by employing the stochastic Schrödinger equation to unravel the Lindblad master equation, it is revealed that the scaling behavior of steady-state entanglement changes from the area law to the logarithmic law. This work demonstrates the delocalization of both single-body and many-body skin steady states, introducing a novel mechanism for inducing entanglement transitions beyond the quantum Zeno effect.

Submitted 11 July, 2024; originally announced July 2024.

Comments: 10 pages, 7 figures

arXiv:2407.08148 [pdf, other]

SCPNet: Unsupervised Cross-modal Homography Estimation via Intra-modal Self-supervised Learning

Authors: Runmin Zhang, Jun Ma, Si-Yuan Cao, Lun Luo, Beinan Yu, Shu-Jie Chen, Junwei Li, Hui-Liang Shen

Abstract: We propose a novel unsupervised cross-modal homography estimation framework based on intra-modal Self-supervised learning, Correlation, and consistent feature map Projection, namely SCPNet. The concept of intra-modal self-supervised learning is first presented to facilitate the unsupervised cross-modal homography estimation. The correlation-based homography estimation network and the consistent feature map projection are combined to form the learnable architecture of SCPNet, boosting the unsupervised learning framework. SCPNet is the first to achieve effective unsupervised homography estimation on the satellite-map image pair cross-modal dataset, GoogleMap, under [-32,+32] offset on a 128x128 image, leading the supervised approach MHN by 14.0% of mean average corner error (MACE). We further conduct extensive experiments on several cross-modal/spectral and manually-made inconsistent datasets, on which SCPNet achieves state-of-the-art (SOTA) performance among unsupervised approaches, and achieves 49.0%, 25.2%, 36.4%, and 10.7% lower MACEs than the supervised approach MHN. Source code is available at https://github.com/RM-Zhang/SCPNet.

Submitted 10 July, 2024; originally announced July 2024.

Comments: Accepted by ECCV 2024

arXiv:2407.08134 [pdf, ps, other]

Highway Networks for Improved Surface Reconstruction: The Role of Residuals and Weight Updates

Authors: A. Noorizadegan, Y. C. Hon, D. L. Young, C. S. Chen

Abstract: Surface reconstruction from point clouds is a fundamental challenge in computer graphics and medical imaging. In this paper, we explore the application of advanced neural network architectures for the accurate and efficient reconstruction of surfaces from data points. We introduce a novel variant of the Highway network (Hw) called Square-Highway (SqrHw) within the context of multilayer perceptrons and investigate its performance alongside plain neural networks and a simplified Hw in various numerical examples. These examples include the reconstruction of simple and complex surfaces, such as spheres, human hands, and intricate models like the Stanford Bunny. We analyze the impact of factors such as the number of hidden layers, interior and exterior points, and data distribution on surface reconstruction quality. Our results show that the proposed SqrHw architecture outperforms other neural network configurations, achieving faster convergence and higher-quality surface reconstructions. Additionally, we demonstrate the SqrHw's ability to predict surfaces over missing data, a valuable feature for challenging applications like medical imaging. Furthermore, our study delves into further details, demonstrating that the proposed method based on highway networks yields more stable weight norms and backpropagation gradients compared to the Plain Network architecture. This research not only advances the field of computer graphics but also holds utility for other purposes such as function interpolation and physics-informed neural networks, which integrate multilayer perceptrons into their algorithms.

Submitted 10 July, 2024; originally announced July 2024.

arXiv:2407.07841 [pdf, other]

Benchmarking Embedding Aggregation Methods in Computational Pathology: A Clinical Data Perspective

Authors: Shengjia Chen, Gabriele Campanella, Abdulkadir Elmas, Aryeh Stock, Jennifer Zeng, Alexandros D. Polydorides, Adam J. Schoenfeld, Kuan-lin Huang, Jane Houldsworth, Chad Vanderbilt, Thomas J. Fuchs

Abstract: Recent advances in artificial intelligence (AI), in particular self-supervised learning of foundation models (FMs), are revolutionizing medical imaging and computational pathology (CPath). A constant challenge in the analysis of digital Whole Slide Images (WSIs) is the problem of aggregating tens of thousands of tile-level image embeddings to a slide-level representation. Due to the prevalent use of datasets created for genomic research, such as TCGA, for method development, the performance of these techniques on diagnostic slides from clinical practice has been inadequately explored. This study conducts a thorough benchmarking analysis of ten slide-level aggregation techniques across nine clinically relevant tasks, including diagnostic assessment, biomarker classification, and outcome prediction. The results yield the following key insights: (1) Embeddings derived from domain-specific (histological images) FMs outperform those from generic ImageNet-based models across aggregation methods. (2) Spatial-aware aggregators enhance the performance significantly when using ImageNet pre-trained models but not when using FMs. (3) No single model excels in all tasks and spatially-aware models do not show general superiority as would be expected. These findings underscore the need for more adaptable and universally applicable aggregation techniques, guiding future research towards tools that better meet the evolving needs of clinical-AI in pathology. The code used in this work is available at https://github.com/fuchs-lab-public/CPath_SABenchmark.

Submitted 10 July, 2024; originally announced July 2024.

Comments: 10 pages, 2 figures

arXiv:2407.07651 [pdf, other]

Study of the decay and production properties of $D_{s1}(2536)$ and $D_{s2}^*(2573)$

Authors: M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann, et al. (645 additional authors not shown)

Abstract: The $e^+e^-\rightarrow D_s^+D_{s1}(2536)^-$ and $e^+e^-\rightarrow D_s^+D^*_{s2}(2573)^-$ processes are studied using data samples collected with the BESIII detector at center-of-mass energies from 4.530 to 4.946 GeV. The absolute branching fractions of $D_{s1}(2536)^- \rightarrow \bar{D}^{*0}K^-$ and $D_{s2}^*(2573)^- \rightarrow \bar{D}^0K^-$ are measured for the first time to be $(35.9\pm 4.8\pm 3.5)\%$ and $(37.4\pm 3.1\pm 4.6)\%$, respectively. The measurements are in tension with predictions based on the assumption that the $D_{s1}(2536)$ and $D_{s2}^*(2573)$ are dominated by a bare $c\bar{s}$ component. The $e^+e^-\rightarrow D_s^+D_{s1}(2536)^-$ and $e^+e^-\rightarrow D_s^+D^*_{s2}(2573)^-$ cross sections are measured, and a resonant structure at around 4.6 GeV with a width of 50 MeV is observed for the first time with a statistical significance of $15σ$ in the $e^+e^-\rightarrow D_s^+D^*_{s2}(2573)^-$ process. It could be the $Y(4626)$ found by the Belle collaboration in the $D_s^+D_{s1}(2536)^{-}$ final state, since they have similar masses and widths. There is also evidence for a structure at around 4.75 GeV in both processes.

Submitted 10 July, 2024; originally announced July 2024.

arXiv:2407.07375 [pdf, ps, other]

Stable Weight Updating: A Key to Reliable PDE Solutions Using Deep Learning

Authors: A. Noorizadegan, R. Cavoretto, D. L. Young, C. S. Chen

Abstract: Background: Deep learning techniques, particularly neural networks, have revolutionized computational physics, offering powerful tools for solving complex partial differential equations (PDEs). However, ensuring stability and efficiency remains a challenge, especially in scenarios involving nonlinear and time-dependent equations. Methodology: This paper introduces novel residual-based architectures, namely the Simple Highway Network and the Squared Residual Network, designed to enhance stability and accuracy in physics-informed neural networks (PINNs). These architectures augment traditional neural networks by incorporating residual connections, which facilitate smoother weight updates and improve backpropagation efficiency. Results: Through extensive numerical experiments across various examples, including linear and nonlinear, time-dependent and time-independent PDEs, we demonstrate the efficacy of the proposed architectures. The Squared Residual Network, in particular, exhibits robust performance, achieving enhanced stability and accuracy compared to conventional neural networks. These findings underscore the potential of residual-based architectures in advancing deep learning for PDEs and computational physics applications.

Submitted 10 July, 2024; originally announced July 2024.

arXiv:2407.07339 [pdf, other]

TDML -- A Trustworthy Distributed Machine Learning Framework

Authors: Zhen Wang, Qin Wang, Guangsheng Yu, Shiping Chen

Abstract: Recent years have witnessed a surge in deep learning research, marked by the introduction of expansive generative models like OpenAI's SORA and GPT, Meta AI's LLAMA series, and Google's FLAN, BART, and Gemini models. However, the rapid advancement of large models (LM) has intensified the demand for computing resources, particularly GPUs, which are crucial for their parallel processing capabilities. This demand is exacerbated by limited GPU availability due to supply chain delays and monopolistic acquisition by major tech firms. Distributed Machine Learning (DML) methods, such as Federated Learning (FL), mitigate these challenges by partitioning data and models across multiple servers, though implementing optimizations like tensor and pipeline parallelism remains complex. Blockchain technology emerges as a promising solution, ensuring data integrity, scalability, and trust in distributed computing environments, but still lacks guidance on building practical DML systems. In this paper, we propose a trustworthy distributed machine learning (TDML) framework that leverages blockchain to coordinate remote trainers and validate workloads, achieving privacy, transparency, and efficient model training across public remote computing resources. Experimental validation demonstrates TDML's efficacy in overcoming performance limitations and malicious node detection, positioning it as a robust solution for scalable and secure distributed machine learning.

Submitted 9 July, 2024; originally announced July 2024.

arXiv:2407.06508 [pdf, other]

A Clinical Benchmark of Public Self-Supervised Pathology Foundation Models

Authors: Gabriele Campanella, Shengjia Chen, Ruchika Verma, Jennifer Zeng, Aryeh Stock, Matt Croken, Brandon Veremis, Abdulkadir Elmas, Kuan-lin Huang, Ricky Kwan, Jane Houldsworth, Adam J. Schoenfeld, Chad Vanderbilt

Abstract: The use of self-supervised learning (SSL) to train pathology foundation models has increased substantially in the past few years. Notably, several models trained on large quantities of clinical data have been made publicly available in recent months. This will significantly enhance scientific research in computational pathology and help bridge the gap between research and clinical deployment. With the increase in availability of public foundation models of different sizes, trained using different algorithms on different datasets, it becomes important to establish a benchmark to compare the performance of such models on a variety of clinically relevant tasks spanning multiple organs and diseases. In this work, we present a collection of pathology datasets comprising clinical slides associated with clinically relevant endpoints including cancer diagnoses and a variety of biomarkers generated during standard hospital operation from two medical centers. We leverage these datasets to systematically assess the performance of public pathology foundation models and provide insights into best practices for training new foundation models and selecting appropriate pretrained models.

Submitted 11 July, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

Comments: arXiv admin note: text overlap with arXiv:2310.07033

arXiv:2407.06152 [pdf, other]

Uni-ELF: A Multi-Level Representation Learning Framework for Electrolyte Formulation Design

Authors: Boshen Zeng, Sian Chen, Xinxin Liu, Changhong Chen, Bin Deng, Xiaoxu Wang, Zhifeng Gao, Yuzhi Zhang, Weinan E, Linfeng Zhang

Abstract: Advancements in lithium battery technology heavily rely on the design and engineering of electrolytes. However, current schemes for molecular design and recipe optimization of electrolytes lack an effective computational-experimental closed loop and often fall short in accurately predicting diverse electrolyte formulation properties. In this work, we introduce Uni-ELF, a novel multi-level representation learning framework to advance electrolyte design. Our approach involves two-stage pretraining: reconstructing three-dimensional molecular structures at the molecular level using the Uni-Mol model, and predicting statistical structural properties (e.g., radial distribution functions) from molecular dynamics simulations at the mixture level. Through this comprehensive pretraining, Uni-ELF is able to capture intricate molecular and mixture-level information, which significantly enhances its predictive capability. As a result, Uni-ELF substantially outperforms state-of-the-art methods in predicting both molecular properties (e.g., melting point, boiling point, synthesizability) and formulation properties (e.g., conductivity, Coulombic efficiency). Moreover, Uni-ELF can be seamlessly integrated into an automatic experimental design workflow. We believe this innovative framework will pave the way for automated AI-based electrolyte design and engineering.

Submitted 8 July, 2024; originally announced July 2024.

arXiv:2407.06025 [pdf, other]

iLLM-TSC: Integration reinforcement learning and large language model for traffic signal control policy improvement

Authors: Aoyu Pang, Maonan Wang, Man-On Pun, Chung Shue Chen, Xi Xiong

Abstract: Urban congestion remains a critical challenge, with traffic signal control (TSC) emerging as a potent solution. TSC is often modeled as a Markov Decision Process problem and then solved using reinforcement learning (RL), which has proven effective. However, existing RL-based TSC systems often overlook imperfect observations caused by degraded communication, such as packet loss, delays, and noise, as well as rare real-life events not included in the reward function, such as unconsidered emergency vehicles. To address these limitations, we introduce a novel integration framework that combines a large language model (LLM) with RL. This framework is designed to manage overlooked elements in the reward function and gaps in state information, thereby enhancing the policies of RL agents. In our approach, RL initially makes decisions based on observed data. Subsequently, LLMs evaluate these decisions to verify their reasonableness. If a decision is found to be unreasonable, it is adjusted accordingly. Additionally, this approach can be seamlessly integrated with existing RL-based TSC systems without necessitating modifications. Extensive testing confirms that our approach reduces the average waiting time by $17.5\%$ in degraded communication conditions as compared to traditional RL methods, underscoring its potential to advance practical RL applications in intelligent transportation systems. The related code can be found at https://github.com/Traffic-Alpha/iLLM-TSC.

Submitted 8 July, 2024; originally announced July 2024.

arXiv:2407.05546 [pdf, other]

AID-AppEAL: Automatic Image Dataset and Algorithm for Content Appeal Enhancement and Assessment Labeling

Authors: Sherry X. Chen, Yaron Vaxman, Elad Ben Baruch, David Asulin, Aviad Moreshet, Misha Sra, Pradeep Sen

Abstract: We propose Image Content Appeal Assessment (ICAA), a novel metric that quantifies the level of positive interest an image's content generates for viewers, such as the appeal of food in a photograph. This is fundamentally different from traditional Image-Aesthetics Assessment (IAA), which judges an image's artistic quality. While previous studies often confuse the concepts of "aesthetics" and "appeal," our work addresses this by being the first to study ICAA explicitly. To do this, we propose a novel system that automates dataset creation and implements algorithms to estimate and boost content appeal. We use our pipeline to generate two large-scale datasets (70K+ images each) in diverse domains (food and room interior design) to train our models, which revealed little correlation between content appeal and aesthetics. Our user study, with more than 76% of participants preferring the appeal-enhanced images, confirms that our appeal ratings accurately reflect user preferences, establishing ICAA as a unique evaluative criterion. Our code and datasets are available at https://github.com/SherryXTChen/AID-Appeal.

Submitted 7 July, 2024; originally announced July 2024.

Comments: European Conference on Computer Vision

arXiv:2407.05268 [pdf, other]

Federated Knowledge Transfer Fine-tuning Large Server Model with Resource-Constrained IoT Clients

Authors: Shaoyuan Chen, Linlin You, Rui Liu, Shuo Yu, Ahmed M. Abdelmoniem

Abstract: The training of large models, involving fine-tuning, faces the scarcity of high-quality data. Compared to the solutions based on centralized data centers, updating large models in the Internet of Things (IoT) faces challenges in coordinating knowledge from distributed clients by using their private and heterogeneous data. To tackle such a challenge, we propose KOALA (Federated Knowledge Transfer Fine-tuning Large Server Model with Resource-Constrained IoT Clients) to impel the training of large models in IoT. Since the resources obtained by IoT clients are limited and restricted, it is infeasible to locally execute large models and also update them in a privacy-preserving manner. Therefore, we leverage federated learning and knowledge distillation to update large models through collaboration with their small models, which can run locally at IoT clients to process their private data separately and enable large-small model knowledge transfer through iterative learning between the server and clients. Moreover, to support clients with similar or different computing capacities, KOALA is designed with two kinds of large-small model joint learning modes, namely to be homogeneous or heterogeneous. Experimental results demonstrate that compared to the conventional approach, our method can not only achieve similar training performance but also significantly reduce the need for local storage and computing power resources.

Submitted 7 July, 2024; originally announced July 2024.

arXiv:2407.05200 [pdf, other]

First Results from the Dragonfly Ultrawide Survey: the Largest Eleven Quenched Diffuse Dwarf Galaxies in 3100 deg$^2$ with Spectroscopic Confirmation

Authors: Zili Shen, William P. Bowman, Pieter van Dokkum, Roberto G. Abraham, Imad Pasha, Michael A. Keim, Qing Liu, Deborah M. Lokhorst, Steven R. Janssens, Seery Chen

Abstract: The Dragonfly Telephoto Array employs a unique design to detect very large and diffuse galaxies, which might be missed with conventional telescopes. The Dragonfly Ultrawide Survey (DFUWS) is a new wide-field survey which will cover 10,000 deg$^2$ of the northern sky, and it provides an ideal dataset to find these large diffuse galaxies. From 3100 deg$^2$ of DFUWS data, we identified eleven large, low surface brightness galaxies as a pilot sample for spectroscopic follow-up. These are the largest galaxies in the examined area that appear smooth and isolated, with effective radii of 12"-27". Eight are below 24 $\mathrm{mag\,arcsec^{-2}}$ in central $g$-band surface brightness. Keck Cosmic Web Imager (KCWI) spectra of the diffuse light show that all eleven galaxies in this sample are quiescent, and seven qualify as ultra-diffuse galaxies (UDGs). Eight galaxies have distances between 15 and 30 Mpc, while the other three are in the Pegasus cluster at 50 Mpc. Their spectra show evidence of a $\sim 1$ Gyr old stellar population in addition to an even older stellar population. The intermediate-age component is present in group and satellite galaxies but not in the Pegasus cluster UDGs. All galaxies in this sample are detected in both Dragonfly and Legacy imaging, and the sample partially overlaps with existing UDG catalogs. This pilot sample provides an excellent training set for our analysis of the upcoming full 10,000 deg$^2$ DFUWS data, from which we may expect to discover even larger, previously-unknown galaxies.

Submitted 6 July, 2024; originally announced July 2024.

Comments: Submitted to ApJ

arXiv:2407.04795 [pdf, other]

Not all lensing is low: An analysis of DESI$\times$DES using the Lagrangian Effective Theory of LSS

Authors: S. Chen, J. DeRose, R. Zhou, M. White, S. Ferraro, C. Blake, J. U. Lange, R. H. Wechsler, J. Aguilar, S. Ahlen, D. Brooks, T. Claybaugh, K. Dawson, A. de la Macorra, P. Doel, A. Font-Ribera, E. Gaztañaga, S. Gontcho A Gontcho, G. Gutierrez, K. Honscheid, C. Howlett, R. Kehoe, D. Kirkby, T. Kisner, A. Kremin, et al. (17 additional authors not shown)

Abstract: In this work we use Lagrangian perturbation theory to analyze the harmonic space galaxy clustering signal of Bright Galaxy Survey (BGS) and Luminous Red Galaxies (LRGs) targeted by the Dark Energy Spectroscopic Instrument (DESI), combined with the galaxy-galaxy lensing signal measured around these galaxies using Dark Energy Survey Year 3 source galaxies. The BGS and LRG galaxies are extremely well characterized by DESI spectroscopy and, as a result, lens galaxy redshift uncertainty and photometric systematics contribute negligibly to the error budget of our "$2\times2$-point" analysis. On the modeling side, this work represents the first application of the spinosaurus code, implementing an effective field theory model for galaxy intrinsic alignments, and we additionally introduce a new scheme (MAIAR) for marginalizing over the large uncertainties in the redshift evolution of the intrinsic alignment signal. Furthermore, this is the first application of a hybrid effective field theory (HEFT) model for galaxy bias based on the $\texttt{Aemulus}\, ν$ simulations. Our main result is a measurement of the amplitude of the lensing signal, $S_8=σ_8 \left(Ω_m/0.3\right)^{0.5} = 0.850^{+0.042}_{-0.050}$, consistent with values of this parameter derived from the primary CMB. This constraint is artificially improved by a factor of $51\%$ if we assume a more standard, but restrictive parameterization for the redshift evolution and sample dependence of the intrinsic alignment signal, and $63\%$ if we additionally assume the nonlinear alignment model. We show that when fixing the cosmological model to the best-fit values from Planck PR4 there is $> 5 σ$ evidence for a deviation of the evolution of the intrinsic alignment signal from the functional form that is usually assumed in cosmic shear and galaxy-galaxy lensing studies.

Submitted 5 July, 2024; originally announced July 2024.

Comments: 52 pages, 24 figures, to be submitted to PRD

arXiv:2407.04607 [pdf, other]

Cosmological constraints from the cross-correlation of DESI Luminous Red Galaxies with CMB lensing from Planck PR4 and ACT DR6

Authors: Noah Sailer, Joshua Kim, Simone Ferraro, Mathew S. Madhavacheril, Martin White, Irene Abril-Cabezas, Jessica Nicole Aguilar, Steven Ahlen, J. Richard Bond, David Brooks, Etienne Burtin, Erminia Calabrese, Shi-Fan Chen, Steve K. Choi, Todd Claybaugh, Kyle Dawson, Axel de la Macorra, Joseph DeRose, Arjun Dey, Biprateep Dey, Peter Doel, Jo Dunkley, Carmen Embil-Villagra, Gerrit S. Farren, Andreu Font-Ribera, et al. (41 additional authors not shown)

Abstract: We infer the growth of large scale structure over the redshift range $0.4\lesssim z \lesssim 1$ from the cross-correlation of spectroscopically calibrated Luminous Red Galaxies (LRGs) selected from the Dark Energy Spectroscopic Instrument (DESI) legacy imaging survey with CMB lensing maps reconstructed from the latest Planck and ACT data. We adopt a hybrid effective field theory (HEFT) model that robustly regulates the cosmological information obtainable from smaller scales, such that our cosmological constraints are reliably derived from the (predominantly) linear regime. We perform an extensive set of bandpower- and parameter-level systematics checks to ensure the robustness of our results and to characterize the uniformity of the LRG sample. We demonstrate that our results are stable to a wide range of modeling assumptions, finding excellent agreement with a linear theory analysis performed on a restricted range of scales. From a tomographic analysis of the four LRG photometric redshift bins we find that the rate of structure growth is consistent with $Λ$CDM with an overall amplitude that is $\simeq5-7\%$ lower than predicted by primary CMB measurements with modest $(\sim2σ)$ statistical significance. From the combined analysis of all four bins and their cross-correlations with Planck we obtain $S_8 = 0.765\pm0.023$, which is less discrepant with primary CMB measurements than previous DESI LRG cross Planck CMB lensing results. From the cross-correlation with ACT we obtain $S_8 = 0.790^{+0.024}_{-0.027}$, while when jointly analyzing Planck and ACT we find $S_8 = 0.775^{+0.019}_{-0.022}$ from our data alone and $σ_8 = 0.772^{+0.020}_{-0.023}$ with the addition of BAO data. These constraints are consistent with the latest Planck primary CMB analyses at the $\simeq 1.6-2.2σ$ level, and are in excellent agreement with galaxy lensing surveys.

Submitted 5 July, 2024; originally announced July 2024.

Comments: 60 pages, 26 figures, comments welcome

arXiv:2407.04606 [pdf, other]

The Atacama Cosmology Telescope DR6 and DESI: Structure formation over cosmic time with a measurement of the cross-correlation of CMB Lensing and Luminous Red Galaxies

Authors: Joshua Kim, Noah Sailer, Mathew S. Madhavacheril, Simone Ferraro, Irene Abril-Cabezas, Jessica Nicole Aguilar, Steven Ahlen, J. Richard Bond, David Brooks, Etienne Burtin, Erminia Calabrese, Shi-Fan Chen, Steve K. Choi, Todd Claybaugh, Omar Darwish, Axel de la Macorra, Joseph DeRose, Mark Devlin, Arjun Dey, Peter Doel, Jo Dunkley, Carmen Embil-Villagra, Gerrit S. Farren, Andreu Font-Ribera, Jaime E. Forero-Romero, et al. (48 additional authors not shown)

Abstract: We present a high-significance cross-correlation of CMB lensing maps from the Atacama Cosmology Telescope (ACT) Data Release 6 (DR6) with spectroscopically calibrated luminous red galaxies (LRGs) from the Dark Energy Spectroscopic Instrument (DESI). We detect this cross-correlation at a significance of 38$σ$; combining our measurement with the Planck Public Release 4 (PR4) lensing map, we detect the cross-correlation at 50$σ$. Fitting this jointly with the galaxy auto-correlation power spectrum to break the galaxy bias degeneracy with $σ_8$, we perform a tomographic analysis in four LRG redshift bins spanning $0.4 \le z \le 1.0$ to constrain the amplitude of matter density fluctuations through the parameter combination $S_8^\times = σ_8 \left(Ω_m / 0.3\right)^{0.4}$. Prior to unblinding, we confirm with extragalactic simulations that foreground biases are negligible and carry out a comprehensive suite of null and consistency tests. Using a hybrid effective field theory (HEFT) model that allows scales as small as $k_{\rm max}=0.6$ $h/{\rm Mpc}$, we obtain a 3.3% constraint on $S_8^\times = σ_8 \left(Ω_m / 0.3\right)^{0.4} = 0.792^{+0.024}_{-0.028}$ from ACT data, as well as constraints on $S_8^\times(z)$ that probe structure formation over cosmic time. Our result is consistent with the early-universe extrapolation from primary CMB anisotropies measured by Planck PR4 within 1.2$σ$. Jointly fitting ACT and Planck lensing cross-correlations we obtain a 2.7% constraint of $S_8^\times = 0.776^{+0.019}_{-0.021}$, which is consistent with the Planck early-universe extrapolation within 2.1$σ$, with the lowest redshift bin showing the largest difference in mean. The latter may motivate further CMB lensing tomography analyses at $z<0.6$ to assess the impact of potential systematics or the consistency of the $Λ$CDM model over cosmic time.

Submitted 5 July, 2024; originally announced July 2024.

Comments: Prepared for submission to JCAP (47 pages, 13 figures)

arXiv:2407.04529 [pdf, other]

doi 10.1016/j.physletb.2024.138828

Spectroscopy of deeply bound orbitals in neutron-rich Ca isotopes

Authors: P. J. Li, J. Lee, P. Doornenbal, S. Chen, S. Wang, A. Obertelli, Y. Chazono, J. D. Holt, B. S. Hu, K. Ogata, Y. Utsuno, K. Yoshida, N. L. Achouri, H. Baba, F. Browne, D. Calvet, F. Château, N. Chiga, A. Corsi, M. L. Cortés, A. Delbart, J-M. Gheller, A. Giganon, A. Gillibert, C. Hilaire, et al. (63 additional authors not shown)

Abstract: The calcium isotopes are an ideal system to investigate the evolution of shell structure and magic numbers. Although the properties of surface nucleons in calcium have been well studied, probing the structure of deeply bound nucleons remains a challenge. Here, we report on the first measurement of unbound states in $^{53}$Ca and $^{55}$Ca, populated from $^{54,56}$Ca($p,pn$) reactions at a beam energy of around 216 MeV/nucleon at the RIKEN Radioactive Isotopes Beam Factory. The resonance properties, partial cross sections, and momentum distributions of these unbound states were analyzed. Orbital angular momentum $l$ assignments were extracted from momentum distributions based on calculations using the distorted wave impulse approximation (DWIA) reaction model. The resonances at excitation energies of 5516(41) keV in $^{53}$Ca and 6000(250) keV in $^{55}$Ca indicate a significant $l=3$ component, providing the first experimental evidence for the $ν0f_{7/2}$ single-particle strength of unbound hole states in the neutron-rich Ca isotopes. The observed excitation energies and cross-sections point towards extremely localized and well separated strength distributions, with some fragmentation for the $ν0f_{7/2}$ orbital in $^{55}$Ca. These results are in good agreement with predictions from shell-model calculations using the effective GXPF1Bs interaction and ab initio calculations and diverge markedly from the experimental distributions in the nickel isotones at $Z=28$.

Submitted 5 July, 2024; originally announced July 2024.

Comments: 13 pages, 7 figures

Journal ref: Phys. Lett. B, 855 (2024),138828

arXiv:2407.04388 [pdf, ps, other]

On a problem of Nathanson on non-minimal additive complements

Authors: Shi-Qiang Chen, Yuchen Ding

Abstract: Let $C$ and $W$ be two sets of integers. If $C+W=\mathbb{Z}$, then $C$ is called an additive complement to $W$. We further call $C$ a minimal additive complement to $W$ if no proper subset of $C$ is an additive complement to $W$. Answering a problem of Nathanson in part, we give sufficient conditions for $W$ to have no minimal additive complements. Our result also extends the prior result of Chen and Yang.

Submitted 5 July, 2024; originally announced July 2024.

Comments: comments are welcomed!

arXiv:2407.04362 [pdf, other]

Towards Context-aware Support for Color Vision Deficiency: An Approach Integrating LLM and AR

Authors: Shogo Morita, Yan Zhang, Takuto Yamauchi, Sinan Chen, Jialong Li, Kenji Tei

Abstract: People with color vision deficiency often face challenges in distinguishing colors such as red and green, which can complicate daily tasks and require the use of assistive tools or environmental adjustments. Current support tools mainly focus on presentation-based aids, like the color vision modes found in iPhone accessibility settings. However, offering context-aware support, like indicating the doneness of meat, remains a challenge since task-specific solutions are not cost-effective for all possible scenarios. To address this, our paper proposes an application that provides contextual and autonomous assistance. This application is mainly composed of: (i) an augmented reality interface that efficiently captures context; and (ii) a multi-modal large language model-based reasoner that serves to cognitize the context and then reason about the appropriate support contents. Preliminary user experiments with two color vision deficient users across five different scenarios have demonstrated the effectiveness and universality of our application.

Submitted 5 July, 2024; originally announced July 2024.

arXiv:2407.04029 [pdf, other]

Robust Learning under Hybrid Noise

Authors: Yang Wei, Shuo Chen, Shanshan Ye, Bo Han, Chen Gong

Abstract: Feature noise and label noise are ubiquitous in practical scenarios, which pose great challenges for training a robust machine learning model. Most previous approaches usually deal with only a single problem of either feature noise or label noise. However, in real-world applications, hybrid noise, which contains both feature noise and label noise, is very common due to the unreliable data collection and annotation processes. Although some results have been achieved by a few representation learning based attempts, this issue is still far from being addressed with promising performance and guaranteed theoretical analyses. To address the challenge, we propose a novel unified learning framework called "Feature and Label Recovery" (FLR) to combat the hybrid noise from the perspective of data recovery, where we concurrently reconstruct both the feature matrix and the label matrix of input data. Specifically, the clean feature matrix is discovered by the low-rank approximation, and the ground-truth label matrix is embedded based on the recovered features with a nuclear norm regularization. Meanwhile, the feature noise and label noise are characterized by their respective adaptive matrix norms to satisfy the corresponding maximum likelihood. As this framework leads to a non-convex optimization problem, we develop the non-convex Alternating Direction Method of Multipliers (ADMM) with the convergence guarantee to solve our learning objective. We also provide the theoretical analysis to show that the generalization error of FLR can be upper-bounded in the presence of hybrid noise. Experimental results on several typical benchmark datasets clearly demonstrate the superiority of our proposed method over the state-of-the-art robust learning approaches for various noises.

Submitted 4 July, 2024; originally announced July 2024.

arXiv:2407.03719 [pdf, other]

Relative Difficulty Distillation for Semantic Segmentation

Authors: Dong Liang, Yue Sun, Yun Du, Songcan Chen, Sheng-Jun Huang

Abstract: Current knowledge distillation (KD) methods primarily focus on transferring various structured knowledge and designing corresponding optimization goals to encourage the student network to imitate the output of the teacher network. However, introducing too many additional optimization objectives may lead to unstable training, such as gradient conflicts. Moreover, these methods ignored the guidelines of relative learning difficulty between the teacher and student networks. Inspired by human cognitive science, in this paper, we redefine knowledge from a new perspective -- the student and teacher networks' relative difficulty of samples, and propose a pixel-level KD paradigm for semantic segmentation named Relative Difficulty Distillation (RDD). We propose a two-stage RDD framework: Teacher-Full Evaluated RDD (TFE-RDD) and Teacher-Student Evaluated RDD (TSE-RDD). RDD allows the teacher network to provide effective guidance on learning focus without additional optimization goals, thus avoiding adjusting learning weights for multiple losses. Extensive experimental evaluations using a general distillation loss function on popular datasets such as Cityscapes, CamVid, Pascal VOC, and ADE20k demonstrate the effectiveness of RDD against state-of-the-art KD methods. Additionally, our research showcases that RDD can integrate with existing KD methods to improve their upper performance bound.

Submitted 4 July, 2024; originally announced July 2024.

arXiv:2407.03586 [pdf, other]

Gluon decay into heavy quark pair under a strong magnetic field

Authors: Shile Chen, Jiaxing Zhao, Pengfei Zhuang

Abstract: Due to the extremely large magnetic field produced in the initial stage of non-central heavy-ion collisions, the dynamical process of gluon decay into a heavy quark pair will take place under an external field rather than in vacuum. Unlike in the vacuum case, where the process is forbidden by energy-momentum conservation, under the external field the process becomes possible once the background energy is taken into account, which restores the conservation. We calculate the gluon decay rate at leading order under a uniform magnetic field.

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2407.02899 [pdf, other]

Measurement of the branching fraction of the decay $J/ψ\to p \bar{p} η$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (639 additional authors not shown)

Abstract: A high precision measurement of the branching fraction of the decay $J/ψ\to p \bar{p} η$ is performed using $(10 087 \pm 44) \times 10^6$ $J/ψ$ events recorded by the {BESIII} detector at the {BEPCII} storage ring. The branching fractions of the two decays $J/ψ\to p \bar{p} η(η\to γγ)$ and $J/ψ\to p \bar{p} η(η\to π^+ π^- π^0)$ are measured individually to be $\mathcal{B}(J/ψ\to p \bar{p} η(η\to γγ)) = (1.480 \pm 0.001 \pm 0.024)\times\,10^{-3}$ and $\mathcal{B}(J/ψ\to p \bar{p} η(η\to π^+ π^- π^0)) = (1.557 \pm 0.003 \pm 0.038)\times\,10^{-3}$, where the first uncertainties are statistical and the second systematic. Both results are compatible within their uncorrelated systematic uncertainties. The combined result is $\mathcal{B}(J/ψ\to p \bar{p} η)=(1.495 \pm 0.001 \pm 0.023)\times\,10^{-3}$ where the first uncertainty is the combined statistical uncertainty and the second one the combined systematic uncertainty of both analyses, incorporating correlations between them. In addition, the $p \bar{p}$ threshold region is investigated for a potential threshold enhancement, and no evidence for one is observed.

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2407.02842 [pdf, other]

MindBench: A Comprehensive Benchmark for Mind Map Structure Recognition and Analysis

Authors: Lei Chen, Feng Yan, Yujie Zhong, Shaoxiang Chen, Zequn Jie, Lin Ma

Abstract: Multimodal Large Language Models (MLLM) have made significant progress in the field of document analysis. Despite this, existing benchmarks typically focus only on extracting text and simple layout information, neglecting the complex interactions between elements in structured documents such as mind maps and flowcharts. To address this issue, we introduce the new benchmark named MindBench, which not only includes meticulously constructed bilingual authentic or synthetic images, detailed annotations, evaluation metrics and baseline models, but also specifically designs five types of structured understanding and parsing tasks. These tasks include full parsing, partial parsing, position-related parsing, structured Visual Question Answering (VQA), and position-related VQA, covering key areas such as text recognition, spatial awareness, relationship discernment, and structured parsing. Extensive experimental results demonstrate the substantial potential and significant room for improvement in current models' ability to handle structured document information. We anticipate that the launch of MindBench will significantly advance research and application development in structured document analysis technology. MindBench is available at: https://miasanlei.github.io/MindBench.github.io/.

Submitted 3 July, 2024; originally announced July 2024.

Comments: technical report

arXiv:2407.02767 [pdf, other]

Comparison of Short-Range Order in GeSn Grown by Molecular Beam Epitaxy and Chemical Vapor Deposition

Authors: Shang Liu, Yunfan Liang, Haochen Zhao, Nirosh M. Eldose, Jin-Hee Bae, Omar Concepcion, Xiaochen Jin, Shunda Chen, Ilias Bikmukhametov, Austin Akey, Cory T. Cline, Alejandra Cuervo Covian, Xiaoxin Wang, Tianshu Li, Yuping Zeng, Dan Buca, Shui-Qing Yu, Gregory J. Salamo, Shengbai Zhang, Jifeng Liu

Abstract: Atomic short-range order (SRO) in direct-bandgap GeSn for infrared photonics has recently attracted attention due to its notable impact on band structures. However, the SRO in GeSn thin films grown by different methods has hardly been compared. This paper compares SRO in GeSn thin films of similar compositions grown by molecular beam epitaxy (MBE) and chemical vapor deposition (CVD) using atom probe tomography. An $\sim$15% stronger preference for Sn-Sn 1$^{st}$ nearest neighbor (1NN) is observed in MBE GeSn than CVD GeSn for both thin film and quantum well samples with Sn composition ranging from 7 to 20%. Interestingly, samples grown by different deposition tools under the same method (either MBE or CVD) showed remarkable consistency in Sn-Sn 1NN SRO, while MBE vs. CVD showed clear differences. Supported by theoretical modeling, we consider that this difference in SRO originates from the impact of surface termination, where MBE surfaces are exposed to ultrahigh vacuum while CVD surfaces are terminated by H to a good extent. This finding not only suggests engineering surface termination or surfactants during the growth as a potential approach to control SRO in GeSn, but also provides insight into the underlying reasons for the very different growth temperatures between MBE and CVD that directly impact the strain relaxation behavior.

Submitted 2 July, 2024; originally announced July 2024.

arXiv:2407.02675 [pdf, other]

Depth-Aware Endoscopic Video Inpainting

Authors: Francis Xiatian Zhang, Shuang Chen, Xianghua Xie, Hubert P. H. Shum

Abstract: Video inpainting fills in corrupted video content with plausible replacements. While recent advances in endoscopic video inpainting have shown potential for enhancing the quality of endoscopic videos, they mainly repair 2D visual information without effectively preserving crucial 3D spatial details for clinical reference. Depth-aware inpainting methods attempt to preserve these details by incorporating depth information. Still, in endoscopic contexts, they face challenges including reliance on pre-acquired depth maps, less effective fusion designs, and ignorance of the fidelity of 3D spatial details. To address them, we introduce a novel Depth-aware Endoscopic Video Inpainting (DAEVI) framework. It features a Spatial-Temporal Guided Depth Estimation module for direct depth estimation from visual features, a Bi-Modal Paired Channel Fusion module for effective channel-by-channel fusion of visual and depth information, and a Depth Enhanced Discriminator to assess the fidelity of the RGB-D sequence comprised of the inpainted frames and estimated depth images. Experimental evaluations on established benchmarks demonstrate our framework's superiority, achieving a 2% improvement in PSNR and a 6% reduction in MSE compared to state-of-the-art methods. Qualitative analyses further validate its enhanced ability to inpaint fine details, highlighting the benefits of integrating depth information into endoscopic inpainting.

Submitted 2 July, 2024; originally announced July 2024.

Comments: Accepted by MICCAI 2024

arXiv:2407.02666 [pdf, other]

Commonsense Reasoning for Legged Robot Adaptation with Vision-Language Models

Authors: Annie S. Chen, Alec M. Lessing, Andy Tang, Govind Chada, Laura Smith, Sergey Levine, Chelsea Finn

Abstract: Legged robots are physically capable of navigating a diverse variety of environments and overcoming a wide range of obstructions. For example, in a search and rescue mission, a legged robot could climb over debris, crawl through gaps, and navigate out of dead ends. However, the robot's controller needs to respond intelligently to such varied obstacles, and this requires handling unexpected and unusual scenarios successfully. This presents an open challenge to current learning methods, which often struggle with generalization to the long tail of unexpected situations without heavy human supervision. To address this issue, we investigate how to leverage the broad knowledge about the structure of the world and commonsense reasoning capabilities of vision-language models (VLMs) to aid legged robots in handling difficult, ambiguous situations. We propose a system, VLM-Predictive Control (VLM-PC), combining two key components that we find to be crucial for eliciting on-the-fly, adaptive behavior selection with VLMs: (1) in-context adaptation over previous robot interactions and (2) planning multiple skills into the future and replanning. We evaluate VLM-PC on several challenging real-world obstacle courses, involving dead ends and climbing and crawling, on a Go1 quadruped robot. Our experiments show that by reasoning over the history of interactions and future plans, VLMs enable the robot to autonomously perceive, navigate, and act in a wide range of complex scenarios that would otherwise require environment-specific engineering or human guidance.

Submitted 2 July, 2024; originally announced July 2024.

Comments: 27 pages

arXiv:2407.02318 [pdf, other]

The Solution for Temporal Sound Localisation Task of ICCV 1st Perception Test Challenge 2023

Authors: Yurui Huang, Yang Yang, Shou Chen, Xiangyu Wu, Qingguo Chen, Jianfeng Lu

Abstract: In this paper, we propose a solution for improving the quality of temporal sound localization. We employ a multimodal fusion approach to combine visual and audio features. High-quality visual features are extracted using a state-of-the-art self-supervised pre-training network, resulting in efficient video feature representations. At the same time, audio features serve as complementary information to help the model better localize the start and end of sounds. The fused features are fed into a multi-scale Transformer for training. On the final test dataset, we achieved a mean average precision (mAP) of 0.33, obtaining the second-best performance in this track.

Submitted 1 July, 2024; originally announced July 2024.

arXiv:2407.02261 [pdf, other]

Federated Distillation for Medical Image Classification: Towards Trustworthy Computer-Aided Diagnosis

Authors: Sufen Ren, Yule Hu, Shengchao Chen, Guanjun Wang

Abstract: Medical image classification plays a crucial role in computer-aided clinical diagnosis. While deep learning techniques have significantly enhanced efficiency and reduced costs, the privacy-sensitive nature of medical imaging data complicates centralized storage and model training. Furthermore, low-resource healthcare organizations face challenges related to communication overhead and efficiency due to increasing data and model scales. This paper proposes a novel privacy-preserving medical image classification framework based on federated learning to address these issues, named FedMIC. The framework enables healthcare organizations to learn from both global and local knowledge, enhancing local representation of private data despite statistical heterogeneity. It provides customized models for organizations with diverse data distributions while minimizing communication overhead and improving efficiency without compromising performance. Our FedMIC enhances robustness and practical applicability under resource-constrained conditions. We demonstrate FedMIC's effectiveness using four public medical image datasets for classical medical image classification tasks.

Submitted 3 July, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

Comments: Work in progress. This paper is the first to introduce intra-client knowledge distillation in the context of trustworthy medical image classification. arXiv admin note: text overlap with arXiv:2401.01493

arXiv:2407.02235 [pdf]

Towards a Holistic Framework for Multimodal Large Language Models in Three-dimensional Brain CT Report Generation

Authors: Cheng-Yi Li, Kao-Jung Chang, Cheng-Fu Yang, Hsin-Yu Wu, Wenting Chen, Hritik Bansal, Ling Chen, Yi-Ping Yang, Yu-Chun Chen, Shih-Pin Chen, Jiing-Feng Lirng, Kai-Wei Chang, Shih-Hwa Chiou

Abstract: Multi-modal large language models (MLLMs) have been given free rein to explore exciting medical applications with a primary focus on radiology report generation. Nevertheless, the preliminary success in 2D radiology captioning is incompetent to reflect the real-world diagnostic challenge in the volumetric 3D anatomy. To mitigate three crucial limitation aspects in the existing literature, including (1) data complexity, (2) model capacity, and (3) evaluation metric fidelity, we collected an 18,885 text-scan pairs 3D-BrainCT dataset and applied clinical visual instruction tuning (CVIT) to train BrainGPT models to generate radiology-adherent 3D brain CT reports. Statistically, our BrainGPT scored BLEU-1 = 44.35, BLEU-4 = 20.38, METEOR = 30.13, ROUGE-L = 47.6, and CIDEr-R = 211.77 during internal testing and demonstrated an accuracy of 0.91 in captioning midline shifts on the external validation CQ500 dataset. By further inspecting the captioned report, we reported that the traditional metrics appeared to measure only the surface text similarity and failed to gauge the information density of the diagnostic purpose. To close this gap, we proposed a novel Feature-Oriented Radiology Task Evaluation (FORTE) to estimate the report's clinical relevance (lesion feature and landmarks). Notably, the BrainGPT model scored an average FORTE F1-score of 0.71 (degree=0.661; landmark=0.706; feature=0.693; impression=0.779). To demonstrate that BrainGPT models possess objective readiness to generate human-like radiology reports, we conducted a Turing test that enrolled 11 physician evaluators, and around 74% of the BrainGPT-generated captions were indistinguishable from those written by humans. Our work embodies a holistic framework that showcased the first-hand experience of curating a 3D brain CT dataset, fine-tuning anatomy-sensible language models, and proposing robust radiology evaluation metrics.

Submitted 2 July, 2024; originally announced July 2024.

Comments: 6 figures, 5 supplementary figures, 8 supplementary tables

arXiv:2407.02028 [pdf, other]

Why does in-context learning fail sometimes? Evaluating in-context learning on open and closed questions

Authors: Xiang Li, Haoran Tang, Siyu Chen, Ziwei Wang, Ryan Chen, Marcin Abram

Abstract: We measure the performance of in-context learning as a function of task novelty and difficulty for open and closed questions. For that purpose, we created a novel benchmark consisting of hard scientific questions, each paired with a context of various relevancy. We show that counter-intuitively, a context that is more aligned with the topic does not always help more than a less relevant context. This effect is especially visible for open questions and questions of high difficulty or novelty. This result reveals a fundamental difference between the treatment of close-form and open-form questions by large-language models and shows a need for a more robust evaluation of in-context learning on the variety of different types of questions. It also poses a new question of how to optimally select a context for large language models, especially in the context of Retrieval Augmented Generation (RAG) systems. Our results suggest that the answer to this question can be highly application-dependent and might be contingent on factors including the format of the question, the perceived difficulty level of the questions, and the novelty or popularity of the information we seek.

Submitted 2 July, 2024; originally announced July 2024.

Comments: 8 pages plus references, 4 main figures, 6 pages of supplementary material

arXiv:2407.01820 [pdf, other]

Exploring the Role of Randomization on Belief Rigidity in Online Social Networks

Authors: Adiba Mahbub Proma, Neeley Pate, Raiyan Abdul Baten, Sifeng Chen, James Druckman, Gourab Ghoshal, Ehsan Hoque

Abstract: People often stick to their existing beliefs, ignoring contradicting evidence or only interacting with those who reinforce their views. Social media platforms often facilitate such tendencies of homophily and echo-chambers as they promote highly personalized content to maximize user engagement. However, increased belief rigidity can negatively affect real-world policy decisions such as leading to climate change inaction and increased vaccine hesitancy. To understand and effectively tackle belief rigidity on online social networks, designing and evaluating various intervention strategies is crucial, and increasing randomization in the network can be considered one such intervention. In this paper, we empirically quantify the effects of a randomized social network structure on belief rigidity, specifically examining the potential benefits of introducing randomness into the network. We show that individuals' beliefs are positively influenced by peer opinions, regardless of whether those opinions are similar to or differ from their own by passively sensing belief rigidity through our experimental framework. Moreover, people incorporate a slightly higher variety of different peers (based on their opinions) into their networks when the recommendation algorithm provides them with diverse content, compared to when it provides them with similar content. Our results indicate that in some cases, there might be benefits to randomization, providing empirical evidence that a more randomized network could be a feasible way of helping people get out of their echo-chambers. Our findings have broader implications in computing and platform design of social media, and can help combat overly rigid beliefs in online social networks.

Submitted 1 July, 2024; originally announced July 2024.

arXiv:2407.01458 [pdf, other]

Contractual Reinforcement Learning: Pulling Arms with Invisible Hands

Authors: Jibang Wu, Siyu Chen, Mengdi Wang, Huazheng Wang, Haifeng Xu

Abstract: The agency problem emerges in today's large scale machine learning tasks, where the learners are unable to direct content creation or enforce data collection. In this work, we propose a theoretical framework for aligning economic interests of different stakeholders in the online learning problems through contract design. The problem, termed \emph{contractual reinforcement learning}, naturally arises from the classic model of Markov decision processes, where a learning principal seeks to optimally influence the agent's action policy for their common interests through a set of payment rules contingent on the realization of next state. For the planning problem, we design an efficient dynamic programming algorithm to determine the optimal contracts against the far-sighted agent. For the learning problem, we introduce a generic design of no-regret learning algorithms to untangle the challenges from robust design of contracts to the balance of exploration and exploitation, reducing the complexity analysis to the construction of efficient search algorithms. For several natural classes of problems, we design tailored search algorithms that provably achieve $\tilde{O}(\sqrt{T})$ regret. We also present an algorithm with $\tilde{O}(T^{2/3})$ for the general problem that improves the existing analysis in online contract design with mild technical assumptions.

Submitted 2 July, 2024; v1 submitted 1 July, 2024; originally announced July 2024.

arXiv:2407.01027 [pdf, other]

Blind Inversion using Latent Diffusion Priors

Authors: Weimin Bai, Siyi Chen, Wenzheng Chen, He Sun

Abstract: Diffusion models have emerged as powerful tools for solving inverse problems due to their exceptional ability to model complex prior distributions. However, existing methods predominantly assume known forward operators (i.e., non-blind), limiting their applicability in practical settings where acquiring such operators is costly. Additionally, many current approaches rely on pixel-space diffusion models, leaving the potential of more powerful latent diffusion models (LDMs) underexplored. In this paper, we introduce LatentDEM, an innovative technique that addresses more challenging blind inverse problems using latent diffusion priors. At the core of our method is solving blind inverse problems within an iterative Expectation-Maximization (EM) framework: (1) the E-step recovers clean images from corrupted observations using LDM priors and a known forward model, and (2) the M-step estimates the forward operator based on the recovered images. Additionally, we propose two novel optimization techniques tailored for LDM priors and EM frameworks, yielding more accurate and efficient blind inversion results. As a general framework, LatentDEM supports both linear and non-linear inverse problems. Beyond common 2D image restoration tasks, it enables new capabilities in non-linear 3D inverse rendering problems. We validate LatentDEM's performance on representative 2D blind deblurring and 3D sparse-view reconstruction tasks, demonstrating its superior efficacy over prior arts.

Submitted 1 July, 2024; originally announced July 2024.

arXiv:2407.00856 [pdf, other]

Drone-Based Antenna Beam Calibration in the High Arctic

Authors: Lawrence Herman, Christopher Barbarie, Mohan Agrawal, Vlad Calinescu, Simon Chen, H. Cynthia Chiang, Cherie K. Day, Eamon Egan, Stephen Fay, Kit Gerodias, Maya Goss, Michael Hétu, Daniel C. Jacobs, Marc-Olivier R. Lalonde, Francis McGee, Loïc Miara, John Orlowski-Scherer, Jonathan Sievers

Abstract: The development of low-frequency radio astronomy experiments for detecting 21-cm line emission from hydrogen presents new opportunities for creative solutions to the challenge of characterizing an antenna beam pattern. The Array of Long Baseline Antennas for Taking Radio Observations from the Seventy-ninth parallel (ALBATROS) is a new radio interferometer sited in the Canadian high Arctic that aims to map Galactic foregrounds at frequencies below $\sim$30 MHz. We present PteroSoar, a custom-built hexacopter outfitted with a transmitter, that will be used to characterize the beam patterns of ALBATROS and other experiments. The PteroSoar drone hardware is motivated by the need for user-servicing at remote sites and environmental factors that are unique to the high Arctic. In particular, magnetic heading is unreliable because the magnetic field lines near the north pole are almost vertical. We therefore implement moving baseline real time kinematic (RTK) positioning with two GPS units to obtain heading solutions with $\sim$1$^\circ$ accuracy. We present a preliminary beam map of an ALBATROS antenna, thus demonstrating successful PteroSoar operation in the high Arctic.

Submitted 30 June, 2024; originally announced July 2024.

arXiv:2407.00628 [pdf, other]

Chirality-Induced Majorana Polarization

Authors: Song Chen, Hua-Hua Fu

Abstract: Realizing Majorana fermions with novel physical features is a key but difficult task in topological superconductors. Here we propose another platform to generate Majorana zero modes (MZMs), constructed either from a single opened circular helix molecule (CHM) coupled with an s-wave superconductor (with magnetic field) or from an interlinked-CHM chain coupled with a phase-bias s-wave superconducting heterostructure (without any magnetic field). The MZMs achieved here are tightly associated with the structural chirality of the CHMs. Importantly, left and right handedness may result in completely opposite Majorana polarization (MP), and the local MP is associated with the chirality-induced spin polarization. These properties provide multiple effective ways to detect and regulate the MZMs by using the chirality-induced spin selectivity (CISS) effect and the related spin-polarized currents in chiral materials.

Submitted 30 June, 2024; originally announced July 2024.

arXiv:2407.00185 [pdf, other]

Shape optimization of non-matching isogeometric shells with moving intersections

Authors: Han Zhao, John T. Hwang, J. S. Chen

Abstract: While shape optimization using isogeometric shells exhibits appealing features by integrating design geometries and analysis models, challenges arise when addressing computer-aided design (CAD) geometries comprised of multiple non-uniform rational B-splines (NURBS) patches, which are common in practice. The intractability stems from surface intersections within these CAD models. In this paper, we develop an approach for shape optimization of non-matching isogeometric shells incorporating intersection movement. Separately parametrized NURBS surfaces are modeled using Kirchhoff--Love shell theory and coupled using a penalty-based formulation. The optimization scheme allows shell patches to move without preserving relative location with other members during the shape optimization. This flexibility is achieved through an implicit state function, and analytical sensitivities are derived for the relative movement of shell patches. The introduction of differentiable intersections expands the design space and overcomes challenges associated with large mesh distortion, particularly when optimal shapes involve significant movement of patch intersections in physical space. Throughout optimization iterations, all members within the shell structures maintain the NURBS geometry representation, enabling efficient integration of analysis and design models. The optimization approach leverages the multilevel design concept by selecting a refined model for accurate analysis from a coarse design model while maintaining the same geometry. We adopt several example problems to verify the effectiveness of the proposed scheme and demonstrate its applicability to the optimization of the internal stiffeners of an aircraft wing.

Submitted 28 June, 2024; originally announced July 2024.

Comments: 41 pages, 18 figures

arXiv:2407.00136 [pdf, other]

Observation of the Electromagnetic Dalitz Transition $h_c \rightarrow e^+e^-η_c$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, S. Ahmed, M. Albrecht, R. Aliberti, A. Amoroso, M. R. An, Q. An, X. H. Bai, Y. Bai, O. Bakina, R. Baldini Ferroli, I. Balossino, Y. Ban, K. Begzsuren, N. Berger, M. Bertani, D. Bettoni, F. Bianchi, J. Bloms, A. Bortone, I. Boyko, R. A. Briere, et al. (495 additional authors not shown)

Abstract: Using $(27.12\pm 0.14)\times10^8$ $ψ(3686)$ decays and data samples of $e^+e^-$ collisions with $\sqrt{s}$ from 4.130 to 4.780~GeV collected with the BESIII detector, we report the first observation of the electromagnetic Dalitz transition $h_c\to e^+e^-η_c$ with a statistical significance of $5.4σ$. We measure the ratio of the branching fractions $\frac{\mathcal{B}(h_c\rightarrow e^+e^-η_c)}{\mathcal{B}(h_c\rightarrow γη_c)}$ separately for the $h_c$ samples produced via $ψ(3686)\toπ^0h_c$ and $e^+e^-\toπ^+π^-h_c$. The average ratio is determined to be $(0.59\pm0.10(\text{stat.})\pm0.04(\text{syst.}))\%$, where the uncertainty includes both statistical and systematic components.

Submitted 2 July, 2024; v1 submitted 28 June, 2024; originally announced July 2024.

Showing 1–50 of 5,652 results for author: Chen, S