Showing 1–50 of 11,998 results for author: Li, X


arXiv:2407.11851 [pdf, ps, other]

quant-ph

Atom Cavity Encoding for NP-Complete Problems

Authors: Meng Ye, Xiaopeng Li

Abstract: We consider an atom-cavity system with long-range atomic interactions mediated by cavity modes. It has been shown that quantum simulations of spin models in this system can naturally be used to solve number partitioning problems. Here, we present encoding schemes for numerous NP-complete problems, covering the majority of Karp's 21 NP-complete problems. We find that a number of these computational problems can be encoded in the atom-cavity system at a cost linear in the atom number. Certain problems, such as quadratic unconstrained binary optimization (QUBO) and the Hamiltonian cycle, cannot be encoded as efficiently; for these, we provide encoding schemes whose cost is quadratic or quartic in the atom number. We expect this work to provide important guidance in the search for a practical quantum advantage of the atom-cavity system in solving NP-complete problems. Moreover, the encoding schemes developed here may also be adopted in other optical systems for solving NP-complete problems, wherever a Mattis-type spin-glass Hamiltonian similar to that of the atom-cavity system can be implemented.

Submitted 16 July, 2024; originally announced July 2024.

Comments: 25 pages, 1 table
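
The abstract builds on the fact that number partitioning maps onto a spin model. A minimal sketch of the standard Ising formulation (not the authors' atom-cavity encoding): assign each number $a_i$ a spin $s_i = \pm 1$ and minimize $H = (\sum_i a_i s_i)^2$. Expanding the square gives all-to-all couplings $J_{ij} = a_i a_j$, which have the rank-one Mattis form; $H = 0$ exactly when the spins split the numbers into two equal-sum subsets. The brute-force search below stands in for the physical quantum simulation and is exponential in the number of spins.

```python
from itertools import product

def mattis_energy(numbers, spins):
    """H = (sum_i a_i s_i)^2, i.e. couplings J_ij = a_i a_j (Mattis form)."""
    return sum(a * s for a, s in zip(numbers, spins)) ** 2

def best_partition(numbers):
    """Exhaustive ground-state search over all 2^n spin configurations."""
    best_spins, best_energy = None, None
    for spins in product((-1, 1), repeat=len(numbers)):
        energy = mattis_energy(numbers, spins)
        if best_energy is None or energy < best_energy:
            best_spins, best_energy = spins, energy
    return best_spins, best_energy

numbers = [4, 5, 6, 7, 8]
spins, energy = best_partition(numbers)
# 4 + 5 + 6 = 15 = 7 + 8, so a perfect partition exists and the ground-state energy is 0.
print(spins, energy)
```

Spins with $s_i = +1$ form one subset and $s_i = -1$ the other; a nonzero ground-state energy means no perfect partition exists.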
arXiv:2407.11836 [pdf, other]

cond-mat.mtrl-sci

Magnetic memory and distinct spin populations in ferromagnetic Co3Sn2S2

Authors: Charles Menil, Brigitte Leridon, Antonella Cavanna, Ulf Gennser, Dominique Mailly, Linchao Ding, Xiaokang Li, Zengwei Zhu, Benoît Fauqué, Kamran Behnia

Abstract: Co3Sn2S2, a ferromagnetic Weyl semimetal with Co atoms on a kagome lattice, has attracted much recent attention. Experiments have identified a temperature scale below the Curie temperature. Here, we find that this magnet retains a memory when it is not exposed to a magnetic field large enough to erase it. We identify the driver of this memory effect as a small secondary population of spins whose coercive field is significantly larger than that of the majority spins. The magnetization hysteresis curve has a threshold magnetic field set by the demagnetizing factor. These two field scales set the hitherto unidentified temperature scale, which is not a thermodynamic phase transition but a crossing point between metastable boundaries. Global magnetization is well defined even when it is non-uniform, but drastic variations in local magnetization point to a rugged energy landscape, with the thermodynamic limit not reached at micrometer length scales.

Submitted 16 July, 2024; originally announced July 2024.
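
For context on how a demagnetizing factor can set a threshold field, here is the simplest textbook picture, stated as an assumption since the abstract gives no formula. In SI units, with demagnetizing factor $0 \le N_d \le 1$, the internal field is reduced by the sample's own magnetization. While domains are still rearranging the internal field stays near zero, so the magnetization grows linearly with the applied field until it saturates at $M_s$:

```latex
H_{\mathrm{int}} = H_{\mathrm{ext}} - N_d M, \qquad
H_{\mathrm{int}} \approx 0 \;\Rightarrow\; M \approx \frac{H_{\mathrm{ext}}}{N_d}
\quad (M < M_s), \qquad
H_{\mathrm{th}} \approx N_d M_s .
```

For a thin plate magnetized perpendicular to its plane, $N_d \to 1$, so in this picture the threshold approaches $M_s$ itself; the actual sample geometry in the study is not stated in the abstract.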
arXiv:2407.11812 [pdf, ps, other]

cs.LG q-bio.QM

DFDRNN: A dual-feature based neural network for drug repositioning

Authors: Enqiang Zhu, Xiang Li, Chanjuan Liu, Nikhil R. Pal

Abstract: Drug repositioning is an economically efficient strategy for discovering new indications for existing drugs beyond their original approvals, expanding their applicability to address challenges in disease treatment. In recent years, deep-learning techniques for drug repositioning have gained much attention. Most deep-learning-based methods encode drugs and diseases by extracting feature information from their neighbors in the network, but they often pay little attention to the potential relationships between the features of drugs and diseases, which leads to imprecise encoding. To address this, we design a dual-feature drug repositioning neural network (DFDRNN) model that encodes drugs and diseases precisely. DFDRNN represents drugs and diseases with two features: a similarity feature and an association feature. Using a self-attention mechanism, the model builds two dual-feature extraction modules: the intra-domain dual-feature extraction (IntraDDFE) module and the inter-domain dual-feature extraction (InterDDFE) module. The IntraDDFE module extracts features from a single domain (the drug or the disease domain), while the InterDDFE module extracts features from the mixed drug-disease domain. In particular, InterDDFE updates the features, ensuring a precise encoding of drugs and diseases. Finally, a cross-dual-domain decoder predicts drug-disease associations in both the drug and disease domains. Compared with six state-of-the-art methods, DFDRNN outperforms them on four benchmark datasets, with an average AUROC of 0.946 and an average AUPR of 0.597.

Submitted 16 July, 2024; originally announced July 2024.
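
The results above are reported as AUROC. As a reminder of what that number measures, here is a minimal sketch (a generic metric definition, not DFDRNN's evaluation code): AUROC equals the probability that a randomly chosen true association is scored higher than a randomly chosen non-association, with ties counted as half (the Mann-Whitney formulation). The scores and labels are hypothetical.

```python
def auroc(scores, labels):
    """Probability that a random positive outranks a random negative (ties = 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical drug-disease association scores and ground-truth labels.
scores = [0.9, 0.4, 0.5, 0.6, 0.2]
labels = [1, 1, 0, 1, 0]
print(auroc(scores, labels))  # 5 of 6 positive-negative pairs are ranked correctly
```

The pairwise loop is O(P·N) and meant only for clarity; a rank-based formula gives the same value in O(n log n).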
arXiv:2407.11737 [pdf, other]

astro-ph.HE astro-ph.CO hep-ph

A $\sim 43$ GeV $γ$-ray line signature in the directions of a group of nearby massive galaxy clusters

Authors: Yi-Zhong Fan, Zhao-Qiang Shen, Yun-Feng Liang, Xiang Li, Kai-Kai Duan, Zi-Qing Xia, Xiao-Yuan Huang, Lei Feng, Qiang Yuan

Abstract: As the largest gravitationally bound objects in the Universe, galaxy clusters provided the first evidence for the presence of dark matter and may be suitable targets for indirect dark matter searches. Among the various signals, a GeV-TeV $γ$-ray line has been regarded as the smoking-gun signal of dark matter annihilation or decay, since no known astrophysical or physical process could generate such a peculiar spectrum. Using 15.5 years of publicly available Fermi-LAT P8R3 data, we search for $γ$-ray line emission in the directions of 13 nearby massive galaxy clusters with an unbinned likelihood analysis. A $γ$-ray line signal at $\sim 43.2$ GeV has a net TS value of $\approx 30$ if we consider only the data in the directions of the Virgo, Fornax, and Ophiuchus clusters, the three massive clusters with the highest J-factors and hence the largest expected dark matter annihilation signal. The signal is still present when the data of the other 10 nearby massive clusters are included, though the TS value decreases to $\approx 21$, likely because of their lower signal-to-noise ratios. The absence of this signal in the inner Galaxy disfavors both an instrumental origin and the canonical dark matter annihilation interpretation; a more sophisticated dark matter model or a very peculiar astrophysical scenario might be needed. This $γ$-ray line signal, if intrinsic, could be unambiguously verified by the Very Large Area $γ$-ray Space Telescope within its first two years of operation.

Submitted 16 July, 2024; originally announced July 2024.

Comments: 9 pages, 7 figures, 1 table. Comments are welcome!
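
To make the quoted TS values concrete, here is a toy sketch of an unbinned extended likelihood line search. It is not the authors' Fermi-LAT pipeline: the power-law background index, energy window, Gaussian line width, fixed line energy, and the simplification $n_{\mathrm{bkg}} = N - n_{\mathrm{sig}}$ are all illustrative assumptions, and the photon energies are made up. TS is twice the log-likelihood gain from adding the line; with one extra free parameter, $\sqrt{\mathrm{TS}}$ is roughly the local significance before trial factors.

```python
import math

def power_law_pdf(e, gamma, e_min, e_max):
    """Background spectrum E^-gamma normalized on [e_min, e_max] (gamma != 1)."""
    norm = (1.0 - gamma) / (e_max ** (1.0 - gamma) - e_min ** (1.0 - gamma))
    return norm * e ** (-gamma)

def gaussian_pdf(e, mu, sigma):
    """Line profile smeared by a Gaussian energy resolution (edge truncation ignored)."""
    return math.exp(-0.5 * ((e - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def log_likelihood(energies, n_sig, line_e, sigma, gamma, e_min, e_max):
    """Extended unbinned log-likelihood with the background count fixed to N - n_sig."""
    n_bkg = len(energies) - n_sig
    ll = -(n_bkg + n_sig)  # Poisson term -N_pred (constant here, since n_bkg + n_sig = N)
    for e in energies:
        rate = (n_bkg * power_law_pdf(e, gamma, e_min, e_max)
                + n_sig * gaussian_pdf(e, line_e, sigma))
        ll += math.log(rate)
    return ll

def line_ts(energies, line_e=43.2, sigma=2.0, gamma=2.5,
            e_min=10.0, e_max=100.0, step=0.1):
    """TS = 2 [ln L(best n_sig) - ln L(n_sig = 0)], with n_sig scanned on a grid."""
    args = (line_e, sigma, gamma, e_min, e_max)
    ll_null = log_likelihood(energies, 0.0, *args)
    n_steps = int((len(energies) - 1) / step)  # keeps n_bkg > 0 so log() is defined
    ll_best = max(log_likelihood(energies, k * step, *args) for k in range(n_steps + 1))
    return 2.0 * (ll_best - ll_null)

# Toy photon energies in GeV (not real data): a smooth spread plus a cluster near 43 GeV.
with_line = [10.5, 12.0, 15.0, 20.0, 25.0, 30.0, 43.0, 43.1, 43.2, 43.3, 60.0, 80.0]
ts = line_ts(with_line)
print(f"TS = {ts:.1f}, local significance ~ {math.sqrt(ts):.1f} sigma")
```

Because $n_{\mathrm{sig}} = 0$ is part of the scan, TS is never negative; removing the photons near 43 GeV drives it to zero.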
arXiv:2407.11727 [pdf, ps, other]

hep-ex hep-ph

Measurement of the branching fraction of $D^+_s\to \ell^+ν_\ell$ via $e^+e^-\to D^{*+}_{s} D^{*-}_{s}$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (634 additional authors not shown)

Abstract: Based on $10.64~\mathrm{fb}^{-1}$ of $e^+e^-$ collision data taken at center-of-mass energies between 4.237 and 4.699 GeV with the BESIII detector, we study the leptonic $D^+_s$ decays using the $e^+e^-\to D^{*+}_{s} D^{*-}_{s}$ process. The branching fractions of $D_s^+\to\ell^+ν_{\ell}\,(\ell=μ,τ)$ are measured to be $\mathcal{B}(D_s^+\toμ^+ν_μ)=(\bfmuv)\%$ and $\mathcal{B}(D_s^+\toτ^+ν_τ)=(\bftauv)\%$, respectively. The product of the decay constant and the Cabibbo-Kobayashi-Maskawa matrix element $|V_{cs}|$ is determined to be $f_{D_s^+}|V_{cs}|=(\mufdsxvcsresult)_{μν}~\mathrm{MeV}$ and $f_{D_s^+}|V_{cs}|=(\taufdsxvcsresult)_{τν}~\mathrm{MeV}$, respectively. Taking the value of $|V_{cs}|$ from a global fit in the Standard Model, we obtain ${f_{D^+_s}}=(\mufdsresult)_{μν}$\,MeV and ${f_{D^+_s}}=(\taufdsresult)_{τν}$\,MeV, respectively. Conversely, taking the value of $f_{D_s^+}$ from the latest lattice quantum chromodynamics calculation, we obtain $|V_{cs}| =(\muvcsresult)_{μν}$ and $|V_{cs}| = (\tauvcsresult)_{τν}$, respectively.

Submitted 16 July, 2024; originally announced July 2024.

Comments: 27 pages, 13 figures
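
The step from a measured branching fraction to $f_{D_s^+}|V_{cs}|$ rests on the standard lowest-order Standard Model expression for the leptonic width (radiative corrections neglected; the exact inputs used by the authors are not given in the abstract):

```latex
\Gamma(D_s^+\to\ell^+\nu_\ell)
  = \frac{G_F^2}{8\pi}\, f_{D_s^+}^2\, |V_{cs}|^2\, m_\ell^2\, m_{D_s^+}
    \left(1-\frac{m_\ell^2}{m_{D_s^+}^2}\right)^2,
\qquad
\mathcal{B}(D_s^+\to\ell^+\nu_\ell) = \Gamma(D_s^+\to\ell^+\nu_\ell)\,\tau_{D_s^+}.
```

Solving for $f_{D_s^+}|V_{cs}|$ gives the product quoted above; fixing either factor from an external input (a global fit for $|V_{cs}|$ or lattice QCD for $f_{D_s^+}$) yields the other. The $m_\ell^2$ helicity-suppression factor is why the $τν$ mode is roughly ten times more frequent than the $μν$ mode despite its smaller phase space.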
arXiv:2407.11702 [pdf, ps, other]

math.AP

Dynamics for a diffusive epidemic model with a free boundary: sharp asymptotic profile

Authors: Xueping Li, Lei Li, Mingxin Wang

Abstract: This paper concerns the sharp asymptotic profiles of the solution of a diffusive epidemic model with one free boundary and one fixed boundary, where the fixed boundary is subject to either the homogeneous Dirichlet or the homogeneous Neumann boundary condition. The long-time behavior has been shown in \cite{LL} to be governed by a spreading-vanishing dichotomy, and when spreading happens, the spreading speed was determined in \cite{LLW}. In this paper, by constructing subtle upper and lower solutions and employing detailed analysis, we improve the results of \cite{LLW} and obtain sharp asymptotic spreading profiles, which show that the homogeneous Dirichlet and Neumann boundary conditions imposed at the fixed boundary $x=0$ lead to the same asymptotic behaviors of $h(t)$ and $(u,v)$ near the spreading front $h(t)$.

Submitted 16 July, 2024; originally announced July 2024.
arXiv:2407.11663 [pdf, other]

cs.CV

Affective Behavior Analysis using Task-adaptive and AU-assisted Graph Network

Authors: Xiaodong Li, Wenchao Du, Hongyu Yang

Abstract: In this paper, we present our solution and experimental results for the Multi-Task Learning Challenge of the 7th Affective Behavior Analysis in-the-wild (ABAW7) Competition. This challenge consists of three tasks: action unit (AU) detection, facial expression recognition (EXPR), and valence-arousal (VA) estimation. We address the research problems of this challenge from three aspects: 1) to learn robust visual feature representations, we introduce the pre-trained large model DINOv2; 2) to adaptively extract the features required by each task, we design a task-adaptive block that performs cross-attention between a set of learnable query vectors and pre-extracted features; 3) by proposing the AU-assisted Graph Convolutional Network (AU-GCN), we make full use of the correlation information between AUs to assist in solving the EXPR and VA tasks. Finally, we achieve an evaluation score of 1.2542 on the validation set provided by the organizers.

Submitted 16 July, 2024; originally announced July 2024.
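
The task-adaptive block described above performs cross-attention between learnable queries and pre-extracted features. A minimal single-head sketch of scaled dot-product cross-attention, $\mathrm{softmax}(QK^\top/\sqrt{d})V$, in pure Python; the authors' block presumably adds learned projections, multiple heads, and trained query parameters, none of which are modeled here, and the tiny matrices are hypothetical.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """softmax(Q K^T / sqrt(d)) V for lists of row vectors.

    queries: n_q x d (e.g. one set of task queries); keys: n_k x d;
    values: n_k x d_v (e.g. features from a frozen backbone).
    """
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

task_queries = [[1.0, 0.0], [0.0, 1.0]]                 # stand-ins for learnable queries
features = [[1.0, 0.2], [0.1, 1.0], [0.5, 0.5]]         # stand-ins for backbone features
print(cross_attention(task_queries, features, features))  # one pooled vector per query
```

Each query yields one output vector, so the number of queries (not the number of patches) fixes the size of the task-specific representation.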
arXiv:2407.11654 [pdf, other]

cs.LG cs.AI eess.SP

R-SFLLM: Jamming Resilient Framework for Split Federated Learning with Large Language Models

Authors: Aladin Djuhera, Vlad C. Andrei, Xinyang Li, Ullrich J. Mönich, Holger Boche, Walid Saad

Abstract: Split federated learning (SFL) is a compute-efficient paradigm in distributed machine learning (ML), in which components of large ML models are outsourced to remote servers. A significant challenge in SFL, particularly when deployed over wireless channels, is the susceptibility of transmitted model parameters to adversarial jamming that could jeopardize the learning process. This is particularly pronounced for word embedding parameters in large language models (LLMs), which are crucial for language understanding. In this paper, rigorous insights are provided into the influence of jamming LLM word embeddings in SFL by deriving an expression for the ML training loss divergence and showing that it is upper-bounded by the mean squared error (MSE). Based on this analysis, a physical-layer framework is developed for resilient SFL with LLMs (R-SFLLM) over wireless networks. R-SFLLM leverages wireless sensing data to gather information on the jamming directions-of-arrival (DoAs) in order to devise a novel, sensing-assisted anti-jamming strategy while jointly optimizing beamforming, user scheduling, and resource allocation. Extensive experiments using BERT and RoBERTa models demonstrate R-SFLLM's effectiveness, achieving close-to-baseline performance across various natural language processing (NLP) tasks and datasets. The proposed methodology further introduces an adversarial training component, in which controlled noise exposure significantly enhances the LLM's resilience to perturbed parameters during training. The results show that more noise-sensitive models, such as RoBERTa, benefit from this feature, especially when resource allocation is unfair. It is also shown that worst-case jamming in particular translates into worst-case model outcomes, underscoring the need for jamming-resilient SFL protocols.

Submitted 16 July, 2024; originally announced July 2024.
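
The abstract ties jamming damage to the MSE of received embedding parameters and describes "controlled noise exposure" during training. A minimal sketch of that idea, assuming additive i.i.d. Gaussian noise as a stand-in for the channel perturbation (the actual jamming and noise models are not specified in the abstract): perturb an embedding vector and measure the resulting MSE, which for Gaussian noise concentrates around $\sigma^2$.

```python
import random

def perturb(embedding, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation sigma to every coordinate."""
    return [x + rng.gauss(0.0, sigma) for x in embedding]

def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

rng = random.Random(0)
clean = [0.5] * 10000             # hypothetical flattened embedding parameters
noisy = perturb(clean, 0.1, rng)  # what a noise-exposed training step would see
print(mse(clean, noisy))          # close to sigma**2 = 0.01
```

In a noise-exposure training loop one would call something like `perturb` on the embeddings at each forward pass, with `sigma` chosen to match the expected channel distortion.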
arXiv:2407.11644 [pdf, other]

cs.CV cs.RO

Perception Helps Planning: Facilitating Multi-Stage Lane-Level Integration via Double-Edge Structures

Authors: Guoliang You, Xiaomeng Chu, Yifan Duan, Wenyu Zhang, Xingchen Li, Sha Zhang, Yao Li, Jianmin Ji, Yanyong Zhang

Abstract: When planning for autonomous driving, it is crucial to consider essential traffic elements such as lanes, intersections, traffic regulations, and dynamic agents. However, traditional end-to-end planning methods often overlook them, which can lead to inefficiencies and non-compliance with traffic regulations. In this work, we integrate the perception of these elements into the planning task. To this end, we propose Perception Helps Planning (PHP), a novel framework that reconciles lane-level planning with perception. This integration ensures that planning is inherently aligned with traffic constraints, facilitating safe and efficient driving. Specifically, PHP focuses on both edges of a lane for planning and perception, taking into account the 3D positions of both lane edges as well as attributes for lane intersections, lane directions, lane occupancy, and planning. In the algorithmic design, a transformer first encodes multi-camera images to extract these features and predict lane-level perception results. Next, a hierarchical feature early-fusion module refines the features for predicting planning attributes. Finally, a double-edge interpreter applies a late-fusion process specifically designed to integrate lane-level perception and planning information, culminating in the generation of vehicle control signals. Experiments on three CARLA benchmarks show significant improvements in driving score of 27.20%, 33.47%, and 15.54%, respectively, over existing algorithms, achieving state-of-the-art performance with the system running at up to 22.57 FPS.

Submitted 16 July, 2024; originally announced July 2024.
arXiv:2407.11420 [pdf, other]

cs.RO

iKalibr: Unified Targetless Spatiotemporal Calibration for Resilient Integrated Inertial Systems

Authors: Shuolong Chen, Xingxing Li, Shengyu Li, Yuxuan Zhou, Xiaoteng Yang

Abstract: The integrated inertial system, typically combining an IMU with an exteroceptive sensor such as radar, LiDAR, or a camera, has been widely adopted in modern robotic applications for ego-motion estimation, motion control, and autonomous exploration. To improve system accuracy, robustness, and usability, multiple heterogeneous sensors are generally integrated in a resilient manner, which benefits system performance in terms of failure tolerance, perception capability, and environment compatibility. For such systems, accurate and consistent spatiotemporal calibration is required to maintain a unified spatiotemporal framework for multi-sensor fusion. Since most existing calibration methods (i) are oriented to specific integrated inertial systems, (ii) often focus only on spatial determination, and (iii) usually require artificial targets, which limits convenience and usability, we propose iKalibr: a unified targetless spatiotemporal calibration framework for resilient integrated inertial systems that overcomes these issues and enables both accurate and consistent calibration. iKalibr currently supports four commonly employed sensors: IMU, radar, LiDAR, and camera. The proposed method starts with a rigorous and efficient dynamic initialization, in which all parameters in the estimator are accurately recovered. Following that, several continuous-time batch optimizations are carried out to refine the initialized parameters toward globally optimal values. Extensive real-world experiments were conducted to verify the feasibility and evaluate the calibration performance of iKalibr. The results demonstrate that iKalibr achieves accurate resilient spatiotemporal calibration. Our implementation is open-sourced at https://github.com/Unsigned-Long/iKalibr to benefit the research community.

Submitted 16 July, 2024; originally announced July 2024.
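
Temporal calibration means recovering the time offset between sensor streams. iKalibr does this inside a continuous-time batch optimization; the sketch below instead shows a simpler, generic technique that is often used only for a coarse initial guess: pick the integer lag that maximizes the mean-centered cross-correlation of two signals resampled to a common rate. The signals, 100 Hz rate, and 5-sample delay are hypothetical.

```python
import random

def estimate_lag(reference, delayed, max_lag):
    """Integer lag (in samples) maximizing sum_t ref[t] * delayed[t + lag],
    averaged over the overlapping samples, with both signals mean-centered."""
    def centered(xs):
        m = sum(xs) / len(xs)
        return [x - m for x in xs]

    a, b = centered(reference), centered(delayed)
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        pairs = [(a[t], b[t + lag]) for t in range(len(a)) if 0 <= t + lag < len(b)]
        if not pairs:
            continue
        score = sum(x * y for x, y in pairs) / len(pairs)
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

# Hypothetical angular-speed magnitudes from two sensors sampled at 100 Hz;
# the second stream lags the first by 5 samples (50 ms).
rng = random.Random(1)
imu = [abs(rng.gauss(0.0, 1.0)) for _ in range(300)]
other = [0.0] * 5 + imu[:-5]
lag = estimate_lag(imu, other, max_lag=20)
print(lag, "samples =", lag * 0.01, "s")
```

The result is only sample-accurate; continuous-time methods such as the one described above refine the offset to sub-sample precision jointly with the spatial parameters.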
arXiv:2407.11324 [pdf, other]

cs.AR

ApproxPilot: A GNN-based Accelerator Approximation Framework

Authors: Qing Zhang, Cheng Liu, Siting Liu, Yajuan Hui, Huawei Li, Xiaowei Li

Abstract: A typical optimization of customized accelerators for error-tolerant applications such as multimedia, recognition, and classification is to replace traditional arithmetic units such as multipliers and adders with approximate ones, improving energy efficiency while meeting accuracy requirements. However, the plethora of arithmetic units and diverse approximate unit options result in an exceedingly large design space. There is therefore a pressing need for an end-to-end design framework capable of navigating this intricate design space for approximation optimization. Traditional methods relying on simulation-based or black-box model evaluations suffer from either high computational costs or limited accuracy and scalability, posing significant challenges to the optimization process. In this paper, we propose a Graph Neural Network (GNN) model that leverages the physical connections of arithmetic units to capture their influence on the performance, power, area (PPA), and accuracy of the accelerator. In particular, we observe that the critical path plays a key role among the node features of the GNN model, and embedding it in the feature vector greatly improves prediction quality. Building on these models, which enable rapid and efficient PPA and accuracy prediction for various approximate accelerator configurations, we can explore the large design space effectively and build an end-to-end accelerator approximation framework, named ApproxPilot, to optimize accelerator approximation. Our experimental results demonstrate that ApproxPilot outperforms state-of-the-art approximation optimization frameworks in both performance and hardware overhead under the same accuracy constraints.

Submitted 15 July, 2024; originally announced July 2024.
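
The abstract notes that embedding the critical path into the node features improves the GNN's predictions but does not say how it is encoded. A hypothetical sketch of one simple encoding: treat arithmetic units as nodes of a DAG with integer delays, compute the longest (critical) path with a topological-order dynamic program, and give each node a binary flag marking whether it lies on a critical path. Unit names and delays are made up.

```python
from collections import defaultdict

def critical_path_flags(delays, edges):
    """Return (critical-path delay, {node: 1 if on a longest path else 0}).

    delays: dict mapping each unit to an integer delay (hypothetical values)
    edges:  list of (src, dst) connections; the graph must be acyclic.
    """
    succ, pred = defaultdict(list), defaultdict(list)
    indegree = {n: 0 for n in delays}
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)
        indegree[v] += 1

    # Kahn's algorithm gives a topological order of the units.
    order, ready = [], [n for n in delays if indegree[n] == 0]
    while ready:
        n = ready.pop()
        order.append(n)
        for v in succ[n]:
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)
    if len(order) != len(delays):
        raise ValueError("graph contains a cycle")

    # Longest delay of a path ending at each node (arrive) and starting at it (tail).
    arrive, tail = {}, {}
    for n in order:
        arrive[n] = delays[n] + max((arrive[p] for p in pred[n]), default=0)
    for n in reversed(order):
        tail[n] = delays[n] + max((tail[s] for s in succ[n]), default=0)

    total = max(arrive.values())
    # A node is critical if the longest path through it equals the overall maximum.
    flags = {n: int(arrive[n] + tail[n] - delays[n] == total) for n in delays}
    return total, flags

delays = {"mul_a": 5, "mul_b": 3, "add_c": 2, "add_d": 1, "add_e": 1}
edges = [("mul_a", "add_c"), ("mul_b", "add_c"), ("add_c", "add_d")]
print(critical_path_flags(delays, edges))
```

Integer delays keep the equality test exact; with floating-point delays a tolerance would be needed.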
arXiv:2407.11299 [pdf, other]

cs.RO cs.CV

FR-SLAM: A SLAM Improvement Method Based on Floor Plan Registration

Authors: Jiantao Feng, Xinde Li, HyunCheol Park, Juan Liu, Zhentong Zhang

Abstract: Simultaneous Localization and Mapping (SLAM) enables the construction of environmental maps and localization, serving as a key technique for the indoor autonomous navigation of mobile robots. Traditional SLAM methods typically require exhaustive traversal of all rooms during indoor navigation to obtain a complete map, resulting in lengthy path planning and prolonged time to reach target points. Moreover, cumulative errors during motion lead to inaccurate robot localization, impairing navigation efficiency. This paper proposes FR-SLAM, an improved SLAM method based on floor plan registration, which uses a morphology-based floor plan registration algorithm to align and transform original floor plans. This approach enables the rapid acquisition of comprehensive motion maps and efficient path planning, allowing the robot to reach target positions in a shorter time. To improve registration and robot localization accuracy, a real-time update strategy compares the building structure at the current position with the map and dynamically updates the floor plan registration results for precise localization. Comparative tests on real and simulated datasets demonstrate that, compared with other benchmark algorithms, this method achieves higher floor plan registration accuracy and shorter times to reach target positions.

Submitted 15 July, 2024; originally announced July 2024.
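
Registration here means finding the transform that aligns a floor plan with the robot's map. The abstract does not detail the morphology-based algorithm, so the sketch below uses a much simpler, plainly different stand-in: a brute-force search over integer translations that maximizes the overlap of occupied (wall) cells between two occupancy grids. It ignores rotation and scale, and the grids are hypothetical.

```python
def best_translation(floor_plan, scan, max_shift):
    """Find the integer (dy, dx) shift of `scan` that maximizes overlapping wall cells.

    Both inputs are occupancy grids given as lists of lists of 0/1 (1 = wall).
    """
    rows, cols = len(floor_plan), len(floor_plan[0])
    best, best_score = (0, 0), -1
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            score = 0
            for y, row in enumerate(scan):
                for x, occupied in enumerate(row):
                    ty, tx = y + dy, x + dx
                    if occupied and 0 <= ty < rows and 0 <= tx < cols and floor_plan[ty][tx]:
                        score += 1
            if score > best_score:
                best, best_score = (dy, dx), score
    return best, best_score

def grid(cells, rows=6, cols=6):
    """Build an occupancy grid with walls at the given (row, col) cells."""
    g = [[0] * cols for _ in range(rows)]
    for y, x in cells:
        g[y][x] = 1
    return g

plan = grid([(1, 1), (1, 2), (1, 3), (2, 1), (3, 1), (4, 4)])
scan = grid([(0, 0), (0, 1), (0, 2), (1, 0), (2, 0), (3, 3)])  # same walls, offset by one cell
print(best_translation(plan, scan, max_shift=2))  # shift (1, 1) overlaps all 6 walls
```

The exhaustive search costs O(shifts × cells); practical registration methods restrict or prune the search and also estimate rotation.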
arXiv:2407.11227 [pdf, other]

cond-mat.str-el cond-mat.dis-nn

Magnetic skin effect in Pb(Fe$_{1/2}$Nb$_{1/2}$)O$_3$

Authors: N. Giles-Donovan, A. D. Hillier, K. Ishida, B. V. Hampshire, S. R. Giblin, B. Roessli, P. M. Gehring, G. Xu, X. Li, H. Luo, S. Cochran, C. Stock

Abstract: Relaxor-ferroelectrics display exceptional dielectric properties resulting from the underlying random dipolar fields induced by strong chemical inhomogeneity. An unusual structural aspect of relaxors is a skin effect, in which the near-surface region of single crystals exhibits structures and critical phenomena that differ from those of the bulk. Relaxors are unique in that this skin effect extends over a macroscopic length scale of $\sim 100~μ$m, whereas usual surface layers extend over only a few unit cells ($\sim$ nm). We present a muon spectroscopy study of Pb(Fe$_{1/2}$Nb$_{1/2}$)O$_{3}$ (PFN), which displays ferroelectric order with many relaxor-like dielectric properties, such as a frequency-broadened dielectric response, as well as antiferromagnetism with spatially short-range polar correlations, and hence can be termed a multiferroic. In terms of the magnetic behavior determined by the Fe$^{3+}$ ($S=5/2$, $L\approx0$) ions, PFN has been characterized as a unique example of a "cluster spin-glass". We use variable-momentum muon spectroscopy to study the depth dependence of the slow magnetic relaxations in a large 1 cm$^{3}$ crystal of PFN. Zero-field positive muon spin relaxation is parameterized using a stretched exponential, indicative of a distribution of relaxation rates of the Fe$^{3+}$ spins. This bandwidth of frequencies changes as a function of muon momentum, indicating a change in the Fe$^{3+}$ relaxation rates as a function of muon implantation depth in our single crystal. Using negative muon elemental analysis, we find small-to-no measurable change in the Fe$^{3+}$/Nb$^{5+}$ concentration with depth, implying that chemical concentration alone cannot account for the change in the relaxational dynamics. PFN thus displays a magnetic skin effect analogous to the one reported in the structural properties of relaxor-ferroelectrics.

Submitted 15 July, 2024; originally announced July 2024.

Comments: 32 pages, 13 figures
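
The stretched exponential mentioned above is commonly written $P(t) = A\exp[-(\lambda t)^{\beta}]$; this parameterization is an assumption, since the abstract does not state the exact form used. A minimal sketch: $\beta = 1$ recovers a single exponential, while $\beta < 1$ describes a superposition of exponentials with a distribution of rates, decaying faster at early times and with a longer tail at late times.

```python
import math

def stretched_exponential(t, amplitude, rate, beta):
    """Muon polarization model A * exp(-(rate * t) ** beta).

    beta = 1: single relaxation rate; beta < 1: a distribution of relaxation rates.
    """
    return amplitude * math.exp(-((rate * t) ** beta))

# Hypothetical parameters: compare a single rate with a broad rate distribution.
for beta in (1.0, 0.5):
    early = stretched_exponential(0.1, 1.0, 1.0, beta)
    late = stretched_exponential(8.0, 1.0, 1.0, beta)
    print(f"beta={beta}: P(0.1)={early:.3f}, P(8)={late:.4f}")
```

All curves with the same rate cross at $t = 1/\lambda$, where $P = A/e$ regardless of $\beta$, which is why $\beta$ is a convenient single-number measure of how broad the distribution of relaxation rates is.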
arXiv:2407.11034 [pdf]

cs.LG

Bridging Data Gaps in Healthcare: A Scoping Review of Transfer Learning in Biomedical Data Analysis

Authors: Siqi Li, Xin Li, Kunyu Yu, Di Miao, Mingcheng Zhu, Mengying Yan, Yuhe Ke, Danny D'Agostino, Yilin Ning, Qiming Wu, Ziwen Wang, Yuqing Shang, Molei Liu, Chuan Hong, Nan Liu

Abstract: Clinical and biomedical research in low-resource settings often faces significant challenges due to the need for high-quality data with sufficient sample sizes to construct effective models. These constraints hinder robust model training and prompt researchers to seek methods for leveraging existing knowledge from related studies to support new research efforts. Transfer learning (TL), a machine learning technique, emerges as a powerful solution by utilizing knowledge from pre-trained models to enhance the performance of new models, offering promise across various healthcare domains. Despite its conceptual origins in the 1990s, the application of TL in medical research has remained limited, especially beyond image analysis. In our review of TL applications in structured clinical and biomedical data, we screened 3,515 papers, with 55 meeting the inclusion criteria. Among these, only 2% (one out of 55) utilized external studies, and 7% (four out of 55) addressed scenarios involving multi-site collaborations with privacy constraints. To achieve actionable TL with structured medical data while addressing regional disparities, inequality, and privacy constraints in healthcare research, we advocate for the careful identification of appropriate source data and models, the selection of suitable TL frameworks, and the validation of TL models with proper baselines.

Submitted 4 July, 2024; originally announced July 2024.
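
For readers new to TL on structured data, a minimal sketch of the simplest parameter-transfer setup: train a logistic regression on a larger source cohort, then warm-start gradient descent on a small target cohort from the source weights. This is a generic illustration rather than any specific framework from the review; the data are hypothetical and there is no regularization or validation split.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def log_loss(w, b, xs, ys):
    eps = 1e-12  # guards against log(0)
    return -sum(y * math.log(predict(w, b, x) + eps)
                + (1 - y) * math.log(1.0 - predict(w, b, x) + eps)
                for x, y in zip(xs, ys)) / len(xs)

def train(xs, ys, w=None, b=0.0, lr=0.5, epochs=200):
    """Full-batch gradient descent on the logistic loss.

    Passing source-model weights as (w, b) warm-starts training, the simplest
    parameter-transfer form of transfer learning.
    """
    w = list(w) if w is not None else [0.0] * len(xs[0])
    n = len(xs)
    for _ in range(epochs):
        errors = [predict(w, b, x) - y for x, y in zip(xs, ys)]
        grad_w = [sum(e * x[j] for e, x in zip(errors, xs)) / n for j in range(len(w))]
        grad_b = sum(errors) / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

# Hypothetical data: a larger source cohort and a small, slightly shifted target site.
source_x = [[-1.0, -0.5], [-0.8, 0.1], [-0.3, -0.9], [0.2, 0.7], [0.6, 0.4], [0.9, 1.0]]
source_y = [0, 0, 0, 1, 1, 1]
target_x = [[-0.6, -0.2], [0.1, -0.4], [0.4, 0.3], [0.8, 0.6]]
target_y = [0, 0, 1, 1]

w_src, b_src = train(source_x, source_y)
w_tl, b_tl = train(target_x, target_y, w=w_src, b=b_src, epochs=20)
print(log_loss(w_src, b_src, target_x, target_y), log_loss(w_tl, b_tl, target_x, target_y))
```

Consistent with the review's recommendation, a fair comparison would also train a target-only model from scratch as a baseline.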
arXiv:2407.10990 [pdf]

cs.CL cs.AI

MedBench: A Comprehensive, Standardized, and Reliable Benchmarking System for Evaluating Chinese Medical Large Language Models

Authors: Mianxin Liu, Jinru Ding, Jie Xu, Weiguo Hu, Xiaoyang Li, Lifeng Zhu, Zhian Bai, Xiaoming Shi, Benyou Wang, Haitao Song, Pengfei Liu, Xiaofan Zhang, Shanshan Wang, Kang Li, Haofen Wang, Tong Ruan, Xuanjing Huang, Xin Sun, Shaoting Zhang

Abstract: Ensuring that medical large language models (LLMs) are generally effective and beneficial to humans before real-world deployment is crucial. However, a widely accepted and accessible evaluation process for medical LLMs, especially in the Chinese context, has yet to be established. In this work, we introduce "MedBench", a comprehensive, standardized, and reliable benchmarking system for Chinese medical LLMs. First, MedBench assembles the largest evaluation dataset to date (300,901 questions), covering 43 clinical specialties, and performs multi-faceted evaluation of medical LLMs. Second, MedBench provides a standardized and fully automatic cloud-based evaluation infrastructure, with physical separation of questions and ground truth. Third, MedBench implements dynamic evaluation mechanisms to prevent shortcut learning and answer memorization. Applying MedBench to popular general and medical LLMs, we observe unbiased, reproducible evaluation results that largely align with medical professionals' perspectives. This study lays a significant foundation for the practical application of Chinese medical LLMs. MedBench is publicly accessible at https://medbench.opencompass.org.cn.

Submitted 23 June, 2024; originally announced July 2024.

Comments: 25 pages, 4 figures
arXiv:2407.10979 [pdf, ps, other]

cs.NI

Diffusion Model-based Incentive Mechanism with Prospect Theory for Edge AIGC Services in 6G IoT

Authors: Jinbo Wen, Jiangtian Nie, Yue Zhong, Changyan Yi, Xiaohuan Li, Jiangming Jin, Yang Zhang, Dusit Niyato

Abstract: The fusion of the Internet of Things (IoT) with Sixth-Generation (6G) technology has significant potential to revolutionize the IoT landscape. Utilizing the ultra-reliable and low-latency communication capabilities of 6G, 6G-IoT networks can transmit high-quality and diverse data to enhance edge learning. Artificial Intelligence-Generated Content (AIGC) harnesses advanced AI algorithms to automatically generate various types of content. Edge AIGC integrates AIGC with edge networks, enabling real-time provision of customized AIGC services by deploying AIGC models on edge devices. However, current practice lacks incentives for edge devices to act as AIGC Service Providers (ASPs), which hinders the sustainable provision of high-quality edge AIGC services amid information asymmetry. In this paper, we develop a user-centric incentive mechanism framework for edge AIGC services in 6G-IoT networks. Specifically, we first propose a contract theory model for incentivizing ASPs to provide AIGC services to clients. Recognizing the irrationality of clients towards personalized AIGC services, we utilize Prospect Theory (PT) to better capture the subjective utility of clients. Finally, we adopt a generative diffusion model to generate the optimal contract design under PT, outperforming a traditional deep reinforcement learning algorithm, soft actor-critic (SAC). Our numerical results demonstrate the effectiveness of the proposed scheme.
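
The abstract does not give the paper's PT parameterization. As a reference point, the sketch below implements the classic Tversky and Kahneman (1992) value and probability-weighting functions, with their published median estimates (alpha = beta = 0.88, lambda = 2.25, gamma = 0.61) used as assumed defaults. It is a simplified, non-cumulative illustration of how PT makes losses loom larger than gains, not the paper's client utility model.

```python
def pt_value(x, alpha=0.88, beta=0.88, lam=2.25):
    """Prospect-theory value of an outcome x measured relative to a reference point of 0.

    Gains are concave (x**alpha); losses are convex and scaled by the
    loss-aversion coefficient lam, so losses loom larger than equal gains.
    """
    if x >= 0:
        return x ** alpha
    return -lam * (-x) ** beta


def pt_weight(p, gamma=0.61):
    """Inverse-S probability weighting: overweights small p, underweights large p."""
    return p ** gamma / (p ** gamma + (1 - p) ** gamma) ** (1 / gamma)


gain = pt_value(10.0)
loss = pt_value(-10.0)
```

With alpha equal to beta, a loss of 10 is valued exactly 2.25 times as negatively as a gain of 10 is valued positively, which is the kind of asymmetry an expected-utility client model would miss.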

Submitted 10 June, 2024; originally announced July 2024.
arXiv:2407.10918 [pdf, other]

cs.CV

PartImageNet++ Dataset: Scaling up Part-based Models for Robust Recognition

Authors: Xiao Li, Yining Liu, Na Dong, Sitian Qin, Xiaolin Hu

Abstract: Deep learning-based object recognition systems can be easily fooled by various adversarial perturbations. One reason for their weak robustness may be that they lack the part-based inductive bias of the human recognition process. Motivated by this, several part-based recognition models have been proposed to improve the adversarial robustness of recognition. However, due to the lack of part annotations, the effectiveness of these methods has only been validated on small-scale nonstandard datasets. In this work, we propose PIN++, short for PartImageNet++, a dataset providing high-quality part segmentation annotations for all categories of ImageNet-1K (IN-1K). With these annotations, we build part-based methods directly on the standard IN-1K dataset for robust recognition. Unlike previous two-stage part-based models, we propose a Multi-scale Part-supervised Model (MPM) that learns a robust representation with part annotations. Experiments show that MPM yields better adversarial robustness than strong baselines on the large-scale IN-1K across various attack settings. Furthermore, MPM achieves improved robustness on common corruptions and several out-of-distribution datasets. The dataset, together with these results, enables and encourages researchers to explore the potential of part-based models in more real-world applications.

Submitted 15 July, 2024; originally announced July 2024.

Comments: Accepted by ECCV2024
arXiv:2407.10909 [pdf, other]

q-fin.CP

FinDKG: Dynamic Knowledge Graphs with Large Language Models for Detecting Global Trends in Financial Markets

Authors: Xiaohui Victor Li, Francesco Sanna Passino

Abstract: Dynamic knowledge graphs (DKGs) are popular structures to express different types of connections between objects over time. They can also serve as an efficient mathematical tool to represent information extracted from complex unstructured data sources, such as text or images. Within financial applications, DKGs could be used to detect trends for strategic thematic investing, based on information obtained from financial news articles. In this work, we explore the properties of large language models (LLMs) as dynamic knowledge graph generators, proposing a novel open-source fine-tuned LLM for this purpose, called the Integrated Contextual Knowledge Graph Generator (ICKG). We use ICKG to produce a novel open-source DKG from a corpus of financial news articles, called FinDKG, and we propose an attention-based GNN architecture for analysing it, called KGTransformer. We test the performance of the proposed model on benchmark datasets and FinDKG, demonstrating superior performance on link prediction tasks. Additionally, we evaluate the performance of the KGTransformer on FinDKG for thematic investing, showing it can outperform existing thematic ETFs.
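
A DKG is described above as connections between objects over time. One common concrete representation, assumed here and not necessarily FinDKG's schema, is a list of timestamped (subject, relation, object, time) quadruples from which a static snapshot can be sliced for any time window. The helper `snapshot` and all entity and relation names below are hypothetical.

```python
from collections import defaultdict
from datetime import date

# A dynamic KG edge: (subject, relation, object, timestamp).
Quad = tuple[str, str, str, date]


def snapshot(quads: list[Quad], start: date, end: date) -> dict[str, list[tuple[str, str]]]:
    """Return the adjacency list of edges whose timestamp lies in [start, end)."""
    adj: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for subj, rel, obj, ts in quads:
        if start <= ts < end:
            adj[subj].append((rel, obj))
    return dict(adj)


# Hypothetical quadruples of the kind an LLM might extract from news headlines.
news_quads = [
    ("CentralBank", "raises", "InterestRate", date(2023, 3, 1)),
    ("InterestRate", "impacts", "TechStocks", date(2023, 3, 2)),
    ("CompanyA", "invests_in", "AIChips", date(2023, 6, 15)),
]

march = snapshot(news_quads, date(2023, 3, 1), date(2023, 4, 1))
```

Keeping time on every edge, rather than overwriting a single static graph, is what lets a temporal model such as a link predictor look at how relations evolve between windows.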

Submitted 15 July, 2024; originally announced July 2024.

Comments: 8 pages
arXiv:2407.10833 [pdf, other]

eess.IV cs.CV

MoE-DiffIR: Task-customized Diffusion Priors for Universal Compressed Image Restoration

Authors: Yulin Ren, Xin Li, Bingchen Li, Xingrui Wang, Mengxi Guo, Shijie Zhao, Li Zhang, Zhibo Chen

Abstract: We present MoE-DiffIR, an innovative universal compressed image restoration (CIR) method with task-customized diffusion priors. It is designed to address two pivotal challenges in existing CIR methods: (i) lack of adaptability and universality across different image codecs, e.g., JPEG and WebP; (ii) poor texture generation capability, particularly at low bitrates. Specifically, MoE-DiffIR develops a mixture-of-experts (MoE) prompt module in which a set of basic prompts cooperate to extract task-customized diffusion priors from Stable Diffusion (SD) for each compression task. Moreover, a degradation-aware routing mechanism is proposed to enable flexible assignment of basic prompts. To activate and reuse the cross-modality generation prior of SD, we design a visual-to-text adapter for MoE-DiffIR, which adapts the embedding of low-quality images from the visual domain to the textual domain as textual guidance for SD, enabling more consistent and reasonable texture generation. We also construct a comprehensive benchmark dataset for universal CIR, covering 21 types of degradations from 7 popular traditional and learned codecs. Extensive experiments on universal CIR demonstrate the excellent robustness and texture restoration capability of MoE-DiffIR. The project can be found at https://renyulin-f.github.io/MoE-DiffIR.github.io/.
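
The abstract names an MoE prompt module with degradation-aware routing but gives no equations. The sketch below shows generic top-k softmax gating over a bank of prompt vectors, one standard way such routing is built. The router logits are assumed to come from some degradation feature, and `route_prompts` is a hypothetical name, not the paper's code.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_prompts(router_logits, basic_prompts, k=2):
    """Top-k mixture-of-experts routing over a bank of basic prompt vectors.

    router_logits: one score per basic prompt (hypothetically produced from a
    degradation feature of the input image).
    Returns the chosen prompt indices and their gate-weighted combination.
    """
    gates = softmax(router_logits)
    top = sorted(range(len(gates)), key=lambda i: gates[i], reverse=True)[:k]
    norm = sum(gates[i] for i in top)          # renormalize over the selected experts
    mixed = [0.0] * len(basic_prompts[0])
    for i in top:
        w = gates[i] / norm
        for d, value in enumerate(basic_prompts[i]):
            mixed[d] += w * value
    return top, mixed


top, mixed = route_prompts([2.0, 0.0, 1.0], [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], k=2)
```

Selecting only k prompts per input keeps different codecs or bitrates from all sharing one prompt, while the renormalized gates still let related degradations blend prompts.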

Submitted 15 July, 2024; originally announced July 2024.

Comments: Accepted by ECCV 2024
arXiv:2407.10523 [pdf, other]

quant-ph physics.chem-ph

Variational Quantum Imaginary Time Evolution for Matrix Product State Ansatz with Tests on Transcorrelated Hamiltonians

Authors: Hao-En Li, Xiang Li, Jia-Cheng Huang, Guang-Ze Zhang, Zhu-Ping Shen, Chen Zhao, Jun Li, Han-Shi Hu

Abstract: The matrix product state (MPS) ansatz offers a promising approach for finding the ground state of molecular Hamiltonians and solving quantum chemistry problems. Building on this concept, the proposed technique of quantum circuit MPS (QCMPS) enables the simulation of chemical systems using a relatively small number of qubits. In this study, we enhance the optimization performance of the QCMPS ansatz by employing the variational quantum imaginary time evolution (VarQITE) approach. Guided by McLachlan's variational principle, the VarQITE method provides analytical metrics and gradients, resulting in improved convergence efficiency and robustness of the QCMPS. We validate these improvements numerically through simulations of $\rm H_2$, $\rm H_4$, and $\rm LiH$ molecules. Additionally, given that VarQITE is applicable to non-Hermitian Hamiltonians, we evaluate its effectiveness in preparing the ground state of transcorrelated (TC) Hamiltonians. This approach yields energy estimates comparable to the complete basis set (CBS) limit while using even fewer qubits. Specifically, we perform simulations of the beryllium atom and $\rm LiH$ molecule using only three qubits, maintaining high fidelity with the CBS ground state energy of these systems. This qubit reduction is achieved through the combined advantages of both the QCMPS ansatz and transcorrelation. Our findings demonstrate the potential practicality of this quantum chemistry algorithm on near-term quantum devices.
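
VarQITE approximates imaginary-time evolution, |ψ(τ)⟩ ∝ exp(-Hτ)|ψ(0)⟩, inside a parameterized ansatz. The toy sketch below is not VarQITE or QCMPS. It applies the exact, non-variational evolution to a hypothetical real symmetric 2×2 Hamiltonian using a first-order step ψ ← (I - Δτ H)ψ followed by renormalization, to show why the procedure relaxes to the ground state.

```python
import math


def imaginary_time_evolve(h, psi, dtau=0.01, steps=2000):
    """Evolve a real 2-component state in imaginary time for a real symmetric 2x2 H.

    Each step applies (I - dtau * H), a first-order approximation of exp(-dtau * H),
    then renormalizes. With dtau small enough that 1 - dtau * E > 0 for every
    eigenvalue E, excited components shrink relative to the ground-state
    component, so the state converges to the ground state (given nonzero overlap).
    """
    a, b = psi
    for _ in range(steps):
        ha = h[0][0] * a + h[0][1] * b
        hb = h[1][0] * a + h[1][1] * b
        a, b = a - dtau * ha, b - dtau * hb
        norm = math.hypot(a, b)
        a, b = a / norm, b / norm
    return a, b


def energy(h, psi):
    """Expectation value <psi|H|psi> for a normalized real state."""
    a, b = psi
    return a * (h[0][0] * a + h[0][1] * b) + b * (h[1][0] * a + h[1][1] * b)


H = [[1.0, 0.5], [0.5, -1.0]]                # hypothetical toy Hamiltonian
ground = imaginary_time_evolve(H, (1.0, 0.0))
exact_e0 = -math.sqrt(1.0 ** 2 + 0.5 ** 2)   # lowest eigenvalue of [[d, g], [g, -d]]
```

VarQITE replaces the explicit matrix-vector step with parameter updates chosen by McLachlan's principle, which is what makes it usable when the state lives on a quantum device rather than in memory.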

Submitted 15 July, 2024; originally announced July 2024.

Comments: 15 pages, 8 figures
arXiv:2407.10408 [pdf, other]

cs.IT eess.SP

Latency Minimization for IRS-enhanced Wideband MEC Networks with Practical Reflection Model

Authors: N. Li, W. Hao, X. Li, Z. Zhu, Z. Tang, S. Yang

Abstract: Intelligent reflecting surface (IRS) has been considered an efficient way to boost the computation capability of mobile edge computing (MEC) systems, especially when the communication links are blocked or the communication signal is weak. However, most existing works are restricted to narrowband channels and the ideal IRS reflection model, which is not practical and may lead to significant performance degradation in realistic systems. To further exploit the benefits of IRS in MEC systems, we consider an IRS-enhanced wideband MEC system with a practical IRS reflection model. With the aim of minimizing the weighted latency of all devices, the offloading data volume, edge computing resource, BS's receiving vector, and IRS passive beamforming are jointly optimized. Since the formulated problem is non-convex, we employ the block coordinate descent (BCD) technique to decouple it into two subproblems for alternately optimizing the computing and communication settings. The effectiveness and convergence of the proposed algorithm are validated via numerical analyses. In addition, simulation results demonstrate that the proposed algorithm can achieve lower latency than the design based on the ideal IRS reflection model, which confirms the necessity of considering the practical model when designing an IRS-enhanced wideband MEC system.
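
The paper's two BCD subproblems (computing and communication variables) are not spelled out in the abstract, so the sketch below shows only the alternating pattern on a hypothetical two-block convex quadratic where each block update has a closed form. Because each update exactly minimizes over one block with the other held fixed, the objective never increases; on this convex toy the iterates reach the global minimizer, whereas on a non-convex problem like the paper's, BCD generally guarantees only the non-increasing objective.

```python
def objective(x, y):
    """Toy convex objective with coupled blocks (hypothetical, not the paper's latency model)."""
    return (x - 1.0) ** 2 + (y - 2.0) ** 2 + 0.5 * x * y


def block_coordinate_descent(x=0.0, y=0.0, sweeps=50):
    """Alternate exact minimization over x (y fixed) and over y (x fixed)."""
    history = [objective(x, y)]
    for _ in range(sweeps):
        x = 1.0 - 0.25 * y      # solves d/dx = 2(x - 1) + 0.5 y = 0
        y = 2.0 - 0.25 * x      # solves d/dy = 2(y - 2) + 0.5 x = 0
        history.append(objective(x, y))
    return x, y, history


x_opt, y_opt, history = block_coordinate_descent()
```

The minimizer is x = 8/15, y = 28/15; the error shrinks by a factor of 16 per sweep here because the coupling term is weak relative to the diagonal terms.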

Submitted 14 July, 2024; originally announced July 2024.

Comments: 13 pages, 9 figures
arXiv:2407.10391 [pdf, other]

astro-ph.HE

On the magnetic braking law in black hole low-mass X-ray binaries

Authors: Zhuling Deng, Xiangdong Li

Abstract: Magnetic braking (MB) plays an important role in the evolution of close low-mass X-ray binaries (LMXBs). It is also essential to the formation of ultracompact X-ray binaries (UCXBs). There have been lively investigations of the MB mechanism(s) in both single stars and close binaries, including cataclysmic variables and neutron star (NS) LMXBs, but with diverse conclusions. In this paper we explore the effect of MB on black hole (BH) LMXB evolution. We combine binary population synthesis with detailed binary evolution to obtain the expected properties of the Galactic BH LMXB population. The simulated results are compared with observational data including the BH mass, companion mass, companion temperature, orbital period, and mean accretion rate. Our results reveal that the MB laws with relatively low efficiency (i.e., RM12 and RVJ83) exhibit better agreement with observations, contrary to what was found for NS LMXBs. This raises the interesting question of whether MB really follows the same unified law in different types of binaries. We also predict that only a very small fraction ($\lesssim 2.5\%$) of BH LMXBs can evolve into UCXBs. This explains why no BH UCXB has been discovered so far.
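
The abstract names the MB laws without giving formulas. Assuming "RVJ83" denotes the Rappaport, Verbunt and Joss (1983) prescription, it is usually written in cgs units as dJ/dt = -3.8e-30 M R_sun^4 (R/R_sun)^gamma Ω^3 dyn cm, where M and R are the donor's mass and radius and Ω is its spin, equal to the orbital angular frequency for a tidally locked donor. The sketch evaluates that expression; the constants, donor parameters, and the choice gamma = 3 in the usage line are illustrative assumptions, and the RM12 law is not sketched.

```python
import math

R_SUN_CM = 6.957e10   # nominal solar radius in cm (IAU 2015)
M_SUN_G = 1.989e33    # solar mass in g (approximate)


def jdot_rvj83(mass_g, radius_cm, omega_per_s, gamma=4.0):
    """Magnetic-braking angular-momentum loss rate in dyn cm (negative = loss).

    Assumed RVJ83-style prescription (cgs):
        dJ/dt = -3.8e-30 * M * R_sun^4 * (R / R_sun)^gamma * omega^3
    gamma = 4 recovers the Verbunt & Zwaan (1981) form; smaller gamma weakens
    the dependence on the donor radius.
    """
    return -3.8e-30 * mass_g * R_SUN_CM ** 4 * (radius_cm / R_SUN_CM) ** gamma * omega_per_s ** 3


# Hypothetical donor: 0.5 M_sun, 0.5 R_sun, tidally locked in a 6-hour orbit.
omega = 2.0 * math.pi / (6.0 * 3600.0)
jdot = jdot_rvj83(0.5 * M_SUN_G, 0.5 * R_SUN_CM, omega, gamma=3.0)
```

The steep Ω^3 dependence is why MB dominates angular-momentum loss in short-period systems, and why the choice of law strongly affects how many systems shrink to ultracompact orbits.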

Submitted 14 July, 2024; originally announced July 2024.

Comments: 16 pages, 7 figures, 3 tables. Accepted for publication by ApJ
arXiv:2407.10327 [pdf, other]

cs.LG cs.AI cs.CV

Learning Unlabeled Clients Divergence via Anchor Model Aggregation for Federated Semi-supervised Learning

Authors: Marawan Elbatel, Hualiang Wang, Jixiang Chen, Hao Wang, Xiaomeng Li

Abstract: Federated semi-supervised learning (FedSemi) refers to scenarios where there may be clients with fully labeled data, clients with partially labeled data, and even fully unlabeled clients, while data privacy is preserved. However, challenges arise from client drift due to undefined heterogeneous class distributions and erroneous pseudo-labels. Existing FedSemi methods typically fail to aggregate models from unlabeled clients due to their inherent unreliability, thus overlooking unique information from their heterogeneous data distributions and leading to sub-optimal results. In this paper, we enable unlabeled client aggregation through SemiAnAgg, a novel Semi-supervised Anchor-Based federated Aggregation. SemiAnAgg learns unlabeled client contributions via an anchor model, effectively harnessing their informative value. Our key idea is that by feeding local client data to the same global model and the same consistently initialized anchor model (i.e., a random model), we can measure the importance of each unlabeled client accordingly. Extensive experiments demonstrate that SemiAnAgg achieves new state-of-the-art results on four widely used FedSemi benchmarks, leading to substantial performance improvements: a 9% increase in accuracy on CIFAR-100 and a 7.6% improvement in recall on the medical dataset ISIC-18, compared with the prior state of the art. Code is available at: https://github.com/xmed-lab/SemiAnAgg.
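
How the anchor comparison turns into importance scores is not described in the abstract. The sketch below only shows where such scores would enter: a generic importance-weighted, FedAvg-style average of flattened client parameters. The function name `weighted_aggregate` and the example scores are hypothetical, and the scoring step itself is not reproduced.

```python
def weighted_aggregate(client_params, scores):
    """Average flattened client parameter vectors with importance scores.

    Generic importance-weighted FedAvg-style aggregation; in SemiAnAgg the scores
    would come from the anchor-model comparison, which is not reproduced here.
    """
    total = sum(scores)
    if total <= 0:
        raise ValueError("scores must have a positive sum")
    aggregated = [0.0] * len(client_params[0])
    for params, score in zip(client_params, scores):
        weight = score / total                 # normalize so weights sum to 1
        for i, p in enumerate(params):
            aggregated[i] += weight * p
    return aggregated


# Hypothetical: two clients, the second judged three times as informative.
global_params = weighted_aggregate([[1.0, 0.0], [3.0, 2.0]], [1.0, 3.0])
```

Giving an unlabeled client a small but nonzero weight, rather than excluding it, is the design point the abstract argues for.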

Submitted 14 July, 2024; originally announced July 2024.
arXiv:2407.10274 [pdf, other]

cs.CV cs.LG

Enhancing Weakly-Supervised Histopathology Image Segmentation with Knowledge Distillation on MIL-Based Pseudo-Labels

Authors: Yinsheng He, Xingyu Li, Roger J. Zemp

Abstract: Segmenting tumors in histological images is vital for cancer diagnosis. While fully supervised models excel with pixel-level annotations, creating such annotations is labor-intensive and costly. Accurate histopathology image segmentation under weakly-supervised conditions with coarse-grained image labels is still a challenging problem. Although multiple instance learning (MIL) has shown promise in segmentation tasks, surprisingly, no previous pseudo-supervision methods have used MIL-based outputs as pseudo-masks for training. We suspect this stems from concerns that noise in MIL results degrades pseudo-supervision quality. To explore the potential of leveraging MIL-based segmentation for pseudo supervision, we propose a novel distillation framework for histopathology image segmentation. This framework introduces an iterative fusion-knowledge distillation strategy, enabling the student model to learn directly from the teacher's comprehensive outcomes. Through dynamic role reversal between the fixed teacher and learnable student models and the incorporation of weighted cross-entropy loss for model optimization, our approach prevents performance deterioration and noise amplification during knowledge distillation. Experimental results on public histopathology datasets, Camelyon16 and Digestpath2019, demonstrate that our approach not only complements various MIL-based segmentation methods but also significantly enhances their performance. Additionally, our method achieves new state-of-the-art results in the field.
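
The abstract mentions a weighted cross-entropy loss without specifying the weights. The sketch below shows the standard class-weighted form on per-pixel predictions, normalized by the summed weights of the target classes (the same convention as the "mean" reduction of PyTorch's `CrossEntropyLoss` with a `weight` argument). The class weights and pixel values are hypothetical; the paper may weight pixels differently, for example by pseudo-label confidence.

```python
import math


def weighted_cross_entropy(probs, labels, class_weights, eps=1e-12):
    """Class-weighted cross-entropy averaged over pixels.

    probs: per-pixel predicted class-probability vectors.
    labels: per-pixel integer class labels (e.g., possibly noisy pseudo-mask values).
    class_weights: one weight per class.
    The sum of per-pixel losses is divided by the sum of the weights of the
    target classes, a common "weighted mean" reduction.
    """
    total, weight_sum = 0.0, 0.0
    for p, y in zip(probs, labels):
        w = class_weights[y]
        total += -w * math.log(max(p[y], eps))   # eps guards against log(0)
        weight_sum += w
    return total / weight_sum


# Hypothetical two-pixel example: background (0) and tumor (1), tumor up-weighted.
loss = weighted_cross_entropy([[0.9, 0.1], [0.2, 0.8]], [0, 1], [1.0, 3.0])
```

Up-weighting a class shifts the gradient toward it, which can counter class imbalance in sparse tumor masks.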

Submitted 14 July, 2024; originally announced July 2024.
arXiv:2407.10204 [pdf, other]

cs.LG

Improving Graph Out-of-distribution Generalization on Real-world Data

Authors: Can Xu, Yao Cheng, Jianxiang Yu, Haosen Wang, Jingsong Lv, Xiang Li

Abstract: Existing methods for graph out-of-distribution (OOD) generalization primarily rely on empirical studies on synthetic datasets. Such approaches tend to overemphasize the causal relationships between invariant sub-graphs and labels, thereby neglecting the non-negligible role of environment in real-world scenarios. In contrast to previous studies that impose rigid independence assumptions on environments and invariant sub-graphs, this paper presents the theorems of environment-label dependency and mutable rationale invariance, where the former characterizes the usefulness of environments in determining graph labels while the latter refers to the mutable importance of graph rationales. Based on analytic investigations, a novel variational inference based method named "Probability Dependency on Environments and Rationales for OOD Graphs on Real-world Data" (DEROG) is introduced. To alleviate the adverse effect of unknown prior knowledge on environments and rationales, DEROG utilizes generalized Bayesian inference. Further, DEROG employs an EM-based algorithm for optimization. Finally, extensive experiments on real-world datasets under different distribution shifts are conducted to show the superiority of DEROG. Our code is publicly available at https://anonymous.4open.science/r/DEROG-536B.

Submitted 14 July, 2024; originally announced July 2024.

Comments: 21 pages, 5 figures
arXiv:2407.09922 [pdf]

q-bio.NC

Transcranial low-level laser stimulation in near infrared-II region for brain safety and protection

Authors: Zhilin Li, Yongheng Zhao, Yiqing Hu, Yang Li, Keyao Zhang, Zhibing Gao, Lirou Tan, Hanli Liu, Xiaoli Li, Aihua Cao, Zaixu Cui, Chenguang Zhao

Abstract: Background: The use of near-infrared lasers for transcranial photobiomodulation (tPBM) offers a non-invasive method for influencing brain activity and is beneficial for various neurological conditions. Objective: To investigate the safety and neuroprotective properties of tPBM using near-infrared (NIR)-II laser stimulation. Methods: We conducted thirteen experiments involving multidimensional and quantitative methods: we measured serum neurobiomarkers, performed electroencephalogram (EEG) and magnetic resonance imaging (MRI) scans, assessed executive functions, and administered a subjective questionnaire. Results: Significant reductions (n=15) in neuron-specific enolase (NSE) levels were observed after treatment, indicating neuroprotective effects. No structural or functional brain abnormalities were observed, confirming the safety of tPBM. Additionally, cognitive and executive functions were not impaired, and participants' feedback indicated minimal discomfort. Conclusions: Our data indicate that NIR-II tPBM is safe with specific parameters, highlighting its potential for brain protection.

Submitted 13 July, 2024; originally announced July 2024.
arXiv:2407.09540 [pdf, other]

eess.IV cs.CE cs.CV cs.LG q-bio.TO

Prompting Whole Slide Image Based Genetic Biomarker Prediction

Authors: Ling Zhang, Boxiang Yun, Xingran Xie, Qingli Li, Xinxing Li, Yan Wang

Abstract: Prediction of genetic biomarkers, e.g., microsatellite instability (MSI) and BRAF in colorectal cancer, is crucial for clinical decision making. In this paper, we propose a whole slide image (WSI) based genetic biomarker prediction method via prompting techniques. Our work aims to address the following challenges: (1) extracting foreground instances related to genetic biomarkers from gigapixel WSIs, and (2) modeling the interaction among the fine-grained pathological components in WSIs. Specifically, we leverage large language models to generate medical prompts that serve as prior knowledge in extracting instances associated with genetic biomarkers. We adopt a coarse-to-fine approach to mine biomarker information within the tumor microenvironment. This involves extracting instances related to genetic biomarkers using coarse medical prior knowledge, grouping pathology instances into fine-grained pathological components, and mining their interactions. Experimental results on two colorectal cancer datasets show the superiority of our method, achieving 91.49% AUC for MSI classification. The analysis further shows the clinical interpretability of our method. Code is publicly available at https://github.com/DeepMed-Lab-ECNU/PromptBio.
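
The headline result above is an AUC. For readers less familiar with the metric, the sketch computes ROC AUC through its pairwise (Mann-Whitney) interpretation: the probability that a randomly chosen positive slide scores higher than a randomly chosen negative one. The scores and labels are hypothetical and unrelated to the paper's data.

```python
def roc_auc(scores, labels):
    """ROC AUC as the probability that a random positive outranks a random negative.

    Uses the pairwise (Mann-Whitney) formulation; ties count as half a win.
    O(P * N), fine for small evaluation sets.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical slide-level MSI probabilities (1 = MSI, 0 = microsatellite stable).
auc = roc_auc([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0])
```

Here three of the four positive/negative pairs are ranked correctly, giving 0.75; AUC is threshold-free, which is why it is a common choice for imbalanced biomarker tasks.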

Submitted 26 June, 2024; originally announced July 2024.

Comments: 11 pages, 3 figures, MICCAI2024
arXiv:2407.09488 [pdf, other]

q-bio.NC cs.LG cs.NE

Manifold Learning via Memory and Context

Authors: Xin Li

Abstract: Given a memory with infinite capacity, can we solve the learning problem? Apparently, nature has solved this problem, as evidenced by the evolution of mammalian brains. Inspired by the organizational principles underlying hippocampal-neocortical systems, we present a navigation-based approach to manifold learning using memory and context. The key insight is to navigate on the manifold and memorize the positions of each route as inductive/design bias of direct-fit-to-nature. We name it navigation-based because our approach can be interpreted as navigating in the latent space of sensorimotor learning via memory (local maps) and context (global indexing). The indexing into the library of local maps within global coordinates is collected by an associative memory serving as the librarian, which mimics the coupling between the hippocampus and the neocortex. In addition to breaking free from the notorious bias-variance dilemma and the curse of dimensionality, we discuss the biological implementation of our navigation-based learning by episodic and semantic memories in neural systems. The energy efficiency of navigation-based learning makes it suitable for hardware implementation on non-von Neumann architectures, such as the emerging in-memory computing paradigm, including spiking neural networks and memristor neural networks.

Submitted 17 May, 2024; originally announced July 2024.
arXiv:2407.09278 [pdf, ps, other]

math-ph math.DS math.SP

Exact local distribution of the absolutely continuous spectral measure

Authors: Xianzhe Li, Jiangong You, Qi Zhou

Abstract: It is well established that the spectral measure for one-frequency Schrödinger operators with Diophantine frequencies exhibits optimal $1/2$-Hölder continuity within the absolutely continuous spectrum. This study extends these findings by precisely characterizing the local distribution of the spectral measure for dense small potentials, including a notable result for all subcritical almost Mathieu operators. Additionally, we investigate the stratified Hölder continuity of the spectral measure at subcritical energies.
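
For reference, Hölder continuity of a measure in the abstract's sense is the standard notion below, with the "optimal" exponent being γ = 1/2. The almost Mathieu operator is shown in one common normalization, in which the subcritical regime is |λ| < 1; the paper may normalize it differently.

```latex
% A finite Borel measure \mu on \mathbb{R} is \gamma-H\"older continuous on a set I if
\exists\, C > 0 :\quad \mu\big((E-\varepsilon,\, E+\varepsilon)\big) \le C\,\varepsilon^{\gamma}
\quad \text{for all } E \in I,\ 0 < \varepsilon < 1 .
% "Optimal 1/2-H\"older continuity" corresponds to \gamma = 1/2.
% Almost Mathieu operator (one common normalization), subcritical for |\lambda| < 1:
(H_{\lambda,\alpha,\theta}\, u)_n = u_{n+1} + u_{n-1} + 2\lambda \cos\!\big(2\pi(\theta + n\alpha)\big)\, u_n .
```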

Submitted 12 July, 2024; originally announced July 2024.

Comments: 49 pages
arXiv:2407.09227 [pdf, other]

cond-mat.quant-gas quant-ph

Stability and decay of subradiant patterns in a quantum gas with photon-mediated interactions

Authors: Alexander Baumgärtner, Simon Hertlein, Tom Schmit, Davide Dreon, Carlos Máximo, Xiangliang Li, Giovanna Morigi, Tobias Donner

Abstract: The phenomenon of subradiance, marked by its surprising suppression of spontaneous emission, challenges conventional expectations of the collective behavior of scatterers. We study subradiance in the experimental setting of a Bose-Einstein condensate positioned at the mode crossing of two optical cavities. In this setup, subradiance manifests in the form of metastable density structures that suppress emission into one cavity mode, thereby preventing relaxation to the stationary, superradiant grating that minimizes the system's energy. We observe lifetimes of the subradiant states exceeding a hundred milliseconds, far surpassing any characteristic dynamic time scale of the system. Eventually, an instability triggers a rapid transition to the superradiant stationary pattern. We reproduce these dynamics with a quantum mean-field model, suggesting that subradiance shares characteristics with quasi-stationary states predicted in other long-range interacting systems such as astrophysical clusters and plasmas. This behavior highlights the potential of photon-mediated long-range forces as a controllable and exploitable quantum cooperative phenomenon.

Submitted 12 July, 2024; originally announced July 2024.
arXiv:2407.09139 [pdf, other]

hep-ex

Measurement of $CP$ asymmetries in $B^0 \to K^0_S π^0 γ$ decays at Belle II

Authors: Belle II Collaboration, I. Adachi, L. Aggarwal, H. Ahmed, H. Aihara, N. Akopov, A. Aloisio, N. Anh Ky, D. M. Asner, H. Atmacan, T. Aushev, V. Aushev, M. Aversano, R. Ayad, V. Babu, H. Bae, S. Bahinipati, P. Bambade, Sw. Banerjee, S. Bansal, M. Barrett, J. Baudot, A. Baur, A. Beaubien, F. Becherer , et al. (414 additional authors not shown)

Abstract: We report measurements of time-dependent $CP$ asymmetries in $B^0 \to K^0_S π^0 γ$ decays based on a data sample of $(388\pm6)\times10^6$ $B\bar{B}$ events collected at the $Υ(4S)$ resonance with the Belle II detector. The Belle II experiment operates at the SuperKEKB asymmetric-energy $e^+e^-$ collider. We measure decay-time distributions to determine $CP$-violating parameters $S$ and $C$. We determine these parameters for two ranges of $K^0_S π^0$ invariant mass: $m(K^0_S π^0)\in (0.8, 1.0)\,\mathrm{GeV}/c^2$, which is dominated by $B^0 \to K^{*0} (\to K^0_S π^0) γ$ decays, and a complementary region $m(K^0_S π^0)\in (0.6, 0.8)\cup(1.0, 1.8)\,\mathrm{GeV}/c^2$. Our results have improved precision as compared to previous measurements and are consistent with theory predictions.
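
For orientation, S and C enter the time-dependent asymmetry as below. Sign conventions differ between experiments (Belle historically quoted A = -C), so this should be read as one widely used convention rather than the paper's exact definition.

```latex
% Time-dependent CP asymmetry in a widely used sign convention:
\mathcal{A}_{CP}(\Delta t)
  = \frac{\Gamma\big(\bar{B}^0(\Delta t) \to f\big) - \Gamma\big(B^0(\Delta t) \to f\big)}
         {\Gamma\big(\bar{B}^0(\Delta t) \to f\big) + \Gamma\big(B^0(\Delta t) \to f\big)}
  = S \sin(\Delta m_d\, \Delta t) - C \cos(\Delta m_d\, \Delta t),
% with f = K^0_S \pi^0 \gamma, \Delta t the decay-time difference between the signal
% and flavor-tag B mesons, and \Delta m_d the B^0-\bar{B}^0 mixing frequency.
```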

Submitted 12 July, 2024; originally announced July 2024.

Comments: 10 pages, 4 figures

Report number: Belle II Preprint 2024-009, KEK Preprint 2024-1
arXiv:2407.09088 [pdf, other]

eess.IV cs.AI cs.CV

FD-SOS: Vision-Language Open-Set Detectors for Bone Fenestration and Dehiscence Detection from Intraoral Images

Authors: Marawan Elbatel, Keyuan Liu, Yanqi Yang, Xiaomeng Li

Abstract: Accurate detection of bone fenestration and dehiscence (FD) is crucial for effective treatment planning in dentistry. While cone-beam computed tomography (CBCT) is the gold standard for evaluating FD, it comes with limitations such as radiation exposure, limited accessibility, and higher cost compared to intraoral images. In intraoral images, dentists face challenges in the differential diagnosis of FD. This paper presents a novel and clinically significant application of FD detection solely from intraoral images. To achieve this, we propose FD-SOS, a novel open-set object detector for FD detection from intraoral images. FD-SOS has two novel components: conditional contrastive denoising (CCDN) and teeth-specific matching assignment (TMA). These modules enable FD-SOS to effectively leverage external dental semantics. Experimental results show that our method outperforms existing detection methods and surpasses dental professionals by 35% in recall at the same level of precision. Code is available at: https://github.com/xmed-lab/FD-SOS.

Submitted 12 July, 2024; originally announced July 2024.

Comments: MICCAI 2024
arXiv:2407.08944 [pdf, other]

cs.CV eess.IV

Bora: Biomedical Generalist Video Generation Model

Authors: Weixiang Sun, Xiaocao You, Ruizhe Zheng, Zhengqing Yuan, Xiang Li, Lifang He, Quanzheng Li, Lichao Sun

Abstract: Generative models hold promise for revolutionizing medical education, robot-assisted surgery, and data augmentation for medical AI development. Diffusion models can now generate realistic images from text prompts, while recent advancements have demonstrated their ability to create diverse, high-quality videos. However, these models often struggle with generating accurate representations of medical procedures and detailed anatomical structures. This paper introduces Bora, the first spatio-temporal diffusion probabilistic model designed for text-guided biomedical video generation. Bora leverages a Transformer architecture and is pre-trained on general-purpose video generation tasks. It is fine-tuned through model alignment and instruction tuning using a newly established medical video corpus, which includes paired text-video data from various biomedical fields. To the best of our knowledge, this is the first attempt to establish such a comprehensive annotated biomedical video dataset. Bora is capable of generating high-quality video data across four distinct biomedical domains, adhering to medical expert standards and demonstrating consistency and diversity. This generalist video generative model holds significant potential for enhancing medical consultation and decision-making, particularly in resource-limited settings. Additionally, Bora could pave the way for immersive medical training and procedure planning.
Extensive experiments on distinct medical modalities such as endoscopy, ultrasound, MRI, and cell tracking validate the effectiveness of our model in understanding biomedical instructions and its superior performance across subjects compared to state-of-the-art generation models.

Submitted 15 July, 2024; v1 submitted 11 July, 2024; originally announced July 2024.
arXiv:2407.08928 [pdf, ps, other]

math.AP

Dynamics for a diffusive epidemic model with a free boundary: spreading speed

Authors: Xueping Li, Lei Li, Mingxin Wang

Abstract: We study the spreading speed of a diffusive epidemic model proposed by Li et al. \cite{LL}, where the Stefan boundary condition is imposed at the right boundary, and the left boundary is subject to either the homogeneous Dirichlet or the homogeneous Neumann condition. A spreading-vanishing dichotomy and some sharp criteria were obtained in \cite{LL}. In this paper, when spreading happens, we not only obtain the exact spreading speed of the spreading front described by the right boundary, but also derive some sharp estimates on the asymptotic behavior of the solution components $(u,v)$. Our arguments depend crucially on a detailed understanding of a corresponding semi-wave problem and a steady-state problem.

Submitted 11 July, 2024; originally announced July 2024.
arXiv:2407.08903 [pdf, other]

cs.CR cs.AI cs.AR

doi 10.1145/3622781.3674168

TensorTEE: Unifying Heterogeneous TEE Granularity for Efficient Secure Collaborative Tensor Computing

Authors: Husheng Han, Xinyao Zheng, Yuanbo Wen, Yifan Hao, Erhu Feng, Ling Liang, Jianan Mu, Xiaqing Li, Tianyun Ma, Pengwei Jin, Xinkai Song, Zidong Du, Qi Guo, Xing Hu

Abstract: Heterogeneous collaborative computing with NPU and CPU has received widespread attention due to its substantial performance benefits. To ensure data confidentiality and integrity during computing, Trusted Execution Environments (TEEs) are considered a promising solution because of their comparatively lower overhead. However, existing heterogeneous TEE designs are inefficient for collaborative computing due to fine and different memory granularities between CPU and NPU. 1) The cacheline granularity of CPU TEE intensifies memory pressure due to its extra memory access, and 2) the cacheline granularity MAC of NPU escalates the pressure on the limited memory storage. 3) Data transfer across heterogeneous enclaves relies on the transit of non-secure regions, resulting in cumbersome re-encryption and scheduling. To address these issues, we propose TensorTEE, a unified tensor-granularity heterogeneous TEE for efficient secure collaborative tensor computing. First, we virtually support tensor granularity in CPU TEE to eliminate the off-chip metadata access by detecting and maintaining tensor structures on-chip. Second, we propose tensor-granularity MAC management with predictive execution to avoid computational stalls while eliminating off-chip MAC storage and access. Moreover, based on the unified granularity, we enable direct data transfer without re-encryption and scheduling dilemmas. Our evaluation is built on enhanced Gem5 and a cycle-accurate NPU simulator.
The results show that TensorTEE improves the performance of Large Language Model (LLM) training workloads by 4.0x compared to existing work and incurs only 2.1% overhead compared to non-secure training, offering a practical security assurance for LLM training.

Submitted 11 July, 2024; originally announced July 2024.

Comments: Accepted by ASPLOS 2024
arXiv:2407.08661 [pdf, other]

cond-mat.str-el cond-mat.mes-hall

Self-consistent theory for the fractional quantum anomalous Hall effect in rhombohedral pentalayer graphene

Authors: Ke Huang, Xiao Li, Sankar Das Sarma, Fan Zhang

Abstract: The fractional quantum anomalous Hall (FQAH) effect in rhombohedral pentalayer graphene (PLG) has attracted significant attention due to its potential for observing exotic quantum states. In this work, we present a self-consistent Hartree-Fock theory for the FQAH effect in rhombohedral PLG. In particular, we focus on the convergence of the Hartree-Fock calculation with various reference fields and discuss the stability of the FQAH states in PLG. We show that the so-called charge neutrality scheme provides an unambiguous result for the Hartree-Fock calculation, as it ensures convergence with respect to the momentum cutoff. Based on the Hartree-Fock band structure, we further carry out exact diagonalization calculations to explore the stability of the FQAH states in PLG. Our work provides an improved and unified (minimal) theoretical framework to understand the FQAH effect in rhombohedral PLG and paves the way for future experimental and theoretical studies.

Submitted 11 July, 2024; originally announced July 2024.

Comments: 18 pages, 12 figures. Comments are welcome
arXiv:2407.08516 [pdf, other]

cs.AI

Converging Paradigms: The Synergy of Symbolic and Connectionist AI in LLM-Empowered Autonomous Agents

Authors: Haoyi Xiong, Zhiyuan Wang, Xuhong Li, Jiang Bian, Zeke Xie, Shahid Mumtaz, Laura E. Barnes

Abstract: This article explores the convergence of connectionist and symbolic artificial intelligence (AI), from historical debates to contemporary advancements. Traditionally considered distinct paradigms, connectionist AI focuses on neural networks, while symbolic AI emphasizes symbolic representation and logic. Recent advancements in large language models (LLMs), exemplified by ChatGPT and GPT-4, highlight the potential of connectionist architectures in handling human language as a form of symbols. The study argues that LLM-empowered Autonomous Agents (LAAs) embody this paradigm convergence. By utilizing LLMs for text-based knowledge modeling and representation, LAAs integrate neuro-symbolic AI principles, showcasing enhanced reasoning and decision-making capabilities. Comparing LAAs with Knowledge Graphs within the neuro-symbolic AI theme highlights the unique strengths of LAAs in mimicking human-like reasoning processes, scaling effectively with large datasets, and leveraging in-context samples without explicit re-training. The research underscores promising avenues in neuro-vector-symbolic integration, instructional encoding, and implicit reasoning, aimed at further enhancing LAA capabilities. By exploring the progression of neuro-symbolic AI and proposing future research trajectories, this work advances the understanding and development of AI technologies.

Submitted 15 July, 2024; v1 submitted 11 July, 2024; originally announced July 2024.
arXiv:2407.08468 [pdf, other]

math.ST

Matching-Based Policy Learning

Authors: Xuqiao Li, Ying Yan

Abstract: Treatment heterogeneity is ubiquitous in many areas, motivating practitioners to search for the optimal policy that maximizes the expected outcome based on individualized characteristics. However, most existing policy learning methods rely on weighting-based approaches, which may suffer from high instability in observational studies. To enhance the robustness of the estimated policy, we propose a matching-based estimator of the policy improvement upon a randomized baseline. After correcting the conditional bias, we learn the optimal policy by maximizing the estimate over a policy class. We derive a non-asymptotic high probability bound for the regret of the learned policy and show that the convergence rate is almost $1/\sqrt{n}$. The competitive finite sample performance of the proposed method is demonstrated in extensive simulation studies and a real data application.

Submitted 11 July, 2024; originally announced July 2024.
arXiv:2407.08351 [pdf, other]

cs.CL cs.LG

AutoBencher: Creating Salient, Novel, Difficult Datasets for Language Models

Authors: Xiang Lisa Li, Evan Zheran Liu, Percy Liang, Tatsunori Hashimoto

Abstract: Evaluation is critical for assessing capabilities, tracking scientific progress, and informing model selection. In this paper, we present three desiderata for a good benchmark for language models: (i) salience (e.g., knowledge about World War II is more salient than a random day in history), (ii) novelty (i.e., the benchmark reveals new trends in model rankings not shown by previous benchmarks), and (iii) difficulty (i.e., the benchmark should be difficult for existing models, leaving headroom for future improvement). We operationalize these three desiderata and cast benchmark creation as a search problem, that of finding benchmarks that satisfy all three desiderata. To tackle this search problem, we present AutoBencher, which uses a language model to automatically search for datasets that meet the three desiderata. AutoBencher uses privileged information (e.g. relevant documents) to construct reliable datasets, and adaptivity with reranking to optimize for the search objective. We use AutoBencher to create datasets for math, multilingual, and knowledge-intensive question answering. The scalability of AutoBencher allows it to test fine-grained categories and tail knowledge, creating datasets that are on average 27% more novel and 22% more difficult than existing benchmarks.
A closer investigation of our constructed datasets shows that we can identify specific gaps in the knowledge of language models that are not captured by existing benchmarks, such as Gemini Pro performing much worse on question answering about the Permian Extinction and Fordism, and OpenAGI-7B performing surprisingly well on QA about COVID-19.

Submitted 11 July, 2024; originally announced July 2024.

Comments: preprint
arXiv:2407.08303 [pdf, other]

cs.CV cs.AI

DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception

Authors: Xiaotong Li, Fan Zhang, Haiwen Diao, Yueze Wang, Xinlong Wang, Ling-Yu Duan

Abstract: Existing Multimodal Large Language Models (MLLMs) increasingly emphasize complex understanding of various visual elements, including multiple objects, text information, and spatial relations. Their development for comprehensive visual perception hinges on the availability of high-quality image-text datasets that offer diverse visual elements and thorough image descriptions. However, the scarcity of such hyper-detailed datasets currently hinders progress within the MLLM community. The bottleneck stems from the limited perceptual capabilities of current caption engines, which fall short in providing complete and accurate annotations. To facilitate the cutting-edge research of MLLMs on comprehensive vision perception, we thereby propose Perceptual Fusion, using a low-budget but highly effective caption engine for complete and accurate image descriptions. Specifically, Perceptual Fusion integrates diverse perception experts as image priors to provide explicit information on visual elements and adopts an efficient MLLM as a centric pivot to mimic advanced MLLMs' perception abilities. We carefully select 1M highly representative images from the uncurated LAION dataset and generate dense descriptions using our engine, dubbed DenseFusion-1M. Extensive experiments validate that our engine outperforms its counterparts, where the resulting dataset significantly improves the perception and cognition abilities of existing MLLMs across diverse vision-language benchmarks, especially with high-resolution images as inputs.
The dataset and code are publicly available at https://github.com/baaivision/DenseFusion.

Submitted 11 July, 2024; originally announced July 2024.
arXiv:2407.08273

cs.CL

RB-SQL: A Retrieval-based LLM Framework for Text-to-SQL

Authors: Zhenhe Wu, Zhongqiu Li, Jie Zhang, Mengxiang Li, Yu Zhao, Ruiyu Fang, Zhongjiang He, Xuelong Li, Zhoujun Li, Shuangyong Song

Abstract: Large language models (LLMs) with in-context learning have significantly improved the performance of the text-to-SQL task. Previous works generally focus on using exclusive SQL generation prompts to improve the LLMs' reasoning ability. However, most of them struggle to handle large databases with numerous tables and columns, and usually ignore the significance of pre-processing the database and extracting valuable information for more efficient prompt engineering. Based on the above analysis, we propose RB-SQL, a novel retrieval-based LLM framework for in-context prompt engineering, which consists of three modules that retrieve concise tables and columns as schema, and targeted examples for in-context learning. Experimental results demonstrate that our model achieves better performance than several competitive baselines on public datasets BIRD and Spider.

Submitted 12 July, 2024; v1 submitted 11 July, 2024; originally announced July 2024.

Comments: Further improvement and modification are needed.
arXiv:2407.08154 [pdf, other]

cs.CE

Bayesian uncertainty analysis for underwater 3D reconstruction with neural radiance fields

Authors: Haojie Lian, Xinhao Li, Yilin Qu, Jing Du, Zhuxuan Meng, Jie Liu, Leilei Chen

Abstract: Neural radiance fields (NeRFs) are a deep learning technique that can generate novel views of 3D scenes using sparse 2D images from different viewing directions and camera poses. As an extension of conventional NeRFs to underwater environments, where light can get absorbed and scattered by water, SeaThru-NeRF was proposed to separate the clean appearance and geometric structure of underwater scenes from the effects of the scattering medium. Since the quality of the appearance and structure of underwater scenes is crucial for downstream tasks such as underwater infrastructure inspection, the reliability of the 3D reconstruction model should be considered and evaluated. Nonetheless, owing to the lack of ability to quantify uncertainty in 3D reconstruction of underwater scenes under natural ambient illumination, the practical deployment of NeRFs in unmanned autonomous underwater navigation is limited. To address this issue, we introduce a spatial perturbation field $D_\omega$ based on Bayes' rays in SeaThru-NeRF and perform Laplace approximation to obtain a Gaussian distribution $\mathcal{N}(0,\Sigma)$ of the parameters $\omega$, where the diagonal elements of $\Sigma$ correspond to the uncertainty at each spatial location. We also employ a simple thresholding method to remove artifacts from the rendered results of underwater scenes. Numerical experiments are provided to demonstrate the effectiveness of this approach.

Submitted 10 July, 2024; originally announced July 2024.
arXiv:2407.08101 [pdf, other]

cs.CV

Live Fitness Coaching as a Testbed for Situated Interaction

Authors: Sunny Panchal, Apratim Bhattacharyya, Guillaume Berger, Antoine Mercier, Cornelius Bohm, Florian Dietrichkeit, Reza Pourreza, Xuanlin Li, Pulkit Madan, Mingu Lee, Mark Todorovich, Ingo Bax, Roland Memisevic

Abstract: Tasks at the intersection of vision and language have had a profound impact in advancing the capabilities of vision-language models such as dialog-based assistants. However, models trained on existing tasks are largely limited to turn-based interactions, where each turn must be stepped (i.e., prompted) by the user. Open-ended, asynchronous interactions where an AI model may proactively deliver timely responses or feedback based on the unfolding situation in real-time are an open challenge. In this work, we present the QEVD benchmark and dataset which explores human-AI interaction in the challenging, yet controlled, real-world domain of fitness coaching, a task which intrinsically requires monitoring live user activity and providing timely feedback. It is the first benchmark that requires assistive vision-language models to recognize complex human actions, identify mistakes grounded in those actions, and provide appropriate feedback. Our experiments reveal the limitations of existing state-of-the-art vision-language models for such asynchronous situated interactions. Motivated by this, we propose a simple end-to-end streaming baseline that can respond asynchronously to human actions with appropriate feedback at the appropriate time.

Submitted 10 July, 2024; originally announced July 2024.

Comments: The benchmark and dataset are available here: https://developer.qualcomm.com/software/ai-datasets/qevd
arXiv:2407.08031 [pdf, ps, other]

math.DG

Coarse extrinsic curvature of Riemannian submanifolds

Authors: Marc Arnaudon, Xue-Mei Li, Benedikt Petko

Abstract: Inspired by Y. Ollivier's coarse Ricci curvature, we introduce a novel concept of coarse extrinsic curvature on Riemannian submanifolds. This is defined through Wasserstein distances between test probability measures supported in the tubular neighbourhood of the submanifold. This framework provides an understanding of the geometric properties of embeddings, offering valuable insights into their curvature dynamics and intrinsic structures. Additionally, this coarse curvature can also be extracted from empirical measures supported on random point clouds generated by a Poisson point process, and has the potential to extend to metric embeddings.

Submitted 10 July, 2024; originally announced July 2024.

Comments: 52 pages, 6 figures

MSC Class: 53B25 (Primary) 49Q22 (Secondary)
arXiv:2407.07863 [pdf, other]

astro-ph.IM physics.space-ph

Intensity-sensitive quality assessment of extended sources in astronomical images

Authors: X. Li, K. Adamek, W. Armour

Abstract: Radio astronomy studies the Universe by observing the radio emissions of celestial bodies. Different methods can be used to recover the sky brightness distribution (SBD), which describes the distribution of celestial sources from recorded data, with the output dependent on the method used. Image quality assessment (IQA) indexes can be used to compare the differences between restored SBDs produced by different image reconstruction techniques to evaluate the effectiveness of different techniques. However, reconstructed images (for the same SBD) can appear to be very similar, especially when observed by the human visual system (HVS). Hence current structural similarity methods, inspired by the HVS, are not effective. We have previously proposed two methods to assess point source images, where low amounts of concentrated information are present in larger regions of noise-like data. But for images that include extended source(s), the increase in complexity of the structure makes the IQA methods for point sources over-sensitive since the important objects cannot be described by isolated point sources. Therefore, in this article we propose augmented Low-Information Similarity Index (augLISI), an improved version of LISI, to assess images including extended source(s). Experiments have been carried out to illustrate how this new IQA method can help with the development and study of astronomical imaging techniques.
Note that although we focus on radio astronomical images herein, these IQA methods are also applicable to other astronomical images, and imaging techniques.

Submitted 10 July, 2024; originally announced July 2024.
arXiv:2407.07807 [pdf, other]

astro-ph.HE

Revisiting the dead time effects of Insight-HXMT/ME on timing analysis

Authors: Youli Tuo, Xiaobo Li, Ying Tan, Baiyang Wu, Weichun Jiang, Liming Song, Jinlu Qu, Sudeep Gogate, Shuang-Nan Zhang, Andrea Santangelo

Abstract: Dead time is a common instrumental effect of X-ray detectors that can alter the timing properties of astronomical signals, such as distorting the shape of power density spectra (PDS), affecting the root-mean-square of potential quasi-periodic oscillation signals, etc. We revisit the effects of the dead time of the Medium Energy X-ray telescope (ME) onboard Insight-HXMT, based on the simulation of the electronic read-out mechanism that causes the dead time, and on real data. We investigate dead time effects on the pulse profile as well as on Quasi-Periodic Oscillation (QPO) signals. The dead time coefficient suggests a linear correlation with the observed count rate in each phase bin of the pulse profile according to the simulation of periodic signal as well as the real data observed on Swift J0243.6+6124. The Fourier-amplitude-difference (FAD) method could well recover the intrinsic shape of the observed PDS in the case that the PDS is from two identical detectors. We apply this technique to ME, by splitting the 9 FPGA modules into 2 groups. The results indicate that the FAD technique suits the case when two groups of detectors are not largely different; and the recovered PDS of Sco X-1 observed by ME slightly enhances the significance of the previously known QPO signal, while the root-mean-square of the QPO is significantly improved. We provide an FAD correction tool, implemented in HXMTDAS, to help users better analyze QPO signals in the future.

Submitted 10 July, 2024; originally announced July 2024.

Comments: 9 pages, 8 figures, accepted for publication in MNRAS main journal
arXiv:2407.07763 [pdf, other]

cs.CV

S&D Messenger: Exchanging Semantic and Domain Knowledge for Generic Semi-Supervised Medical Image Segmentation

Authors: Qixiang Zhang, Haonan Wang, Xiaomeng Li

Abstract: Semi-supervised medical image segmentation (SSMIS) has emerged as a promising solution to tackle the challenges of time-consuming manual labeling in the medical field. However, in practical scenarios, there are often domain variations within the datasets, leading to derivative scenarios like semi-supervised medical domain generalization (Semi-MDG) and unsupervised medical domain adaptation (UMDA). In this paper, we aim to develop a generic framework that masters all three tasks. We notice a critical shared challenge across three scenarios: the explicit semantic knowledge for segmentation performance and rich domain knowledge for generalizability exclusively exist in the labeled set and unlabeled set respectively. Such discrepancy hinders existing methods from effectively comprehending both types of knowledge under semi-supervised settings. To tackle this challenge, we develop a Semantic & Domain Knowledge Messenger (S&D Messenger) which facilitates direct knowledge delivery between the labeled and unlabeled sets, thus allowing the model to comprehend both of them in each individual learning flow. Equipped with our S&D Messenger, a naive pseudo-labeling method can achieve huge improvement on six benchmark datasets for SSMIS (+7.5%), UMDA (+5.6%), and Semi-MDG tasks (+1.14%), compared with state-of-the-art methods designed for specific tasks.

Submitted 10 July, 2024; originally announced July 2024.

Comments: 10 pages, under review at IEEE Transactions on Medical Imaging
arXiv:2407.07760 [pdf, other]

cs.CV cs.AI

Learning Spatial-Semantic Features for Robust Video Object Segmentation

Authors: Xin Li, Deshui Miao, Zhenyu He, Yaowei Wang, Huchuan Lu, Ming-Hsuan Yang

Abstract: Tracking and segmenting multiple similar objects with complex or separate parts in long-term videos is inherently challenging due to the ambiguity of target parts and identity confusion caused by occlusion, background clutter, and long-term variations. In this paper, we propose a robust video object segmentation framework equipped with spatial-semantic features and discriminative object queries to address the above issues. Specifically, we construct a spatial-semantic network comprising a semantic embedding block and spatial dependencies modeling block to associate the pretrained ViT features with global semantic features and local spatial features, providing a comprehensive target representation. In addition, we develop a masked cross-attention module to generate object queries that focus on the most discriminative parts of target objects during query propagation, alleviating noise accumulation and ensuring effective long-term query propagation. The experimental results show that the proposed method sets new state-of-the-art performance on multiple datasets, including the DAVIS2017 test (89.1%), YoutubeVOS 2019 (88.5%), MOSE (75.1%), LVOS test (73.0%), and LVOS val (75.1%), which demonstrate the effectiveness and generalization capacity of the proposed method. We will make all source code and trained models publicly available.

Submitted 10 July, 2024; originally announced July 2024.

Comments: Winning solution of the VOTS2024 Challenge
arXiv:2407.07651 [pdf, other]

hep-ex physics.data-an

Study of the decay and production properties of $D_{s1}(2536)$ and $D_{s2}^*(2573)$

Authors: M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann, et al. (645 additional authors not shown)

Abstract: The $e^+e^-\rightarrow D_s^+D_{s1}(2536)^-$ and $e^+e^-\rightarrow D_s^+D^*_{s2}(2573)^-$ processes are studied using data samples collected with the BESIII detector at center-of-mass energies from 4.530 to 4.946~GeV. The absolute branching fractions of $D_{s1}(2536)^- \rightarrow \bar{D}^{*0}K^-$ and $D_{s2}^*(2573)^- \rightarrow \bar{D}^0K^-$ are measured for the first time to be $(35.9\pm 4.8\pm 3.5)\%$ and $(37.4\pm 3.1\pm 4.6)\%$, respectively. The measurements are in tension with predictions based on the assumption that the $D_{s1}(2536)$ and $D_{s2}^*(2573)$ are dominated by a bare $c\bar{s}$ component. The $e^+e^-\rightarrow D_s^+D_{s1}(2536)^-$ and $e^+e^-\rightarrow D_s^+D^*_{s2}(2573)^-$ cross sections are measured, and a resonant structure at around 4.6~GeV with a width of 50~MeV is observed for the first time with a statistical significance of $15σ$ in the $e^+e^-\rightarrow D_s^+D^*_{s2}(2573)^-$ process. It could be the $Y(4626)$ found by the Belle collaboration in the $D_s^+D_{s1}(2536)^{-}$ final state, since they have similar masses and widths. There is also evidence for a structure at around 4.75~GeV in both processes.

Submitted 10 July, 2024; originally announced July 2024.
arXiv:2407.07478 [pdf, other]

cs.CV

EA-VTR: Event-Aware Video-Text Retrieval

Authors: Zongyang Ma, Ziqi Zhang, Yuxin Chen, Zhongang Qi, Chunfeng Yuan, Bing Li, Yingmin Luo, Xu Li, Xiaojuan Qi, Ying Shan, Weiming Hu

Abstract: Understanding the content of events occurring in the video and their inherent temporal logic is crucial for video-text retrieval. However, web-crawled pre-training datasets often lack sufficient event information, and the widely adopted video-level cross-modal contrastive learning also struggles to capture detailed and complex video-text event alignment. To address these challenges, we make improvements from both data and model perspectives. In terms of pre-training data, we focus on supplementing the missing specific event content and event temporal transitions with the proposed event augmentation strategies. Based on the event-augmented data, we construct a novel Event-Aware Video-Text Retrieval model, i.e., EA-VTR, which achieves powerful video-text retrieval ability through superior video event awareness. EA-VTR can efficiently encode frame-level and video-level visual representations simultaneously, enabling detailed event content and complex event temporal cross-modal alignment, ultimately enhancing the comprehensive understanding of video events. Our method not only significantly outperforms existing approaches on multiple datasets for Text-to-Video Retrieval and Video Action Recognition tasks, but also demonstrates superior event content perception on Multi-event Video-Text Retrieval and Video Moment Retrieval tasks, as well as outstanding event temporal logic understanding on the Test of Time task.

Submitted 10 July, 2024; originally announced July 2024.

Comments: Accepted by ECCV 2024
