subscribe to arXiv mailings

Beyond Traditional Threats: A Persistent Backdoor Attack on Federated Learning

Authors: Tao Liu, Yuhang Zhang, Zhu Feng, Zhiqin Yang, Chen Xu, Dapeng Man, Wu Yang

Abstract: Backdoors on federated learning will be diluted by subsequent benign updates. This is reflected in the significant reduction of attack success rate as iterations increase, ultimately failing. We use a new metric to quantify the degree of this weakened backdoor effect, called attack persistence. Given that research to improve this performance has not been widely noted,we propose a Full Combination… ▽ More Backdoors on federated learning will be diluted by subsequent benign updates. This is reflected in the significant reduction of attack success rate as iterations increase, ultimately failing. We use a new metric to quantify the degree of this weakened backdoor effect, called attack persistence. Given that research to improve this performance has not been widely noted,we propose a Full Combination Backdoor Attack (FCBA) method. It aggregates more combined trigger information for a more complete backdoor pattern in the global model. Trained backdoored global model is more resilient to benign updates, leading to a higher attack success rate on the test set. We test on three datasets and evaluate with two models across various settings. FCBA's persistence outperforms SOTA federated learning backdoor attacks. On GTSRB, postattack 120 rounds, our attack success rate rose over 50% from baseline. The core code of our method is available at https://github.com/PhD-TaoLiu/FCBA. △ Less

Submitted 26 April, 2024; originally announced April 2024.

Journal ref: Proceedings of the AAAI Conference on Artificial Intelligence. 2024, 38(19): 21359-21367

arXiv:2404.17589 [pdf]

An Off-Policy Reinforcement Learning Algorithm Customized for Multi-Task Fusion in Large-Scale Recommender Systems

Authors: Peng Liu, Cong Xu, Ming Zhao, Jiawei Zhu, Bin Wang, Yi Ren

Abstract: As the last critical stage of RSs, Multi-Task Fusion (MTF) is responsible for combining multiple scores outputted by Multi-Task Learning (MTL) into a final score to maximize user satisfaction, which determines the ultimate recommendation results. Recently, to optimize long-term user satisfaction within a recommendation session, Reinforcement Learning (RL) is used for MTF in the industry. However,… ▽ More As the last critical stage of RSs, Multi-Task Fusion (MTF) is responsible for combining multiple scores outputted by Multi-Task Learning (MTL) into a final score to maximize user satisfaction, which determines the ultimate recommendation results. Recently, to optimize long-term user satisfaction within a recommendation session, Reinforcement Learning (RL) is used for MTF in the industry. However, the off-policy RL algorithms used for MTF so far have the following severe problems: 1) to avoid out-of-distribution (OOD) problem, their constraints are overly strict, which seriously damage their performance; 2) they are unaware of the exploration policy used for producing training data and never interact with real environment, so only suboptimal policy can be learned; 3) the traditional exploration policies are inefficient and hurt user experience. To solve the above problems, we propose a novel method named IntegratedRL-MTF customized for MTF in large-scale RSs. IntegratedRL-MTF integrates off-policy RL model with our online exploration policy to relax overstrict and complicated constraints, which significantly improves its performance. We also design an extremely efficient exploration policy, which eliminates low-value exploration space and focuses on exploring potential high-value state-action pairs. Moreover, we adopt progressive training mode to further enhance our model's performance with the help of our exploration policy. We conduct extensive offline and online experiments in the short video channel of Tencent News. The results demonstrate that our model outperforms other models remarkably. IntegratedRL-MTF has been fully deployed in our RS and other large-scale RSs in Tencent, which have achieved significant improvements. △ Less

Submitted 6 May, 2024; v1 submitted 19 April, 2024; originally announced April 2024.

arXiv:2404.17520 [pdf, other]

A Cognitive-Driven Trajectory Prediction Model for Autonomous Driving in Mixed Autonomy Environment

Authors: Haicheng Liao, Zhenning Li, Chengyue Wang, Bonan Wang, Hanlin Kong, Yanchen Guan, Guofa Li, Zhiyong Cui, Chengzhong Xu

Abstract: As autonomous driving technology progresses, the need for precise trajectory prediction models becomes paramount. This paper introduces an innovative model that infuses cognitive insights into trajectory prediction, focusing on perceived safety and dynamic decision-making. Distinct from traditional approaches, our model excels in analyzing interactions and behavior patterns in mixed autonomy traff… ▽ More As autonomous driving technology progresses, the need for precise trajectory prediction models becomes paramount. This paper introduces an innovative model that infuses cognitive insights into trajectory prediction, focusing on perceived safety and dynamic decision-making. Distinct from traditional approaches, our model excels in analyzing interactions and behavior patterns in mixed autonomy traffic scenarios. It represents a significant leap forward, achieving marked performance improvements on several key datasets. Specifically, it surpasses existing benchmarks with gains of 16.2% on the Next Generation Simulation (NGSIM), 27.4% on the Highway Drone (HighD), and 19.8% on the Macao Connected Autonomous Driving (MoCAD) dataset. Our proposed model shows exceptional proficiency in handling corner cases, essential for real-world applications. Moreover, its robustness is evident in scenarios with missing or limited data, outperforming most of the state-of-the-art baselines. This adaptability and resilience position our model as a viable tool for real-world autonomous driving systems, heralding a new standard in vehicle trajectory prediction for enhanced safety and efficiency. △ Less

Submitted 26 April, 2024; originally announced April 2024.

Comments: Accepted by IJCAI 2024

arXiv:2404.17485 [pdf, other]

A Survey on Industrial Internet of Things (IIoT) Testbeds for Connectivity Research

Authors: Tianyu Zhang, Chuanyu Xue, Jiachen Wang, Zelin Yun, Natong Lin, Song Han

Abstract: Industrial Internet of Things (IIoT) technologies have revolutionized industrial processes, enabling smart automation, real-time data analytics, and improved operational efficiency across diverse industry sectors. IIoT testbeds play a critical role in advancing IIoT research and development (R&D) to provide controlled environments for technology evaluation before their real-world deployment. In th… ▽ More Industrial Internet of Things (IIoT) technologies have revolutionized industrial processes, enabling smart automation, real-time data analytics, and improved operational efficiency across diverse industry sectors. IIoT testbeds play a critical role in advancing IIoT research and development (R&D) to provide controlled environments for technology evaluation before their real-world deployment. In this article, we conduct a comprehensive literature review on existing IIoT testbeds, aiming to identify benchmark performance, research gaps and explore emerging trends in IIoT systems. We first review the state-of-the-art resource management solutions proposed for IIoT applications. We then categorize the reviewed testbeds according to their deployed communication protocols (including TSN, IEEE 802.15.4, IEEE 802.11 and 5G) and discuss the design and usage of each testbed. Driven by the knowledge gained during this study, we present suggestions and good practices for researchers and practitioners who are planning to design and develop IIoT testbeds for connectivity research. △ Less

Submitted 30 June, 2024; v1 submitted 26 April, 2024; originally announced April 2024.

arXiv:2404.17396 [pdf, other]

Onset of global instability in a premixed annular V-flame

Authors: Chuhan Wang, Christopher M. Douglas, Yu Guan, Chunxiao Xu, Lutz Lesshafft

Abstract: We investigate self-excited axisymmetric oscillations of a lean premixed methane--air V-flame in a laminar annular jet. The flame is anchored near the rim of the centrebody, forming an inverted cone, while the strongest vorticity is concentrated along the outer shear layer of the annular jet. Consequently, the reaction and vorticity dynamics are largely separated, except where they coalesce near t… ▽ More We investigate self-excited axisymmetric oscillations of a lean premixed methane--air V-flame in a laminar annular jet. The flame is anchored near the rim of the centrebody, forming an inverted cone, while the strongest vorticity is concentrated along the outer shear layer of the annular jet. Consequently, the reaction and vorticity dynamics are largely separated, except where they coalesce near the flame tip. The global eigenmodes corresponding to the linearised reacting flow equations around the steady base state are computed in an axisymmetric setting. We identify an arc branch of eigenmodes exhibiting strong oscillations at the flame tip. The associated eigenvalues are robust with respect to domain truncation and numerical discretisation, and they become destabilised as the Reynolds number increases. The frequency of the leading eigenmode is found to correspond to the Lagrangian disturbance advection time from the nozzle outlet to the flame tip. This linear result suggests a non-local feedback mechanism consistent with the scenario of `intrinsic thermoacoustic instability'. Nonlinear time-resolved simulation further reveals notable hysteresis phenomena in the subcritical regime prior to instability. Hence, even when the flame is linearly stable, perturbations of sufficient amplitude can trigger limit-cycle oscillations and higher-dimensional dynamics sustained by nonlinear feedback. Notably, linear analysis of the subcritical time-averaged limit-cycle state yields eigenvalues that do not match the nonlinear periodic oscillation frequencies, emphasising the essential role of nonlinear harmonic interactions in the system dynamics. △ Less

Submitted 26 April, 2024; originally announced April 2024.

arXiv:2404.17238 [pdf, other]

TruthSR: Trustworthy Sequential Recommender Systems via User-generated Multimodal Content

Authors: Meng Yan, Haibin Huang, Ying Liu, Juan Zhao, Xiyue Gao, Cai Xu, Ziyu Guan, Wei Zhao

Abstract: Sequential recommender systems explore users' preferences and behavioral patterns from their historically generated data. Recently, researchers aim to improve sequential recommendation by utilizing massive user-generated multi-modal content, such as reviews, images, etc. This content often contains inevitable noise. Some studies attempt to reduce noise interference by suppressing cross-modal incon… ▽ More Sequential recommender systems explore users' preferences and behavioral patterns from their historically generated data. Recently, researchers aim to improve sequential recommendation by utilizing massive user-generated multi-modal content, such as reviews, images, etc. This content often contains inevitable noise. Some studies attempt to reduce noise interference by suppressing cross-modal inconsistent information. However, they could potentially constrain the capturing of personalized user preferences. In addition, it is almost impossible to entirely eliminate noise in diverse user-generated multi-modal content. To solve these problems, we propose a trustworthy sequential recommendation method via noisy user-generated multi-modal content. Specifically, we explicitly capture the consistency and complementarity of user-generated multi-modal content to mitigate noise interference. We also achieve the modeling of the user's multi-modal sequential preferences. In addition, we design a trustworthy decision mechanism that integrates subjective user perspective and objective item perspective to dynamically evaluate the uncertainty of prediction results. Experimental evaluation on four widely-used datasets demonstrates the superior performance of our model compared to state-of-the-art methods. The code is released at https://github.com/FairyMeng/TrustSR. △ Less

Submitted 26 April, 2024; originally announced April 2024.

arXiv:2404.17151 [pdf, other]

MorphText: Deep Morphology Regularized Arbitrary-shape Scene Text Detection

Authors: Chengpei Xu, Wenjing Jia, Ruomei Wang, Xiaonan Luo, Xiangjian He

Abstract: Bottom-up text detection methods play an important role in arbitrary-shape scene text detection but there are two restrictions preventing them from achieving their great potential, i.e., 1) the accumulation of false text segment detections, which affects subsequent processing, and 2) the difficulty of building reliable connections between text segments. Targeting these two problems, we propose a n… ▽ More Bottom-up text detection methods play an important role in arbitrary-shape scene text detection but there are two restrictions preventing them from achieving their great potential, i.e., 1) the accumulation of false text segment detections, which affects subsequent processing, and 2) the difficulty of building reliable connections between text segments. Targeting these two problems, we propose a novel approach, named ``MorphText", to capture the regularity of texts by embedding deep morphology for arbitrary-shape text detection. Towards this end, two deep morphological modules are designed to regularize text segments and determine the linkage between them. First, a Deep Morphological Opening (DMOP) module is constructed to remove false text segment detections generated in the feature extraction process. Then, a Deep Morphological Closing (DMCL) module is proposed to allow text instances of various shapes to stretch their morphology along their most significant orientation while deriving their connections. Extensive experiments conducted on four challenging benchmark datasets (CTW1500, Total-Text, MSRA-TD500 and ICDAR2017) demonstrate that our proposed MorphText outperforms both top-down and bottom-up state-of-the-art arbitrary-shape scene text detection approaches. △ Less

Submitted 26 April, 2024; originally announced April 2024.

Comments: Accepted by Transaction on Multimedia

arXiv:2404.17134 [pdf, ps, other]

Boundedness of log Fano cone singularities and discreteness of local volumes

Authors: Chenyang Xu, Ziquan Zhuang

Abstract: We prove that in any fixed dimension, K-semistable log Fano cone singularities whose volumes are bounded from below by a fixed positive number form a bounded set. As a consequence, we show that the set of local volumes of klt singularities of a fixed dimension has zero as the only accumulation point. We prove that in any fixed dimension, K-semistable log Fano cone singularities whose volumes are bounded from below by a fixed positive number form a bounded set. As a consequence, we show that the set of local volumes of klt singularities of a fixed dimension has zero as the only accumulation point. △ Less

Submitted 25 April, 2024; originally announced April 2024.

Comments: 20 pages. Comments are welcome!

arXiv:2404.16821 [pdf, other]

How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites

Authors: Zhe Chen, Weiyun Wang, Hao Tian, Shenglong Ye, Zhangwei Gao, Erfei Cui, Wenwen Tong, Kongzhi Hu, Jiapeng Luo, Zheng Ma, Ji Ma, Jiaqi Wang, Xiaoyi Dong, Hang Yan, Hewei Guo, Conghui He, Botian Shi, Zhenjiang Jin, Chao Xu, Bin Wang, Xingjian Wei, Wei Li, Wenjian Zhang, Bo Zhang, Pinlong Cai , et al. (10 additional authors not shown)

Abstract: In this report, we introduce InternVL 1.5, an open-source multimodal large language model (MLLM) to bridge the capability gap between open-source and proprietary commercial models in multimodal understanding. We introduce three simple improvements: (1) Strong Vision Encoder: we explored a continuous learning strategy for the large-scale vision foundation model -- InternViT-6B, boosting its visual… ▽ More In this report, we introduce InternVL 1.5, an open-source multimodal large language model (MLLM) to bridge the capability gap between open-source and proprietary commercial models in multimodal understanding. We introduce three simple improvements: (1) Strong Vision Encoder: we explored a continuous learning strategy for the large-scale vision foundation model -- InternViT-6B, boosting its visual understanding capabilities, and making it can be transferred and reused in different LLMs. (2) Dynamic High-Resolution: we divide images into tiles ranging from 1 to 40 of 448$\times$448 pixels according to the aspect ratio and resolution of the input images, which supports up to 4K resolution input. (3) High-Quality Bilingual Dataset: we carefully collected a high-quality bilingual dataset that covers common scenes, document images, and annotated them with English and Chinese question-answer pairs, significantly enhancing performance in OCR- and Chinese-related tasks. We evaluate InternVL 1.5 through a series of benchmarks and comparative studies. Compared to both open-source and proprietary models, InternVL 1.5 shows competitive performance, achieving state-of-the-art results in 8 of 18 benchmarks. Code has been released at https://github.com/OpenGVLab/InternVL. △ Less

Submitted 29 April, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

Comments: Technical report

arXiv:2404.16425 [pdf, other]

Soft X-ray prompt emission from a high-redshift gamma-ray burst EP240315a

Authors: Y. Liu, H. Sun, D. Xu, D. S. Svinkin, J. Delaunay, N. R. Tanvir, H. Gao, C. Zhang, Y. Chen, X. -F. Wu, B. Zhang, W. Yuan, J. An, G. Bruni, D. D. Frederiks, G. Ghirlanda, J. -W. Hu, A. Li, C. -K. Li, J. -D. Li, D. B. Malesani, L. Piro, G. Raman, R. Ricci, E. Troja , et al. (170 additional authors not shown)

Abstract: Long gamma-ray bursts (GRBs) are believed to originate from core collapse of massive stars. High-redshift GRBs can probe the star formation and reionization history of the early universe, but their detection remains rare. Here we report the detection of a GRB triggered in the 0.5--4 keV band by the Wide-field X-ray Telescope (WXT) on board the Einstein Probe (EP) mission, designated as EP240315a,… ▽ More Long gamma-ray bursts (GRBs) are believed to originate from core collapse of massive stars. High-redshift GRBs can probe the star formation and reionization history of the early universe, but their detection remains rare. Here we report the detection of a GRB triggered in the 0.5--4 keV band by the Wide-field X-ray Telescope (WXT) on board the Einstein Probe (EP) mission, designated as EP240315a, whose bright peak was also detected by the Swift Burst Alert Telescope and Konus-Wind through off-line analyses. At a redshift of $z=4.859$, EP240315a showed a much longer and more complicated light curve in the soft X-ray band than in gamma-rays. Benefiting from a large field-of-view ($\sim$3600 deg$^2$) and a high sensitivity, EP-WXT captured the earlier engine activation and extended late engine activity through a continuous detection. With a peak X-ray flux at the faint end of previously known high-$z$ GRBs, the detection of EP240315a demonstrates the great potential for EP to study the early universe via GRBs. △ Less

Submitted 25 April, 2024; originally announced April 2024.

Comments: 41 pages, 8 figures, 7 tables

arXiv:2404.15961 [pdf, other]

Soil analysis with machine-learning-based processing of stepped-frequency GPR field measurements: Preliminary study

Authors: Chunlei Xu, Michael Pregesbauer, Naga Sravani Chilukuri, Daniel Windhager, Mahsa Yousefi, Pedro Julian, Lothar Ratschbacher

Abstract: Ground Penetrating Radar (GPR) has been widely studied as a tool for extracting soil parameters relevant to agriculture and horticulture. When combined with Machine-Learning-based (ML) methods, high-resolution Stepped Frequency Countinuous Wave Radar (SFCW) measurements hold the promise to give cost effective access to depth resolved soil parameters, including at root-level depth. In a first step… ▽ More Ground Penetrating Radar (GPR) has been widely studied as a tool for extracting soil parameters relevant to agriculture and horticulture. When combined with Machine-Learning-based (ML) methods, high-resolution Stepped Frequency Countinuous Wave Radar (SFCW) measurements hold the promise to give cost effective access to depth resolved soil parameters, including at root-level depth. In a first step in this direction, we perform an extensive field survey with a tractor mounted SFCW GPR instrument. Using ML data processing we test the GPR instrument's capabilities to predict the apparent electrical conductivity (ECaR) as measured by a simultaneously recording Electromagnetic Induction (EMI) instrument. The large-scale field measurement campaign with 3472 co-registered and geo-located GPR and EMI data samples distributed over ~6600 square meters was performed on a golf course. The selected terrain benefits from a high surface homogeneity, but also features the challenge of only small, and hence hard to discern, variations in the measured soil parameter. Based on the quantitative results we suggest the use of nugget-to-sill ratio as a performance metric for the evaluation of end-to-end ML performance in the agricultural setting and discuss the limiting factors in the multi-sensor regression setting. The code is released as open source and available at https://opensource.silicon-austria.com/xuc/soil-analysis-machine-learning-stepped-frequency-gpr. △ Less

Submitted 24 April, 2024; originally announced April 2024.

arXiv:2404.15898 [pdf, ps, other]

Quantum metrology in a driven-dissipation down-conversion system beyond the parametric approximation

Authors: Dong Xie, Chunling Xu

Abstract: We investigate quantum metrology in a degenerate down-conversion system composed of a pump mode and two degenerate signal modes. In the conventional parametric approximation, the pump mode is assumed to be constant, not a quantum operator. We obtain the measurement precision of the coupling strength between the pump mode and two degenerate signal modes beyond the parametric approximation. Without… ▽ More We investigate quantum metrology in a degenerate down-conversion system composed of a pump mode and two degenerate signal modes. In the conventional parametric approximation, the pump mode is assumed to be constant, not a quantum operator. We obtain the measurement precision of the coupling strength between the pump mode and two degenerate signal modes beyond the parametric approximation. Without a dissipation, the super-Heisenberg limit can be obtained when the initial state is the direct product of classical state and quantum state. This does not require the use of entanglement resources which are not easy to prepare. When the pump mode suffers from a single-photon dissipation, the measurement uncertainty of the coupling strength is close to 0 as the coupling strength approaches 0 with a coherent driving. The direct photon detection is proved to be the optimal measurement. This result has not been changed when the signal modes suffer from the two-photon dissipation. When the signal modes also suffer from the single-mode dissipation, the information of the coupling strength can still be obtained in the steady state. In addition, the measurement uncertainty of the coupling strength can also be close to 0 and become independent of noise temperature as the critical point between the normal and superradiance phase approaches. Finally, we show that a driven-dissipation down-conversion system can be used as a precise quantum sensor to measure the driving strength. △ Less

Submitted 24 April, 2024; originally announced April 2024.

Comments: 8 pages, 2 figures

arXiv:2404.15254 [pdf, other]

UniMERNet: A Universal Network for Real-World Mathematical Expression Recognition

Authors: Bin Wang, Zhuangcheng Gu, Chao Xu, Bo Zhang, Botian Shi, Conghui He

Abstract: This paper presents the UniMER dataset to provide the first study on Mathematical Expression Recognition (MER) towards complex real-world scenarios. The UniMER dataset consists of a large-scale training set UniMER-1M offering an unprecedented scale and diversity with one million training instances and a meticulously designed test set UniMER-Test that reflects a diverse range of formula distributio… ▽ More This paper presents the UniMER dataset to provide the first study on Mathematical Expression Recognition (MER) towards complex real-world scenarios. The UniMER dataset consists of a large-scale training set UniMER-1M offering an unprecedented scale and diversity with one million training instances and a meticulously designed test set UniMER-Test that reflects a diverse range of formula distributions prevalent in real-world scenarios. Therefore, the UniMER dataset enables the training of a robust and high-accuracy MER model and comprehensive evaluation of model performance. Moreover, we introduce the Universal Mathematical Expression Recognition Network (UniMERNet), an innovative framework designed to enhance MER in practical scenarios. UniMERNet incorporates a Length-Aware Module to process formulas of varied lengths efficiently, thereby enabling the model to handle complex mathematical expressions with greater accuracy. In addition, UniMERNet employs our UniMER-1M data and image augmentation techniques to improve the model's robustness under different noise conditions. Our extensive experiments demonstrate that UniMERNet outperforms existing MER models, setting a new benchmark in various scenarios and ensuring superior recognition quality in real-world applications. The dataset and model are available at https://github.com/opendatalab/UniMERNet. △ Less

Submitted 23 April, 2024; originally announced April 2024.

Comments: 17 pages, 5 figures

arXiv:2404.14290 [pdf]

Comparison of h-BN and graphene layers as grain boundary materials for granular FePt-$\text{L}1_0$ thin films

Authors: B. S. D. Ch. S. Varaprasad, Chengchao Xu, Brandon Reese, David E. Laughlin, Jian-Gang, Zhu

Abstract: Granular $\text{L}1_0$-FePt thin films with small columnar grains are essential for heat-assisted magnetic recording media. While hexagonal boron nitride(h-BN) has proven effective for promoting columnar FePt grains, we explored multilayer graphene as an alternative grain boundary material leveraging its structural similarity to h-BN. The FePt granular thin films with carbon-based grain boundary m… ▽ More Granular $\text{L}1_0$-FePt thin films with small columnar grains are essential for heat-assisted magnetic recording media. While hexagonal boron nitride(h-BN) has proven effective for promoting columnar FePt grains, we explored multilayer graphene as an alternative grain boundary material leveraging its structural similarity to h-BN. The FePt granular thin films with carbon-based grain boundary materials(GBMs) were deposited by cosputtering on Si/SiO2 substrates with substrate bias at 650°C. The RF bias and high temperature facilitated formation of interlinked graphene nanoribbons wrapping around FePt grains, yielding 7.5 nm diameter, 8 nm height grains with an order parameter of 0.78 and a perpendicular coercivity of 40 kOe. However, the formation of graphene nanoribbons could not effectively promote columnar structures, likely due to co-existing amorphous carbon in grain boundaries. Optimizing deposition to improve graphene grain boundary quality is necessary to realize this 2D material's potential for achieving desirable microstructures for HAMR media. △ Less

Submitted 22 April, 2024; originally announced April 2024.

Comments: 13 pages, 5 figures, first presented in MMM 2023 conference at Dallas

arXiv:2404.14219 [pdf, other]

Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

Authors: Marah Abdin, Sam Ade Jacobs, Ammar Ahmad Awan, Jyoti Aneja, Ahmed Awadallah, Hany Awadalla, Nguyen Bach, Amit Bahree, Arash Bakhtiari, Jianmin Bao, Harkirat Behl, Alon Benhaim, Misha Bilenko, Johan Bjorck, Sébastien Bubeck, Qin Cai, Martin Cai, Caio César Teodoro Mendes, Weizhu Chen, Vishrav Chaudhary, Dong Chen, Dongdong Chen, Yen-Chun Chen, Yi-Ling Chen, Parul Chopra , et al. (90 additional authors not shown)

Abstract: We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone. The innovation lies entirely in our dataset… ▽ More We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone. The innovation lies entirely in our dataset for training, a scaled-up version of the one used for phi-2, composed of heavily filtered publicly available web data and synthetic data. The model is also further aligned for robustness, safety, and chat format. We also provide some initial parameter-scaling results with a 7B and 14B models trained for 4.8T tokens, called phi-3-small and phi-3-medium, both significantly more capable than phi-3-mini (e.g., respectively 75% and 78% on MMLU, and 8.7 and 8.9 on MT-bench). Moreover, we also introduce phi-3-vision, a 4.2 billion parameter model based on phi-3-mini with strong reasoning capabilities for image and text prompts. △ Less

Submitted 23 May, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

Comments: 19 pages

arXiv:2404.13903 [pdf, other]

Accelerating Image Generation with Sub-path Linear Approximation Model

Authors: Chen Xu, Tianhui Song, Weixin Feng, Xubin Li, Tiezheng Ge, Bo Zheng, Limin Wang

Abstract: Diffusion models have significantly advanced the state of the art in image, audio, and video generation tasks. However, their applications in practical scenarios are hindered by slow inference speed. Drawing inspiration from the approximation strategies utilized in consistency models, we propose the Sub-path Linear Approximation Model (SLAM), which accelerates diffusion models while maintaining hi… ▽ More Diffusion models have significantly advanced the state of the art in image, audio, and video generation tasks. However, their applications in practical scenarios are hindered by slow inference speed. Drawing inspiration from the approximation strategies utilized in consistency models, we propose the Sub-path Linear Approximation Model (SLAM), which accelerates diffusion models while maintaining high-quality image generation. SLAM treats the PF-ODE trajectory as a series of PF-ODE sub-paths divided by sampled points, and harnesses sub-path linear (SL) ODEs to form a progressive and continuous error estimation along each individual PF-ODE sub-path. The optimization on such SL-ODEs allows SLAM to construct denoising mappings with smaller cumulative approximated errors. An efficient distillation method is also developed to facilitate the incorporation of more advanced diffusion models, such as latent diffusion models. Our extensive experimental results demonstrate that SLAM achieves an efficient training regimen, requiring only 6 A100 GPU days to produce a high-quality generative model capable of 2 to 4-step generation with high performance. Comprehensive evaluations on LAION, MS COCO 2014, and MS COCO 2017 datasets also illustrate that SLAM surpasses existing acceleration methods in few-step generation tasks, achieving state-of-the-art performance both on FID and the quality of the generated images. △ Less

Submitted 22 April, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

arXiv:2404.13840 [pdf, other]

Study of $e^+e^-\toωX(3872)$ and $γX(3872)$ from 4.66 to 4.95 GeV

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (634 additional authors not shown)

Abstract: Using data samples with an integrated luminosity of $4.5~\text{fb}^{-1}$ collected by the BESIII detector at center-of-mass energies ranging from 4.66 to 4.95 GeV, we study the processes of $e^+e^-\toωX(3872)$ and $e^+e^-\toγX(3872)$. With the $e^+e^-\toωX(3872)$ process, the branching fraction ratio $R\equiv\frac{\mathcal{B}(X(3872)\toγJ/ψ)}{\mathcal{B}(X(3872)\toπ^+π^- J/ψ)}$ is measured to be… ▽ More Using data samples with an integrated luminosity of $4.5~\text{fb}^{-1}$ collected by the BESIII detector at center-of-mass energies ranging from 4.66 to 4.95 GeV, we study the processes of $e^+e^-\toωX(3872)$ and $e^+e^-\toγX(3872)$. With the $e^+e^-\toωX(3872)$ process, the branching fraction ratio $R\equiv\frac{\mathcal{B}(X(3872)\toγJ/ψ)}{\mathcal{B}(X(3872)\toπ^+π^- J/ψ)}$ is measured to be $0.38\pm0.20_\text{stat.}\pm0.01_\text{syst.}$ ($R< 0.83$ at 90\% confidence level). In addition, we measure the ratio of the average cross section of $e^+e^-\toωX(3872)$ to $e^+e^-\toωχ_{c1}(ωχ_{c2})$ to be $σ_{ωX(3872)}/σ_{ωχ_{c1}}~(σ_{ωX(3872)}/σ_{ωχ_{c2}})=5.2\pm1.0_\text{stat.}\pm1.9_\text{syst.}~ (5.5\pm1.1_\text{stat.}\pm2.4_\text{syst.})$. Finally, we search for the process of $e^+e^-\toγX(3872)$, and no obvious signal is observed. The upper limit on the ratio of the average cross section of $e^+e^-\toγX(3872)$ to $e^+e^-\toωX(3872)$ is set as $σ_{γX(3872)}/σ_{ωX(3872)}<0.23$ at 90\% confidence level. △ Less

Submitted 21 April, 2024; originally announced April 2024.

Comments: 19 pages, 10 figures

arXiv:2404.13400 [pdf, other]

HiVG: Hierarchical Multimodal Fine-grained Modulation for Visual Grounding

Authors: Linhui Xiao, Xiaoshan Yang, Fang Peng, Yaowei Wang, Changsheng Xu

Abstract: Visual grounding, which aims to ground a visual region via natural language, is a task that heavily relies on cross-modal alignment. Existing works utilized uni-modal pre-trained models to transfer visual/linguistic knowledge separately while ignoring the multimodal corresponding information. Motivated by recent advancements in contrastive language-image pre-training and low-rank adaptation (LoRA)… ▽ More Visual grounding, which aims to ground a visual region via natural language, is a task that heavily relies on cross-modal alignment. Existing works utilized uni-modal pre-trained models to transfer visual/linguistic knowledge separately while ignoring the multimodal corresponding information. Motivated by recent advancements in contrastive language-image pre-training and low-rank adaptation (LoRA) methods, we aim to solve the grounding task based on multimodal pre-training. However, there exists significant task gaps between pre-training and grounding. Therefore, to address these gaps, we propose a concise and efficient hierarchical multimodal fine-grained modulation framework, namely HiVG. Specifically, HiVG consists of a multi-layer adaptive cross-modal bridge and a hierarchical multimodal low-rank adaptation (Hi LoRA) paradigm. The cross-modal bridge can address the inconsistency between visual features and those required for grounding, and establish a connection between multi-level visual and text features. Hi LoRA prevents the accumulation of perceptual errors by adapting the cross-modal features from shallow to deep layers in a hierarchical manner. Experimental results on five datasets demonstrate the effectiveness of our approach and showcase the significant grounding capabilities as well as promising energy efficiency advantages. The project page: https://github.com/linhuixiao/HiVG. △ Less

Submitted 20 April, 2024; originally announced April 2024.

Comments: The project page: https://github.com/linhuixiao/HiVG

arXiv:2404.13349 [pdf, other]

Breaking the Memory Wall for Heterogeneous Federated Learning with Progressive Training

Authors: Yebo Wu, Li Li, Chunlin Tian, Chengzhong Xu

Abstract: This paper presents ProFL, a novel progressive FL framework to effectively break the memory wall. Specifically, ProFL divides the model into different blocks based on its original architecture. Instead of updating the full model in each training round, ProFL first trains the front blocks and safely freezes them after convergence. Training of the next block is then triggered. This process iterates… ▽ More This paper presents ProFL, a novel progressive FL framework to effectively break the memory wall. Specifically, ProFL divides the model into different blocks based on its original architecture. Instead of updating the full model in each training round, ProFL first trains the front blocks and safely freezes them after convergence. Training of the next block is then triggered. This process iterates until the training of the whole model is completed. In this way, the memory footprint is effectively reduced for feasible deployment on heterogeneous devices. In order to preserve the feature representation of each block, we decouple the whole training process into two stages: progressive model shrinking and progressive model growing. During the progressive model shrinking stage, we meticulously design corresponding output modules to assist each block in learning the expected feature representation and obtain the initialization parameters. Then, the obtained output modules are utilized in the corresponding progressive model growing stage. Additionally, to control the training pace for each block, a novel metric from the scalar perspective is proposed to assess the learning status of each block and determines when to trigger the training of the next one. Finally, we theoretically prove the convergence of ProFL and conduct extensive experiments on representative models and datasets to evaluate the effectiveness of ProFL. The results demonstrate that ProFL effectively reduces the peak memory footprint by up to 57.4% and improves model accuracy by up to 82.4%. △ Less

Submitted 20 April, 2024; originally announced April 2024.

arXiv:2404.12622 [pdf, other]

Maser Activity of Organic Molecules toward Sgr B2(N)

Authors: Ci Xue, Anthony Remijan, Alexandre Faure, Emmanuel Momjian, Todd R. Hunter, Ryan A. Loomis, Eric Herbst, Brett McGuire

Abstract: At centimeter wavelengths, single-dish observations have suggested that the Sagittarius (Sgr) B2 molecular cloud at the Galactic Center hosts weak maser emission from several organic molecules, including CH$_2$NH, HNCNH, and HCOOCH$_3$. However, the lack of spatial distribution information of these new maser species has prevented us from assessing the excitation conditions of the maser emission as… ▽ More At centimeter wavelengths, single-dish observations have suggested that the Sagittarius (Sgr) B2 molecular cloud at the Galactic Center hosts weak maser emission from several organic molecules, including CH$_2$NH, HNCNH, and HCOOCH$_3$. However, the lack of spatial distribution information of these new maser species has prevented us from assessing the excitation conditions of the maser emission as well as their pumping mechanisms. Here, we present a mapping study toward Sgr B2 North (N) to locate the region where the complex maser emission originates. We report the first detection of the Class I methanol (CH$_3$OH) maser at 84 GHz and the first interferometric map of the methanimine (CH$_2$NH) maser at 5.29 GHz toward this region. In addition, we present a tool for modeling and fitting the unsaturated molecular maser signals with non-LTE radiative transfer models and Bayesian analysis using the Markov-Chain Monte Carlo approach. These enable us to quantitatively assess the observed spectral profiles. The results suggest a two-chain-clump model for explaining the intense CH$_3$OH Class I maser emission toward a region with low continuum background radiation. By comparing the spatial origin and extent of maser emission from several molecular species, we find that the 5.29 GHz CH$_2$NH maser has a close spatial relationship with the 84 GHz CH$_3$OH Class I masers. This relationship serves as observational evidence to suggest a similar collisional pumping mechanism for these maser transitions. △ Less

Submitted 19 April, 2024; originally announced April 2024.

Comments: 17 figures

arXiv:2404.12353 [pdf, other]

V2Xum-LLM: Cross-Modal Video Summarization with Temporal Prompt Instruction Tuning

Authors: Hang Hua, Yunlong Tang, Chenliang Xu, Jiebo Luo

Abstract: Video summarization aims to create short, accurate, and cohesive summaries of longer videos. Despite the existence of various video summarization datasets, a notable limitation is their limited amount of source videos, which hampers the effective fine-tuning of advanced large vision-language models (VLMs). Additionally, most existing datasets are created for video-to-video summarization, overlooki… ▽ More Video summarization aims to create short, accurate, and cohesive summaries of longer videos. Despite the existence of various video summarization datasets, a notable limitation is their limited amount of source videos, which hampers the effective fine-tuning of advanced large vision-language models (VLMs). Additionally, most existing datasets are created for video-to-video summarization, overlooking the contemporary need for multimodal video content summarization. Recent efforts have been made to expand from unimodal to multimodal video summarization, categorizing the task into three sub-tasks based on the summary's modality: video-to-video (V2V), video-to-text (V2T), and a combination of video and text summarization (V2VT). However, the textual summaries in previous multimodal datasets are inadequate. To address these issues, we introduce Instruct-V2Xum, a cross-modal video summarization dataset featuring 30,000 diverse videos sourced from YouTube, with lengths ranging from 40 to 940 seconds and an average summarization ratio of 16.39\%. Each video summary in Instruct-V2Xum is paired with a textual summary that references specific frame indexes, facilitating the generation of aligned video and textual summaries. In addition, we propose a new video summarization framework named V2Xum-LLM. V2Xum-LLM, specifically V2Xum-LLaMA in this study, is the first framework that unifies different video summarization tasks into one large language model's (LLM) text decoder and achieves task-controllable video summarization with temporal prompts and task instructions. Experiments show that V2Xum-LLaMA outperforms strong baseline models on multiple video summarization tasks. Furthermore, we propose an enhanced evaluation metric for V2V and V2VT summarization tasks. △ Less

Submitted 18 April, 2024; originally announced April 2024.

arXiv:2404.12104 [pdf, other]

Ethical-Lens: Curbing Malicious Usages of Open-Source Text-to-Image Models

Authors: Yuzhu Cai, Sheng Yin, Yuxi Wei, Chenxin Xu, Weibo Mao, Felix Juefei-Xu, Siheng Chen, Yanfeng Wang

Abstract: The burgeoning landscape of text-to-image models, exemplified by innovations such as Midjourney and DALLE 3, has revolutionized content creation across diverse sectors. However, these advancements bring forth critical ethical concerns, particularly with the misuse of open-source models to generate content that violates societal norms. Addressing this, we introduce Ethical-Lens, a framework designe… ▽ More The burgeoning landscape of text-to-image models, exemplified by innovations such as Midjourney and DALLE 3, has revolutionized content creation across diverse sectors. However, these advancements bring forth critical ethical concerns, particularly with the misuse of open-source models to generate content that violates societal norms. Addressing this, we introduce Ethical-Lens, a framework designed to facilitate the value-aligned usage of text-to-image tools without necessitating internal model revision. Ethical-Lens ensures value alignment in text-to-image models across toxicity and bias dimensions by refining user commands and rectifying model outputs. Systematic evaluation metrics, combining GPT4-V, HEIM, and FairFace scores, assess alignment capability. Our experiments reveal that Ethical-Lens enhances alignment capabilities to levels comparable with or superior to commercial models like DALLE 3, ensuring user-generated content adheres to ethical standards while maintaining image quality. This study indicates the potential of Ethical-Lens to ensure the sustainable development of open-source text-to-image tools and their beneficial integration into society. Our code is available at https://github.com/yuzhu-cai/Ethical-Lens. △ Less

Submitted 18 April, 2024; originally announced April 2024.

Comments: 42 pages, 17 figures, 29 tables

arXiv:2404.11968 [pdf, other]

P-NAL: an Effective and Interpretable Entity Alignment Method

Authors: Chuanhao Xu, Jingwei Cheng, Fu Zhang

Abstract: Entity alignment (EA) aims to find equivalent entities between two Knowledge Graphs. Existing embedding-based EA methods usually encode entities as embeddings, triples as embeddings' constraint and learn to align the embeddings. The structural and side information are usually utilized via embedding propagation, aggregation or interaction. However, the details of the underlying logical inference st… ▽ More Entity alignment (EA) aims to find equivalent entities between two Knowledge Graphs. Existing embedding-based EA methods usually encode entities as embeddings, triples as embeddings' constraint and learn to align the embeddings. The structural and side information are usually utilized via embedding propagation, aggregation or interaction. However, the details of the underlying logical inference steps among the alignment process are usually omitted, resulting in inadequate inference process. In this paper, we introduce P-NAL, an entity alignment method that captures two types of logical inference paths with Non-Axiomatic Logic (NAL). Type 1 is the bridge-like inference path between to-be-aligned entity pairs, consisting of two relation/attribute triples and a similarity sentence between the other two entities. Type 2 links the entity pair by their embeddings. P-NAL iteratively aligns entities and relations by integrating the conclusions of the inference paths. Moreover, our method is logically interpretable and extensible due to the expressiveness of NAL. Our proposed method is suitable for various EA settings. Experimental results show that our method outperforms state-of-the-art methods in terms of Hits@1, achieving 0.98+ on all three datasets of DBP15K with both supervised and unsupervised settings. To our knowledge, we present the first in-depth analysis of entity alignment's basic principles from a unified logical perspective. △ Less

Submitted 18 April, 2024; originally announced April 2024.

Comments: 13 pages, 2 figures

ACM Class: I.2.4

arXiv:2404.11944 [pdf, other]

Trusted Multi-view Learning with Label Noise

Authors: Cai Xu, Yilin Zhang, Ziyu Guan, Wei Zhao

Abstract: Multi-view learning methods often focus on improving decision accuracy while neglecting the decision uncertainty, which significantly restricts their applications in safety-critical applications. To address this issue, researchers propose trusted multi-view methods that learn the class distribution for each instance, enabling the estimation of classification probabilities and uncertainty. However,… ▽ More Multi-view learning methods often focus on improving decision accuracy while neglecting the decision uncertainty, which significantly restricts their applications in safety-critical applications. To address this issue, researchers propose trusted multi-view methods that learn the class distribution for each instance, enabling the estimation of classification probabilities and uncertainty. However, these methods heavily rely on high-quality ground-truth labels. This motivates us to delve into a new generalized trusted multi-view learning problem: how to develop a reliable multi-view learning model under the guidance of noisy labels? We propose a trusted multi-view noise refining method to solve this problem. We first construct view-opinions using evidential deep neural networks, which consist of belief mass vectors and uncertainty estimates. Subsequently, we design view-specific noise correlation matrices that transform the original opinions into noisy opinions aligned with the noisy labels. Considering label noises originating from low-quality data features and easily-confused classes, we ensure that the diagonal elements of these matrices are inversely proportional to the uncertainty, while incorporating class relations into the off-diagonal elements. Finally, we aggregate the noisy opinions and employ a generalized maximum likelihood loss on the aggregated opinion for model training, guided by the noisy labels. We empirically compare TMNR with state-of-the-art trusted multi-view learning and label noise learning baselines on 5 publicly available datasets. Experiment results show that TMNR outperforms baseline methods on accuracy, reliability and robustness. The code and appendix are released at https://github.com/YilinZhang107/TMNR. △ Less

Submitted 10 May, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

Comments: 12 pages, accepted at IJCAI 2024

MSC Class: I.2.6

arXiv:2404.11832 [pdf, other]

Prolate spheroids settling in a quiescent fluid: clustering, microstructures and collisions

Authors: Xinyu Jiang, Chunxiao Xu, Lihao Zhao

Abstract: In the present study, we investigate the settling of prolate spheroidal particles in a quiescent fluid by means of particle-resolved direct numerical simulations. By varying the particle volume fraction (phi) from 0.1% to 10%, we observe a non-monotonic variation of particle mean settling velocity with a local maximum at phi=1%. To explain this finding, we examine the spatial distribution and orie… ▽ More In the present study, we investigate the settling of prolate spheroidal particles in a quiescent fluid by means of particle-resolved direct numerical simulations. By varying the particle volume fraction (phi) from 0.1% to 10%, we observe a non-monotonic variation of particle mean settling velocity with a local maximum at phi=1%. To explain this finding, we examine the spatial distribution and orientation of dispersed particles. The particle-pairs statistics show the tendency of particles to form column-like microsturctures, revealing the attraction and entrapment of paritcles in wake regions. This effect results in the formation of large-scale particle clusters at intermediate particle volume fractions. The most intense particle clustering at phi=1%, quantified by the standard deviation of Voronoi volume, induces a swarm effect and substantially enhances the particle settling rate by around 25%. While, in the very dilute suspension at phi=0.1%, the particle clustering is attenuated due to the long inter-particle distance. In dense suspensions, however, the crowded particle arrangement disrupts particle wakes, breaks particle microstructures and inhibits the particle clustering. Consequently, hindrance effect dominates and reduces the settling speed. In contrast to particle spatial distribution, the particle orientation plays a less important role in the mean settling velocity. At last, we examine the collision rate of settling particles by computing the collision kernel. As phi increases, the collision kernel decreases monotonically, which is primarily attributed to the decrease of radial distribution function. However, the radial relative velocity is almost a constant with the change of phi. These results reveal the importance of hydrodynamic interactions, including the wake-flow-induced attractions and the lubrication effect, in collisions among settling spheroids. △ Less

Submitted 17 April, 2024; originally announced April 2024.

arXiv:2404.11457 [pdf, other]

Unifying Bias and Unfairness in Information Retrieval: A Survey of Challenges and Opportunities with Large Language Models

Authors: Sunhao Dai, Chen Xu, Shicheng Xu, Liang Pang, Zhenhua Dong, Jun Xu

Abstract: With the rapid advancement of large language models (LLMs), information retrieval (IR) systems, such as search engines and recommender systems, have undergone a significant paradigm shift. This evolution, while heralding new opportunities, introduces emerging challenges, particularly in terms of biases and unfairness, which may threaten the information ecosystem. In this paper, we present a compre… ▽ More With the rapid advancement of large language models (LLMs), information retrieval (IR) systems, such as search engines and recommender systems, have undergone a significant paradigm shift. This evolution, while heralding new opportunities, introduces emerging challenges, particularly in terms of biases and unfairness, which may threaten the information ecosystem. In this paper, we present a comprehensive survey of existing works on emerging and pressing bias and unfairness issues in IR systems when the integration of LLMs. We first unify bias and unfairness issues as distribution mismatch problems, providing a groundwork for categorizing various mitigation strategies through distribution alignment. Subsequently, we systematically delve into the specific bias and unfairness issues arising from three critical stages of LLMs integration into IR systems: data collection, model development, and result evaluation. In doing so, we meticulously review and analyze recent literature, focusing on the definitions, characteristics, and corresponding mitigation strategies associated with these issues. Finally, we identify and highlight some open problems and challenges for future work, aiming to inspire researchers and stakeholders in the IR field and beyond to better understand and mitigate bias and unfairness issues of IR in this LLM era. We also consistently maintain a GitHub repository for the relevant papers and resources in this rising direction at https://github.com/KID-22/LLM-IR-Bias-Fairness-Survey. △ Less

Submitted 17 April, 2024; originally announced April 2024.

arXiv:2404.11291 [pdf, other]

Closely Interactive Human Reconstruction with Proxemics and Physics-Guided Adaption

Authors: Buzhen Huang, Chen Li, Chongyang Xu, Liang Pan, Yangang Wang, Gim Hee Lee

Abstract: Existing multi-person human reconstruction approaches mainly focus on recovering accurate poses or avoiding penetration, but overlook the modeling of close interactions. In this work, we tackle the task of reconstructing closely interactive humans from a monocular video. The main challenge of this task comes from insufficient visual information caused by depth ambiguity and severe inter-person occ… ▽ More Existing multi-person human reconstruction approaches mainly focus on recovering accurate poses or avoiding penetration, but overlook the modeling of close interactions. In this work, we tackle the task of reconstructing closely interactive humans from a monocular video. The main challenge of this task comes from insufficient visual information caused by depth ambiguity and severe inter-person occlusion. In view of this, we propose to leverage knowledge from proxemic behavior and physics to compensate the lack of visual information. This is based on the observation that human interaction has specific patterns following the social proxemics. Specifically, we first design a latent representation based on Vector Quantised-Variational AutoEncoder (VQ-VAE) to model human interaction. A proxemics and physics guided diffusion model is then introduced to denoise the initial distribution. We design the diffusion model as dual branch with each branch representing one individual such that the interaction can be modeled via cross attention. With the learned priors of VQ-VAE and physical constraint as the additional information, our proposed approach is capable of estimating accurate poses that are also proxemics and physics plausible. Experimental results on Hi4D, 3DPW, and CHI3D demonstrate that our method outperforms existing approaches. The code is available at \url{https://github.com/boycehbz/HumanInteraction}. △ Less

Submitted 17 April, 2024; originally announced April 2024.

Comments: CVPR2024

arXiv:2404.11161 [pdf, other]

Pre-processing matters: A segment search method for WSI classification

Authors: Jun Wang, Yufei Cui, Yu Mao, Nan Guan, Chun Jason Xue

Abstract: Pre-processing for whole slide images can affect classification performance both in the training and inference stages. Our study analyzes the impact of pre-processing parameters on inference and training across single- and multiple-domain datasets. However, searching for an optimal parameter set is time-consuming. To overcome this, we propose a novel Similarity-based Simulated Annealing approach f… ▽ More Pre-processing for whole slide images can affect classification performance both in the training and inference stages. Our study analyzes the impact of pre-processing parameters on inference and training across single- and multiple-domain datasets. However, searching for an optimal parameter set is time-consuming. To overcome this, we propose a novel Similarity-based Simulated Annealing approach for fast parameter tuning to enhance inference performance on single-domain data. Our method demonstrates significant performance improvements in accuracy, which raise accuracy from 0.512 to 0.847 in a single domain. We further extend our insight into training performance in multi-domain data by employing a novel Bayesian optimization to search optimal pre-processing parameters, resulting in a high AUC of 0.967. We highlight that better pre-processing for WSI can contribute to further accuracy improvement in the histology area. △ Less

Submitted 17 April, 2024; originally announced April 2024.

arXiv:2404.10955 [pdf, other]

The Traveling Tournament Problem: Improved Algorithms Based on Cycle Packing

Authors: Jingyang Zhao, Mingyu Xiao, Chao Xu

Abstract: The Traveling Tournament Problem (TTP) is a well-known benchmark problem in the field of tournament timetabling, which asks us to design a double round-robin schedule such that each pair of teams plays one game in each other's home venue, minimizing the total distance traveled by all $n$ teams ($n$ is even). TTP-$k$ is the problem with one more constraint that each team can have at most $k$-consec… ▽ More The Traveling Tournament Problem (TTP) is a well-known benchmark problem in the field of tournament timetabling, which asks us to design a double round-robin schedule such that each pair of teams plays one game in each other's home venue, minimizing the total distance traveled by all $n$ teams ($n$ is even). TTP-$k$ is the problem with one more constraint that each team can have at most $k$-consecutive home games or away games. In this paper, we investigate schedules for TTP-$k$ and analyze the approximation ratio of the solutions. Most previous schedules were constructed based on a Hamiltonian cycle of the graph. We will propose a novel construction based on a $k$-cycle packing. Then, combining our $k$-cycle packing schedule with the Hamiltonian cycle schedule, we obtain improved approximation ratios for TTP-$k$ with deep analysis. The case where $k=3$, TTP-3, is one of the most investigated cases. We improve the approximation ratio of TTP-3 from $(1.667+\varepsilon)$ to $(1.598+\varepsilon)$, for any $\varepsilon>0$. For TTP-$4$, we improve the approximation ratio from $(1.750+\varepsilon)$ to $(1.700+\varepsilon)$. By a refined analysis of the Hamiltonian cycle construction, we also improve the approximation ratio of TTP-$k$ from $(\frac{5k-7}{2k}+\varepsilon)$ to $(\frac{5k^2-4k+3}{2k(k+1)}+\varepsilon)$ for any constant $k\geq 5$. Our methods can be extended to solve a variant called LDTTP-$k$ (TTP-$k$ where all teams are allocated on a straight line). We show that the $k$-cycle packing construction can achieve an approximation ratio of $(\frac{3k-3}{2k-1}+\varepsilon)$, which improves the approximation ratio of LDTTP-3 from $4/3$ to $(6/5+\varepsilon)$. △ Less

Submitted 16 April, 2024; originally announced April 2024.

Comments: A preliminary version of this article was presented at MFCS 2022; Sumitted in 2022

arXiv:2404.10541 [pdf, other]

MPCOM: Robotic Data Gathering with Radio Mapping and Model Predictive Communication

Authors: Zhiyou Ji, Guoliang Li, Ruihua Han, Shuai Wang, Bing Bai, Wei Xu, Kejiang Ye, Chengzhong Xu

Abstract: Robotic data gathering (RDG) is an emerging paradigm that navigates a robot to harvest data from remote sensors. However, motion planning in this paradigm needs to maximize the RDG efficiency instead of the navigation efficiency, for which the existing motion planning methods become inefficient, as they plan robot trajectories merely according to motion factors. This paper proposes radio map guide… ▽ More Robotic data gathering (RDG) is an emerging paradigm that navigates a robot to harvest data from remote sensors. However, motion planning in this paradigm needs to maximize the RDG efficiency instead of the navigation efficiency, for which the existing motion planning methods become inefficient, as they plan robot trajectories merely according to motion factors. This paper proposes radio map guided model predictive communication (MPCOM), which navigates the robot with both grid and radio maps for shape-aware collision avoidance and communication-aware trajectory generation in a dynamic environment. The proposed MPCOM is able to trade off the time spent on reaching goal, avoiding collision, and improving communication. MPCOM captures high-order signal propagation characteristics using radio maps and incorporates the map-guided communication regularizer to the motion planning block. Experiments in IRSIM and CARLA simulators show that the proposed MPCOM outperforms other benchmarks in both LOS and NLOS cases. Real-world testing based on car-like robots is also provided to demonstrate the effectiveness of MPCOM in indoor environments. △ Less

Submitted 16 April, 2024; originally announced April 2024.

Comments: submit to IROS

arXiv:2404.10373 [pdf]

A newly developed multi-kilo-channel high-speed and precision waveform digitization system for neutrino experiments

Authors: H. Yang, T. Xue, L. Jiang, C. Xu, Q. Pan, B. Liang, G. Gong, B. Xu, Z. Wang, S. Chen, Y. Liu, J. Li

Abstract: The Jinping Neutrino Experiment(JNE), conducted within the China Jinping Underground Laboratory, aims to detect and analyze of solar neutrinos, geo-neutrinos, and supernova neutrinos. A one-ton prototype will soon be in commision with an upgrade from 30 channels to 60 channels, which will increase the data bandwidth by one to two orders of magnitude and exceed the capacity of the current CAEN DAQ… ▽ More The Jinping Neutrino Experiment(JNE), conducted within the China Jinping Underground Laboratory, aims to detect and analyze of solar neutrinos, geo-neutrinos, and supernova neutrinos. A one-ton prototype will soon be in commision with an upgrade from 30 channels to 60 channels, which will increase the data bandwidth by one to two orders of magnitude and exceed the capacity of the current CAEN DAQ system. Additionally, enhancing the performance and flexibility of JNE DAQ system is crucial. This paper presents the design of a new Tsinghua DAQ system for the JNE and its performance and stability. The new Tsinghua DAQ(THDAQ) system for JNE is based on the cPCI protocol and demonstrates powerful performance improvements: ADC ENOB of the THDAQ system approximately exceeds 9.8-bit, marking a 14% improvement over the CAEN DAQ system; The maximum clock deviation within a single chassis is 85.6 ps, satisfying sub-nanosecond synchronization criteria; Each DAQ board features two QSFP+ optical ports with 82.5Gbps transmission capability, while the PCIe board supports a transmission rate of 100.2 Gbps. In addition, comparative experiments between the two systems were also tested in detail. The analysis results of waveform and charge spectrum prove the high stability of the THDAQ system. This provides a foundation for the 60-channel and 4000-channel DAQ systems. △ Less

Submitted 16 April, 2024; originally announced April 2024.

arXiv:2404.10235 [pdf, ps, other]

Integrated Sensing and Communication for Edge Inference with End-to-End Multi-View Fusion

Authors: Xibin Jin, Guoliang Li, Shuai Wang, Miaowen Wen, Chengzhong Xu, H. Vincent Poor

Abstract: Integrated sensing and communication (ISAC) is a promising solution to accelerate edge inference via the dual use of wireless signals. However, this paradigm needs to minimize the inference error and latency under ISAC co-functionality interference, for which the existing ISAC or edge resource allocation algorithms become inefficient, as they ignore the inter-dependency between low-level ISAC desi… ▽ More Integrated sensing and communication (ISAC) is a promising solution to accelerate edge inference via the dual use of wireless signals. However, this paradigm needs to minimize the inference error and latency under ISAC co-functionality interference, for which the existing ISAC or edge resource allocation algorithms become inefficient, as they ignore the inter-dependency between low-level ISAC designs and high-level inference services. This letter proposes an inference-oriented ISAC (IO-ISAC) scheme, which minimizes upper bounds on end-to-end inference error and latency using multi-objective optimization. The key to our approach is to derive a multi-view inference model that accounts for both the number of observations and the angles of observations, by integrating a half-voting fusion rule and an angle-aware sensing model. Simulation results show that the proposed IO-ISAC outperforms other benchmarks in terms of both accuracy and latency. △ Less

Submitted 15 April, 2024; originally announced April 2024.

arXiv:2404.10178 [pdf, other]

CryoMAE: Few-Shot Cryo-EM Particle Picking with Masked Autoencoders

Authors: Chentianye Xu, Xueying Zhan, Min Xu

Abstract: Cryo-electron microscopy (cryo-EM) emerges as a pivotal technology for determining the architecture of cells, viruses, and protein assemblies at near-atomic resolution. Traditional particle picking, a key step in cryo-EM, struggles with manual effort and automated methods' sensitivity to low signal-to-noise ratio (SNR) and varied particle orientations. Furthermore, existing neural network (NN)-bas… ▽ More Cryo-electron microscopy (cryo-EM) emerges as a pivotal technology for determining the architecture of cells, viruses, and protein assemblies at near-atomic resolution. Traditional particle picking, a key step in cryo-EM, struggles with manual effort and automated methods' sensitivity to low signal-to-noise ratio (SNR) and varied particle orientations. Furthermore, existing neural network (NN)-based approaches often require extensive labeled datasets, limiting their practicality. To overcome these obstacles, we introduce cryoMAE, a novel approach based on few-shot learning that harnesses the capabilities of Masked Autoencoders (MAE) to enable efficient selection of single particles in cryo-EM images. Contrary to conventional NN-based techniques, cryoMAE requires only a minimal set of positive particle images for training yet demonstrates high performance in particle detection. Furthermore, the implementation of a self-cross similarity loss ensures distinct features for particle and background regions, thereby enhancing the discrimination capability of cryoMAE. Experiments on large-scale cryo-EM datasets show that cryoMAE outperforms existing state-of-the-art (SOTA) methods, improving 3D reconstruction resolution by up to 22.4%. △ Less

Submitted 15 April, 2024; originally announced April 2024.

arXiv:2404.09496 [pdf, other]

Towards Collaborative Autonomous Driving: Simulation Platform and End-to-End System

Authors: Genjia Liu, Yue Hu, Chenxin Xu, Weibo Mao, Junhao Ge, Zhengxiang Huang, Yifan Lu, Yinda Xu, Junkai Xia, Yafei Wang, Siheng Chen

Abstract: Vehicle-to-everything-aided autonomous driving (V2X-AD) has a huge potential to provide a safer driving solution. Despite extensive researches in transportation and communication to support V2X-AD, the actual utilization of these infrastructures and communication resources in enhancing driving performances remains largely unexplored. This highlights the necessity of collaborative autonomous drivin… ▽ More Vehicle-to-everything-aided autonomous driving (V2X-AD) has a huge potential to provide a safer driving solution. Despite extensive researches in transportation and communication to support V2X-AD, the actual utilization of these infrastructures and communication resources in enhancing driving performances remains largely unexplored. This highlights the necessity of collaborative autonomous driving: a machine learning approach that optimizes the information sharing strategy to improve the driving performance of each vehicle. This effort necessitates two key foundations: a platform capable of generating data to facilitate the training and testing of V2X-AD, and a comprehensive system that integrates full driving-related functionalities with mechanisms for information sharing. From the platform perspective, we present V2Xverse, a comprehensive simulation platform for collaborative autonomous driving. This platform provides a complete pipeline for collaborative driving. From the system perspective, we introduce CoDriving, a novel end-to-end collaborative driving system that properly integrates V2X communication over the entire autonomous pipeline, promoting driving with shared perceptual information. The core idea is a novel driving-oriented communication strategy. Leveraging this strategy, CoDriving improves driving performance while optimizing communication efficiency. We make comprehensive benchmarks with V2Xverse, analyzing both modular performance and closed-loop driving performance. Experimental results show that CoDriving: i) significantly improves the driving score by 62.49% and drastically reduces the pedestrian collision rate by 53.50% compared to the SOTA end-to-end driving method, and ii) achieves sustaining driving performance superiority over dynamic constraint communication conditions. △ Less

Submitted 15 April, 2024; originally announced April 2024.

arXiv:2404.09219 [pdf, ps, other]

Observation of $D \to a_{0}(980)π$ in the decays $D^{0} \rightarrow π^{+}π^{-}η$ and $D^{+} \rightarrow π^{+}π^{0}η$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (634 additional authors not shown)

Abstract: We report the first amplitude analysis of the decays $D^{0} \to π^{+} π^{-} η$ and $D^{+} \rightarrow π^{+}π^{0}η$ using a data sample taken with the BESIII detector at the center-of-mass energy of 3.773 GeV, corresponding to an integrated luminosity of 7.9 ${\rm fb}^{-1}$. The contribution from the process $D^{0(+)} \to a_{0}(980)^{+} π^{-(0)}$ is significantly larger than the… ▽ More We report the first amplitude analysis of the decays $D^{0} \to π^{+} π^{-} η$ and $D^{+} \rightarrow π^{+}π^{0}η$ using a data sample taken with the BESIII detector at the center-of-mass energy of 3.773 GeV, corresponding to an integrated luminosity of 7.9 ${\rm fb}^{-1}$. The contribution from the process $D^{0(+)} \to a_{0}(980)^{+} π^{-(0)}$ is significantly larger than the $D^{0(+)} \to a_{0}(980)^{-(0)} π^{+}$ contribution. The ratios $\mathcal{B}(D^{0} \rightarrow a_{0}(980)^{+}π^{-})/\mathcal{B}(D^{0} \rightarrow a_{0}(980)^{-}π^{+})$ and $\mathcal{B}(D^{+} \rightarrow a_{0}(980)^{+}π^{0})/\mathcal{B}(D^{+} \rightarrow a_{0}(980)^{0}π^{+})$ are measured to be $7.5^{+2.5}_{-0.8\,\mathrm{stat.}}\pm1.7_{\mathrm{syst.}}$ and $2.6\pm0.6_{\mathrm{stat.}}\pm0.3_{\mathrm{syst.}}$, respectively. The measured $D^{0}$ ratio disagrees with the theoretical predictions by orders of magnitudes, thus implying a substantial contribution from final-state interactions. △ Less

Submitted 14 April, 2024; originally announced April 2024.

arXiv:2404.08965 [pdf, other]

Seeing Text in the Dark: Algorithm and Benchmark

Authors: Chengpei Xu, Hao Fu, Long Ma, Wenjing Jia, Chengqi Zhang, Feng Xia, Xiaoyu Ai, Binghao Li, Wenjie Zhang

Abstract: Localizing text in low-light environments is challenging due to visual degradations. Although a straightforward solution involves a two-stage pipeline with low-light image enhancement (LLE) as the initial step followed by detector, LLE is primarily designed for human vision instead of machine and can accumulate errors. In this work, we propose an efficient and effective single-stage approach for l… ▽ More Localizing text in low-light environments is challenging due to visual degradations. Although a straightforward solution involves a two-stage pipeline with low-light image enhancement (LLE) as the initial step followed by detector, LLE is primarily designed for human vision instead of machine and can accumulate errors. In this work, we propose an efficient and effective single-stage approach for localizing text in dark that circumvents the need for LLE. We introduce a constrained learning module as an auxiliary mechanism during the training stage of the text detector. This module is designed to guide the text detector in preserving textual spatial features amidst feature map resizing, thus minimizing the loss of spatial information in texts under low-light visual degradations. Specifically, we incorporate spatial reconstruction and spatial semantic constraints within this module to ensure the text detector acquires essential positional and contextual range knowledge. Our approach enhances the original text detector's ability to identify text's local topological features using a dynamic snake feature pyramid network and adopts a bottom-up contour shaping strategy with a novel rectangular accumulation technique for accurate delineation of streamlined text features. In addition, we present a comprehensive low-light dataset for arbitrary-shaped text, encompassing diverse scenes and languages. Notably, our method achieves state-of-the-art results on this low-light dataset and exhibits comparable performance on standard normal light datasets. The code and dataset will be released. △ Less

Submitted 23 April, 2024; v1 submitted 13 April, 2024; originally announced April 2024.

arXiv:2404.08470 [pdf, other]

Gamma positivity of variations of $(α,t)$-Eulerian polynomials

Authors: Chao Xu, Jiang Zeng

Abstract: In 1977 Carlitz and Scoville introduced the cycle $(α,t)$-Eulerian polynomials $A^{\mathrm{cyc}}_n(x,y, t\,|\,α)$ by enumerating permutations with respect to the number of excedances, drops, fixed points and cycles. In this paper, we introduce a nine-variable generalization of the Eulerian polynomials $A_n(u_1,u_2,u_3,u_4, f, g, t\,|\,α, β)$ in terms of descent based statistics of permutations and… ▽ More In 1977 Carlitz and Scoville introduced the cycle $(α,t)$-Eulerian polynomials $A^{\mathrm{cyc}}_n(x,y, t\,|\,α)$ by enumerating permutations with respect to the number of excedances, drops, fixed points and cycles. In this paper, we introduce a nine-variable generalization of the Eulerian polynomials $A_n(u_1,u_2,u_3,u_4, f, g, t\,|\,α, β)$ in terms of descent based statistics of permutations and prove a connection formula between these two kinds of generalized Eulerian polynomials. By exploring the connection formula, we derive plainly the exponential generating function of the latter polynomials and various $γ$-positive formulas for variants of Eulerian polynomials. In particular, our results unify and strengthen the recent results by Ji and Ji-Lin. In related work to the transition matrix between the Specht and web bases, Hwang, Jang and Oh recently introduced the web permutations, which can be characterised by cycle André permutations. We show that enumerating the latter permutations with respect to the number of drops, fixed points and cycles gives rise to the normalised $γ$-vectors of the $(α,t)$-Eulerian polynomials. Our result generalizes and unifies several known results in the literature. △ Less

Submitted 25 June, 2024; v1 submitted 12 April, 2024; originally announced April 2024.

Comments: 37 pages

MSC Class: 05A15; 05A19

arXiv:2404.08412 [pdf, other]

PiRD: Physics-informed Residual Diffusion for Flow Field Reconstruction

Authors: Siming Shan, Pengkai Wang, Song Chen, Jiaxu Liu, Chao Xu, Shengze Cai

Abstract: The use of machine learning in fluid dynamics is becoming more common to expedite the computation when solving forward and inverse problems of partial differential equations. Yet, a notable challenge with existing convolutional neural network (CNN)-based methods for data fidelity enhancement is their reliance on specific low-fidelity data patterns and distributions during the training phase. In ad… ▽ More The use of machine learning in fluid dynamics is becoming more common to expedite the computation when solving forward and inverse problems of partial differential equations. Yet, a notable challenge with existing convolutional neural network (CNN)-based methods for data fidelity enhancement is their reliance on specific low-fidelity data patterns and distributions during the training phase. In addition, the CNN-based method essentially treats the flow reconstruction task as a computer vision task that prioritizes the element-wise precision which lacks a physical and mathematical explanation. This dependence can dramatically affect the models' effectiveness in real-world scenarios, especially when the low-fidelity input deviates from the training data or contains noise not accounted for during training. The introduction of diffusion models in this context shows promise for improving performance and generalizability. Unlike direct mapping from a specific low-fidelity to a high-fidelity distribution, diffusion models learn to transition from any low-fidelity distribution towards a high-fidelity one. Our proposed model - Physics-informed Residual Diffusion, demonstrates the capability to elevate the quality of data from both standard low-fidelity inputs, to low-fidelity inputs with injected Gaussian noise, and randomly collected samples. By integrating physics-based insights into the objective function, it further refines the accuracy and the fidelity of the inferred high-quality data. Experimental results have shown that our approach can effectively reconstruct high-quality outcomes for two-dimensional turbulent flows from a range of low-fidelity input conditions without requiring retraining. △ Less

Submitted 9 May, 2024; v1 submitted 12 April, 2024; originally announced April 2024.

Comments: 22 pages

arXiv:2404.08364 [pdf, other]

FlowWalker: A Memory-efficient and High-performance GPU-based Dynamic Graph Random Walk Framework

Authors: Junyi Mei, Shixuan Sun, Chao Li, Cheng Xu, Cheng Chen, Yibo Liu, Jing Wang, Cheng Zhao, Xiaofeng Hou, Minyi Guo, Bingsheng He, Xiaoliang Cong

Abstract: Dynamic graph random walk (DGRW) emerges as a practical tool for capturing structural relations within a graph. Effectively executing DGRW on GPU presents certain challenges. First, existing sampling methods demand a pre-processing buffer, causing substantial space complexity. Moreover, the power-law distribution of graph vertex degrees introduces workload imbalance issues, rendering DGRW embarras… ▽ More Dynamic graph random walk (DGRW) emerges as a practical tool for capturing structural relations within a graph. Effectively executing DGRW on GPU presents certain challenges. First, existing sampling methods demand a pre-processing buffer, causing substantial space complexity. Moreover, the power-law distribution of graph vertex degrees introduces workload imbalance issues, rendering DGRW embarrassed to parallelize. In this paper, we propose FlowWalker, a GPU-based dynamic graph random walk framework. FlowWalker implements an efficient parallel sampling method to fully exploit the GPU parallelism and reduce space complexity. Moreover, it employs a sampler-centric paradigm alongside a dynamic scheduling strategy to handle the huge amounts of walking queries. FlowWalker stands as a memory-efficient framework that requires no auxiliary data structures in GPU global memory. We examine the performance of FlowWalker extensively on ten datasets, and experiment results show that FlowWalker achieves up to 752.2x, 72.1x, and 16.4x speedup compared with existing CPU, GPU, and FPGA random walk frameworks, respectively. Case study shows that FlowWalker diminishes random walk time from 35% to 3% in a pipeline of ByteDance friend recommendation GNN training. △ Less

Submitted 26 April, 2024; v1 submitted 12 April, 2024; originally announced April 2024.

arXiv:2404.08304 [pdf, ps, other]

Uncertainty relations based on the $ρ$-absolute variance for quantum channels

Authors: Cong Xu, Wen Zhou, Qing-Hua Zhang, Shao-Ming Fei

Abstract: Uncertainty principle reveals the intrinsic differences between the classical and quantum worlds, which plays a significant role in quantum information theory. By using $ρ$-absolute variance, we introduce the uncertainty of quantum channels and explore its properties. By using Cauchy-Schwarz inequality and the parallelogram law, we establish the product and summation forms of the uncertainty relat… ▽ More Uncertainty principle reveals the intrinsic differences between the classical and quantum worlds, which plays a significant role in quantum information theory. By using $ρ$-absolute variance, we introduce the uncertainty of quantum channels and explore its properties. By using Cauchy-Schwarz inequality and the parallelogram law, we establish the product and summation forms of the uncertainty relations for arbitrary two quantum channels, respectively. The summation form of the uncertainty inequalities based on the $ρ$-absolute variance for arbitrary $N$ quantum channels are also investigated and the optimal lower bounds are presented. We illustrate our results by several typical examples. △ Less

Submitted 12 April, 2024; originally announced April 2024.

arXiv:2404.07436 [pdf, other]

Measurement of $e^{+}e^{-}\to ωη^{\prime}$ cross sections at $\sqrt{s}=$ 2.000 to 3.080 GeV

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, R. Aliberti, A. Amoroso, M. R. An, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann , et al. (599 additional authors not shown)

Abstract: The Born cross sections for the process $e^{+}e^{-}\to ωη^{\prime}$ are measured at 22 center-of-mass energies from 2.000 to 3.080 GeV using data collected with the BESIII detector at the BEPCII collider. A resonant structure is observed with a statistical significance of 9.6$σ$. A Breit-Wigner fit determines its mass to be $M_R=(2153\pm30\pm31)~{\rm{MeV}}/c^{2}$ and its width to be… ▽ More The Born cross sections for the process $e^{+}e^{-}\to ωη^{\prime}$ are measured at 22 center-of-mass energies from 2.000 to 3.080 GeV using data collected with the BESIII detector at the BEPCII collider. A resonant structure is observed with a statistical significance of 9.6$σ$. A Breit-Wigner fit determines its mass to be $M_R=(2153\pm30\pm31)~{\rm{MeV}}/c^{2}$ and its width to be $Γ_{R}=(167\pm77\pm7)~\rm{MeV}$, where the first uncertainties are statistical and the second are systematic. △ Less

Submitted 10 April, 2024; originally announced April 2024.

arXiv:2404.06718 [pdf, other]

Measurement of the Born cross section for $e^{+}e^{-}\to ηh_c $ at center-of-mass energies between 4.1 and 4.6\,GeV

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (634 additional authors not shown)

Abstract: We measure the Born cross section for the reaction $e^{+}e^{-} \rightarrow ηh_c$ from $\sqrt{s} = 4.129$ to $4.600$~GeV using data sets collected by the BESIII detector running at the BEPCII collider. A resonant structure in the cross section line shape near 4.200~GeV is observed with a statistical significance of 7$σ$. The parameters of this resonance are measured to be \MeasMass\ and \MeasWidth,… ▽ More We measure the Born cross section for the reaction $e^{+}e^{-} \rightarrow ηh_c$ from $\sqrt{s} = 4.129$ to $4.600$~GeV using data sets collected by the BESIII detector running at the BEPCII collider. A resonant structure in the cross section line shape near 4.200~GeV is observed with a statistical significance of 7$σ$. The parameters of this resonance are measured to be \MeasMass\ and \MeasWidth, where the first uncertainties are statistical and the second systematic. △ Less

Submitted 10 April, 2024; originally announced April 2024.

arXiv:2404.06676 [pdf]

Topological Feature Search Method for Multichannel EEG: Application in ADHD classification

Authors: Tianming Cai, Guoying Zhao, Junbin Zang, Chen Zong, Zhidong Zhang, Chenyang Xue

Abstract: In recent years, the preliminary diagnosis of Attention Deficit Hyperactivity Disorder (ADHD) using electroencephalography (EEG) has garnered attention from researchers. EEG, known for its expediency and efficiency, plays a pivotal role in the diagnosis and treatment of ADHD. However, the non-stationarity of EEG signals and inter-subject variability pose challenges to the diagnostic and classifica… ▽ More In recent years, the preliminary diagnosis of Attention Deficit Hyperactivity Disorder (ADHD) using electroencephalography (EEG) has garnered attention from researchers. EEG, known for its expediency and efficiency, plays a pivotal role in the diagnosis and treatment of ADHD. However, the non-stationarity of EEG signals and inter-subject variability pose challenges to the diagnostic and classification processes. Topological Data Analysis (TDA) offers a novel perspective for ADHD classification, diverging from traditional time-frequency domain features. Yet, conventional TDA models are restricted to single-channel time series and are susceptible to noise, leading to the loss of topological features in persistence diagrams.This paper presents an enhanced TDA approach applicable to multi-channel EEG in ADHD. Initially, optimal input parameters for multi-channel EEG are determined. Subsequently, each channel's EEG undergoes phase space reconstruction (PSR) followed by the utilization of k-Power Distance to Measure (k-PDTM) for approximating ideal point clouds. Then, multi-dimensional time series are re-embedded, and TDA is applied to obtain topological feature information. Gaussian function-based Multivariate Kernel Density Estimation (MKDE) is employed in the merger persistence diagram to filter out desired topological feature mappings. Finally, persistence image (PI) method is utilized to extract topological features, and the influence of various weighting functions on the results is discussed.The effectiveness of our method is evaluated using the IEEE ADHD dataset. Results demonstrate that the accuracy, sensitivity, and specificity reach 85.60%, 83.61%, and 88.33%, respectively. Compared to traditional TDA methods, our method was effectively improved and outperforms typical nonlinear descriptors. These findings indicate that our method exhibits higher precision and robustness. △ Less

Submitted 9 April, 2024; originally announced April 2024.

arXiv:2404.05979 [pdf, other]

StoryImager: A Unified and Efficient Framework for Coherent Story Visualization and Completion

Authors: Ming Tao, Bing-Kun Bao, Hao Tang, Yaowei Wang, Changsheng Xu

Abstract: Story visualization aims to generate a series of realistic and coherent images based on a storyline. Current models adopt a frame-by-frame architecture by transforming the pre-trained text-to-image model into an auto-regressive manner. Although these models have shown notable progress, there are still three flaws. 1) The unidirectional generation of auto-regressive manner restricts the usability i… ▽ More Story visualization aims to generate a series of realistic and coherent images based on a storyline. Current models adopt a frame-by-frame architecture by transforming the pre-trained text-to-image model into an auto-regressive manner. Although these models have shown notable progress, there are still three flaws. 1) The unidirectional generation of auto-regressive manner restricts the usability in many scenarios. 2) The additional introduced story history encoders bring an extremely high computational cost. 3) The story visualization and continuation models are trained and inferred independently, which is not user-friendly. To these ends, we propose a bidirectional, unified, and efficient framework, namely StoryImager. The StoryImager enhances the storyboard generative ability inherited from the pre-trained text-to-image model for a bidirectional generation. Specifically, we introduce a Target Frame Masking Strategy to extend and unify different story image generation tasks. Furthermore, we propose a Frame-Story Cross Attention Module that decomposes the cross attention for local fidelity and global coherence. Moreover, we design a Contextual Feature Extractor to extract contextual information from the whole storyline. The extensive experimental results demonstrate the excellent performance of our StoryImager. The code is available at https://github.com/tobran/StoryImager. △ Less

Submitted 8 April, 2024; originally announced April 2024.

Comments: 17 pages

arXiv:2404.05973 [pdf, ps, other]

Search for the Rare Decays $D_s^+\to h^+(h^{0})e^+e^-$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (618 additional authors not shown)

Abstract: Using 7.33~fb$^{-1}$ of $e^{+}e^{-}$ collision data collected by the BESIII detector at center-of-mass energies in the range of $\sqrt{s}=4.128 - 4.226$~GeV, we search for the rare decays $D_{s}^+\to h^+(h^{0})e^{+}e^{-}$, where $h$ represents a kaon or pion. By requiring the $e^{+}e^{-}$ invariant mass to be consistent with a $φ(1020)$, $0.98<M(e^{+}e^{-})<1.04$ ~GeV/$c^2$, the decay… ▽ More Using 7.33~fb$^{-1}$ of $e^{+}e^{-}$ collision data collected by the BESIII detector at center-of-mass energies in the range of $\sqrt{s}=4.128 - 4.226$~GeV, we search for the rare decays $D_{s}^+\to h^+(h^{0})e^{+}e^{-}$, where $h$ represents a kaon or pion. By requiring the $e^{+}e^{-}$ invariant mass to be consistent with a $φ(1020)$, $0.98<M(e^{+}e^{-})<1.04$ ~GeV/$c^2$, the decay $D_s^+\toπ^+φ,φ\to e^{+}e^{-}$ is observed with a statistical significance of 7.8$σ$, and evidence for the decay $D_s^+\toρ^+φ,φ\to e^{+}e^{-}$ is found for the first time with a statistical significance of 4.4$σ$. The decay branching fractions are measured to be $\mathcal{B}(D_s^+\toπ^+φ, φ\to e^{+}e^{-} )=(1.17^{+0.23}_{-0.21}\pm0.03)\times 10^{-5}$, and $\mathcal{B}(D_s^+\toρ^+φ, φ\to e^{+}e^{-} )=(2.44^{+0.67}_{-0.62}\pm 0.16)\times 10^{-5}$, where the first uncertainties are statistical and the second systematic. No significant signal for the three four-body decays of $D_{s}^{+}\to π^{+}π^{0}e^{+}e^{-},\ D_{s}^{+}\to K^{+}π^{0}e^{+}e^{-}$, and $D_{s}^{+}\to K_{S}^{0}π^{+}e^{+}e^{-}$ is observed. For $D_{s}^{+}\to π^{+}π^{0}e^{+}e^{-}$, the $φ$ mass region is vetoed to minimize the long-distance effects. The 90$\%$ confidence level upper limits set on the branching fractions of these decays are in the range of $(7.0-8.1)\times 10^{-5}$. △ Less

Submitted 8 April, 2024; originally announced April 2024.

Comments: 10 pages, 2 figures, 1 table

arXiv:2404.05285 [pdf, other]

Detecting Every Object from Events

Authors: Haitian Zhang, Chang Xu, Xinya Wang, Bingde Liu, Guang Hua, Lei Yu, Wen Yang

Abstract: Object detection is critical in autonomous driving, and it is more practical yet challenging to localize objects of unknown categories: an endeavour known as Class-Agnostic Object Detection (CAOD). Existing studies on CAOD predominantly rely on ordinary cameras, but these frame-based sensors usually have high latency and limited dynamic range, leading to safety risks in real-world scenarios. In th… ▽ More Object detection is critical in autonomous driving, and it is more practical yet challenging to localize objects of unknown categories: an endeavour known as Class-Agnostic Object Detection (CAOD). Existing studies on CAOD predominantly rely on ordinary cameras, but these frame-based sensors usually have high latency and limited dynamic range, leading to safety risks in real-world scenarios. In this study, we turn to a new modality enabled by the so-called event camera, featured by its sub-millisecond latency and high dynamic range, for robust CAOD. We propose Detecting Every Object in Events (DEOE), an approach tailored for achieving high-speed, class-agnostic open-world object detection in event-based vision. Built upon the fast event-based backbone: recurrent vision transformer, we jointly consider the spatial and temporal consistencies to identify potential objects. The discovered potential objects are assimilated as soft positive samples to avoid being suppressed as background. Moreover, we introduce a disentangled objectness head to separate the foreground-background classification and novel object discovery tasks, enhancing the model's generalization in localizing novel objects while maintaining a strong ability to filter out the background. Extensive experiments confirm the superiority of our proposed DEOE in comparison with three strong baseline methods that integrate the state-of-the-art event-based object detector with advancements in RGB-based CAOD. Our code is available at https://github.com/Hatins/DEOE. △ Less

Submitted 8 April, 2024; originally announced April 2024.

arXiv:2404.05196 [pdf, other]

HSViT: Horizontally Scalable Vision Transformer

Authors: Chenhao Xu, Chang-Tsun Li, Chee Peng Lim, Douglas Creighton

Abstract: While the Vision Transformer (ViT) architecture gains prominence in computer vision and attracts significant attention from multimedia communities, its deficiency in prior knowledge (inductive bias) regarding shift, scale, and rotational invariance necessitates pre-training on large-scale datasets. Furthermore, the growing layers and parameters in both ViT and convolutional neural networks (CNNs)… ▽ More While the Vision Transformer (ViT) architecture gains prominence in computer vision and attracts significant attention from multimedia communities, its deficiency in prior knowledge (inductive bias) regarding shift, scale, and rotational invariance necessitates pre-training on large-scale datasets. Furthermore, the growing layers and parameters in both ViT and convolutional neural networks (CNNs) impede their applicability to mobile multimedia services, primarily owing to the constrained computational resources on edge devices. To mitigate the aforementioned challenges, this paper introduces a novel horizontally scalable vision transformer (HSViT). Specifically, a novel image-level feature embedding allows ViT to better leverage the inductive bias inherent in the convolutional layers. Based on this, an innovative horizontally scalable architecture is designed, which reduces the number of layers and parameters of the models while facilitating collaborative training and inference of ViT models across multiple nodes. The experimental results depict that, without pre-training on large-scale datasets, HSViT achieves up to 10% higher top-1 accuracy than state-of-the-art schemes, ascertaining its superior preservation of inductive bias. The code is available at https://github.com/xuchenhao001/HSViT. △ Less

Submitted 8 April, 2024; originally announced April 2024.

arXiv:2404.04917 [pdf, ps, other]

Search for $η_c(2S)\to 2(π^+π^-)$ and improved measurement of $χ_{cJ}\to 2(π^+π^-)$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (638 additional authors not shown)

Abstract: We search for the hadronic decay $η_c(2S)\to 2(π^+π^-)$ in the $ψ(3686)\toγη_c(2S)$ radiative decay using $(27.12\pm 0.14)\times 10^8$ $ψ(3686)$ events collected by the BESIII detector at the BEPCII collider. No significant signal is found, and the upper limit of $\mathcal{B}[ψ(3686)\toγη_c(2S)]\mathcal{B}[η_c(2S)\to 2(π^+π^-)]$ is determined to be $0.78\times 10^{-6}$ at the 90\% confidence level… ▽ More We search for the hadronic decay $η_c(2S)\to 2(π^+π^-)$ in the $ψ(3686)\toγη_c(2S)$ radiative decay using $(27.12\pm 0.14)\times 10^8$ $ψ(3686)$ events collected by the BESIII detector at the BEPCII collider. No significant signal is found, and the upper limit of $\mathcal{B}[ψ(3686)\toγη_c(2S)]\mathcal{B}[η_c(2S)\to 2(π^+π^-)]$ is determined to be $0.78\times 10^{-6}$ at the 90\% confidence level. Using $ψ(3686)\toγχ_{cJ}$ transitions, we also measure the branching fractions of $\mathcal{B}[χ_{cJ(J=0,1,2)}\to 2(π^+π^-)]$, which are $\mathcal{B}[χ_{c0}\to 2(π^+π^-)]=(2.127\pm 0.002~(\mathrm{stat.})\pm 0.101~(\mathrm{syst.}))$\%, $\mathcal{B}[χ_{c1}\to 2(π^+π^-)]=(0.685\pm 0.001~(\mathrm{stat.})\pm 0.031~\mathrm{syst.}))$\%, and $\mathcal{B}[χ_{c2}\to 2(π^+π^-)]=(1.153\pm 0.001~(\mathrm{stat.})\pm 0.063~(\mathrm{syst.}))$\%. △ Less

Submitted 7 April, 2024; originally announced April 2024.

arXiv:2404.04875 [pdf, other]

NeRF2Points: Large-Scale Point Cloud Generation From Street Views' Radiance Field Optimization

Authors: Peng Tu, Xun Zhou, Mingming Wang, Xiaojun Yang, Bo Peng, Ping Chen, Xiu Su, Yawen Huang, Yefeng Zheng, Chang Xu

Abstract: Neural Radiance Fields (NeRF) have emerged as a paradigm-shifting methodology for the photorealistic rendering of objects and environments, enabling the synthesis of novel viewpoints with remarkable fidelity. This is accomplished through the strategic utilization of object-centric camera poses characterized by significant inter-frame overlap. This paper explores a compelling, alternative utility o… ▽ More Neural Radiance Fields (NeRF) have emerged as a paradigm-shifting methodology for the photorealistic rendering of objects and environments, enabling the synthesis of novel viewpoints with remarkable fidelity. This is accomplished through the strategic utilization of object-centric camera poses characterized by significant inter-frame overlap. This paper explores a compelling, alternative utility of NeRF: the derivation of point clouds from aggregated urban landscape imagery. The transmutation of street-view data into point clouds is fraught with complexities, attributable to a nexus of interdependent variables. First, high-quality point cloud generation hinges on precise camera poses, yet many datasets suffer from inaccuracies in pose metadata. Also, the standard approach of NeRF is ill-suited for the distinct characteristics of street-view data from autonomous vehicles in vast, open settings. Autonomous vehicle cameras often record with limited overlap, leading to blurring, artifacts, and compromised pavement representation in NeRF-based point clouds. In this paper, we present NeRF2Points, a tailored NeRF variant for urban point cloud synthesis, notable for its high-quality output from RGB inputs alone. Our paper is supported by a bespoke, high-resolution 20-kilometer urban street dataset, designed for point cloud generation and evaluation. NeRF2Points adeptly navigates the inherent challenges of NeRF-based point cloud synthesis through the implementation of the following strategic innovations: (1) Integration of Weighted Iterative Geometric Optimization (WIGO) and Structure from Motion (SfM) for enhanced camera pose accuracy, elevating street-view data precision. (2) Layered Perception and Integrated Modeling (LPiM) is designed for distinct radiance field modeling in urban environments, resulting in coherent point cloud representations. △ Less

Submitted 7 April, 2024; originally announced April 2024.

Comments: 18 pages

arXiv:2404.04701 [pdf, other]

Flat-Band Enhanced Antiferromagnetic Fluctuations and Unconventional Superconductivity in Pressurized CsCr$_3$Sb$_5$

Authors: Siqi Wu, Chenchao Xu, Xiaoqun Wang, Hai-Qing Lin, Chao Cao, Guang-Han Cao

Abstract: The interrelationship between flat bands and correlated phenomena such as unconventional superconductivity stands as an intriguing subject in condensed matter physics. Here, by first-principles calculations and random phase approximation analyses, we investigate the electronic structure, superconducting instability, as well as roles of the incipient flat bands in kagome superconductor CsCr$_3$Sb… ▽ More The interrelationship between flat bands and correlated phenomena such as unconventional superconductivity stands as an intriguing subject in condensed matter physics. Here, by first-principles calculations and random phase approximation analyses, we investigate the electronic structure, superconducting instability, as well as roles of the incipient flat bands in kagome superconductor CsCr$_3$Sb$_5$. Our calculations reveal strong antiferromagnetic spin fluctuations in CsCr$_3$Sb$_5$, which mediates two sets of spin-singlet superconducting orders with $s_{\pm}$- and ($d_{xy}$, $d_{x^2-y^2}$)-wave symmetries. Under the dominance of local Coulomb interactions, the unoccupied incipient flat bands are shown to be crucial for the momentum dependence of spin fluctuations and thus the superconductivity. Our further analyses unveil a sublattice-momentum-coupling-driven mechanism for this momentum-dependent enhancement of the fluctuations, which provides us a new perspective for future studies of geometrically frustrated systems. △ Less

Submitted 6 April, 2024; originally announced April 2024.

Showing 151–200 of 3,215 results for author: Xu, C