
Showing 1–50 of 2,583 results for author: Zhang, F

  1. arXiv:2407.11510  [pdf, other]

    eess.AS

    VoxBlink2: A 100K+ Speaker Recognition Corpus and the Open-Set Speaker-Identification Benchmark

    Authors: Yuke Lin, Ming Cheng, Fulin Zhang, Yingying Gao, Shilei Zhang, Ming Li

    Abstract: In this paper, we provide a large audio-visual speaker recognition dataset, VoxBlink2, which includes approximately 10M utterances with videos from 110K+ speakers in the wild. This dataset represents a significant expansion over the VoxBlink dataset, encompassing a broader diversity of speakers and scenarios by the grace of an optimized data collection pipeline. Afterward, we explore the impact of…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: Accepted By InterSpeech2024

  2. arXiv:2407.10540  [pdf, other]

    astro-ph.HE

    Sudden polarization angle jumps of the repeating fast radio burst FRB 20201124A

    Authors: J. R. Niu, W. Y. Wang, J. C. Jiang, Y. Qu, D. J. Zhou, W. W. Zhu, K. J. Lee, J. L. Han, B. Zhang, D. Li, S. Cao, Z. Y. Fang, Y. Feng, Q. Y. Fu, P. Jiang, W. C. Jing, J. Li, Y. Li, R. Luo, L. Q. Meng, C. C. Miao, X. L. Miao, C. H. Niu, Y. C. Pan, B. J. Wang, et al. (19 additional authors not shown)

    Abstract: We report the first detection of polarization angle (PA) orthogonal jumps, a phenomenon previously only observed from radio pulsars, from a fast radio burst (FRB) source FRB 20201124A. We find three cases of orthogonal jumps in over two thousand bursts, all resembling those observed in pulsar single pulses. We propose that the jumps are due to the superposition of two orthogonal emission modes tha…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: 10 pages, 5 figures, submitted to APJL

  3. arXiv:2407.08883  [pdf]

    cs.CV

    TractGraphFormer: Anatomically Informed Hybrid Graph CNN-Transformer Network for Classification from Diffusion MRI Tractography

    Authors: Yuqian Chen, Fan Zhang, Meng Wang, Leo R. Zekelman, Suheyla Cetin-Karayumak, Tengfei Xue, Chaoyi Zhang, Yang Song, Nikos Makris, Yogesh Rathi, Weidong Cai, Lauren J. O'Donnell

    Abstract: The relationship between brain connections and non-imaging phenotypes is increasingly studied using deep neural networks. However, the local and global properties of the brain's white matter networks are often overlooked in convolutional network design. We introduce TractGraphFormer, a hybrid Graph CNN-Transformer deep learning framework tailored for diffusion MRI tractography. This model leverage…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: 23 pages, 4 figures

  4. arXiv:2407.08661  [pdf, other]

    cond-mat.str-el cond-mat.mes-hall

    Self-consistent theory for the fractional quantum anomalous Hall effect in rhombohedral pentalayer graphene

    Authors: Ke Huang, Xiao Li, Sankar Das Sarma, Fan Zhang

    Abstract: The fractional quantum anomalous Hall (FQAH) effect in rhombohedral pentalayer graphene (PLG) has attracted significant attention due to its potential for observing exotic quantum states. In this work, we present a self-consistent Hartree-Fock theory for the FQAH effect in rhombohedral PLG. In particular, we focus on the convergence of the Hartree-Fock calculation with various reference fields and…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: 18 pages, 12 figures. Comments are welcome

  5. arXiv:2407.08303  [pdf, other]

    cs.CV cs.AI

    DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception

    Authors: Xiaotong Li, Fan Zhang, Haiwen Diao, Yueze Wang, Xinlong Wang, Ling-Yu Duan

    Abstract: Existing Multimodal Large Language Models (MLLMs) increasingly emphasize complex understanding of various visual elements, including multiple objects, text information, and spatial relations. Their development for comprehensive visual perception hinges on the availability of high-quality image-text datasets that offer diverse visual elements and thorough image descriptions. However, the scarcity…

    Submitted 11 July, 2024; originally announced July 2024.

  6. arXiv:2407.07334  [pdf, other]

    cond-mat.str-el cond-mat.stat-mech

    First-order Néel-VBS transition in $S=3/2$ antiferromagnets

    Authors: Fan Zhang, Wenan Guo, Ribhu K. Kaul

    Abstract: We study the transition between Néel and columnar valence-bond solid ordering in two-dimensional $S=3/2$ square lattice quantum antiferromagnets with SO(3) symmetry. According to the deconfined criticality scenario, this transition can be direct and continuous like the well-studied $S=1/2$ case. To study the global phase diagram, we work with four multi-spin couplings with full rotational symmetry…

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: 11 pages, 16 figures

  7. arXiv:2407.07318  [pdf]

    physics.optics eess.IV

    Serial coherent diffraction imaging of dynamic samples based on inter-frame constraint

    Authors: Pengju Sheng, Fucai Zhang

    Abstract: We proposed a novel approach to coherent imaging of dynamic samples. The inter-frame similarity of the sample's local structures is found to be a powerful constraint in phasing a sequence of diffraction patterns. We devised a new image reconstruction algorithm that exploits this inter-frame constraint enabled by an adaptive similar region determination approach. We demonstrated the feasibility of…

    Submitted 9 July, 2024; originally announced July 2024.

  8. arXiv:2407.05749  [pdf, other]

    eess.SP cs.HC cs.LG

    LDGCN: An Edge-End Lightweight Dual GCN Based on Single-Channel EEG for Driver Drowsiness Monitoring

    Authors: Jingwei Huang, Chuansheng Wang, Jiayan Huang, Haoyi Fan, Antoni Grau, Fuquan Zhang

    Abstract: Driver drowsiness electroencephalography (EEG) signal monitoring can timely alert drivers of their drowsiness status, thereby reducing the probability of traffic accidents. Graph convolutional networks (GCNs) have shown significant advancements in processing the non-stationary, time-varying, and non-Euclidean nature of EEG signals. However, the existing single-channel EEG adjacency graph construct…

    Submitted 8 July, 2024; originally announced July 2024.

  9. arXiv:2407.04396  [pdf, other]

    cs.CV cs.AI

    Graph-Guided Test-Time Adaptation for Glaucoma Diagnosis using Fundus Photography

    Authors: Qian Zeng, Le Zhang, Yipeng Liu, Ce Zhu, Fan Zhang

    Abstract: Glaucoma is a leading cause of irreversible blindness worldwide. While deep learning approaches using fundus images have largely improved early diagnosis of glaucoma, variations in images from different devices and locations (known as domain shifts) challenge the use of pre-trained models in real-world settings. To address this, we propose a novel Graph-guided Test-Time Adaptation (GTTA) framework…

    Submitted 9 July, 2024; v1 submitted 5 July, 2024; originally announced July 2024.

    Comments: 11 pages, 3 figures, 3 tables, submitted to MICCAI

  10. arXiv:2407.03964  [pdf, other]

    cs.CL cs.LG

    Improving Sample Efficiency of Reinforcement Learning with Background Knowledge from Large Language Models

    Authors: Fuxiang Zhang, Junyou Li, Yi-Chen Li, Zongzhang Zhang, Yang Yu, Deheng Ye

    Abstract: Low sample efficiency is an enduring challenge of reinforcement learning (RL). With the advent of versatile large language models (LLMs), recent works impart common-sense knowledge to accelerate policy learning for RL processes. However, we note that such guidance is often tailored for one specific task but loses generalizability. In this paper, we introduce a framework that harnesses LLMs to extr…

    Submitted 4 July, 2024; originally announced July 2024.

  11. arXiv:2407.03856  [pdf, other]

    cs.LG

    Q-Adapter: Training Your LLM Adapter as a Residual Q-Function

    Authors: Yi-Chen Li, Fuxiang Zhang, Wenjie Qiu, Lei Yuan, Chengxing Jia, Zongzhang Zhang, Yang Yu

    Abstract: We consider the problem of adapting Large Language Models (LLMs) pre-trained with Reinforcement Learning from Human Feedback (RLHF) to downstream preference data. Naive approaches to achieve this could be supervised fine-tuning on preferred responses or reinforcement learning with a learned reward model. However, the LLM runs the risk of forgetting its initial knowledge as the fine-tuning progress…

    Submitted 4 July, 2024; originally announced July 2024.

  12. arXiv:2407.03560  [pdf, ps, other]

    math.CO

    Numerical semigroups from rational matrices I: power-integral matrices and nilpotent representations

    Authors: Arsh Chhabra, Stephan Ramon Garcia, Fangqian Zhang, Hechun Zhang

    Abstract: Our aim in this paper is to initiate the study of exponent semigroups for rational matrices. We prove that every numerical semigroup is the exponent semigroup of some rational matrix. We also obtain lower bounds on the size of such matrices and discuss the related class of power-integral matrices.

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: 13 pages

  13. arXiv:2407.02880  [pdf, other]

    cs.LG cs.AI cs.CV

    Knowledge Composition using Task Vectors with Learned Anisotropic Scaling

    Authors: Frederic Z. Zhang, Paul Albert, Cristian Rodriguez-Opazo, Anton van den Hengel, Ehsan Abbasnejad

    Abstract: Pre-trained models produce strong generic representations that can be adapted via fine-tuning. The learned weight difference relative to the pre-trained model, known as a task vector, characterises the direction and stride of fine-tuning. The significance of task vectors is such that simple arithmetic operations on them can be used to combine diverse representations from different domains. This pa…

    Submitted 3 July, 2024; originally announced July 2024.

  14. arXiv:2407.02675  [pdf, other]

    eess.IV cs.CV

    Depth-Aware Endoscopic Video Inpainting

    Authors: Francis Xiatian Zhang, Shuang Chen, Xianghua Xie, Hubert P. H. Shum

    Abstract: Video inpainting fills in corrupted video content with plausible replacements. While recent advances in endoscopic video inpainting have shown potential for enhancing the quality of endoscopic videos, they mainly repair 2D visual information without effectively preserving crucial 3D spatial details for clinical reference. Depth-aware inpainting methods attempt to preserve these details by incorpor…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: Accepted by MICCAI 2024

  15. arXiv:2407.01461  [pdf, other]

    cs.CL

    Enhancing the Capability and Robustness of Large Language Models through Reinforcement Learning-Driven Query Refinement

    Authors: Zisu Huang, Xiaohua Wang, Feiran Zhang, Zhibo Xu, Cenyuan Zhang, Xiaoqing Zheng, Xuanjing Huang

    Abstract: The capacity of large language models (LLMs) to generate honest, harmless, and helpful responses heavily relies on the quality of user prompts. However, these prompts often tend to be brief and vague, thereby significantly limiting the full potential of LLMs. Moreover, harmful prompts can be meticulously crafted and manipulated by adversaries to jailbreak LLMs, inducing them to produce potentially…

    Submitted 1 July, 2024; originally announced July 2024.

  16. arXiv:2407.01230  [pdf, other]

    cs.CV

    DaBiT: Depth and Blur informed Transformer for Joint Refocusing and Super-Resolution

    Authors: Crispian Morris, Nantheera Anantrasirichai, Fan Zhang, David Bull

    Abstract: In many real-world scenarios, recorded videos suffer from accidental focus blur, and while video deblurring methods exist, most specifically target motion blur. This paper introduces a framework optimised for the joint task of focal deblurring (refocusing) and video super-resolution (VSR). The proposed method employs novel map guided transformers, in addition to image propagation, to effectively l…

    Submitted 10 July, 2024; v1 submitted 1 July, 2024; originally announced July 2024.

  17. arXiv:2407.01219  [pdf, other]

    cs.CL

    Searching for Best Practices in Retrieval-Augmented Generation

    Authors: Xiaohua Wang, Zhenghua Wang, Xuan Gao, Feiran Zhang, Yixin Wu, Zhibo Xu, Tianyuan Shi, Zhengyuan Wang, Shizheng Li, Qi Qian, Ruicheng Yin, Changze Lv, Xiaoqing Zheng, Xuanjing Huang

    Abstract: Retrieval-augmented generation (RAG) techniques have proven to be effective in integrating up-to-date information, mitigating hallucinations, and enhancing response quality, particularly in specialized domains. While many RAG approaches have been proposed to enhance large language models through query-dependent retrievals, these approaches still suffer from their complex implementation and prolong…

    Submitted 1 July, 2024; originally announced July 2024.

  18. arXiv:2407.00136  [pdf, other]

    hep-ex

    Observation of the Electromagnetic Dalitz Transition $h_c \rightarrow e^+e^-η_c$

    Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, S. Ahmed, M. Albrecht, R. Aliberti, A. Amoroso, M. R. An, Q. An, X. H. Bai, Y. Bai, O. Bakina, R. Baldini Ferroli, I. Balossino, Y. Ban, K. Begzsuren, N. Berger, M. Bertani, D. Bettoni, F. Bianchi, J. Bloms, A. Bortone, I. Boyko, R. A. Briere, et al. (495 additional authors not shown)

    Abstract: Using $(27.12\pm 0.14)\times10^8$ $ψ(3686)$ decays and data samples of $e^+e^-$ collisions with $\sqrt{s}$ from 4.130 to 4.780 GeV collected with the BESIII detector, we report the first observation of the electromagnetic Dalitz transition $h_c\to e^+e^-η_c$ with a statistical significance of $5.4σ$. We measure the ratio of the branching fractions…

    Submitted 2 July, 2024; v1 submitted 28 June, 2024; originally announced July 2024.

  19. arXiv:2406.20036  [pdf]

    cond-mat.mes-hall cond-mat.mtrl-sci

    Direct observation of layer skyrmions in twisted WSe2 bilayers

    Authors: Fan Zhang, Nicolás Morales-Durán, Yanxing Li, Wang Yao, Jung-Jung Su, Yu-Chuan Lin, Chengye Dong, Hyunsue Kim, Joshua A. Robinson, Allan H. Macdonald, Chih-Kang Shih

    Abstract: Transition metal dichalcogenide (TMD) twisted homobilayers have been established as an ideal platform for studying strong correlation phenomena, as exemplified by the recent discovery of fractional Chern insulator (FCI) states in twisted MoTe2 and Chern insulators (CI) and unconventional superconductivity in twisted WSe2. In these systems, nontrivial topology in the strongly layer-hybridized regim…

    Submitted 28 June, 2024; originally announced June 2024.

  20. arXiv:2406.19240  [pdf, other]

    cs.SE

    Data Preparation for Deep Learning based Code Smell Detection: A Systematic Literature Review

    Authors: Fengji Zhang, Zexian Zhang, Jacky Wai Keung, Xiangru Tang, Zhen Yang, Xiao Yu, Wenhua Hu

    Abstract: Code Smell Detection (CSD) plays a crucial role in improving software quality and maintainability. And Deep Learning (DL) techniques have emerged as a promising approach for CSD due to their superior performance. However, the effectiveness of DL-based CSD methods heavily relies on the quality of the training data. Despite its importance, little attention has been paid to analyzing the data prepara…

    Submitted 27 June, 2024; originally announced June 2024.

  21. arXiv:2406.17694  [pdf, other]

    cs.CR

    Protecting the 'Stop Using My Data' Right through Blockchain-assisted Evidence Generation

    Authors: Fan Zhang, Peng Liu

    Abstract: In order to provide personalized services to users, Internet-based platforms collect and utilize user-generated behavioral data. Although the 'stop using my data' right should be a fundamental data right, which allows individuals to request their personal data to be no longer utilized by online platforms, the existing preventive data protection measures (e.g., cryptographic data elimination, diffe…

    Submitted 29 June, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

  22. Einstein-Podolsky-Rosen steering paradox "2=1" for $N$ qubits

    Authors: Zhi-Jie Liu, Jie Zhou, Hui-Xian Meng, Xing-Yan Fan, Mi Xie, Fu-lin Zhang, Jing-Ling Chen

    Abstract: The Einstein-Podolsky-Rosen (EPR) paradox highlights the absence of a local realistic explanation for quantum mechanics, and shows the incompatibility of the local-hidden-state models with quantum theory. For $N$-qubit states, or more importantly, the $N$-qubit mixed states, we present the EPR steering paradox in the form of the contradictory equality "2=1". We show that the contradiction holds for an…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: 12 pages, 0 figure

    Journal ref: Modern Physics Letters A Vol. 39, No. 9, 2450030 (2024)

  23. arXiv:2406.16486  [pdf, other]

    cs.AI

    Towards Comprehensive Preference Data Collection for Reward Modeling

    Authors: Yulan Hu, Qingyang Li, Sheng Ouyang, Ge Chen, Kaihui Chen, Lijun Mei, Xucheng Ye, Fuzheng Zhang, Yong Liu

    Abstract: Reinforcement Learning from Human Feedback (RLHF) facilitates the alignment of large language models (LLMs) with human preferences, thereby enhancing the quality of responses generated. A critical component of RLHF is the reward model, which is trained on preference data and outputs a scalar reward during the inference stage. However, the collection of preference data still lacks thorough investig…

    Submitted 24 June, 2024; originally announced June 2024.

  24. arXiv:2406.16062  [pdf, other]

    cs.NE

    Towards Biologically Plausible Computing: A Comprehensive Comparison

    Authors: Changze Lv, Yufei Gu, Zhengkang Guo, Zhibo Xu, Yixin Wu, Feiran Zhang, Tianyuan Shi, Zhenghua Wang, Ruicheng Yin, Yu Shang, Siqi Zhong, Xiaohua Wang, Muling Wu, Wenhao Liu, Tianlong Li, Jianhao Zhu, Cenyuan Zhang, Zixuan Ling, Xiaoqing Zheng

    Abstract: Backpropagation is a cornerstone algorithm in training neural networks for supervised learning, which uses a gradient descent method to update network weights by minimizing the discrepancy between actual and desired outputs. Despite its pivotal role in propelling deep learning advancements, the biological plausibility of backpropagation is questioned due to its requirements for weight symmetry, gl…

    Submitted 23 June, 2024; originally announced June 2024.

  25. arXiv:2406.15879  [pdf]

    physics.optics cond-mat.mtrl-sci

    Robust Ptychographic Reconstruction with an Out-of-Focus Electron Probe

    Authors: Shoucong Ning, Wenhui Xu, Pengju Sheng, Leyi Loh, Stephen Pennycook, Fucai Zhang, Michel Bosman, Qian He

    Abstract: As a burgeoning technique, out-of-focus electron ptychography offers the potential for rapidly imaging atomic-scale large fields of view (FoV) using a single diffraction dataset. However, achieving robust out-of-focus ptychographic reconstruction poses a significant challenge due to the inherent scan instabilities of electron microscopes, compounded by the presence of unknown aberrations in the pr…

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: 22 pages, 6 figures

  26. arXiv:2406.13597  [pdf, other]

    cs.LG cs.AI

    GraphKAN: Enhancing Feature Extraction with Graph Kolmogorov Arnold Networks

    Authors: Fan Zhang, Xin Zhang

    Abstract: A massive number of applications involve data with underlying relationships embedded in non-Euclidean space. Graph neural networks (GNNs) are utilized to extract features by capturing the dependencies within graphs. Despite groundbreaking performances, we argue that Multi-layer perceptrons (MLPs) and fixed activation functions impede the feature extraction due to information loss. Inspired by Kolmog…

    Submitted 19 June, 2024; originally announced June 2024.

  27. arXiv:2406.13268  [pdf, other]

    eess.AS cs.SD

    CEC: A Noisy Label Detection Method for Speaker Recognition

    Authors: Yao Shen, Yingying Gao, Yaqian Hao, Chenguang Hu, Fulin Zhang, Junlan Feng, Shilei Zhang

    Abstract: Noisy labels are inevitable, even in well-annotated datasets. The detection of noisy labels is of significant importance to enhance the robustness of speaker recognition models. In this paper, we propose a novel noisy label detection approach based on two new statistical metrics: Continuous Inconsistent Counting (CIC) and Total Inconsistent Counting (TIC). These metrics are calculated through Cros…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: interspeech 2024

  28. arXiv:2406.13007  [pdf, other]

    cs.CV

    NTIRE 2024 Challenge on Night Photography Rendering

    Authors: Egor Ershov, Artyom Panshin, Oleg Karasev, Sergey Korchagin, Shepelev Lev, Alexandr Startsev, Daniil Vladimirov, Ekaterina Zaychenkova, Nikola Banić, Dmitrii Iarchuk, Maria Efimova, Radu Timofte, Arseniy Terekhin, Shuwei Yue, Yuyang Liu, Minchen Wei, Lu Xu, Chao Zhang, Yasi Wang, Furkan Kınlı, Doğa Yılmaz, Barış Özcan, Furkan Kıraç, Shuai Liu, Jingyuan Xiao, et al. (25 additional authors not shown)

    Abstract: This paper presents a review of the NTIRE 2024 challenge on night photography rendering. The goal of the challenge was to find solutions that process raw camera images taken in nighttime conditions, and thereby produce photo-quality output images in the standard RGB (sRGB) space. Unlike the previous year's competition, the challenge images were collected with a mobile phone and the speed of algo…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 10 pages, 10 figures

  29. arXiv:2406.12520  [pdf, other]

    cond-mat.soft physics.data-an

    On the analysis of two-time correlation functions: equilibrium vs non-equilibrium systems

    Authors: Anastasia Ragulskaya, Vladimir Starostin, Fajun Zhang, Christian Gutt, Frank Schreiber

    Abstract: X-ray photon correlation spectroscopy (XPCS) is a powerful tool for the investigation of dynamics covering a broad range of time and length scales. The two-time correlation function (TTC) is commonly used to track non-equilibrium dynamical evolution in XPCS measurements, followed by the extraction of one-time correlations. While the theoretical foundation for the quantitative analysis of TTCs is p…

    Submitted 18 June, 2024; originally announced June 2024.

  30. arXiv:2406.11653  [pdf, other]

    eess.SY

    Communication-Efficient MARL for Platoon Stability and Energy-efficiency Co-optimization in Cooperative Adaptive Cruise Control of CAVs

    Authors: Min Hua, Dong Chen, Kun Jiang, Fanggang Zhang, Jinhai Wang, Bo Wang, Quan Zhou, Hongming Xu

    Abstract: Cooperative adaptive cruise control (CACC) has been recognized as a fundamental function of autonomous driving, in which platoon stability and energy efficiency are outstanding challenges that are difficult to accommodate in real-world operations. This paper studied the CACC of connected and autonomous vehicles (CAVs) based on the multi-agent reinforcement learning algorithm (MARL) to optimize pla…

    Submitted 17 June, 2024; originally announced June 2024.

  31. arXiv:2406.11512  [pdf, ps, other]

    math.AG

    Asymptotic Behaviors of Moduli of One-dimensional Sheaves on Surfaces

    Authors: Fei Si, Feinuo Zhang

    Abstract: In this paper, we study the asymptotic behaviors of the Betti numbers and Picard numbers of the moduli space $M_{β,χ}$ of one-dimensional sheaves supported in a curve class $β$ on $S$ with Euler characteristic $χ$. We determine the intersection cohomology Betti numbers of $M_{β,χ}$ when $S$ is a del Pezzo surface and $β$ is sufficiently positive. As an application, we formulate a $P = C$ conjectur…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 28 pages, comments are very welcome!

  32. arXiv:2406.11277  [pdf, other]

    cs.CL

    Small Agent Can Also Rock! Empowering Small Language Models as Hallucination Detector

    Authors: Xiaoxue Cheng, Junyi Li, Wayne Xin Zhao, Hongzhi Zhang, Fuzheng Zhang, Di Zhang, Kun Gai, Ji-Rong Wen

    Abstract: Hallucination detection is a challenging task for large language models (LLMs), and existing studies heavily rely on powerful closed-source LLMs such as GPT-4. In this paper, we propose an autonomous LLM-based agent framework, called HaluAgent, which enables relatively smaller LLMs (e.g. Baichuan2-Chat 7B) to actively select suitable tools for detecting multiple hallucination types such as text, c…

    Submitted 17 June, 2024; originally announced June 2024.

  33. arXiv:2406.10810  [pdf, other]

    cs.RO

    RGBlimp-Q: Robotic Gliding Blimp With Moving Mass Control Based on a Bird-Inspired Continuum Arm

    Authors: Hao Cheng, Feitian Zhang

    Abstract: Robotic blimps, as lighter-than-air aerial systems, offer prolonged duration and enhanced safety in human-robot interactions due to their buoyant lift. However, robust flight against environmental airflow disturbances remains a significant challenge, limiting the broader application of these robots. Drawing inspiration from the flight mechanics of birds and their ability to perch against natural w…

    Submitted 16 June, 2024; originally announced June 2024.

  34. arXiv:2406.10558  [pdf, other]

    cs.RO

    A Hybrid Controller Design for Human-Assistive Piloting of an Underactuated Blimp

    Authors: Wugang Meng, Tianfu Wu, Qiuyang Tao, Fumin Zhang

    Abstract: This paper introduces a novel solution to the manual control challenge for indoor blimps. The problem's complexity arises from the conflicting demands of executing human commands while maintaining stability through automatic control for underactuated robots. To tackle this challenge, we introduced an assisted piloting hybrid controller with a preemptive mechanism, that seamlessly switches between…

    Submitted 15 June, 2024; originally announced June 2024.

  35. arXiv:2406.09598  [pdf, other]

    cs.CV

    Introducing HOT3D: An Egocentric Dataset for 3D Hand and Object Tracking

    Authors: Prithviraj Banerjee, Sindi Shkodrani, Pierre Moulon, Shreyas Hampali, Fan Zhang, Jade Fountain, Edward Miller, Selen Basol, Richard Newcombe, Robert Wang, Jakob Julian Engel, Tomas Hodan

    Abstract: We introduce HOT3D, a publicly available dataset for egocentric hand and object tracking in 3D. The dataset offers over 833 minutes (more than 3.7M images) of multi-view RGB/monochrome image streams showing 19 subjects interacting with 33 diverse rigid objects, multi-modal signals such as eye gaze or scene point clouds, as well as comprehensive ground truth annotations including 3D poses of object…

    Submitted 13 June, 2024; originally announced June 2024.

  36. arXiv:2406.08997  [pdf, ps, other]

    cs.CV

    Adaptive Temporal Motion Guided Graph Convolution Network for Micro-expression Recognition

    Authors: Fengyuan Zhang, Zhaopei Huang, Xinjie Zhang, Qin Jin

    Abstract: Micro-expressions serve as essential cues for understanding individuals' genuine emotional states. Recognizing micro-expressions attracts increasing research attention due to its various applications in fields such as business negotiation and psychotherapy. However, the intricate and transient nature of micro-expressions poses a significant challenge to their accurate recognition. Most existing wo…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Accepted by ICME 2024

  37. arXiv:2406.08759  [pdf, other]

    cs.CV cs.MM

    Gaussian-Forest: Hierarchical-Hybrid 3D Gaussian Splatting for Compressed Scene Modeling

    Authors: Fengyi Zhang, Tianjun Zhang, Lin Zhang, Helen Huang, Yadan Luo

    Abstract: The field of novel-view synthesis has recently witnessed the emergence of 3D Gaussian Splatting, which represents scenes in a point-based manner and renders through rasterization. This methodology, in contrast to Radiance Fields that rely on ray tracing, demonstrates superior rendering quality and speed. However, the explicit and unstructured nature of 3D Gaussians poses a significant storage chal…

    Submitted 12 June, 2024; originally announced June 2024.

  38. arXiv:2406.08698  [pdf, other]

    astro-ph.HE hep-ph

    Constraints on Ultra Heavy Dark Matter Properties from Dwarf Spheroidal Galaxies with LHAASO Observations

    Authors: Zhen Cao, F. Aharonian, Q. An, Axikegu, Y. X. Bai, Y. W. Bao, D. Bastieri, X. J. Bi, Y. J. Bi, J. T. Cai, Q. Cao, W. Y. Cao, Zhe Cao, J. Chang, J. F. Chang, A. M. Chen, E. S. Chen, Liang Chen, Lin Chen, Long Chen, M. J. Chen, M. L. Chen, Q. H. Chen, S. H. Chen, S. Z. Chen, et al. (255 additional authors not shown)

    Abstract: In this work we search for signals generated by ultra-heavy dark matter in the Large High Altitude Air Shower Observatory (LHAASO) data. We look for possible gamma-ray emission from dark matter annihilation or decay in 16 dwarf spheroidal galaxies in the field of view of LHAASO. Dwarf spheroidal galaxies are among the most promising targets for indirect detection of dark matter which have low fluxes…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: 17 pages, 12 figures, accepted by PRL

  39. arXiv:2406.08090  [pdf, other]

    cs.CV

    From Sim-to-Real: Toward General Event-based Low-light Frame Interpolation with Per-scene Optimization

    Authors: Ziran Zhang, Yongrui Ma, Yueting Chen, Feng Zhang, Jinwei Gu, Tianfan Xue, Shi Guo

    Abstract: Video Frame Interpolation (VFI) is important for video enhancement, frame rate up-conversion, and slow-motion generation. The introduction of event cameras, which capture per-pixel brightness changes asynchronously, has significantly enhanced VFI capabilities, particularly for high-speed, nonlinear motions. However, these event-based methods encounter challenges in low-light conditions, notably tr…

    Submitted 12 June, 2024; originally announced June 2024.

  40. arXiv:2406.06951  [pdf, other]

    astro-ph.SR astro-ph.GA

    Determination method of binary fractions by the integrated spectrum

    Authors: F. Zhang, L. Li, Z. Han, X. Wang

    Abstract: We need to resolve the individual stars for binary fraction determinations of stellar systems. Therefore, it is not possible to obtain the binary fractions for dense or distant stellar systems. We proposed a method to determine the binary fraction of a dense or distant stellar system. The method is to first determine the binary fraction variation for any two adjacent regions and then add up thos…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: 12 pages, 6 figures, accepted by MNRAS

  41. arXiv:2406.06640  [pdf]

    physics.comp-ph eess.IV physics.optics

    A high-performance reconstruction method for partially coherent ptychography

    Authors: Wenhui Xu, Shoucong Ning, Pengju Sheng, Huixiang Lin, Angus I Kirkland, Yong Peng, Fucai Zhang

    Abstract: Ptychography is now integrated as a tool in mainstream microscopy allowing quantitative and high-resolution imaging capabilities over a wide field of view. However, its ultimate performance is inevitably limited by the available coherent flux when implemented using electrons or laboratory X-ray sources. We present a universal reconstruction algorithm with high tolerance to low coherence for both f…

    Submitted 9 June, 2024; originally announced June 2024.

  42. arXiv:2406.05357  [pdf, other]

    astro-ph.HE

    Classification of Fermi Gamma-Ray Bursts Based on Machine Learning

    Authors: Si-Yuan Zhu, Wan-Peng Sun, Da-Ling Ma, Fu-Wen Zhang

    Abstract: Gamma-ray bursts (GRBs) are typically classified into long and short GRBs based on their durations. However, there is a significant overlap in the duration distributions of these two categories. In this paper, we apply the unsupervised dimensionality reduction algorithms t-SNE and UMAP to classify 2061 Fermi GRBs based on four observed quantities: duration, peak energy, fluence, and peak…

    Submitted 8 June, 2024; originally announced June 2024.

    Comments: 11 pages, 5 figures, revised version submitted to MNRAS

    Report number: https://doi.org/10.1093/mnras/stae1594

    Journal ref: MNRAS, 2024, 532, 1434-1443

  43. arXiv:2406.03394  [pdf, other]

    cs.CV

    Gaussian Representation for Deformable Image Registration

    Authors: Jihe Li, Fabian Zhang, Xia Li, Tianhao Zhang, Ye Zhang, Joachim Buhmann

    Abstract: Deformable image registration (DIR) is a fundamental task in radiotherapy, with existing methods often struggling to balance computational efficiency, registration accuracy, and speed effectively. We introduce a novel DIR approach employing parametric 3D Gaussian control points achieving a better tradeoff. It provides an explicit and flexible representation for spatial deformation fields between 3…

    Submitted 5 June, 2024; originally announced June 2024.

  44. arXiv:2406.01007  [pdf, other]

    hep-ex

    Measurement of Electron Antineutrino Oscillation Amplitude and Frequency via Neutron Capture on Hydrogen at Daya Bay

    Authors: Daya Bay collaboration, F. P. An, W. D. Bai, A. B. Balantekin, M. Bishai, S. Blyth, G. F. Cao, J. Cao, J. F. Chang, Y. Chang, H. S. Chen, H. Y. Chen, S. M. Chen, Y. Chen, Y. X. Chen, Z. Y. Chen, J. Cheng, J. Cheng, Y. -C. Cheng, Z. K. Cheng, J. J. Cherwinka, M. C. Chu, J. P. Cummings, O. Dalager, F. S. Deng, et al. (177 additional authors not shown)

    Abstract: This Letter reports the first measurement of the oscillation amplitude and frequency of reactor antineutrinos at Daya Bay via neutron capture on hydrogen using 1958 days of data. With over 3.6 million signal candidates, an optimized candidate selection, improved treatment of backgrounds and efficiencies, refined energy calibration, and an energy response model for the capture-on-hydrogen sensitive…

    Submitted 3 June, 2024; originally announced June 2024.

  45. arXiv:2406.00947  [pdf, other]

    cs.CV

    Cross-Dimensional Medical Self-Supervised Representation Learning Based on a Pseudo-3D Transformation

    Authors: Fei Gao, Siwen Wang, Fandong Zhang, Hong-Yu Zhou, Yizhou Wang, Churan Wang, Gang Yu, Yizhou Yu

    Abstract: Medical image analysis suffers from a shortage of data, whether annotated or not. This becomes even more pronounced when it comes to 3D medical images. Self-Supervised Learning (SSL) can partially ease this situation by using unlabeled data. However, most existing SSL methods can only make use of data in a single dimensionality (e.g. 2D or 3D), and are incapable of enlarging the training dataset b…

    Submitted 4 July, 2024; v1 submitted 2 June, 2024; originally announced June 2024.

    Comments: MICCAI 2024 accept

  46. arXiv:2406.00707  [pdf, other]

    cs.RO

    QUADFormer: Learning-based Detection of Cyber Attacks in Quadrotor UAVs

    Authors: Pengyu Wang, Zhaohua Yang, Nachuan Yang, Zikai Wang, Jialu Li, Fan Zhang, Chaoqun Wang, Jiankun Wang, Max Q. -H. Meng, Ling Shi

    Abstract: Safety-critical intelligent cyber-physical systems, such as quadrotor unmanned aerial vehicles (UAVs), are vulnerable to different types of cyber attacks, and the absence of timely and accurate attack detection can lead to severe consequences. When UAVs are engaged in large outdoor maneuvering flights, their system constitutes highly nonlinear dynamics that include non-Gaussian noises. Therefore,…

    Submitted 14 June, 2024; v1 submitted 2 June, 2024; originally announced June 2024.

  47. arXiv:2406.00706  [pdf, other]

    cs.RO

    MINER-RRT*: A Hierarchical and Fast Trajectory Planning Framework in 3D Cluttered Environments

    Authors: Pengyu Wang, Jiawei Tang, Hin Wang Lin, Fan Zhang, Chaoqun Wang, Jiankun Wang, Ling Shi, Max Q. -H. Meng

    Abstract: Trajectory planning for quadrotors in cluttered environments has been challenging in recent years. While many trajectory planning frameworks have been successful, there still exists potential for improvements, particularly in enhancing the speed of generating efficient trajectories. In this paper, we present a novel hierarchical trajectory planning framework to reduce computational time and memory…

    Submitted 14 June, 2024; v1 submitted 2 June, 2024; originally announced June 2024.

  48. arXiv:2406.00312  [pdf, other]

    cs.RO

    NuRF: Nudging the Particle Filter in Radiance Fields for Robot Visual Localization

    Authors: Wugang Meng, Tianfu Wu, Huan Yin, Fumin Zhang

    Abstract: Can we localize a robot in radiance fields only using monocular vision? This study presents NuRF, a nudged particle filter framework for 6-DoF robot visual localization in radiance fields. NuRF sets anchors in SE(3) to leverage visual place recognition, which provides image comparisons to guide the sampling process. This guidance could improve the convergence and robustness of particle filters for…

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: 11 pages, 14 figures

  49. arXiv:2406.00212  [pdf, other]

    eess.IV cs.CV

    MVAD: A Multiple Visual Artifact Detector for Video Streaming

    Authors: Chen Feng, Duolikun Danier, Fan Zhang, David Bull

    Abstract: Visual artifacts are often introduced into streamed video content, due to prevailing conditions during content production and/or delivery. Since these can degrade the quality of the user's experience, it is important to automatically and accurately detect them in order to enable effective quality measurement and enhancement. Existing detection methods often focus on a single type of artifact and/o…

    Submitted 31 May, 2024; originally announced June 2024.

    Comments: 9 pages

  50. arXiv:2405.19883  [pdf, other]

    cs.LG cs.AI cs.CL

    From Words to Actions: Unveiling the Theoretical Underpinnings of LLM-Driven Autonomous Systems

    Authors: Jianliang He, Siyu Chen, Fengzhuo Zhang, Zhuoran Yang

    Abstract: In this work, from a theoretical lens, we aim to understand why large language model (LLM) empowered agents are able to solve decision-making problems in the physical world. To this end, consider a hierarchical reinforcement learning (RL) model where the LLM Planner and the Actor perform high-level task planning and low-level execution, respectively. Under this model, the LLM Planner navigates a p…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: Accepted by ICML 2024