
Showing 1–50 of 1,447 results for author: Yu, W

  1. arXiv:2407.10943  [pdf, other]

    cs.RO cs.CV

    GRUtopia: Dream General Robots in a City at Scale

    Authors: Hanqing Wang, Jiahe Chen, Wensi Huang, Qingwei Ben, Tai Wang, Boyu Mi, Tao Huang, Siheng Zhao, Yilun Chen, Sizhe Yang, Peizhou Cao, Wenye Yu, Zichao Ye, Jialun Li, Junfeng Long, Zirui Wang, Huiling Wang, Ying Zhao, Zhongying Tu, Yu Qiao, Dahua Lin, Jiangmiao Pang

    Abstract: Recent works have been exploring the scaling laws in the field of Embodied AI. Given the prohibitive costs of collecting real-world data, we believe the Simulation-to-Real (Sim2Real) paradigm is a crucial step for scaling the learning of embodied models. This paper introduces project GRUtopia, the first simulated interactive 3D society designed for various robots. It features several advancements:…

    Submitted 15 July, 2024; originally announced July 2024.

  2. arXiv:2407.10701  [pdf, other]

    cs.CL

    DOCBENCH: A Benchmark for Evaluating LLM-based Document Reading Systems

    Authors: Anni Zou, Wenhao Yu, Hongming Zhang, Kaixin Ma, Deng Cai, Zhuosheng Zhang, Hai Zhao, Dong Yu

    Abstract: Recently, there has been a growing interest among large language model (LLM) developers in LLM-based document reading systems, which enable users to upload their own documents and pose questions related to the document contents, going beyond simple reading comprehension tasks. Consequently, these systems have been carefully designed to tackle challenges such as file parsing, metadata extraction, m…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Work in progress

  3. Towards Robust Recommendation via Decision Boundary-aware Graph Contrastive Learning

    Authors: Jiakai Tang, Sunhao Dai, Zexu Sun, Xu Chen, Jun Xu, Wenhui Yu, Lantao Hu, Peng Jiang, Han Li

    Abstract: In recent years, graph contrastive learning (GCL) has received increasing attention in recommender systems due to its effectiveness in reducing bias caused by data sparsity. However, most existing GCL models rely on heuristic approaches and usually assume entity independence when constructing contrastive views. We argue that these methods struggle to strike a balance between semantic invariance an…

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: KDD 2024

  4. arXiv:2407.09932  [pdf, other]

    quant-ph

    Quantum Clock Synchronization Network with Silicon-chip Dual-Pumped Entangled Photon Source

    Authors: J. A. Li, H. Han, X. P. Huang, B. Y. Tang, K. Guo, J. Q. Huang, S. Y. Xiong, W. R. Yu, Z. J. Zhang, J. B. Yang, B. Liu, H. Chen, Z. K. Lu

    Abstract: In this paper, we propose a quantum clock synchronization (QCS) network scheme with silicon-chip dual-pumped entangled photon source. This scheme couples two pump beams into the silicon-based waveguide, where degenerate and non-degenerate spontaneous four-wave mixing (SFWM) occurs, generating entanglement between one signal channel and three idler channels. The entangled photons are distributed to…

    Submitted 13 July, 2024; originally announced July 2024.

  5. arXiv:2407.09324  [pdf, other]

    cs.LG cs.AI cs.IT

    Provable Privacy Advantages of Decentralized Federated Learning via Distributed Optimization

    Authors: Wenrui Yu, Qiongxiu Li, Milan Lopuhaä-Zwakenberg, Mads Græsbøll Christensen, Richard Heusdens

    Abstract: Federated learning (FL) emerged as a paradigm designed to improve data privacy by enabling data to reside at its source, thus embedding privacy as a core consideration in FL architectures, whether centralized or decentralized. Contrasting with recent findings by Pasquini et al., which suggest that decentralized FL does not empirically offer any additional privacy or security benefits over centrali…

    Submitted 12 July, 2024; originally announced July 2024.

  6. arXiv:2407.09013  [pdf, ps, other]

    cs.AI cs.LG

    Procedural Content Generation via Generative Artificial Intelligence

    Authors: Xinyu Mao, Wanli Yu, Kazunori D Yamada, Michael R. Zielewski

    Abstract: The attempt to utilize machine learning in PCG has been made in the past. In this survey paper, we investigate how generative artificial intelligence (AI), which saw a significant increase in interest in the mid-2010s, is being used for PCG. We review applications of generative AI for the creation of various types of content, including terrains, items, and even storylines. While generative AI is e…

    Submitted 12 July, 2024; originally announced July 2024.

  7. PAIL: Performance based Adversarial Imitation Learning Engine for Carbon Neutral Optimization

    Authors: Yuyang Ye, Lu-An Tang, Haoyu Wang, Runlong Yu, Wenchao Yu, Erhu He, Haifeng Chen, Hui Xiong

    Abstract: Achieving carbon neutrality within industrial operations has become increasingly imperative for sustainable development. It is both a significant challenge and a key opportunity for operational optimization in industry 4.0. In recent years, Deep Reinforcement Learning (DRL) based methods offer promising enhancements for sequential optimization processes and can be used for reducing carbon emission…

    Submitted 11 July, 2024; originally announced July 2024.

  8. arXiv:2407.08421  [pdf, other]

    astro-ph.HE

    X-ray spectral and timing evolution during the 2018 outburst of MAXI J1820+070

    Authors: YaXing Li, Zhen Yan, ChenXu Gao, Wenfei Yu

    Abstract: We made use of high-cadence observations from the Insight-HXMT and NICER to scrutinize the spectral and timing evolution during the 2018 outburst of the black hole X-ray binary (BHXRB) MAXI J1820+070. Its hardness-intensity diagram (HID) displays a "q"-like track including all the spectral states, along a unique loop in the hard state. The tracks observed in the HID are anticipated in the evolu…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: 14 pages, 10 figures, submitted to MNRAS

  9. arXiv:2407.07775  [pdf, other]

    cs.RO cs.AI

    Mobility VLA: Multimodal Instruction Navigation with Long-Context VLMs and Topological Graphs

    Authors: Hao-Tien Lewis Chiang, Zhuo Xu, Zipeng Fu, Mithun George Jacob, Tingnan Zhang, Tsang-Wei Edward Lee, Wenhao Yu, Connor Schenck, David Rendleman, Dhruv Shah, Fei Xia, Jasmine Hsu, Jonathan Hoech, Pete Florence, Sean Kirmani, Sumeet Singh, Vikas Sindhwani, Carolina Parada, Chelsea Finn, Peng Xu, Sergey Levine, Jie Tan

    Abstract: An elusive goal in navigation research is to build an intelligent agent that can understand multimodal instructions including natural language and image, and perform useful navigation. To achieve this, we study a widely useful category of navigation tasks we call Multimodal Instruction Navigation with demonstration Tours (MINT), in which the environment prior is provided through a previously recor…

    Submitted 12 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

  10. arXiv:2407.07304  [pdf, other]

    cs.AI

    Inference Performance Optimization for Large Language Models on CPUs

    Authors: Pujiang He, Shan Zhou, Wenhuan Huang, Changqing Li, Duyi Wang, Bin Guo, Chen Meng, Sheng Gui, Weifei Yu, Yi Xie

    Abstract: Large language models (LLMs) have shown exceptional performance and vast potential across diverse tasks. However, the deployment of LLMs with high performance in low-resource environments has garnered significant attention in the industry. When GPU hardware resources are limited, we can explore alternative options on CPUs. To mitigate the financial burden and alleviate constraints imposed by hardw…

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: 5 pages, 6 figures, ICML 2024 on Foundation Models in the Wild

  11. arXiv:2407.06222  [pdf, other]

    math.LO

    Formalization of the Filter Extension Principle (FEP) in Coq

    Authors: Guowei Dou, Wensheng Yu

    Abstract: The Filter Extension Principle (FEP) asserts that every filter can be extended to an ultrafilter, which plays a crucial role in the quest for non-principal ultrafilters. Non-principal ultrafilters find widespread applications in logic, set theory, topology, model theory, and especially non-standard extensions of algebraic structures. Since non-principal ultrafilters are challenging to construct di…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: Conference on Intelligent Networked Things, 2024 (CINT2024)

  12. arXiv:2407.05540  [pdf, other]

    cs.CV

    GTP-4o: Modality-prompted Heterogeneous Graph Learning for Omni-modal Biomedical Representation

    Authors: Chenxin Li, Xinyu Liu, Cheng Wang, Yifan Liu, Weihao Yu, Jing Shao, Yixuan Yuan

    Abstract: Recent advances in learning multi-modal representation have witnessed the success in biomedical domains. While established techniques enable handling multi-modal information, the challenges are posed when extended to various clinical modalities and practical modality-missing setting due to the inherent modality gaps. To tackle these, we propose an innovative Modality-prompted Heterogeneous Graph fo…

    Submitted 7 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV2024

  13. arXiv:2407.05413  [pdf, other]

    cs.AI cs.CL cs.LG

    SBoRA: Low-Rank Adaptation with Regional Weight Updates

    Authors: Lai-Man Po, Yuyang Liu, Haoxuan Wu, Tianqi Zhang, Wing-Yin Yu, Zeyu Jiang, Kun Li

    Abstract: This paper introduces Standard Basis LoRA (SBoRA), a novel parameter-efficient fine-tuning approach for Large Language Models that builds upon the pioneering works of Low-Rank Adaptation (LoRA) and Orthogonal Adaptation. SBoRA further reduces the computational and memory requirements of LoRA while enhancing learning performance. By leveraging orthogonal standard basis vectors to initialize one of…

    Submitted 10 July, 2024; v1 submitted 7 July, 2024; originally announced July 2024.

    Comments: 15 pages, 2 figures

  14. arXiv:2407.05236  [pdf, other]

    astro-ph.HE

    A timing view of the additional high-energy spectral component discovered in the black hole candidate Swift J1727.8-1613

    Authors: Zi-Xu Yang, Liang Zhang, Shuang-Nan Zhang, L. Tao, Shu Zhang, Ruican Ma, Qingcui Bu, Yue Huang, He-Xin Liu, Wei Yu, Guang C. Xiao, Peng-Ju Wang, Hua Feng, Li-Ming Song, Xiang Ma, Mingyu Ge, QingChang Zhao, J. L. Qu

    Abstract: We present an energy-dependent analysis for the type-C quasi-periodic oscillations (QPOs) observed in the black hole X-ray binary Swift J1727.8-1613 using Insight-HXMT observations. We find that the QPO fractional rms at energies above 40 keV is significantly higher than that below 20 keV. This is the first report of a high energy (HE)-rms excess in the rms spectrum of a black hole X-ray binary. I…

    Submitted 6 July, 2024; originally announced July 2024.

  15. arXiv:2407.03971  [pdf, other]

    cs.CV

    MineNetCD: A Benchmark for Global Mining Change Detection on Remote Sensing Imagery

    Authors: Weikang Yu, Xiaokang Zhang, Xiao Xiang Zhu, Richard Gloaguen, Pedram Ghamisi

    Abstract: Monitoring changes triggered by mining activities is crucial for industrial controlling, environmental management and regulatory compliance, yet it poses significant challenges due to the vast and often remote locations of mining sites. Remote sensing technologies have increasingly become indispensable to detect and analyze these changes over time. We thus introduce MineNetCD, a comprehensive benc…

    Submitted 4 July, 2024; originally announced July 2024.

  16. arXiv:2407.02190  [pdf, other]

    cs.RO

    I2EKF-LO: A Dual-Iteration Extended Kalman Filter Based LiDAR Odometry

    Authors: Wenlu Yu, Jie Xu, Chengwei Zhao, Lijun Zhao, Thien-Minh Nguyen, Shenghai Yuan, Mingming Bai, Lihua Xie

    Abstract: LiDAR odometry is a pivotal technology in the fields of autonomous driving and autonomous mobile robotics. However, most of the current works focus on nonlinear optimization methods, and many challenges still exist in using the traditional Iterative Extended Kalman Filter (IEKF) framework to tackle the problem: IEKF only iterates over the observation equation, relying on a rough estimate of the…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: Accepted by IROS 2024

  17. arXiv:2407.01950  [pdf, other]

    cs.RO cs.AI

    LDP: A Local Diffusion Planner for Efficient Robot Navigation and Collision Avoidance

    Authors: Wenhao Yu, Jie Peng, Huanyu Yang, Junrui Zhang, Yifan Duan, Jianmin Ji, Yanyong Zhang

    Abstract: The conditional diffusion model has been demonstrated as an efficient tool for learning robot policies, owing to its advancement to accurately model the conditional distribution of policies. The intricate nature of real-world scenarios, characterized by dynamic obstacles and maze-like structures, underscores the complexity of robot local navigation decision-making as a conditional distribution pro…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: 8 pages, 6 figures, accepted by IROS 2024

  18. arXiv:2407.01875  [pdf, ps, other]

    cs.AI

    Spatio-Temporal Graphical Counterfactuals: An Overview

    Authors: Mingyu Kang, Duxin Chen, Ziyuan Pu, Jianxi Gao, Wenwu Yu

    Abstract: Counterfactual thinking is a critical yet challenging topic for artificial intelligence to learn knowledge from data and ultimately improve their performances for new scenarios. Many research works, including Potential Outcome Model and Structural Causal Model, have been proposed to realize it. However, their modelings, theoretical foundations and application approaches are usually different. More…

    Submitted 1 July, 2024; originally announced July 2024.

  19. arXiv:2407.01029  [pdf, other]

    cs.CV

    EndoSparse: Real-Time Sparse View Synthesis of Endoscopic Scenes using Gaussian Splatting

    Authors: Chenxin Li, Brandon Y. Feng, Yifan Liu, Hengyu Liu, Cheng Wang, Weihao Yu, Yixuan Yuan

    Abstract: 3D reconstruction of biological tissues from a collection of endoscopic images is a key to unlock various important downstream surgical applications with 3D capabilities. Existing methods employ various advanced neural rendering techniques for photorealistic view synthesis, but they often struggle to recover accurate 3D representations when only sparse observations are available, which is usually…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Accepted by MICCAI2024

  20. arXiv:2407.00946  [pdf]

    cond-mat.mtrl-sci

    Atomic cluster expansion interatomic potential for defects and thermodynamics of Cu-W system

    Authors: Jiahao Pan, Huiqun Cheng, Gaosheng Yan, Lei Zhang, Wenshan Yu, Shengping Shen

    Abstract: The unique properties exhibited in immiscible metals, such as excellent strength, hardness, and radiation-damage tolerance, have stimulated the interest of many researchers. As a typical immiscible metal system, the Cu-W nano-multilayers combine the plasticity of copper and the strength of tungsten, making it a suitable candidate for applications in aerospace, nuclear fusion engineering, and elect…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: 26 pages, 14 figures

  21. arXiv:2407.00029  [pdf, other]

    cs.DC

    Distributed Inference Performance Optimization for LLMs on CPUs

    Authors: Pujiang He, Shan Zhou, Changqing Li, Wenhuan Huang, Weifei Yu, Duyi Wang, Chen Meng, Sheng Gui

    Abstract: Large language models (LLMs) hold tremendous potential for addressing numerous real-world challenges, yet they typically demand significant computational resources and memory. Deploying LLMs onto a resource-limited hardware device with restricted memory capacity presents considerable challenges. Distributed computing emerges as a prevalent strategy to mitigate single-node memory constraints and ex…

    Submitted 16 May, 2024; originally announced July 2024.

    Comments: 4 pages, 3 figures, Practical ML for Low Resource Settings Workshop @ ICLR 2024

  22. arXiv:2406.20019  [pdf, other]

    cs.IT

    Capacity Bounds for Broadcast Channels with Bidirectional Conferencing Decoders

    Authors: Reza K. Farsani, Wei Yu

    Abstract: The two-user broadcast channel (BC) with receivers connected by cooperative links of given capacities, known as conferencing decoders, is considered. A novel outer bound on the capacity region is established. This outer bound is derived using multiple applications of the Csiszár-Körner identity. New achievable rate regions are also presented. A first achievable rate region is derived by applying M…

    Submitted 28 June, 2024; originally announced June 2024.

  23. arXiv:2406.19820  [pdf, other]

    cs.CL cs.AI

    BeamAggR: Beam Aggregation Reasoning over Multi-source Knowledge for Multi-hop Question Answering

    Authors: Zheng Chu, Jingchang Chen, Qianglong Chen, Haotian Wang, Kun Zhu, Xiyuan Du, Weijiang Yu, Ming Liu, Bing Qin

    Abstract: Large language models (LLMs) have demonstrated strong reasoning capabilities. Nevertheless, they still suffer from factual errors when tackling knowledge-intensive tasks. Retrieval-augmented reasoning represents a promising approach. However, significant challenges still persist, including inaccurate and insufficient retrieval for complex questions, as well as difficulty in integrating multi-sourc…

    Submitted 28 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL 2024

  24. arXiv:2406.19627  [pdf]

    eess.SY

    Practical Power System Inertia Monitoring Based on Pumped Storage Hydropower Operation Signature

    Authors: Hongyu Li, Chang Chen, Mark Baldwin, Shutang You, Wenpeng Yu, Lin Zhu, Yilu Liu

    Abstract: This paper proposes a practical method to monitor power system inertia using Pumped Storage Hydropower (PSH) switching-off events. This approach offers real-time system-level inertia estimation with minimal expenses, no disruption, and the inclusion of behind-the-meter inertia. First, accurate inertia estimation is achieved through improved RoCoF calculation that accounts for pre-event RoCoF, redu…

    Submitted 1 July, 2024; v1 submitted 27 June, 2024; originally announced June 2024.

    Comments: 8 pages, 15 figures

  25. arXiv:2406.19483  [pdf, ps, other]

    eess.SP

    Localization in Multipath Environments via Active Sensing with Reconfigurable Intelligent Surfaces

    Authors: Yinghan Li, Wei Yu

    Abstract: This letter investigates an uplink pilot-based wireless indoor localization problem in a multipath environment for a single-input single-output (SISO) narrowband communication system aided by reconfigurable intelligent surface (RIS). The indoor localization problem is challenging because the uplink channel consists of multiple overlapping propagation paths with varying amplitudes and phases, which…

    Submitted 8 July, 2024; v1 submitted 27 June, 2024; originally announced June 2024.

  26. arXiv:2406.18361  [pdf, other]

    cs.CV cs.AI eess.IV

    Stable Diffusion Segmentation for Biomedical Images with Single-step Reverse Process

    Authors: Tianyu Lin, Zhiguang Chen, Zhonghao Yan, Weijiang Yu, Fudan Zheng

    Abstract: Diffusion models have demonstrated their effectiveness across various generative tasks. However, when applied to medical image segmentation, these models encounter several challenges, including significant resource and time requirements. They also necessitate a multi-step reverse process and multiple samples to produce reliable predictions. To address these challenges, we introduce the first laten…

    Submitted 9 July, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

    Comments: Accepted at MICCAI 2024. Code and citation info: https://github.com/lin-tianyu/Stable-Diffusion-Seg

  27. arXiv:2406.18008  [pdf, other]

    cs.IT

    Rate-Distortion-Perception Tradeoff for Gaussian Vector Sources

    Authors: Jingjing Qian, Sadaf Salehkalaibar, Jun Chen, Ashish Khisti, Wei Yu, Wuxian Shi, Yiqun Ge, Wen Tong

    Abstract: This paper studies the rate-distortion-perception (RDP) tradeoff for a Gaussian vector source coding problem where the goal is to compress the multi-component source subject to distortion and perception constraints. The purpose of imposing a perception constraint is to ensure visually pleasing reconstructions. This paper studies this RDP setting with either the Kullback-Leibler (KL) divergence or…

    Submitted 25 June, 2024; originally announced June 2024.

  28. arXiv:2406.17269  [pdf, other]

    hep-th

    Elko as an inflaton candidate

    Authors: Xinglong Chen, Cheng-Yang Lee, Yanjiao Ma, Haomin Rao, Wenqi Yu, Siyi Zhou

    Abstract: Elko is a spin-half fermion with a two-fold Wigner degeneracy and Klein-Gordon dynamics. In this paper, we show that in a spatially flat FLRW space-time, slow-roll inflation can be initiated by the homogeneous Elko fields. The inflaton is a composite scalar field obtained by contracting the spinor field with its dual. This is possible because the background evolution as described by the Friedmann…

    Submitted 29 June, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

    Comments: 15 pages, 8 figures

  29. arXiv:2406.15877  [pdf, other]

    cs.SE cs.AI cs.CL

    BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions

    Authors: Terry Yue Zhuo, Minh Chien Vu, Jenny Chim, Han Hu, Wenhao Yu, Ratnadira Widyasari, Imam Nur Bani Yusuf, Haolan Zhan, Junda He, Indraneil Paul, Simon Brunner, Chen Gong, Thong Hoang, Armel Randy Zebaze, Xiaoheng Hong, Wen-Ding Li, Jean Kaddour, Ming Xu, Zhihan Zhang, Prateek Yadav, Naman Jain, Alex Gu, Zhoujun Cheng, Jiawei Liu, Qian Liu, et al. (8 additional authors not shown)

    Abstract: Automated software engineering has been greatly empowered by the recent advances in Large Language Models (LLMs) for programming. While current benchmarks have shown that LLMs can perform various software engineering tasks like human developers, the majority of their evaluations are limited to short and self-contained algorithmic tasks. Solving challenging and practical programming tasks requires…

    Submitted 26 June, 2024; v1 submitted 22 June, 2024; originally announced June 2024.

    Comments: 44 pages, 14 figures, 7 tables, built with love by the BigCode community :)

  30. arXiv:2406.15704  [pdf, other]

    cs.CV

    video-SALMONN: Speech-Enhanced Audio-Visual Large Language Models

    Authors: Guangzhi Sun, Wenyi Yu, Changli Tang, Xianzhao Chen, Tian Tan, Wei Li, Lu Lu, Zejun Ma, Yuxuan Wang, Chao Zhang

    Abstract: Speech understanding as an element of the more generic video understanding using audio-visual large language models (av-LLMs) is a crucial yet understudied aspect. This paper proposes video-SALMONN, a single end-to-end av-LLM for video processing, which can understand not only visual frame sequences, audio events and music, but speech as well. To obtain fine-grained temporal information required b…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: Accepted at ICML 2024. arXiv admin note: substantial text overlap with arXiv:2310.05863

  31. arXiv:2406.12064  [pdf, other]

    q-bio.GN

    skandiver: a divergence-based analysis tool for identifying intercellular mobile genetic elements

    Authors: Xiaolei Brian Zhang, Grace Oualline, Jim Shaw, Yun William Yu

    Abstract: Mobile genetic elements (MGEs) are as ubiquitous in nature as they are varied in type, ranging from viral insertions to transposons to incorporated plasmids. Horizontal transfer of MGEs across bacterial species may also pose a significant threat to global health due to their capability to harbour antibiotic resistance genes. However, despite cheap and rapid whole genome sequencing, the varied natu…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 9 pages, 6 figures

  32. arXiv:2406.12050  [pdf, other]

    cs.CL

    Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning

    Authors: Zhihan Zhang, Zhenwen Liang, Wenhao Yu, Dian Yu, Mengzhao Jia, Dong Yu, Meng Jiang

    Abstract: Supervised fine-tuning enhances the problem-solving abilities of language models across various mathematical reasoning tasks. To maximize such benefits, existing research focuses on broadening the training set with various data augmentation techniques, which is effective for standard single-round question-answering settings. Our work introduces a novel technique aimed at cultivating a deeper under…

    Submitted 17 June, 2024; originally announced June 2024.

  33. arXiv:2406.11551  [pdf, other]

    cs.CV

    Simple Yet Efficient: Towards Self-Supervised FG-SBIR with Unified Sample Feature Alignment

    Authors: Jianan Jiang, Di Wu, Zhilin Jiang, Weiren Yu

    Abstract: Fine-Grained Sketch-Based Image Retrieval (FG-SBIR) aims to minimize the distance between sketches and corresponding images in the embedding space. However, scalability is hindered by the growing complexity of solutions, mainly due to the abstract nature of fine-grained sketches. In this paper, we propose a simple yet efficient approach to narrow the gap between the two modes. It mainly facilitate…

    Submitted 22 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: 10 pages, 8 figures, 4 tables

  34. arXiv:2406.11507  [pdf, other]

    cs.CV

    Prior Normality Prompt Transformer for Multi-class Industrial Image Anomaly Detection

    Authors: Haiming Yao, Yunkang Cao, Wei Luo, Weihang Zhang, Wenyong Yu, Weiming Shen

    Abstract: Image anomaly detection plays a pivotal role in industrial inspection. Traditional approaches often demand distinct models for specific categories, resulting in substantial deployment costs. This raises concerns about multi-class anomaly detection, where a unified model is developed for multiple classes. However, applying conventional methods, particularly reconstruction-based models, directly to…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Accepted by IEEE Transactions on Industrial Informatics

  35. arXiv:2406.10780  [pdf, ps, other]

    math.CO

    Colouring negative exact-distance graphs of signed graphs

    Authors: Reza Naserasr, Patrice Ossona de Mendez, Daniel A. Quiroz, Robert Šámal, Weiqiang Yu

    Abstract: The $k$-th exact-distance graph of a graph $G$ has $V(G)$ as its vertex set, and $xy$ as an edge if and only if the distance between $x$ and $y$ is (exactly) $k$ in $G$. We consider two possible extensions of this notion for signed graphs. Finding the chromatic number of a negative exact-distance square of a signed graph is a weakening of the problem of finding the smallest target graph to which…

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: 16 pages, 2 figures, 3 tables

    MSC Class: 05C10; 05C12; 05C15; 05C22; 05C60

  36. arXiv:2406.10744  [pdf, other]

    cs.CV

    Technique Report of CVPR 2024 PBDL Challenges

    Authors: Ying Fu, Yu Li, Shaodi You, Boxin Shi, Linwei Chen, Yunhao Zou, Zichun Wang, Yichen Li, Yuze Han, Yingkai Zhang, Jianan Wang, Qinglin Liu, Wei Yu, Xiaoqian Lv, Jianing Li, Shengping Zhang, Xiangyang Ji, Yuanpei Chen, Yuhan Zhang, Weihang Peng, Liwen Zhang, Zhe Xu, Dingyong Gou, Cong Li, Senyan Xu, et al. (75 additional authors not shown)

    Abstract: The intersection of physics-based vision and deep learning presents an exciting frontier for advancing computer vision technologies. By leveraging the principles of physics to inform and enhance deep learning models, we can develop more robust and accurate vision systems. Physics-based vision aims to invert the processes to recover scene properties such as shape, reflectance, light distribution, a…

    Submitted 12 July, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

    Comments: CVPR 2024 PBDL Challenges: https://pbdl-ws.github.io/pbdl2024/challenge/index.html

  37. arXiv:2406.10583  [pdf, other]

    hep-ex

    Demonstration of neutron identification in neutrino interactions in the MicroBooNE liquid argon time projection chamber

    Authors: MicroBooNE collaboration, P. Abratenko, O. Alterkait, D. Andrade Aldana, L. Arellano, J. Asaadi, A. Ashkenazi, S. Balasubramanian, B. Baller, A. Barnard, G. Barr, D. Barrow, J. Barrow, V. Basque, J. Bateman, O. Benevides Rodrigues, S. Berkman, A. Bhanderi, A. Bhat, M. Bhattacharya, M. Bishai, A. Blake, B. Bogart, T. Bolton, J. Y. Book, et al. (165 additional authors not shown)

    Abstract: A significant challenge in measurements of neutrino oscillations is reconstructing the incoming neutrino energies. While modern fully-active tracking calorimeters such as liquid argon time projection chambers in principle allow the measurement of all final state particles above some detection threshold, undetected neutrons remain a considerable source of missing energy with little to no data const…

    Submitted 15 June, 2024; originally announced June 2024.

    Report number: FERMILAB-PUB-24-0301

  38. arXiv:2406.10222  [pdf, other]

    astro-ph.IM

    Ultra-low noise laser and optical frequency comb-based timing system for the Black Hole Explorer (BHEX) mission

    Authors: Hannah Tomio, Guangning Yang, Holly F. Leopardi, Kenji Numata, Anthony W. Yu, Andrew Attar, Xiaozhen Xu, Wei Lu, Cheryl Gramling, T. K. Sridharan, Peter Kurczynski

    Abstract: In this effort, we demonstrate the performance of a highly stable time reference for the proposed Black Hole Explorer (BHEX) mission, a space-based extension to the Event Horizon Telescope (EHT) Very Long Baseline Interferometry (VLBI) project. This precision timing system is based on the use of a space-qualified, ultra-low noise laser developed as part of the Laser Interferometer Space Antenna (L…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: To be published in the proceedings of SPIE Astronomical Telescopes + Instrumentation 2024

  39. arXiv:2406.10123  [pdf, other]

    hep-ex physics.ins-det

    Improving neutrino energy estimation of charged-current interaction events with recurrent neural networks in MicroBooNE

    Authors: MicroBooNE collaboration, P. Abratenko, O. Alterkait, D. Andrade Aldana, L. Arellano, J. Asaadi, A. Ashkenazi, S. Balasubramanian, B. Baller, A. Barnard, G. Barr, D. Barrow, J. Barrow, V. Basque, J. Bateman, O. Benevides Rodrigues, S. Berkman, A. Bhanderi, A. Bhat, M. Bhattacharya, M. Bishai, A. Blake, B. Bogart, T. Bolton, J. Y. Book, et al. (164 additional authors not shown)

    Abstract: We present a deep learning-based method for estimating the neutrino energy of charged-current neutrino-argon interactions. We employ a recurrent neural network (RNN) architecture for neutrino energy estimation in the MicroBooNE experiment, utilizing liquid argon time projection chamber (LArTPC) detector technology. Traditional energy estimation approaches in LArTPCs, which largely rely on reconstr…

    Submitted 14 June, 2024; originally announced June 2024.

    Report number: FERMILAB-PUB-24-0287

  40. arXiv:2406.09747  [pdf, ps, other]

    quant-ph physics.atom-ph

    Hybrid atom-photon entangling gates via Gaussian soft control

    Authors: Wanrang Yu, Qiuyu Yin, Yanzhao Liang, Ning Ji, Thibault Vogt

    Abstract: Hybrid atom-photon gates play an important role for the realization of a quantum interface capable of mapping atomic states to photons for communication across quantum networks. Here, we propose a feasible theoretical scheme for implementing a hybrid atom-photon controlled-Z gate between an atom and a microwave photon in a superconducting coplanar waveguide resonator based on the Gaussian soft con…

    Submitted 14 June, 2024; originally announced June 2024.

  41. arXiv:2406.09742 [pdf, other]

    cs.IR

    IFA: Interaction Fidelity Attention for Entire Lifelong Behaviour Sequence Modeling

    Authors: Wenhui Yu, Chao Feng, Yanze Zhang, Lantao Hu, Peng Jiang, Han Li

    Abstract: The lifelong user behavior sequence provides abundant information about user preferences and yields impressive improvements in recommendation tasks, but it significantly increases computational cost. To meet the strict latency requirements of online services, a short sub-sequence is sampled based on similarity to the target item. Unfortunately, items not in the sub-sequence are discarded, leadi…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: 7 pages, 2 figures

  42. arXiv:2406.09295 [pdf, other]

    cs.CL cs.CV

    AlignMMBench: Evaluating Chinese Multimodal Alignment in Large Vision-Language Models

    Authors: Yuhang Wu, Wenmeng Yu, Yean Cheng, Yan Wang, Xiaohan Zhang, Jiazheng Xu, Ming Ding, Yuxiao Dong

    Abstract: Evaluating the alignment capabilities of large Vision-Language Models (VLMs) is essential for determining their effectiveness as helpful assistants. However, existing benchmarks primarily focus on basic abilities using nonverbal methods, such as yes-no and multiple-choice questions. In this paper, we address this gap by introducing AlignMMBench, a comprehensive alignment benchmark specifically des…

    Submitted 13 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

  43. arXiv:2406.09166 [pdf, other]

    cs.CV cs.AI

    Fine-Grained Domain Generalization with Feature Structuralization

    Authors: Wenlong Yu, Dongyue Chen, Qilong Wang, Qinghua Hu

    Abstract: Fine-grained domain generalization (FGDG) is more challenging than traditional DG tasks because of its small inter-class variations and relatively large intra-class disparities. When the domain distribution shifts, the vulnerability of subtle features leads to a severe deterioration in model performance. Nevertheless, humans inherently demonstrate the capacity for generalizing to out-of-distributi…

    Submitted 17 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

  44. arXiv:2406.07914 [pdf, other]

    cs.SD eess.AS

    Can Large Language Models Understand Spatial Audio?

    Authors: Changli Tang, Wenyi Yu, Guangzhi Sun, Xianzhao Chen, Tian Tan, Wei Li, Jun Zhang, Lu Lu, Zejun Ma, Yuxuan Wang, Chao Zhang

    Abstract: This paper explores enabling large language models (LLMs) to understand spatial information from multichannel audio, a skill currently lacking in auditory LLMs. By leveraging LLMs' advanced cognitive and inferential abilities, the aim is to enhance understanding of 3D environments via audio. We study 3 spatial audio tasks: sound source localization (SSL), far-field speech recognition (FSR), and lo…

    Submitted 14 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted at Interspeech 2024

  45. arXiv:2406.07333 [pdf, other]

    cs.CV

    Global-Regularized Neighborhood Regression for Efficient Zero-Shot Texture Anomaly Detection

    Authors: Haiming Yao, Wei Luo, Yunkang Cao, Yiheng Zhang, Wenyong Yu, Weiming Shen

    Abstract: Texture surface anomaly detection has widespread applications in industrial settings. However, existing methods often require collecting numerous samples for model training. Moreover, they predominantly operate within a closed-set detection framework, limiting their ability to identify anomalies beyond the training dataset. To tackle these challenges, this paper introduces a novel zero-shot te…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Submitted to IEEE Transactions on Systems, Man, and Cybernetics: Systems

  46. arXiv:2406.06420 [pdf, other]

    cs.LG

    An Improved Empirical Fisher Approximation for Natural Gradient Descent

    Authors: Xiaodong Wu, Wenyi Yu, Chao Zhang, Philip Woodland

    Abstract: Approximate Natural Gradient Descent (NGD) methods are an important family of optimisers for deep learning models, which use approximate Fisher information matrices to pre-condition gradients during training. The empirical Fisher (EF) method approximates the Fisher information matrix empirically by reusing the per-sample gradients collected during back-propagation. Despite its ease of implementati…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 33 pages, 11 figures, 7 tables

  47. arXiv:2406.05491 [pdf, other]

    cs.CV cs.CR

    One Perturbation is Enough: On Generating Universal Adversarial Perturbations against Vision-Language Pre-training Models

    Authors: Hao Fang, Jiawei Kong, Wenbo Yu, Bin Chen, Jiawei Li, Shutao Xia, Ke Xu

    Abstract: Vision-Language Pre-training (VLP) models trained on large-scale image-text pairs have demonstrated unprecedented capability in many practical applications. However, previous studies have revealed that VLP models are vulnerable to adversarial samples crafted by a malicious adversary. While existing attacks have achieved great success in improving attack effect and transferability, they all focus o…

    Submitted 8 June, 2024; originally announced June 2024.

  48. Fast-Fading Channel and Power Optimization of the Magnetic Inductive Cellular Network

    Authors: Honglei Ma, Erwu Liu, Zhijun Fang, Rui Wang, Yongbin Gao, Wenjun Yu, Dongming Zhang

    Abstract: Cellular networks based on magnetic induction (MI) communication hold promise for long-distance underground environments. In traditional MI communication, there is no fast-fading channel, since the MI channel is treated as quasi-static. However, for vehicle (mobile) MI (VMI) communication, unpredictable antenna vibration introduces significant fast fading. As such fast-fading cann…

    Submitted 7 July, 2024; v1 submitted 7 June, 2024; originally announced June 2024.

    Comments: This work has been accepted by the IEEE TWC for publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  49. arXiv:2406.04649 [pdf, other]

    cs.CV

    SMART: Scene-motion-aware human action recognition framework for mental disorder group

    Authors: Zengyuan Lai, Jiarui Yang, Songpengcheng Xia, Qi Wu, Zhen Sun, Wenxian Yu, Ling Pei

    Abstract: Patients with mental disorders often exhibit risky abnormal actions, such as climbing walls or hitting windows, necessitating intelligent video behavior monitoring for smart healthcare with the rising Internet of Things (IoT) technology. However, the development of vision-based Human Action Recognition (HAR) for these actions is hindered by the lack of specialized algorithms and datasets. In this…

    Submitted 7 June, 2024; originally announced June 2024.

  50. arXiv:2406.03834 [pdf, other]

    astro-ph.HE

    The Broadband X-ray Spectral Properties during the Rising Phases of the Outburst of the New Black Hole X-ray Binary Candidate Swift J1727.8-1613

    Authors: He-Xin Liu, Yan-Jun Xu, Shuang-Nan Zhang, Wei Yu, Yue Huang, Lian Tao, Liang Zhang, Zi-Xu Yang, Qing-Chang Zhao, Jin-Lu Qu, Li-Ming Song

    Abstract: We report the results of a data analysis of the outburst evolution and spectral properties during the hard state of the recently discovered X-ray transient Swift J1727.8-1613, as observed by Insight-HXMT and NuSTAR. We find that the broadband X-ray spectrum of Swift J1727.8-1613 is more complex than the most typical spectral patterns of black hole X-ray binary systems, with not only a comparatively…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: 16 pages, 6 figures