
Showing 1–50 of 638 results for author: Ge, Y

  1. arXiv:2407.11643  [pdf, other]

    eess.SP

    Batch SLAM with PMBM Data Association Sampling and Graph-Based Optimization

    Authors: Yu Ge, Ossi Kaltiokallio, Yuxuan Xia, Ángel F. García-Fernández, Hyowon Kim, Jukka Talvitie, Mikko Valkama, Henk Wymeersch, Lennart Svensson

    Abstract: Simultaneous localization and mapping (SLAM) methods need to both solve the data association (DA) problem and the joint estimation of the sensor trajectory and the map, conditioned on a DA. In this paper, we propose a novel integrated approach to solve both the DA problem and the batch SLAM problem simultaneously, combining random finite set (RFS) theory and the graph-based SLAM approach. A sampli…

    Submitted 16 July, 2024; originally announced July 2024.

  2. arXiv:2407.11046  [pdf, other]

    cs.LG cs.AI cs.CL

    A Survey on LoRA of Large Language Models

    Authors: Yuren Mao, Yuhang Ge, Yijiang Fan, Wenyi Xu, Yu Mi, Zhonghao Hu, Yunjun Gao

    Abstract: Low-Rank Adaptation (LoRA), which updates the dense neural network layers with pluggable low-rank matrices, is one of the best-performing parameter-efficient fine-tuning paradigms. Furthermore, it has significant advantages in cross-task generalization and privacy preservation. Hence, LoRA has gained much attention recently, and the related literature has grown exponentially. It is…

    Submitted 8 July, 2024; originally announced July 2024.

  3. arXiv:2407.10694  [pdf]

    cs.CV

    Features Reconstruction Disentanglement Cloth-Changing Person Re-Identification

    Authors: Zhihao Chen, Yiyuan Ge, Qing Yue

    Abstract: Cloth-changing person re-identification (CC-ReID) aims to retrieve specific pedestrians in a cloth-changing scenario. Its main challenge is to disentangle the clothing-related and clothing-unrelated features. Most existing approaches force the model to learn clothing-unrelated features by changing the color of the clothes. However, due to the lack of ground truth, these methods inevitably introduc…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: 2024 International Conference on Intelligent Computing

  4. arXiv:2407.09868  [pdf]

    physics.med-ph

    Separation of Sodium Signals Between Mono- and Bi-Exponential T2 Decays via Multi-TE Single-Quantum Sodium (23Na) MRI

    Authors: Yongxian Qian, Ying-Chia Lin, Xingye Chen, Tiejun Zhao, Karthik Lakshmanan, Yulin Ge, Yvonne W. Lui, Fernando E. Boada

    Abstract: Purpose. It is a long-standing pursuit in sodium (23Na) MRI to separate signals between mono- and bi-exponential T2 decays in the human brain, due to lack of clinically translational solutions under the restriction of intrinsically low signal-to-noise ratio (SNR). Here we propose a new technique called multi-TE single-quantum (MSQ) sodium MRI to address the challenge. Methods. We exploit an intrins…

    Submitted 13 July, 2024; originally announced July 2024.

    Comments: 37 pages and 14 figures

  5. arXiv:2407.08683  [pdf, other]

    cs.CV

    SEED-Story: Multimodal Long Story Generation with Large Language Model

    Authors: Shuai Yang, Yuying Ge, Yang Li, Yukang Chen, Yixiao Ge, Ying Shan, Yingcong Chen

    Abstract: With the remarkable advancements in image generation and open-form text generation, the creation of interleaved image-text content has become an increasingly intriguing field. Multimodal story generation, characterized by producing narrative texts and vivid images in an interleaved manner, has emerged as a valuable and practical task with broad applications. However, this task poses significant ch…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: Our models, codes and datasets are released in https://github.com/TencentARC/SEED-Story

  6. arXiv:2407.03842  [pdf, other]

    cs.CV

    Beyond Viewpoint: Robust 3D Object Recognition under Arbitrary Views through Joint Multi-Part Representation

    Authors: Linlong Fan, Ye Huang, Yanqi Ge, Wen Li, Lixin Duan

    Abstract: Existing view-based methods excel at recognizing 3D objects from predefined viewpoints, but their exploration of recognition under arbitrary views is limited. This is a challenging and realistic setting because each object has different viewpoint positions and quantities, and their poses are not aligned. However, most view-based methods, which aggregate multiple view features to obtain a global fe…

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  7. arXiv:2407.00736  [pdf, other]

    quant-ph cs.ET cs.LG

    Quantum Circuit Synthesis and Compilation Optimization: Overview and Prospects

    Authors: Yan Ge, Wu Wenjie, Chen Yuheng, Pan Kaisen, Lu Xudong, Zhou Zixiang, Wang Yuhan, Wang Ruocheng, Yan Junchi

    Abstract: Quantum computing is regarded as a promising paradigm that may overcome the current computational power bottlenecks in the post-Moore era. The increasing maturity of quantum processors, especially superconducting ones, provides more possibilities for the development and implementation of quantum algorithms. As the crucial stages for quantum algorithm implementation, the logic circuit design and qu…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: 32 pages, 3 figures, 3 tables

  8. arXiv:2406.19311  [pdf, other]

    cs.CR cs.SD eess.AS

    Zero-Query Adversarial Attack on Black-box Automatic Speech Recognition Systems

    Authors: Zheng Fang, Tao Wang, Lingchen Zhao, Shenyi Zhang, Bowen Li, Yunjie Ge, Qi Li, Chao Shen, Qian Wang

    Abstract: In recent years, extensive research has been conducted on the vulnerability of ASR systems, revealing that black-box adversarial example attacks pose significant threats to real-world ASR systems. However, most existing black-box attacks rely on queries to the target ASRs, which is impractical when queries are not permitted. In this paper, we propose ZQ-Attack, a transfer-based adversarial attack…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: To appear in the Proceedings of The ACM Conference on Computer and Communications Security (CCS), 2024

  9. arXiv:2406.18165  [pdf]

    cond-mat.supr-con cond-mat.mtrl-sci

    Prediction of superconductivity in Bilayer Kagome borophene

    Authors: Yifan Han, Yue Shang, Wenhui Wan, Yong Liu, Yanfeng Ge

    Abstract: The element boron has long been central to two-dimensional superconducting materials, and numerous studies have demonstrated the presence of superconductivity in various boron-based structures. Recent work introduced a new variant: Bilayer Kagome borophene, characterized by its bilayer Kagome lattice with van Hove singularity. Using first-principles calculations, our research investigates the uniq…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: 12 pages, 6 figures

  10. arXiv:2406.18008  [pdf, other]

    cs.IT

    Rate-Distortion-Perception Tradeoff for Gaussian Vector Sources

    Authors: Jingjing Qian, Sadaf Salehkalaibar, Jun Chen, Ashish Khisti, Wei Yu, Wuxian Shi, Yiqun Ge, Wen Tong

    Abstract: This paper studies the rate-distortion-perception (RDP) tradeoff for a Gaussian vector source coding problem where the goal is to compress the multi-component source subject to distortion and perception constraints. The purpose of imposing a perception constraint is to ensure visually pleasing reconstructions. This paper studies this RDP setting with either the Kullback-Leibler (KL) divergence or…

    Submitted 25 June, 2024; originally announced June 2024.

  11. arXiv:2406.17950  [pdf, other]

    eess.SP

    V2X Sidelink Positioning in FR1: From Ray-Tracing and Channel Estimation to Bayesian Tracking

    Authors: Yu Ge, Maximilian Stark, Musa Furkan Keskin, Hui Chen, Guillaume Jornod, Thomas Hansen, Frank Hofmann, Henk Wymeersch

    Abstract: Sidelink positioning research predominantly focuses on the snapshot positioning problem, often within the mmWave band. Only a limited number of studies have delved into vehicle-to-anything (V2X) tracking within sub-6 GHz bands. In this paper, we investigate the V2X sidelink tracking challenges over sub-6 GHz frequencies. We propose a Kalman-filter-based tracking approach that leverages the estimat…

    Submitted 30 June, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

  12. arXiv:2406.12671  [pdf, other]

    cs.CV

    GeoBench: Benchmarking and Analyzing Monocular Geometry Estimation Models

    Authors: Yongtao Ge, Guangkai Xu, Zhiyue Zhao, Libo Sun, Zheng Huang, Yanlong Sun, Hao Chen, Chunhua Shen

    Abstract: Recent advances in discriminative and generative pretraining have yielded geometry estimation models with strong generalization capabilities. While discriminative monocular geometry estimation methods rely on large-scale fine-tuning data to achieve zero-shot generalization, several generative-based paradigms show the potential of achieving impressive generalization performance on unseen scenes by…

    Submitted 20 June, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

    Comments: Code and Benchmark are available at: https://github.com/aim-uofa/GeoBench

  13. arXiv:2406.12275  [pdf, other]

    cs.CV

    VoCo-LLaMA: Towards Vision Compression with Large Language Models

    Authors: Xubing Ye, Yukang Gan, Xiaoke Huang, Yixiao Ge, Ying Shan, Yansong Tang

    Abstract: Vision-Language Models (VLMs) have achieved remarkable success in various multi-modal tasks, but they are often bottlenecked by the limited context window and high computational cost of processing high-resolution image inputs and videos. Vision compression can alleviate this problem by reducing the vision token count. Previous approaches compress vision tokens with external modules and force LLMs…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 18 pages, 5 figures

  14. arXiv:2406.11602  [pdf, other]

    astro-ph.SR

    Association between a Failed Prominence Eruption and the Drainage of Mass from Another Prominence

    Authors: Jianchao Xue, Li Feng, Hui Li, Ping Zhang, Jun Chen, Guanglu Shi, Kaifan Ji, Ye Qiu, Chuan Li, Lei Lu, Beili Ying, Ying Li, Yu Huang, Youping Li, Jingwei Li, Jie Zhao, Dechao Song, Shuting Li, Zhengyuan Tian, Yingna Su, Qingmin Zhang, Yunyi Ge, Jiahui Shan, Qiao Li, Gen Li, et al. (9 additional authors not shown)

    Abstract: Sympathetic eruptions of solar prominences have been studied for decades; however, it is usually difficult to identify their causal links. Here we present two failed prominence eruptions on 26 October 2022 and explore their connections. Using stereoscopic observations, the south prominence (PRO-S) erupts with untwisting motions, flare ribbons occur underneath, and new connections are formed during…

    Submitted 20 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: 15 pages, 7 figures, has been accepted by Solar Physics

  15. CAMEL. II. A 3D Coronal Mass Ejection Catalog Based on Coronal Mass Ejection Automatic Detection with Deep Learning

    Authors: Jiahui Shan, Huapeng Zhang, Lei Lu, Yan Zhang, Li Feng, Yunyi Ge, Jianchao Xue, Shuting Li

    Abstract: Coronal mass ejections (CMEs) are major drivers of geomagnetic storms, which may cause severe space weather effects. Automating the detection, tracking, and three-dimensional (3D) reconstruction of CMEs is important for operational predictions of CME arrivals. The COR1 coronagraphs on board the Solar Terrestrial Relations Observatory spacecraft have facilitated extensive polarization observations,…

    Submitted 5 June, 2024; originally announced June 2024.

  16. arXiv:2406.02395  [pdf, other]

    cs.LG cs.CV

    GrootVL: Tree Topology is All You Need in State Space Model

    Authors: Yicheng Xiao, Lin Song, Shaoli Huang, Jiangshan Wang, Siyu Song, Yixiao Ge, Xiu Li, Ying Shan

    Abstract: The state space models, employing recursively propagated features, demonstrate strong representation capabilities comparable to Transformer models and superior efficiency. However, constrained by the inherent geometric constraints of sequences, they still fall short in modeling long-range dependencies. To address this issue, we propose the GrootVL network, which first dynamically generates a tree t…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: The code is available at https://github.com/EasonXiao-888/GrootVL

  17. arXiv:2406.01371  [pdf, other]

    eess.SY

    An Origami-Inspired Endoscopic Capsule with Tactile Perception for Early Tissue Anomaly Detection

    Authors: Yukun Ge, Rui Zong, Xiaoshuai Zhang, Thrishantha Nanayakkara

    Abstract: Video Capsule Endoscopy (VCE) is currently one of the most effective methods for detecting intestinal diseases. However, it is challenging to detect early-stage small nodules with this method because they lack obvious color or shape features. In this letter, we present a new origami capsule endoscope to detect early small intestinal nodules using tactile sensing. Four soft tactile sensors made out…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  18. arXiv:2405.19519  [pdf, other]

    cs.CL cs.AI

    Two-layer retrieval augmented generation framework for low-resource medical question-answering: proof of concept using Reddit data

    Authors: Sudeshna Das, Yao Ge, Yuting Guo, Swati Rajwal, JaMor Hairston, Jeanne Powell, Drew Walker, Snigdha Peddireddy, Sahithi Lakamana, Selen Bozkurt, Matthew Reyna, Reza Sameni, Yunyu Xiao, Sangmi Kim, Rasheeta Chandler, Natalie Hernandez, Danielle Mowery, Rachel Wightman, Jennifer Love, Anthony Spadaro, Jeanmarie Perrone, Abeed Sarker

    Abstract: Retrieval augmented generation (RAG) provides the capability to constrain generative model outputs, and mitigate the possibility of hallucination, by providing relevant in-context text. The number of tokens a generative large language model (LLM) can incorporate as context is finite, thus limiting the volume of knowledge from which to generate an answer. We propose a two-layer RAG framework for qu…

    Submitted 29 May, 2024; originally announced May 2024.

  19. arXiv:2405.15287  [pdf, other]

    cs.CV

    StyleMaster: Towards Flexible Stylized Image Generation with Diffusion Models

    Authors: Chengming Xu, Kai Hu, Donghao Luo, Jiangning Zhang, Wei Li, Yanhao Ge, Chengjie Wang

    Abstract: Stylized Text-to-Image Generation (STIG) aims to generate images based on text prompts and style reference images. In this paper, we propose a novel framework dubbed StyleMaster for this task by leveraging pretrained Stable Diffusion (SD), which tries to solve the previous problems such as insufficient style and inconsistent semantics. The enhancement lies in two novel modules, namely multi-sourc…

    Submitted 24 May, 2024; originally announced May 2024.

  20. arXiv:2405.12970  [pdf, ps, other]

    cs.CV

    Face Adapter for Pre-Trained Diffusion Models with Fine-Grained ID and Attribute Control

    Authors: Yue Han, Junwei Zhu, Keke He, Xu Chen, Yanhao Ge, Wei Li, Xiangtai Li, Jiangning Zhang, Chengjie Wang, Yong Liu

    Abstract: Current face reenactment and swapping methods mainly rely on GAN frameworks, but recent focus has shifted to pre-trained diffusion models for their superior generation capabilities. However, training these models is resource-intensive, and the results have not yet achieved satisfactory performance levels. To address this issue, we introduce Face-Adapter, an efficient and effective adapter designed…

    Submitted 8 July, 2024; v1 submitted 21 May, 2024; originally announced May 2024.

    Comments: Accepted to ECCV2024; Project Page: https://faceadapter.github.io/face-adapter.github.io/

  21. The Milky Way Atlas for Linear Filaments

    Authors: Ke Wang, Yifei Ge, Tapas Baug

    Abstract: Filamentary structure is important for the ISM and star formation. Galactic distribution of filaments may regulate the star formation rate in the Milky Way. However, interstellar filaments are intrinsically complex, making it difficult to study quantitatively. Here, we focus on linear filaments, the simplest morphology that can be treated as building blocks of any filamentary structure. We present…

    Submitted 20 May, 2024; originally announced May 2024.

    Comments: Accepted to A&A Letters. 15 pages, 6 figures, 1 table

    Journal ref: A&A 686, L11 (2024)

  22. arXiv:2405.09546  [pdf, other]

    cs.CV

    BEHAVIOR Vision Suite: Customizable Dataset Generation via Simulation

    Authors: Yunhao Ge, Yihe Tang, Jiashu Xu, Cem Gokmen, Chengshu Li, Wensi Ai, Benjamin Jose Martinez, Arman Aydin, Mona Anvari, Ayush K Chakravarthy, Hong-Xing Yu, Josiah Wong, Sanjana Srivastava, Sharon Lee, Shengxin Zha, Laurent Itti, Yunzhu Li, Roberto Martín-Martín, Miao Liu, Pengchuan Zhang, Ruohan Zhang, Li Fei-Fei, Jiajun Wu

    Abstract: The systematic evaluation and understanding of computer vision models under varying conditions require large amounts of data with comprehensive and customized labels, which real-world vision datasets rarely satisfy. While current synthetic data generators offer a promising alternative, particularly for embodied AI tasks, they often fall short for computer vision tasks due to low asset and renderin…

    Submitted 15 May, 2024; originally announced May 2024.

    Comments: CVPR 2024 (Highlight). Project website: https://behavior-vision-suite.github.io/

  23. arXiv:2405.07990  [pdf, other]

    cs.CL cs.CV

    Plot2Code: A Comprehensive Benchmark for Evaluating Multi-modal Large Language Models in Code Generation from Scientific Plots

    Authors: Chengyue Wu, Yixiao Ge, Qiushan Guo, Jiahao Wang, Zhixuan Liang, Zeyu Lu, Ying Shan, Ping Luo

    Abstract: The remarkable progress of Multi-modal Large Language Models (MLLMs) has attracted significant attention due to their superior performance in visual contexts. However, their capabilities in turning visual figures into executable code have not been evaluated thoroughly. To address this, we introduce Plot2Code, a comprehensive visual coding benchmark designed for a fair and in-depth assessment of MLLM…

    Submitted 13 May, 2024; originally announced May 2024.

  24. arXiv:2405.07027  [pdf, other]

    cs.CV cs.AI cs.RO

    TD-NeRF: Novel Truncated Depth Prior for Joint Camera Pose and Neural Radiance Field Optimization

    Authors: Zhen Tan, Zongtan Zhou, Yangbing Ge, Zi Wang, Xieyuanli Chen, Dewen Hu

    Abstract: The reliance on accurate camera poses is a significant barrier to the widespread deployment of Neural Radiance Fields (NeRF) models for 3D reconstruction and SLAM tasks. The existing method introduces monocular depth priors to jointly optimize the camera poses and NeRF, which fails to fully exploit the depth priors and neglects the impact of their inherent noise. In this paper, we propose Truncate…

    Submitted 11 May, 2024; originally announced May 2024.

  25. arXiv:2405.06145  [pdf, other]

    cs.CL cs.AI cs.LG

    Reddit-Impacts: A Named Entity Recognition Dataset for Analyzing Clinical and Social Effects of Substance Use Derived from Social Media

    Authors: Yao Ge, Sudeshna Das, Karen O'Connor, Mohammed Ali Al-Garadi, Graciela Gonzalez-Hernandez, Abeed Sarker

    Abstract: Substance use disorders (SUDs) are a growing concern globally, necessitating enhanced understanding of the problem and its trends through data-driven research. Social media are unique and important sources of information about SUDs, particularly since the data in such sources are often generated by people with lived experiences. In this paper, we introduce Reddit-Impacts, a challenging Named Entit…

    Submitted 9 May, 2024; originally announced May 2024.

    Comments: 7 pages, 1 figure, 4 tables

  26. arXiv:2405.04007  [pdf, other]

    cs.CV

    SEED-Data-Edit Technical Report: A Hybrid Dataset for Instructional Image Editing

    Authors: Yuying Ge, Sijie Zhao, Chen Li, Yixiao Ge, Ying Shan

    Abstract: In this technical report, we introduce SEED-Data-Edit: a unique hybrid dataset for instruction-guided image editing, which aims to facilitate image manipulation using open-form language. SEED-Data-Edit is composed of three distinct types of data: (1) High-quality editing data produced by an automated pipeline, ensuring a substantial volume of diverse image editing pairs. (2) Real-world scenario da…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: Technical Report; Dataset released in https://huggingface.co/datasets/AILab-CVC/SEED-Data-Edit

  27. arXiv:2405.03119  [pdf, ps, other]

    cs.IT eess.SP

    DAFT-Spread Affine Frequency Division Multiple Access for Downlink Transmission

    Authors: Yiwei Tao, Miaowen Wen, Yao Ge, Tianqi Mao, Lixia Xiao, Jun Li

    Abstract: Affine frequency division multiplexing (AFDM) and orthogonal AFDM access (O-AFDMA) are promising techniques based on chirp signals, which are able to suppress the performance deterioration caused by Doppler shifts in high-mobility scenarios. However, the high peak-to-average power ratio (PAPR) in AFDM or O-AFDMA is still a crucial problem, which severely limits their practical applications. In thi…

    Submitted 5 May, 2024; originally announced May 2024.

  28. arXiv:2405.02604  [pdf, ps, other]

    cs.IT eess.SP

    Interleave Frequency Division Multiplexing

    Authors: Yuhao Chi, Lei Liu, Yao Ge, Xuehui Chen, Ying Li, Zhaoyang Zhang

    Abstract: In this letter, we study interleave frequency division multiplexing (IFDM) for multicarrier modulation in static multipath and mobile time-varying channels, which outperforms orthogonal frequency division multiplexing (OFDM), orthogonal time frequency space (OTFS), and affine frequency division multiplexing (AFDM) by considering practical advanced detectors. The fundamental principle underlying ex…

    Submitted 4 May, 2024; originally announced May 2024.

    Comments: Accepted by IEEE Wireless Communications Letters

  29. arXiv:2405.01312  [pdf, other]

    cs.DB cs.CR

    Privacy-Enhanced Database Synthesis for Benchmark Publishing

    Authors: Yongrui Zhong, Yunqing Ge, Jianbin Qin, Shuyuan Zheng, Bo Tang, Yu-Xuan Qiu, Rui Mao, Ye Yuan, Makoto Onizuka, Chuan Xiao

    Abstract: Benchmarking is crucial for evaluating a DBMS, yet existing benchmarks often fail to reflect the varied nature of user workloads. As a result, there is increasing momentum toward creating databases that incorporate real-world user data to more accurately mirror business environments. However, privacy concerns deter users from directly sharing their data, underscoring the importance of creating syn…

    Submitted 2 May, 2024; originally announced May 2024.

  30. arXiv:2405.01308  [pdf, ps, other]

    astro-ph.SR

    Spectral and Imaging Observations of a C2.3 White-Light Flare from the Advanced Space-Based Solar Observatory (ASO-S) and the Chinese Hα Solar Explorer (CHASE)

    Authors: Qiao Li, Ying Li, Yang Su, Dechao Song, Hui Li, Li Feng, Yu Huang, Youping Li, Jingwei Li, Jie Zhao, Lei Lu, Beili Ying, Jianchao Xue, Ping Zhang, Jun Tian, Xiaofeng Liu, Gen Li, Zhichen Jing, Shuting Li, Guanglu Shi, Zhengyuan Tian, Wei Chen, Yingna Su, Qingmin Zhang, Dong Li, et al. (5 additional authors not shown)

    Abstract: Solar white-light flares are characterized by an enhancement in the optical continuum, which are usually large flares (say X- and M-class flares). Here we report a small C2.3 white-light flare (SOL2022-12-20T04:10) observed by the Advanced Space-based Solar Observatory and the Chinese Hα Solar Explorer. This flare exhibits an increase of ≈6.4% in the photospheric Fe \texts…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: 23 pages, 6 figures, accepted by Solar Physics

  31. arXiv:2404.19752  [pdf, other]

    cs.CV

    Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation

    Authors: Yunhao Ge, Xiaohui Zeng, Jacob Samuel Huffman, Tsung-Yi Lin, Ming-Yu Liu, Yin Cui

    Abstract: Existing automatic captioning methods for visual content face challenges such as lack of detail, content hallucination, and poor instruction following. In this work, we propose VisualFactChecker (VFC), a flexible training-free pipeline that generates high-fidelity and detailed captions for both 2D images and 3D objects. VFC consists of three steps: 1) proposal, where image-to-text captioning model…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: CVPR 2024

  32. arXiv:2404.16957  [pdf, other]

    cs.AI cs.CY

    Attributing Responsibility in AI-Induced Incidents: A Computational Reflective Equilibrium Framework for Accountability

    Authors: Yunfei Ge, Quanyan Zhu

    Abstract: The pervasive integration of Artificial Intelligence (AI) has introduced complex challenges in the responsibility and accountability in the event of incidents involving AI-enabled systems. The interconnectivity of these systems, ethical concerns of AI-induced incidents, coupled with uncertainties in AI technology and the absence of corresponding regulations, have made traditional responsibility at…

    Submitted 25 April, 2024; originally announced April 2024.

  33. arXiv:2404.16790  [pdf, other]

    cs.CV

    SEED-Bench-2-Plus: Benchmarking Multimodal Large Language Models with Text-Rich Visual Comprehension

    Authors: Bohao Li, Yuying Ge, Yi Chen, Yixiao Ge, Ruimao Zhang, Ying Shan

    Abstract: Comprehending text-rich visual content is paramount for the practical application of Multimodal Large Language Models (MLLMs), since text-rich scenarios are ubiquitous in the real world, which are characterized by the presence of extensive texts embedded within images. Recently, the advent of MLLMs with impressive versatility has raised the bar for what we can expect from MLLMs. However, their pro…

    Submitted 25 April, 2024; originally announced April 2024.

  34. arXiv:2404.14396  [pdf, other]

    cs.CV

    SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation

    Authors: Yuying Ge, Sijie Zhao, Jinguo Zhu, Yixiao Ge, Kun Yi, Lin Song, Chen Li, Xiaohan Ding, Ying Shan

    Abstract: The rapid evolution of multimodal foundation models has demonstrated significant progress in vision-language understanding and generation, e.g., our previous work SEED-LLaMA. However, there remains a gap between its capability and the real-world applicability, primarily due to the model's limited capacity to effectively respond to various user instructions and interact with diverse visual data. I…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: Project released at: https://github.com/AILab-CVC/SEED-X

  35. arXiv:2404.13884  [pdf]

    eess.IV cs.CV

    MambaUIE&SR: Unraveling the Ocean's Secrets with Only 2.8 GFLOPs

    Authors: Zhihao Chen, Yiyuan Ge

    Abstract: Underwater Image Enhancement (UIE) techniques aim to address the problem of underwater image degradation due to light absorption and scattering. In recent years, both Convolution Neural Network (CNN)-based and Transformer-based methods have been widely explored. In addition, combining CNN and Transformer can effectively combine global and local information for enhancement. However, this approach i…

    Submitted 24 May, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: arXiv admin note: text overlap with arXiv:2305.08824 by other authors

  36. arXiv:2404.13600  [pdf, other]

    cs.RO

    Are We Ready for Planetary Exploration Robots? The TAIL-Plus Dataset for SLAM in Granular Environments

    Authors: Zirui Wang, Chen Yao, Yangtao Ge, Guowei Shi, Ningbo Yang, Zheng Zhu, Kewei Dong, Hexiang Wei, Zhenzhong Jia, Jing Wu

    Abstract: So far, planetary surface exploration depends on various mobile robot platforms. The autonomous navigation and decision-making of these mobile robots in complex terrains largely rely on their terrain-aware perception, localization and mapping capabilities. In this paper we release the TAIL-Plus dataset, a new challenging dataset in deformable granular environments for planetary exploration robots,…

    Submitted 21 April, 2024; originally announced April 2024.

    Comments: Accepted to the IEEE ICRA Workshop on Field Robotics 2024

  37. arXiv:2404.10291  [pdf, other]

    eess.SP

    Robust Snapshot Radio SLAM

    Authors: Ossi Kaltiokallio, Elizaveta Rastorgueva-Foi, Jukka Talvitie, Yu Ge, Henk Wymeersch, Mikko Valkama

    Abstract: The intrinsic geometric connections between millimeter-wave (mmWave) signals and the propagation environment can be leveraged for simultaneous localization and mapping (SLAM) in 5G and beyond networks. However, estimated channel parameters that are mismatched to the utilized geometric model can cause the SLAM solution to degrade. In this paper, we propose a robust snapshot radio SLAM algorithm for…

    Submitted 16 April, 2024; originally announced April 2024.

  38. arXiv:2404.08882  [pdf, other]

    physics.med-ph physics.optics

    Explanations of MTF discrepancy in grating-based X-ray differential phase contrast CT imaging

    Authors: Yuhang Tan, Jiecheng Yang, Hairong Zheng, Dong Liang, Peiping Zhu, Yongshuai Ge

    Abstract: As a multi-contrast X-ray computed tomography (CT) imaging system, the grating-based Talbot-Lau interferometer is able to generate the absorption contrast and differential phase contrast (DPC) images concurrently. However, experiments found that the absorption CT (ACT) images have better spatial resolution, i.e., higher modulation transfer function (MTF), than the differential phase contrast CT (D…

    Submitted 12 April, 2024; originally announced April 2024.

    Comments: 7 pages, 3 figures

    ACM Class: J.2

  39. arXiv:2404.07855  [pdf, other]

    cs.CV

    Resolve Domain Conflicts for Generalizable Remote Physiological Measurement

    Authors: Weiyu Sun, Xinyu Zhang, Hao Lu, Ying Chen, Yun Ge, Xiaolin Huang, Jie Yuan, Yingcong Chen

    Abstract: Remote photoplethysmography (rPPG) technology has become increasingly popular due to its non-invasive monitoring of various physiological indicators, making it widely applicable in multimedia interaction, healthcare, and emotion analysis. Existing rPPG methods utilize multiple datasets for training to enhance the generalizability of models. However, they often overlook the underlying conflict issu…

    Submitted 11 April, 2024; originally announced April 2024.

    Comments: Accepted by ACM MM 2023

  40. arXiv:2404.07019  [pdf, other]

    physics.optics nlin.CD quant-ph

    Chiral Chaos Enhanced Sensing

    Authors: Yun-Qiu Ge, Zhe Wang, Qian-Chuan Zhao, Jing Zhang, Yu-xi Liu

    Abstract: Chirality refers to the property that an object and its mirror image cannot overlap each other by spatial rotation and translation, and can be found in various research fields. We here propose chiral chaos and construct a chiral chaotic device via coupled whispering gallery mode resonators, where routes to chaos exhibit pronounced chirality for two opposite pumping directions. The mechanism respon…

    Submitted 10 April, 2024; originally announced April 2024.

  41. arXiv:2404.06835  [pdf, other]

    cs.CV

    Tuning-Free Adaptive Style Incorporation for Structure-Consistent Text-Driven Style Transfer

    Authors: Yanqi Ge, Jiaqi Liu, Qingnan Fan, Xi Jiang, Ye Huang, Shuai Qin, Hong Gu, Wen Li, Lixin Duan

    Abstract: In this work, we target the task of text-driven style transfer in the context of text-to-image (T2I) diffusion models. The main challenge is consistent structure preservation while enabling effective style transfer effects. Past approaches in this field directly concatenate the content and style prompts for a prompt-level style injection, leading to unavoidable structure distortions. In this w…

    Submitted 10 April, 2024; originally announced April 2024.

  42. arXiv:2404.03443  [pdf, ps, other]

    cs.CV

    Part-Attention Based Model Make Occluded Person Re-Identification Stronger

    Authors: Zhihao Chen, Yiyuan Ge

    Abstract: The goal of occluded person re-identification (ReID) is to retrieve specific pedestrians in occluded situations. However, occluded person ReID still suffers from background clutter and low-quality local feature representations, which limits model performance. In our research, we introduce a new framework called PAB-ReID, which is a novel ReID model incorporating part-attention mechanisms to tackle…

    Submitted 1 May, 2024; v1 submitted 4 April, 2024; originally announced April 2024.

    Comments: Accepted by International Joint Conference on Neural Networks 2024

  43. arXiv:2404.00308  [pdf, other]

    cs.CV

    ST-LLM: Large Language Models Are Effective Temporal Learners

    Authors: Ruyang Liu, Chen Li, Haoran Tang, Yixiao Ge, Ying Shan, Ge Li

    Abstract: Large Language Models (LLMs) have showcased impressive capabilities in text comprehension and generation, prompting research efforts towards video LLMs to facilitate human-AI interaction at the video level. However, how to effectively encode and understand videos in video-based dialogue systems remains to be solved. In this paper, we investigate a straightforward yet unexplored question: Can we fe…

    Submitted 30 March, 2024; originally announced April 2024.

  44. arXiv:2403.19021  [pdf, other]

    cs.IR cs.AI cs.CL cs.LG

    IDGenRec: LLM-RecSys Alignment with Textual ID Learning

    Authors: Juntao Tan, Shuyuan Xu, Wenyue Hua, Yingqiang Ge, Zelong Li, Yongfeng Zhang

    Abstract: Generative recommendation based on Large Language Models (LLMs) has transformed the traditional ranking-based recommendation style into a text-to-text generation paradigm. However, in contrast to standard NLP tasks that inherently operate on human vocabulary, current research in generative recommendations struggles to effectively encode recommendation items within the text-to-text framework using…

    Submitted 17 May, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

    Comments: Accepted in SIGIR 2024

  45. arXiv:2403.18691  [pdf, other]

    cond-mat.stat-mech cond-mat.str-el hep-th quant-ph

    Building defect conformal field theory from the Sachdev-Ye-Kitaev interactions

    Authors: Yang Ge, Shao-Kai Jian

    Abstract: The coupling between defects and extended critical degrees of freedom gives rise to the intriguing theory known as defect conformal field theory (CFT). In this work, we introduce a novel family of boundary and interface CFTs by coupling $N$ Majorana chains with SYK$_q$ interactions at the defect. Our analysis reveals that the interaction with $q=2$ constitutes a new marginal defect. Employing a ve…

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: 15 pages, 6 figures

  46. arXiv:2403.18189  [pdf]

    cond-mat.mes-hall

    Interfacial magnetic spin Hall effect in van der Waals Fe3GeTe2/MoTe2 heterostructure

    Authors: Yudi Dai, Junlin Xiong, Yanfeng Ge, Bin Cheng, Lizheng Wang, Pengfei Wang, Zenglin Liu, Shengnan Yan, Cuiwei Zhang, Xianghan Xu, Youguo Shi, Sang-Wook Cheong, Cong Xiao, Shengyuan A. Yang, Shi-Jun Liang, Feng Miao

    Abstract: The spin Hall effect (SHE) allows efficient generation of spin polarization or spin current through charge current and plays a crucial role in the development of spintronics. While SHE typically occurs in non-magnetic materials and is time-reversal even, exploring time-reversal-odd (T-odd) SHE, which couples SHE to magnetization in ferromagnetic materials, offers a new charge-spin conversion mecha…

    Submitted 26 March, 2024; originally announced March 2024.

    Journal ref: Nature Communications 15, 1129 (2024)

  47. arXiv:2403.17664  [pdf, other]

    cs.CV

    DiffFAE: Advancing High-fidelity One-shot Facial Appearance Editing with Space-sensitive Customization and Semantic Preservation

    Authors: Qilin Wang, Jiangning Zhang, Chengming Xu, Weijian Cao, Ying Tai, Yue Han, Yanhao Ge, Hong Gu, Chengjie Wang, Yanwei Fu

    Abstract: Facial Appearance Editing (FAE) aims to modify physical attributes, such as pose, expression and lighting, of human facial images while preserving attributes like identity and background, which is of great importance in photography. Despite great progress in this area, current research generally faces three challenges: low generation fidelity, poor attribute preservation, and inefficient infer…

    Submitted 26 March, 2024; originally announced March 2024.

  48. arXiv:2403.16971  [pdf, other]

    cs.OS cs.AI cs.CL

    AIOS: LLM Agent Operating System

    Authors: Kai Mei, Zelong Li, Shuyuan Xu, Ruosong Ye, Yingqiang Ge, Yongfeng Zhang

    Abstract: The integration and deployment of large language model (LLM)-based intelligent agents have been fraught with challenges that compromise their efficiency and efficacy. Among these issues are sub-optimal scheduling and resource allocation of agent requests over the LLM, the difficulties in maintaining context during interactions between agent and LLM, and the complexities inherent in integrating het…

    Submitted 25 March, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

    Comments: 14 pages, 5 figures, 5 tables; comments and suggestions are appreciated

  49. arXiv:2403.16875  [pdf, other]

    cs.RO

    TAIL: A Terrain-Aware Multi-Modal SLAM Dataset for Robot Locomotion in Deformable Granular Environments

    Authors: Chen Yao, Yangtao Ge, Guowei Shi, Zirui Wang, Ningbo Yang, Zheng Zhu, Hexiang Wei, Yuntian Zhao, Jing Wu, Zhenzhong Jia

    Abstract: Terrain-aware perception holds the potential to improve the robustness and accuracy of autonomous robot navigation in the wild, thereby facilitating effective off-road traversals. However, the lack of multi-modal perception across various motion patterns hinders solutions to Simultaneous Localization And Mapping (SLAM), especially when confronting non-geometric hazards in demanding landscapes…

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: Submitted to IEEE Robotics and Automation Letters

  50. arXiv:2403.16411  [pdf, other]

    eess.SY

    A Geometric Perspective on Fusing Gaussian Distributions on Lie Groups

    Authors: Yixiao Ge, Pieter van Goor, Robert Mahony

    Abstract: Stochastic inference on Lie groups plays a key role in state estimation problems such as inertial navigation, visual inertial odometry, and pose estimation in virtual reality. A key problem is fusing independent concentrated Gaussian distributions defined at different reference points on the group. In this paper we approximate distributions at different points in the group in a single set of exp…

    Submitted 30 April, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

    Comments: Preprint for L-CSS