Showing 1–50 of 319 results for author: Yu, Q

  1. arXiv:2407.01928  [pdf, other]

    cs.CV

    SymPoint Revolutionized: Boosting Panoptic Symbol Spotting with Layer Feature Enhancement

    Authors: Wenlong Liu, Tianyu Yang, Qizhi Yu, Lei Zhang

    Abstract: SymPoint is an initial attempt that utilizes point set representation to solve the panoptic symbol spotting task on CAD drawing. Despite its considerable success, it overlooks graphical layer information and suffers from prohibitively slow training convergence. To tackle this issue, we introduce SymPoint-V2, a robust and efficient solution featuring novel, streamlined designs that overcome these l…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: code at https://github.com/nicehuster/SymPointV2

  2. Small Aerial Target Detection for Airborne Infrared Detection Systems using LightGBM and Trajectory Constraints

    Authors: Xiaoliang Sun, Liangchao Guo, Wenlong Zhang, Zi Wang, Qifeng Yu

    Abstract: Factors, such as rapid relative motion, clutter background, etc., make robust small aerial target detection for airborne infrared detection systems a challenge. Existing methods are facing difficulties when dealing with such cases. We consider that a continuous and smooth trajectory is critical in boosting small infrared aerial target detection performance. A simple and effective small aerial targ…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 15 pages, 10 figures

    Journal ref: IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING 14 9959-9973 2021

  3. arXiv:2406.19617  [pdf, ps, other]

    cs.LG cs.IT math.OC

    Stochastic Zeroth-Order Optimization under Strongly Convexity and Lipschitz Hessian: Minimax Sample Complexity

    Authors: Qian Yu, Yining Wang, Baihe Huang, Qi Lei, Jason D. Lee

    Abstract: Optimization of convex functions under stochastic zeroth-order feedback has been a major and challenging question in online learning. In this work, we consider the problem of optimizing second-order smooth and strongly convex functions where the algorithm is only accessible to noisy evaluations of the objective function it queries. We provide the first tight characterization for the rate of the mi…

    Submitted 27 June, 2024; originally announced June 2024.

  4. arXiv:2406.16905  [pdf]

    cs.LG cs.AI

    Optimising Random Forest Machine Learning Algorithms for User VR Experience Prediction Based on Iterative Local Search-Sparrow Search Algorithm

    Authors: Xirui Tang, Feiyang Li, Zinan Cao, Qixuan Yu, Yulu Gong

    Abstract: In this paper, an improved method for VR user experience prediction is investigated by introducing a sparrow search algorithm and a random forest algorithm improved by an iterative local search-optimised sparrow search algorithm. The study firstly conducted a statistical analysis of the data, and then trained and tested using the traditional random forest model, the random forest model improved by…

    Submitted 3 June, 2024; originally announced June 2024.

  5. arXiv:2406.15811  [pdf, other]

    cs.CV

    PointDreamer: Zero-shot 3D Textured Mesh Reconstruction from Colored Point Cloud by 2D Inpainting

    Authors: Qiao Yu, Xianzhi Li, Yuan Tang, Jinfeng Xu, Long Hu, Yixue Hao, Min Chen

    Abstract: Reconstructing textured meshes from colored point clouds is an important but challenging task in 3D graphics and vision. Most existing methods predict colors as implicit functions in 3D or UV space, suffering from blurry textures or the lack of generalization capability. Addressing this, we propose PointDreamer, a novel framework for textured mesh reconstruction from colored point cloud. It produc…

    Submitted 22 June, 2024; originally announced June 2024.

  6. arXiv:2406.09416  [pdf, other]

    cs.CV

    Alleviating Distortion in Image Generation via Multi-Resolution Diffusion Models

    Authors: Qihao Liu, Zhanpeng Zeng, Ju He, Qihang Yu, Xiaohui Shen, Liang-Chieh Chen

    Abstract: This paper presents innovative enhancements to diffusion models by integrating a novel multi-resolution network and time-dependent layer normalization. Diffusion models have gained prominence for their effectiveness in high-fidelity image generation. While conventional approaches rely on convolutional U-Net architectures, recent Transformer-based designs have demonstrated superior performance and…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Introducing DiMR, a new diffusion backbone that surpasses all existing image generation models of various sizes on ImageNet 256 with only 505M parameters. Project page: https://qihao067.github.io/projects/DiMR

  7. arXiv:2406.07550  [pdf, other]

    cs.CV

    An Image is Worth 32 Tokens for Reconstruction and Generation

    Authors: Qihang Yu, Mark Weber, Xueqing Deng, Xiaohui Shen, Daniel Cremers, Liang-Chieh Chen

    Abstract: Recent advancements in generative models have highlighted the crucial role of image tokenization in the efficient synthesis of high-resolution images. Tokenization, which transforms images into latent representations, reduces computational demands compared to directly processing pixels and enhances the effectiveness and efficiency of the generation process. Prior methods, such as VQGAN, typically…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: A compact 1D Image Tokenization method, leading to SOTA generation performance while being substantially faster. Project page at https://yucornetto.github.io/projects/titok.html

  8. arXiv:2406.06792  [pdf, other]

    cs.LG cs.AI

    Reinforced Compressive Neural Architecture Search for Versatile Adversarial Robustness

    Authors: Dingrong Wang, Hitesh Sapkota, Zhiqiang Tao, Qi Yu

    Abstract: Prior neural architecture search (NAS) for adversarial robustness works have discovered that a lightweight and adversarially robust neural network architecture could exist in a non-robust large teacher network, generally disclosed by heuristic rules through statistical analysis and neural architecture search, generally disclosed by heuristic rules from neural architecture search. However, heuristi…

    Submitted 13 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

    Comments: 17 pages

  9. arXiv:2406.05354  [pdf, other]

    cs.AR cs.AI cs.DC

    Investigating Memory Failure Prediction Across CPU Architectures

    Authors: Qiao Yu, Wengui Zhang, Min Zhou, Jialiang Yu, Zhenli Sheng, Jasmin Bogatinovski, Jorge Cardoso, Odej Kao

    Abstract: Large-scale datacenters often experience memory failures, where Uncorrectable Errors (UEs) highlight critical malfunction in Dual Inline Memory Modules (DIMMs). Existing approaches primarily utilize Correctable Errors (CEs) to predict UEs, yet they typically neglect how these errors vary between different CPU architectures, especially in terms of Error Correction Code (ECC) applicability. In this…

    Submitted 8 June, 2024; originally announced June 2024.

    Comments: Accepted by 2024 54th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), Industry Track

  10. arXiv:2406.03866  [pdf, other]

    cs.CV

    LLplace: The 3D Indoor Scene Layout Generation and Editing via Large Language Model

    Authors: Yixuan Yang, Junru Lu, Zixiang Zhao, Zhen Luo, James J. Q. Yu, Victor Sanchez, Feng Zheng

    Abstract: Designing 3D indoor layouts is a crucial task with significant applications in virtual reality, interior design, and automated space planning. Existing methods for 3D layout design either rely on diffusion models, which utilize spatial relationship priors, or heavily leverage the inferential capabilities of proprietary Large Language Models (LLMs), which require extensive prompt engineering and in…

    Submitted 6 June, 2024; originally announced June 2024.

  11. arXiv:2406.02541  [pdf, other]

    cs.CV

    Enhancing Temporal Consistency in Video Editing by Reconstructing Videos with 3D Gaussian Splatting

    Authors: Inkyu Shin, Qihang Yu, Xiaohui Shen, In So Kweon, Kuk-Jin Yoon, Liang-Chieh Chen

    Abstract: Recent advancements in zero-shot video diffusion models have shown promise for text-driven video editing, but challenges remain in achieving high temporal consistency. To address this, we introduce Video-3DGS, a 3D Gaussian Splatting (3DGS)-based video refiner designed to enhance temporal consistency in zero-shot video editors. Our approach utilizes a two-stage 3D Gaussian optimizing process tailo…

    Submitted 5 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: Project page at https://video-3dgs-project.github.io/

  12. arXiv:2406.01151  [pdf, other]

    cs.AR

    A 0.96pJ/SOP, 30.23K-neuron/mm^2 Heterogeneous Neuromorphic Chip With Fullerene-like Interconnection Topology for Edge-AI Computing

    Authors: P. J. Zhou, Q. Yu, M. Chen, Y. C. Wang, L. W. Meng, Y. Zuo, N. Ning, Y. Liu, S. G. Hu, G. C. Qiao

    Abstract: Edge-AI computing requires high energy efficiency, low power consumption, and relatively high flexibility and compact area, challenging the AI-chip design. This work presents a 0.96 pJ/SOP heterogeneous neuromorphic system-on-chip (SoC) with fullerene-like interconnection topology for edge-AI computing. The neuromorphic core integrates different technologies to augment computing energy efficiency,…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: 5 pages, 8 figures

  13. arXiv:2405.11874  [pdf, other]

    cs.CL

    xFinder: Robust and Pinpoint Answer Extraction for Large Language Models

    Authors: Qingchen Yu, Zifan Zheng, Shichao Song, Zhiyu Li, Feiyu Xiong, Bo Tang, Ding Chen

    Abstract: The continuous advancement of large language models (LLMs) has brought increasing attention to the critical issue of developing fair and reliable methods for evaluating their performance. Particularly, the emergence of subjective or non-subjective cheating phenomena, such as test set leakage and prompt format overfitting, poses significant challenges to the reliable evaluation of LLMs. Since evalu…

    Submitted 23 May, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

    Comments: 37 Pages

  14. arXiv:2405.11734  [pdf, other]

    cs.IT

    Finite Field Multiple Access for Sourced Massive Random Access with Finite Blocklength

    Authors: Qi-yue Yu, Shi-wen Lin, Shu Lin

    Abstract: For binary source transmission, this paper proposes an element-pair (EP) coding scheme for supporting sourced massive random access, which is used to solve the finite blocklength (FBL) of multiuser reliability transmission problem. In this paper, we first give the definition of an EP, which is used as a virtual resource. If the Cartesian product of $J$ distinct EPs satisfies the unique sum-pattern…

    Submitted 19 May, 2024; originally announced May 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2303.14086

  15. arXiv:2405.05983  [pdf]

    cs.CV cs.AI cs.LG

    Real-Time Pill Identification for the Visually Impaired Using Deep Learning

    Authors: Bo Dang, Wenchao Zhao, Yufeng Li, Danqing Ma, Qixuan Yu, Elly Yijun Zhu

    Abstract: The prevalence of mobile technology offers unique opportunities for addressing healthcare challenges, especially for individuals with visual impairments. This paper explores the development and implementation of a deep learning-based mobile application designed to assist blind and visually impaired individuals in real-time pill identification. Utilizing the YOLO framework, the application aims to…

    Submitted 7 May, 2024; originally announced May 2024.

  16. arXiv:2405.04771  [pdf, other]

    cs.CV

    Exploring Vision Transformers for 3D Human Motion-Language Models with Motion Patches

    Authors: Qing Yu, Mikihiro Tanaka, Kent Fujiwara

    Abstract: To build a cross-modal latent space between 3D human motion and language, acquiring large-scale and high-quality human motion data is crucial. However, unlike the abundance of image data, the scarcity of motion data has limited the performance of existing motion-language models. To counter this, we introduce "motion patches", a new representation of motion sequences, and propose using Vision Trans…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: Accepted to CVPR 2024, Project website: https://yu1ut.com/MotionPatches-HP/

  17. arXiv:2405.02962  [pdf, other]

    cs.CV

    VectorPainter: A Novel Approach to Stylized Vector Graphics Synthesis with Vectorized Strokes

    Authors: Juncheng Hu, Ximing Xing, Zhengqi Zhang, Jing Zhang, Qian Yu

    Abstract: We propose a novel method, VectorPainter, for the task of stylized vector graphics synthesis. Given a text prompt and a reference style image, VectorPainter generates a vector graphic that aligns in content with the text prompt and remains faithful in style to the reference image. We recognize that the key to this task lies in fully leveraging the intrinsic properties of vector graphics. Innovativ…

    Submitted 5 May, 2024; originally announced May 2024.

  18. arXiv:2405.02615  [pdf, other]

    cs.CR

    TetraBFT: Reducing Latency of Unauthenticated, Responsive BFT Consensus

    Authors: Qianyu Yu, Giuliano Losa, Xuechao Wang

    Abstract: This paper presents TetraBFT, a novel unauthenticated Byzantine fault tolerant protocol for solving consensus in partial synchrony, eliminating the need for public key cryptography and ensuring resilience against computationally unbounded adversaries. TetraBFT has several compelling features: it necessitates only constant local storage, has optimal communication complexity, satisfies optimistic re…

    Submitted 4 May, 2024; originally announced May 2024.

    Comments: The full version of the PODC 2024 paper

  19. arXiv:2405.02288  [pdf, other]

    cs.CV cs.AI cs.RO

    Prospective Role of Foundation Models in Advancing Autonomous Vehicles

    Authors: Jianhua Wu, Bingzhao Gao, Jincheng Gao, Jianhao Yu, Hongqing Chu, Qiankun Yu, Xun Gong, Yi Chang, H. Eric Tseng, Hong Chen, Jie Chen

    Abstract: With the development of artificial intelligence and breakthroughs in deep learning, large-scale Foundation Models (FMs), such as GPT, Sora, etc., have achieved remarkable results in many fields including natural language processing and computer vision. The application of FMs in autonomous driving holds considerable promise. For example, they can contribute to enhancing scene understanding and reas…

    Submitted 17 May, 2024; v1 submitted 8 December, 2023; originally announced May 2024.

    Comments: 45 pages, 8 figures

  20. arXiv:2405.01413  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    MiniGPT-3D: Efficiently Aligning 3D Point Clouds with Large Language Models using 2D Priors

    Authors: Yuan Tang, Xu Han, Xianzhi Li, Qiao Yu, Yixue Hao, Long Hu, Min Chen

    Abstract: Large 2D vision-language models (2D-LLMs) have gained significant attention by bridging Large Language Models (LLMs) with images using a simple projector. Inspired by their success, large 3D point cloud-language models (3D-LLMs) also integrate point clouds into LLMs. However, directly aligning point clouds with LLM requires expensive training costs, typically in hundreds of GPU-hours on A100, whic…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: 17 pages, 9 figures

  21. arXiv:2404.16027  [pdf, other]

    cs.RO

    ORBIT-Surgical: An Open-Simulation Framework for Learning Surgical Augmented Dexterity

    Authors: Qinxi Yu, Masoud Moghani, Karthik Dharmarajan, Vincent Schorp, William Chung-Ho Panitch, Jingzhou Liu, Kush Hari, Huang Huang, Mayank Mittal, Ken Goldberg, Animesh Garg

    Abstract: Physics-based simulations have accelerated progress in robot learning for driving, manipulation, and locomotion. Yet, a fast, accurate, and robust surgical simulation environment remains a challenge. In this paper, we present ORBIT-Surgical, a physics-based surgical robot simulation framework with photorealistic rendering in NVIDIA Omniverse. We provide 14 benchmark surgical tasks for the da Vinci…

    Submitted 24 April, 2024; originally announced April 2024.

  22. arXiv:2404.14037  [pdf, other]

    cs.CV cs.MM

    GaussianTalker: Speaker-specific Talking Head Synthesis via 3D Gaussian Splatting

    Authors: Hongyun Yu, Zhan Qu, Qihang Yu, Jianchuan Chen, Zhonghua Jiang, Zhiwen Chen, Shengyu Zhang, Jimin Xu, Fei Wu, Chengfei Lv, Gang Yu

    Abstract: Recent works on audio-driven talking head synthesis using Neural Radiance Fields (NeRF) have achieved impressive results. However, due to inadequate pose and expression control caused by NeRF implicit representation, these methods still have some limitations, such as unsynchronized or unnatural lip movements, and visual jitter and artifacts. In this paper, we propose GaussianTalker, a novel method…

    Submitted 28 April, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: https://yuhongyun777.github.io/GaussianTalker/

  23. arXiv:2404.08951  [pdf, other]

    cs.CV cs.LG

    Constructing and Exploring Intermediate Domains in Mixed Domain Semi-supervised Medical Image Segmentation

    Authors: Qinghe Ma, Jian Zhang, Lei Qi, Qian Yu, Yinghuan Shi, Yang Gao

    Abstract: Both limited annotation and domain shift are prevalent challenges in medical image segmentation. Traditional semi-supervised segmentation and unsupervised domain adaptation methods address one of these issues separately. However, the coexistence of limited annotation and domain shift is quite common, which motivates us to introduce a novel and challenging scenario: Mixed Domain Semi-supervised med…

    Submitted 13 April, 2024; originally announced April 2024.

  24. arXiv:2404.08639  [pdf, other]

    cs.CV

    COCONut: Modernizing COCO Segmentation

    Authors: Xueqing Deng, Qihang Yu, Peng Wang, Xiaohui Shen, Liang-Chieh Chen

    Abstract: In recent decades, the vision community has witnessed remarkable progress in visual recognition, partially owing to advancements in dataset benchmarks. Notably, the established COCO benchmark has propelled the development of modern detection and segmentation systems. However, the COCO segmentation benchmark has seen comparatively slow improvement over the last decade. Originally equipped with coar…

    Submitted 12 April, 2024; originally announced April 2024.

    Comments: Accepted at CVPR2024, data available at https://xdeng7.github.io/coconut.github.io/

  25. arXiv:2404.07445  [pdf, other]

    cs.CV

    Multi-view Aggregation Network for Dichotomous Image Segmentation

    Authors: Qian Yu, Xiaoqi Zhao, Youwei Pang, Lihe Zhang, Huchuan Lu

    Abstract: Dichotomous Image Segmentation (DIS) has recently emerged towards high-precision object segmentation from high-resolution natural images. When designing an effective DIS model, the main challenge is how to balance the semantic dispersion of high-resolution targets in the small receptive field and the loss of high-precision details in the large receptive field. Existing methods rely on tedious mu…

    Submitted 10 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR2024 as Highlight

  26. arXiv:2404.07234  [pdf, other]

    cs.CR cs.AI cs.CL

    Goal-guided Generative Prompt Injection Attack on Large Language Models

    Authors: Chong Zhang, Mingyu Jin, Qinkai Yu, Chengzhi Liu, Haochen Xue, Xiaobo Jin

    Abstract: Current large language models (LLMs) provide a strong foundation for large-scale user-oriented natural language tasks. A large number of users can easily inject adversarial text or instructions through the user interface, thus causing LLMs model security challenges. Although there is currently a large amount of research on prompt injection attacks, most of these black-box attacks use heuristic str…

    Submitted 6 April, 2024; originally announced April 2024.

    Comments: 22 pages, 8 figures

  27. arXiv:2404.07066  [pdf, other]

    cs.CL cs.AI cs.LG

    Exploring Concept Depth: How Large Language Models Acquire Knowledge at Different Layers?

    Authors: Mingyu Jin, Qinkai Yu, Jingyuan Huang, Qingcheng Zeng, Zhenting Wang, Wenyue Hua, Haiyan Zhao, Kai Mei, Yanda Meng, Kaize Ding, Fan Yang, Mengnan Du, Yongfeng Zhang

    Abstract: Large language models (LLMs) have shown remarkable performances across a wide range of tasks. However, the mechanisms by which these models encode tasks of varying complexities remain poorly understood. In this paper, we explore the hypothesis that LLMs process concepts of varying complexities in different layers, introducing the idea of "Concept Depth" to suggest that more complex concepts are ty…

    Submitted 30 April, 2024; v1 submitted 10 April, 2024; originally announced April 2024.

    Comments: 12 pages

  28. arXiv:2404.03819  [pdf, other]

    cs.CV

    Effective Lymph Nodes Detection in CT Scans Using Location Debiased Query Selection and Contrastive Query Representation in Transformer

    Authors: Qinji Yu, Yirui Wang, Ke Yan, Haoshen Li, Dazhou Guo, Li Zhang, Le Lu, Na Shen, Qifeng Wang, Xiaowei Ding, Xianghua Ye, Dakai Jin

    Abstract: Lymph node (LN) assessment is a critical, indispensable yet very challenging task in the routine clinical workflow of radiology and oncology. Accurate LN analysis is essential for cancer diagnosis, staging, and treatment planning. Finding scatteredly distributed, low-contrast clinically relevant LNs in 3D CT is difficult even for experienced physicians under high inter-observer variations. Previou…

    Submitted 4 April, 2024; originally announced April 2024.

    Comments: Technical report

  29. arXiv:2404.02132  [pdf, other]

    cs.CV

    ViTamin: Designing Scalable Vision Models in the Vision-Language Era

    Authors: Jieneng Chen, Qihang Yu, Xiaohui Shen, Alan Yuille, Liang-Chieh Chen

    Abstract: Recent breakthroughs in vision-language models (VLMs) start a new page in the vision community. The VLMs provide stronger and more generalizable feature embeddings compared to those from ImageNet-pretrained models, thanks to the training on the large-scale Internet image-text pairs. However, despite the amazing achievement from the VLMs, vanilla Vision Transformers (ViTs) remain the default choice…

    Submitted 3 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: CVPR 2024; https://github.com/Beckschen/ViTamin

  30. arXiv:2404.00603  [pdf, other]

    cs.CV

    Weak Distribution Detectors Lead to Stronger Generalizability of Vision-Language Prompt Tuning

    Authors: Kun Ding, Haojian Zhang, Qiang Yu, Ying Wang, Shiming Xiang, Chunhong Pan

    Abstract: We propose a generalized method for boosting the generalization ability of pre-trained vision-language models (VLMs) while fine-tuning on downstream few-shot tasks. The idea is realized by exploiting out-of-distribution (OOD) detection to predict whether a sample belongs to a base distribution or a novel distribution and then using the score generated by a dedicated competition based scoring funct…

    Submitted 31 March, 2024; originally announced April 2024.

    Comments: Accepted by AAAI2024

  31. arXiv:2403.20331  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models

    Authors: Atsuyuki Miyai, Jingkang Yang, Jingyang Zhang, Yifei Ming, Qing Yu, Go Irie, Yixuan Li, Hai Li, Ziwei Liu, Kiyoharu Aizawa

    Abstract: This paper introduces a novel and significant challenge for Vision Language Models (VLMs), termed Unsolvable Problem Detection (UPD). UPD examines the VLM's ability to withhold answers when faced with unsolvable problems in the context of Visual Question Answering (VQA) tasks. UPD encompasses three distinct settings: Absent Answer Detection (AAD), Incompatible Answer Set Detection (IASD), and Inco…

    Submitted 29 March, 2024; originally announced March 2024.

    Comments: Code: https://github.com/AtsuMiyai/UPD

  32. arXiv:2403.17782  [pdf, other]

    cs.CV cs.GR

    GenesisTex: Adapting Image Denoising Diffusion to Texture Space

    Authors: Chenjian Gao, Boyan Jiang, Xinghui Li, Yingpeng Zhang, Qian Yu

    Abstract: We present GenesisTex, a novel method for synthesizing textures for 3D geometries from text descriptions. GenesisTex adapts the pretrained image diffusion model to texture space by texture space sampling. Specifically, we maintain a latent texture map for each viewpoint, which is updated with predicted noise on the rendering of the corresponding viewpoint. The sampled latent texture maps are then…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: 12 pages, 10 figures

  33. arXiv:2403.16023  [pdf, other]

    cs.RO cs.AI cs.CV

    RPMArt: Towards Robust Perception and Manipulation for Articulated Objects

    Authors: Junbo Wang, Wenhai Liu, Qiaojun Yu, Yang You, Liu Liu, Weiming Wang, Cewu Lu

    Abstract: Articulated objects are commonly found in daily life. It is essential that robots can exhibit robust perception and manipulation skills for articulated objects in real-world robotic applications. However, existing methods for articulated objects insufficiently address noise in point clouds and struggle to bridge the gap between simulation and reality, thus limiting the practical deployment in real…

    Submitted 24 March, 2024; originally announced March 2024.

    Comments: 8 pages, 7 figures, submitted to 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2024), project website at https://r-pmart.github.io

  34. arXiv:2403.13365  [pdf, other]

    cs.RO cs.CV

    ManiPose: A Comprehensive Benchmark for Pose-aware Object Manipulation in Robotics

    Authors: Qiaojun Yu, Ce Hao, Junbo Wang, Wenhai Liu, Liu Liu, Yao Mu, Yang You, Hengxu Yan, Cewu Lu

    Abstract: Robotic manipulation in everyday scenarios, especially in unstructured environments, requires skills in pose-aware object manipulation (POM), which adapts robots' grasping and handling according to an object's 6D pose. Recognizing an object's position and orientation is crucial for effective manipulation. For example, if a mug is lying on its side, it's more effective to grasp it by the rim rather…

    Submitted 20 March, 2024; originally announced March 2024.

    Comments: 8 pages, 7 figures, submitted to 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2024)

  35. arXiv:2403.11631  [pdf, other]

    cs.CV

    Compositional Kronecker Context Optimization for Vision-Language Models

    Authors: Kun Ding, Xiaohui Li, Qiang Yu, Ying Wang, Haojian Zhang, Shiming Xiang

    Abstract: Context Optimization (CoOp) has emerged as a simple yet effective technique for adapting CLIP-like vision-language models to downstream image recognition tasks. Nevertheless, learning compact context with satisfactory base-to-new, domain and cross-task generalization ability while adapting to new tasks is still a challenge. To tackle such a challenge, we propose a lightweight yet generalizable app…

    Submitted 18 March, 2024; originally announced March 2024.

  36. arXiv:2403.11229  [pdf, other]

    cs.CV

    Concatenate, Fine-tuning, Re-training: A SAM-enabled Framework for Semi-supervised 3D Medical Image Segmentation

    Authors: Shumeng Li, Lei Qi, Qian Yu, Jing Huo, Yinghuan Shi, Yang Gao

    Abstract: Segment Anything Model (SAM) fine-tuning has shown remarkable performance in medical image segmentation in a fully supervised manner, but requires precise annotations. To reduce the annotation cost and maintain satisfactory performance, in this work, we leverage the capabilities of SAM for establishing semi-supervised medical image segmentation models. Rethinking the requirements of effectiveness,…

    Submitted 17 March, 2024; originally announced March 2024.

  37. arXiv:2403.05059  [pdf, other]

    cs.SE

    Bug Priority Change: An Empirical Study on Apache Projects

    Authors: Zengyang Li, Guangzong Cai, Qinyi Yu, Peng Liang, Ran Mo, Hui Liu

    Abstract: In issue tracking systems, each bug is assigned a priority level (e.g., Blocker, Critical, Major, Minor, or Trivial in JIRA from highest to lowest), which indicates the urgency level of the bug. In this sense, understanding bug priority changes helps to arrange the work schedule of participants reasonably, and facilitates a better analysis and resolution of bugs. According to the data extracted fr…

    Submitted 8 March, 2024; originally announced March 2024.

    Comments: Preprint accepted for publication in Journal of Systems and Software, 2024

  38. arXiv:2402.19473  [pdf, other]

    cs.CV

    Retrieval-Augmented Generation for AI-Generated Content: A Survey

    Authors: Penghao Zhao, Hailin Zhang, Qinhan Yu, Zhengren Wang, Yunteng Geng, Fangcheng Fu, Ling Yang, Wentao Zhang, Jie Jiang, Bin Cui

    Abstract: Advancements in model algorithms, the growth of foundational models, and access to high-quality datasets have propelled the evolution of Artificial Intelligence Generated Content (AIGC). Despite its notable successes, AIGC still faces hurdles such as updating knowledge, handling long-tail data, mitigating data leakage, and managing high training and inference costs. Retrieval-Augmented Generation…

    Submitted 21 June, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

    Comments: Citing 353 papers, 22 pages, 1 table, 12 figures. Project: https://github.com/PKU-DAIR/RAG-Survey

  39. arXiv:2402.18411  [pdf, other]

    cs.CV

    Unsupervised Cross-Domain Image Retrieval via Prototypical Optimal Transport

    Authors: Bin Li, Ye Shi, Qian Yu, Jingya Wang

    Abstract: Unsupervised cross-domain image retrieval (UCIR) aims to retrieve images sharing the same category across diverse domains without relying on labeled data. Prior approaches have typically decomposed the UCIR problem into two distinct tasks: intra-domain representation learning and cross-domain feature alignment. However, these segregated strategies overlook the potential synergies between these tas…

    Submitted 24 March, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

    Comments: Accepted by AAAI2024

  40. arXiv:2402.16117  [pdf, other]

    cs.RO cs.AI cs.CV

    RoboCodeX: Multimodal Code Generation for Robotic Behavior Synthesis

    Authors: Yao Mu, Junting Chen, Qinglong Zhang, Shoufa Chen, Qiaojun Yu, Chongjian Ge, Runjian Chen, Zhixuan Liang, Mengkang Hu, Chaofan Tao, Peize Sun, Haibao Yu, Chao Yang, Wenqi Shao, Wenhai Wang, Jifeng Dai, Yu Qiao, Mingyu Ding, Ping Luo

    Abstract: Robotic behavior synthesis, the problem of understanding multimodal inputs and generating precise physical control for robots, is an important part of Embodied AI. Despite successes in applying multimodal large language models for high-level understanding, it remains challenging to translate these conceptual understandings into detailed robotic actions while achieving generalization across various…

    Submitted 25 February, 2024; originally announced February 2024.

  41. arXiv:2402.14623  [pdf, other]

    cs.RO cs.AI cs.CL cs.CV

    RoboScript: Code Generation for Free-Form Manipulation Tasks across Real and Simulation

    Authors: Junting Chen, Yao Mu, Qiaojun Yu, Tianming Wei, Silang Wu, Zhecheng Yuan, Zhixuan Liang, Chao Yang, Kaipeng Zhang, Wenqi Shao, Yu Qiao, Huazhe Xu, Mingyu Ding, Ping Luo

    Abstract: Rapid progress in high-level task planning and code generation for open-world robot manipulation has been witnessed in Embodied AI. However, previous studies put much effort into general common sense reasoning and task planning capabilities of large-scale language or multi-modal models, relatively little effort on ensuring the deployability of generated code on real robots, and other fundamental c…

    Submitted 22 February, 2024; originally announced February 2024.

    Comments: 10 pages of main paper, 4 pages of appendix; 10 figures in main paper, 3 figures in appendix

    ACM Class: I.2.7; I.2.8; I.2.9; I.2.10

  42. arXiv:2402.10835  [pdf, other]

    cs.CL

    Time Series Forecasting with LLMs: Understanding and Enhancing Model Capabilities

    Authors: Mingyu Jin, Hua Tang, Chong Zhang, Qinkai Yu, Chengzhi Liu, Suiyuan Zhu, Yongfeng Zhang, Mengnan Du

    Abstract: Large language models (LLMs) have been applied in many fields with rapid development in recent years. As a classic machine learning task, time series forecasting has recently received a boost from LLMs. However, there is a research gap in the LLMs' preferences in this field. In this paper, by comparing LLMs with traditional models, many properties of LLMs in time series prediction are found. For e…

    Submitted 18 February, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

  43. arXiv:2402.04252  [pdf, other]

    cs.CV

    EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

    Authors: Quan Sun, Jinsheng Wang, Qiying Yu, Yufeng Cui, Fan Zhang, Xiaosong Zhang, Xinlong Wang

    Abstract: Scaling up contrastive language-image pretraining (CLIP) is critical for empowering both vision and multimodal models. We present EVA-CLIP-18B, the largest and most powerful open-source CLIP model to date, with 18-billion parameters. With only 6-billion training samples seen, EVA-CLIP-18B achieves an exceptional 80.7% zero-shot top-1 accuracy averaged across 27 widely recognized image classificati…

    Submitted 6 February, 2024; originally announced February 2024.

  44. arXiv:2402.00746  [pdf, other]

    cs.CL

    Health-LLM: Personalized Retrieval-Augmented Disease Prediction System

    Authors: Mingyu Jin, Qinkai Yu, Dong Shu, Chong Zhang, Lizhou Fan, Wenyue Hua, Suiyuan Zhu, Yanda Meng, Zhenting Wang, Mengnan Du, Yongfeng Zhang

    Abstract: Recent advancements in artificial intelligence (AI), especially large language models (LLMs), have significantly advanced healthcare applications and demonstrated potentials in intelligent medical treatment. However, there are conspicuous challenges such as vast data volumes and inconsistent symptom characterization standards, preventing full integration of healthcare AI systems with individual pa…

    Submitted 19 March, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

  45. arXiv:2401.15657  [pdf, other]

    cs.CV

    Data-Free Generalized Zero-Shot Learning

    Authors: Bowen Tang, Long Yan, Jing Zhang, Qian Yu, Lu Sheng, Dong Xu

    Abstract: Deep learning models have the ability to extract rich knowledge from large-scale datasets. However, the sharing of data has become increasingly challenging due to concerns regarding data copyright and privacy. Consequently, this hampers the effective transfer of knowledge from existing data to novel downstream tasks and concepts. Zero-shot learning (ZSL) approaches aim to recognize new classes by…

    Submitted 28 January, 2024; originally announced January 2024.

    Comments: Accepted by AAAI24

  46. arXiv:2401.11447  [pdf, other]

    cs.LG q-bio.QM

    Sequential Model for Predicting Patient Adherence in Subcutaneous Immunotherapy for Allergic Rhinitis

    Authors: Yin Li, Yu Xiong, Wenxin Fan, Kai Wang, Qingqing Yu, Liping Si, Patrick van der Smagt, Jun Tang, Nutan Chen

    Abstract: Objective: Subcutaneous Immunotherapy (SCIT) is the long-lasting causal treatment of allergic rhinitis (AR). How to enhance the adherence of patients to maximize the benefit of allergen immunotherapy (AIT) plays a crucial role in the management of AIT. This study aims to leverage novel machine learning models to precisely predict the risk of non-adherence of AR patients and related local symptom s…

    Submitted 28 June, 2024; v1 submitted 21 January, 2024; originally announced January 2024.

    Comments: Frontiers in Pharmacology, research topic: Methods and Metrics to Measure Medication Adherence

  47. arXiv:2401.10556  [pdf, other]

    cs.CV cs.GR

    Symbol as Points: Panoptic Symbol Spotting via Point-based Representation

    Authors: Wenlong Liu, Tianyu Yang, Yuhan Wang, Qizhi Yu, Lei Zhang

    Abstract: This work studies the problem of panoptic symbol spotting, which is to spot and parse both countable object instances (windows, doors, tables, etc.) and uncountable stuff (wall, railing, etc.) from computer-aided design (CAD) drawings. Existing methods typically involve either rasterizing the vector graphics into images and using image-based methods for symbol spotting, or directly building graphs…

    Submitted 19 January, 2024; originally announced January 2024.

    Comments: ICLR 2024

  48. arXiv:2401.04925  [pdf, other]

    cs.CL cs.AI

    The Impact of Reasoning Step Length on Large Language Models

    Authors: Mingyu Jin, Qinkai Yu, Dong Shu, Haiyan Zhao, Wenyue Hua, Yanda Meng, Yongfeng Zhang, Mengnan Du

    Abstract: Chain of Thought (CoT) is significant in improving the reasoning abilities of large language models (LLMs). However, the correlation between the effectiveness of CoT and the length of reasoning steps in prompts remains largely unknown. To shed light on this, we have conducted several empirical experiments to explore the relations. Specifically, we design experiments that expand and compress the ra…

    Submitted 22 June, 2024; v1 submitted 9 January, 2024; originally announced January 2024.

    Comments: Findings of ACL 2024

  49. arXiv:2401.03385  [pdf, other]

    cs.CL

    Grimoire is All You Need for Enhancing Large Language Models

    Authors: Ding Chen, Shichao Song, Qingchen Yu, Zhiyu Li, Wenjin Wang, Feiyu Xiong, Bo Tang

    Abstract: In-context Learning (ICL) is one of the key methods for enhancing the performance of large language models on specific tasks by providing a set of few-shot examples. However, the ICL capability of different types of models shows significant variation due to factors such as model architecture, volume of learning data, and the size of parameters. Generally, the larger the model's parameter size and…

    Submitted 10 January, 2024; v1 submitted 6 January, 2024; originally announced January 2024.

    Comments: 9 pages

  50. arXiv:2401.01673  [pdf, other]

    cs.IT eess.SP

    Coded Beam Training

    Authors: Tianyue Zheng, Jieao Zhu, Qiumo Yu, Yongli Yan, Linglong Dai

    Abstract: In extremely large-scale multiple input multiple output (XL-MIMO) systems for future sixth-generation (6G) communications, codebook-based beam training stands out as a promising technology to acquire channel state information (CSI). Despite their effectiveness, when the pilot overhead is limited, existing beam training methods suffer from significant achievable rate degradation for remote users wi…

    Submitted 6 March, 2024; v1 submitted 3 January, 2024; originally announced January 2024.

    Comments: In this paper, we introduce channel coding theory into hierarchical beam training and propose a beam training scheme called coded beam training. By leveraging the error-correcting capability of channel codes, the proposed coded beam training method can enable reliable beam training performance for remote users with low SNR, while keeping training overhead low