
Showing 1–50 of 615 results for author: Yu, Q

  1. arXiv:2407.11356  [pdf, other]

    cs.CV

    The Devil is in the Statistics: Mitigating and Exploiting Statistics Difference for Generalizable Semi-supervised Medical Image Segmentation

    Authors: Muyang Qiu, Jian Zhang, Lei Qi, Qian Yu, Yinghuan Shi, Yang Gao

    Abstract: Despite the recent success of domain generalization in medical image segmentation, voxel-wise annotation for all source domains remains a huge burden. Semi-supervised domain generalization has been proposed very recently to combat this challenge by leveraging limited labeled data along with abundant unlabeled data collected from multiple medical institutions, depending on precisely harnessing unla…

    Submitted 15 July, 2024; originally announced July 2024.

  2. arXiv:2407.11282  [pdf, other]

    cs.CL

    Uncertainty is Fragile: Manipulating Uncertainty in Large Language Models

    Authors: Qingcheng Zeng, Mingyu Jin, Qinkai Yu, Zhenting Wang, Wenyue Hua, Zihao Zhou, Guangyan Sun, Yanda Meng, Shiqing Ma, Qifan Wang, Felix Juefei-Xu, Kaize Ding, Fan Yang, Ruixiang Tang, Yongfeng Zhang

    Abstract: Large Language Models (LLMs) are employed across various high-stakes domains, where the reliability of their outputs is crucial. One commonly used method to assess the reliability of LLMs' responses is uncertainty estimation, which gauges the likelihood of their answers being correct. While many studies focus on improving the accuracy of uncertainty estimations for LLMs, our research investigates…

    Submitted 15 July, 2024; originally announced July 2024.

  3. arXiv:2407.11100  [pdf, other]

    cs.CR cs.AI cs.CL

    Building Intelligence Identification System via Large Language Model Watermarking: A Survey and Beyond

    Authors: Xuhong Wang, Haoyu Jiang, Yi Yu, Jingru Yu, Yilun Lin, Ping Yi, Yingchun Wang, Qiao Yu, Li Li, Fei-Yue Wang

    Abstract: Large Language Models (LLMs) are increasingly integrated into diverse industries, posing substantial security risks due to unauthorized replication and misuse. To mitigate these concerns, robust identification mechanisms are widely acknowledged as an effective strategy. Identification systems for LLMs now rely heavily on watermarking technology to manage and protect intellectual property and…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: 59 pages, 7 figures

  4. arXiv:2407.10827  [pdf, other]

    cs.LG cs.CL

    LLM Circuit Analyses Are Consistent Across Training and Scale

    Authors: Curt Tigges, Michael Hanna, Qinan Yu, Stella Biderman

    Abstract: Most currently deployed large language models (LLMs) undergo continuous training or additional finetuning. By contrast, most research into LLMs' internal mechanisms focuses on models at one snapshot in time (the end of pre-training), raising the question of whether their results generalize to real-world settings. Existing studies of mechanisms over time focus on encoder-only or toy models, which d…

    Submitted 15 July, 2024; originally announced July 2024.

  5. FedVAE: Trajectory privacy preserving based on Federated Variational AutoEncoder

    Authors: Yuchen Jiang, Ying Wu, Shiyao Zhang, James J. Q. Yu

    Abstract: The use of trajectory data with abundant spatial-temporal information is pivotal in Intelligent Transport Systems (ITS) and various traffic system tasks. Location-Based Services (LBS) capitalize on this trajectory data to offer users personalized services tailored to their location information. However, this trajectory data contains sensitive information about users' movement patterns and habits,…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: 2023 IEEE 98th Vehicular Technology Conference

  6. arXiv:2407.07723  [pdf, other]

    cs.IT cs.AI

    Understanding is Compression

    Authors: Ziguang Li, Chao Huang, Xuliang Wang, Haibo Hu, Cole Wyeth, Dongbo Bu, Quan Yu, Wen Gao, Xingwu Liu, Ming Li

    Abstract: We have previously shown that all understanding or learning is compression, under reasonable assumptions. In principle, better understanding of data should improve data compression. Traditional compression methodologies focus on encoding frequencies or some other computable properties of data. Large language models approximate the uncomputable Solomonoff distribution, opening up a whole new avenue to…

    Submitted 23 June, 2024; originally announced July 2024.

  7. arXiv:2407.05309  [pdf, other]

    math.DS

    Unfolding a Hopf bifurcation in a linear reaction-diffusion equation with strongly localized impurity existence of breathing pulses

    Authors: Ji Li, Qing Yu, Qian Zhang

    Abstract: This paper presents a general framework to derive the weakly nonlinear stability near a Hopf bifurcation in a special class of multi-scale reaction-diffusion equations. The main focus is on how the linearity and nonlinearity of the fast variables in the system influence the emergence of the breathing pulses when the slow variables are linear and the bifurcation parameter is around the Hopf bifurcation…

    Submitted 7 July, 2024; originally announced July 2024.

  8. arXiv:2407.04068  [pdf, other]

    cs.CV

    CLIP-DR: Textual Knowledge-Guided Diabetic Retinopathy Grading with Ranking-aware Prompting

    Authors: Qinkai Yu, Jianyang Xie, Anh Nguyen, He Zhao, Jiong Zhang, Huazhu Fu, Yitian Zhao, Yalin Zheng, Yanda Meng

    Abstract: Diabetic retinopathy (DR) is a complication of diabetes and usually takes decades to reach sight-threatening levels. Accurate and robust detection of DR severity is critical for the timely management and treatment of diabetes. However, most current DR grading methods suffer from insufficient robustness to data variability (e.g., colour fundus images), posing a significant difficulty for ac…

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: Accepted by MICCAI 2024

  9. arXiv:2407.03546  [pdf, other]

    math.PR

    Exponential Euler method for stiff SDEs driven by fractional Brownian motion

    Authors: Haozhe Chen, Zhaotong Shen, Qian Yu

    Abstract: In a recent paper by Kamrani et al. (2024), the exponential Euler method for stiff stochastic differential equations with additive fractional Brownian noise was discussed, and a convergence order close to the Hurst parameter H was proved. Utilizing the technique of the Malliavin derivative, we prove the exponential Euler scheme and obtain a convergence order of one, which is the optimal rate in numerica…

    Submitted 3 July, 2024; originally announced July 2024.

  10. arXiv:2407.01928  [pdf, other]

    cs.CV

    SymPoint Revolutionized: Boosting Panoptic Symbol Spotting with Layer Feature Enhancement

    Authors: Wenlong Liu, Tianyu Yang, Qizhi Yu, Lei Zhang

    Abstract: SymPoint is an initial attempt that utilizes point set representation to solve the panoptic symbol spotting task on CAD drawings. Despite its considerable success, it overlooks graphical layer information and suffers from prohibitively slow training convergence. To tackle these issues, we introduce SymPoint-V2, a robust and efficient solution featuring novel, streamlined designs that overcome these l…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: code at https://github.com/nicehuster/SymPointV2

  11. Small Aerial Target Detection for Airborne Infrared Detection Systems using LightGBM and Trajectory Constraints

    Authors: Xiaoliang Sun, Liangchao Guo, Wenlong Zhang, Zi Wang, Qifeng Yu

    Abstract: Factors such as rapid relative motion and cluttered backgrounds make robust small aerial target detection for airborne infrared detection systems a challenge. Existing methods face difficulties when dealing with such cases. We consider that a continuous and smooth trajectory is critical in boosting small infrared aerial target detection performance. A simple and effective small aerial targ…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 15 pages, 10 figures

    Journal ref: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 14, 9959–9973 (2021)

  12. arXiv:2406.19617  [pdf, ps, other]

    cs.LG cs.IT math.OC

    Stochastic Zeroth-Order Optimization under Strongly Convexity and Lipschitz Hessian: Minimax Sample Complexity

    Authors: Qian Yu, Yining Wang, Baihe Huang, Qi Lei, Jason D. Lee

    Abstract: Optimization of convex functions under stochastic zeroth-order feedback has been a major and challenging question in online learning. In this work, we consider the problem of optimizing second-order smooth and strongly convex functions where the algorithm only has access to noisy evaluations of the objective function it queries. We provide the first tight characterization for the rate of the mi…

    Submitted 27 June, 2024; originally announced June 2024.

  13. arXiv:2406.17278  [pdf, other]

    stat.ME econ.EM math.ST

    Estimation and Inference for CP Tensor Factor Models

    Authors: Bin Chen, Yuefeng Han, Qiyang Yu

    Abstract: High-dimensional tensor-valued data have recently gained attention from researchers in economics and finance. We consider the estimation and inference of high-dimensional tensor factor models, where each dimension of the tensor diverges. Our focus is on a factor model that admits CP-type tensor decomposition, which allows for non-orthogonal loading vectors. Based on the contemporary covariance mat…

    Submitted 25 June, 2024; originally announced June 2024.

  14. arXiv:2406.16905  [pdf]

    cs.LG cs.AI

    Optimising Random Forest Machine Learning Algorithms for User VR Experience Prediction Based on Iterative Local Search-Sparrow Search Algorithm

    Authors: Xirui Tang, Feiyang Li, Zinan Cao, Qixuan Yu, Yulu Gong

    Abstract: In this paper, an improved method for VR user experience prediction is investigated by introducing a sparrow search algorithm and a random forest algorithm improved by an iterative local search-optimised sparrow search algorithm. The study first conducted a statistical analysis of the data, and then trained and tested using the traditional random forest model, the random forest model improved by…

    Submitted 3 June, 2024; originally announced June 2024.

  15. arXiv:2406.15811  [pdf, other]

    cs.CV

    PointDreamer: Zero-shot 3D Textured Mesh Reconstruction from Colored Point Cloud by 2D Inpainting

    Authors: Qiao Yu, Xianzhi Li, Yuan Tang, Jinfeng Xu, Long Hu, Yixue Hao, Min Chen

    Abstract: Reconstructing textured meshes from colored point clouds is an important but challenging task in 3D graphics and vision. Most existing methods predict colors as implicit functions in 3D or UV space, suffering from blurry textures or a lack of generalization capability. To address this, we propose PointDreamer, a novel framework for textured mesh reconstruction from colored point clouds. It produc…

    Submitted 22 June, 2024; originally announced June 2024.

  16. arXiv:2406.09416  [pdf, other]

    cs.CV

    Alleviating Distortion in Image Generation via Multi-Resolution Diffusion Models

    Authors: Qihao Liu, Zhanpeng Zeng, Ju He, Qihang Yu, Xiaohui Shen, Liang-Chieh Chen

    Abstract: This paper presents innovative enhancements to diffusion models by integrating a novel multi-resolution network and time-dependent layer normalization. Diffusion models have gained prominence for their effectiveness in high-fidelity image generation. While conventional approaches rely on convolutional U-Net architectures, recent Transformer-based designs have demonstrated superior performance and…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Introducing DiMR, a new diffusion backbone that surpasses all existing image generation models of various sizes on ImageNet 256 with only 505M parameters. Project page: https://qihao067.github.io/projects/DiMR

  17. arXiv:2406.07550  [pdf, other]

    cs.CV

    An Image is Worth 32 Tokens for Reconstruction and Generation

    Authors: Qihang Yu, Mark Weber, Xueqing Deng, Xiaohui Shen, Daniel Cremers, Liang-Chieh Chen

    Abstract: Recent advancements in generative models have highlighted the crucial role of image tokenization in the efficient synthesis of high-resolution images. Tokenization, which transforms images into latent representations, reduces computational demands compared to directly processing pixels and enhances the effectiveness and efficiency of the generation process. Prior methods, such as VQGAN, typically…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: A compact 1D Image Tokenization method, leading to SOTA generation performance while being substantially faster. Project page at https://yucornetto.github.io/projects/titok.html

  18. arXiv:2406.06792  [pdf, other]

    cs.LG cs.AI

    Reinforced Compressive Neural Architecture Search for Versatile Adversarial Robustness

    Authors: Dingrong Wang, Hitesh Sapkota, Zhiqiang Tao, Qi Yu

    Abstract: Prior works on neural architecture search (NAS) for adversarial robustness have discovered that a lightweight and adversarially robust neural network architecture could exist in a non-robust large teacher network, generally disclosed by heuristic rules through statistical analysis and neural architecture search. However, heuristi…

    Submitted 13 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

    Comments: 17 pages

  19. arXiv:2406.05354  [pdf, other]

    cs.AR cs.AI cs.DC

    Investigating Memory Failure Prediction Across CPU Architectures

    Authors: Qiao Yu, Wengui Zhang, Min Zhou, Jialiang Yu, Zhenli Sheng, Jasmin Bogatinovski, Jorge Cardoso, Odej Kao

    Abstract: Large-scale datacenters often experience memory failures, where Uncorrectable Errors (UEs) highlight critical malfunctions in Dual Inline Memory Modules (DIMMs). Existing approaches primarily utilize Correctable Errors (CEs) to predict UEs, yet they typically neglect how these errors vary between different CPU architectures, especially in terms of Error Correction Code (ECC) applicability. In this…

    Submitted 8 June, 2024; originally announced June 2024.

    Comments: Accepted by 2024 54th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), Industry Track

  20. arXiv:2406.03866  [pdf, other]

    cs.CV

    LLplace: The 3D Indoor Scene Layout Generation and Editing via Large Language Model

    Authors: Yixuan Yang, Junru Lu, Zixiang Zhao, Zhen Luo, James J. Q. Yu, Victor Sanchez, Feng Zheng

    Abstract: Designing 3D indoor layouts is a crucial task with significant applications in virtual reality, interior design, and automated space planning. Existing methods for 3D layout design either rely on diffusion models, which utilize spatial relationship priors, or heavily leverage the inferential capabilities of proprietary Large Language Models (LLMs), which require extensive prompt engineering and in…

    Submitted 6 June, 2024; originally announced June 2024.

  21. arXiv:2406.02541  [pdf, other]

    cs.CV

    Enhancing Temporal Consistency in Video Editing by Reconstructing Videos with 3D Gaussian Splatting

    Authors: Inkyu Shin, Qihang Yu, Xiaohui Shen, In So Kweon, Kuk-Jin Yoon, Liang-Chieh Chen

    Abstract: Recent advancements in zero-shot video diffusion models have shown promise for text-driven video editing, but challenges remain in achieving high temporal consistency. To address this, we introduce Video-3DGS, a 3D Gaussian Splatting (3DGS)-based video refiner designed to enhance temporal consistency in zero-shot video editors. Our approach utilizes a two-stage 3D Gaussian optimizing process tailo…

    Submitted 5 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: Project page at https://video-3dgs-project.github.io/

  22. arXiv:2406.01151  [pdf, other]

    cs.AR

    A 0.96pJ/SOP, 30.23K-neuron/mm^2 Heterogeneous Neuromorphic Chip With Fullerene-like Interconnection Topology for Edge-AI Computing

    Authors: P. J. Zhou, Q. Yu, M. Chen, Y. C. Wang, L. W. Meng, Y. Zuo, N. Ning, Y. Liu, S. G. Hu, G. C. Qiao

    Abstract: Edge-AI computing requires high energy efficiency, low power consumption, relatively high flexibility, and a compact area, which challenges AI-chip design. This work presents a 0.96 pJ/SOP heterogeneous neuromorphic system-on-chip (SoC) with fullerene-like interconnection topology for edge-AI computing. The neuromorphic core integrates different technologies to augment computing energy efficiency,…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: 5 pages, 8 figures

  23. arXiv:2405.15519  [pdf]

    physics.optics eess.IV

    Confocal structured illumination microscopy

    Authors: Weishuai Zhou, Manhong Yao, Xi Lin, Quan Yu, Junzheng Peng, Jingang Zhong

    Abstract: Confocal microscopy, a critical advancement in optical imaging, is widely applied because of its excellent anti-noise ability. However, it has low imaging efficiency and can cause phototoxicity. Optical-sectioning structured illumination microscopy (OS-SIM) can overcome the limitations of confocal microscopy but still faces challenges in imaging depth and signal-to-noise ratio (SNR). We introduce t…

    Submitted 24 May, 2024; originally announced May 2024.

  24. arXiv:2405.11874  [pdf, other]

    cs.CL

    xFinder: Robust and Pinpoint Answer Extraction for Large Language Models

    Authors: Qingchen Yu, Zifan Zheng, Shichao Song, Zhiyu Li, Feiyu Xiong, Bo Tang, Ding Chen

    Abstract: The continuous advancement of large language models (LLMs) has brought increasing attention to the critical issue of developing fair and reliable methods for evaluating their performance. In particular, the emergence of subjective or non-subjective cheating phenomena, such as test set leakage and prompt format overfitting, poses significant challenges to the reliable evaluation of LLMs. Since evalu…

    Submitted 23 May, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

    Comments: 37 pages

  25. arXiv:2405.11734  [pdf, other]

    cs.IT

    Finite Field Multiple Access for Sourced Massive Random Access with Finite Blocklength

    Authors: Qi-yue Yu, Shi-wen Lin, Shu Lin

    Abstract: For binary source transmission, this paper proposes an element-pair (EP) coding scheme for supporting sourced massive random access, which is used to solve the problem of reliable multiuser transmission with finite blocklength (FBL). In this paper, we first give the definition of an EP, which is used as a virtual resource. If the Cartesian product of $J$ distinct EPs satisfies the unique sum-pattern…

    Submitted 19 May, 2024; originally announced May 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2303.14086

  26. arXiv:2405.05983  [pdf]

    cs.CV cs.AI cs.LG

    Real-Time Pill Identification for the Visually Impaired Using Deep Learning

    Authors: Bo Dang, Wenchao Zhao, Yufeng Li, Danqing Ma, Qixuan Yu, Elly Yijun Zhu

    Abstract: The prevalence of mobile technology offers unique opportunities for addressing healthcare challenges, especially for individuals with visual impairments. This paper explores the development and implementation of a deep learning-based mobile application designed to assist blind and visually impaired individuals in real-time pill identification. Utilizing the YOLO framework, the application aims to…

    Submitted 7 May, 2024; originally announced May 2024.

  27. arXiv:2405.04771  [pdf, other]

    cs.CV

    Exploring Vision Transformers for 3D Human Motion-Language Models with Motion Patches

    Authors: Qing Yu, Mikihiro Tanaka, Kent Fujiwara

    Abstract: To build a cross-modal latent space between 3D human motion and language, acquiring large-scale and high-quality human motion data is crucial. However, unlike the abundance of image data, the scarcity of motion data has limited the performance of existing motion-language models. To counter this, we introduce "motion patches", a new representation of motion sequences, and propose using Vision Trans…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: Accepted to CVPR 2024, Project website: https://yu1ut.com/MotionPatches-HP/

  28. arXiv:2405.02962  [pdf, other]

    cs.CV

    VectorPainter: A Novel Approach to Stylized Vector Graphics Synthesis with Vectorized Strokes

    Authors: Juncheng Hu, Ximing Xing, Zhengqi Zhang, Jing Zhang, Qian Yu

    Abstract: We propose a novel method, VectorPainter, for the task of stylized vector graphics synthesis. Given a text prompt and a reference style image, VectorPainter generates a vector graphic that aligns in content with the text prompt and remains faithful in style to the reference image. We recognize that the key to this task lies in fully leveraging the intrinsic properties of vector graphics. Innovativ…

    Submitted 5 May, 2024; originally announced May 2024.

  29. arXiv:2405.02615  [pdf, other]

    cs.CR

    TetraBFT: Reducing Latency of Unauthenticated, Responsive BFT Consensus

    Authors: Qianyu Yu, Giuliano Losa, Xuechao Wang

    Abstract: This paper presents TetraBFT, a novel unauthenticated Byzantine fault tolerant protocol for solving consensus in partial synchrony, eliminating the need for public key cryptography and ensuring resilience against computationally unbounded adversaries. TetraBFT has several compelling features: it necessitates only constant local storage, has optimal communication complexity, satisfies optimistic re…

    Submitted 4 May, 2024; originally announced May 2024.

    Comments: The full version of the PODC 2024 paper

  30. arXiv:2405.02288  [pdf, other]

    cs.CV cs.AI cs.RO

    Prospective Role of Foundation Models in Advancing Autonomous Vehicles

    Authors: Jianhua Wu, Bingzhao Gao, Jincheng Gao, Jianhao Yu, Hongqing Chu, Qiankun Yu, Xun Gong, Yi Chang, H. Eric Tseng, Hong Chen, Jie Chen

    Abstract: With the development of artificial intelligence and breakthroughs in deep learning, large-scale Foundation Models (FMs), such as GPT, Sora, etc., have achieved remarkable results in many fields including natural language processing and computer vision. The application of FMs in autonomous driving holds considerable promise. For example, they can contribute to enhancing scene understanding and reas…

    Submitted 17 May, 2024; v1 submitted 8 December, 2023; originally announced May 2024.

    Comments: 45 pages, 8 figures

  31. arXiv:2405.01413  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    MiniGPT-3D: Efficiently Aligning 3D Point Clouds with Large Language Models using 2D Priors

    Authors: Yuan Tang, Xu Han, Xianzhi Li, Qiao Yu, Yixue Hao, Long Hu, Min Chen

    Abstract: Large 2D vision-language models (2D-LLMs) have gained significant attention by bridging Large Language Models (LLMs) with images using a simple projector. Inspired by their success, large 3D point cloud-language models (3D-LLMs) also integrate point clouds into LLMs. However, directly aligning point clouds with LLMs incurs high training costs, typically hundreds of GPU-hours on A100, whic…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: 17 pages, 9 figures

  32. arXiv:2404.18999  [pdf, other]

    astro-ph.GA

    CO Observations of Early-mid Stage Major-mergers in MaNGA Survey

    Authors: Qingzheng Yu, Taotao Fang, Cong Kevin Xu, Shuai Feng, Siyi Feng, Yu Gao, Xue-Jian Jiang, Ute Lisenfeld

    Abstract: We present a study of the molecular gas in early-mid stage major-mergers, with a sample of 43 major-merger galaxy pairs selected from the Mapping Nearby Galaxies at Apache Point Observatory (MaNGA) survey and a control sample of 195 isolated galaxies selected from the xCOLD GASS survey. Adopting kinematic asymmetry as a new effective indicator to describe the merger stage, we aim to study the role…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: 25 pages, 12 figures, 5 tables, accepted for publication in ApJS

  33. arXiv:2404.16027  [pdf, other]

    cs.RO

    ORBIT-Surgical: An Open-Simulation Framework for Learning Surgical Augmented Dexterity

    Authors: Qinxi Yu, Masoud Moghani, Karthik Dharmarajan, Vincent Schorp, William Chung-Ho Panitch, Jingzhou Liu, Kush Hari, Huang Huang, Mayank Mittal, Ken Goldberg, Animesh Garg

    Abstract: Physics-based simulations have accelerated progress in robot learning for driving, manipulation, and locomotion. Yet, a fast, accurate, and robust surgical simulation environment remains a challenge. In this paper, we present ORBIT-Surgical, a physics-based surgical robot simulation framework with photorealistic rendering in NVIDIA Omniverse. We provide 14 benchmark surgical tasks for the da Vinci…

    Submitted 24 April, 2024; originally announced April 2024.

  34. arXiv:2404.14037  [pdf, other]

    cs.CV cs.MM

    GaussianTalker: Speaker-specific Talking Head Synthesis via 3D Gaussian Splatting

    Authors: Hongyun Yu, Zhan Qu, Qihang Yu, Jianchuan Chen, Zhonghua Jiang, Zhiwen Chen, Shengyu Zhang, Jimin Xu, Fei Wu, Chengfei Lv, Gang Yu

    Abstract: Recent works on audio-driven talking head synthesis using Neural Radiance Fields (NeRF) have achieved impressive results. However, due to inadequate pose and expression control caused by NeRF implicit representation, these methods still have some limitations, such as unsynchronized or unnatural lip movements, and visual jitter and artifacts. In this paper, we propose GaussianTalker, a novel method…

    Submitted 28 April, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: https://yuhongyun777.github.io/GaussianTalker/

  35. arXiv:2404.09800  [pdf, ps, other]

    math.PR

    Fractional derivatives of local times for some Gaussian processes

    Authors: Minhao Hong, Qian Yu

    Abstract: In this article, we consider fractional derivatives of the local time for $d$-dimensional centered Gaussian processes satisfying a certain strong local nondeterminism property. We first give a condition for the existence of fractional derivatives of the local time, defined by Marchaud derivatives, in $L^p$ ($p\ge1$) and show that these derivatives are Hölder continuous with respect to both time and space variabl…

    Submitted 15 April, 2024; originally announced April 2024.

  36. arXiv:2404.08951  [pdf, other]

    cs.CV cs.LG

    Constructing and Exploring Intermediate Domains in Mixed Domain Semi-supervised Medical Image Segmentation

    Authors: Qinghe Ma, Jian Zhang, Lei Qi, Qian Yu, Yinghuan Shi, Yang Gao

    Abstract: Both limited annotation and domain shift are prevalent challenges in medical image segmentation. Traditional semi-supervised segmentation and unsupervised domain adaptation methods address one of these issues separately. However, the coexistence of limited annotation and domain shift is quite common, which motivates us to introduce a novel and challenging scenario: Mixed Domain Semi-supervised med…

    Submitted 13 April, 2024; originally announced April 2024.

  37. arXiv:2404.08639  [pdf, other]

    cs.CV

    COCONut: Modernizing COCO Segmentation

    Authors: Xueqing Deng, Qihang Yu, Peng Wang, Xiaohui Shen, Liang-Chieh Chen

    Abstract: In recent decades, the vision community has witnessed remarkable progress in visual recognition, partially owing to advancements in dataset benchmarks. Notably, the established COCO benchmark has propelled the development of modern detection and segmentation systems. However, the COCO segmentation benchmark has seen comparatively slow improvement over the last decade. Originally equipped with coar…

    Submitted 12 April, 2024; originally announced April 2024.

    Comments: Accepted at CVPR 2024; data available at https://xdeng7.github.io/coconut.github.io/

  38. arXiv:2404.07445  [pdf, other]

    cs.CV

    Multi-view Aggregation Network for Dichotomous Image Segmentation

    Authors: Qian Yu, Xiaoqi Zhao, Youwei Pang, Lihe Zhang, Huchuan Lu

    Abstract: Dichotomous Image Segmentation (DIS) has recently emerged towards high-precision object segmentation from high-resolution natural images. When designing an effective DIS model, the main challenge is how to balance the semantic dispersion of high-resolution targets in the small receptive field and the loss of high-precision details in the large receptive field. Existing methods rely on tedious mu…

    Submitted 10 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR 2024 as a Highlight

  39. arXiv:2404.07234  [pdf, other]

    cs.CR cs.AI cs.CL

    Goal-guided Generative Prompt Injection Attack on Large Language Models

    Authors: Chong Zhang, Mingyu Jin, Qinkai Yu, Chengzhi Liu, Haochen Xue, Xiaobo Jin

    Abstract: Current large language models (LLMs) provide a strong foundation for large-scale user-oriented natural language tasks. A large number of users can easily inject adversarial text or instructions through the user interface, thus posing security challenges for LLMs. Although there is currently a large amount of research on prompt injection attacks, most of these black-box attacks use heuristic str…

    Submitted 6 April, 2024; originally announced April 2024.

    Comments: 22 pages, 8 figures

  40. arXiv:2404.07066  [pdf, other]

    cs.CL cs.AI cs.LG

    Exploring Concept Depth: How Large Language Models Acquire Knowledge at Different Layers?

    Authors: Mingyu Jin, Qinkai Yu, Jingyuan Huang, Qingcheng Zeng, Zhenting Wang, Wenyue Hua, Haiyan Zhao, Kai Mei, Yanda Meng, Kaize Ding, Fan Yang, Mengnan Du, Yongfeng Zhang

    Abstract: Large language models (LLMs) have shown remarkable performance across a wide range of tasks. However, the mechanisms by which these models encode tasks of varying complexities remain poorly understood. In this paper, we explore the hypothesis that LLMs process concepts of varying complexities in different layers, introducing the idea of "Concept Depth" to suggest that more complex concepts are ty…

    Submitted 30 April, 2024; v1 submitted 10 April, 2024; originally announced April 2024.

    Comments: 12 pages

  41. arXiv:2404.03819  [pdf, other]

    cs.CV

    Effective Lymph Nodes Detection in CT Scans Using Location Debiased Query Selection and Contrastive Query Representation in Transformer

    Authors: Qinji Yu, Yirui Wang, Ke Yan, Haoshen Li, Dazhou Guo, Li Zhang, Le Lu, Na Shen, Qifeng Wang, Xiaowei Ding, Xianghua Ye, Dakai Jin

    Abstract: Lymph node (LN) assessment is a critical, indispensable yet very challenging task in the routine clinical workflow of radiology and oncology. Accurate LN analysis is essential for cancer diagnosis, staging, and treatment planning. Finding scattered, low-contrast, clinically relevant LNs in 3D CT is difficult even for experienced physicians, with high inter-observer variation. Previou…

    Submitted 4 April, 2024; originally announced April 2024.

    Comments: Technical report

  42. arXiv:2404.02132  [pdf, other]

    cs.CV

    ViTamin: Designing Scalable Vision Models in the Vision-Language Era

    Authors: Jieneng Chen, Qihang Yu, Xiaohui Shen, Alan Yuille, Liang-Chieh Chen

    Abstract: Recent breakthroughs in vision-language models (VLMs) have opened a new chapter in the vision community. VLMs provide stronger and more generalizable feature embeddings than ImageNet-pretrained models, thanks to training on large-scale Internet image-text pairs. However, despite these achievements of VLMs, vanilla Vision Transformers (ViTs) remain the default choice…

    Submitted 3 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: CVPR 2024; https://github.com/Beckschen/ViTamin

  43. arXiv:2404.00603  [pdf, other]

    cs.CV

    Weak Distribution Detectors Lead to Stronger Generalizability of Vision-Language Prompt Tuning

    Authors: Kun Ding, Haojian Zhang, Qiang Yu, Ying Wang, Shiming Xiang, Chunhong Pan

    Abstract: We propose a generalized method for boosting the generalization ability of pre-trained vision-language models (VLMs) while fine-tuning on downstream few-shot tasks. The idea is realized by exploiting out-of-distribution (OOD) detection to predict whether a sample belongs to a base distribution or a novel distribution and then using the score generated by a dedicated competition based scoring funct…

    Submitted 31 March, 2024; originally announced April 2024.

    Comments: Accepted by AAAI 2024

  44. arXiv:2403.20331  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models

    Authors: Atsuyuki Miyai, Jingkang Yang, Jingyang Zhang, Yifei Ming, Qing Yu, Go Irie, Yixuan Li, Hai Li, Ziwei Liu, Kiyoharu Aizawa

    Abstract: This paper introduces a novel and significant challenge for Vision Language Models (VLMs), termed Unsolvable Problem Detection (UPD). UPD examines the VLM's ability to withhold answers when faced with unsolvable problems in the context of Visual Question Answering (VQA) tasks. UPD encompasses three distinct settings: Absent Answer Detection (AAD), Incompatible Answer Set Detection (IASD), and Inco…

    Submitted 29 March, 2024; originally announced March 2024.

    Comments: Code: https://github.com/AtsuMiyai/UPD

  45. arXiv:2403.19212  [pdf, ps, other]

    astro-ph.GA

    Close Major-merger Pairs at $z=0$: Star-forming Galaxies with Pseudobulges

    Authors: Chuan He, Cong Kevin Xu, Ute Lisenfeld, Y Sophia Dai, Taotao Fang, Jia-Sheng Huang, Wei Wang, Qingzheng Yu

    Abstract: We present a study of star-forming galaxies (SFGs) with pseudobulges (bulges with Sérsic index $\rm n < 2$) in a local close major-merger galaxy pair sample (H-KPAIR). With data from new aperture photometries in the optical and near-infrared bands (aperture size of 7\;kpc) and from the literature, we find that the mean Age of central stellar populations in Spirals with pseudobulges is consistent w…

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: Accepted for publication in RAA, ?? pages, 10 figures, 4 tables

  46. arXiv:2403.17782  [pdf, other]

    cs.CV cs.GR

    GenesisTex: Adapting Image Denoising Diffusion to Texture Space

    Authors: Chenjian Gao, Boyan Jiang, Xinghui Li, Yingpeng Zhang, Qian Yu

    Abstract: We present GenesisTex, a novel method for synthesizing textures for 3D geometries from text descriptions. GenesisTex adapts the pretrained image diffusion model to texture space by texture space sampling. Specifically, we maintain a latent texture map for each viewpoint, which is updated with predicted noise on the rendering of the corresponding viewpoint. The sampled latent texture maps are then…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: 12 pages, 10 figures

  47. arXiv:2403.16023  [pdf, other]

    cs.RO cs.AI cs.CV

    RPMArt: Towards Robust Perception and Manipulation for Articulated Objects

    Authors: Junbo Wang, Wenhai Liu, Qiaojun Yu, Yang You, Liu Liu, Weiming Wang, Cewu Lu

    Abstract: Articulated objects are commonly found in daily life. It is essential that robots can exhibit robust perception and manipulation skills for articulated objects in real-world robotic applications. However, existing methods for articulated objects insufficiently address noise in point clouds and struggle to bridge the gap between simulation and reality, thus limiting the practical deployment in real…

    Submitted 24 March, 2024; originally announced March 2024.

    Comments: 8 pages, 7 figures, submitted to 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2024), project website at https://r-pmart.github.io

  48. arXiv:2403.13365  [pdf, other]

    cs.RO cs.CV

    ManiPose: A Comprehensive Benchmark for Pose-aware Object Manipulation in Robotics

    Authors: Qiaojun Yu, Ce Hao, Junbo Wang, Wenhai Liu, Liu Liu, Yao Mu, Yang You, Hengxu Yan, Cewu Lu

    Abstract: Robotic manipulation in everyday scenarios, especially in unstructured environments, requires skills in pose-aware object manipulation (POM), which adapts robots' grasping and handling according to an object's 6D pose. Recognizing an object's position and orientation is crucial for effective manipulation. For example, if a mug is lying on its side, it's more effective to grasp it by the rim rather…

    Submitted 20 March, 2024; originally announced March 2024.

    Comments: 8 pages, 7 figures, submitted to 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2024)

  49. arXiv:2403.11777  [pdf]

    cond-mat.mtrl-sci

    Ultralarge polarization in ferroelectric hafnia-based thin films

    Authors: Han Wu, Kun Lin, Qinghua Zhang, Qian Yu, Xiaoqian Fu, Qiang Li, Meera Cheviri, Oswaldo Dieguez, Shuai Xu, Lin Gu, Yili Cao, Jiaou Wang, Zhen Wang, Yu Chen, Huanhua Wang, Jinxia Deng, Jun Miao, Xianran Xing

    Abstract: Hafnia-based ferroelectrics have become a valuable class of electronic functional materials at the nanoscale, showing great potential for next-generation memory and logic devices. However, more robust ferroelectric properties and better understanding of the polarization mechanisms are currently needed both in technology and science. Herein, we report the properties of oxygen-deficient Hf0.5Zr0.5O2…

    Submitted 18 March, 2024; originally announced March 2024.

  50. arXiv:2403.11631  [pdf, other]

    cs.CV

    Compositional Kronecker Context Optimization for Vision-Language Models

    Authors: Kun Ding, Xiaohui Li, Qiang Yu, Ying Wang, Haojian Zhang, Shiming Xiang

    Abstract: Context Optimization (CoOp) has emerged as a simple yet effective technique for adapting CLIP-like vision-language models to downstream image recognition tasks. Nevertheless, learning compact context with satisfactory base-to-new, domain and cross-task generalization ability while adapting to new tasks is still a challenge. To tackle such a challenge, we propose a lightweight yet generalizable app…

    Submitted 18 March, 2024; originally announced March 2024.