Showing 1–50 of 213 results for author: Dong, W

  1. arXiv:2407.03217

    cs.CV

    MHNet: Multi-view High-order Network for Diagnosing Neurodevelopmental Disorders Using Resting-state fMRI

    Authors: Yueyang Li, Weiming Zeng, Wenhao Dong, Luhui Cai, Lei Wang, Hongyu Chen, Hongjie Yan, Lingbin Bian, Nizhuan Wang

    Abstract: Background: Deep learning models have shown promise in diagnosing neurodevelopmental disorders (NDD) like ASD and ADHD. However, many models either use graph neural networks (GNN) to construct single-level brain functional networks (BFNs) or employ spatial convolution filtering for local information extraction from rs-fMRI data, often neglecting high-order features crucial for NDD classification.…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: 18 pages

  2. arXiv:2407.02918

    cs.CV eess.IV

    Free-SurGS: SfM-Free 3D Gaussian Splatting for Surgical Scene Reconstruction

    Authors: Jiaxin Guo, Jiangliu Wang, Di Kang, Wenzhen Dong, Wenting Wang, Yun-hui Liu

    Abstract: Real-time 3D reconstruction of surgical scenes plays a vital role in computer-assisted surgery, holding a promise to enhance surgeons' visibility. Recent advancements in 3D Gaussian Splatting (3DGS) have shown great potential for real-time novel view synthesis of general scenes, which relies on accurate poses and point clouds generated by Structure-from-Motion (SfM) for initialization. However, 3D…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: Accepted to MICCAI 2024

  3. arXiv:2406.18406

    cs.CL cs.AI

    IRCAN: Mitigating Knowledge Conflicts in LLM Generation via Identifying and Reweighting Context-Aware Neurons

    Authors: Dan Shi, Renren Jin, Tianhao Shen, Weilong Dong, Xinwei Wu, Deyi Xiong

    Abstract: It is widely acknowledged that large language models (LLMs) encode a vast reservoir of knowledge after being trained on mass data. Recent studies disclose knowledge conflicts in LLM generation, wherein outdated or incorrect parametric knowledge (i.e., encoded knowledge) contradicts new knowledge provided in the context. To mitigate such knowledge conflicts, we propose a novel framework, IRCAN (Ide…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: 19 pages, 13 figures, 5 tables

  4. arXiv:2406.11219

    cs.RO eess.SY

    A Swift and Omnidirectional Formation Approach based on Hierarchical Reorganization

    Authors: Yuzhu Li, Wei Dong

    Abstract: Current formations commonly rely on invariant hierarchical structures, such as predetermined leaders or enumerated formation shapes. These structures could be unidirectional and sluggish, constraining their adaptability and agility when encountering cluttered environments. To surmount these constraints, this work proposes an omnidirectional affine formation approach with hierarchical reorganizatio…

    Submitted 17 June, 2024; originally announced June 2024.

  5. arXiv:2406.07949

    cs.CV

    Multi-Teacher Multi-Objective Meta-Learning for Zero-Shot Hyperspectral Band Selection

    Authors: Jie Feng, Xiaojian Zhong, Di Li, Weisheng Dong, Ronghua Shang, Licheng Jiao

    Abstract: Band selection plays a crucial role in hyperspectral image classification by removing redundant and noisy bands and retaining discriminative ones. However, most existing deep learning-based methods are aimed at dealing with a specific band selection dataset, and need to retrain parameters for new datasets, which significantly limits their generalizability. To address this issue, a novel multi-teach…

    Submitted 12 June, 2024; originally announced June 2024.

  6. arXiv:2406.07741

    cs.CV

    Back to the Color: Learning Depth to Specific Color Transformation for Unsupervised Depth Estimation

    Authors: Yufan Zhu, Chongzhi Ran, Mingtao Feng, Fangfang Wu, Le Dong, Weisheng Dong, Antonio M. López, Guangming Shi

    Abstract: Virtual engines can generate dense depth maps for various synthetic scenes, making them invaluable for training depth estimation models. However, discrepancies between synthetic and real-world colors pose significant challenges for depth estimation in real-world scenes, especially in complex and uncertain environments encountered in unsupervised monocular depth estimation tasks. To address this is…

    Submitted 3 July, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

  7. arXiv:2406.02559

    cs.CV

    ShadowRefiner: Towards Mask-free Shadow Removal via Fast Fourier Transformer

    Authors: Wei Dong, Han Zhou, Yuqiong Tian, Jingke Sun, Xiaohong Liu, Guangtao Zhai, Jun Chen

    Abstract: Shadow-affected images often exhibit pronounced spatial discrepancies in color and illumination, consequently degrading various vision applications including object detection and segmentation systems. To effectively eliminate shadows in real-world images while preserving intricate details and producing visually compelling outcomes, we introduce a mask-free Shadow Removal and Refinement network (Sh…

    Submitted 2 July, 2024; v1 submitted 17 April, 2024; originally announced June 2024.

    Comments: Accepted by CVPR workshop 2024 (NTIRE 2024); Corrected references

  8. arXiv:2405.19730

    cs.AI cs.CV cs.LG

    Research on Foundation Model for Spatial Data Intelligence: China's 2024 White Paper on Strategic Development of Spatial Data Intelligence

    Authors: Shaohua Wang, Xing Xie, Yong Li, Danhuai Guo, Zhi Cai, Yu Liu, Yang Yue, Xiao Pan, Feng Lu, Huayi Wu, Zhipeng Gui, Zhiming Ding, Bolong Zheng, Fuzheng Zhang, Tao Qin, Jingyuan Wang, Chuang Tao, Zhengchao Chen, Hao Lu, Jiayi Li, Hongyang Chen, Peng Yue, Wenhao Yu, Yao Yao, Leilei Sun, et al. (9 additional authors not shown)

    Abstract: This report focuses on spatial data intelligent large models, delving into the principles, methods, and cutting-edge applications of these models. It provides an in-depth discussion on the definition, development history, current status, and trends of spatial data intelligent large models, as well as the challenges they face. The report systematically elucidates the key technologies of spatial dat…

    Submitted 29 June, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

    Comments: in Chinese language

  9. arXiv:2405.15438

    cs.CV cs.LG eess.IV

    Comparing remote sensing-based forest biomass mapping approaches using new forest inventory plots in contrasting forests in northeastern and southwestern China

    Authors: Wenquan Dong, Edward T. A. Mitchard, Yuwei Chen, Man Chen, Congfeng Cao, Peilun Hu, Cong Xu, Steven Hancock

    Abstract: Large-scale high spatial resolution aboveground biomass (AGB) maps play a crucial role in determining forest carbon stocks and how they are changing, which is instrumental in understanding the global carbon cycle, and implementing policy to mitigate climate change. The advent of the new space-borne LiDAR sensor, NASA's GEDI instrument, provides unparalleled possibilities for the accurate and unbia…

    Submitted 24 May, 2024; originally announced May 2024.

  10. arXiv:2405.13578

    cs.CL

    ConTrans: Weak-to-Strong Alignment Engineering via Concept Transplantation

    Authors: Weilong Dong, Xinwei Wu, Renren Jin, Shaoyang Xu, Deyi Xiong

    Abstract: Ensuring large language models (LLM) behave consistently with human goals, values, and intentions is crucial for their safety but yet computationally expensive. To reduce the computational cost of alignment training of LLMs, especially for those with a huge number of parameters, and to reutilize learned value alignment, we propose ConTrans, a novel framework that enables weak-to-strong alignment t…

    Submitted 22 May, 2024; originally announced May 2024.

  11. arXiv:2405.09308

    cs.LG cs.AI

    TimeX++: Learning Time-Series Explanations with Information Bottleneck

    Authors: Zichuan Liu, Tianchun Wang, Jimeng Shi, Xu Zheng, Zhuomin Chen, Lei Song, Wenqian Dong, Jayantha Obeysekera, Farhad Shirani, Dongsheng Luo

    Abstract: Explaining deep learning models operating on time series data is crucial in various applications of interest which require interpretable and transparent insights from time series signals. In this work, we investigate this problem from an information theoretic perspective and show that most existing measures of explainability may suffer from trivial solutions and distributional shift issues. To add…

    Submitted 15 May, 2024; originally announced May 2024.

    Comments: Accepted by International Conference on Machine Learning (ICML 2024)

  12. arXiv:2405.09001

    cs.RO

    BEVRender: Vision-based Cross-view Vehicle Registration in Off-road GNSS-denied Environment

    Authors: Lihong Jin, Wei Dong, Michael Kaess

    Abstract: We introduce BEVRender, a novel learning-based approach for the localization of ground vehicles in Global Navigation Satellite System (GNSS)-denied off-road scenarios. These environments are typically challenging for conventional vision-based state estimation due to the lack of distinct visual landmarks and the instability of vehicle poses. To address this, BEVRender generates high-quality local b…

    Submitted 14 May, 2024; originally announced May 2024.

    Comments: 8 pages, 6 figures

    ACM Class: I.2.9

  13. arXiv:2405.08054

    cs.GR cs.CV

    Coin3D: Controllable and Interactive 3D Assets Generation with Proxy-Guided Conditioning

    Authors: Wenqi Dong, Bangbang Yang, Lin Ma, Xiao Liu, Liyuan Cui, Hujun Bao, Yuewen Ma, Zhaopeng Cui

    Abstract: As humans, we aspire to create media content that is both freely willed and readily controlled. Thanks to the prominent development of generative techniques, we now can easily utilize 2D diffusion methods to synthesize images controlled by raw sketch or designated human poses, and even progressively edit/regenerate local regions with masked inpainting. However, similar workflows in 3D modeling tas…

    Submitted 13 May, 2024; originally announced May 2024.

    Comments: Project webpage: https://zju3dv.github.io/coin3d

  14. arXiv:2404.19527

    cs.CV

    Revealing the Two Sides of Data Augmentation: An Asymmetric Distillation-based Win-Win Solution for Open-Set Recognition

    Authors: Yunbing Jia, Xiaoyu Kong, Fan Tang, Yixing Gao, Weiming Dong, Yi Yang

    Abstract: In this paper, we reveal the two sides of data augmentation: enhancements in closed-set recognition correlate with a significant decrease in open-set recognition. Through empirical investigation, we find that multi-sample-based augmentations would contribute to reducing feature discrimination, thereby diminishing the open-set criteria. Although knowledge distillation could impair the feature via i…

    Submitted 28 April, 2024; originally announced April 2024.

  15. arXiv:2404.16687

    cs.CV

    NTIRE 2024 Quality Assessment of AI-Generated Content Challenge

    Authors: Xiaohong Liu, Xiongkuo Min, Guangtao Zhai, Chunyi Li, Tengchuan Kou, Wei Sun, Haoning Wu, Yixuan Gao, Yuqin Cao, Zicheng Zhang, Xiele Wu, Radu Timofte, Fei Peng, Huiyuan Fu, Anlong Ming, Chuanming Wang, Huadong Ma, Shuai He, Zifei Dou, Shu Chen, Huacong Zhang, Haiyi Xie, Chengwei Wang, Baoying Chen, Jishen Zeng, et al. (89 additional authors not shown)

    Abstract: This paper reports on the NTIRE 2024 Quality Assessment of AI-Generated Content Challenge, which will be held in conjunction with the New Trends in Image Restoration and Enhancement Workshop (NTIRE) at CVPR 2024. This challenge is to address a major challenge in the field of image and video processing, namely, Image Quality Assessment (IQA) and Video Quality Assessment (VQA) for AI-Generated Conte…

    Submitted 7 May, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

  16. arXiv:2404.14248

    cs.CV

    NTIRE 2024 Challenge on Low Light Image Enhancement: Methods and Results

    Authors: Xiaoning Liu, Zongwei Wu, Ao Li, Florin-Alexandru Vasluianu, Yulun Zhang, Shuhang Gu, Le Zhang, Ce Zhu, Radu Timofte, Zhi Jin, Hongjun Wu, Chenxi Wang, Haitao Ling, Yuanhao Cai, Hao Bian, Yuxin Zheng, Jing Lin, Alan Yuille, Ben Shao, Jin Guo, Tianli Liu, Mohao Wu, Yixu Feng, Shuo Hou, Haotian Lin, et al. (87 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2024 low light image enhancement challenge, highlighting the proposed solutions and results. The aim of this challenge is to discover an effective network design or solution capable of generating brighter, clearer, and visually appealing results when dealing with a variety of conditions, including ultra-high resolution (4K and beyond), non-uniform illumination, backlig…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: NTIRE 2024 Challenge Report

  17. arXiv:2404.14132

    cs.CV eess.IV

    CRNet: A Detail-Preserving Network for Unified Image Restoration and Enhancement Task

    Authors: Kangzhen Yang, Tao Hu, Kexin Dai, Genggeng Chen, Yu Cao, Wei Dong, Peng Wu, Yanning Zhang, Qingsen Yan

    Abstract: In real-world scenarios, images captured often suffer from blurring, noise, and other forms of image degradation, and due to sensor limitations, people usually can only obtain low dynamic range images. To achieve high-quality images, researchers have attempted various image restoration and enhancement operations on photographs, including denoising, deblurring, and high dynamic range imaging. Howev…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: This paper is accepted by CVPR2024 Workshop, Code: https://github.com/CalvinYang0/CRNet

  18. arXiv:2404.13537

    eess.IV cs.CV

    Bracketing Image Restoration and Enhancement with High-Low Frequency Decomposition

    Authors: Genggeng Chen, Kexin Dai, Kangzhen Yang, Tao Hu, Xiangyu Chen, Yongqing Yang, Wei Dong, Peng Wu, Yanning Zhang, Qingsen Yan

    Abstract: In real-world scenarios, due to a series of image degradations, obtaining high-quality, clear content photos is challenging. While significant progress has been made in synthesizing high-quality images, previous methods for image restoration and enhancement often overlooked the characteristics of different degradations. They applied the same structure to address various types of degradation, resul…

    Submitted 24 April, 2024; v1 submitted 21 April, 2024; originally announced April 2024.

    Comments: This paper is accepted by CVPR 2024 Workshop, code: https://github.com/chengeng0613/HLNet

  19. arXiv:2404.10399

    cs.RO

    FoundationGrasp: Generalizable Task-Oriented Grasping with Foundation Models

    Authors: Chao Tang, Dehao Huang, Wenlong Dong, Ruinian Xu, Hong Zhang

    Abstract: Task-oriented grasping (TOG), which refers to the problem of synthesizing grasps on an object that are configurationally compatible with the downstream manipulation task, is the first milestone towards tool manipulation. Analogous to the activation of two brain regions responsible for semantic and geometric reasoning during cognitive processes, modeling the complex relationship between objects, ta…

    Submitted 16 April, 2024; originally announced April 2024.

  20. arXiv:2404.09146

    cs.CV cs.AI

    Fusion-Mamba for Cross-modality Object Detection

    Authors: Wenhao Dong, Haodong Zhu, Shaohui Lin, Xiaoyan Luo, Yunhang Shen, Xuhui Liu, Juan Zhang, Guodong Guo, Baochang Zhang

    Abstract: Cross-modality fusing complementary information from different modalities effectively improves object detection performance, making it more useful and robust for a wider range of applications. Existing fusion strategies combine different types of images or merge different backbone features through elaborated neural network modules. However, these methods neglect that modality disparities affect cr…

    Submitted 14 April, 2024; originally announced April 2024.

  21. arXiv:2403.19456

    cs.CV cs.GR cs.MM

    Break-for-Make: Modular Low-Rank Adaptations for Composable Content-Style Customization

    Authors: Yu Xu, Fan Tang, Juan Cao, Yuxin Zhang, Oliver Deussen, Weiming Dong, Jintao Li, Tong-Yee Lee

    Abstract: Personalized generation paradigms empower designers to customize visual intellectual properties with the help of textual descriptions by tuning or adapting pre-trained text-to-image models on a few images. Recent works explore approaches for concurrently customizing both content and detailed visual style appearance. However, these existing approaches often generate images where the content and sty…

    Submitted 31 March, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

  22. arXiv:2403.19067

    cs.CV

    Low-Rank Rescaled Vision Transformer Fine-Tuning: A Residual Design Approach

    Authors: Wei Dong, Xing Zhang, Bihui Chen, Dawei Yan, Zhijun Lin, Qingsen Yan, Peng Wang, Yang Yang

    Abstract: Parameter-efficient fine-tuning for pre-trained Vision Transformers aims to adeptly tailor a model to downstream tasks by learning a minimal set of new adaptation parameters while preserving the frozen majority of pre-trained parameters. Striking a balance between retaining the generalizable representation capacity of the pre-trained model and acquiring task-specific features poses a key challenge…

    Submitted 27 March, 2024; originally announced March 2024.

  23. arXiv:2403.14133

    cs.CV

    3D Object Detection from Point Cloud via Voting Step Diffusion

    Authors: Haoran Hou, Mingtao Feng, Zijie Wu, Weisheng Dong, Qing Zhu, Yaonan Wang, Ajmal Mian

    Abstract: 3D object detection is a fundamental task in scene understanding. Numerous research efforts have been dedicated to better incorporate Hough voting into the 3D object detection pipeline. However, due to the noisy, cluttered, and partial nature of real 3D scans, existing voting-based methods often receive votes from the partial surfaces of individual objects together with severe noises, leading to s…

    Submitted 21 March, 2024; originally announced March 2024.

  24. arXiv:2403.14121

    cs.CV

    External Knowledge Enhanced 3D Scene Generation from Sketch

    Authors: Zijie Wu, Mingtao Feng, Yaonan Wang, He Xie, Weisheng Dong, Bo Miao, Ajmal Mian

    Abstract: Generating realistic 3D scenes is challenging due to the complexity of room layouts and object geometries. We propose a sketch based knowledge enhanced diffusion architecture (SEK) for generating customized, diverse, and plausible 3D scenes. SEK conditions the denoising process with a hand-drawn sketch of the target scene and cues from an object relationship knowledge base. We first construct an ex…

    Submitted 21 March, 2024; originally announced March 2024.

  25. arXiv:2403.13492

    cs.CR cs.DB

    Secure Query Processing with Linear Complexity

    Authors: Qiyao Luo, Yilei Wang, Wei Dong, Ke Yi

    Abstract: We present LINQ, the first join protocol with linear complexity (in both running time and communication) under the secure multi-party computation model (MPC). It can also be extended to support all free-connex queries, a large class of select-join-aggregate queries, still with linear complexity. This matches the plaintext result for the query processing problem, as free-connex queries are the larg…

    Submitted 23 April, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

  26. arXiv:2403.13238

    cs.CV

    Beyond Skeletons: Integrative Latent Mapping for Coherent 4D Sequence Generation

    Authors: Qitong Yang, Mingtao Feng, Zijie Wu, Shijie Sun, Weisheng Dong, Yaonan Wang, Ajmal Mian

    Abstract: Directly learning to model 4D content, including shape, color and motion, is challenging. Existing methods depend on skeleton-based motion control and offer limited continuity in detail. To address this, we propose a novel framework that generates coherent 4D sequences with animation of 3D shapes under given conditions with dynamic evolution of shape and color over time through integrative latent…

    Submitted 19 March, 2024; originally announced March 2024.

  27. arXiv:2403.12538

    cs.RO

    Multi-View Active Sensing for Human-Robot Interaction via Hierarchically Connected Tree

    Authors: Yuanjiong Ying, Xian Huang, Wei Dong

    Abstract: Comprehensive perception of human beings is the prerequisite to ensure the safety of human-robot interaction. Currently, prevailing visual sensing approach typically involves a single static camera, resulting in a restricted and occluded field of view. In our work, we develop an active vision system using multiple cameras to dynamically capture multi-source RGB-D data. An integrated human sensing…

    Submitted 19 March, 2024; originally announced March 2024.

  28. arXiv:2403.11689

    eess.IV cs.CV

    MoreStyle: Relax Low-frequency Constraint of Fourier-based Image Reconstruction in Generalizable Medical Image Segmentation

    Authors: Haoyu Zhao, Wenhui Dong, Rui Yu, Zhou Zhao, Du Bo, Yongchao Xu

    Abstract: The task of single-source domain generalization (SDG) in medical image segmentation is crucial due to frequent domain shifts in clinical image datasets. To address the challenge of poor generalization across different domains, we introduce a Plug-and-Play module for data augmentation called MoreStyle. MoreStyle diversifies image styles by relaxing low-frequency constraints in Fourier space, guidin…

    Submitted 1 July, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

    Comments: MICCAI2024

  29. arXiv:2403.10116

    cs.CR cs.DS

    Instance-optimal Clipping for Summation Problems in the Shuffle Model of Differential Privacy

    Authors: Wei Dong, Qiyao Luo, Giulia Fanti, Elaine Shi, Ke Yi

    Abstract: Differentially private mechanisms achieving worst-case optimal error bounds (e.g., the classical Laplace mechanism) are well-studied in the literature. However, when typical data are far from the worst case, instance-specific error bounds, which depend on the largest value in the dataset, are more meaningful. For example, consider the sum estimation problem, where each user has an integ…

    Submitted 15 March, 2024; originally announced March 2024.

  30. arXiv:2403.05761

    cs.RO

    CEASE: Collision-Evaluation-based Active Sense System for Collaborative Robotic Arms

    Authors: Xian Huang, Yuanjiong Ying, Wei Dong

    Abstract: Collision detection via visual fences can significantly enhance the safety of collaborative robotic arms. Existing work typically performs such detection based on pre-deployed stationary cameras outside the robotic arm's workspace. These stationary cameras can only provide a restricted detection range and constrain the mobility of the robotic system. To cope with this issue, we propose an active s…

    Submitted 8 March, 2024; originally announced March 2024.

  31. arXiv:2403.01381

    cs.CV

    SA-MixNet: Structure-aware Mixup and Invariance Learning for Scribble-supervised Road Extraction in Remote Sensing Images

    Authors: Jie Feng, Hao Huang, Junpeng Zhang, Weisheng Dong, Dingwen Zhang, Licheng Jiao

    Abstract: Mainstreamed weakly supervised road extractors rely on highly confident pseudo-labels propagated from scribbles, and their performance often degrades gradually as the image scenes tend various. We argue that such degradation is due to the poor model's invariance to scenes with different complexities, whereas existing solutions to this problem are commonly based on crafted priors that cannot be der…

    Submitted 2 March, 2024; originally announced March 2024.

  32. arXiv:2402.18120

    cs.CL

    Exploring Multilingual Concepts of Human Value in Large Language Models: Is Value Alignment Consistent, Transferable and Controllable across Languages?

    Authors: Shaoyang Xu, Weilong Dong, Zishan Guo, Xinwei Wu, Deyi Xiong

    Abstract: Prior research in representation engineering has revealed that LLMs encode concepts within their representation spaces, predominantly centered around English. In this study, we extend this philosophy to a multilingual scenario, delving into multilingual human value concepts in LLMs. Through our comprehensive exploration covering 7 types of human values, 16 languages and 3 LLM series with distinct…

    Submitted 16 April, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

  33. arXiv:2402.17504

    cs.RO

    Real-Time Estimation of Relative Pose for UAVs Using a Dual-Channel Feature Association

    Authors: Zhaoying Wang, Wei Dong

    Abstract: Leveraging multiple cameras on Unmanned Aerial Vehicles (UAVs) to form a variable-baseline stereo camera for collaborative perception is highly promising. The critical steps include high-rate cross-camera feature association and frame-rate relative pose estimation of multiple UAVs. To accelerate the feature association rate to match the frame rate, we propose a dual-channel structure to decouple t…

    Submitted 27 February, 2024; originally announced February 2024.

  34. arXiv:2402.13763

    cs.SD eess.AS

    Music Style Transfer with Time-Varying Inversion of Diffusion Models

    Authors: Sifei Li, Yuxin Zhang, Fan Tang, Chongyang Ma, Weiming Dong, Changsheng Xu

    Abstract: With the development of diffusion models, text-guided image style transfer has demonstrated high-quality controllable synthesis results. However, the utilization of text for diverse music style transfer poses significant challenges, primarily due to the limited availability of matched audio-text datasets. Music, being an abstract and complex art form, exhibits variations and intricacies even withi…

    Submitted 21 February, 2024; originally announced February 2024.

    Comments: 7 pages, 4 figures, AAAI 2024

  35. arXiv:2402.10436

    cs.CL

    I Am Not Them: Fluid Identities and Persistent Out-group Bias in Large Language Models

    Authors: Wenchao Dong, Assem Zhunis, Hyojin Chin, Jiyoung Han, Meeyoung Cha

    Abstract: We explored cultural biases (individualism vs. collectivism) in ChatGPT across three Western languages (i.e., English, German, and French) and three Eastern languages (i.e., Chinese, Japanese, and Korean). When ChatGPT adopted an individualistic persona in Western languages, its collectivism scores (i.e., out-group values) exhibited a more negative trend, surpassing their positive orientation toward…

    Submitted 15 February, 2024; originally announced February 2024.

  36. arXiv:2402.09270

    cs.CV

    Fast Window-Based Event Denoising with Spatiotemporal Correlation Enhancement

    Authors: Huachen Fang, Jinjian Wu, Qibin Hou, Weisheng Dong, Guangming Shi

    Abstract: Previous deep learning-based event denoising methods mostly suffer from poor interpretability and difficulty in real-time processing due to their complex architecture designs. In this paper, we propose window-based event denoising, which simultaneously deals with a stack of events while existing element-based denoising focuses on one event each time. Besides, we give the theoretical analysis based…

    Submitted 14 February, 2024; originally announced February 2024.

  37. arXiv:2402.05809

    cs.CV cs.AI

    You Only Need One Color Space: An Efficient Network for Low-light Image Enhancement

    Authors: Qingsen Yan, Yixu Feng, Cheng Zhang, Pei Wang, Peng Wu, Wei Dong, Jinqiu Sun, Yanning Zhang

    Abstract: Low-Light Image Enhancement (LLIE) task tends to restore the details and visual information from corrupted low-light images. Most existing methods learn the mapping function between low/normal-light images by Deep Neural Networks (DNNs) on sRGB and HSV color space. Nevertheless, enhancement involves amplifying image signals, and applying these color spaces to low-light images with a low signal-to-…

    Submitted 17 June, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

    Comments: Qingsen Yan, Yixu Feng, Cheng Zhang contributed equally to this work. Corresponding author: Yanning Zhang

  38. arXiv:2402.05039

    cs.LG

    PAC Learnability under Explanation-Preserving Graph Perturbations

    Authors: Xu Zheng, Farhad Shirani, Tianchun Wang, Shouwei Gao, Wenqian Dong, Wei Cheng, Dongsheng Luo

    Abstract: Graphical models capture relations between entities in a wide range of applications including social networks, biology, and natural language processing, among others. Graph neural networks (GNN) are neural models that operate over graphs, enabling the model to leverage the complex relationships and dependencies in graph-structured data. A graph explanation is a subgraph which is an 'almost suffici…

    Submitted 7 February, 2024; originally announced February 2024.

    Comments: 21 pages, 6 figures, 4 tables

  39. arXiv:2401.17800

    cs.SD cs.MM eess.AS

    Dance-to-Music Generation with Encoder-based Textual Inversion of Diffusion Models

    Authors: Sifei Li, Weiming Dong, Yuxin Zhang, Fan Tang, Chongyang Ma, Oliver Deussen, Tong-Yee Lee, Changsheng Xu

    Abstract: The harmonious integration of music with dance movements is pivotal in vividly conveying the artistic essence of dance. This alignment also significantly elevates the immersive quality of gaming experiences and animation productions. While there has been remarkable advancement in creating high-fidelity music from textual descriptions, current methodologies mainly concentrate on modulating overarch…

    Submitted 31 January, 2024; originally announced January 2024.

    Comments: 9 pages, 3 figures

  40. arXiv:2401.14066

    cs.CV cs.AI

    CreativeSynth: Creative Blending and Synthesis of Visual Arts based on Multimodal Diffusion

    Authors: Nisha Huang, Weiming Dong, Yuxin Zhang, Fan Tang, Ronghui Li, Chongyang Ma, Xiu Li, Changsheng Xu

    Abstract: Large-scale text-to-image generative models have made impressive strides, showcasing their ability to synthesize a vast array of high-quality images. However, adapting these models for artistic image editing presents two significant challenges. Firstly, users struggle to craft textual prompts that meticulously detail visual elements of the input image. Secondly, prevalent models, when effecting mo…

    Submitted 30 January, 2024; v1 submitted 25 January, 2024; originally announced January 2024.

  41. arXiv:2401.12888

    cs.RO cs.CV

    Data-Centric Evolution in Autonomous Driving: A Comprehensive Survey of Big Data System, Data Mining, and Closed-Loop Technologies

    Authors: Lincan Li, Wei Shao, Wei Dong, Yijun Tian, Qiming Zhang, Kaixiang Yang, Wenjie Zhang

    Abstract: The aspiration of the next generation's autonomous driving (AD) technology relies on the dedicated integration and interaction among intelligent perception, prediction, planning, and low-level control. There has been a huge bottleneck regarding the upper bound of autonomous driving algorithm performance, a consensus from academia and industry believes that the key to surmount the bottleneck lies i…

    Submitted 26 January, 2024; v1 submitted 23 January, 2024; originally announced January 2024.

  42. arXiv:2401.02606

    cs.CV

    Exploiting Polarized Material Cues for Robust Car Detection

    Authors: Wen Dong, Haiyang Mei, Ziqi Wei, Ao Jin, Sen Qiu, Qiang Zhang, Xin Yang

    Abstract: Car detection is an important task that serves as a crucial prerequisite for many automated driving functions. The large variations in lighting/weather conditions and vehicle densities of the scenes pose significant challenges to existing car detection algorithms to meet the highly accurate perception demand for safety, due to the unstable/limited color information, which impedes the extraction of…

    Submitted 4 January, 2024; originally announced January 2024.

    Comments: Accepted by AAAI 2024

  43. arXiv:2312.17606  [pdf, other]

    cs.RO cs.AI cs.LG

    Adaptive Control Strategy for Quadruped Robots in Actuator Degradation Scenarios

    Authors: Xinyuan Wu, Wentao Dong, Hang Lai, Yong Yu, Ying Wen

    Abstract: Quadruped robots have strong adaptability to extreme environments but may also experience faults. Once these faults occur, robots must be repaired before returning to the task, reducing their practical feasibility. One prevalent concern among these faults is actuator degradation, stemming from factors like device aging or unexpected operational events. Traditionally, addressing this problem has re…

    Submitted 29 December, 2023; originally announced December 2023.

    Comments: 13 pages, 14 figures, in proceedings of DAI'23

  44. arXiv:2312.15702  [pdf, other]

    cs.CV

    Three Heads Are Better Than One: Complementary Experts for Long-Tailed Semi-supervised Learning

    Authors: Chengcheng Ma, Ismail Elezi, Jiankang Deng, Weiming Dong, Changsheng Xu

    Abstract: We address the challenging problem of Long-Tailed Semi-Supervised Learning (LTSSL) where labeled data exhibit imbalanced class distribution and unlabeled data follow an unknown distribution. Unlike in balanced SSL, the generated pseudo-labels are skewed towards head classes, intensifying the training bias. Such a phenomenon is even amplified as more unlabeled data will be mislabeled as head classe…

    Submitted 3 April, 2024; v1 submitted 25 December, 2023; originally announced December 2023.

    Comments: Accepted by AAAI 2024

  45. arXiv:2312.05288  [pdf, other]

    cs.CV

    MotionCrafter: One-Shot Motion Customization of Diffusion Models

    Authors: Yuxin Zhang, Fan Tang, Nisha Huang, Haibin Huang, Chongyang Ma, Weiming Dong, Changsheng Xu

    Abstract: The essence of a video lies in its dynamic motions, including character actions, object movements, and camera movements. While text-to-video generative diffusion models have recently advanced in creating diverse contents, controlling specific motions through text prompts remains a significant challenge. A primary issue is the coupling of appearance and motion, often leading to overfitting on appea…

    Submitted 2 January, 2024; v1 submitted 8 December, 2023; originally announced December 2023.

  46. arXiv:2312.02456  [pdf, other]

    cs.CR

    Watermarking for Neural Radiation Fields by Invertible Neural Network

    Authors: Wenquan Sun, Jia Liu, Weina Dong, Lifeng Chen, Ke Niu

    Abstract: To protect the copyright of the 3D scene represented by the neural radiation field, the embedding and extraction of the neural radiation field watermark are considered as a pair of inverse problems of image transformations. A scheme for protecting the copyright of the neural radiation field is proposed using invertible neural network watermarking, which utilizes watermarking techniques for 2D imag…

    Submitted 4 December, 2023; originally announced December 2023.

  47. arXiv:2311.16491  [pdf, other]

    cs.CV

    $Z^*$: Zero-shot Style Transfer via Attention Rearrangement

    Authors: Yingying Deng, Xiangyu He, Fan Tang, Weiming Dong

    Abstract: Despite the remarkable progress in image style transfer, formulating style in the context of art is inherently subjective and challenging. In contrast to existing learning/tuning methods, this study shows that vanilla diffusion models can directly extract style information and seamlessly integrate the generative prior into the content image without retraining. Specifically, we adopt dual denoising…

    Submitted 25 November, 2023; originally announced November 2023.

  48. arXiv:2311.13263  [pdf, other]

    cs.CV

    CMFDFormer: Transformer-based Copy-Move Forgery Detection with Continual Learning

    Authors: Yaqi Liu, Chao Xia, Song Xiao, Qingxiao Guan, Wenqian Dong, Yifan Zhang, Nenghai Yu

    Abstract: Copy-move forgery detection aims at detecting duplicated regions in a suspected forged image, and deep learning based copy-move forgery detection methods are in the ascendant. These deep learning based methods heavily rely on synthetic training data, and the performance will degrade when facing new tasks. In this paper, we propose a Transformer-style copy-move forgery detection network named as CM…

    Submitted 10 March, 2024; v1 submitted 22 November, 2023; originally announced November 2023.

    Comments: 12 pages, 7 figures

  49. arXiv:2311.11777  [pdf]

    cs.CV cs.LG eess.IV

    Multimodal deep learning for mapping forest dominant height by fusing GEDI with earth observation data

    Authors: Man Chen, Wenquan Dong, Hao Yu, Iain Woodhouse, Casey M. Ryan, Haoyu Liu, Selena Georgiou, Edward T. A. Mitchard

    Abstract: The integration of multisource remote sensing data and deep learning models offers new possibilities for accurately mapping high spatial resolution forest height. We found that GEDI relative heights (RH) metrics exhibited strong correlation with the mean of the top 10 highest trees (dominant height) measured in situ at the corresponding footprint locations. Consequently, we proposed a novel deep l…

    Submitted 20 November, 2023; originally announced November 2023.

  50. arXiv:2311.03067  [pdf, other]

    cs.CV cs.LG eess.IV

    Forest aboveground biomass estimation using GEDI and earth observation data through attention-based deep learning

    Authors: Wenquan Dong, Edward T. A. Mitchard, Hao Yu, Steven Hancock, Casey M. Ryan

    Abstract: Accurate quantification of forest aboveground biomass (AGB) is critical for understanding carbon accounting in the context of climate change. In this study, we presented a novel attention-based deep learning approach for forest AGB estimation, primarily utilizing openly accessible EO data, including: GEDI LiDAR data, C-band Sentinel-1 SAR data, ALOS-2 PALSAR-2 data, and Sentinel-2 multispectral da…

    Submitted 6 November, 2023; originally announced November 2023.