Showing 1–50 of 296 results for author: Feng, W

  1. arXiv:2406.19016  [pdf, other]

    cs.RO

    Robust Multi-Robot Global Localization with Unknown Initial Pose based on Neighbor Constraints

    Authors: Yaojie Zhang, Haowen Luo, Weijun Wang, Wei Feng

    Abstract: Multi-robot global localization (MR-GL) with unknown initial positions in a large scale environment is a challenging task. The key point is the data association between different robots' viewpoints. It also makes traditional appearance-based localization methods unusable. Recently, researchers have utilized the object's semantic invariance to generate a semantic graph to address this issue. Howeve…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: 7 pages (6+1), accepted by ICRA 2024

  2. arXiv:2406.18079  [pdf, other]

    cs.CV eess.IV

    MFDNet: Multi-Frequency Deflare Network for Efficient Nighttime Flare Removal

    Authors: Yiguo Jiang, Xuhang Chen, Chi-Man Pun, Shuqiang Wang, Wei Feng

    Abstract: When light is scattered or reflected accidentally in the lens, flare artifacts may appear in the captured photos, affecting the photos' visual quality. The main challenge in flare removal is to eliminate various flare artifacts while preserving the original content of the image. To address this challenge, we propose a lightweight Multi-Frequency Deflare Network (MFDNet) based on the Laplacian Pyra…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Accepted by The Visual Computer journal

  3. arXiv:2406.08656  [pdf, other]

    cs.CV cs.AI cs.CL

    TC-Bench: Benchmarking Temporal Compositionality in Text-to-Video and Image-to-Video Generation

    Authors: Weixi Feng, Jiachen Li, Michael Saxon, Tsu-jui Fu, Wenhu Chen, William Yang Wang

    Abstract: Video generation has many unique challenges beyond those of image generation. The temporal dimension introduces extensive possible variations across frames, over which consistency and continuity may be violated. In this study, we move beyond evaluating simple actions and argue that generated videos should incorporate the emergence of new concepts and their relation transitions like in real-world v…

    Submitted 12 June, 2024; originally announced June 2024.

  4. arXiv:2406.08407  [pdf, other]

    cs.CV cs.AI cs.CL

    MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos

    Authors: Xuehai He, Weixi Feng, Kaizhi Zheng, Yujie Lu, Wanrong Zhu, Jiachen Li, Yue Fan, Jianfeng Wang, Linjie Li, Zhengyuan Yang, Kevin Lin, William Yang Wang, Lijuan Wang, Xin Eric Wang

    Abstract: Multimodal Large Language Models (MLLMs) demonstrate the emerging abilities of "world models" -- interpreting and reasoning about complex real-world dynamics. To assess these abilities, we posit videos are the ideal medium, as they encapsulate rich representations of real-world dynamics and causalities. To this end, we introduce MMWorld, a new benchmark for multi-discipline, multi-faceted multi…

    Submitted 13 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

  5. arXiv:2406.02974  [pdf]

    cs.CL

    Readability-guided Idiom-aware Sentence Simplification (RISS) for Chinese

    Authors: Jingshen Zhang, Xinglu Chen, Xinying Qiu, Zhimin Wang, Wenhe Feng

    Abstract: Chinese sentence simplification faces challenges due to the lack of large-scale labeled parallel corpora and the prevalence of idioms. To address these challenges, we propose Readability-guided Idiom-aware Sentence Simplification (RISS), a novel framework that combines data augmentation techniques with lexical simplification. RISS introduces two key components: (1) Readability-guided Paraphrase Se…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Accepted to the 23rd China National Conference on Computational Linguistics (CCL 2024)

  6. arXiv:2406.01112  [pdf, other]

    cs.CV

    BACON: Bayesian Optimal Condensation Framework for Dataset Distillation

    Authors: Zheng Zhou, Hongbo Zhao, Guangliang Cheng, Xiangtai Li, Shuchang Lyu, Wenquan Feng, Qi Zhao

    Abstract: Dataset Distillation (DD) aims to distill knowledge from extensive datasets into more compact ones while preserving performance on the test set, thereby reducing storage costs and training expenses. However, existing methods often suffer from computational intensity, particularly exhibiting suboptimal performance with large dataset sizes due to the lack of a robust theoretical framework for analyz…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: 22 pages, 10 figures

  7. arXiv:2405.20787  [pdf, other]

    cs.CL

    PGA-SciRE: Harnessing LLM on Data Augmentation for Enhancing Scientific Relation Extraction

    Authors: Yang Zhou, Shimin Shan, Hongkui Wei, Zhehuan Zhao, Wenshuo Feng

    Abstract: Relation Extraction (RE) aims at recognizing the relation between pairs of entities mentioned in a text. Advances in LLMs have had a tremendous impact on NLP. In this work, we propose a textual data augmentation framework called PGA for improving the performance of models for RE in the scientific domain. The framework introduces two ways of data augmentation, utilizing an LLM to obtain pseudo-sampl…

    Submitted 30 May, 2024; originally announced May 2024.

  8. arXiv:2405.18750  [pdf, other]

    cs.CV

    T2V-Turbo: Breaking the Quality Bottleneck of Video Consistency Model with Mixed Reward Feedback

    Authors: Jiachen Li, Weixi Feng, Tsu-Jui Fu, Xinyi Wang, Sugato Basu, Wenhu Chen, William Yang Wang

    Abstract: Diffusion-based text-to-video (T2V) models have achieved significant success but continue to be hampered by the slow sampling speed of their iterative sampling processes. To address the challenge, consistency models have been proposed to facilitate fast inference, albeit at the cost of sample quality. In this work, we aim to break the quality bottleneck of a video consistency model (VCM) to achiev…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: Project page: https://t2v-turbo.github.io/

  9. arXiv:2405.17457  [pdf, other]

    cs.CV cs.DC cs.LG

    Data-Free Federated Class Incremental Learning with Diffusion-Based Generative Memory

    Authors: Naibo Wang, Yuchen Deng, Wenjie Feng, Jianwei Yin, See-Kiong Ng

    Abstract: Federated Class Incremental Learning (FCIL) is a critical yet largely underexplored issue that deals with the dynamic incorporation of new classes within federated learning (FL). Existing methods often employ generative adversarial networks (GANs) to produce synthetic images to address privacy concerns in FL. However, GANs exhibit inherent instability and high sensitivity, compromising the effecti…

    Submitted 22 May, 2024; originally announced May 2024.

  10. arXiv:2405.14602  [pdf, other]

    cs.LG

    Controllable Continual Test-Time Adaptation

    Authors: Ziqi Shi, Fan Lyu, Ye Liu, Fanhua Shang, Fuyuan Hu, Wei Feng, Zhang Zhang, Liang Wang

    Abstract: Continual Test-Time Adaptation (CTTA) is an emerging and challenging task where a model trained in a source domain must adapt to continuously changing conditions during testing, without access to the original source data. CTTA is prone to error accumulation due to uncontrollable domain shifts, leading to blurred decision boundaries between categories. Existing CTTA methods primarily focus on suppr…

    Submitted 28 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

  11. arXiv:2405.12652  [pdf, other]

    cs.NI eess.SP

    Edge Information Hub-Empowered 6G NTN: Latency-Oriented Resource Orchestration and Configuration

    Authors: Yueshan Lin, Wei Feng, Yunfei Chen, Ning Ge, Zhiyong Feng, Yue Gao

    Abstract: Quick response to disasters is crucial for saving lives and reducing loss. This requires low-latency uploading of situation information to the remote command center. Since terrestrial infrastructures are often damaged in disaster areas, non-terrestrial networks (NTNs) are preferable to provide network coverage, and mobile edge computing (MEC) could be integrated to improve the latency performance.…

    Submitted 21 May, 2024; originally announced May 2024.

  12. arXiv:2405.11135  [pdf, other]

    cs.CR

    AquaLoRA: Toward White-box Protection for Customized Stable Diffusion Models via Watermark LoRA

    Authors: Weitao Feng, Wenbo Zhou, Jiyan He, Jie Zhang, Tianyi Wei, Guanlin Li, Tianwei Zhang, Weiming Zhang, Nenghai Yu

    Abstract: Diffusion models have achieved remarkable success in generating high-quality images. Recently, the open-source models represented by Stable Diffusion (SD) are thriving and are accessible for customization, giving rise to a vibrant community of creators and enthusiasts. However, the widespread availability of customized SD models has led to copyright concerns, like unauthorized model distribution a…

    Submitted 17 May, 2024; originally announced May 2024.

    Comments: Code is available at https://github.com/Georgefwt/AquaLoRA

  13. arXiv:2405.09133  [pdf, other]

    cs.LG

    Overcoming Domain Drift in Online Continual Learning

    Authors: Fan Lyu, Daofeng Liu, Linglan Zhao, Zhang Zhang, Fanhua Shang, Fuyuan Hu, Wei Feng, Liang Wang

    Abstract: Online Continual Learning (OCL) empowers machine learning models to acquire new knowledge online across a sequence of tasks. However, OCL faces a significant challenge: catastrophic forgetting, wherein the model learned in previous tasks is substantially overwritten upon encountering new tasks, leading to a biased forgetting of prior knowledge. Moreover, the continual domain drift in sequential lea…

    Submitted 15 May, 2024; originally announced May 2024.

  14. arXiv:2405.05144  [pdf, other]

    cs.CY cs.LG

    Improving Automated Distractor Generation for Math Multiple-choice Questions with Overgenerate-and-rank

    Authors: Alexander Scarlatos, Wanyong Feng, Digory Smith, Simon Woodhead, Andrew Lan

    Abstract: Multiple-choice questions (MCQs) are commonly used across all levels of math education since they can be deployed and graded at a large scale. A critical component of MCQs is the distractors, i.e., incorrect answers crafted to reflect student errors or misconceptions. Automatically generating them in math MCQs, e.g., with large language models, has been challenging. In this work, we propose a nove…

    Submitted 13 May, 2024; v1 submitted 18 April, 2024; originally announced May 2024.

    Comments: BEA workshop NAACL 2024

  15. arXiv:2404.13903  [pdf, other]

    cs.CV

    Accelerating Image Generation with Sub-path Linear Approximation Model

    Authors: Chen Xu, Tianhui Song, Weixin Feng, Xubin Li, Tiezheng Ge, Bo Zheng, Limin Wang

    Abstract: Diffusion models have significantly advanced the state of the art in image, audio, and video generation tasks. However, their applications in practical scenarios are hindered by slow inference speed. Drawing inspiration from the approximation strategies utilized in consistency models, we propose the Sub-path Linear Approximation Model (SLAM), which accelerates diffusion models while maintaining hi…

    Submitted 22 April, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

  16. arXiv:2404.12400  [pdf, other]

    cs.LG

    Efflex: Efficient and Flexible Pipeline for Spatio-Temporal Trajectory Graph Modeling and Representation Learning

    Authors: Ming Cheng, Ziyi Zhou, Bowen Zhang, Ziyu Wang, Jiaqi Gan, Ziang Ren, Weiqi Feng, Yi Lyu, Hefan Zhang, Xingjian Diao

    Abstract: In the landscape of spatio-temporal data analytics, effective trajectory representation learning is paramount. To bridge the gap of learning accurate representations with efficient and flexible mechanisms, we introduce Efflex, a comprehensive pipeline for transformative graph modeling and representation learning of the large-volume spatio-temporal trajectories. Efflex pioneers the incorporation of…

    Submitted 15 April, 2024; originally announced April 2024.

  17. arXiv:2404.12130  [pdf, other]

    cs.LG cs.CV cs.DC

    One-Shot Sequential Federated Learning for Non-IID Data by Enhancing Local Model Diversity

    Authors: Naibo Wang, Yuchen Deng, Wenjie Feng, Shichen Fan, Jianwei Yin, See-Kiong Ng

    Abstract: Traditional federated learning mainly focuses on parallel settings (PFL), which can suffer significant communication and computation costs. In contrast, one-shot and sequential federated learning (SFL) have emerged as innovative paradigms to alleviate these costs. However, the issue of non-IID (Independent and Identically Distributed) data persists as a significant challenge in one-shot and SFL se…

    Submitted 18 April, 2024; originally announced April 2024.

  18. arXiv:2404.11352  [pdf, other]

    cs.DC

    Accelerating Geo-distributed Machine Learning with Network-Aware Adaptive Tree and Auxiliary Route

    Authors: Zonghang Li, Wenjiao Feng, Weibo Cai, Hongfang Yu, Long Luo, Gang Sun, Hongyang Du, Dusit Niyato

    Abstract: Distributed machine learning is becoming increasingly popular for geo-distributed data analytics, facilitating the collaborative analysis of data scattered across data centers in different regions. This paradigm eliminates the need for centralizing sensitive raw data in one location but faces the significant challenge of high parameter synchronization delays, which stems from the constraints of ba…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: 17 pages, 20 figures

    MSC Class: 68T99 ACM Class: I.2.11; C.2.4

  19. arXiv:2404.11111  [pdf, other]

    cs.CV

    CorrNet+: Sign Language Recognition and Translation via Spatial-Temporal Correlation

    Authors: Lianyu Hu, Wei Feng, Liqing Gao, Zekang Liu, Liang Wan

    Abstract: In sign language, the conveyance of human body trajectories predominantly relies upon the coordinated movements of hands and facial expressions across successive frames. Despite the recent advancements of sign language understanding methods, they often solely focus on individual frames, inevitably overlooking the inter-frame correlations that are essential for effectively modeling human body traje…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2303.03202

  20. arXiv:2404.09245  [pdf, other]

    cs.MM cs.CV

    Arena: A Patch-of-Interest ViT Inference Acceleration System for Edge-Assisted Video Analytics

    Authors: Haosong Peng, Wei Feng, Hao Li, Yufeng Zhan, Qihua Zhou, Yuanqing Xia

    Abstract: The advent of edge computing has made real-time intelligent video analytics feasible. Previous works, based on traditional model architecture (e.g., CNN, RNN, etc.), employ various strategies to filter out non-region-of-interest content to minimize bandwidth and computation consumption but show inferior performance in adverse environments. Recently, visual foundation models based on transformers h…

    Submitted 14 April, 2024; originally announced April 2024.

  21. arXiv:2404.08567  [pdf, other]

    cs.CL cs.AI

    CATP: Cross-Attention Token Pruning for Accuracy Preserved Multimodal Model Inference

    Authors: Ruqi Liao, Chuqing Zhao, Jin Li, Weiqi Feng

    Abstract: In response to the rising interest in large multimodal models, we introduce Cross-Attention Token Pruning (CATP), a precision-focused token pruning method. Our approach leverages cross-attention layers in multimodal models, exemplified by BLIP-2, to extract valuable information for token importance determination. CATP employs a refined voting strategy across model heads and layers. In evaluations,…

    Submitted 2 April, 2024; originally announced April 2024.

  22. arXiv:2404.08226  [pdf, other]

    cs.CV

    Improving Continuous Sign Language Recognition with Adapted Image Models

    Authors: Lianyu Hu, Tongkai Shi, Liqing Gao, Zekang Liu, Wei Feng

    Abstract: The increase of web-scale weakly labelled image-text pairs has greatly facilitated the development of large-scale vision-language models (e.g., CLIP), which have shown impressive generalization performance over a series of downstream tasks. However, the massive model size and scarcity of available data limit their applications to fine-tune the whole model in downstream tasks. Besides, fully fine-…

    Submitted 11 April, 2024; originally announced April 2024.

  23. arXiv:2404.08021  [pdf, other]

    cs.LG cs.AI cs.RO

    VeTraSS: Vehicle Trajectory Similarity Search Through Graph Modeling and Representation Learning

    Authors: Ming Cheng, Bowen Zhang, Ziyu Wang, Ziyi Zhou, Weiqi Feng, Yi Lyu, Xingjian Diao

    Abstract: Trajectory similarity search plays an essential role in autonomous driving, as it enables vehicles to analyze the information and characteristics of different trajectories to make informed decisions and navigate safely in dynamic environments. Existing work on the trajectory similarity search task primarily utilizes sequence-processing algorithms or Recurrent Neural Networks (RNNs), which suffer f…

    Submitted 11 April, 2024; originally announced April 2024.

  24. arXiv:2404.06247  [pdf, other]

    cs.CV

    LRR: Language-Driven Resamplable Continuous Representation against Adversarial Tracking Attacks

    Authors: Jianlang Chen, Xuhong Ren, Qing Guo, Felix Juefei-Xu, Di Lin, Wei Feng, Lei Ma, Jianjun Zhao

    Abstract: Visual object tracking plays a critical role in visual-based autonomous systems, as it aims to estimate the position and size of the object of interest within a live video. Despite significant progress made in this field, state-of-the-art (SOTA) trackers often fail when faced with adversarial perturbations in the incoming frames. This can lead to significant robustness and security issues when the…

    Submitted 9 April, 2024; originally announced April 2024.

  25. arXiv:2404.02124  [pdf, other]

    cs.CL

    Exploring Automated Distractor Generation for Math Multiple-choice Questions via Large Language Models

    Authors: Wanyong Feng, Jaewook Lee, Hunter McNichols, Alexander Scarlatos, Digory Smith, Simon Woodhead, Nancy Otero Ornelas, Andrew Lan

    Abstract: Multiple-choice questions (MCQs) are ubiquitous in almost all levels of education since they are easy to administer, grade, and are a reliable format in assessments and practices. One of the most important aspects of MCQs is the distractors, i.e., incorrect options that are designed to target common errors or misconceptions among real students. To date, the task of crafting high-quality distractor…

    Submitted 18 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: NAACL 2024 findings

  26. arXiv:2403.18423  [pdf, other]

    cs.CL cs.LG

    SemRoDe: Macro Adversarial Training to Learn Representations That are Robust to Word-Level Attacks

    Authors: Brian Formento, Wenjie Feng, Chuan Sheng Foo, Luu Anh Tuan, See-Kiong Ng

    Abstract: Language models (LMs) are indispensable tools for natural language processing tasks, but their vulnerability to adversarial attacks remains a concern. While current research has explored adversarial training techniques, their improvements to defend against word-level attacks have been limited. In this work, we propose a novel approach called Semantic Robust Defence (SemRoDe), a Macro Adversarial T…

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: Published in NAACL 2024 (Main Track)

  27. arXiv:2403.12519  [pdf, other]

    cs.CV

    Dynamic Spatial-Temporal Aggregation for Skeleton-Aware Sign Language Recognition

    Authors: Lianyu Hu, Liqing Gao, Zekang Liu, Wei Feng

    Abstract: Skeleton-aware sign language recognition (SLR) has gained popularity due to its ability to remain unaffected by background information and its lower computational requirements. Current methods utilize spatial graph modules and temporal modules to capture spatial and temporal features, respectively. However, their spatial graph modules are typically built on fixed graph structures such as graph con…

    Submitted 19 March, 2024; originally announced March 2024.

  28. arXiv:2403.11027  [pdf, other]

    cs.CV cs.AI

    Reward Guided Latent Consistency Distillation

    Authors: Jiachen Li, Weixi Feng, Wenhu Chen, William Yang Wang

    Abstract: Latent Consistency Distillation (LCD) has emerged as a promising paradigm for efficient text-to-image synthesis. By distilling a latent consistency model (LCM) from a pre-trained teacher latent diffusion model (LDM), LCD facilitates the generation of high-fidelity images within merely 2 to 4 inference steps. However, the LCM's efficient inference is obtained at the cost of the sample quality. In t…

    Submitted 16 March, 2024; originally announced March 2024.

    Comments: Project page: https://rg-lcd.github.io/

  29. arXiv:2403.07630  [pdf, other]

    cs.CV cs.AI

    Hunting Attributes: Context Prototype-Aware Learning for Weakly Supervised Semantic Segmentation

    Authors: Feilong Tang, Zhongxing Xu, Zhaojun Qu, Wei Feng, Xingjian Jiang, Zongyuan Ge

    Abstract: Recent weakly supervised semantic segmentation (WSSS) methods strive to incorporate contextual knowledge to improve the completeness of class activation maps (CAM). In this work, we argue that the knowledge bias between instances and contexts affects the capability of the prototype to sufficiently understand instance semantics. Inspired by prototype learning theory, we propose leveraging prototype…

    Submitted 12 March, 2024; originally announced March 2024.

  30. arXiv:2403.03432  [pdf, other]

    cs.CL cs.AI

    Mixture-of-LoRAs: An Efficient Multitask Tuning for Large Language Models

    Authors: Wenfeng Feng, Chuzhan Hao, Yuewei Zhang, Yu Han, Hao Wang

    Abstract: Instruction Tuning has the potential to stimulate or enhance specific capabilities of large language models (LLMs). However, achieving the right balance of data is crucial to prevent catastrophic forgetting and interference between tasks. To address these limitations and enhance training flexibility, we propose the Mixture-of-LoRAs (MoA) architecture which is a novel and parameter-efficient tuning…

    Submitted 5 March, 2024; originally announced March 2024.

    Comments: 10 pages, COLING24 Accepted

  31. arXiv:2402.07359  [pdf, other]

    cs.IT

    Structured Satellite-UAV-Terrestrial Networks for 6G Internet of Things

    Authors: Wei Feng, Yanmin Wang, Yunfei Chen, Ning Ge, Cheng-Xiang Wang

    Abstract: The upcoming sixth generation (6G) wireless communication network is envisioned to cover space, air, and maritime areas, in addition to urban-centered terrestrial coverage by the fifth generation (5G) network, to support intelligent Internet of Things (IoT). Towards this end, we investigate structured integration of satellites, unmanned aerial vehicles (UAVs), and terrestrial networks, aiming to s…

    Submitted 11 February, 2024; originally announced February 2024.

  32. arXiv:2402.07140  [pdf, other]

    cs.AI

    Graph Descriptive Order Improves Reasoning with Large Language Model

    Authors: Yuyao Ge, Shenghua Liu, Wenjie Feng, Lingrui Mei, Lizhe Chen, Xueqi Cheng

    Abstract: In recent years, large language models have achieved state-of-the-art performance across multiple domains. However, the progress in the field of graph reasoning with LLM remains limited. Our work delves into this gap by thoroughly investigating graph reasoning with LLMs. In this work, we reveal the impact of the order of graph description on LLMs' graph reasoning performance, which significantly a…

    Submitted 24 February, 2024; v1 submitted 11 February, 2024; originally announced February 2024.

  33. arXiv:2402.02108  [pdf, other]

    cs.CV

    From Synthetic to Real: Unveiling the Power of Synthetic Data for Video Person Re-ID

    Authors: Xiangqun Zhang, Ruize Han, Wei Feng

    Abstract: In this paper, we study a new problem of cross-domain video based person re-identification (Re-ID). Specifically, we take the synthetic video dataset as the source domain for training and use the real-world videos for testing, which significantly reduces the dependence on real training data collection and annotation. To unveil the power of synthetic data for video person Re-ID, we first propose a…

    Submitted 3 February, 2024; originally announced February 2024.

  34. arXiv:2401.17617  [pdf, other]

    cs.CV cs.AI

    Unveiling the Power of Self-supervision for Multi-view Multi-human Association and Tracking

    Authors: Wei Feng, Feifan Wang, Ruize Han, Zekun Qian, Song Wang

    Abstract: Multi-view multi-human association and tracking (MvMHAT) is a new but important problem for multi-person scene video surveillance, aiming to track a group of people over time in each view, as well as to identify the same person across different views at the same time, which is different from previous MOT and multi-camera MOT tasks only considering the over-time human tracking. This way, the video…

    Submitted 31 January, 2024; originally announced January 2024.

  35. arXiv:2401.02118  [pdf, other]

    cs.IT eess.SP

    Radio Map-Based Spectrum Sharing for Joint Communication and Sensing

    Authors: Xionran Fang, Wei Feng, Yunfei Chen, Dingxi Yang, Ning Ge, Zhiyong Feng, Yue Gao

    Abstract: The sixth-generation (6G) network is expected to provide both communication and sensing (C&S) services. However, spectrum scarcity poses a major challenge to the harmonious coexistence of C&S systems. Without effective cooperation, the interference resulting from spectrum sharing impairs the performance of both systems. This paper addresses C&S interference within a distributed network. Different…

    Submitted 27 June, 2024; v1 submitted 4 January, 2024; originally announced January 2024.

  36. arXiv:2401.01054  [pdf, other]

    cs.LG cs.AI

    Elastic Multi-Gradient Descent for Parallel Continual Learning

    Authors: Fan Lyu, Wei Feng, Yuepan Li, Qing Sun, Fanhua Shang, Liang Wan, Liang Wang

    Abstract: The goal of Continual Learning (CL) is to continuously learn from new data streams and accomplish the corresponding tasks. Previously studied CL assumes that data are given in sequence nose-to-tail for different tasks, thus indeed belonging to Serial Continual Learning (SCL). This paper studies the novel paradigm of Parallel Continual Learning (PCL) in dynamic multi-task scenarios, where a diverse…

    Submitted 2 January, 2024; originally announced January 2024.

    Comments: Submitted to IEEE TPAMI

  37. arXiv:2401.00268  [pdf, other]

    cs.CV

    COMMA: Co-Articulated Multi-Modal Learning

    Authors: Lianyu Hu, Liqing Gao, Zekang Liu, Chi-Man Pun, Wei Feng

    Abstract: Pretrained large-scale vision-language models such as CLIP have demonstrated excellent generalizability over a series of downstream tasks. However, they are sensitive to the variation of input text prompts and need a selection of prompt templates to achieve satisfactory performance. Recently, various methods have been proposed to dynamically learn the prompts as the textual inputs to avoid the req…

    Submitted 30 December, 2023; originally announced January 2024.

    Comments: Accepted to AAAI2024. Code is available at https://github.com/hulianyuyy/COMMA

  38. arXiv:2312.17431  [pdf, other]

    cs.CR cs.CV

    MVPatch: More Vivid Patch for Adversarial Camouflaged Attacks on Object Detectors in the Physical World

    Authors: Zheng Zhou, Hongbo Zhao, Ju Liu, Qiaosheng Zhang, Liwei Geng, Shuchang Lyu, Wenquan Feng

    Abstract: Recent investigations demonstrate that adversarial patches can be utilized to manipulate the result of object detection models. However, the conspicuous patterns on these patches may draw more attention and raise suspicions among humans. Moreover, existing works have primarily focused on enhancing the efficacy of attacks in the physical domain, rather than seeking to optimize their stealth attribu…

    Submitted 11 January, 2024; v1 submitted 28 December, 2023; originally announced December 2023.

    Comments: 14 pages, 8 figures. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  39. arXiv:2312.14206  [pdf, other]

    cs.CV

    LLM4VG: Large Language Models Evaluation for Video Grounding

    Authors: Wei Feng, Xin Wang, Hong Chen, Zeyang Zhang, Zihan Song, Yuwei Zhou, Wenwu Zhu

    Abstract: Recently, researchers have attempted to investigate the capability of LLMs in handling videos and proposed several video LLM models. However, the ability of LLMs to handle video grounding (VG), which is an important time-related video task requiring the model to precisely locate the start and end timestamps of temporal moments in videos that match the given textual queries, still remains unclear a…

    Submitted 28 December, 2023; v1 submitted 21 December, 2023; originally announced December 2023.

  40. arXiv:2312.13309  [pdf, other]

    cs.CV cs.AI

    Generate E-commerce Product Background by Integrating Category Commonality and Personalized Style

    Authors: Haohan Wang, Wei Feng, Yang Lu, Yaoyu Li, Zheng Zhang, Jingjing Lv, Xin Zhu, Junjie Shen, Zhangang Lin, Lixing Bo, Jingping Shao

    Abstract: The state-of-the-art methods for e-commerce product background generation suffer from the inefficiency of designing product-wise prompts when scaling up the production, as well as the ineffectiveness of describing fine-grained styles when customizing personalized backgrounds for some specific brands. To address these obstacles, we integrate the category commonality and personalized style into diff…

    Submitted 19 December, 2023; originally announced December 2023.

    Comments: 12 pages, 11 figures

  41. arXiv:2312.08822  [pdf, other]

    cs.CV

    Planning and Rendering: Towards End-to-End Product Poster Generation

    Authors: Zhaochen Li, Fengheng Li, Wei Feng, Honghe Zhu, An Liu, Yaoyu Li, Zheng Zhang, Jingjing Lv, Xin Zhu, Junjie Shen, Zhangang Lin, Jingping Shao, Zhenglu Yang

    Abstract: End-to-end product poster generation significantly optimizes design efficiency and reduces production costs. Prevailing methods predominantly rely on image-inpainting methods to generate clean background images for given products. Subsequently, poster layout generation methods are employed to produce corresponding layout results. However, the background images may not be suitable for accommodating…

    Submitted 14 December, 2023; originally announced December 2023.

  42. arXiv:2312.06632  [pdf, other]

    cs.AI

    Control Risk for Potential Misuse of Artificial Intelligence in Science

    Authors: Jiyan He, Weitao Feng, Yaosen Min, Jingwei Yi, Kunsheng Tang, Shuai Li, Jie Zhang, Kejiang Chen, Wenbo Zhou, Xing Xie, Weiming Zhang, Nenghai Yu, Shuxin Zheng

    Abstract: The expanding application of Artificial Intelligence (AI) in scientific fields presents unprecedented opportunities for discovery and innovation. However, this growth is not without risks. AI models in science, if misused, can amplify risks like creation of harmful substances, or circumvention of established regulations. In this study, we aim to raise awareness of the dangers of AI misuse in scien…

    Submitted 11 December, 2023; originally announced December 2023.

  43. Taiyi: A Bilingual Fine-Tuned Large Language Model for Diverse Biomedical Tasks

    Authors: Ling Luo, Jinzhong Ning, Yingwen Zhao, Zhijun Wang, Zeyuan Ding, Peng Chen, Weiru Fu, Qinyu Han, Guangtao Xu, Yunzhi Qiu, Dinghao Pan, Jiru Li, Hao Li, Wenduo Feng, Senbo Tu, Yuqi Liu, Zhihao Yang, Jian Wang, Yuanyuan Sun, Hongfei Lin

    Abstract: Objective: Most existing fine-tuned biomedical large language models (LLMs) focus on enhancing performance in monolingual biomedical question answering and conversation tasks. To investigate the effectiveness of the fine-tuned LLMs on diverse biomedical NLP tasks in different languages, we present Taiyi, a bilingual fine-tuned LLM for diverse biomedical tasks. Materials and Methods: We first curat…

    Submitted 19 December, 2023; v1 submitted 20 November, 2023; originally announced November 2023.

    Journal ref: Journal of the American Medical Informatics Association, 2024, ocae037

  44. arXiv:2311.04247  [pdf, other]

    cs.LG cs.AI

    Analysis and Applications of Deep Learning with Finite Samples in Full Life-Cycle Intelligence of Nuclear Power Generation

    Authors: Chenwei Tang, Wenqiang Zhou, Dong Wang, Caiyang Yu, Zhenan He, Jizhe Zhou, Shudong Huang, Yi Gao, Jianming Chen, Wentao Feng, Jiancheng Lv

    Abstract: The advent of Industry 4.0 has precipitated the incorporation of Artificial Intelligence (AI) methods within industrial contexts, aiming to realize intelligent manufacturing, operation as well as maintenance, also known as industrial intelligence. However, intricate industrial milieus, particularly those relating to energy exploration and production, frequently encompass data characterized by long…

    Submitted 7 November, 2023; originally announced November 2023.

  45. arXiv:2311.02703  [pdf, other]

    cs.IR

    Toward Trustworthy Identity Tracing via Multi-attribute Synergistic Identification

    Authors: Decheng Liu, Jiahao Yu, Ruimin Hu, Wenbin Feng

    Abstract: Identity tracing is a technology that uses the selection and collection of identity attributes of the object to be tested to discover its true identity, and it is one of the most important foundational issues in the field of social security prevention. However, traditional identity recognition technologies based on single attributes have difficulty achieving ultimate recognition accuracy, where de…

    Submitted 5 November, 2023; originally announced November 2023.

  46. arXiv:2310.20490  [pdf, other]

    cs.CV cs.LG

    Long-Tailed Learning as Multi-Objective Optimization

    Authors: Weiqi Li, Fan Lyu, Fanhua Shang, Liang Wan, Wei Feng

    Abstract: Real-world data is extremely imbalanced and presents a long-tailed distribution, resulting in models that are biased towards classes with sufficient samples and perform poorly on rare classes. Recent methods propose to rebalance classes, but they face the seesaw dilemma (increasing performance on tail classes may decrease that of head classes, and vice versa). In this paper, we argue t…

    Submitted 1 November, 2023; v1 submitted 31 October, 2023; originally announced October 2023.

    Comments: In submission

  47. arXiv:2310.17801  [pdf]

    cs.CV

    Image Prior and Posterior Conditional Probability Representation for Efficient Damage Assessment

    Authors: Jie Wei, Weicong Feng, Erik Blasch, Erika Ardiles-Cruz, Haibin Ling

    Abstract: It is important to quantify Damage Assessment (DA) for Human Assistance and Disaster Response (HADR) applications. In this paper, to achieve efficient and scalable DA in HADR, an image prior and posterior conditional probability (IP2CP) is developed as an effective computational imaging representation. Equipped with the IP2CP representation, the matching pre- and post-disaster images are effective…

    Submitted 26 October, 2023; originally announced October 2023.

    Comments: 6 pages, 2 figures

    MSC Class: I.4.6; I.5.3

  48. arXiv:2310.14374  [pdf, other]

    cs.CV

    OV-VG: A Benchmark for Open-Vocabulary Visual Grounding

    Authors: Chunlei Wang, Wenquan Feng, Xiangtai Li, Guangliang Cheng, Shuchang Lyu, Binghao Liu, Lijiang Chen, Qi Zhao

    Abstract: Open-vocabulary learning has emerged as a cutting-edge research area, particularly in light of the widespread adoption of vision-based foundational models. Its primary objective is to comprehend novel concepts that are not encompassed within a predefined vocabulary. One key facet of this endeavor is Visual Grounding, which entails locating a specific region within an image based on a corresponding…

    Submitted 22 October, 2023; originally announced October 2023.

  49. arXiv:2310.13347  [pdf, other]

    cs.CV cs.AI

    NurViD: A Large Expert-Level Video Database for Nursing Procedure Activity Understanding

    Authors: Ming Hu, Lin Wang, Siyuan Yan, Don Ma, Qingli Ren, Peng Xia, Wei Feng, Peibo Duan, Lie Ju, Zongyuan Ge

    Abstract: The application of deep learning to nursing procedure activity understanding has the potential to greatly enhance the quality and safety of nurse-patient interactions. By utilizing the technique, we can facilitate training and education, improve quality control, and enable operational compliance monitoring. However, the development of automatic recognition systems in this field is currently hinder…

    Submitted 20 October, 2023; originally announced October 2023.

    Comments: Accepted by NeurIPS 2023 Datasets and Benchmarks Track

  50. arXiv:2310.00938  [pdf, ps, other]

    cs.DS

    An FPRAS for two terminal reliability in directed acyclic graphs

    Authors: Weiming Feng, Heng Guo

    Abstract: We give a fully polynomial-time randomized approximation scheme (FPRAS) for two terminal reliability in directed acyclic graphs (DAGs). In contrast, we also show the complementing problem of approximating two terminal unreliability in DAGs is #BIS-hard.

    Submitted 14 February, 2024; v1 submitted 2 October, 2023; originally announced October 2023.

    Comments: 30 pages. v3: improved presentation