Showing 1–50 of 98 results for author: Hua, G

  1. arXiv:2406.00429  [pdf, other]

    cs.CV

    Towards Generalizable Multi-Object Tracking

    Authors: Zheng Qin, Le Wang, Sanping Zhou, Panpan Fu, Gang Hua, Wei Tang

    Abstract: Multi-Object Tracking (MOT) encompasses various tracking scenarios, each characterized by unique traits. Effective trackers should demonstrate a high degree of generalizability across diverse scenarios. However, existing trackers struggle to accommodate all aspects or necessitate hypothesis and experimentation to customize the association information (motion and/or appearance) for a given scenario, le…

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: CVPR2024

  2. arXiv:2406.00143  [pdf, other]

    cs.CV

    Diversifying Query: Region-Guided Transformer for Temporal Sentence Grounding

    Authors: Xiaolong Sun, Liushuai Shi, Le Wang, Sanping Zhou, Kun Xia, Yabing Wang, Gang Hua

    Abstract: Temporal sentence grounding is a challenging task that aims to localize the moment spans relevant to a language description. Although recent DETR-based models have achieved notable progress by leveraging multiple learnable moment queries, they suffer from overlapped and redundant proposals, leading to inaccurate predictions. We attribute this limitation to the lack of task-related guidance for the…

    Submitted 31 May, 2024; originally announced June 2024.

  3. arXiv:2405.20015  [pdf, other]

    cs.AI cs.CL

    Efficient LLM-Jailbreaking by Introducing Visual Modality

    Authors: Zhenxing Niu, Yuyao Sun, Haodong Ren, Haoxuan Ji, Quan Wang, Xiaoke Ma, Gang Hua, Rong Jin

    Abstract: This paper focuses on jailbreaking attacks against large language models (LLMs), eliciting them to generate objectionable content in response to harmful user queries. Unlike previous LLM-jailbreaks that directly orient to LLMs, our approach begins by constructing a multimodal large language model (MLLM) through the incorporation of a visual module into the target LLM. Subsequently, we conduct an e…

    Submitted 30 May, 2024; originally announced May 2024.

  4. arXiv:2405.17929  [pdf, other]

    cs.CV

    Towards Unified Robustness Against Both Backdoor and Adversarial Attacks

    Authors: Zhenxing Niu, Yuyao Sun, Qiguang Miao, Rong Jin, Gang Hua

    Abstract: Deep Neural Networks (DNNs) are known to be vulnerable to both backdoor and adversarial attacks. In the literature, these two types of attacks are commonly treated as distinct robustness problems and solved separately, since they belong to training-time and inference-time attacks respectively. However, this paper revealed that there is an intriguing connection between them: (1) planting a backdoor…

    Submitted 28 May, 2024; originally announced May 2024.

  5. arXiv:2405.09863  [pdf, other]

    cs.CV cs.AI

    Box-Free Model Watermarks Are Prone to Black-Box Removal Attacks

    Authors: Haonan An, Guang Hua, Zhiping Lin, Yuguang Fang

    Abstract: Box-free model watermarking is an emerging technique to safeguard the intellectual property of deep learning models, particularly those for low-level image processing tasks. Existing works have verified and improved its effectiveness in several aspects. However, in this paper, we reveal that box-free model watermarking is prone to removal attacks, even under the real-world threat model such that t…

    Submitted 21 May, 2024; v1 submitted 16 May, 2024; originally announced May 2024.

  6. arXiv:2404.05285  [pdf, other]

    cs.CV

    Detecting Every Object from Events

    Authors: Haitian Zhang, Chang Xu, Xinya Wang, Bingde Liu, Guang Hua, Lei Yu, Wen Yang

    Abstract: Object detection is critical in autonomous driving, and it is more practical yet challenging to localize objects of unknown categories: an endeavour known as Class-Agnostic Object Detection (CAOD). Existing studies on CAOD predominantly rely on ordinary cameras, but these frame-based sensors usually have high latency and limited dynamic range, leading to safety risks in real-world scenarios. In th…

    Submitted 8 April, 2024; originally announced April 2024.

  7. arXiv:2404.00513  [pdf, other]

    cs.CV

    Transformer based Pluralistic Image Completion with Reduced Information Loss

    Authors: Qiankun Liu, Yuqi Jiang, Zhentao Tan, Dongdong Chen, Ying Fu, Qi Chu, Gang Hua, Nenghai Yu

    Abstract: Transformer based methods have achieved great success in image inpainting recently. However, we find that these solutions regard each pixel as a token, thus suffering from an information loss issue from two aspects: 1) They downsample the input image into much lower resolutions for efficiency consideration. 2) They quantize $256^3$ RGB values to a small number (such as 512) of quantized color valu…

    Submitted 14 April, 2024; v1 submitted 30 March, 2024; originally announced April 2024.

    Comments: Accepted by TPAMI (2024). arXiv admin note: text overlap with arXiv:2205.05076

  8. arXiv:2403.12042  [pdf, other]

    cs.CV

    Exploring Pre-trained Text-to-Video Diffusion Models for Referring Video Object Segmentation

    Authors: Zixin Zhu, Xuelu Feng, Dongdong Chen, Junsong Yuan, Chunming Qiao, Gang Hua

    Abstract: In this paper, we explore the visual representations produced from a pre-trained text-to-video (T2V) diffusion model for video understanding tasks. We hypothesize that the latent representation learned from a pretrained generative T2V model encapsulates rich semantics and coherent temporal correspondences, thereby naturally facilitating video understanding. Our hypothesis is validated through the…

    Submitted 6 July, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

    Comments: Appears at ECCV 2024; the code is available at https://github.com/buxiangzhiren/VD-IT

  9. arXiv:2403.11189  [pdf, other]

    cs.CV

    Boosting Semi-Supervised Temporal Action Localization by Learning from Non-Target Classes

    Authors: Kun Xia, Le Wang, Sanping Zhou, Gang Hua, Wei Tang

    Abstract: The crux of semi-supervised temporal action localization (SS-TAL) lies in excavating valuable information from abundant unlabeled videos. However, current approaches predominantly focus on building models that are robust to the error-prone target class (i.e., the predicted class with the highest confidence) while ignoring informative semantics within non-target classes. This paper approaches SS-TAL…

    Submitted 17 March, 2024; originally announced March 2024.

  10. arXiv:2403.05810  [pdf, other]

    cs.CV cs.AI

    Recurrent Aligned Network for Generalized Pedestrian Trajectory Prediction

    Authors: Yonghao Dong, Le Wang, Sanping Zhou, Gang Hua, Changyin Sun

    Abstract: Pedestrian trajectory prediction is a crucial component in computer vision and robotics, but remains challenging due to the domain shift problem. Previous studies have tried to tackle this problem by leveraging a portion of the trajectory data from the target domain to adapt the model. However, such domain adaptation methods are impractical in real-world scenarios, as it is infeasible to collect t…

    Submitted 9 March, 2024; originally announced March 2024.

  11. arXiv:2402.17207  [pdf, other]

    cs.CV

    Deployment Prior Injection for Run-time Calibratable Object Detection

    Authors: Mo Zhou, Yiding Yang, Haoxiang Li, Vishal M. Patel, Gang Hua

    Abstract: With a strong alignment between the training and test distributions, object relation as a context prior facilitates object detection. Yet, it turns into a harmful but inevitable training set bias upon test distributions that shift differently across space and time. Nevertheless, the existing detectors cannot incorporate deployment context prior during the test phase without parameter update. Such…

    Submitted 26 February, 2024; originally announced February 2024.

  12. arXiv:2402.02309  [pdf, other]

    cs.LG cs.CL cs.CR cs.CV

    Jailbreaking Attack against Multimodal Large Language Model

    Authors: Zhenxing Niu, Haodong Ren, Xinbo Gao, Gang Hua, Rong Jin

    Abstract: This paper focuses on jailbreaking attacks against multi-modal large language models (MLLMs), seeking to elicit MLLMs to generate objectionable responses to harmful user queries. A maximum likelihood-based algorithm is proposed to find an image Jailbreaking Prompt (imgJP), enabling jailbreaks against MLLMs across multiple unseen prompts and images (i.e., data-universal property). Our approa…

    Submitted 3 February, 2024; originally announced February 2024.

  13. arXiv:2312.16256  [pdf, other]

    cs.CV cs.AI

    DL3DV-10K: A Large-Scale Scene Dataset for Deep Learning-based 3D Vision

    Authors: Lu Ling, Yichen Sheng, Zhi Tu, Wentian Zhao, Cheng Xin, Kun Wan, Lantao Yu, Qianyu Guo, Zixun Yu, Yawen Lu, Xuanmao Li, Xingpeng Sun, Rohan Ashok, Aniruddha Mukherjee, Hao Kang, Xiangrui Kong, Gang Hua, Tianyi Zhang, Bedrich Benes, Aniket Bera

    Abstract: We have witnessed significant progress in deep learning-based 3D vision, ranging from neural radiance field (NeRF) based 3D representation learning to applications in novel view synthesis (NVS). However, existing scene-level datasets for deep learning-based 3D vision, limited to either synthetic environments or a narrow selection of real-world scenes, are quite insufficient. This insufficiency not…

    Submitted 29 December, 2023; v1 submitted 25 December, 2023; originally announced December 2023.

  14. arXiv:2311.16917  [pdf, other]

    cs.CV cs.RO

    UGG: Unified Generative Grasping

    Authors: Jiaxin Lu, Hao Kang, Haoxiang Li, Bo Liu, Yiding Yang, Qixing Huang, Gang Hua

    Abstract: Dexterous grasping aims to produce diverse grasping postures with a high grasping success rate. Regression-based methods that directly predict grasping parameters given the object may achieve a high success rate but often lack diversity. Generation-based methods that generate grasping postures conditioned on the object can often produce diverse grasping, but they are insufficient for high grasping…

    Submitted 28 November, 2023; originally announced November 2023.

    Comments: 17 pages, 14 figures

  15. arXiv:2311.15512  [pdf, other]

    cs.CV

    Sparse Pedestrian Character Learning for Trajectory Prediction

    Authors: Yonghao Dong, Le Wang, Sanpin Zhou, Gang Hua, Changyin Sun

    Abstract: Pedestrian trajectory prediction in a first-person view has recently attracted much attention due to its importance in autonomous driving. Recent work utilizes pedestrian character information, i.e., action and appearance, to improve the learned trajectory embedding and achieves state-of-the-art performance. However, it neglects the invalid and negative pedestrian character information, w…

    Submitted 26 November, 2023; originally announced November 2023.

  16. arXiv:2311.13793  [pdf, other]

    cs.CV cs.RO

    Evidential Active Recognition: Intelligent and Prudent Open-World Embodied Perception

    Authors: Lei Fan, Mingfu Liang, Yunxuan Li, Gang Hua, Ying Wu

    Abstract: Active recognition enables robots to intelligently explore novel observations, thereby acquiring more information while circumventing undesired viewing conditions. Recent approaches favor learning policies from simulated or collected data, wherein appropriate actions are more frequently selected when the recognition is accurate. However, most recognition modules are developed under the closed-worl…

    Submitted 22 November, 2023; originally announced November 2023.

  17. arXiv:2310.10651  [pdf, other]

    cs.CV cs.GR

    HairCLIPv2: Unifying Hair Editing via Proxy Feature Blending

    Authors: Tianyi Wei, Dongdong Chen, Wenbo Zhou, Jing Liao, Weiming Zhang, Gang Hua, Nenghai Yu

    Abstract: Hair editing has made tremendous progress in recent years. Early hair editing methods use well-drawn sketches or masks to specify the editing conditions. Even though they can enable very fine-grained local control, such interaction modes are inefficient for the editing conditions that can be easily specified by language descriptions or reference images. Thanks to the recent breakthrough of cross-m…

    Submitted 16 October, 2023; originally announced October 2023.

    Comments: ICCV 2023, code is available at https://github.com/wty-ustc/HairCLIPv2

  18. arXiv:2309.14282  [pdf, other]

    cs.CV

    Calibration-based Dual Prototypical Contrastive Learning Approach for Domain Generalization Semantic Segmentation

    Authors: Muxin Liao, Shishun Tian, Yuhang Zhang, Guoguang Hua, Wenbin Zou, Xia Li

    Abstract: Prototypical contrastive learning (PCL) has been widely used to learn class-wise domain-invariant features recently. These methods are based on the assumption that the prototypes, which are represented as the central value of the same class in a certain domain, are domain-invariant. Since the prototypes of different domains have discrepancies as well, the class-wise domain-invariant features learn…

    Submitted 25 September, 2023; originally announced September 2023.

    Comments: Accepted by ACM MM'23

  19. arXiv:2309.07403  [pdf, other]

    cs.CV

    Flexible Visual Recognition by Evidential Modeling of Confusion and Ignorance

    Authors: Lei Fan, Bo Liu, Haoxiang Li, Ying Wu, Gang Hua

    Abstract: In real-world scenarios, typical visual recognition systems could fail under two major causes, i.e., the misclassification between known classes and the excusable misbehavior on unknown-class images. To tackle these deficiencies, flexible visual recognition should dynamically predict multiple classes when they are unconfident between choices and reject making predictions when the input is entirely…

    Submitted 13 September, 2023; originally announced September 2023.

    Comments: Accepted by ICCV23

  20. arXiv:2309.01265  [pdf, other]

    cs.CV

    SOAR: Scene-debiasing Open-set Action Recognition

    Authors: Yuanhao Zhai, Ziyi Liu, Zhenyu Wu, Yi Wu, Chunluan Zhou, David Doermann, Junsong Yuan, Gang Hua

    Abstract: Deep learning models have a risk of utilizing spurious clues to make predictions, such as recognizing actions based on the background scene. This issue can severely degrade the open-set action recognition performance when the testing samples have different scene distributions from the training samples. To mitigate this problem, we propose a novel method, called Scene-debiasing Open-set Action Reco…

    Submitted 3 September, 2023; originally announced September 2023.

    Comments: Accepted to ICCV 2023, code: https://github.com/yhZhai/SOAR

  21. arXiv:2308.10315  [pdf, other]

    cs.CV

    Improving Adversarial Robustness of Masked Autoencoders via Test-time Frequency-domain Prompting

    Authors: Qidong Huang, Xiaoyi Dong, Dongdong Chen, Yinpeng Chen, Lu Yuan, Gang Hua, Weiming Zhang, Nenghai Yu

    Abstract: In this paper, we investigate the adversarial robustness of vision transformers that are equipped with BERT pretraining (e.g., BEiT, MAE). A surprising observation is that MAE has significantly worse adversarial robustness than other BERT pretraining methods. This observation drives us to rethink the basic differences between these BERT pretraining methods and how these differences affect the robu…

    Submitted 22 August, 2023; v1 submitted 20 August, 2023; originally announced August 2023.

    Comments: Accepted at ICCV 2023

  22. arXiv:2306.05390  [pdf, other]

    cs.CV

    HQ-50K: A Large-scale, High-quality Dataset for Image Restoration

    Authors: Qinhong Yang, Dongdong Chen, Zhentao Tan, Qiankun Liu, Qi Chu, Jianmin Bao, Lu Yuan, Gang Hua, Nenghai Yu

    Abstract: This paper introduces a new large-scale image restoration dataset, called HQ-50K, which contains 50,000 high-quality images with rich texture details and semantic diversity. We analyze existing image restoration datasets from five different perspectives, including data scale, resolution, compression rates, texture details, and semantic coverage. However, we find that all of these datasets are defi…

    Submitted 8 June, 2023; originally announced June 2023.

    Comments: Dataset and code will be available at https://github.com/littleYaang/HQ-50K

  23. arXiv:2306.04632  [pdf, other]

    cs.CV cs.GR

    Designing a Better Asymmetric VQGAN for StableDiffusion

    Authors: Zixin Zhu, Xuelu Feng, Dongdong Chen, Jianmin Bao, Le Wang, Yinpeng Chen, Lu Yuan, Gang Hua

    Abstract: StableDiffusion is a revolutionary text-to-image generator that is causing a stir in the world of image generation and editing. Unlike traditional methods that learn a diffusion model in pixel space, StableDiffusion learns a diffusion model in the latent space via a VQGAN, ensuring both efficiency and quality. It not only supports image generation tasks, but also enables image editing for real ima…

    Submitted 7 June, 2023; originally announced June 2023.

    Comments: code is available at https://github.com/buxiangzhiren/Asymmetric_VQGAN

  24. arXiv:2304.10177  [pdf, other]

    cs.LG cs.CV

    Regularizing Second-Order Influences for Continual Learning

    Authors: Zhicheng Sun, Yadong Mu, Gang Hua

    Abstract: Continual learning aims to learn on non-stationary data streams without catastrophically forgetting previous knowledge. Prevalent replay-based methods address this challenge by rehearsing on a small buffer holding the seen data, for which a delicate sample selection strategy is required. However, existing selection schemes typically seek only to maximize the utility of the ongoing selection, overl…

    Submitted 20 April, 2023; originally announced April 2023.

    Comments: CVPR 2023

  25. arXiv:2303.10404  [pdf, other]

    cs.CV

    MotionTrack: Learning Robust Short-term and Long-term Motions for Multi-Object Tracking

    Authors: Zheng Qin, Sanping Zhou, Le Wang, Jinghai Duan, Gang Hua, Wei Tang

    Abstract: The main challenge of Multi-Object Tracking (MOT) lies in maintaining a continuous trajectory for each target. Existing methods often learn reliable motion patterns to match the same target between adjacent frames and discriminative appearance features to re-identify the lost targets after a long period. However, the reliability of motion prediction and the discriminability of appearances can be e…

    Submitted 16 April, 2023; v1 submitted 18 March, 2023; originally announced March 2023.

    Comments: Accepted by CVPR2023!

  26. arXiv:2303.08138  [pdf, other]

    cs.CV

    Diversity-Aware Meta Visual Prompting

    Authors: Qidong Huang, Xiaoyi Dong, Dongdong Chen, Weiming Zhang, Feifei Wang, Gang Hua, Nenghai Yu

    Abstract: We present Diversity-Aware Meta Visual Prompting (DAM-VP), an efficient and effective prompting method for transferring pre-trained models to downstream tasks with frozen backbone. A challenging issue in visual prompting is that image datasets sometimes have a large data diversity whereas a per-dataset generic prompt can hardly handle the complex distribution shift toward the original pretraining…

    Submitted 14 March, 2023; originally announced March 2023.

    Comments: CVPR2023, code is available at https://github.com/shikiw/DAM-VP

  27. arXiv:2303.05072  [pdf, other]

    cs.CV cs.AI cs.LG

    Identification of Systematic Errors of Image Classifiers on Rare Subgroups

    Authors: Jan Hendrik Metzen, Robin Hutmacher, N. Grace Hua, Valentyn Boreiko, Dan Zhang

    Abstract: Despite excellent average-case performance of many image classifiers, their performance can substantially deteriorate on semantically coherent subgroups of the data that were under-represented in the training data. These systematic errors can impact both fairness for demographic minority groups as well as robustness and safety under domain shift. A major challenge is to identify such subgroups wit…

    Submitted 12 April, 2023; v1 submitted 9 March, 2023; originally announced March 2023.

  28. arXiv:2212.04575  [pdf, other]

    cs.CV

    DDM-NET: End-to-end learning of keypoint feature Detection, Description and Matching for 3D localization

    Authors: Xiangyu Xu, Li Guan, Enrique Dunn, Haoxiang Li, Gang Hua

    Abstract: In this paper, we propose an end-to-end framework that jointly learns keypoint detection, descriptor representation and cross-frame matching for the task of image-based 3D localization. Prior art has tackled each of these components individually, purportedly aiming to alleviate difficulties in effectively training a holistic network. We design a self-supervised image warping correspondence loss for b…

    Submitted 1 February, 2023; v1 submitted 8 December, 2022; originally announced December 2022.

  29. arXiv:2211.16726  [pdf, other]

    cs.LG cs.CV

    Boosted Dynamic Neural Networks

    Authors: Haichao Yu, Haoxiang Li, Gang Hua, Gao Huang, Humphrey Shi

    Abstract: Early-exiting dynamic neural networks (EDNN), as one type of dynamic neural networks, have been widely studied recently. A typical EDNN has multiple prediction heads at different layers of the network backbone. During inference, the model will exit at either the last prediction head or an intermediate prediction head where the prediction confidence is higher than a predefined threshold. To optimize…

    Submitted 29 November, 2022; originally announced November 2022.

    Comments: 10 pages, 6 figures

  30. arXiv:2211.11694  [pdf, other]

    cs.CV

    Exploring Discrete Diffusion Models for Image Captioning

    Authors: Zixin Zhu, Yixuan Wei, Jianfeng Wang, Zhe Gan, Zheng Zhang, Le Wang, Gang Hua, Lijuan Wang, Zicheng Liu, Han Hu

    Abstract: The image captioning task is typically realized by an auto-regressive method that decodes the text tokens one by one. We present a diffusion-based captioning model, dubbed DDCap, to allow more decoding flexibility. Unlike image generation, where the output is continuous and redundant with a fixed length, texts in image captions are categorical and short with varied lengths. Therefore, nai…

    Submitted 9 December, 2022; v1 submitted 21 November, 2022; originally announced November 2022.

  31. arXiv:2209.07788  [pdf, other]

    cs.CV

    PointCAT: Contrastive Adversarial Training for Robust Point Cloud Recognition

    Authors: Qidong Huang, Xiaoyi Dong, Dongdong Chen, Hang Zhou, Weiming Zhang, Kui Zhang, Gang Hua, Nenghai Yu

    Abstract: Notwithstanding the prominent performance achieved in various applications, point cloud recognition models have often suffered from natural corruptions and adversarial perturbations. In this paper, we delve into boosting the general robustness of point cloud recognition models and propose Point-Cloud Contrastive Adversarial Training (PointCAT). The main intuition of PointCAT is encouraging the tar…

    Submitted 16 September, 2022; originally announced September 2022.

  32. arXiv:2209.05980  [pdf, other]

    cs.CV cs.AI cs.CR cs.LG

    Certified Defences Against Adversarial Patch Attacks on Semantic Segmentation

    Authors: Maksym Yatsura, Kaspar Sakmann, N. Grace Hua, Matthias Hein, Jan Hendrik Metzen

    Abstract: Adversarial patch attacks are an emerging security threat for real world deep learning applications. We present Demasked Smoothing, the first approach (to our knowledge) to certify the robustness of semantic segmentation models against this threat model. Previous work on certifiably defending against patch attacks has mostly focused on image classification task and often required changes in the…

    Submitted 21 February, 2023; v1 submitted 13 September, 2022; originally announced September 2022.

    Comments: accepted at ICLR 2023

  33. Deep Fidelity in DNN Watermarking: A Study of Backdoor Watermarking for Classification Models

    Authors: Guang Hua, Andrew Beng Jin Teoh

    Abstract: Backdoor watermarking is a promising paradigm to protect the copyright of deep neural network (DNN) models. In the existing works on this subject, researchers have intensively focused on watermarking robustness, while the concept of fidelity, which is concerned with the preservation of the model's original functionality, has received less attention. In this paper, focusing on deep image classifica…

    Submitted 31 October, 2023; v1 submitted 31 July, 2022; originally announced August 2022.

    Comments: Published in Pattern Recognition

    Journal ref: Pattern Recognition, Vol. 144, Dec. 2023

  34. arXiv:2205.13296  [pdf, other]

    cs.CV

    Social Interpretable Tree for Pedestrian Trajectory Prediction

    Authors: Liushuai Shi, Le Wang, Chengjiang Long, Sanping Zhou, Fang Zheng, Nanning Zheng, Gang Hua

    Abstract: Understanding the multiple socially-acceptable future behaviors is an essential task for many vision applications. In this paper, we propose a tree-based method, termed as Social Interpretable Tree (SIT), to address this multi-modal prediction task, where a hand-crafted tree is built depending on the prior information of observed trajectory to model multiple future trajectories. Specifically, a pa…

    Submitted 26 May, 2022; originally announced May 2022.

    Comments: Accepted by AAAI2022

  35. arXiv:2204.04416  [pdf, other]

    cs.CV

    E^2TAD: An Energy-Efficient Tracking-based Action Detector

    Authors: Xin Hu, Zhenyu Wu, Hao-Yu Miao, Siqi Fan, Taiyu Long, Zhenyu Hu, Pengcheng Pi, Yi Wu, Zhou Ren, Zhangyang Wang, Gang Hua

    Abstract: Video action detection (spatio-temporal action localization) is usually the starting point for human-centric intelligent analysis of videos nowadays. It has high practical impacts for many applications across robotics, security, healthcare, etc. The two-stage paradigm of Faster R-CNN inspires a standard paradigm of video action detection in object detection, i.e., firstly generating person proposa…

    Submitted 29 October, 2022; v1 submitted 9 April, 2022; originally announced April 2022.

  36. arXiv:2202.06312  [pdf, other]

    cs.CV

    Progressive Backdoor Erasing via connecting Backdoor and Adversarial Attacks

    Authors: Bingxu Mu, Zhenxing Niu, Le Wang, Xue Wang, Rong Jin, Gang Hua

    Abstract: Deep neural networks (DNNs) are known to be vulnerable to both backdoor attacks as well as adversarial attacks. In the literature, these two types of attacks are commonly treated as distinct problems and solved separately, since they belong to training-time and inference-time attacks respectively. However, in this paper we find an intriguing connection between them: for a model planted with backdo…

    Submitted 26 December, 2022; v1 submitted 13 February, 2022; originally announced February 2022.

  37. arXiv:2201.00785  [pdf, other]

    cs.CV

    Implicit Autoencoder for Point-Cloud Self-Supervised Representation Learning

    Authors: Siming Yan, Zhenpei Yang, Haoxiang Li, Chen Song, Li Guan, Hao Kang, Gang Hua, Qixing Huang

    Abstract: This paper advocates the use of implicit surface representation in autoencoder-based self-supervised 3D representation learning. The most popular and accessible 3D representation, i.e., point clouds, involves discrete samples of the underlying continuous 3D surface. This discretization process introduces sampling variations on the 3D shape, making it challenging to develop transferable knowledge o…

    Submitted 27 August, 2023; v1 submitted 3 January, 2022; originally announced January 2022.

    Comments: Published in ICCV 2023. The code is available at https://github.com/SimingYan/IAE

  38. arXiv:2111.13675  [pdf, other]

    cs.CV

    Weakly-guided Self-supervised Pretraining for Temporal Activity Detection

    Authors: Kumara Kahatapitiya, Zhou Ren, Haoxiang Li, Zhenyu Wu, Michael S. Ryoo, Gang Hua

    Abstract: Temporal Activity Detection aims to predict activity classes per frame, in contrast to video-level predictions in Activity Classification (i.e., Activity Recognition). Due to the expensive frame-level annotations required for detection, the scale of detection datasets is limited. Thus, commonly, previous work on temporal activity detection resorts to fine-tuning a classification model pretrained o…

    Submitted 4 February, 2023; v1 submitted 26 November, 2021; originally announced November 2021.

    Comments: Published as a conference paper at AAAI 2023

  39. arXiv:2109.05534  [pdf, other]

    cs.CV

    DSSL: Deep Surroundings-person Separation Learning for Text-based Person Retrieval

    Authors: Aichun Zhu, Zijie Wang, Yifeng Li, Xili Wan, Jing Jin, Tian Wang, Fangqiang Hu, Gang Hua

    Abstract: Many previous methods on text-based person retrieval tasks are devoted to learning a latent common space mapping, with the purpose of extracting modality-invariant features from both visual and textual modality. Nevertheless, due to the complexity of high-dimensional data, the unconstrained mapping paradigms are not able to properly catch discriminative clues about the corresponding person while d…

    Submitted 12 September, 2021; originally announced September 2021.

    Comments: Accepted by ACM MM'21

  40. arXiv:2108.09056  [pdf]

    cs.RO

    Joint order assignment and picking station scheduling in KIVA warehouses with multiple stations

    Authors: Xiying Yang, Guowei Hua, Li Zhang, T. C. E Cheng, Tsan Ming Choi

    Abstract: We consider the problem of allocating orders to multiple stations and sequencing the interlinked order and rack processing flows in each station in the robot-assisted KIVA warehouse. The various decisions involved in the problem, which are closely associated and must be solved in real time, are often tackled separately for ease of treatment. However, exploiting the synergy between order assignment…

    Submitted 5 May, 2023; v1 submitted 20 August, 2021; originally announced August 2021.

  41. Poison Ink: Robust and Invisible Backdoor Attack

    Authors: Jie Zhang, Dongdong Chen, Qidong Huang, Jing Liao, Weiming Zhang, Huamin Feng, Gang Hua, Nenghai Yu

    Abstract: Recent research shows deep neural networks are vulnerable to different types of attacks, such as adversarial attack, data poisoning attack and backdoor attack. Among them, backdoor attack is the most cunning one and can occur in almost every stage of deep learning pipeline. Therefore, backdoor attack has attracted a lot of interest from both academia and industry. However, most existing backdoor a…

    Submitted 13 August, 2022; v1 submitted 5 August, 2021; originally announced August 2021.

    Comments: IEEE Transactions on Image Processing (TIP)

  42. arXiv:2108.02360  [pdf, other]

    cs.CR cs.CV

    Exploring Structure Consistency for Deep Model Watermarking

    Authors: Jie Zhang, Dongdong Chen, Jing Liao, Han Fang, Zehua Ma, Weiming Zhang, Gang Hua, Nenghai Yu

    Abstract: The intellectual property (IP) of deep neural networks (DNNs) can be easily "stolen" by surrogate model attack. There has been significant progress in solutions to protect the IP of DNN models in classification tasks. However, little attention has been devoted to the protection of DNNs in image processing tasks. By utilizing consistent invisible spatial watermarks, one recent work first consider…

    Submitted 5 August, 2021; originally announced August 2021.

  43. arXiv:2108.00238  [pdf, other]

    cs.AI cs.CV

    Unlimited Neighborhood Interaction for Heterogeneous Trajectory Prediction

    Authors: Fang Zheng, Le Wang, Sanping Zhou, Wei Tang, Zhenxing Niu, Nanning Zheng, Gang Hua

    Abstract: Understanding complex social interactions among agents is a key challenge for trajectory prediction. Most existing methods consider the interactions between pairwise traffic agents or in a local area, while the nature of interactions is unlimited, involving an uncertain number of agents and non-local areas simultaneously. Besides, they treat heterogeneous traffic agents the same, namely those amon…

    Submitted 2 November, 2021; v1 submitted 31 July, 2021; originally announced August 2021.

  44. arXiv:2107.12960  [pdf, other]

    cs.CV

    Enriching Local and Global Contexts for Temporal Action Localization

    Authors: Zixin Zhu, Wei Tang, Le Wang, Nanning Zheng, Gang Hua

    Abstract: Effectively tackling the problem of temporal action localization (TAL) necessitates a visual representation that jointly pursues two confounding goals, i.e., fine-grained discrimination for temporal localization and sufficient visual invariance for action classification. We address this challenge by enriching both the local and global contexts in the popular two-stage temporal localization framewo…

    Submitted 7 August, 2021; v1 submitted 27 July, 2021; originally announced July 2021.

    Comments: Accepted by ICCV 2021

  45. arXiv:2106.03772  [pdf, other]

    cs.CV

    Learning Dynamics via Graph Neural Networks for Human Pose Estimation and Tracking

    Authors: Yiding Yang, Zhou Ren, Haoxiang Li, Chunluan Zhou, Xinchao Wang, Gang Hua

    Abstract: Multi-person pose estimation and tracking serve as crucial steps for video understanding. Most state-of-the-art approaches rely on first estimating poses in each frame and only then implementing data association and refinement. Despite the promising results achieved, such a strategy is inevitably prone to missed detections especially in heavily-cluttered scenes, since this tracking-by-detection pa…

    Submitted 7 June, 2021; originally announced June 2021.

    Comments: Accepted by CVPR 2021

  46. arXiv:2106.03614  [pdf, other]

    cs.CV cs.LG

    Adversarial Attack and Defense in Deep Ranking

    Authors: Mo Zhou, Le Wang, Zhenxing Niu, Qilin Zhang, Nanning Zheng, Gang Hua

    Abstract: Deep Neural Network classifiers are vulnerable to adversarial attack, where an imperceptible perturbation could result in misclassification. However, the vulnerability of DNN-based image ranking systems remains under-explored. In this paper, we propose two attacks against deep ranking systems, i.e., Candidate Attack and Query Attack, that can raise or lower the rank of chosen candidates by adversa…

    Submitted 7 June, 2021; originally announced June 2021.

  47. Video Imprint

    Authors: Zhanning Gao, Le Wang, Nebojsa Jojic, Zhenxing Niu, Nanning Zheng, Gang Hua

    Abstract: A new unified video analytics framework (ER3) is proposed for complex event retrieval, recognition and recounting, based on the proposed video imprint representation, which exploits temporal correlations among image features across video frames. With the video imprint representation, it is convenient to reverse map back to both temporal and spatial locations in video frames, allowing for both key…

    Submitted 6 June, 2021; originally announced June 2021.

    Journal ref: IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(12), 3086-3099 (2018)

  48. arXiv:2105.10882  [pdf, other]

    cs.CV

    Weakly-supervised 3D Human Pose Estimation with Cross-view U-shaped Graph Convolutional Network

    Authors: Guoliang Hua, Hong Liu, Wenhao Li, Qian Zhang, Runwei Ding, Xin Xu

    Abstract: Although monocular 3D human pose estimation methods have made significant progress, it is far from being solved due to the inherent depth ambiguity. Instead, exploiting multi-view information is a practical way to achieve absolute 3D human pose estimation. In this paper, we propose a simple yet effective pipeline for weakly-supervised cross-view 3D human pose estimation. By only using two camera v…

    Submitted 17 May, 2022; v1 submitted 23 May, 2021; originally announced May 2021.

    Comments: Accepted by IEEE Transactions on Multimedia

  49. arXiv:2105.00133  [pdf, other]

    cs.CV

    Semi-supervised Long-tailed Recognition using Alternate Sampling

    Authors: Bo Liu, Haoxiang Li, Hao Kang, Nuno Vasconcelos, Gang Hua

    Abstract: Main challenges in long-tailed recognition come from the imbalanced data distribution and sample scarcity in its tail classes. While techniques have been proposed to achieve a more balanced training loss and to improve tail classes data variations with synthesized samples, we resort to leverage readily available unlabeled data to boost recognition accuracy. The idea leads to a new recognition sett…

    Submitted 30 April, 2021; originally announced May 2021.

  50. arXiv:2105.00131  [pdf, other]

    cs.CV

    GistNet: a Geometric Structure Transfer Network for Long-Tailed Recognition

    Authors: Bo Liu, Haoxiang Li, Hao Kang, Gang Hua, Nuno Vasconcelos

    Abstract: The problem of long-tailed recognition, where the number of examples per class is highly unbalanced, is considered. It is hypothesized that the well known tendency of standard classifier training to overfit to popular classes can be exploited for effective transfer learning. Rather than eliminating this overfitting, e.g. by adopting popular class-balanced sampling methods, the learning algorithm s…

    Submitted 30 April, 2021; originally announced May 2021.