
Showing 1–50 of 77 results for author: Sheng, L

  1. arXiv:2407.05441  [pdf, other]

    cs.IR cs.AI

    Language Models Encode Collaborative Signals in Recommendation

    Authors: Leheng Sheng, An Zhang, Yi Zhang, Yuxin Chen, Xiang Wang, Tat-Seng Chua

    Abstract: Recent studies empirically indicate that language models (LMs) encode rich world knowledge beyond mere semantics, attracting significant attention across various fields. However, in the recommendation domain, it remains uncertain whether LMs implicitly encode user preference information. Contrary to the prevailing understanding that LMs and traditional recommender models learn two distinct represe…

    Submitted 7 July, 2024; originally announced July 2024.

    Comments: Code is available at https://github.com/LehengTHU/AlphaRec

  2. arXiv:2406.09215  [pdf, other]

    cs.IR cs.AI

    On Softmax Direct Preference Optimization for Recommendation

    Authors: Yuxin Chen, Junfei Tan, An Zhang, Zhengyi Yang, Leheng Sheng, Enzhi Zhang, Xiang Wang, Tat-Seng Chua

    Abstract: Recommender systems aim to predict personalized rankings based on user preference data. With the rise of Language Models (LMs), LM-based recommenders have been widely explored due to their extensive world knowledge and powerful reasoning abilities. Most of the LM-based recommenders convert historical interactions into language prompts, pairing with a positive item as the target response and fine-t…

    Submitted 14 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

  3. arXiv:2406.03184  [pdf, other]

    cs.CV

    Ouroboros3D: Image-to-3D Generation via 3D-aware Recursive Diffusion

    Authors: Hao Wen, Zehuan Huang, Yaohui Wang, Xinyuan Chen, Yu Qiao, Lu Sheng

    Abstract: Existing single image-to-3D creation methods typically involve a two-stage process, first generating multi-view images, and then using these images for 3D reconstruction. However, training these two stages separately leads to significant data bias in the inference phase, thus affecting the quality of reconstructed results. We introduce a unified 3D generation framework, named Ouroboros3D, which in…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: See our project page at https://costwen.github.io/Ouroboros3D/

  4. arXiv:2404.15267  [pdf, other]

    cs.CV

    From Parts to Whole: A Unified Reference Framework for Controllable Human Image Generation

    Authors: Zehuan Huang, Hongxing Fan, Lipeng Wang, Lu Sheng

    Abstract: Recent advancements in controllable human image generation have led to zero-shot generation using structural signals (e.g., pose, depth) or facial appearance. Yet, generating human images conditioned on multiple parts of human appearance remains challenging. Addressing this, we introduce Parts2Whole, a novel framework designed for generating customized portraits from multiple reference images, inc…

    Submitted 23 April, 2024; originally announced April 2024.

  5. arXiv:2404.13854  [pdf, other]

    cs.CV

    Self-Supervised Monocular Depth Estimation in the Dark: Towards Data Distribution Compensation

    Authors: Haolin Yang, Chaoqiang Zhao, Lu Sheng, Yang Tang

    Abstract: Nighttime self-supervised monocular depth estimation has received increasing attention in recent years. However, using night images for self-supervision is unreliable because the photometric consistency assumption is usually violated in the videos taken under complex lighting conditions. Even with domain adaptation or photometric loss repair, performance is still limited by the poor supervision of…

    Submitted 21 April, 2024; originally announced April 2024.

    Comments: Accepted by IJCAI 2024

  6. arXiv:2403.19622  [pdf, other]

    cs.RO cs.CV

    RH20T-P: A Primitive-Level Robotic Dataset Towards Composable Generalization Agents

    Authors: Zeren Chen, Zhelun Shi, Xiaoya Lu, Lehan He, Sucheng Qian, Hao Shu Fang, Zhenfei Yin, Wanli Ouyang, Jing Shao, Yu Qiao, Cewu Lu, Lu Sheng

    Abstract: The ultimate goal of robotic learning is to acquire a comprehensive and generalizable robotic system capable of performing both seen skills within the training distribution and unseen skills in novel environments. Recent progress in utilizing language models as high-level planners has demonstrated that the complexity of tasks can be reduced by decomposing them into primitive-level plans, mak…

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: 24 pages, 12 figures, 6 tables

  7. arXiv:2403.17830  [pdf, other]

    cs.CV

    Assessment of Multimodal Large Language Models in Alignment with Human Values

    Authors: Zhelun Shi, Zhipin Wang, Hongxing Fan, Zaibin Zhang, Lijun Li, Yongting Zhang, Zhenfei Yin, Lu Sheng, Yu Qiao, Jing Shao

    Abstract: Large Language Models (LLMs) aim to serve as versatile assistants aligned with human values, as defined by the principles of being helpful, honest, and harmless (hhh). However, in terms of Multimodal Large Language Models (MLLMs), despite their commendable performance in perception and reasoning tasks, their alignment with human values remains largely unexplored, given the complexity of defining h…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: arXiv admin note: text overlap with arXiv:2311.02692

  8. arXiv:2403.12037  [pdf, other]

    cs.CV

    MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simulated-World Control

    Authors: Enshen Zhou, Yiran Qin, Zhenfei Yin, Yuzhou Huang, Ruimao Zhang, Lu Sheng, Yu Qiao, Jing Shao

    Abstract: It is a long-lasting goal to design a generalist embodied agent that can follow diverse instructions in human-like ways. However, existing approaches often fail to steadily follow instructions due to difficulties in understanding abstract and sequential natural language instructions. To this end, we introduce MineDreamer, an open-ended embodied agent built upon the challenging Minecraft simulator…

    Submitted 19 March, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

    Comments: Project page: https://sites.google.com/view/minedreamer/main

  9. arXiv:2403.10750  [pdf, other]

    cs.CL cs.AI

    Depression Detection on Social Media with Large Language Models

    Authors: Xiaochong Lan, Yiming Cheng, Li Sheng, Chen Gao, Yong Li

    Abstract: Depression harms. However, due to a lack of mental health awareness and fear of stigma, many patients do not actively seek diagnosis and treatment, leading to detrimental outcomes. Depression detection aims to determine whether an individual suffers from depression by analyzing their history of posts on social media, which can significantly aid in early detection and intervention. It mainly faces…

    Submitted 15 March, 2024; originally announced March 2024.

  10. arXiv:2403.10261  [pdf, other]

    cs.CV

    Learning Spatiotemporal Inconsistency via Thumbnail Layout for Face Deepfake Detection

    Authors: Yuting Xu, Jian Liang, Lijun Sheng, Xiao-Yu Zhang

    Abstract: The deepfake threats to society and cybersecurity have provoked significant public apprehension, driving intensified efforts within the realm of deepfake video detection. Current video-level methods are mostly based on 3D CNNs, which achieve good performance but incur high computational demands. This paper introduces an elegantly simple yet effective strategy named Thumbnail Layout (…

    Submitted 20 March, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

    Comments: Accepted by IJCV

  11. arXiv:2403.03962  [pdf, other]

    cs.SI cs.AI cs.NE

    Identify Critical Nodes in Complex Network with Large Language Models

    Authors: Jinzhu Mao, Dongyun Zou, Li Sheng, Siyi Liu, Chen Gao, Yue Wang, Yong Li

    Abstract: Identifying critical nodes in networks is a classical decision-making task, and many methods struggle to strike a balance between adaptability and utility. Therefore, we propose an approach that empowers an Evolutionary Algorithm (EA) with Large Language Models (LLMs) to generate a function called "score_nodes", which can then be used to identify crucial nodes based on their assigned scores. Our…

    Submitted 1 March, 2024; originally announced March 2024.

  12. arXiv:2402.13769  [pdf, other]

    cs.IR

    General Debiasing for Graph-based Collaborative Filtering via Adversarial Graph Dropout

    Authors: An Zhang, Wenchang Ma, Pengbo Wei, Leheng Sheng, Xiang Wang

    Abstract: Graph neural networks (GNNs) have shown impressive performance in recommender systems, particularly in collaborative filtering (CF). The key lies in aggregating neighborhood information on a user-item interaction graph to enhance user/item representations. However, we have discovered that this aggregation mechanism comes with a drawback: it amplifies biases present in the interaction graph. For…

    Submitted 21 February, 2024; originally announced February 2024.

    Comments: Accepted to WWW 2024

  13. arXiv:2402.04087  [pdf, other]

    cs.CV cs.AI cs.LG

    A Hard-to-Beat Baseline for Training-free CLIP-based Adaptation

    Authors: Zhengbo Wang, Jian Liang, Lijun Sheng, Ran He, Zilei Wang, Tieniu Tan

    Abstract: Contrastive Language-Image Pretraining (CLIP) has gained popularity for its remarkable zero-shot capacity. Recent research has focused on developing efficient fine-tuning methods, such as prompt learning and adapters, to enhance CLIP's performance in downstream tasks. However, these methods still require additional training time and computational resources, which is undesirable for devices with lim…

    Submitted 6 February, 2024; originally announced February 2024.

    Comments: Accepted by ICLR 2024

  14. arXiv:2401.15657  [pdf, other]

    cs.CV

    Data-Free Generalized Zero-Shot Learning

    Authors: Bowen Tang, Long Yan, Jing Zhang, Qian Yu, Lu Sheng, Dong Xu

    Abstract: Deep learning models have the ability to extract rich knowledge from large-scale datasets. However, the sharing of data has become increasingly challenging due to concerns regarding data copyright and privacy. Consequently, this hampers the effective transfer of knowledge from existing data to novel downstream tasks and concepts. Zero-shot learning (ZSL) approaches aim to recognize new classes by…

    Submitted 28 January, 2024; originally announced January 2024.

    Comments: Accepted by AAAI 2024

  15. arXiv:2401.15071  [pdf, other]

    cs.CV

    From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities

    Authors: Chaochao Lu, Chen Qian, Guodong Zheng, Hongxing Fan, Hongzhi Gao, Jie Zhang, Jing Shao, Jingyi Deng, Jinlan Fu, Kexin Huang, Kunchang Li, Lijun Li, Limin Wang, Lu Sheng, Meiqi Chen, Ming Zhang, Qibing Ren, Sirui Chen, Tao Gui, Wanli Ouyang, Yali Wang, Yan Teng, Yaru Wang, Yi Wang, Yinan He, et al. (11 additional authors not shown)

    Abstract: Multi-modal Large Language Models (MLLMs) have shown impressive abilities in generating reasonable responses with respect to multi-modal contents. However, there is still a wide gap between the performance of recent MLLM-based applications and the expectations of the broad public, even though the most powerful models, OpenAI's GPT-4 and Google's Gemini, have been deployed. This paper strives to enhance unde…

    Submitted 29 January, 2024; v1 submitted 26 January, 2024; originally announced January 2024.

  16. arXiv:2401.06030  [pdf, other]

    cs.CR

    Can We Trust the Unlabeled Target Data? Towards Backdoor Attack and Defense on Model Adaptation

    Authors: Lijun Sheng, Jian Liang, Ran He, Zilei Wang, Tieniu Tan

    Abstract: Model adaptation tackles the distribution shift problem with a pre-trained model instead of raw data, becoming a popular paradigm owing to its strong privacy protection. Existing methods always assume adaptation to a clean target domain, overlooking the security risks of unlabeled samples. In this paper, we explore the potential backdoor attacks on model adaptation launched by well-designed poisoning t…

    Submitted 11 January, 2024; originally announced January 2024.

    Comments: 11 pages, 4 figures

  17. arXiv:2312.16578  [pdf, other]

    cs.CV

    Multi-modality Affinity Inference for Weakly Supervised 3D Semantic Segmentation

    Authors: Xiawei Li, Qingyuan Xu, Jing Zhang, Tianyi Zhang, Qian Yu, Lu Sheng, Dong Xu

    Abstract: 3D point cloud semantic segmentation has a wide range of applications. Recently, weakly supervised point cloud segmentation methods have been proposed, aiming to alleviate the expensive and laborious manual annotation process by leveraging scene-level labels. However, these methods have not effectively exploited the rich geometric information (such as shape and scale) and appearance information (s…

    Submitted 29 December, 2023; v1 submitted 27 December, 2023; originally announced December 2023.

  18. arXiv:2312.07472  [pdf, other]

    cs.CV

    MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception

    Authors: Yiran Qin, Enshen Zhou, Qichang Liu, Zhenfei Yin, Lu Sheng, Ruimao Zhang, Yu Qiao, Jing Shao

    Abstract: It is a long-lasting goal to design an embodied system that can solve long-horizon open-world tasks in human-like ways. However, existing approaches usually struggle with compound difficulties caused by the logic-aware decomposition and context-aware execution of these tasks. To this end, we introduce MP5, an open-ended multimodal embodied system built upon the challenging Minecraft simulator, whi…

    Submitted 26 March, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

    Comments: Accepted to CVPR 2024

  19. arXiv:2312.06725  [pdf, other]

    cs.CV

    EpiDiff: Enhancing Multi-View Synthesis via Localized Epipolar-Constrained Diffusion

    Authors: Zehuan Huang, Hao Wen, Junting Dong, Yaohui Wang, Yangguang Li, Xinyuan Chen, Yan-Pei Cao, Ding Liang, Yu Qiao, Bo Dai, Lu Sheng

    Abstract: Generating multiview images from a single view facilitates the rapid generation of a 3D mesh conditioned on a single image. Recent methods that introduce 3D global representation into diffusion models have shown the potential to generate consistent multiviews, but they have reduced generation speed and face challenges in maintaining generalizability and quality. To address this issue, we propose E…

    Submitted 2 April, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

    Comments: Project page: https://huanngzh.github.io/EpiDiff/

  20. arXiv:2311.16843  [pdf, ps, other]

    cs.CV

    Self-training solutions for the ICCV 2023 GeoNet Challenge

    Authors: Lijun Sheng, Zhengbo Wang, Jian Liang

    Abstract: GeoNet is a recently proposed domain adaptation benchmark consisting of three challenges (i.e., GeoUniDA, GeoImNet, and GeoPlaces). Each challenge contains images collected from the USA and Asia, between which there are large geographical gaps. Our solution adopts a two-stage source-free domain adaptation framework with a Swin Transformer backbone to achieve knowledge transfer from the USA (source) domain t…

    Submitted 28 November, 2023; originally announced November 2023.

    Comments: Technical report; 1st place in the ICCV 2023 GeoUniDA challenge

  21. arXiv:2311.02692  [pdf, other]

    cs.CV

    ChEF: A Comprehensive Evaluation Framework for Standardized Assessment of Multimodal Large Language Models

    Authors: Zhelun Shi, Zhipin Wang, Hongxing Fan, Zhenfei Yin, Lu Sheng, Yu Qiao, Jing Shao

    Abstract: Multimodal Large Language Models (MLLMs) have shown impressive abilities in interacting with visual content with myriad potential downstream tasks. However, even though a number of benchmarks have been proposed, the capabilities and limitations of MLLMs are still not comprehensively understood, due to the lack of a standardized and holistic evaluation framework. To this end, we present the first Compre…

    Submitted 5 November, 2023; originally announced November 2023.

    Comments: 39 pages, 26 figures

  22. arXiv:2311.02684  [pdf, other]

    cs.CV cs.CL

    Octavius: Mitigating Task Interference in MLLMs via LoRA-MoE

    Authors: Zeren Chen, Ziqin Wang, Zhen Wang, Huayang Liu, Zhenfei Yin, Si Liu, Lu Sheng, Wanli Ouyang, Yu Qiao, Jing Shao

    Abstract: Recent studies have demonstrated that Large Language Models (LLMs) can extend their zero-shot generalization capabilities to multimodal learning through instruction tuning. As more modalities and downstream tasks are introduced, negative conflicts and interference may have a worse impact on performance. While this phenomenon has been overlooked in previous work, we propose a novel and extensible framew…

    Submitted 13 March, 2024; v1 submitted 5 November, 2023; originally announced November 2023.

    Comments: 22 pages, 12 figures. Accepted at ICLR 2024

  23. arXiv:2311.02343  [pdf, other]

    cs.CV cs.AI

    Stable Diffusion Reference Only: Image Prompt and Blueprint Jointly Guided Multi-Condition Diffusion Model for Secondary Painting

    Authors: Hao Ai, Lu Sheng

    Abstract: Stable Diffusion and ControlNet have achieved excellent results in the field of image generation and synthesis. However, due to the granularity and method of their control, the efficiency improvement is limited for professional artistic creations such as comics and animation production, whose main work is secondary painting. In the current workflow, fixing characters and image styles often needs lengt…

    Submitted 4 November, 2023; originally announced November 2023.

  24. arXiv:2310.18700  [pdf, other]

    cs.IR

    Empowering Collaborative Filtering with Principled Adversarial Contrastive Loss

    Authors: An Zhang, Leheng Sheng, Zhibo Cai, Xiang Wang, Tat-Seng Chua

    Abstract: Contrastive Learning (CL) has achieved impressive performance in self-supervised learning tasks, showing superior generalization ability. Inspired by this success, adopting CL in collaborative filtering (CF) has become prevalent in semi-supervised top-K recommendation. The basic idea is to routinely conduct heuristic-based data augmentation and apply contrastive losses (e.g., InfoNCE) on the augmented…

    Submitted 28 October, 2023; originally announced October 2023.

    Comments: Accepted to NeurIPS 2023

  25. arXiv:2310.10108  [pdf, other]

    cs.IR cs.AI

    On Generative Agents in Recommendation

    Authors: An Zhang, Yuxin Chen, Leheng Sheng, Xiang Wang, Tat-Seng Chua

    Abstract: Recommender systems are the cornerstone of today's information dissemination, yet a disconnect between offline metrics and online performance greatly hinders their development. Addressing this challenge, we envision a recommendation simulator, capitalizing on recent breakthroughs in human-level intelligence exhibited by Large Language Models (LLMs). We propose Agent4Rec, a user simulator in recomm…

    Submitted 11 May, 2024; v1 submitted 16 October, 2023; originally announced October 2023.

    Comments: SIGIR 2024 perspective paper

  26. arXiv:2310.05754  [pdf, other]

    cs.LG cs.CV

    Unleashing the power of Neural Collapse for Transferability Estimation

    Authors: Yuhe Ding, Bo Jiang, Lijun Sheng, Aihua Zheng, Jian Liang

    Abstract: Transferability estimation aims to provide heuristics for quantifying how suitable a pre-trained model is for a specific downstream task, without fine-tuning every candidate model. Prior studies have revealed that well-trained models exhibit the phenomenon of Neural Collapse. Based on a widely used neural collapse metric in existing literature, we observe a strong correlation between the neural collapse of pre…

    Submitted 9 October, 2023; originally announced October 2023.

  27. arXiv:2309.02773  [pdf, other]

    cs.CV

    Diffusion Model is Secretly a Training-free Open Vocabulary Semantic Segmenter

    Authors: Jinglong Wang, Xiawei Li, Jing Zhang, Qingyuan Xu, Qin Zhou, Qian Yu, Lu Sheng, Dong Xu

    Abstract: Pre-trained text-image discriminative models, such as CLIP, have been explored for open-vocabulary semantic segmentation with unsatisfactory results due to the loss of crucial localization information and awareness of object shapes. Recently, there has been a growing interest in expanding the application of generative models from generation tasks to semantic segmentation. These approaches utili…

    Submitted 22 January, 2024; v1 submitted 6 September, 2023; originally announced September 2023.

  28. arXiv:2308.12919  [pdf, other]

    cs.CV cs.LG

    Towards Realistic Unsupervised Fine-tuning with CLIP

    Authors: Jian Liang, Lijun Sheng, Zhengbo Wang, Ran He, Tieniu Tan

    Abstract: The emergence of vision-language models (VLMs), such as CLIP, has spurred a significant research effort towards their application for downstream supervised learning tasks. Although some previous studies have explored the unsupervised fine-tuning of CLIP, they often rely on prior knowledge in the form of class names associated with ground truth labels. In this paper, we delve into a realistic unsup…

    Submitted 24 August, 2023; originally announced August 2023.

  29. arXiv:2308.03359  [pdf, other]

    cs.CV

    Distortion-aware Transformer in 360° Salient Object Detection

    Authors: Yinjie Zhao, Lichen Zhao, Qian Yu, Jing Zhang, Lu Sheng, Dong Xu

    Abstract: With the emergence of VR and AR, 360° data attracts increasing attention from the computer vision and multimedia communities. Typically, 360° data is projected into 2D ERP (equirectangular projection) images for feature extraction. However, existing methods cannot handle the distortions that result from the projection, hindering the development of 360-data-based tasks. Therefore, in this paper, we…

    Submitted 7 August, 2023; originally announced August 2023.

    Comments: 10 pages, 5 figures

  30. arXiv:2307.03133  [pdf, other]

    cs.LG cs.CV

    Benchmarking Test-Time Adaptation against Distribution Shifts in Image Classification

    Authors: Yongcan Yu, Lijun Sheng, Ran He, Jian Liang

    Abstract: Test-time adaptation (TTA) is a technique aimed at enhancing the generalization performance of models by leveraging unlabeled samples solely during prediction. Given the need for robustness in neural network systems when faced with distribution shifts, numerous TTA methods have recently been proposed. However, evaluating these methods is often done under different settings, such as varying distrib…

    Submitted 6 July, 2023; originally announced July 2023.

  31. arXiv:2306.06687  [pdf, other]

    cs.CV

    LAMM: Language-Assisted Multi-Modal Instruction-Tuning Dataset, Framework, and Benchmark

    Authors: Zhenfei Yin, Jiong Wang, Jianjian Cao, Zhelun Shi, Dingning Liu, Mukai Li, Lu Sheng, Lei Bai, Xiaoshui Huang, Zhiyong Wang, Jing Shao, Wanli Ouyang

    Abstract: Large language models have emerged as a promising approach towards achieving general-purpose AI agents. The thriving open-source LLM community has greatly accelerated the development of agents that support human-machine dialogue interaction through natural language processing. However, human interaction with the world extends beyond only text as a modality, and other modalities such as vision are…

    Submitted 6 November, 2023; v1 submitted 11 June, 2023; originally announced June 2023.

    Comments: NeurIPS 2023 camera-ready; 37 pages, 33 figures. Code available at https://github.com/OpenLAMM/LAMM. Project page: https://openlamm.github.io/

  32. arXiv:2303.18144  [pdf, other]

    cs.CV

    Siamese DETR

    Authors: Zeren Chen, Gengshi Huang, Wei Li, Jianing Teng, Kun Wang, Jing Shao, Chen Change Loy, Lu Sheng

    Abstract: Recent self-supervised methods are mainly designed for representation learning with the base model, e.g., ResNets or ViTs. They cannot easily be transferred to DETR, which has task-specific Transformer modules. In this work, we present Siamese DETR, a Siamese self-supervised pretraining approach for the Transformer architecture in DETR. We consider learning view-invariant and detection-oriented represe…

    Submitted 31 March, 2023; originally announced March 2023.

    Comments: 10 pages, 11 figures. Accepted at CVPR 2023

  33. arXiv:2303.14408  [pdf, other]

    cs.CV

    VL-SAT: Visual-Linguistic Semantics Assisted Training for 3D Semantic Scene Graph Prediction in Point Cloud

    Authors: Ziqin Wang, Bowen Cheng, Lichen Zhao, Dong Xu, Yang Tang, Lu Sheng

    Abstract: The task of 3D semantic scene graph (3DSSG) prediction in the point cloud is challenging since (1) the 3D point cloud only captures geometric structures with limited semantics compared to 2D images, and (2) long-tailed relation distribution inherently hinders the learning of unbiased prediction. Since 2D images provide rich semantics and scene graphs are in nature coped with languages, in this stu…

    Submitted 25 March, 2023; originally announced March 2023.

    Comments: CVPR 2023 Highlight

  34. arXiv:2303.10594  [pdf, other]

    cs.CR cs.CV cs.LG

    AdaptGuard: Defending Against Universal Attacks for Model Adaptation

    Authors: Lijun Sheng, Jian Liang, Ran He, Zilei Wang, Tieniu Tan

    Abstract: Model adaptation aims at solving the domain transfer problem under the constraint of only accessing the pretrained source models. With increasing concern for data privacy and transmission efficiency, this paradigm has recently gained popularity. This paper studies the vulnerability to universal attacks transferred from the source domain during model adaptation algorithms due to the e…

    Submitted 27 November, 2023; v1 submitted 19 March, 2023; originally announced March 2023.

    Comments: ICCV 2023

  35. arXiv:2301.12511  [pdf, other]

    cs.CV

    Fast-BEV: A Fast and Strong Bird's-Eye View Perception Baseline

    Authors: Yangguang Li, Bin Huang, Zeren Chen, Yufeng Cui, Feng Liang, Mingzhu Shen, Fenggang Liu, Enze Xie, Lu Sheng, Wanli Ouyang, Jing Shao

    Abstract: Recently, perception tasks based on Bird's-Eye View (BEV) representation have drawn more and more attention, and BEV representation is promising as the foundation for next-generation Autonomous Vehicle (AV) perception. However, most existing BEV solutions either require considerable resources to execute on-vehicle inference or suffer from modest performance. This paper proposes a simple yet effectiv…

    Submitted 9 July, 2024; v1 submitted 29 January, 2023; originally announced January 2023.

    Comments: arXiv admin note: text overlap with arXiv:2301.07870

    Journal ref: Transactions on Pattern Analysis and Machine Intelligence 2024

  36. arXiv:2209.12028  [pdf, other]

    cs.CV

    Towards Explainable 3D Grounded Visual Question Answering: A New Benchmark and Strong Baseline

    Authors: Lichen Zhao, Daigang Cai, Jing Zhang, Lu Sheng, Dong Xu, Rui Zheng, Yinjie Zhao, Lipeng Wang, Xibo Fan

    Abstract: Recently, 3D vision-and-language tasks have attracted increasing research interest. Compared to other vision-and-language tasks, the 3D visual question answering (VQA) task is less exploited and is more susceptible to language priors and co-reference ambiguity. Meanwhile, a couple of recently proposed 3D VQA datasets do not adequately support the 3D VQA task due to their limited scale and annotation methods…

    Submitted 24 September, 2022; originally announced September 2022.

    Comments: 13 pages, 10 figures

  37. arXiv:2209.06582  [pdf, other]

    cs.LG cs.AI

    A Clustering Method Based on Information Entropy Payload

    Authors: Shaodong Deng, Long Sheng, Jiayi Nie, Fuyi Deng

    Abstract: Existing clustering algorithms such as K-means often require presetting parameters such as the number of categories K, and such parameters may prevent them from producing objective and consistent clustering results. This paper introduces a clustering method based on information theory, by which clusters in the clustering result have maximum average information entropy (called entropy payload in…

    Submitted 13 September, 2022; originally announced September 2022.

  38. arXiv:2208.14893  [pdf, other]

    cs.CV

    Improving RGB-D Point Cloud Registration by Learning Multi-scale Local Linear Transformation

    Authors: Ziming Wang, Xiaoliang Huo, Zhenghao Chen, Jing Zhang, Lu Sheng, Dong Xu

    Abstract: Point cloud registration aims at estimating the geometric transformation between two point cloud scans, in which point-wise correspondence estimation is the key to its success. In addition to previous methods that seek correspondences by hand-crafted or learnt geometric features, recent point cloud registration methods have tried to apply RGB-D data to achieve more accurate correspondence. However…

    Submitted 31 August, 2022; v1 submitted 31 August, 2022; originally announced August 2022.

    Comments: Accepted to ECCV 2022, supplementary materials included

  39. arXiv:2208.06880  [pdf, other]

    cs.CV

    SketchSampler: Sketch-based 3D Reconstruction via View-dependent Depth Sampling

    Authors: Chenjian Gao, Qian Yu, Lu Sheng, Yi-Zhe Song, Dong Xu

    Abstract: Reconstructing a 3D shape based on a single sketch image is challenging due to the large domain gap between a sparse, irregular sketch and a regular, dense 3D shape. Existing works try to employ the global feature extracted from the sketch to directly predict the 3D coordinates, but they usually lose fine details, producing results that are not faithful to the input sketch. Through analyzing the 3D-to-2D p…

    Submitted 25 December, 2022; v1 submitted 14 August, 2022; originally announced August 2022.

    Comments: 16 pages, 7 figures, accepted by ECCV 2022

  40. arXiv:2205.14566  [pdf, other]

    cs.CV cs.AI

    ProxyMix: Proxy-based Mixup Training with Label Refinery for Source-Free Domain Adaptation

    Authors: Yuhe Ding, Lijun Sheng, Jian Liang, Aihua Zheng, Ran He

    Abstract: Unsupervised domain adaptation (UDA) aims to transfer knowledge from a labeled source domain to an unlabeled target domain. Owing to privacy concerns and heavy data transmission, source-free UDA, exploiting the pre-trained source models instead of the raw source data for target learning, has been gaining popularity in recent years. Some works attempt to recover unseen source domains with generativ…

    Submitted 28 May, 2022; originally announced May 2022.

  41. arXiv:2203.08764  [pdf, other]

    cs.CV cs.AI

    X-Learner: Learning Cross Sources and Tasks for Universal Visual Representation

    Authors: Yinan He, Gengshi Huang, Siyu Chen, Jianing Teng, Wang Kun, Zhenfei Yin, Lu Sheng, Ziwei Liu, Yu Qiao, Jing Shao

    Abstract: In computer vision, pre-training models based on large-scale supervised learning have been proven effective over the past few years. However, existing works mostly focus on learning from an individual task with a single data source (e.g., ImageNet for classification or COCO for detection). This restricted form limits their generalizability and usability due to the lack of vast semantic information from…

    Submitted 16 March, 2022; originally announced March 2022.

    Comments: 12 pages, 4 figures

  42. arXiv:2203.07845  [pdf, other]

    cs.CV

    Bamboo: Building Mega-Scale Vision Dataset Continually with Human-Machine Synergy

    Authors: Yuanhan Zhang, Qinghong Sun, Yichun Zhou, Zexin He, Zhenfei Yin, Kun Wang, Lu Sheng, Yu Qiao, Jing Shao, Ziwei Liu

    Abstract: Large-scale datasets play a vital role in computer vision. But current datasets are annotated blindly without differentiation between samples, making the data collection inefficient and unscalable. The open question is how to build a mega-scale dataset actively. Although advanced active learning algorithms might be the answer, we experimentally found that they perform poorly in the realistic annotation scena…

    Submitted 23 August, 2022; v1 submitted 15 March, 2022; originally announced March 2022.

    Comments: Bamboo is available at https://github.com/ZhangYuanhan-AI/Bamboo

  43. arXiv:2112.08325  [pdf, other]

    cs.CV

    ForgeryNet -- Face Forgery Analysis Challenge 2021: Methods and Results

    Authors: Yinan He, Lu Sheng, Jing Shao, Ziwei Liu, Zhaofan Zou, Zhizhi Guo, Shan Jiang, Curitis Sun, Guosheng Zhang, Keyao Wang, Haixiao Yue, Zhibin Hong, Wanguo Wang, Zhenyu Li, Qi Wang, Zhenli Wang, Ronghao Xu, Mingwen Zhang, Zhiheng Wang, Zhenhang Huang, Tianming Zhang, Ningning Zhao

    Abstract: The rapid progress of photorealistic synthesis techniques has reached a critical point where the boundary between real and manipulated images starts to blur. Recently, a mega-scale deep face forgery dataset, ForgeryNet, which comprises 2.9 million images and 221,247 videos, has been released. It is by far the largest publicly available dataset in terms of data scale, manipulations (7 image-level approach…

    Submitted 15 December, 2021; originally announced December 2021.

    Comments: Technical report. Challenge website: https://competitions.codalab.org/competitions/33386

  44. arXiv:2110.08729  [pdf, other]

    cs.CV cs.AI

    VoteHMR: Occlusion-Aware Voting Network for Robust 3D Human Mesh Recovery from Partial Point Clouds

    Authors: Guanze Liu, Yu Rong, Lu Sheng

    Abstract: 3D human mesh recovery from point clouds is essential for various tasks, including AR/VR and human behavior understanding. Previous works in this field either require high-quality 3D human scans or sequential point clouds, which cannot be easily applied to low-quality 3D scans captured by consumer-level depth sensors. In this paper, we make the first attempt to reconstruct reliable 3D human shapes…

    Submitted 17 October, 2021; originally announced October 2021.

    Comments: Our paper is accepted to MM 2021 as an oral presentation

  45. arXiv:2109.09596  [pdf]

    cs.CV

    Parameter Decoupling Strategy for Semi-supervised 3D Left Atrium Segmentation

    Authors: Xuanting Hao, Shengbo Gao, Lijie Sheng, Jicong Zhang

    Abstract: Consistency training has proven to be an advanced semi-supervised framework and has achieved promising results in medical image segmentation tasks by enforcing invariance of the predictions over different views of the inputs. However, with the iterative updating of model parameters, the models tend to reach a coupled state and eventually lose the ability to exploit unlabeled data. To add…

    Submitted 8 November, 2021; v1 submitted 20 September, 2021; originally announced September 2021.

    Comments: ICMV2021 camera ready

  46. arXiv:2108.13461  [pdf]

    cs.LG

    Time Series Prediction using Deep Learning Methods in Healthcare

    Authors: Mohammad Amin Morid, Olivia R. Liu Sheng, Joseph Dunbar

    Abstract: Traditional machine learning methods face two main challenges in dealing with healthcare predictive analytics tasks. First, the high-dimensional nature of healthcare data requires labor-intensive and time-consuming processes to select an appropriate set of features for each new task. Second, these methods depend on feature engineering to capture the sequential nature of patient data, which may not ad…

    Submitted 14 September, 2022; v1 submitted 30 August, 2021; originally announced August 2021.

  47. arXiv:2104.06114  [pdf, other]

    cs.CV

    Back-tracing Representative Points for Voting-based 3D Object Detection in Point Clouds

    Authors: Bowen Cheng, Lu Sheng, Shaoshuai Shi, Ming Yang, Dong Xu

    Abstract: 3D object detection in point clouds is a challenging vision task that benefits various applications for understanding the 3D visual world. Much recent research focuses on how to exploit end-to-end trainable Hough voting for generating object proposals. However, the current voting strategy can only receive partial votes from the surfaces of potential objects together with severe outlier votes fr…

    Submitted 14 April, 2021; v1 submitted 13 April, 2021; originally announced April 2021.

    Comments: CVPR2021

  48. arXiv:2103.10206  [pdf, other]

    cs.AI cs.CV

    DanceFormer: Music Conditioned 3D Dance Generation with Parametric Motion Transformer

    Authors: Buyu Li, Yongchi Zhao, Zhelun Shi, Lu Sheng

    Abstract: Generating 3D dances from music is an emerging research task that benefits many applications in vision and graphics. Previous works treat this task as sequence generation; however, it is challenging to render a music-aligned long-term sequence with high kinematic complexity and coherent movements. In this paper, we reformulate it as a two-stage process, i.e., key pose generation and then an in-…

    Submitted 27 July, 2023; v1 submitted 18 March, 2021; originally announced March 2021.

    Comments: This is the version accepted by AAAI-22

  49. arXiv:2103.05630  [pdf, other]

    cs.CV cs.LG

    ForgeryNet: A Versatile Benchmark for Comprehensive Forgery Analysis

    Authors: Yinan He, Bei Gan, Siyu Chen, Yichun Zhou, Guojun Yin, Luchuan Song, Lu Sheng, Jing Shao, Ziwei Liu

    Abstract: The rapid progress of photorealistic synthesis techniques has reached a critical point where the boundary between real and manipulated images starts to blur. Thus, benchmarking and advancing digital forgery analysis have become a pressing issue. However, existing face forgery datasets either have limited diversity or only support coarse-grained analysis. To counter this emerging threat, we cons…

    Submitted 14 July, 2021; v1 submitted 9 March, 2021; originally announced March 2021.

    Comments: 17 pages, 11 figures, Accepted to CVPR 2021 (Oral), project webpage: https://yinanhe.github.io/projects/forgerynet.html

  50. arXiv:2011.00826  [pdf, other]

    cs.CV

    PV-NAS: Practical Neural Architecture Search for Video Recognition

    Authors: Zihao Wang, Chen Lin, Lu Sheng, Junjie Yan, Jing Shao

    Abstract: Recently, deep learning has been utilized to solve video recognition problems due to its prominent representation ability. Deep neural networks for video tasks are highly customized, and the design of such networks requires domain experts and costly trial-and-error tests. Recent advances in network architecture search have boosted image recognition performance by a large margin. However, automatic…

    Submitted 2 November, 2020; v1 submitted 2 November, 2020; originally announced November 2020.