Showing 1–22 of 22 results for author: Leng, Z

  1. arXiv:2405.02811

    cs.CV

    PVTransformer: Point-to-Voxel Transformer for Scalable 3D Object Detection

    Authors: Zhaoqi Leng, Pei Sun, Tong He, Dragomir Anguelov, Mingxing Tan

    Abstract: 3D object detectors for point clouds often rely on a pooling-based PointNet to encode sparse points into grid-like voxels or pillars. In this paper, we identify that the common PointNet design introduces an information bottleneck that limits 3D object detection accuracy and scalability. To address this limitation, we propose PVTransformer: a transformer-based point-to-voxel architecture for 3D det…

    Submitted 5 May, 2024; originally announced May 2024.

  2. arXiv:2403.00372

    cs.CV

    HyperSDFusion: Bridging Hierarchical Structures in Language and Geometry for Enhanced 3D Text2Shape Generation

    Authors: Zhiying Leng, Tolga Birdal, Xiaohui Liang, Federico Tombari

    Abstract: 3D shape generation from text is a fundamental task in 3D representation learning. The text-shape pairs exhibit a hierarchical structure, where a general text like "chair" covers all 3D shapes of the chair, while more detailed prompts refer to more specific shapes. Furthermore, both text and 3D shapes are inherently hierarchical structures. However, existing Text2Shape methods, such as SDFusion,…

    Submitted 30 April, 2024; v1 submitted 1 March, 2024; originally announced March 2024.

    Journal ref: IEEE/CVF conference on computer vision and pattern recognition 2024

  3. arXiv:2402.01049

    cs.CV

    IMUGPT 2.0: Language-Based Cross Modality Transfer for Sensor-Based Human Activity Recognition

    Authors: Zikang Leng, Amitrajit Bhattacharjee, Hrudhai Rajasekhar, Lizhe Zhang, Elizabeth Bruda, Hyeokhyen Kwon, Thomas Plötz

    Abstract: One of the primary challenges in the field of human activity recognition (HAR) is the lack of large labeled datasets. This hinders the development of robust and generalizable models. Recently, cross modality transfer approaches have been explored that can alleviate the problem of data scarcity. These approaches convert existing datasets from a source modality, such as video, to a target modality (…

    Submitted 1 February, 2024; originally announced February 2024.

  4. arXiv:2311.14189

    cs.CV

    D-SCo: Dual-Stream Conditional Diffusion for Monocular Hand-Held Object Reconstruction

    Authors: Bowen Fu, Gu Wang, Chenyangguang Zhang, Yan Di, Ziqin Huang, Zhiying Leng, Fabian Manhardt, Xiangyang Ji, Federico Tombari

    Abstract: Reconstructing hand-held objects from a single RGB image is a challenging task in computer vision. In contrast to prior works that utilize deterministic modeling paradigms, we employ a point cloud denoising diffusion model to account for the probabilistic nature of this problem. In the core, we introduce centroid-fixed dual-stream conditional diffusion for monocular hand-held object reconstruction…

    Submitted 22 March, 2024; v1 submitted 23 November, 2023; originally announced November 2023.

  5. arXiv:2310.17976

    cs.CL

    InCharacter: Evaluating Personality Fidelity in Role-Playing Agents through Psychological Interviews

    Authors: Xintao Wang, Yunze Xiao, Jen-tse Huang, Siyu Yuan, Rui Xu, Haoran Guo, Quan Tu, Yaying Fei, Ziang Leng, Wei Wang, Jiangjie Chen, Cheng Li, Yanghua Xiao

    Abstract: Role-playing agents (RPAs), powered by large language models, have emerged as a flourishing field of applications. However, a key challenge lies in assessing whether RPAs accurately reproduce the personas of target characters, namely their character fidelity. Existing methods mainly focus on the knowledge and linguistic patterns of characters. This paper, instead, introduces a novel perspective to…

    Submitted 7 June, 2024; v1 submitted 27 October, 2023; originally announced October 2023.

    Comments: ACL 2024

  6. arXiv:2310.12085

    cs.CV cs.CL

    On the Benefit of Generative Foundation Models for Human Activity Recognition

    Authors: Zikang Leng, Hyeokhyen Kwon, Thomas Plötz

    Abstract: In human activity recognition (HAR), the limited availability of annotated data presents a significant challenge. Drawing inspiration from the latest advancements in generative AI, including Large Language Models (LLMs) and motion synthesis models, we believe that generative AI can address this data scarcity by autonomously generating virtual IMU data from text descriptions. Beyond this, we spotli…

    Submitted 18 October, 2023; originally announced October 2023.

    Comments: Generative AI for Pervasive Computing (GenAI4PC) Symposium within UbiComp/ISWC 2023

  7. arXiv:2309.16870

    cs.CV cs.LG cs.RO

    LEF: Late-to-Early Temporal Fusion for LiDAR 3D Object Detection

    Authors: Tong He, Pei Sun, Zhaoqi Leng, Chenxi Liu, Dragomir Anguelov, Mingxing Tan

    Abstract: We propose a late-to-early recurrent feature fusion scheme for 3D object detection using temporal LiDAR point clouds. Our main motivation is fusing object-aware latent embeddings into the early stages of a 3D object detector. This feature fusion strategy enables the model to better capture the shapes and poses for challenging objects, compared with learning from raw points directly. Our method con…

    Submitted 28 September, 2023; originally announced September 2023.

  8. arXiv:2309.06284

    cs.CV cs.MM

    Fg-T2M: Fine-Grained Text-Driven Human Motion Generation via Diffusion Model

    Authors: Yin Wang, Zhiying Leng, Frederick W. B. Li, Shun-Cheng Wu, Xiaohui Liang

    Abstract: Text-driven human motion generation in computer vision is both significant and challenging. However, current methods are limited to producing either deterministic or imprecise motion sequences, failing to effectively control the temporal and spatial relationships required to conform to a given text description. In this work, we propose a fine-grained method for generating high-quality, conditional…

    Submitted 12 September, 2023; originally announced September 2023.

  9. arXiv:2309.02965

    cs.CV

    Dynamic Hyperbolic Attention Network for Fine Hand-object Reconstruction

    Authors: Zhiying Leng, Shun-Cheng Wu, Mahdi Saleh, Antonio Montanaro, Hao Yu, Yin Wang, Nassir Navab, Xiaohui Liang, Federico Tombari

    Abstract: Reconstructing both objects and hands in 3D from a single RGB image is complex. Existing methods rely on manually defined hand-object constraints in Euclidean space, leading to suboptimal feature learning. Compared with Euclidean space, hyperbolic space better preserves the geometric properties of meshes thanks to its exponentially-growing space distance, which amplifies the differences between th…

    Submitted 6 September, 2023; originally announced September 2023.

    Comments: Accepted by ICCV 2023

    ACM Class: I.4.5

  10. arXiv:2308.09597

    cs.CL cs.HC

    ChatHaruhi: Reviving Anime Character in Reality via Large Language Model

    Authors: Cheng Li, Ziang Leng, Chenxi Yan, Junyi Shen, Hao Wang, Weishi MI, Yaying Fei, Xiaoyang Feng, Song Yan, HaoSheng Wang, Linkang Zhan, Yaokai Jia, Pingyu Wu, Haozhen Sun

    Abstract: Role-playing chatbots built on large language models have drawn interest, but better techniques are needed to enable mimicking specific fictional characters. We propose an algorithm that controls language models via an improved prompt and memories of the character extracted from scripts. We construct ChatHaruhi, a dataset covering 32 Chinese / English TV / anime characters with over 54k simulated…

    Submitted 18 August, 2023; originally announced August 2023.

    Comments: v1 - First version of technical report

  11. arXiv:2305.03187

    cs.CV

    Generating Virtual On-body Accelerometer Data from Virtual Textual Descriptions for Human Activity Recognition

    Authors: Zikang Leng, Hyeokhyen Kwon, Thomas Plötz

    Abstract: The development of robust, generalized models in human activity recognition (HAR) has been hindered by the scarcity of large-scale, labeled data sets. Recent work has shown that virtual IMU data extracted from videos using computer vision techniques can lead to substantial performance improvements when training HAR models combined with small portions of real IMU data. Inspired by recent advances i…

    Submitted 4 May, 2023; originally announced May 2023.

  12. arXiv:2304.03834

    cs.CV

    WOMD-LiDAR: Raw Sensor Dataset Benchmark for Motion Forecasting

    Authors: Kan Chen, Runzhou Ge, Hang Qiu, Rami AI-Rfou, Charles R. Qi, Xuanyu Zhou, Zoey Yang, Scott Ettinger, Pei Sun, Zhaoqi Leng, Mustafa Baniodeh, Ivan Bogun, Weiyue Wang, Mingxing Tan, Dragomir Anguelov

    Abstract: Widely adopted motion forecasting datasets substitute the observed sensory inputs with higher-level abstractions such as 3D boxes and polylines. These sparse shapes are inferred through annotating the original scenes with perception systems' predictions. Such intermediate representations tie the quality of the motion forecasting models to the performance of computer vision models. Moreover, the hu…

    Submitted 18 February, 2024; v1 submitted 7 April, 2023; originally announced April 2023.

    Comments: ICRA 2024 camera ready version. Dataset website: https://waymo.com/open/data/motion/

  13. arXiv:2211.01342

    cs.CV

    Fine-grained Human Activity Recognition Using Virtual On-body Acceleration Data

    Authors: Zikang Leng, Yash Jain, Hyeokhyen Kwon, Thomas Plötz

    Abstract: Previous work has demonstrated that virtual accelerometry data, extracted from videos using cross-modality transfer approaches like IMUTube, is beneficial for training complex and effective human activity recognition (HAR) models. Systems like IMUTube were originally designed to cover activities that are based on substantial body (part) movements. Yet, life is complex, and a range of activities of…

    Submitted 2 November, 2022; originally announced November 2022.

  14. arXiv:2210.13488

    cs.CV

    LidarAugment: Searching for Scalable 3D LiDAR Data Augmentations

    Authors: Zhaoqi Leng, Guowang Li, Chenxi Liu, Ekin Dogus Cubuk, Pei Sun, Tong He, Dragomir Anguelov, Mingxing Tan

    Abstract: Data augmentations are important in training high-performance 3D object detectors for point clouds. Despite recent efforts on designing new data augmentations, perhaps surprisingly, most state-of-the-art 3D detectors only use a few simple data augmentations. In particular, different from 2D image data augmentations, 3D data augmentations need to account for different representations of input data…

    Submitted 24 October, 2022; originally announced October 2022.

  15. PseudoAugment: Learning to Use Unlabeled Data for Data Augmentation in Point Clouds

    Authors: Zhaoqi Leng, Shuyang Cheng, Benjamin Caine, Weiyue Wang, Xiao Zhang, Jonathon Shlens, Mingxing Tan, Dragomir Anguelov

    Abstract: Data augmentation is an important technique to improve data efficiency and save labeling cost for 3D detection in point clouds. Yet, existing augmentation policies have so far been designed to only utilize labeled data, which limits the data diversity. In this paper, we recognize that pseudo labeling and data augmentation are complementary, thus propose to leverage unlabeled data for data augmenta…

    Submitted 24 October, 2022; originally announced October 2022.

    Journal ref: ECCV 2022 (pp. 555-572). Springer, Cham

  16. arXiv:2210.07372

    cs.CV

    SWFormer: Sparse Window Transformer for 3D Object Detection in Point Clouds

    Authors: Pei Sun, Mingxing Tan, Weiyue Wang, Chenxi Liu, Fei Xia, Zhaoqi Leng, Dragomir Anguelov

    Abstract: 3D object detection in point clouds is a core component for modern robotics and autonomous driving systems. A key challenge in 3D object detection comes from the inherent sparse nature of point occupancy within the 3D scene. In this paper, we propose Sparse Window Transformer (SWFormer), a scalable and accurate model for 3D object detection, which can take full advantage of the sparsity of point…

    Submitted 13 October, 2022; originally announced October 2022.

    Journal ref: ECCV 2022

  17. arXiv:2210.05018

    cs.CV

    LidarNAS: Unifying and Searching Neural Architectures for 3D Point Clouds

    Authors: Chenxi Liu, Zhaoqi Leng, Pei Sun, Shuyang Cheng, Charles R. Qi, Yin Zhou, Mingxing Tan, Dragomir Anguelov

    Abstract: Developing neural models that accurately understand objects in 3D point clouds is essential for the success of robotics and autonomous driving. However, arguably due to the higher-dimensional nature of the data (as compared to images), existing neural architectures exhibit a large variety in their designs, including but not limited to the views considered, the format of the neural features, and th…

    Submitted 10 October, 2022; originally announced October 2022.

    Comments: ECCV 2022

  18. arXiv:2207.07198

    cs.RO

    The Effect of Sideslip on Jackknife Limits During Low Speed Trailer Operation

    Authors: Zhe Leng, Mark A. Minor

    Abstract: Jackknifing refers to the serious situation where a vehicle-trailer system enters a jackknife state and the vehicle and trailer eventually collide if trailer operation is not corrected. This paper considers low speed trailer maneuvering typical of trailer backing where jackknife state limits can vary due to sideslip caused by physical interaction between the vehicle, trailer, and environment. Anal…

    Submitted 14 July, 2022; originally announced July 2022.

  19. arXiv:2205.05703

    cs.CV cs.RO

    Multi-Class 3D Object Detection with Single-Class Supervision

    Authors: Mao Ye, Chenxi Liu, Maoqing Yao, Weiyue Wang, Zhaoqi Leng, Charles R. Qi, Dragomir Anguelov

    Abstract: While multi-class 3D detectors are needed in many robotics applications, training them with fully labeled datasets can be expensive in labeling cost. An alternative approach is to have targeted single-class labels on disjoint data samples. In this paper, we are interested in training a multi-class 3D object detection model, while using these single-class labeled data. We begin by detailing the uni…

    Submitted 11 May, 2022; originally announced May 2022.

    Comments: ICRA 2022

  20. arXiv:2204.12511

    cs.CV

    PolyLoss: A Polynomial Expansion Perspective of Classification Loss Functions

    Authors: Zhaoqi Leng, Mingxing Tan, Chenxi Liu, Ekin Dogus Cubuk, Xiaojie Shi, Shuyang Cheng, Dragomir Anguelov

    Abstract: Cross-entropy loss and focal loss are the most common choices when training deep neural networks for classification problems. Generally speaking, however, a good loss function can take on much more flexible forms, and should be tailored for different tasks and datasets. Motivated by how functions can be approximated via Taylor expansion, we propose a simple framework, named PolyLoss, to view and d…

    Submitted 10 May, 2022; v1 submitted 26 April, 2022; originally announced April 2022.

    Comments: Add ablation studies on COCO detection using RetinaNet (Section 8)

    Journal ref: International Conference on Learning Representations. 2021

  21. arXiv:2004.00831

    cs.CV

    Improving 3D Object Detection through Progressive Population Based Augmentation

    Authors: Shuyang Cheng, Zhaoqi Leng, Ekin Dogus Cubuk, Barret Zoph, Chunyan Bai, Jiquan Ngiam, Yang Song, Benjamin Caine, Vijay Vasudevan, Congcong Li, Quoc V. Le, Jonathon Shlens, Dragomir Anguelov

    Abstract: Data augmentation has been widely adopted for object detection in 3D point clouds. However, all previous related efforts have focused on manually designing specific data augmentation methods for individual architectures. In this work, we present the first attempt to automate the design of data augmentation policies for 3D object detection. We introduce the Progressive Population Based Augmentation…

    Submitted 16 July, 2020; v1 submitted 2 April, 2020; originally announced April 2020.

    Comments: Accepted at ECCV 2020

  22. arXiv:2003.04993

    cs.CL

    Learning to mirror speaking styles incrementally

    Authors: Siyi Liu, Ziang Leng, Derry Wijaya

    Abstract: Mirroring is the behavior in which one person subconsciously imitates the gesture, speech pattern, or attitude of another. In conversations, mirroring often signals the speakers' enjoyment and engagement in their communication. In chatbots, methods have been proposed to add personas to the chatbots and to train them to speak or to shift their dialogue style to that of the personas. However, they of…

    Submitted 4 March, 2020; originally announced March 2020.

    Comments: 4 pages, 3 tables, 1 figure