Showing 1–50 of 70 results for author: Ji, P

  1. arXiv:2407.01531  [pdf, other]

    cs.RO cs.LG

    Sparse Diffusion Policy: A Sparse, Reusable, and Flexible Policy for Robot Learning

    Authors: Yixiao Wang, Yifei Zhang, Mingxiao Huo, Ran Tian, Xiang Zhang, Yichen Xie, Chenfeng Xu, Pengliang Ji, Wei Zhan, Mingyu Ding, Masayoshi Tomizuka

    Abstract: The increasing complexity of tasks in robotics demands efficient strategies for multitask and continual learning. Traditional models typically rely on a universal policy for all tasks, facing challenges such as high computational costs and catastrophic forgetting when learning new tasks. To address these issues, we introduce a sparse, reusable, and flexible policy, Sparse Diffusion Policy (SDP). B…

    Submitted 1 July, 2024; originally announced July 2024.

  2. arXiv:2406.18591  [pdf, other]

    cs.CV cs.AI cs.LG

    Composition Vision-Language Understanding via Segment and Depth Anything Model

    Authors: Mingxiao Huo, Pengliang Ji, Haotian Lin, Junchen Liu, Yixiao Wang, Yijun Chen

    Abstract: We introduce a pioneering unified library that leverages the Depth Anything and Segment Anything models to augment neural comprehension in language-vision model zero-shot understanding. This library synergizes the capabilities of the Depth Anything Model (DAM), Segment Anything Model (SAM), and GPT-4V, enhancing multimodal tasks such as vision-question-answering (VQA) and composition reasoning. Through t…

    Submitted 7 June, 2024; originally announced June 2024.

  3. arXiv:2406.15245  [pdf, other]

    cs.CL cs.LG

    Unsupervised Morphological Tree Tokenizer

    Authors: Qingyang Zhu, Xiang Hu, Pengyu Ji, Wei Wu, Kewei Tu

    Abstract: As a cornerstone in language modeling, tokenization involves segmenting text inputs into pre-defined atomic units. Conventional statistical tokenizers often disrupt constituent boundaries within words, thereby corrupting semantic information. To address this drawback, we introduce morphological structure guidance to tokenization and propose a deep model to induce character-level structures of word…

    Submitted 21 June, 2024; originally announced June 2024.

  4. arXiv:2405.15622  [pdf, other]

    cs.CV

    LAM3D: Large Image-Point-Cloud Alignment Model for 3D Reconstruction from Single Image

    Authors: Ruikai Cui, Xibin Song, Weixuan Sun, Senbo Wang, Weizhe Liu, Shenzhou Chen, Taizhang Shang, Yang Li, Nick Barnes, Hongdong Li, Pan Ji

    Abstract: Large Reconstruction Models have made significant strides in the realm of automated 3D content generation from single or multiple input images. Despite their success, these models often produce 3D meshes with geometric inaccuracies, stemming from the inherent challenges of deducing 3D shapes solely from image data. In this work, we introduce a novel framework, the Large Image and Point Cloud Align…

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: 19 pages, 10 figures

  5. arXiv:2405.11613  [pdf, other]

    cs.CL

    Decoding by Contrasting Knowledge: Enhancing LLMs' Confidence on Edited Facts

    Authors: Baolong Bi, Shenghua Liu, Lingrui Mei, Yiwei Wang, Pengliang Ji, Xueqi Cheng

    Abstract: The knowledge within large language models (LLMs) may become outdated quickly. While in-context editing (ICE) is currently the most effective method for knowledge editing (KE), it is constrained by the black-box modeling of LLMs and thus lacks interpretability. Our work aims to elucidate the superior performance of ICE on the KE by analyzing the impacts of in-context new knowledge on token-wise di…

    Submitted 21 May, 2024; v1 submitted 19 May, 2024; originally announced May 2024.

  6. arXiv:2405.07744  [pdf, other]

    cs.SE

    MoCo: Fuzzing Deep Learning Libraries via Assembling Code

    Authors: Pin Ji, Yang Feng, Duo Wu, Lingyue Yan, Pengling Chen, Jia Liu, Zhihong Zhao

    Abstract: The rapidly developing deep learning (DL) techniques have been applied in software systems with various application scenarios. However, they could also pose new safety threats with potentially serious consequences, especially in safety-critical domains. DL libraries serve as the underlying foundation for DL systems, and bugs in them can have unpredictable impacts that directly affect the behaviors…

    Submitted 13 May, 2024; originally announced May 2024.

  7. arXiv:2404.17511  [pdf, other]

    cs.LG cs.CY cs.SI

    Bridging the Fairness Divide: Achieving Group and Individual Fairness in Graph Neural Networks

    Authors: Duna Zhan, Dongliang Guo, Pengsheng Ji, Sheng Li

    Abstract: Graph neural networks (GNNs) have emerged as a powerful tool for analyzing and learning from complex data structured as graphs, demonstrating remarkable effectiveness in various applications, such as social network analysis, recommendation systems, and drug discovery. However, despite their impressive performance, the fairness problem has increasingly gained attention as a crucial aspect to consid…

    Submitted 26 April, 2024; originally announced April 2024.

    Comments: 16 pages, 3 figures

  8. arXiv:2404.10160  [pdf, other]

    cs.AI

    Reinforcement Learning from Multi-role Debates as Feedback for Bias Mitigation in LLMs

    Authors: Ruoxi Cheng, Haoxuan Ma, Shuirong Cao, Jiaqi Li, Aihua Pei, Zhiqiang Wang, Pengliang Ji, Haoyu Wang, Jiaqi Huo

    Abstract: Bias in LLMs can harm user experience and societal outcomes. However, current bias mitigation methods often require intensive human feedback, lack transferability to other topics, or yield overconfident and random outputs. We find that involving LLMs in role-playing scenarios boosts their ability to recognize and mitigate biases. Based on this, we propose Reinforcement Learning from Multi-role Debat…

    Submitted 18 June, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

    Comments: The first three authors contributed equally to this work

  9. arXiv:2403.18241  [pdf, other]

    cs.CV cs.AI cs.GR cs.LG

    NeuSDFusion: A Spatial-Aware Generative Model for 3D Shape Completion, Reconstruction, and Generation

    Authors: Ruikai Cui, Weizhe Liu, Weixuan Sun, Senbo Wang, Taizhang Shang, Yang Li, Xibin Song, Han Yan, Zhennan Wu, Shenzhou Chen, Hongdong Li, Pan Ji

    Abstract: 3D shape generation aims to produce innovative 3D content adhering to specific conditions and constraints. Existing methods often decompose 3D shapes into a sequence of localized components, treating each element in isolation without considering spatial consistency. As a result, these approaches exhibit limited versatility in 3D data representation and shape generation, hindering their ability to…

    Submitted 27 March, 2024; originally announced March 2024.

  10. arXiv:2403.16210  [pdf, other]

    cs.CV cs.AI cs.GR

    Frankenstein: Generating Semantic-Compositional 3D Scenes in One Tri-Plane

    Authors: Han Yan, Yang Li, Zhennan Wu, Shenzhou Chen, Weixuan Sun, Taizhang Shang, Weizhe Liu, Tian Chen, Xiaqiang Dai, Chao Ma, Hongdong Li, Pan Ji

    Abstract: We present Frankenstein, a diffusion-based framework that can generate semantic-compositional 3D scenes in a single pass. Unlike existing methods that output a single, unified 3D shape, Frankenstein simultaneously generates multiple separated shapes, each corresponding to a semantically meaningful part. The 3D scene information is encoded in one single tri-plane tensor, from which multiple Signed…

    Submitted 24 March, 2024; originally announced March 2024.

    Comments: Video: https://youtu.be/lRn-HqyCrLI

  11. arXiv:2403.08293  [pdf, other]

    cs.CL cs.AI

    Generative Pretrained Structured Transformers: Unsupervised Syntactic Language Models at Scale

    Authors: Xiang Hu, Pengyu Ji, Qingyang Zhu, Wei Wu, Kewei Tu

    Abstract: A syntactic language model (SLM) incrementally generates a sentence with its syntactic tree in a left-to-right manner. We present Generative Pretrained Structured Transformers (GPST), an unsupervised SLM at scale capable of being pre-trained from scratch on raw texts with high parallelism. GPST circumvents the limitations of previous SLMs such as relying on gold trees and sequential training. It c…

    Submitted 17 June, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

    Comments: accepted by ACL 2024

  12. arXiv:2402.18527  [pdf, other]

    cs.CV cs.LG eess.IV

    Defect Detection in Tire X-Ray Images: Conventional Methods Meet Deep Structures

    Authors: Andrei Cozma, Landon Harris, Hairong Qi, Ping Ji, Wenpeng Guo, Song Yuan

    Abstract: This paper introduces a robust approach for automated defect detection in tire X-ray images by harnessing traditional feature extraction methods such as Local Binary Pattern (LBP) and Gray Level Co-Occurrence Matrix (GLCM) features, as well as Fourier and Wavelet-based features, complemented by advanced machine learning techniques. Recognizing the challenges inherent in the complex patterns and te…

    Submitted 28 February, 2024; originally announced February 2024.

    Comments: 7 pages, 2 figures, 3 tables, submitted to ICIP2024

    ACM Class: I.4.7; I.4.9; I.4.0

  13. arXiv:2402.00262  [pdf]

    cs.AI

    Computational Experiments Meet Large Language Model Based Agents: A Survey and Perspective

    Authors: Qun Ma, Xiao Xue, Deyu Zhou, Xiangning Yu, Donghua Liu, Xuwen Zhang, Zihan Zhao, Yifan Shen, Peilin Ji, Juanjuan Li, Gang Wang, Wanpeng Ma

    Abstract: Computational experiments have emerged as a valuable method for studying complex systems, involving the algorithmization of counterfactuals. However, accurately representing real social systems in Agent-based Modeling (ABM) is challenging due to the diverse and intricate characteristics of humans, including bounded rationality and heterogeneity. To address this limitation, the integration of Large…

    Submitted 31 January, 2024; originally announced February 2024.

  14. arXiv:2401.17053  [pdf, other]

    cs.CV cs.AI cs.GR

    BlockFusion: Expandable 3D Scene Generation using Latent Tri-plane Extrapolation

    Authors: Zhennan Wu, Yang Li, Han Yan, Taizhang Shang, Weixuan Sun, Senbo Wang, Ruikai Cui, Weizhe Liu, Hiroyuki Sato, Hongdong Li, Pan Ji

    Abstract: We present BlockFusion, a diffusion-based model that generates 3D scenes as unit blocks and seamlessly incorporates new blocks to extend the scene. BlockFusion is trained using datasets of 3D blocks that are randomly cropped from complete 3D scene meshes. Through per-block fitting, all training blocks are converted into the hybrid neural fields: with a tri-plane containing the geometry features, f…

    Submitted 23 May, 2024; v1 submitted 30 January, 2024; originally announced January 2024.

    Comments: ACM Transactions on Graphics (SIGGRAPH'24). Code: https://yang-l1.github.io/blockfusion

  15. Recent Advances in Text Analysis

    Authors: Zheng Tracy Ke, Pengsheng Ji, Jiashun Jin, Wanshan Li

    Abstract: Text analysis is an interesting research area in data science and has various applications, such as in artificial intelligence, biomedical research, and engineering. We review popular methods for text analysis, ranging from topic modeling to the recent neural language models. In particular, we review Topic-SCORE, a statistical approach to topic modeling, and discuss how to use it to analyze MADSta…

    Submitted 7 February, 2024; v1 submitted 1 January, 2024; originally announced January 2024.

    Journal ref: Annual Review of Statistics and Its Application 2024 11:1

  16. arXiv:2310.10343  [pdf, other]

    cs.CV

    ConsistNet: Enforcing 3D Consistency for Multi-view Images Diffusion

    Authors: Jiayu Yang, Ziang Cheng, Yunfei Duan, Pan Ji, Hongdong Li

    Abstract: Given a single image of a 3D object, this paper proposes a novel method (named ConsistNet) that is able to generate multiple images of the same object, as if they were captured from different viewpoints, while the 3D (multi-view) consistencies among those multiple generated images are effectively exploited. Central to our method is a multi-view consistency block which enables information excha…

    Submitted 16 October, 2023; originally announced October 2023.

  17. arXiv:2309.10255  [pdf, other]

    cs.CV

    RGB-based Category-level Object Pose Estimation via Decoupled Metric Scale Recovery

    Authors: Jiaxin Wei, Xibin Song, Weizhe Liu, Laurent Kneip, Hongdong Li, Pan Ji

    Abstract: While showing promising results, recent RGB-D camera-based category-level object pose estimation methods have restricted applications due to the heavy reliance on depth sensors. RGB-only methods provide an alternative to this problem yet suffer from inherent scale ambiguity stemming from monocular observations. In this paper, we propose a novel pipeline that decouples the 6D pose and size estimati…

    Submitted 18 October, 2023; v1 submitted 18 September, 2023; originally announced September 2023.

  18. arXiv:2308.11737  [pdf, other]

    cs.CV cs.LG

    Animal3D: A Comprehensive Dataset of 3D Animal Pose and Shape

    Authors: Jiacong Xu, Yi Zhang, Jiawei Peng, Wufei Ma, Artur Jesslen, Pengliang Ji, Qixin Hu, Jiehua Zhang, Qihao Liu, Jiahao Wang, Wei Ji, Chen Wang, Xiaoding Yuan, Prakhar Kaushik, Guofeng Zhang, Jie Liu, Yushan Xie, Yawen Cui, Alan Yuille, Adam Kortylewski

    Abstract: Accurately estimating the 3D pose and shape is an essential step towards understanding animal behavior, and can potentially benefit many downstream applications, such as wildlife conservation. However, research in this area is held back by the lack of a comprehensive and diverse dataset with high-quality 3D pose and shape annotations. In this paper, we propose Animal3D, the first comprehensive dat…

    Submitted 20 January, 2024; v1 submitted 22 August, 2023; originally announced August 2023.

    Comments: 11 pages, 5 figures, link to the dataset: https://xujiacong.github.io/Animal3D/

  19. arXiv:2308.10123  [pdf, other]

    cs.CV cs.AI

    3D-Aware Neural Body Fitting for Occlusion Robust 3D Human Pose Estimation

    Authors: Yi Zhang, Pengliang Ji, Angtian Wang, Jieru Mei, Adam Kortylewski, Alan Yuille

    Abstract: Regression-based methods for 3D human pose estimation directly predict the 3D pose parameters from a 2D image using deep networks. While achieving state-of-the-art performance on standard benchmarks, their performance degrades under occlusion. In contrast, optimization-based methods fit a parametric body model to 2D features in an iterative manner. The localized reconstruction loss can potentially…

    Submitted 19 August, 2023; originally announced August 2023.

    Comments: ICCV 2023, project page: https://3dnbf.github.io/

  20. arXiv:2306.09245  [pdf]

    cs.CR cs.CE cs.CV

    Image encryption for Offshore wind power based on 2D-LCLM and Zhou Yi Eight Trigrams

    Authors: Lei Kou, Jinbo Wu, Fangfang Zhang, Peng Ji, Wende Ke, Junhe Wan, Hailin Liu, Yang Li, Quande Yuan

    Abstract: Offshore wind power is an important part of the new power system. Because conditions at sea are complex and changeable, its normal operation and maintenance depend on information such as images, so it is especially important that images are transmitted correctly. In this paper, we propose a new encryption algorithm for offshore wind power based…

    Submitted 27 June, 2023; v1 submitted 2 June, 2023; originally announced June 2023.

    Comments: accepted by Int. J. of Bio-Inspired Computation

    MSC Class: 68P25 ACM Class: E.3

    Journal ref: International Journal of Bio-Inspired Computation, vol. 22, no. 1, pp. 53-64 (2023)

  21. arXiv:2304.06178  [pdf, other]

    cs.CV cs.GR

    Dynamic Voxel Grid Optimization for High-Fidelity RGB-D Supervised Surface Reconstruction

    Authors: Xiangyu Xu, Lichang Chen, Changjiang Cai, Huangying Zhan, Qingan Yan, Pan Ji, Junsong Yuan, Heng Huang, Yi Xu

    Abstract: Direct optimization of interpolated features on multi-resolution voxel grids has emerged as a more efficient alternative to MLP-like modules. However, this approach is constrained by higher memory expenses and limited representation capabilities. In this paper, we introduce a novel dynamic grid optimization method for high-fidelity 3D surface reconstruction that incorporates both RGB and depth obs…

    Submitted 12 April, 2023; originally announced April 2023.

    Comments: For the project, see https://yanqingan.github.io/

  22. arXiv:2302.10499  [pdf, other]

    cs.SE

    Intergenerational Test Generation for Natural Language Processing Applications

    Authors: Pin Ji, Yang Feng, Weitao Huang, Jia Liu, Zhihong Zhao

    Abstract: The development of modern NLP applications often relies on various benchmark datasets containing plenty of manually labeled tests to evaluate performance. While constructing datasets often costs many resources, the performance on the held-out data may not properly reflect their capability in real-world application scenarios and thus cause tremendous misunderstanding and monetary loss. To alleviate…

    Submitted 28 July, 2023; v1 submitted 21 February, 2023; originally announced February 2023.

  23. arXiv:2210.14383  [pdf, other]

    cs.CV

    CLIP-FLow: Contrastive Learning by semi-supervised Iterative Pseudo labeling for Optical Flow Estimation

    Authors: Zhiqi Zhang, Nitin Bansal, Changjiang Cai, Pan Ji, Qingan Yan, Xiangyu Xu, Yi Xu

    Abstract: Synthetic datasets are often used to pretrain end-to-end optical flow networks, due to the lack of a large amount of labeled, real-scene data. But major drops in accuracy occur when moving from synthetic to real scenes. How do we better transfer the knowledge learned from synthetic to real domains? To this end, we propose CLIP-FLow, a semi-supervised iterative pseudo-labeling framework to transfer…

    Submitted 2 December, 2022; v1 submitted 25 October, 2022; originally announced October 2022.

  24. arXiv:2209.15574  [pdf, other]

    cs.CG

    An improved algorithm for Generalized Čech complex construction

    Authors: Jie Chu, Mikael Vejdemo-Johansson, Ping Ji

    Abstract: In this paper, we present an algorithm that computes the generalized Čech complex for a finite set of disks where each may have a different radius in 2D space. An extension of this algorithm is also proposed for a set of balls in 3D space with different radii. To compute a $k$-simplex, we leverage the computation performed in the round of $(k-1)$-simplices such that we can reduce the number of…

    Submitted 30 September, 2022; originally announced September 2022.

    MSC Class: 68U05; 57-08 ACM Class: F.2.2; I.3.5

  25. arXiv:2207.08951  [pdf, other]

    cs.CV

    MonoIndoor++: Towards Better Practice of Self-Supervised Monocular Depth Estimation for Indoor Environments

    Authors: Runze Li, Pan Ji, Yi Xu, Bir Bhanu

    Abstract: Self-supervised monocular depth estimation has seen significant progress in recent years, especially in outdoor environments. However, depth prediction results are not satisfying in indoor scenes where most of the existing data are captured with hand-held devices. As compared to outdoor environments, estimating depth of monocular videos for indoor environments, using self-supervised methods, resul…

    Submitted 18 July, 2022; originally announced July 2022.

    Comments: Journal version of "MonoIndoor: Towards Good Practice of Self-Supervised Monocular Depth Estimation for Indoor Environments" (ICCV-2021). arXiv admin note: substantial text overlap with arXiv:2107.12429

  26. arXiv:2206.10562  [pdf, other]

    cs.CV

    Semantics-Depth-Symbiosis: Deeply Coupled Semi-Supervised Learning of Semantics and Depth

    Authors: Nitin Bansal, Pan Ji, Junsong Yuan, Yi Xu

    Abstract: The multi-task learning (MTL) paradigm focuses on jointly learning two or more tasks, aiming for significant improvement w.r.t. the model's generalizability, performance, and training/inference memory footprint. The aforementioned benefits become ever so indispensable in the case of joint training for vision-related dense prediction tasks. In this work, we tackle the MTL problem of two dense tasks, i…

    Submitted 25 October, 2022; v1 submitted 21 June, 2022; originally announced June 2022.

  27. arXiv:2205.14405  [pdf, other]

    cs.CV

    Strengthening Skeletal Action Recognizers via Leveraging Temporal Patterns

    Authors: Zhenyue Qin, Pan Ji, Dongwoo Kim, Yang Liu, Saeed Anwar, Tom Gedeon

    Abstract: Skeleton sequences are compact and lightweight. Numerous skeleton-based action recognizers have been proposed to classify human behaviors. In this work, we aim to incorporate components that are compatible with existing models and further improve their accuracy. To this end, we design two temporal accessories: discrete cosine encoding (DCE) and chronological loss (CRL). DCE facilitates models to a…

    Submitted 23 August, 2022; v1 submitted 28 May, 2022; originally announced May 2022.

    Comments: ECCV2022-RWS

  28. arXiv:2205.14320  [pdf, other]

    cs.CV

    RIAV-MVS: Recurrent-Indexing an Asymmetric Volume for Multi-View Stereo

    Authors: Changjiang Cai, Pan Ji, Qingan Yan, Yi Xu

    Abstract: This paper presents a learning-based method for multi-view depth estimation from posed images. Our core idea is a "learning-to-optimize" paradigm that iteratively indexes a plane-sweeping cost volume and regresses the depth map via a convolutional Gated Recurrent Unit (GRU). Since the cost volume plays a paramount role in encoding the multi-view geometry, we aim to improve its construction both at…

    Submitted 21 March, 2023; v1 submitted 27 May, 2022; originally announced May 2022.

    Comments: CVPR 2023

  29. arXiv:2205.02940  [pdf, other]

    cs.RO cs.CV

    CNN-Augmented Visual-Inertial SLAM with Planar Constraints

    Authors: Pan Ji, Yuan Tian, Qingan Yan, Yuxin Ma, Yi Xu

    Abstract: We present a robust visual-inertial SLAM system that combines the benefits of Convolutional Neural Networks (CNNs) and planar constraints. Our system leverages a CNN to predict the depth map and the corresponding uncertainty map for each image. The CNN depth effectively bootstraps the back-end optimization of SLAM and meanwhile the CNN uncertainty adaptively weighs the contribution of each feature…

    Submitted 5 May, 2022; originally announced May 2022.

  30. arXiv:2205.02930  [pdf, other]

    cs.CV

    FisheyeDistill: Self-Supervised Monocular Depth Estimation with Ordinal Distillation for Fisheye Cameras

    Authors: Qingan Yan, Pan Ji, Nitin Bansal, Yuxin Ma, Yuan Tian, Yi Xu

    Abstract: In this paper, we deal with the problem of monocular depth estimation for fisheye cameras in a self-supervised manner. A known issue of self-supervised depth estimation is that it suffers in low-light/over-exposure conditions and in large homogeneous regions. To tackle this issue, we propose a novel ordinal distillation loss that distills the ordinal information from a large teacher model. Such a…

    Submitted 5 May, 2022; originally announced May 2022.

  31. arXiv:2205.02071  [pdf, other]

    cs.CV

    ANUBIS: Skeleton Action Recognition Dataset, Review, and Benchmark

    Authors: Zhenyue Qin, Yang Liu, Madhawa Perera, Tom Gedeon, Pan Ji, Dongwoo Kim, Saeed Anwar

    Abstract: Skeleton-based action recognition, as a subarea of action recognition, is swiftly accumulating attention and popularity. The task is to recognize actions performed by human articulation points. Compared with other data modalities, 3D human skeleton representations have extensive unique desirable characteristics, including succinctness, robustness, racial-impartiality, and many more. We aim to prov…

    Submitted 8 May, 2022; v1 submitted 4 May, 2022; originally announced May 2022.

  32. arXiv:2205.01656  [pdf, other]

    cs.CV

    GeoRefine: Self-Supervised Online Depth Refinement for Accurate Dense Mapping

    Authors: Pan Ji, Qingan Yan, Yuxin Ma, Yi Xu

    Abstract: We present a robust and accurate depth refinement system, named GeoRefine, for geometrically-consistent dense mapping from monocular sequences. GeoRefine consists of three modules: a hybrid SLAM module using learning-based priors, an online depth refinement module leveraging self-supervision, and a global mapping module via TSDF fusion. The proposed system is online by design and achieves great ro…

    Submitted 3 May, 2022; originally announced May 2022.

  33. arXiv:2204.11194  [pdf, other]

    cs.DL

    Co-citation and Co-authorship Networks of Statisticians

    Authors: Pengsheng Ji, Jiashun Jin, Zheng Tracy Ke, Wanshan Li

    Abstract: We collected and cleaned a large data set on publications in statistics. The data set consists of the coauthor relationships and citation relationships of 83,331 papers published in 36 representative journals in statistics, probability, and machine learning, spanning 41 years. The data set allows us to construct many different networks, and motivates a number of research problems about the resear…

    Submitted 24 April, 2022; originally announced April 2022.

    Comments: 61 pages, 16 figures

  34. arXiv:2203.12082  [pdf, other]

    cs.CV

    PlaneMVS: 3D Plane Reconstruction from Multi-View Stereo

    Authors: Jiachen Liu, Pan Ji, Nitin Bansal, Changjiang Cai, Qingan Yan, Xiaolei Huang, Yi Xu

    Abstract: We present a novel framework named PlaneMVS for 3D plane reconstruction from multiple input views with known camera poses. Most previous learning-based plane reconstruction methods reconstruct 3D planes from single images, which highly rely on single-view regression and suffer from depth scale ambiguity. In contrast, we reconstruct 3D planes with a multi-view-stereo (MVS) pipeline that takes advan…

    Submitted 5 June, 2024; v1 submitted 22 March, 2022; originally announced March 2022.

    Comments: CVPR 2022; source code: https://github.com/oppo-us-research/PlaneMVS

  35. arXiv:2203.06318  [pdf, other]

    cs.CV

    Deformable VisTR: Spatio temporal deformable attention for video instance segmentation

    Authors: Sudhir Yarram, Jialian Wu, Pan Ji, Yi Xu, Junsong Yuan

    Abstract: The video instance segmentation (VIS) task requires classifying, segmenting, and tracking object instances over all frames in a video clip. Recently, VisTR has been proposed as an end-to-end transformer-based VIS framework, demonstrating state-of-the-art performance. However, VisTR is slow to converge during training, requiring around 1000 GPU hours due to the high computational cost of its transfo…

    Submitted 11 March, 2022; originally announced March 2022.

    Comments: Accepted to ICASSP 2022

  36. arXiv:2112.09428   

    cs.CV

    Dynamics-aware Adversarial Attack of 3D Sparse Convolution Network

    Authors: An Tao, Yueqi Duan, He Wang, Ziyi Wu, Pengliang Ji, Haowen Sun, Jie Zhou, Jiwen Lu

    Abstract: In this paper, we investigate the dynamics-aware adversarial attack problem in deep neural networks. Most existing adversarial attack algorithms are designed under a basic assumption -- the network architecture is fixed throughout the attack process. However, this assumption does not hold for many recently proposed networks, e.g. 3D sparse convolution network, which contains input-dependent execut…

    Submitted 20 January, 2023; v1 submitted 17 December, 2021; originally announced December 2021.

    Comments: We have improved the quality of this work and updated a new version to address the limitations of the proposed method

  37. arXiv:2111.04945  [pdf]

    cs.CV cs.GR

    PREMA: Part-based REcurrent Multi-view Aggregation Network for 3D Shape Retrieval

    Authors: Jiongchao Jin, Huanqiang Xu, Pengliang Ji, Zehao Tang, Zhang Xiong

    Abstract: We propose the Part-based Recurrent Multi-view Aggregation network (PREMA) to eliminate the detrimental effects of practical view defects, such as insufficient view numbers, occlusions, or background clutter, and also to enhance the discriminative ability of shape representations. Inspired by the fact that humans recognize an object mainly by its discriminant parts, we define the multi-view coheren…

    Submitted 8 November, 2021; originally announced November 2021.

    Comments: Accepted by ICCSMT 2021

  38. arXiv:2107.12429  [pdf, other]

    cs.CV

    MonoIndoor: Towards Good Practice of Self-Supervised Monocular Depth Estimation for Indoor Environments

    Authors: Pan Ji, Runze Li, Bir Bhanu, Yi Xu

    Abstract: Self-supervised depth estimation for indoor environments is more challenging than its outdoor counterpart in at least the following two aspects: (i) the depth range of indoor sequences varies a lot across different frames, making it difficult for the depth network to induce consistent depth cues, whereas the maximum distance in outdoor scenes mostly stays the same as the camera usually sees the sk…

    Submitted 27 July, 2021; v1 submitted 26 July, 2021; originally announced July 2021.

    Comments: ICCV 2021

  39. arXiv:2105.11346  [pdf, other]

    cs.LG

    Position-Sensing Graph Neural Networks: Proactively Learning Nodes Relative Positions

    Authors: Zhenyue Qin, Saeed Anwar, Dongwoo Kim, Yang Liu, Pan Ji, Tom Gedeon

    Abstract: Most existing graph neural networks (GNNs) learn node embeddings using the framework of message passing and aggregation. Such GNNs are incapable of learning relative positions between graph nodes within a graph. To empower GNNs with the awareness of node positions, some nodes are set as anchors. Then, using the distances from a node to the anchors, GNNs can infer relative positions between nodes.…

    Submitted 24 May, 2021; originally announced May 2021.

  40. arXiv:2105.04746  [pdf, other]

    cs.CV

    Disentangling Noise from Images: A Flow-Based Image Denoising Neural Network

    Authors: Yang Liu, Saeed Anwar, Zhenyue Qin, Pan Ji, Sabrina Caldwell, Tom Gedeon

    Abstract: The prevalent convolutional neural network (CNN) based image denoising methods extract features of images to restore the clean ground truth, achieving high denoising accuracy. However, these methods may ignore the underlying distribution of clean images, inducing distortions or artifacts in denoising results. This paper proposes a new perspective to treat image denoising as a distribution learning…

    Submitted 10 May, 2021; originally announced May 2021.

  41. arXiv:2105.01563  [pdf, other]

    cs.CV

    Fusing Higher-order Features in Graph Neural Networks for Skeleton-based Action Recognition

    Authors: Zhenyue Qin, Yang Liu, Pan Ji, Dongwoo Kim, Lei Wang, Bob McKay, Saeed Anwar, Tom Gedeon

    Abstract: Skeleton sequences are lightweight and compact, and thus are ideal candidates for action recognition on edge devices. Recent skeleton-based action recognition methods extract features from 3D joint coordinates as spatial-temporal cues, using these representations in a graph neural network for feature fusion to boost recognition performance. The use of first- and second-order features, i.e., joint…

    Submitted 23 August, 2022; v1 submitted 4 May, 2021; originally announced May 2021.

    Comments: Accepted by IEEE Transactions on Neural Networks and Learning Systems

  42. arXiv:2104.10546  [pdf, other]

    eess.IV cs.CV

    Invertible Denoising Network: A Light Solution for Real Noise Removal

    Authors: Yang Liu, Zhenyue Qin, Saeed Anwar, Pan Ji, Dongwoo Kim, Sabrina Caldwell, Tom Gedeon

    Abstract: Invertible networks have various benefits for image denoising since they are lightweight, information-lossless, and memory-saving during back-propagation. However, applying invertible models to remove noise is challenging because the input is noisy, and the reversed output is clean, following two different distributions. We propose an invertible denoising network, InvDN, to address this challenge.…

    Submitted 21 April, 2021; originally announced April 2021.

  43. arXiv:2104.00953  [pdf, other

    cs.CV

    Learning Transferable Kinematic Dictionary for 3D Human Pose and Shape Reconstruction

    Authors: Ze Ma, Yifan Yao, Pan Ji, Chao Ma

    Abstract: Estimating 3D human pose and shape from a single image is highly under-constrained. To address this ambiguity, we propose a novel prior, namely kinematic dictionary, which explicitly regularizes the solution space of relative 3D rotations of human joints in the kinematic tree. Integrated with a statistical human model and a deep neural network, our method achieves end-to-end 3D reconstruction with…

    Submitted 20 April, 2021; v1 submitted 2 April, 2021; originally announced April 2021.

  44. arXiv:2104.00273  [pdf, other

    cs.RO cs.AI

    Perspective, Survey and Trends: Public Driving Datasets and Toolsets for Autonomous Driving Virtual Test

    Authors: Pengliang Ji, Li Ruan, Yunzhi Xue, Limin Xiao, Qian Dong

    Abstract: Owing to the merits of early safety and reliability guarantees, autonomous driving virtual testing has recently gained increasing attention compared with closed-loop testing in real scenarios. Although the availability and quality of autonomous driving datasets and toolsets are the premise to diagnose the autonomous driving system bottlenecks and improve the system performance, due to the diversity…

    Submitted 30 June, 2021; v1 submitted 1 April, 2021; originally announced April 2021.

    Comments: 6 pages, 4 figures. Accepted to 24th IEEE International Conference on Intelligent Transportation - ITSC2021

  45. arXiv:2011.00774  [pdf, other

    cs.CV

    Set Augmented Triplet Loss for Video Person Re-Identification

    Authors: Pengfei Fang, Pan Ji, Lars Petersson, Mehrtash Harandi

    Abstract: Modern video person re-identification (re-ID) machines are often trained using a metric learning approach, supervised by a triplet loss. The triplet loss used in video re-ID is usually based on so-called clip features, each aggregated from a few frame features. In this paper, we propose to model the video clip as a set and instead study the distance between sets in the corresponding triplet loss.…

    Submitted 6 November, 2020; v1 submitted 2 November, 2020; originally announced November 2020.

    Comments: to appear in WACV 2021

  46. arXiv:2010.14851  [pdf, other

    cs.CV

    Displacement-Invariant Matching Cost Learning for Accurate Optical Flow Estimation

    Authors: Jianyuan Wang, Yiran Zhong, Yuchao Dai, Kaihao Zhang, Pan Ji, Hongdong Li

    Abstract: Learning matching costs has been shown to be critical to the success of the state-of-the-art deep stereo matching methods, in which 3D convolutions are applied on a 4D feature volume to learn a 3D cost volume. However, this mechanism has never been employed for the optical flow task. This is mainly due to the significantly increased search dimension in the case of optical flow computation, i.e., a s…

    Submitted 28 October, 2020; originally announced October 2020.

    Comments: NeurIPS 2020, 9 pages

  47. arXiv:2010.13242  [pdf, other

    cs.LG cs.CV

    Co-embedding of Nodes and Edges with Graph Neural Networks

    Authors: Xiaodong Jiang, Ronghang Zhu, Pengsheng Ji, Sheng Li

    Abstract: Graph, as an important data representation, is ubiquitous in many real-world applications ranging from social network analysis to biology. How to correctly and effectively learn and extract information from graphs is essential for a large number of machine learning tasks. Graph embedding is a way to transform and encode the data structure in high dimensional and non-Euclidean feature space to a low…

    Submitted 25 October, 2020; originally announced October 2020.

    Comments: This manuscript has been accepted by the IEEE Transactions on Pattern Analysis and Machine Intelligence

  48. arXiv:2010.03108  [pdf, other

    cs.CV cs.LG

    Channel Recurrent Attention Networks for Video Pedestrian Retrieval

    Authors: Pengfei Fang, Pan Ji, Jieming Zhou, Lars Petersson, Mehrtash Harandi

    Abstract: Full attention, which generates an attention value per element of the input feature maps, has been successfully demonstrated to be beneficial in visual tasks. In this work, we propose a fully attentional network, termed channel recurrent attention network, for the task of video pedestrian retrieval. The main attention unit, channel recurrent attention, identifies attention maps at t…

    Submitted 6 October, 2020; originally announced October 2020.

    Comments: To appear in ACCV 2020

  49. arXiv:2008.10436  [pdf, other

    cs.CV

    Cross-Modality 3D Object Detection

    Authors: Ming Zhu, Chao Ma, Pan Ji, Xiaokang Yang

    Abstract: In this paper, we focus on exploring the fusion of images and point clouds for 3D object detection in view of the complementary nature of the two modalities, i.e., images possess more semantic information while point clouds specialize in distance sensing. To this end, we present a novel two-stage multi-modal fusion network for 3D object detection, taking both binocular images and raw point clouds…

    Submitted 16 August, 2020; originally announced August 2020.

    Comments: Accepted by WACV 2021

  50. arXiv:2008.03820  [pdf, other

    stat.ML cs.LG cs.SI math.ST

    Spectral Algorithms for Community Detection in Directed Networks

    Authors: Zhe Wang, Yingbin Liang, Pengsheng Ji

    Abstract: Community detection in large social networks is affected by degree heterogeneity of nodes. The D-SCORE algorithm for directed networks was introduced to reduce this effect by taking the element-wise ratios of the singular vectors of the adjacency matrix before clustering. Meaningful results were obtained for the statistician citation network, but rigorous analysis on its performance was missing. F…

    Submitted 9 August, 2020; originally announced August 2020.

    Comments: Journal of Machine Learning Research 2020, to appear

    Journal ref: Journal of Machine Learning Research 2020. (153):1-45,