Showing 1–50 of 50 results for author: Deng, F

  1. arXiv:2406.12272  [pdf, other]

    cs.AI

    Slot State Space Models

    Authors: Jindong Jiang, Fei Deng, Gautam Singh, Minseung Lee, Sungjin Ahn

    Abstract: Recent State Space Models (SSMs) such as S4, S5, and Mamba have shown remarkable computational benefits in long-range temporal dependency modeling. However, in many sequence modeling problems, the underlying process is inherently modular and it is of interest to have inductive biases that mimic this modular structure. In this paper, we introduce SlotSSMs, a novel framework for incorporating indepe…

    Submitted 30 June, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

  2. arXiv:2406.08203  [pdf, other]

    eess.AS cs.SD

    LAFMA: A Latent Flow Matching Model for Text-to-Audio Generation

    Authors: Wenhao Guan, Kaidi Wang, Wangjin Zhou, Yang Wang, Feng Deng, Hui Wang, Lin Li, Qingyang Hong, Yong Qin

    Abstract: Recently, the application of diffusion models has facilitated the significant development of speech and audio generation. Nevertheless, the quality of samples generated by diffusion models still needs improvement. And the effectiveness of the method is accompanied by the extensive number of sampling steps, leading to an extended synthesis time necessary for generating high-quality audio. Previous…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted at Interspeech 2024

  3. arXiv:2406.06793  [pdf, other]

    cs.LG cs.AI

    PlanDQ: Hierarchical Plan Orchestration via D-Conductor and Q-Performer

    Authors: Chang Chen, Junyeob Baek, Fei Deng, Kenji Kawaguchi, Caglar Gulcehre, Sungjin Ahn

    Abstract: Despite the recent advancements in offline RL, no unified algorithm could achieve superior performance across a broad range of tasks. Offline value function learning, in particular, struggles with sparse-reward, long-horizon tasks due to the difficulty of solving credit assignment and extrapolation errors that accumulate as the horizon of the task grows. On the other hand, models that ca…

    Submitted 10 June, 2024; originally announced June 2024.

  4. EEGDiR: Electroencephalogram denoising network for temporal information storage and global modeling through Retentive Network

    Authors: Bin Wang, Fei Deng, Peifan Jiang

    Abstract: Electroencephalogram (EEG) signals play a pivotal role in clinical medicine, brain research, and neurological disease studies. However, susceptibility to various physiological and environmental artifacts introduces noise in recorded EEG data, impeding accurate analysis of underlying brain activity. Denoising techniques are crucial to mitigate this challenge. Recent advancements in deep learningbas…

    Submitted 20 May, 2024; v1 submitted 20 March, 2024; originally announced April 2024.

  5. arXiv:2404.09654  [pdf, other]

    cs.CV cs.MM

    Do LLMs Understand Visual Anomalies? Uncovering LLM Capabilities in Zero-shot Anomaly Detection

    Authors: Jiaqi Zhu, Shaofeng Cai, Fang Deng, Junran Wu

    Abstract: Large vision-language models (LVLMs) are markedly proficient in deriving visual representations guided by natural language. Recent explorations have utilized LVLMs to tackle zero-shot visual anomaly detection (VAD) challenges by pairing images with textual descriptions indicative of normal and abnormal conditions, referred to as anomaly prompts. However, existing approaches depend on static anomal…

    Submitted 15 April, 2024; originally announced April 2024.

  6. arXiv:2404.09533  [pdf, other]

    cs.CV cs.AI cs.LG

    WiTUnet: A U-Shaped Architecture Integrating CNN and Transformer for Improved Feature Alignment and Local Information Fusion

    Authors: Bin Wang, Fei Deng, Peifan Jiang, Shuang Wang, Xiao Han, Zhixuan Zhang

    Abstract: Low-dose computed tomography (LDCT) has become the technology of choice for diagnostic medical imaging, given its lower radiation dose compared to standard CT, despite increasing image noise and potentially affecting diagnostic accuracy. To address this, advanced deep learning-based LDCT denoising algorithms have been developed, primarily using Convolutional Neural Networks (CNNs) or Transformer N…

    Submitted 29 April, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

  7. arXiv:2403.11482  [pdf, other]

    cs.LG physics.geo-ph

    SeisFusion: Constrained Diffusion Model with Input Guidance for 3D Seismic Data Interpolation and Reconstruction

    Authors: Shuang Wang, Fei Deng, Peifan Jiang, Zishan Gong, Xiaolin Wei, Yuqing Wang

    Abstract: Geographical, physical, or economic constraints often result in missing traces within seismic data, making the reconstruction of complete seismic data a crucial step in seismic data processing. Traditional methods for seismic data reconstruction require the selection of multiple empirical parameters and struggle to handle large-scale continuous missing data. With the development of deep learning,…

    Submitted 18 March, 2024; originally announced March 2024.

  8. arXiv:2402.08714  [pdf, other]

    cs.LG cs.AI

    PRDP: Proximal Reward Difference Prediction for Large-Scale Reward Finetuning of Diffusion Models

    Authors: Fei Deng, Qifei Wang, Wei Wei, Matthias Grundmann, Tingbo Hou

    Abstract: Reward finetuning has emerged as a promising approach to aligning foundation models with downstream objectives. Remarkable success has been achieved in the language domain by using reinforcement learning (RL) to maximize rewards that reflect human preference. However, in the vision domain, existing RL-based reward finetuning methods are limited by their instability in large-scale training, renderi…

    Submitted 27 March, 2024; v1 submitted 13 February, 2024; originally announced February 2024.

    Comments: CVPR 2024. Project page: https://fdeng18.github.io/prdp

  9. arXiv:2401.05675  [pdf, other]

    cs.CV

    Parrot: Pareto-optimal Multi-Reward Reinforcement Learning Framework for Text-to-Image Generation

    Authors: Seung Hyun Lee, Yinxiao Li, Junjie Ke, Innfarn Yoo, Han Zhang, Jiahui Yu, Qifei Wang, Fei Deng, Glenn Entis, Junfeng He, Gang Li, Sangpil Kim, Irfan Essa, Feng Yang

    Abstract: Recent works have demonstrated that using reinforcement learning (RL) with multiple quality rewards can improve the quality of generated images in text-to-image (T2I) generation. However, manually adjusting reward weights poses challenges and may cause over-optimization in certain metrics. To solve this, we propose Parrot, which addresses the issue through multi-objective optimization and introduc…

    Submitted 15 July, 2024; v1 submitted 11 January, 2024; originally announced January 2024.

  10. arXiv:2401.02644  [pdf, other]

    cs.LG cs.AI

    Simple Hierarchical Planning with Diffusion

    Authors: Chang Chen, Fei Deng, Kenji Kawaguchi, Caglar Gulcehre, Sungjin Ahn

    Abstract: Diffusion-based generative methods have proven effective in modeling trajectories with offline datasets. However, they often face computational challenges and can falter in generalization, especially in capturing temporal abstractions for long-horizon tasks. To overcome this, we introduce the Hierarchical Diffuser, a simple, fast, yet surprisingly effective planning method combining the advantages…

    Submitted 5 January, 2024; originally announced January 2024.

  11. METER: A Dynamic Concept Adaptation Framework for Online Anomaly Detection

    Authors: Jiaqi Zhu, Shaofeng Cai, Fang Deng, Beng Chin Ooi, Wenqiao Zhang

    Abstract: Real-time analytics and decision-making require online anomaly detection (OAD) to handle drifts in data streams efficiently and effectively. Unfortunately, existing approaches are often constrained by their limited detection capacity and slow adaptation to evolving data streams, inhibiting their efficacy and efficiency in handling concept drift, which is a major challenge in evolving data streams.…

    Submitted 28 December, 2023; originally announced December 2023.

  12. arXiv:2312.08760  [pdf, other]

    cs.CV

    CF-NeRF: Camera Parameter Free Neural Radiance Fields with Incremental Learning

    Authors: Qingsong Yan, Qiang Wang, Kaiyong Zhao, Jie Chen, Bo Li, Xiaowen Chu, Fei Deng

    Abstract: Neural Radiance Fields (NeRF) have demonstrated impressive performance in novel view synthesis. However, NeRF and most of its variants still rely on traditional complex pipelines to provide extrinsic and intrinsic camera parameters, such as COLMAP. Recent works, like NeRFmm, BARF, and L2G-NeRF, directly treat camera parameters as learnable and estimate them through differential volume rendering. H…

    Submitted 14 December, 2023; originally announced December 2023.

    Comments: Accepted at the Thirty-Eighth AAAI Conference on Artificial Intelligence (AAAI24)

  13. arXiv:2309.11042  [pdf, other]

    cs.CL cs.AI

    Making Small Language Models Better Multi-task Learners with Mixture-of-Task-Adapters

    Authors: Yukang Xie, Chengyu Wang, Junbing Yan, Jiyong Zhou, Feiqi Deng, Jun Huang

    Abstract: Recently, Large Language Models (LLMs) have achieved amazing zero-shot learning performance over a variety of Natural Language Processing (NLP) tasks, especially for text generative tasks. Yet, the large size of LLMs often leads to the high computational cost of model training and online deployment. In our work, we present ALTER, a system that effectively builds the multi-tAsk Learners with mixTur…

    Submitted 19 September, 2023; originally announced September 2023.

  14. arXiv:2309.10305  [pdf, other]

    cs.CL

    Baichuan 2: Open Large-scale Language Models

    Authors: Aiyuan Yang, Bin Xiao, Bingning Wang, Borong Zhang, Ce Bian, Chao Yin, Chenxu Lv, Da Pan, Dian Wang, Dong Yan, Fan Yang, Fei Deng, Feng Wang, Feng Liu, Guangwei Ai, Guosheng Dong, Haizhou Zhao, Hang Xu, Haoze Sun, Hongda Zhang, Hui Liu, Jiaming Ji, Jian Xie, JunTao Dai, Kun Fang, et al. (30 additional authors not shown)

    Abstract: Large language models (LLMs) have demonstrated remarkable performance on a variety of natural language tasks based on just a few examples of natural language instructions, reducing the need for extensive feature engineering. However, most powerful LLMs are closed-source or limited in their capability for languages other than English. In this technical report, we present Baichuan 2, a series of lar…

    Submitted 20 September, 2023; v1 submitted 19 September, 2023; originally announced September 2023.

    Comments: Baichuan 2 technical report. Github: https://github.com/baichuan-inc/Baichuan2

  15. arXiv:2307.02064  [pdf, other]

    cs.LG

    Facing Off World Model Backbones: RNNs, Transformers, and S4

    Authors: Fei Deng, Junyeong Park, Sungjin Ahn

    Abstract: World models are a fundamental component in model-based reinforcement learning (MBRL). To perform temporally extended and consistent simulations of the future in partially observable environments, world models need to possess long-term memory. However, state-of-the-art MBRL agents, such as Dreamer, predominantly employ recurrent neural networks (RNNs) as their world model backbone, which have limi…

    Submitted 9 November, 2023; v1 submitted 5 July, 2023; originally announced July 2023.

    Comments: NeurIPS 2023. Added instantiation with S5. Project page: https://fdeng18.github.io/s4wm

  16. arXiv:2305.05119  [pdf, other]

    cs.LG cs.AI

    Flexible Job Shop Scheduling via Dual Attention Network Based Reinforcement Learning

    Authors: Runqing Wang, Gang Wang, Jian Sun, Fang Deng, Jie Chen

    Abstract: Flexible manufacturing has given rise to complex scheduling problems such as the flexible job shop scheduling problem (FJSP). In FJSP, operations can be processed on multiple machines, leading to intricate relationships between operations and machines. Recent works have employed deep reinforcement learning (DRL) to learn priority dispatching rules (PDRs) for solving FJSP. However, the quality of s…

    Submitted 17 June, 2023; v1 submitted 8 May, 2023; originally announced May 2023.

  17. arXiv:2304.10831  [pdf, other]

    cs.CV cs.AI cs.LG

    Learn to Cluster Faces with Better Subgraphs

    Authors: Yuan Cao, Di Jiang, Guanqun Hou, Fan Deng, Xinjia Chen, Qiang Yang

    Abstract: Face clustering can provide pseudo-labels to the massive unlabeled face data and improve the performance of different face recognition models. The existing clustering methods generally aggregate the features within subgraphs that are often implemented based on a uniform threshold or a learned cutoff position. This may reduce the recall of subgraphs and hence degrade the clustering performance. Thi…

    Submitted 21 April, 2023; originally announced April 2023.

  18. arXiv:2304.08990  [pdf, other]

    eess.IV cs.CV

    A Comparison of Image Denoising Methods

    Authors: Zhaoming Kong, Fangxi Deng, Haomin Zhuang, Jun Yu, Lifang He, Xiaowei Yang

    Abstract: The advancement of imaging devices and countless images generated every day pose an increasingly high demand on image denoising, which still remains a challenging task in terms of both effectiveness and efficiency. To improve denoising quality, numerous denoising techniques and approaches have been proposed in the past decades, including different transforms, regularization terms, algebraic represe…

    Submitted 9 May, 2023; v1 submitted 18 April, 2023; originally announced April 2023.

    Comments: In this paper, we intend to collect and compare various denoising methods to investigate their effectiveness, efficiency, applicability and generalization ability with both synthetic and real-world experiments. arXiv admin note: substantial text overlap with arXiv:2011.03462

  19. arXiv:2304.00601  [pdf, other]

    cs.CV cs.LG

    Constructive Assimilation: Boosting Contrastive Learning Performance through View Generation Strategies

    Authors: Ligong Han, Seungwook Han, Shivchander Sudalairaj, Charlotte Loh, Rumen Dangovski, Fei Deng, Pulkit Agrawal, Dimitris Metaxas, Leonid Karlinsky, Tsui-Wei Weng, Akash Srivastava

    Abstract: Transformations based on domain expertise (expert transformations), such as random-resized-crop and color-jitter, have proven critical to the success of contrastive learning techniques such as SimCLR. Recently, several attempts have been made to replace such domain-specific, human-designed transformations with generated views that are learned. However, for imagery data, so far none of these view-ge…

    Submitted 8 April, 2023; v1 submitted 2 April, 2023; originally announced April 2023.

    Comments: Accepted at Generative Models for Computer Vision Workshop 2023

  20. arXiv:2304.00040  [pdf]

    cs.LG

    A robust deep learning-based damage identification approach for SHM considering missing data

    Authors: Fan Deng, Xiaoming Tao, Pengxiang Wei, Shiyin Wei

    Abstract: Data-driven methods for Structural Health Monitoring (SHM), which mine the hidden structural performance from the correlations among monitored time series data, have received wide attention recently. However, missing data significantly impacts the application of these methods. Missing data is a frequently encountered issue in time series data in SHM and many other real-world applications, that harms to…

    Submitted 31 March, 2023; originally announced April 2023.

  21. arXiv:2303.15710  [pdf, other]

    cs.CV

    Explicit Attention-Enhanced Fusion for RGB-Thermal Perception Tasks

    Authors: Mingjian Liang, Junjie Hu, Chenyu Bao, Hua Feng, Fuqin Deng, Tin Lun Lam

    Abstract: Recently, RGB-Thermal based perception has shown significant advances. Thermal information provides useful clues when visual cameras suffer from poor lighting conditions, such as low light and fog. However, how to effectively fuse RGB images and thermal data remains an open challenge. Previous works involve naive fusion strategies such as merging them at the input, concatenating multi-modality fea…

    Submitted 27 March, 2023; originally announced March 2023.

  22. arXiv:2303.10834  [pdf, other]

    cs.CV cs.LG

    Object-Centric Slot Diffusion

    Authors: Jindong Jiang, Fei Deng, Gautam Singh, Sungjin Ahn

    Abstract: The recent success of transformer-based image generative models in object-centric learning highlights the importance of powerful image generators for handling complex scenes. However, despite the high expressiveness of diffusion models in image generation, their integration into object-centric learning remains largely unexplored in this domain. In this paper, we explore the feasibility and potenti…

    Submitted 3 November, 2023; v1 submitted 19 March, 2023; originally announced March 2023.

    Comments: Accepted to NeurIPS 2023 as a Spotlight paper

  23. arXiv:2211.16905  [pdf, other]

    cs.CV

    Rethinking Disparity: A Depth Range Free Multi-View Stereo Based on Disparity

    Authors: Qingsong Yan, Qiang Wang, Kaiyong Zhao, Bo Li, Xiaowen Chu, Fei Deng

    Abstract: Existing learning-based multi-view stereo (MVS) methods rely on the depth range to build the 3D cost volume and may fail when the range is too large or unreliable. To address this problem, we propose a disparity-based MVS method based on the epipolar disparity flow (E-flow), called DispMVS, which infers the depth information from the pixel movement between two views. The core of DispMVS is to cons…

    Submitted 4 December, 2022; v1 submitted 30 November, 2022; originally announced November 2022.

    Comments: Accepted at the Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI23)

  24. arXiv:2210.14225  [pdf, other]

    cs.CR cs.AI cs.LG

    Flexible Android Malware Detection Model based on Generative Adversarial Networks with Code Tensor

    Authors: Zhao Yang, Fengyang Deng, Linxi Han

    Abstract: The threat posed by malware is steadily increasing, heightening the need for malware detection. However, existing malware detection methods target only known malicious samples, so their ability to detect fresh malicious code and variants of malicious code is limited. In this paper, we propose a novel scheme that detects malware and its variants efficiently. Based on the idea of the generative…

    Submitted 24 October, 2022; originally announced October 2022.

  25. arXiv:2209.13760  [pdf, other]

    cs.RO

    MultiRoboLearn: An open-source Framework for Multi-robot Deep Reinforcement Learning

    Authors: Junfeng Chen, Fuqin Deng, Yuan Gao, Junjie Hu, Xiyue Guo, Guanqi Liang, Tin Lun Lam

    Abstract: It is well known that it is difficult to have a reliable and robust framework to link multi-agent deep reinforcement learning algorithms with practical multi-robot applications. To fill this gap, we propose and build an open-source framework for multi-robot systems called MultiRoboLearn. This framework builds a unified setup of simulation and real-world applications. It aims to provide standard,…

    Submitted 27 September, 2022; originally announced September 2022.

    Comments: 6 pages, 6 figures

  26. arXiv:2209.06582  [pdf, other]

    cs.LG cs.AI

    A Clustering Method Based on Information Entropy Payload

    Authors: Shaodong Deng, Long Sheng, Jiayi Nie, Fuyi Deng

    Abstract: Existing clustering algorithms such as K-means often need to preset parameters such as the number of categories K, and such parameters may lead to the failure to output objective and consistent clustering results. This paper introduces a clustering method based on the information theory, by which clusters in the clustering result have maximum average information entropy (called entropy payload in…

    Submitted 13 September, 2022; originally announced September 2022.

  27. arXiv:2208.13714  [pdf, other]

    cs.CV

    SphereDepth: Panorama Depth Estimation from Spherical Domain

    Authors: Qingsong Yan, Qiang Wang, Kaiyong Zhao, Bo Li, Xiaowen Chu, Fei Deng

    Abstract: The panorama image can simultaneously demonstrate complete information of the surrounding environment and has many advantages in virtual tourism, games, robotics, etc. However, the progress of panorama depth estimation cannot completely solve the problems of distortion and discontinuity caused by the commonly used projection methods. This paper proposes SphereDepth, a novel panorama depth estimati…

    Submitted 4 December, 2022; v1 submitted 29 August, 2022; originally announced August 2022.

    Comments: Accepted at 3DV 2022

  28. arXiv:2203.02677  [pdf]

    cs.RO

    Tightly Coupled Optimization-based GPS-Visual-Inertial Odometry with Online Calibration and Initialization

    Authors: Shihao Han, Feiyang Deng, Tao Li, Hailong Pei

    Abstract: In this paper, we present a tightly coupled optimization-based GPS-Visual-Inertial odometry system to solve the trajectory drift of the visual-inertial odometry especially over long-term runs. Visual reprojection residuals, IMU residuals, and GPS measurement residuals are jointly minimized within a local bundle adjustment, in which we apply GPS measurements and IMU preintegration used for the IMU…

    Submitted 5 March, 2022; originally announced March 2022.

    Comments: 7 pages, 10 figures

  29. arXiv:2110.14565  [pdf, other]

    cs.LG cs.AI

    DreamerPro: Reconstruction-Free Model-Based Reinforcement Learning with Prototypical Representations

    Authors: Fei Deng, Ingook Jang, Sungjin Ahn

    Abstract: Top-performing Model-Based Reinforcement Learning (MBRL) agents, such as Dreamer, learn the world model by reconstructing the image observations. Hence, they often fail to discard task-irrelevant details and struggle to handle visual distractions. To address this issue, previous work has proposed to contrastively learn the world model, but the performance tends to be inferior in the absence of dis…

    Submitted 27 October, 2021; originally announced October 2021.

  30. arXiv:2110.11405  [pdf, other]

    cs.CV cs.LG

    Illiterate DALL-E Learns to Compose

    Authors: Gautam Singh, Fei Deng, Sungjin Ahn

    Abstract: Although DALL-E has shown an impressive ability of composition-based systematic generalization in image generation, it requires the dataset of text-image pairs and the compositionality is provided by the text. In contrast, object-centric representation models like the Slot Attention model learn composable representations without the text prompt. However, unlike DALL-E, its ability to systematically…

    Submitted 14 March, 2022; v1 submitted 17 October, 2021; originally announced October 2021.

    Comments: Published as a conference paper at ICLR 2022

  31. arXiv:2110.09047  [pdf, other]

    cs.CV

    Abnormal Occupancy Grid Map Recognition using Attention Network

    Authors: Fuqin Deng, Hua Feng, Mingjian Liang, Qi Feng, Ningbo Yi, Yong Yang, Yuan Gao, Junfeng Chen, Tin Lun Lam

    Abstract: The occupancy grid map is a critical component of autonomous positioning and navigation in the mobile robotic system, as many other systems' performance depends heavily on it. To guarantee the quality of the occupancy grid maps, researchers previously had to perform tedious manual recognition for a long time. This work focuses on automatic abnormal occupancy grid map recognition using the residual…

    Submitted 18 October, 2021; originally announced October 2021.

  32. arXiv:2110.08988  [pdf, other]

    cs.CV

    FEANet: Feature-Enhanced Attention Network for RGB-Thermal Real-time Semantic Segmentation

    Authors: Fuqin Deng, Hua Feng, Mingjian Liang, Hongmin Wang, Yong Yang, Yuan Gao, Junfeng Chen, Junjie Hu, Xiyue Guo, Tin Lun Lam

    Abstract: The RGB-Thermal (RGB-T) information for semantic segmentation has been extensively explored in recent years. However, most existing RGB-T semantic segmentation usually compromises spatial resolution to achieve real-time inference speed, which leads to poor performance. To better extract detailed spatial information, we propose a two-stage Feature-Enhanced Attention Network (FEANet) for the RGB-T sem…

    Submitted 17 October, 2021; originally announced October 2021.

    Comments: 7 pages, 5 figures

  33. arXiv:2110.00760  [pdf, other]

    cs.RO cs.MA

    AB-Mapper: Attention and BicNet Based Multi-agent Path Finding for Dynamic Crowded Environment

    Authors: Huifeng Guan, Yuan Gao, Min Zhao, Yong Yang, Fuqin Deng, Tin Lun Lam

    Abstract: Multi-agent path finding in dynamic crowded environments is of great academic and practical value for multi-robot systems in the real world. To improve the effectiveness and efficiency of communication and learning process during path planning in dynamic crowded environments, we introduce an algorithm called Attention and BicNet based Multi-agent path planning with effective reinforcement (AB-Mapp…

    Submitted 2 October, 2021; originally announced October 2021.

  34. arXiv:2109.13617  [pdf, other]

    cs.RO

    Meta Reinforcement Learning Based Sensor Scanning in 3D Uncertain Environments for Heterogeneous Multi-Robot Systems

    Authors: Junfeng Chen, Yuan Gao, Junjie Hu, Fuqin Deng, Tin Lun Lam

    Abstract: We study a novel problem that tackles learning based sensor scanning in 3D and uncertain environments with heterogeneous multi-robot systems. Our motivation is two-fold: first, 3D environments are complex, and the use of heterogeneous multi-robot systems intuitively can facilitate sensor scanning by fully taking advantage of sensors with different capabilities. Second, in uncertain environments (e.g.…

    Submitted 28 September, 2021; originally announced September 2021.

    Comments: 6 pages, 9 figures

  35. arXiv:2109.08839  [pdf, other]

    cs.SD cs.CL cs.CV cs.LG eess.AS

    SpeechNAS: Towards Better Trade-off between Latency and Accuracy for Large-Scale Speaker Verification

    Authors: Wentao Zhu, Tianlong Kong, Shun Lu, Jixiang Li, Dawei Zhang, Feng Deng, Xiaorui Wang, Sen Yang, Ji Liu

    Abstract: Recently, x-vector has been a successful and popular approach for speaker verification, which employs a time delay neural network (TDNN) and statistics pooling to extract speaker characterizing embedding from variable-length utterances. Improvement upon the x-vector has been an active research area, and enormous neural networks have been elaborately designed based on the x-vector, e.g., extended TDN…

    Submitted 18 September, 2021; originally announced September 2021.

    Comments: 8 pages, 3 figures, 3 tables. Accepted by ASRU 2021

  36. arXiv:2107.06467  [pdf, ps, other]

    eess.AS cs.SD

    Multi-Task Audio Source Separation

    Authors: Lu Zhang, Chenxing Li, Feng Deng, Xiaorui Wang

    Abstract: The audio source separation tasks, such as speech enhancement, speech separation, and music source separation, have achieved impressive performance in recent studies. The powerful modeling capabilities of deep neural networks give us hope for more challenging tasks. This paper launches a new multi-task audio source separation (MTASS) challenge to separate the speech, music, and noise signals from…

    Submitted 13 July, 2021; originally announced July 2021.

  37. arXiv:2104.14837  [pdf, other]

    cs.CV

    RobustFusion: Robust Volumetric Performance Reconstruction under Human-object Interactions from Monocular RGBD Stream

    Authors: Zhuo Su, Lan Xu, Dawei Zhong, Zhong Li, Fan Deng, Shuxue Quan, Lu Fang

    Abstract: High-quality 4D reconstruction of human performance with complex interactions to various objects is essential in real-world scenarios, which enables numerous immersive VR/AR applications. However, recent advances still fail to provide reliable performance reconstruction, suffering from challenging interaction patterns and severe occlusions, especially for the monocular setting. To fill this gap, i…

    Submitted 30 April, 2021; originally announced April 2021.

    Comments: 16 pages, 18 figures. Under review by IEEE TPAMI

  38. arXiv:2103.11688  [pdf, ps, other]

    cs.CG

    Space Mapping of Spline Spaces over Hierarchical T-meshes

    Authors: Jingjing Liu, Fang Deng, Jiansong Deng

    Abstract: In this paper, we construct a bijective mapping between a biquadratic spline space over the hierarchical T-mesh and the piecewise constant space over the corresponding crossing-vertex-relationship graph (CVR graph). We propose a novel structure, by which we offer an effective and easy operative method for constructing the basis functions of the biquadratic spline space. The mapping we construct is…

    Submitted 22 March, 2021; originally announced March 2021.

    Comments: 31 pages, 20 figures

    MSC Class: 65D07; ACM Class: G.1.1

  39. arXiv:2010.09316  [pdf, other]

    cs.CV

    A Two-stage Unsupervised Approach for Low light Image Enhancement

    Authors: Junjie Hu, Xiyue Guo, Junfeng Chen, Guanqi Liang, Fuqin Deng, Tin Lun Lam

    Abstract: As vision based perception methods are usually built on the normal light assumption, there will be a serious safety issue when deploying them into low light environments. Recently, deep learning based methods have been proposed to enhance low light images by penalizing the pixel-wise loss of low light and normal light images. However, most of them suffer from the following problems: 1) the need of…

    Submitted 19 October, 2020; v1 submitted 19 October, 2020; originally announced October 2020.

  40. Semantic Histogram Based Graph Matching for Real-Time Multi-Robot Global Localization in Large Scale Environment

    Authors: Xiyue Guo, Junjie Hu, Junfeng Chen, Fuqin Deng, Tin Lun Lam

    Abstract: The core problem of visual multi-robot simultaneous localization and mapping (MR-SLAM) is how to efficiently and accurately perform multi-robot global localization (MR-GL). The difficulties are two-fold. The first is the difficulty of global localization for significant viewpoint difference. Appearance-based localization methods tend to fail under large viewpoint changes. Recently, semantic graphs…

    Submitted 24 February, 2021; v1 submitted 19 October, 2020; originally announced October 2020.

  41. arXiv:2006.06130  [pdf, other]

    cs.LG stat.ML

    ROOTS: Object-Centric Representation and Rendering of 3D Scenes

    Authors: Chang Chen, Fei Deng, Sungjin Ahn

    Abstract: A crucial ability of human intelligence is to build up models of individual 3D objects from partial scene observations. Recent works achieve object-centric generation but without the ability to infer the representation, or achieve 3D scene representation learning but without object-centric compositionality. Therefore, learning to represent and render 3D scenes with object-centric compositionality…

    Submitted 1 July, 2021; v1 submitted 10 June, 2020; originally announced June 2020.

    Comments: First two authors contributed equally. Accepted in JMLR

  42. arXiv:2001.02407  [pdf, other]

    cs.LG cs.CV eess.IV stat.ML

    SPACE: Unsupervised Object-Oriented Scene Representation via Spatial Attention and Decomposition

    Authors: Zhixuan Lin, Yi-Fu Wu, Skand Vishwanath Peri, Weihao Sun, Gautam Singh, Fei Deng, Jindong Jiang, Sungjin Ahn

    Abstract: The ability to decompose complex multi-object scenes into meaningful abstractions like objects is fundamental to achieve higher-level cognition. Previous approaches for unsupervised object-oriented scene representation learning are either based on spatial-attention or scene-mixture approaches and limited in scalability which is a main obstacle towards modeling real-world scenes. In this paper, we…

    Submitted 15 March, 2020; v1 submitted 8 January, 2020; originally announced January 2020.

    Comments: Accepted in ICLR 2020

  43. arXiv:1912.09678  [pdf, other]

    cs.CV cs.RO

    IRS: A Large Naturalistic Indoor Robotics Stereo Dataset to Train Deep Models for Disparity and Surface Normal Estimation

    Authors: Qiang Wang, Shizhen Zheng, Qingsong Yan, Fei Deng, Kaiyong Zhao, Xiaowen Chu

    Abstract: Indoor robotics localization, navigation, and interaction heavily rely on scene understanding and reconstruction. Compared to the monocular vision which usually does not explicitly introduce any geometrical constraint, stereo vision-based schemes are more promising and robust to produce accurate geometrical information, such as surface normal and depth/disparity. Besides, deep learning models trai…

    Submitted 26 March, 2021; v1 submitted 20 December, 2019; originally announced December 2019.

  44. arXiv:1910.09119  [pdf, other]

    cs.LG cs.CV stat.ML

    Generative Hierarchical Models for Parts, Objects, and Scenes

    Authors: Fei Deng, Zhuo Zhi, Sungjin Ahn

    Abstract: Compositional structures between parts and objects are inherent in natural scenes. Modeling such compositional hierarchies via unsupervised learning can bring various benefits such as interpretability and transferability, which are important in many downstream tasks. In this paper, we propose the first deep latent variable model, called RICH, for learning Representation of Interpretable Compositio…

    Submitted 20 October, 2019; originally announced October 2019.

  45. arXiv:1909.12681  [pdf, ps, other]

    cs.CL eess.AS

    End-to-End Code-Switching ASR for Low-Resourced Language Pairs

    Authors: Xianghu Yue, Grandee Lee, Emre Yılmaz, Fang Deng, Haizhou Li

    Abstract: Despite the significant progress in end-to-end (E2E) automatic speech recognition (ASR), E2E ASR for low resourced code-switching (CS) speech has not been well studied. In this work, we describe an E2E ASR pipeline for the recognition of CS speech in which a low-resourced language is mixed with a high resourced language. Low-resourcedness in acoustic data hinders the performance of E2E ASR systems…

    Submitted 30 September, 2019; v1 submitted 27 September, 2019; originally announced September 2019.

    Comments: Accepted for publication at IEEE ASRU Workshop 2019

  46. arXiv:1907.01019  [pdf, other]

    cs.DC

    Understanding Fault Scenarios and Impacts through Fault Injection Experiments in Cielo

    Authors: Valerio Formicola, Saurabh Jha, Daniel Chen, Fei Deng, Amanda Bonnie, Mike Mason, Jim Brandt, Ann Gentile, Larry Kaplan, Jason Repik, Jeremy Enos, Mike Showerman, Annette Greiner, Zbigniew Kalbarczyk, Ravishankar K. Iyer, Bill Krammer

    Abstract: We present a set of fault injection experiments performed on the ACES (LANL/SNL) Cray XE supercomputer Cielo. We use this experimental campaign to improve the understanding of failure causes and propagation that we observed in the field failure data analysis of NCSA's Blue Waters. We use the data collected from the logs and from network performance counter data 1) to characterize the fault-error-f…

    Submitted 1 July, 2019; originally announced July 2019.

    Comments: Presented at Cray User Group 2017

  47. arXiv:1809.03956  [pdf, other]

    cs.AI cs.NE

    Abstraction Learning

    Authors: Fei Deng, Jinsheng Ren, Feng Chen

    Abstract: There has been a gap between artificial intelligence and human intelligence. In this paper, we identify three key elements forming human intelligence, and suggest that abstraction learning combines these elements and is thus a way to bridge the gap. Prior researches in artificial intelligence either specify abstraction by human experts, or take abstraction as a qualitative explanation for the mode…

    Submitted 11 September, 2018; originally announced September 2018.

  48. Similarity Join and Similarity Self-Join Size Estimation in a Streaming Environment

    Authors: Davood Rafiei, Fan Deng

    Abstract: We study the problem of similarity self-join and similarity join size estimation in a streaming setting where the goal is to estimate, in one scan of the input and with sublinear space in the input size, the number of record pairs that have a similarity within a given threshold. The problem has many applications in data cleaning and query plan generation, where the cost of a similarity join may be…

    Submitted 1 February, 2019; v1 submitted 8 June, 2018; originally announced June 2018.

    Comments: IEEE Transactions on Knowledge and Data Engineering (to appear)

    Journal ref: IEEE Trans. Knowl. Data Eng. 32(4): 768-781 (2020)

  49. arXiv:1806.01989  [pdf]

    eess.SP cs.ET physics.app-ph

    Design of Voltage Pulse Control Module for Free Space Measurement-Device-Independent Quantum Key Distribution

    Authors: Sijie Zhang, Nan Zhou, Fanshui Deng, Hao Liang

    Abstract: Measurement-Device-Independent Quantum Key Distribution (MDIQKD) protocol has been proved that it is unaffected by all hacking attacks, and ensures the security of information theory even when the performance of single-photon detectors is not ideal. Fiber channel has been used by the previous MDIQKD experimental device. However, the signal attenuation increases exponentially as the transmission di…

    Submitted 20 June, 2018; v1 submitted 5 June, 2018; originally announced June 2018.

  50. arXiv:1710.06637  [pdf, other]

    cs.SI cs.DL

    Maximum Value Matters: Finding Hot Topics in Scholarly Fields

    Authors: Jinghao Zhao, Hao Wu, Fengyu Deng, Wentian Bao, Wencheng Tang, Luoyi Fu, Xinbing Wang

    Abstract: Finding hot topics in scholarly fields can help researchers to keep up with the latest concepts, trends, and inventions in their field of interest. Due to the rarity of complete large-scale scholarly data, earlier studies target this problem based on manual topic extraction from a limited number of domains, with their focus solely on a single feature such as coauthorship, citation relations, and e…

    Submitted 18 October, 2017; originally announced October 2017.

    Comments: 10 pages