Showing 1–50 of 60 results for author: Ke, L

  1. arXiv:2406.04221  [pdf, other]

    cs.CV

    Matching Anything by Segmenting Anything

    Authors: Siyuan Li, Lei Ke, Martin Danelljan, Luigi Piccinelli, Mattia Segu, Luc Van Gool, Fisher Yu

    Abstract: The robust association of the same objects across video frames in complex scenes is crucial for many applications, especially Multiple Object Tracking (MOT). Current methods predominantly rely on labeled domain-specific video datasets, which limits the cross-domain generalization of learned similarity embeddings. We propose MASA, a novel method for robust instance association learning, capable of…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: CVPR 2024 Highlight. code at: https://github.com/siyuanliii/masa

  2. arXiv:2405.19307  [pdf, other]

    cs.RO

    Data Efficient Behavior Cloning for Fine Manipulation via Continuity-based Corrective Labels

    Authors: Abhay Deshpande, Liyiming Ke, Quinn Pfeifer, Abhishek Gupta, Siddhartha S. Srinivasa

    Abstract: We consider imitation learning with access only to expert demonstrations, whose real-world application is often limited by covariate shift due to compounding errors during execution. We investigate the effectiveness of the Continuity-based Corrective Labels for Imitation Learning (CCIL) framework in mitigating this issue for real-world fine manipulation tasks. CCIL generates corrective labels by l…

    Submitted 3 June, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

  3. arXiv:2405.02280  [pdf, other]

    cs.CV

    DreamScene4D: Dynamic Multi-Object Scene Generation from Monocular Videos

    Authors: Wen-Hsuan Chu, Lei Ke, Katerina Fragkiadaki

    Abstract: View-predictive generative models provide strong priors for lifting object-centric images and videos into 3D and 4D through rendering and score distillation objectives. A question then remains: what about lifting complete multi-object dynamic scenes? There are two challenges in this direction: First, rendering error gradients are often insufficient to recover fast object motion, and second, view p…

    Submitted 23 May, 2024; v1 submitted 3 May, 2024; originally announced May 2024.

    Comments: Project page: https://dreamscene4d.github.io/

  4. arXiv:2404.13146  [pdf, other]

    cs.CR cs.CV

    DeepFake-O-Meter v2.0: An Open Platform for DeepFake Detection

    Authors: Yan Ju, Chengzhe Sun, Shan Jia, Shuwei Hou, Zhaofeng Si, Soumyya Kanti Datta, Lipeng Ke, Riky Zhou, Anita Nikolich, Siwei Lyu

    Abstract: Deepfakes, as AI-generated media, have increasingly threatened media integrity and personal privacy with realistic yet fake digital content. In this work, we introduce an open-source and user-friendly online platform, DeepFake-O-Meter v2.0, that integrates state-of-the-art methods for detecting Deepfake images, videos, and audio. Built upon DeepFake-O-Meter v1.0, we have made significant upgrades…

    Submitted 27 June, 2024; v1 submitted 19 April, 2024; originally announced April 2024.

  5. arXiv:2404.08767  [pdf, other]

    cs.CV cs.LG

    LLM-Seg: Bridging Image Segmentation and Large Language Model Reasoning

    Authors: Junchi Wang, Lei Ke

    Abstract: Understanding human instructions to identify the target objects is vital for perception systems. In recent years, the advancements of Large Language Models (LLMs) have introduced new possibilities for image segmentation. In this work, we delve into reasoning segmentation, a novel task that enables segmentation system to reason and interpret implicit user intention via large language model reasonin…

    Submitted 12 April, 2024; originally announced April 2024.

    Comments: Github: https://github.com/wangjunchi/LLMSeg

  6. arXiv:2401.01519  [pdf]

    cs.LG cs.AI

    Exploring the Frontiers of LLMs in Psychological Applications: A Comprehensive Review

    Authors: Luoma Ke, Song Tong, Peng Cheng, Kaiping Peng

    Abstract: This paper explores the frontiers of large language models (LLMs) in psychology applications. Psychology has undergone several theoretical changes, and the current use of Artificial Intelligence (AI) and Machine Learning, particularly LLMs, promises to open up new research directions. We provide a detailed exploration of how LLMs like ChatGPT are transforming psychological research. It discusses t…

    Submitted 16 March, 2024; v1 submitted 2 January, 2024; originally announced January 2024.

  7. arXiv:2312.00732  [pdf, other]

    cs.CV cs.AI

    Gaussian Grouping: Segment and Edit Anything in 3D Scenes

    Authors: Mingqiao Ye, Martin Danelljan, Fisher Yu, Lei Ke

    Abstract: The recent Gaussian Splatting achieves high-quality and real-time novel-view synthesis of the 3D scenes. However, it is solely concentrated on the appearance and geometry modeling, while lacking in fine-grained object-level scene understanding. To address this issue, we propose Gaussian Grouping, which extends Gaussian Splatting to jointly reconstruct and segment anything in open-world 3D scenes.…

    Submitted 8 July, 2024; v1 submitted 1 December, 2023; originally announced December 2023.

    Comments: ECCV 2024. Gaussian Grouping extends Gaussian Splatting to fine-grained open-world 3D scene understanding. Github: https://github.com/lkeab/gaussian-grouping

  8. arXiv:2311.15776  [pdf, other]

    cs.CV

    Stable Segment Anything Model

    Authors: Qi Fan, Xin Tao, Lei Ke, Mingqiao Ye, Yuan Zhang, Pengfei Wan, Zhongyuan Wang, Yu-Wing Tai, Chi-Keung Tang

    Abstract: The Segment Anything Model (SAM) achieves remarkable promptable segmentation given high-quality prompts which, however, often require good skills to specify. To make SAM robust to casual prompts, this paper presents the first comprehensive analysis on SAM's segmentation stability across a diverse spectrum of prompt qualities, notably imprecise bounding boxes and insufficient points. Our key findin…

    Submitted 5 December, 2023; v1 submitted 27 November, 2023; originally announced November 2023.

    Comments: Smaller file size for the easy access. Codes will be released upon acceptance. https://github.com/fanq15/Stable-SAM

  9. arXiv:2310.12972  [pdf, other]

    cs.RO

    CCIL: Continuity-based Data Augmentation for Corrective Imitation Learning

    Authors: Liyiming Ke, Yunchu Zhang, Abhay Deshpande, Siddhartha Srinivasa, Abhishek Gupta

    Abstract: We present a new technique to enhance the robustness of imitation learning methods by generating corrective data to account for compounding errors and disturbances. While existing methods rely on interactive expert labeling, additional offline datasets, or domain-specific invariances, our approach requires minimal additional assumptions beyond access to expert data. The key insight is to leverage…

    Submitted 3 June, 2024; v1 submitted 19 October, 2023; originally announced October 2023.

  10. arXiv:2307.11035  [pdf, other]

    cs.CV cs.AI

    Cascade-DETR: Delving into High-Quality Universal Object Detection

    Authors: Mingqiao Ye, Lei Ke, Siyuan Li, Yu-Wing Tai, Chi-Keung Tang, Martin Danelljan, Fisher Yu

    Abstract: Object localization in general environments is a fundamental part of vision systems. While dominating on the COCO benchmark, recent Transformer-based detection methods are not competitive in diverse domains. Moreover, these methods still struggle to very accurately estimate the object bounding boxes in complex environments. We introduce Cascade-DETR for high-quality universal object detection. W…

    Submitted 20 July, 2023; originally announced July 2023.

    Comments: Accepted in ICCV 2023. Our code and models will be released at https://github.com/SysCV/cascade-detr

  11. arXiv:2307.01197  [pdf, other]

    cs.CV

    Segment Anything Meets Point Tracking

    Authors: Frano Rajič, Lei Ke, Yu-Wing Tai, Chi-Keung Tang, Martin Danelljan, Fisher Yu

    Abstract: The Segment Anything Model (SAM) has established itself as a powerful zero-shot image segmentation model, enabled by efficient point-centric annotation and prompt-based models. While click and brush interactions are both well explored in interactive image segmentation, the existing methods on videos focus on mask annotation and propagation. This paper presents SAM-PT, a novel method for point-cent…

    Submitted 3 December, 2023; v1 submitted 3 July, 2023; originally announced July 2023.

  12. arXiv:2306.14397  [pdf, other]

    cs.SE cs.CY

    Discriminating Human-authored from ChatGPT-Generated Code Via Discernable Feature Analysis

    Authors: Li Ke, Hong Sheng, Fu Cai, Zhang Yunhe, Liu Ming

    Abstract: The ubiquitous adoption of Large Language Generation Models (LLMs) in programming has underscored the importance of differentiating between human-written code and code generated by intelligent models. This paper specifically aims to distinguish code generated by ChatGPT from that authored by humans. Our investigation reveals disparities in programming style, technical level, and readability betwee…

    Submitted 4 July, 2023; v1 submitted 25 June, 2023; originally announced June 2023.

    Comments: 11 pages, 8 figures, 3 tables

  13. arXiv:2306.01567  [pdf, other]

    cs.CV

    Segment Anything in High Quality

    Authors: Lei Ke, Mingqiao Ye, Martin Danelljan, Yifan Liu, Yu-Wing Tai, Chi-Keung Tang, Fisher Yu

    Abstract: The recent Segment Anything Model (SAM) represents a big leap in scaling up segmentation models, allowing for powerful zero-shot capabilities and flexible prompting. Despite being trained with 1.1 billion masks, SAM's mask prediction quality falls short in many cases, particularly when dealing with objects that have intricate structures. We propose HQ-SAM, equipping SAM with the ability to accurat…

    Submitted 23 October, 2023; v1 submitted 2 June, 2023; originally announced June 2023.

    Comments: NeurIPS 2023. We propose HQ-SAM to upgrade SAM for high-quality zero-shot segmentation. Github: https://github.com/SysCV/SAM-HQ

  14. arXiv:2304.08408  [pdf, other]

    cs.CV

    OVTrack: Open-Vocabulary Multiple Object Tracking

    Authors: Siyuan Li, Tobias Fischer, Lei Ke, Henghui Ding, Martin Danelljan, Fisher Yu

    Abstract: The ability to recognize, localize and track dynamic objects in a scene is fundamental to many real-world applications, such as self-driving and robotic systems. Yet, traditional multiple object tracking (MOT) benchmarks rely only on a few object categories that hardly represent the multitude of possible objects that are encountered in the real world. This leaves contemporary MOT methods limited t…

    Submitted 17 April, 2023; originally announced April 2023.

    Comments: CVPR 2023

  15. arXiv:2303.15904  [pdf, other]

    cs.CV cs.AI

    Mask-Free Video Instance Segmentation

    Authors: Lei Ke, Martin Danelljan, Henghui Ding, Yu-Wing Tai, Chi-Keung Tang, Fisher Yu

    Abstract: The recent advancement in Video Instance Segmentation (VIS) has largely been driven by the use of deeper and increasingly data-hungry transformer-based models. However, video masks are tedious and expensive to annotate, limiting the scale and diversity of existing VIS datasets. In this work, we aim to remove the mask-annotation requirement. We propose MaskFreeVIS, achieving highly competitive VIS…

    Submitted 28 March, 2023; originally announced March 2023.

    Comments: Accepted in CVPR 2023; Code: https://github.com/SysCV/MaskFreeVis; Project page: http://vis.xyz/pub/maskfreevis

  16. arXiv:2303.06182  [pdf, other]

    cs.DC cs.AR cs.CL cs.LG

    Towards MoE Deployment: Mitigating Inefficiencies in Mixture-of-Expert (MoE) Inference

    Authors: Haiyang Huang, Newsha Ardalani, Anna Sun, Liu Ke, Hsien-Hsin S. Lee, Anjali Sridhar, Shruti Bhosale, Carole-Jean Wu, Benjamin Lee

    Abstract: Mixture-of-Experts (MoE) models have gained popularity in achieving state-of-the-art performance in a wide range of tasks in computer vision and natural language processing. They effectively expand the model capacity while incurring a minimal increase in computation cost during training. However, deploying such models for inference is difficult due to their large size and complex communication pat…

    Submitted 17 June, 2023; v1 submitted 10 March, 2023; originally announced March 2023.

  17. arXiv:2303.05508  [pdf, other]

    cs.RO

    Cherry-Picking with Reinforcement Learning : Robust Dynamic Grasping in Unstable Conditions

    Authors: Yunchu Zhang, Liyiming Ke, Abhay Deshpande, Abhishek Gupta, Siddhartha Srinivasa

    Abstract: Grasping small objects surrounded by unstable or non-rigid material plays a crucial role in applications such as surgery, harvesting, construction, disaster recovery, and assisted feeding. This task is especially difficult when fine manipulation is required in the presence of sensor noise and perception errors; errors inevitably trigger dynamic motion, which is challenging to model precisely. Circ…

    Submitted 28 June, 2023; v1 submitted 9 March, 2023; originally announced March 2023.

  18. arXiv:2212.06264  [pdf, other]

    cs.CE cs.CR cs.DC cs.LG

    Data Leakage via Access Patterns of Sparse Features in Deep Learning-based Recommendation Systems

    Authors: Hanieh Hashemi, Wenjie Xiong, Liu Ke, Kiwan Maeng, Murali Annavaram, G. Edward Suh, Hsien-Hsin S. Lee

    Abstract: Online personalized recommendation services are generally hosted in the cloud where users query the cloud-based model to receive recommended input such as merchandise of interest or news feed. State-of-the-art recommendation models rely on sparse and dense features to represent users' profile information and the items they interact with. Although sparse features account for 99% of the total model…

    Submitted 12 December, 2022; originally announced December 2022.

  19. arXiv:2212.00939  [pdf, other]

    cs.DC

    DisaggRec: Architecting Disaggregated Systems for Large-Scale Personalized Recommendation

    Authors: Liu Ke, Xuan Zhang, Benjamin Lee, G. Edward Suh, Hsien-Hsin S. Lee

    Abstract: Deep learning-based personalized recommendation systems are widely used for online user-facing services in production datacenters, where a large amount of hardware resources are procured and managed to reliably provide low-latency services without disruption. As the recommendation models continue to evolve and grow in size, our analysis projects that datacenters deployed with monolithic servers wi…

    Submitted 1 December, 2022; originally announced December 2022.

  20. arXiv:2210.06479  [pdf, other]

    cs.RO cs.LG

    Real World Offline Reinforcement Learning with Realistic Data Source

    Authors: Gaoyue Zhou, Liyiming Ke, Siddhartha Srinivasa, Abhinav Gupta, Aravind Rajeswaran, Vikash Kumar

    Abstract: Offline reinforcement learning (ORL) holds great promise for robot learning due to its ability to learn from arbitrary pre-generated experience. However, current ORL benchmarks are almost entirely in simulation and utilize contrived datasets like replay buffers of online RL agents or sub-optimal trajectories, and thus hold limited relevance for real-world robotics. In this work (Real-ORL), we posi…

    Submitted 12 October, 2022; originally announced October 2022.

    Comments: Project website: https://sites.google.com/view/real-orl

  21. arXiv:2208.04438  [pdf, other]

    cs.CV

    Occlusion-Aware Instance Segmentation via BiLayer Network Architectures

    Authors: Lei Ke, Yu-Wing Tai, Chi-Keung Tang

    Abstract: Segmenting highly-overlapping image objects is challenging, because there is typically no distinction between real object contours and occlusion boundaries on images. Unlike previous instance segmentation methods, we model image formation as a composition of two overlapping layers, and propose Bilayer Convolutional Network (BCNet), where the top layer detects occluding objects (occluders) and the…

    Submitted 10 March, 2023; v1 submitted 8 August, 2022; originally announced August 2022.

    Comments: Extended version of "Deep Occlusion-Aware Instance Segmentation with Overlapping BiLayers", CVPR 2021 (arXiv:2103.12340)

  22. arXiv:2207.14012  [pdf, other]

    cs.CV

    Video Mask Transfiner for High-Quality Video Instance Segmentation

    Authors: Lei Ke, Henghui Ding, Martin Danelljan, Yu-Wing Tai, Chi-Keung Tang, Fisher Yu

    Abstract: While Video Instance Segmentation (VIS) has seen rapid progress, current approaches struggle to predict high-quality masks with accurate boundary details. Moreover, the predicted segmentations often fluctuate over time, suggesting that temporal consistency cues are neglected or not fully utilized. In this paper, we set out to tackle these issues, with the aim of achieving highly detailed and more…

    Submitted 28 July, 2022; originally announced July 2022.

    Comments: ECCV 2022; Project page: https://www.vis.xyz/pub/vmt; Dataset page: https://www.vis.xyz/data/hqvis

  23. arXiv:2205.04321  [pdf, other]

    cs.LG

    Evaluating the Fairness Impact of Differentially Private Synthetic Data

    Authors: Blake Bullwinkel, Kristen Grabarz, Lily Ke, Scarlett Gong, Chris Tanner, Joshua Allen

    Abstract: Differentially private (DP) synthetic data is a promising approach to maximizing the utility of data containing sensitive information. Due to the suppression of underrepresented classes that is often required to achieve privacy, however, it may be in conflict with fairness. We evaluate four DP synthesizers and present empirical results indicating that three of these models frequently degrade fairn…

    Submitted 20 June, 2022; v1 submitted 9 May, 2022; originally announced May 2022.

  24. arXiv:2203.13964  [pdf, other]

    cs.CV

    Fusing Global and Local Features for Generalized AI-Synthesized Image Detection

    Authors: Yan Ju, Shan Jia, Lipeng Ke, Hongfei Xue, Koki Nagano, Siwei Lyu

    Abstract: With the development of the Generative Adversarial Networks (GANs) and DeepFakes, AI-synthesized images are now of such high quality that humans can hardly distinguish them from real images. It is imperative for media forensics to develop detectors to expose them accurately. Existing detection methods have shown high performance in generated images detection, but they tend to generalize poorly in…

    Submitted 22 November, 2022; v1 submitted 25 March, 2022; originally announced March 2022.

    Comments: 5 pages, 3 figures, 2 tables

  25. arXiv:2203.13487  [pdf, other]

    cs.CV

    Compare learning: bi-attention network for few-shot learning

    Authors: Li Ke, Meng Pan, Weigao Wen, Dong Li

    Abstract: Learning with few labeled data is a key challenge for visual recognition, as deep neural networks tend to overfit using a few samples only. One of the Few-shot learning methods called metric learning addresses this challenge by first learning a deep distance metric to determine whether a pair of images belong to the same category, then applying the trained metric to instances from other test set w…

    Submitted 25 March, 2022; originally announced March 2022.

  26. arXiv:2203.07424  [pdf, other]

    cs.DC

    Hercules: Heterogeneity-Aware Inference Serving for At-Scale Personalized Recommendation

    Authors: Liu Ke, Udit Gupta, Mark Hempstead, Carole-Jean Wu, Hsien-Hsin S. Lee, Xuan Zhang

    Abstract: Personalized recommendation is an important class of deep-learning applications that powers a large collection of internet services and consumes a considerable amount of datacenter resources. As the scale of production-grade recommendation systems continues to grow, optimizing their serving performance and efficiency in a heterogeneous datacenter is important and can translate into infrastructure…

    Submitted 14 March, 2022; originally announced March 2022.

  27. arXiv:2202.02314  [pdf, other]

    cs.CV

    Towards To-a-T Spatio-Temporal Focus for Skeleton-Based Action Recognition

    Authors: Lipeng Ke, Kuan-Chuan Peng, Siwei Lyu

    Abstract: Graph Convolutional Networks (GCNs) have been widely used to model the high-order dynamic dependencies for skeleton-based action recognition. Most existing approaches do not explicitly embed the high-order spatio-temporal importance to joints' spatial connection topology and intensity, and they do not have direct objectives on their attention module to jointly learn when and where to focus on in t…

    Submitted 4 February, 2022; originally announced February 2022.

    Comments: AAAI 2022

  28. arXiv:2111.13673  [pdf, other]

    cs.CV

    Mask Transfiner for High-Quality Instance Segmentation

    Authors: Lei Ke, Martin Danelljan, Xia Li, Yu-Wing Tai, Chi-Keung Tang, Fisher Yu

    Abstract: Two-stage and query-based instance segmentation methods have achieved remarkable results. However, their segmented masks are still very coarse. In this paper, we present Mask Transfiner for high-quality and efficient instance segmentation. Instead of operating on regular dense tensors, our Mask Transfiner decomposes and represents the image regions as a quadtree. Our transformer-based approach onl…

    Submitted 26 November, 2021; originally announced November 2021.

    Comments: Project page: http://vis.xyz/pub/transfiner

  29. SiWa: See into Walls via Deep UWB Radar

    Authors: Tianyue Zheng, Zhe Chen, Jun Luo, Lin Ke, Chaoyang Zhao, Yaowen Yang

    Abstract: Being able to see into walls is crucial for diagnostics of building health; it enables inspections of wall structure without undermining the structural integrity. However, existing sensing devices do not seem to offer a full capability in mapping the in-wall structure while identifying their status (e.g., seepage and corrosion). In this paper, we design and implement SiWa as a low-cost and portabl…

    Submitted 27 October, 2021; v1 submitted 27 October, 2021; originally announced October 2021.

    Comments: 14 pages

    Journal ref: MobiCom '21: Proceedings of the 27th Annual International Conference on Mobile Computing and Networking October 2021

  30. arXiv:2109.11735  [pdf, other]

    cs.MM

    On the Robustness of "Robust reversible data hiding scheme based on two-layer embedding strategy"

    Authors: Wen Yin, Longfei Ke, Zhaoxia Yin, Jin Tang, Bin Luo

    Abstract: In the paper "Robust reversible data hiding scheme based on two-layer embedding strategy" published in INS recently, Kumar et al. proposed a robust reversible data hiding (RRDH) scheme based on two-layer embedding. Secret data was embedded into the most significant bit (MSB) planes to increase robustness, and a sorting strategy based on local complexity was adopted to reduce distortion. However, K…

    Submitted 22 January, 2022; v1 submitted 24 September, 2021; originally announced September 2021.

  31. arXiv:2109.06638  [pdf, other]

    cs.CV cs.LG eess.IV

    Learnable Discrete Wavelet Pooling (LDW-Pooling) For Convolutional Networks

    Authors: Bor-Shiun Wang, Jun-Wei Hsieh, Ming-Ching Chang, Ping-Yang Chen, Lipeng Ke, Siwei Lyu

    Abstract: Pooling is a simple but essential layer in modern deep CNN architectures for feature aggregation and extraction. Typical CNN design focuses on the conv layers and activation functions, while leaving the pooling layers with fewer options. We introduce the Learning Discrete Wavelet Pooling (LDW-Pooling) that can be applied universally to replace standard pooling operations to better extract features…

    Submitted 20 October, 2021; v1 submitted 13 September, 2021; originally announced September 2021.

    Comments: Accepted by BMVC 2021

  32. arXiv:2108.06765  [pdf, other]

    cs.CV

    Occlusion-Aware Video Object Inpainting

    Authors: Lei Ke, Yu-Wing Tai, Chi-Keung Tang

    Abstract: Conventional video inpainting is neither object-oriented nor occlusion-aware, making it liable to obvious artifacts when large occluded object regions are inpainted. This paper presents occlusion-aware video object inpainting, which recovers both the complete shape and appearance for occluded objects in videos given their visible mask segmentation. To facilitate this new research, we construct t…

    Submitted 15 August, 2021; originally announced August 2021.

    Comments: ICCV 2021

  33. arXiv:2108.00146  [pdf, other]

    cs.CV

    T$_k$ML-AP: Adversarial Attacks to Top-$k$ Multi-Label Learning

    Authors: Shu Hu, Lipeng Ke, Xin Wang, Siwei Lyu

    Abstract: Top-$k$ multi-label learning, which returns the top-$k$ predicted labels from an input, has many practical applications such as image annotation, document analysis, and web search engine. However, the vulnerabilities of such algorithms with regards to dedicated adversarial perturbation attacks have not been extensively studied previously. In this work, we develop methods to create adversarial pert…

    Submitted 31 July, 2021; originally announced August 2021.

    Comments: Accepted by International Conference on Computer Vision (ICCV 2021) (14 pages)

  34. arXiv:2106.11958  [pdf, other]

    cs.CV

    Prototypical Cross-Attention Networks for Multiple Object Tracking and Segmentation

    Authors: Lei Ke, Xia Li, Martin Danelljan, Yu-Wing Tai, Chi-Keung Tang, Fisher Yu

    Abstract: Multiple object tracking and segmentation requires detecting, tracking, and segmenting objects belonging to a set of given classes. Most approaches only exploit the temporal dimension to address the association problem, while relying on single frame predictions for the segmentation mask itself. We propose Prototypical Cross-Attention Network (PCAN), capable of leveraging rich spatio-temporal infor…

    Submitted 30 November, 2021; v1 submitted 22 June, 2021; originally announced June 2021.

    Comments: NeurIPS 2021, Spotlight; Our code and video resources are available at http://vis.xyz/pub/pcan

  35. arXiv:2105.06631  [pdf, other]

    cs.LG

    Ordering-Based Causal Discovery with Reinforcement Learning

    Authors: Xiaoqiang Wang, Yali Du, Shengyu Zhu, Liangjun Ke, Zhitang Chen, Jianye Hao, Jun Wang

    Abstract: It is a long-standing question to discover causal relations among a set of variables in many empirical sciences. Recently, Reinforcement Learning (RL) has achieved promising results in causal discovery from observational data. However, searching the space of directed graphs and enforcing acyclicity by implicit penalties tend to be inefficient and restrict the existing RL-based method to small scal…

    Submitted 15 September, 2021; v1 submitted 13 May, 2021; originally announced May 2021.

    Comments: Accepted to IJCAI'2021

  36. arXiv:2103.12340  [pdf, other]

    cs.CV

    Deep Occlusion-Aware Instance Segmentation with Overlapping BiLayers

    Authors: Lei Ke, Yu-Wing Tai, Chi-Keung Tang

    Abstract: Segmenting highly-overlapping objects is challenging, because typically no distinction is made between real object contours and occlusion boundaries. Unlike previous two-stage instance segmentation methods, we model image formation as composition of two overlapping layers, and propose Bilayer Convolutional Network (BCNet), where the top GCN layer detects the occluding objects (occluder) and the bo…

    Submitted 23 March, 2021; originally announced March 2021.

    Comments: Accepted by CVPR2021. BCNet Code: https://github.com/lkeab/BCNet

  37. arXiv:2011.06719  [pdf, other]

    cs.RO cs.LG

    Grasping with Chopsticks: Combating Covariate Shift in Model-free Imitation Learning for Fine Manipulation

    Authors: Liyiming Ke, Jingqiang Wang, Tapomayukh Bhattacharjee, Byron Boots, Siddhartha Srinivasa

    Abstract: Billions of people use chopsticks, a simple yet versatile tool, for fine manipulation of everyday objects. The small, curved, and slippery tips of chopsticks pose a challenge for picking up small objects, making them a suitably complex test case. This paper leverages human demonstrations to develop an autonomous chopsticks-equipped robotic manipulator. Due to the lack of accurate models for fine m…

    Submitted 12 November, 2020; originally announced November 2020.

    Comments: Submitted to ICRA 2021

  38. arXiv:2008.00101  [pdf, other]

    cs.RO cs.HC

    Telemanipulation with Chopsticks: Analyzing Human Factors in User Demonstrations

    Authors: Liyiming Ke, Ajinkya Kamat, Jingqiang Wang, Tapomayukh Bhattacharjee, Christoforos Mavrogiannis, Siddhartha S. Srinivasa

    Abstract: Chopsticks constitute a simple yet versatile tool that humans have used for thousands of years to perform a variety of challenging tasks ranging from food manipulation to surgery. Applying such a simple tool in a diverse repertoire of scenarios requires significant adaptability. Towards developing autonomous manipulators with comparable adaptability to humans, we study chopsticks-based manipulatio…

    Submitted 31 July, 2020; originally announced August 2020.

    Comments: IROS 2020

  39. arXiv:2007.13124  [pdf, other]

    cs.CV

    GSNet: Joint Vehicle Pose and Shape Reconstruction with Geometrical and Scene-aware Supervision

    Authors: Lei Ke, Shichao Li, Yanan Sun, Yu-Wing Tai, Chi-Keung Tang

    Abstract: We present a novel end-to-end framework named as GSNet (Geometric and Scene-aware Network), which jointly estimates 6DoF poses and reconstructs detailed 3D car shapes from single urban street view. GSNet utilizes a unique four-way feature extraction and fusion scheme and directly regresses 6DoF poses and shapes in a single forward pass. Extensive experiments show that our diverse feature extractio…

    Submitted 26 July, 2020; originally announced July 2020.

    Comments: ECCV 2020

  40. arXiv:2007.12387  [pdf, other]

    cs.CV

    Commonality-Parsing Network across Shape and Appearance for Partially Supervised Instance Segmentation

    Authors: Qi Fan, Lei Ke, Wenjie Pei, Chi-Keung Tang, Yu-Wing Tai

    Abstract: Partially supervised instance segmentation aims to perform learning on limited mask-annotated categories of data thus eliminating expensive and exhaustive mask annotation. The learned models are expected to be generalizable to novel categories. Existing methods either learn a transfer function from detection to segmentation, or cluster shape priors for segmenting novel categories. We propose to le…

    Submitted 24 July, 2020; originally announced July 2020.

    Comments: Accepted by ECCV 2020

  41. Robust adaptive steganography based on dither modulation and modification with re-compression

    Authors: Zhaoxia Yin, Longfei Ke

    Abstract: Traditional adaptive steganography is a technique used for covert communication with high security, but it is invalid in the case of stego images are sent to legal receivers over networks which is lossy, such as JPEG compression of channels. To deal with such problem, robust adaptive steganography is proposed to enable the receiver to extract secret messages from the damaged stego images. Previous…

    Submitted 20 March, 2021; v1 submitted 16 July, 2020; originally announced July 2020.

    Journal ref: IEEE Transactions on Signal and Information Processing over Networks, 2021, 7: 336-345

  42. Cascaded deep monocular 3D human pose estimation with evolutionary training data

    Authors: Shichao Li, Lei Ke, Kevin Pratama, Yu-Wing Tai, Chi-Keung Tang, Kwang-Ting Cheng

    Abstract: End-to-end deep representation learning has achieved remarkable accuracy for monocular 3D human pose estimation, yet these models may fail for unseen poses with limited and fixed training data. This paper proposes a novel data augmentation method that: (1) is scalable for synthesizing massive amount of training data (over 8 million valid 3D human poses with corresponding 2D projections) for traini…

    Submitted 8 April, 2021; v1 submitted 13 June, 2020; originally announced June 2020.

    Comments: Accepted to CVPR 2020 as Oral Presentation

  43. arXiv:2004.12615  [pdf, other]

    cs.CV cs.LG cs.MM stat.ML

    Maximum Density Divergence for Domain Adaptation

    Authors: Li Jingjing, Chen Erpeng, Ding Zhengming, Zhu Lei, Lu Ke, Shen Heng Tao

    Abstract: Unsupervised domain adaptation addresses the problem of transferring knowledge from a well-labeled source domain to an unlabeled target domain where the two domains have distinctive data distributions. Thus, the essence of domain adaptation is to mitigate the distribution divergence between the two domains. The state-of-the-art methods practice this very idea by either conducting adversarial train…

    Submitted 27 April, 2020; originally announced April 2020.

    Comments: Published on IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)

  44. arXiv:1912.12953  [pdf, other]

    cs.DC cs.AR

    RecNMP: Accelerating Personalized Recommendation with Near-Memory Processing

    Authors: Liu Ke, Udit Gupta, Carole-Jean Wu, Benjamin Youngjae Cho, Mark Hempstead, Brandon Reagen, Xuan Zhang, David Brooks, Vikas Chandra, Utku Diril, Amin Firoozshahian, Kim Hazelwood, Bill Jia, Hsien-Hsin S. Lee, Meng Li, Bert Maher, Dheevatsa Mudigere, Maxim Naumov, Martin Schatz, Mikhail Smelyanskiy, Xiaodong Wang

    Abstract: Personalized recommendation systems leverage deep learning models and account for the majority of data center AI cycles. Their performance is dominated by memory-bound sparse embedding operations with unique irregular memory access patterns that pose a fundamental challenge to accelerate. This paper proposes a lightweight, commodity DRAM compliant, near-memory processing solution to accelerate per…

    Submitted 30 December, 2019; originally announced December 2019.

  45. arXiv:1911.12815  [pdf, other]

    cs.LG cs.AR cs.ET eess.SP

    Neural Network-Inspired Analog-to-Digital Conversion to Achieve Super-Resolution with Low-Precision RRAM Devices

    Authors: Weidong Cao, Liu Ke, Ayan Chakrabarti, Xuan Zhang

    Abstract: Recent works propose neural network- (NN-) inspired analog-to-digital converters (NNADCs) and demonstrate their great potentials in many emerging applications. These NNADCs often rely on resistive random-access memory (RRAM) devices to realize the NN operations and require high-precision RRAM cells (6~12-bit) to achieve a moderate quantization resolution (4~8-bit). Such optimistic assumption of RR…

    Submitted 28 November, 2019; originally announced November 2019.

    Comments: 7 pages, ICCAD 2019

  46. arXiv:1908.11824  [pdf, other]

    cs.CV

    Reflective Decoding Network for Image Captioning

    Authors: Lei Ke, Wenjie Pei, Ruiyu Li, Xiaoyong Shen, Yu-Wing Tai

    Abstract: State-of-the-art image captioning methods mostly focus on improving visual features, less attention has been paid to utilizing the inherent properties of language to boost captioning performance. In this paper, we show that vocabulary coherence between words and syntactic paradigm of sentences are also important to generate high-quality image caption. Following the conventional encoder-decoder fra…

    Submitted 30 August, 2019; originally announced August 2019.

    Comments: ICCV 2019

  47. arXiv:1908.03761  [pdf, other]

    cs.LG cs.MA stat.ML

    Large-Scale Traffic Signal Control Using a Novel Multi-Agent Reinforcement Learning

    Authors: Xiaoqiang Wang, Liangjun Ke, Zhimin Qiao, Xinghua Chai

    Abstract: Finding the optimal signal timing strategy is a difficult task for the problem of large-scale traffic signal control (TSC). Multi-Agent Reinforcement Learning (MARL) is a promising method to solve this problem. However, there is still room for improvement in extending to large-scale problems and modeling the behaviors of other agents for each individual agent. In this paper, a new MARL, called Coo…

    Submitted 30 September, 2020; v1 submitted 10 August, 2019; originally announced August 2019.

    Comments: 14 pages, 11 figures

    Journal ref: IEEE Transactions on Cybernetics 51 (2021), 174-187

  48. arXiv:1906.07441  [pdf, other]

    cs.CV

    Locality Preserving Joint Transfer for Domain Adaptation

    Authors: Li Jingjing, Jing Mengmeng, Lu Ke, Zhu Lei, Shen Heng Tao

    Abstract: Domain adaptation aims to leverage knowledge from a well-labeled source domain to a poorly-labeled target domain. A majority of existing works transfer the knowledge at either feature level or sample level. Recent researches reveal that both of the paradigms are essentially important, and optimizing one of them can reinforce the other. Inspired by this, we propose a novel approach to jointly explo…

    Submitted 18 June, 2019; originally announced June 2019.

    Comments: Accepted to IEEE TIP 2019

  49. arXiv:1905.12888  [pdf, other]

    cs.LG cs.IT cs.RO stat.ML

    Imitation Learning as $f$-Divergence Minimization

    Authors: Liyiming Ke, Sanjiban Choudhury, Matt Barnes, Wen Sun, Gilwoo Lee, Siddhartha Srinivasa

    Abstract: We address the problem of imitation learning with multi-modal demonstrations. Instead of attempting to learn all modes, we argue that in many tasks it is sufficient to imitate any one of them. We show that the state-of-the-art methods such as GAIL and behavior cloning, due to their choice of loss function, often incorrectly interpolate between such modes. Our key insight is to minimize the right d…

    Submitted 31 May, 2020; v1 submitted 30 May, 2019; originally announced May 2019.

    Comments: International Workshop on the Algorithmic Foundations of Robotics (WAFR) 2020

  50. arXiv:1905.03966  [pdf, other]

    cs.CV

    Memory-Attended Recurrent Network for Video Captioning

    Authors: Wenjie Pei, Jiyuan Zhang, Xiangrong Wang, Lei Ke, Xiaoyong Shen, Yu-Wing Tai

    Abstract: Typical techniques for video captioning follow the encoder-decoder framework, which can only focus on one source video being processed. A potential disadvantage of such design is that it cannot capture the multiple visual context information of a word appearing in more than one relevant videos in training data. To tackle this limitation, we propose the Memory-Attended Recurrent Network (MARN) for…

    Submitted 10 May, 2019; originally announced May 2019.

    Comments: Accepted by CVPR 2019