
Showing 1–50 of 94 results for author: Kira, Z

  1. arXiv:2407.06939  [pdf, other]

    cs.RO cs.CV

    Towards Open-World Mobile Manipulation in Homes: Lessons from the Neurips 2023 HomeRobot Open Vocabulary Mobile Manipulation Challenge

    Authors: Sriram Yenamandra, Arun Ramachandran, Mukul Khanna, Karmesh Yadav, Jay Vakil, Andrew Melnik, Michael Büttner, Leon Harz, Lyon Brown, Gora Chand Nandi, Arjun PS, Gaurav Kumar Yadav, Rahul Kala, Robert Haschke, Yang Luo, Jinxin Zhu, Yansen Han, Bingyi Lu, Xuan Gu, Qinyuan Liu, Yaping Zhao, Qiting Ye, Chenxiao Dou, Yansong Chua, Volodymyr Kuzma, et al. (20 additional authors not shown)

    Abstract: In order to develop robots that can effectively serve as versatile and capable home assistants, it is crucial for them to reliably perceive and interact with a wide variety of objects across diverse environments. To this end, we proposed Open Vocabulary Mobile Manipulation as a key benchmark task for robotics: finding any object in a novel environment and placing it on any receptacle surface withi…

    Submitted 9 July, 2024; originally announced July 2024.

  2. arXiv:2406.17168  [pdf, other]

    cs.LG cs.AI cs.RO

    Reinforcement Learning via Auxiliary Task Distillation

    Authors: Abhinav Narayan Harish, Larry Heck, Josiah P. Hanna, Zsolt Kira, Andrew Szot

    Abstract: We present Reinforcement Learning via Auxiliary Task Distillation (AuxDistill), a new method that enables reinforcement learning (RL) to perform long-horizon robot control problems by distilling behaviors from auxiliary RL tasks. AuxDistill achieves this by concurrently carrying out multi-task RL with auxiliary tasks, which are easier to learn and relevant to the main task. A weighted distillation…

    Submitted 24 June, 2024; originally announced June 2024.

  3. arXiv:2406.08488  [pdf, other]

    cs.CV cs.AI cs.LG

    ICE-G: Image Conditional Editing of 3D Gaussian Splats

    Authors: Vishnu Jaganathan, Hannah Hanyun Huang, Muhammad Zubair Irshad, Varun Jampani, Amit Raj, Zsolt Kira

    Abstract: Recently many techniques have emerged to create high quality 3D assets and scenes. When it comes to editing of these objects, however, existing approaches are either slow, compromise on quality, or do not provide enough customization. We introduce a novel approach to quickly edit a 3D model from a single reference view. Our technique first segments the edit image, and then matches semantically cor…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted to CVPR AI4CC Workshop 2024. Project page: https://ice-gaussian.github.io

  4. arXiv:2406.07904  [pdf, other]

    cs.LG

    Grounding Multimodal Large Language Models in Actions

    Authors: Andrew Szot, Bogdan Mazoure, Harsh Agrawal, Devon Hjelm, Zsolt Kira, Alexander Toshev

    Abstract: Multimodal Large Language Models (MLLMs) have demonstrated a wide range of capabilities across many domains, including Embodied AI. In this work, we study how to best ground a MLLM into different embodiments and their associated action spaces, with the goal of leveraging the multimodal world knowledge of the MLLM. We first generalize a number of methods through a unified architecture and the lens…

    Submitted 12 June, 2024; originally announced June 2024.

  5. arXiv:2405.05852  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG cs.RO stat.ML

    Pre-trained Text-to-Image Diffusion Models Are Versatile Representation Learners for Control

    Authors: Gunshi Gupta, Karmesh Yadav, Yarin Gal, Dhruv Batra, Zsolt Kira, Cong Lu, Tim G. J. Rudner

    Abstract: Embodied AI agents require a fine-grained understanding of the physical world mediated through visual and language inputs. Such capabilities are difficult to learn solely from task-specific data. This has led to the emergence of pre-trained vision-language models as a tool for transferring representations learned from internet-scale data to downstream tasks and new domains. However, commonly used…

    Submitted 9 May, 2024; originally announced May 2024.

  6. arXiv:2404.12526  [pdf, other]

    cs.LG cs.CL cs.CV

    Adaptive Memory Replay for Continual Learning

    Authors: James Seale Smith, Lazar Valkov, Shaunak Halbe, Vyshnavi Gutta, Rogerio Feris, Zsolt Kira, Leonid Karlinsky

    Abstract: Foundation Models (FMs) have become the hallmark of modern AI, however, these models are trained on massive data, leading to financially expensive training. Updating FMs as new data becomes available is important, however, can lead to `catastrophic forgetting', where models underperform on tasks related to data sub-populations observed too long ago. This continual learning (CL) phenomenon has been…

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: CVPR-W 2024 (Spotlight)

  7. arXiv:2404.06609  [pdf, other]

    cs.AI cs.RO

    GOAT-Bench: A Benchmark for Multi-Modal Lifelong Navigation

    Authors: Mukul Khanna, Ram Ramrakhya, Gunjan Chhablani, Sriram Yenamandra, Theophile Gervet, Matthew Chang, Zsolt Kira, Devendra Singh Chaplot, Dhruv Batra, Roozbeh Mottaghi

    Abstract: The Embodied AI community has made significant strides in visual navigation tasks, exploring targets from 3D coordinates, objects, language descriptions, and images. However, these navigation models often handle only a single input modality as the target. With the progress achieved so far, it is time to move towards universal navigation models capable of handling various goal types, enabling more…

    Submitted 9 April, 2024; originally announced April 2024.

  8. arXiv:2404.01300  [pdf, other]

    cs.CV cs.AI cs.LG

    NeRF-MAE: Masked AutoEncoders for Self-Supervised 3D Representation Learning for Neural Radiance Fields

    Authors: Muhammad Zubair Irshad, Sergey Zakharov, Vitor Guizilini, Adrien Gaidon, Zsolt Kira, Rares Ambrus

    Abstract: Neural fields excel in computer vision and robotics due to their ability to understand the 3D visual world such as inferring semantics, geometry, and dynamics. Given the capabilities of neural fields in densely representing a 3D scene from 2D images, we ask the question: Can we scale their self-supervised pretraining, specifically using masked autoencoders, to generate effective 3D representations…

    Submitted 18 April, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

    Comments: 29 pages, 13 figures. Project Page: https://nerf-mae.github.io/

  9. arXiv:2403.05815  [pdf, other]

    cs.RO

    N-QR: Natural Quick Response Codes for Multi-Robot Instance Correspondence

    Authors: Nathaniel Moore Glaser, Rajashree Ravi, Zsolt Kira

    Abstract: Image correspondence serves as the backbone for many tasks in robotics, such as visual fusion, localization, and mapping. However, existing correspondence methods do not scale to large multi-robot systems, and they struggle when image features are weak, ambiguous, or evolving. In response, we propose Natural Quick Response codes, or N-QR, which enables rapid and reliable correspondence between lar…

    Submitted 9 March, 2024; originally announced March 2024.

    Comments: IEEE International Conference on Robotics and Automation (ICRA), 2024

  10. arXiv:2401.07770  [pdf, other]

    cs.CV

    Seeing the Unseen: Visual Common Sense for Semantic Placement

    Authors: Ram Ramrakhya, Aniruddha Kembhavi, Dhruv Batra, Zsolt Kira, Kuo-Hao Zeng, Luca Weihs

    Abstract: Computer vision tasks typically involve describing what is present in an image (e.g. classification, detection, segmentation, and captioning). We study a visual common sense task that requires understanding what is not present. Specifically, given an image (e.g. of a living room) and name of an object ("cushion"), a vision system is asked to predict semantically-meaningful regions (masks or boundi…

    Submitted 15 January, 2024; originally announced January 2024.

  11. arXiv:2312.08782  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG

    Toward General-Purpose Robots via Foundation Models: A Survey and Meta-Analysis

    Authors: Yafei Hu, Quanting Xie, Vidhi Jain, Jonathan Francis, Jay Patrikar, Nikhil Keetha, Seungchan Kim, Yaqi Xie, Tianyi Zhang, Shibo Zhao, Yu Quan Chong, Chen Wang, Katia Sycara, Matthew Johnson-Roberson, Dhruv Batra, Xiaolong Wang, Sebastian Scherer, Zsolt Kira, Fei Xia, Yonatan Bisk

    Abstract: Building general-purpose robots that can operate seamlessly, in any environment, with any object, and utilizing various skills to complete diverse tasks has been a long-standing goal in Artificial Intelligence. Unfortunately, however, most existing robotic systems have been constrained - having been designed for specific tasks, trained on specific datasets, and deployed within specific environment…

    Submitted 15 December, 2023; v1 submitted 14 December, 2023; originally announced December 2023.

  12. arXiv:2311.18763  [pdf, other]

    cs.CV cs.AI cs.LG

    Continual Diffusion with STAMINA: STack-And-Mask INcremental Adapters

    Authors: James Seale Smith, Yen-Chang Hsu, Zsolt Kira, Yilin Shen, Hongxia Jin

    Abstract: Recent work has demonstrated a remarkable ability to customize text-to-image diffusion models to multiple, fine-grained concepts in a sequential (i.e., continual) manner while only providing a few example images for each concept. This setting is known as continual diffusion. Here, we ask the question: Can we scale these methods to longer concept sequences without forgetting? Although prior work mi…

    Submitted 2 May, 2024; v1 submitted 30 November, 2023; originally announced November 2023.

    Comments: CVPR-W 2024

  13. arXiv:2311.15395  [pdf, other]

    cs.LG cs.CV stat.ML

    ConstraintMatch for Semi-constrained Clustering

    Authors: Jann Goschenhofer, Bernd Bischl, Zsolt Kira

    Abstract: Constrained clustering allows the training of classification models using pairwise constraints only, which are weak and relatively easy to mine, while still yielding full-supervision-level model performance. While they perform well even in the absence of the true underlying class labels, constrained clustering models still require large amounts of binary constraint annotations for training. In thi…

    Submitted 26 November, 2023; originally announced November 2023.

    Journal ref: 2023 International Joint Conference on Neural Networks (IJCNN)

  14. arXiv:2311.04894  [pdf, other]

    cs.CV cs.AI cs.LG

    DAMEX: Dataset-aware Mixture-of-Experts for visual understanding of mixture-of-datasets

    Authors: Yash Jain, Harkirat Behl, Zsolt Kira, Vibhav Vineet

    Abstract: Construction of a universal detector poses a crucial question: How can we most effectively train a model on a large mixture of datasets? The answer lies in learning dataset-specific features and ensembling their knowledge but do all this in a single model. Previous methods achieve this by having separate detection heads on a common backbone but that results in a significant increase in parameters.…

    Submitted 8 November, 2023; originally announced November 2023.

    Comments: https://github.com/jinga-lala/DAMEX

  15. arXiv:2310.19182  [pdf, other]

    cs.CV

    Fast Trainable Projection for Robust Fine-Tuning

    Authors: Junjiao Tian, Yen-Cheng Liu, James Seale Smith, Zsolt Kira

    Abstract: Robust fine-tuning aims to achieve competitive in-distribution (ID) performance while maintaining the out-of-distribution (OOD) robustness of a pre-trained model when transferring it to a downstream task. Recently, projected gradient descent has been successfully used in robust fine-tuning by constraining the deviation from the initialization of the fine-tuned model explicitly through projection.…

    Submitted 29 October, 2023; originally announced October 2023.

    Comments: Accepted to NeurIPS 2023

  16. arXiv:2310.13724  [pdf, other]

    cs.HC cs.AI cs.CV cs.GR cs.MA cs.RO

    Habitat 3.0: A Co-Habitat for Humans, Avatars and Robots

    Authors: Xavier Puig, Eric Undersander, Andrew Szot, Mikael Dallaire Cote, Tsung-Yen Yang, Ruslan Partsey, Ruta Desai, Alexander William Clegg, Michal Hlavac, So Yeon Min, Vladimír Vondruš, Theophile Gervet, Vincent-Pierre Berges, John M. Turner, Oleksandr Maksymets, Zsolt Kira, Mrinal Kalakrishnan, Jitendra Malik, Devendra Singh Chaplot, Unnat Jain, Dhruv Batra, Akshara Rai, Roozbeh Mottaghi

    Abstract: We present Habitat 3.0: a simulation platform for studying collaborative human-robot tasks in home environments. Habitat 3.0 offers contributions across three dimensions: (1) Accurate humanoid simulation: addressing challenges in modeling complex deformable bodies and diversity in appearance and motion, all while ensuring high simulation speed. (2) Human-in-the-loop infrastructure: enabling real h…

    Submitted 19 October, 2023; originally announced October 2023.

    Comments: Project page: http://aihabitat.org/habitat3

  17. arXiv:2310.12974  [pdf, other]

    cs.CV cs.RO

    FSD: Fast Self-Supervised Single RGB-D to Categorical 3D Objects

    Authors: Mayank Lunayach, Sergey Zakharov, Dian Chen, Rares Ambrus, Zsolt Kira, Muhammad Zubair Irshad

    Abstract: In this work, we address the challenging task of 3D object recognition without the reliance on real-world 3D labeled data. Our goal is to predict the 3D shape, size, and 6D pose of objects within a single RGB-D image, operating at the category level and eliminating the need for CAD models during inference. While existing self-supervised methods have made strides in this field, they often suffer fr…

    Submitted 19 October, 2023; originally announced October 2023.

    Comments: Project page: https://fsd6d.github.io

  18. arXiv:2309.16750  [pdf, other]

    cs.LG cs.AI math.DS

    Memory in Plain Sight: Surveying the Uncanny Resemblances of Associative Memories and Diffusion Models

    Authors: Benjamin Hoover, Hendrik Strobelt, Dmitry Krotov, Judy Hoffman, Zsolt Kira, Duen Horng Chau

    Abstract: The generative process of Diffusion Models (DMs) has recently set state-of-the-art on many AI generation benchmarks. Though the generative process is traditionally understood as an "iterative denoiser", there is no universally accepted language to describe it. We introduce a novel perspective to describe DMs using the mathematical language of memory retrieval from the field of energy-based Associa…

    Submitted 28 May, 2024; v1 submitted 28 September, 2023; originally announced September 2023.

    Comments: 15 pages, 4 figures

  19. arXiv:2308.14596  [pdf, other]

    cs.CV cs.LG

    LatentDR: Improving Model Generalization Through Sample-Aware Latent Degradation and Restoration

    Authors: Ran Liu, Sahil Khose, Jingyun Xiao, Lakshmi Sathidevi, Keerthan Ramnath, Zsolt Kira, Eva L. Dyer

    Abstract: Despite significant advances in deep learning, models often struggle to generalize well to new, unseen domains, especially when training data is limited. To address this challenge, we propose a novel approach for distribution-aware latent augmentation that leverages the relationships across samples to guide the augmentation procedure. Our approach first degrades the samples stochastically in the l…

    Submitted 28 August, 2023; originally announced August 2023.

  20. arXiv:2308.12967  [pdf, other]

    cs.CV cs.AI cs.LG

    NeO 360: Neural Fields for Sparse View Synthesis of Outdoor Scenes

    Authors: Muhammad Zubair Irshad, Sergey Zakharov, Katherine Liu, Vitor Guizilini, Thomas Kollar, Adrien Gaidon, Zsolt Kira, Rares Ambrus

    Abstract: Recent implicit neural representations have shown great results for novel view synthesis. However, existing methods require expensive per-scene optimization from many views hence limiting their application to real-world unbounded urban settings where the objects of interest or backgrounds are observed from very few views. To mitigate this challenge, we introduce a new approach called NeO 360, Neur…

    Submitted 24 August, 2023; originally announced August 2023.

    Comments: Accepted to International Conference on Computer Vision (ICCV), 2023. Project page: https://zubair-irshad.github.io/projects/neo360.html

  21. arXiv:2308.12469  [pdf, other]

    cs.CV

    Diffuse, Attend, and Segment: Unsupervised Zero-Shot Segmentation using Stable Diffusion

    Authors: Junjiao Tian, Lavisha Aggarwal, Andrea Colaco, Zsolt Kira, Mar Gonzalez-Franco

    Abstract: Producing quality segmentation masks for images is a fundamental problem in computer vision. Recent research has explored large-scale supervised training to enable zero-shot segmentation on virtually any image style and unsupervised training to enable segmentation without dense annotations. However, constructing a model capable of segmenting anything in a zero-shot manner without any annotations i…

    Submitted 2 April, 2024; v1 submitted 23 August, 2023; originally announced August 2023.

    Comments: Accepted to CVPR2024

    Journal ref: Conference on Computer Vision and Pattern Recognition, 2024

  22. arXiv:2306.11565  [pdf, other]

    cs.RO cs.AI cs.CV

    HomeRobot: Open-Vocabulary Mobile Manipulation

    Authors: Sriram Yenamandra, Arun Ramachandran, Karmesh Yadav, Austin Wang, Mukul Khanna, Theophile Gervet, Tsung-Yen Yang, Vidhi Jain, Alexander William Clegg, John Turner, Zsolt Kira, Manolis Savva, Angel Chang, Devendra Singh Chaplot, Dhruv Batra, Roozbeh Mottaghi, Yonatan Bisk, Chris Paxton

    Abstract: HomeRobot (noun): An affordable compliant robot that navigates homes and manipulates a wide range of objects in order to complete everyday tasks. Open-Vocabulary Mobile Manipulation (OVMM) is the problem of picking any object in any unseen environment, and placing it in a commanded location. This is a foundational challenge for robots to be useful assistants in human environments, because it invol…

    Submitted 10 January, 2024; v1 submitted 20 June, 2023; originally announced June 2023.

    Comments: 37 pages, 22 figures, 8 tables

  23. arXiv:2306.09970  [pdf, other]

    cs.CV cs.AI cs.LG

    HePCo: Data-Free Heterogeneous Prompt Consolidation for Continual Federated Learning

    Authors: Shaunak Halbe, James Seale Smith, Junjiao Tian, Zsolt Kira

    Abstract: In this paper, we focus on the important yet understudied problem of Continual Federated Learning (CFL), where a server communicates with a set of clients to incrementally learn new concepts over time without sharing or storing any data. The complexity of this problem is compounded by challenges from both the Continual and Federated Learning perspectives. Specifically, models trained in a CFL setu…

    Submitted 16 June, 2023; originally announced June 2023.

  24. arXiv:2306.00087  [pdf, other]

    cs.LG cs.MA cs.RO

    Adaptive Coordination in Social Embodied Rearrangement

    Authors: Andrew Szot, Unnat Jain, Dhruv Batra, Zsolt Kira, Ruta Desai, Akshara Rai

    Abstract: We present the task of "Social Rearrangement", consisting of cooperative everyday tasks like setting up the dinner table, tidying a house or unpacking groceries in a simulated multi-agent environment. In Social Rearrangement, two robots coordinate to complete a long-horizon task, using onboard sensing and egocentric observations, and no privileged information about the environment. We study zero-s…

    Submitted 31 May, 2023; originally announced June 2023.

  25. arXiv:2305.16295  [pdf, other]

    cs.CV cs.AI

    HAAV: Hierarchical Aggregation of Augmented Views for Image Captioning

    Authors: Chia-Wen Kuo, Zsolt Kira

    Abstract: A great deal of progress has been made in image captioning, driven by research into how to encode the image using pre-trained models. This includes visual encodings (e.g. image grid features or detected objects) and more recently textual encodings (e.g. image tags or text descriptions of image regions). As more advanced encodings are available and incorporated, it is natural to ask: how to efficie…

    Submitted 25 May, 2023; originally announced May 2023.

    Comments: Paper accepted in CVPR-23; Project page and code available here: https://sites.google.com/view/chiawen-kuo/home/haav

  26. arXiv:2305.15267  [pdf, other]

    cs.LG stat.ML

    Training Energy-Based Normalizing Flow with Score-Matching Objectives

    Authors: Chen-Hao Chao, Wei-Fang Sun, Yen-Chang Hsu, Zsolt Kira, Chun-Yi Lee

    Abstract: In this paper, we establish a connection between the parameterization of flow-based and energy-based generative models, and present a new flow-based modeling approach called energy-based normalizing flow (EBFlow). We demonstrate that by optimizing EBFlow with score-matching objectives, the computation of Jacobian determinants for linear transformations can be entirely bypassed. This feature enable…

    Submitted 28 October, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: Published at NeurIPS 2023. Code: https://github.com/chen-hao-chao/ebflow

  27. arXiv:2305.10420  [pdf, other]

    cs.CV

    CLIP-GCD: Simple Language Guided Generalized Category Discovery

    Authors: Rabah Ouldnoughi, Chia-Wen Kuo, Zsolt Kira

    Abstract: Generalized Category Discovery (GCD) requires a model to both classify known categories and cluster unknown categories in unlabeled data. Prior methods leveraged self-supervised pre-training combined with supervised fine-tuning on the labeled data, followed by simple clustering methods. In this paper, we posit that such methods are still prone to poor performance on out-of-distribution categories,…

    Submitted 17 May, 2023; originally announced May 2023.

  28. arXiv:2305.04352  [pdf, other]

    cs.RO cs.MA

    We Need to Talk: Identifying and Overcoming Communication-Critical Scenarios for Self-Driving

    Authors: Nathaniel Moore Glaser, Zsolt Kira

    Abstract: In this work, we consider the task of collision-free trajectory planning for connected self-driving vehicles. We specifically consider communication-critical situations--situations where single-agent systems have blindspots that require multi-agent collaboration. To identify such situations, we propose a method which (1) simulates multi-agent perspectives from real self-driving datasets, (2) finds…

    Submitted 7 May, 2023; originally announced May 2023.

    Comments: Submitted to ICRA 2023 Workshop on Collaborative Perception

  29. arXiv:2304.10756  [pdf, other]

    cs.CV cs.LG

    Missing Modality Robustness in Semi-Supervised Multi-Modal Semantic Segmentation

    Authors: Harsh Maheshwari, Yen-Cheng Liu, Zsolt Kira

    Abstract: Using multiple spatial modalities has been proven helpful in improving semantic segmentation performance. However, there are several real-world challenges that have yet to be addressed: (a) improving label efficiency and (b) enhancing robustness in realistic scenarios where modalities are missing at the test time. To address these challenges, we first propose a simple yet efficient multi-modal fus…

    Submitted 21 April, 2023; originally announced April 2023.

  30. arXiv:2304.06027  [pdf, other]

    cs.CV cs.AI cs.LG

    Continual Diffusion: Continual Customization of Text-to-Image Diffusion with C-LoRA

    Authors: James Seale Smith, Yen-Chang Hsu, Lingyu Zhang, Ting Hua, Zsolt Kira, Yilin Shen, Hongxia Jin

    Abstract: Recent works demonstrate a remarkable ability to customize text-to-image diffusion models while only providing a few example images. What happens if you try to customize such models using multiple, fine-grained concepts in a sequential (i.e., continual) manner? In our work, we show that recent state-of-the-art customization of text-to-image models suffer from catastrophic forgetting when new conce…

    Submitted 2 May, 2024; v1 submitted 12 April, 2023; originally announced April 2023.

    Comments: Transactions on Machine Learning Research (TMLR) 2024

  31. arXiv:2303.16194  [pdf, other]

    cs.LG

    BC-IRL: Learning Generalizable Reward Functions from Demonstrations

    Authors: Andrew Szot, Amy Zhang, Dhruv Batra, Zsolt Kira, Franziska Meier

    Abstract: How well do reward functions learned with inverse reinforcement learning (IRL) generalize? We illustrate that state-of-the-art IRL algorithms, which maximize a maximum-entropy objective, learn rewards that overfit to the demonstrations. Such rewards struggle to provide meaningful rewards for states not covered by the demonstrations, a major detriment when using the reward to learn policies in new…

    Submitted 28 March, 2023; originally announced March 2023.

  32. arXiv:2303.10720  [pdf, other]

    cs.CV cs.LG

    Trainable Projected Gradient Method for Robust Fine-tuning

    Authors: Junjiao Tian, Xiaoliang Dai, Chih-Yao Ma, Zecheng He, Yen-Cheng Liu, Zsolt Kira

    Abstract: Recent studies on transfer learning have shown that selectively fine-tuning a subset of layers or customizing different learning rates for each layer can greatly improve robustness to out-of-distribution (OOD) data and retain generalization capability in the pre-trained models. However, most of these methods employ manually crafted heuristics or expensive hyper-parameter searches, which prevent th…

    Submitted 28 March, 2023; v1 submitted 19 March, 2023; originally announced March 2023.

    Comments: Accepted to CVPR2023

    Journal ref: Conference on Computer Vision and Pattern Recognition 2023

  33. arXiv:2303.07798  [pdf, other]

    cs.CV cs.AI

    OVRL-V2: A simple state-of-art baseline for ImageNav and ObjectNav

    Authors: Karmesh Yadav, Arjun Majumdar, Ram Ramrakhya, Naoki Yokoyama, Alexei Baevski, Zsolt Kira, Oleksandr Maksymets, Dhruv Batra

    Abstract: We present a single neural network architecture composed of task-agnostic components (ViTs, convolutions, and LSTMs) that achieves state-of-art results on both the ImageNav ("go to location in <this picture>") and ObjectNav ("find a chair") tasks without any task-specific modules like object detection, segmentation, mapping, or planning modules. Such general-purpose methods offer advantages of sim…

    Submitted 14 March, 2023; originally announced March 2023.

    Comments: 15 pages, 7 figures, 9 tables

  34. arXiv:2303.06080  [pdf, other]

    cs.RO cs.CV

    Communication-Critical Planning via Multi-Agent Trajectory Exchange

    Authors: Nathaniel Moore Glaser, Zsolt Kira

    Abstract: This paper addresses the task of joint multi-agent perception and planning, especially as it relates to the real-world challenge of collision-free navigation for connected self-driving vehicles. For this task, several communication-enabled vehicles must navigate through a busy intersection while avoiding collisions with each other and with obstacles. To this end, this paper proposes a learnable co…

    Submitted 10 March, 2023; originally announced March 2023.

    Comments: Accepted to ICRA 2023

  35. System Design for an Integrated Lifelong Reinforcement Learning Agent for Real-Time Strategy Games

    Authors: Indranil Sur, Zachary Daniels, Abrar Rahman, Kamil Faber, Gianmarco J. Gallardo, Tyler L. Hayes, Cameron E. Taylor, Mustafa Burak Gurbuz, James Smith, Sahana Joshi, Nathalie Japkowicz, Michael Baron, Zsolt Kira, Christopher Kanan, Roberto Corizzo, Ajay Divakaran, Michael Piacentino, Jesse Hostetler, Aswin Raghavan

    Abstract: As Artificial and Robotic Systems are increasingly deployed and relied upon for real-world applications, it is important that they exhibit the ability to continually learn and adapt in dynamically-changing environments, becoming Lifelong Learning Machines. Continual/lifelong learning (LL) involves minimizing catastrophic forgetting of old tasks while maximizing a model's capability to learn new ta…

    Submitted 8 December, 2022; originally announced December 2022.

    Comments: The Second International Conference on AIML Systems, October 12--15, 2022, Bangalore, India

  36. arXiv:2211.13218  [pdf, other]

    cs.CV cs.AI cs.LG

    CODA-Prompt: COntinual Decomposed Attention-based Prompting for Rehearsal-Free Continual Learning

    Authors: James Seale Smith, Leonid Karlinsky, Vyshnavi Gutta, Paola Cascante-Bonilla, Donghyun Kim, Assaf Arbelle, Rameswar Panda, Rogerio Feris, Zsolt Kira

    Abstract: Computer vision models suffer from a phenomenon known as catastrophic forgetting when learning novel concepts from continuously shifting training data. Typical solutions for this continual learning problem require extensive rehearsal of previously seen data, which increases memory costs and may violate data privacy. Recently, the emergence of large-scale pre-trained vision transformer models has e…

    Submitted 30 March, 2023; v1 submitted 23 November, 2022; originally announced November 2022.

    Comments: Accepted by the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2023)

  37. arXiv:2211.11116  [pdf, other]

    cs.CV cs.AI

    Structure-Encoding Auxiliary Tasks for Improved Visual Representation in Vision-and-Language Navigation

    Authors: Chia-Wen Kuo, Chih-Yao Ma, Judy Hoffman, Zsolt Kira

    Abstract: In Vision-and-Language Navigation (VLN), researchers typically take an image encoder pre-trained on ImageNet without fine-tuning on the environments that the agent will be trained or tested on. However, the distribution shift between the training images from ImageNet and the views in the navigation environments may render the ImageNet pre-trained image encoder suboptimal. Therefore, in this paper,…

    Submitted 20 November, 2022; originally announced November 2022.

  38. arXiv:2211.09790  [pdf, other]

    cs.LG cs.AI cs.CV

    ConStruct-VL: Data-Free Continual Structured VL Concepts Learning

    Authors: James Seale Smith, Paola Cascante-Bonilla, Assaf Arbelle, Donghyun Kim, Rameswar Panda, David Cox, Diyi Yang, Zsolt Kira, Rogerio Feris, Leonid Karlinsky

    Abstract: Recently, large-scale pre-trained Vision-and-Language (VL) foundation models have demonstrated remarkable capabilities in many zero-shot downstream tasks, achieving competitive results for recognizing objects defined by as little as short text prompts. However, it has also been shown that VL models are still brittle in Structured VL Concept (SVLC) reasoning, such as the ability to recognize object…

    Submitted 30 March, 2023; v1 submitted 17 November, 2022; originally announced November 2022.

    Comments: Accepted by the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2023)

  39. arXiv:2210.03265  [pdf, other]

    cs.CV

    Polyhistor: Parameter-Efficient Multi-Task Adaptation for Dense Vision Tasks

    Authors: Yen-Cheng Liu, Chih-Yao Ma, Junjiao Tian, Zijian He, Zsolt Kira

    Abstract: Adapting large-scale pretrained models to various downstream tasks via fine-tuning is a standard method in machine learning. Recently, parameter-efficient fine-tuning methods show promise in adapting a pretrained model to different tasks while training only a few parameters. Despite their success, most existing methods are proposed in Natural Language Processing tasks with language Transformers, a…

    Submitted 6 October, 2022; originally announced October 2022.

    Comments: Accepted to NeurIPS 2022; Project Page is at https://ycliu93.github.io/projects/polyhistor.html

  40. arXiv:2209.10537  [pdf, other]

    cs.LG cs.AI cs.CV

    FedFOR: Stateless Heterogeneous Federated Learning with First-Order Regularization

    Authors: Junjiao Tian, James Seale Smith, Zsolt Kira

    Abstract: Federated Learning (FL) seeks to distribute model training across local clients without collecting data in a centralized data-center, hence removing data-privacy concerns. A major challenge for FL is data heterogeneity (where each client's data distribution can differ) as it can lead to weight divergence among local clients and slow global convergence. The current SOTA FL methods designed for data…

    Submitted 21 September, 2022; originally announced September 2022.

  41. arXiv:2209.07474  [pdf, other]

    cs.CV cs.LG

    On the Surprising Effectiveness of Transformers in Low-Labeled Video Recognition

    Authors: Farrukh Rahman, Ömer Mubarek, Zsolt Kira

    Abstract: Recently vision transformers have been shown to be competitive with convolution-based methods (CNNs) broadly across multiple vision tasks. The less restrictive inductive bias of transformers endows greater representational capacity in comparison with CNNs. However, in the image classification setting this flexibility comes with a trade-off with respect to sample efficiency, where transformers requ…

    Submitted 25 October, 2022; v1 submitted 15 September, 2022; originally announced September 2022.

    Comments: Accepted to NeurIPS 2022 Workshop on Vision Transformers: Theory and applications (VTTA)

    ACM Class: I.2.10

  42. arXiv:2208.13722  [pdf, other]

    cs.CV cs.LG

    Open-Set Semi-Supervised Object Detection

    Authors: Yen-Cheng Liu, Chih-Yao Ma, Xiaoliang Dai, Junjiao Tian, Peter Vajda, Zijian He, Zsolt Kira

    Abstract: Recent developments for Semi-Supervised Object Detection (SSOD) have shown the promise of leveraging unlabeled data to improve an object detector. However, thus far these methods have assumed that the unlabeled data does not contain out-of-distribution (OOD) classes, which is unrealistic with larger-scale unlabeled datasets. In this paper, we consider a more practical yet challenging problem, Open…

    Submitted 29 August, 2022; originally announced August 2022.

    Comments: Project Page is at https://ycliu93.github.io/projects/ossod.html

  43. arXiv:2207.13691  [pdf, other]

    cs.CV cs.LG cs.RO

    ShAPO: Implicit Representations for Multi-Object Shape, Appearance, and Pose Optimization

    Authors: Muhammad Zubair Irshad, Sergey Zakharov, Rares Ambrus, Thomas Kollar, Zsolt Kira, Adrien Gaidon

    Abstract: Our method studies the complex task of object-centric 3D understanding from a single RGB-D observation. As it is an ill-posed problem, existing methods suffer from low performance for both 3D shape and 6D pose and size estimation in complex multi-object scenarios with occlusions. We present ShAPO, a method for joint multi-object detection, 3D textured reconstruction, 6D object pose and size estima…

    Submitted 27 July, 2022; originally announced July 2022.

    Comments: Accepted to European Conference on Computer Vision (ECCV), 2022

  44. arXiv:2206.09500  [pdf, other]

    cs.CV cs.LG

    Unbiased Teacher v2: Semi-supervised Object Detection for Anchor-free and Anchor-based Detectors

    Authors: Yen-Cheng Liu, Chih-Yao Ma, Zsolt Kira

    Abstract: With the recent development of Semi-Supervised Object Detection (SS-OD) techniques, object detectors can be improved by using a limited amount of labeled data and abundant unlabeled data. However, there are still two challenges that are not addressed: (1) there is no prior SS-OD work on anchor-free detectors, and (2) prior works are ineffective when pseudo-labeling bounding box regression. In this…

    Submitted 19 June, 2022; originally announced June 2022.

    Comments: Project Page is at http://ycliu93.github.io/projects/unbiasedteacher2.html

  45. arXiv:2206.07932  [pdf, other]

    cs.CV cs.LG

    Lifelong Wandering: A realistic few-shot online continual learning setting

    Authors: Mayank Lunayach, James Smith, Zsolt Kira

    Abstract: Online few-shot learning describes a setting where models are trained and evaluated on a stream of data while learning emerging classes. While prior work in this setting has achieved very promising performance on instance classification when learning from data-streams composed of a single indoor environment, we propose to extend this setting to consider object classification on a series of several…

    Submitted 16 June, 2022; originally announced June 2022.

    Comments: CVPR 2022 Workshop on Continual Learning

  46. arXiv:2205.04363  [pdf, other]

    cs.CV cs.AI cs.LG

    Beyond a Pre-Trained Object Detector: Cross-Modal Textual and Visual Context for Image Captioning

    Authors: Chia-Wen Kuo, Zsolt Kira

    Abstract: Significant progress has been made on visual captioning, largely relying on pre-trained features and later fixed object detectors that serve as rich inputs to auto-regressive models. A key limitation of such methods, however, is that the output of the model is conditioned only on the object detector's outputs. The assumption that such outputs can represent all necessary information is unrealistic,…

    Submitted 7 June, 2022; v1 submitted 9 May, 2022; originally announced May 2022.

    Comments: paper accepted in CVPR 2022

  47. arXiv:2203.17269  [pdf, other]

    cs.LG cs.AI cs.CV

    A Closer Look at Rehearsal-Free Continual Learning

    Authors: James Seale Smith, Junjiao Tian, Shaunak Halbe, Yen-Chang Hsu, Zsolt Kira

    Abstract: Continual learning is a setting where machine learning models learn novel concepts from continuously shifting training data, while simultaneously avoiding degradation of knowledge on previously seen classes which may disappear from the training data for extended periods of time (a phenomenon known as the catastrophic forgetting problem). Current approaches for continual learning of a single expand…

    Submitted 3 April, 2023; v1 submitted 31 March, 2022; originally announced March 2022.

    Comments: Accepted by the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshop on Continual Learning in Computer Vision (CLVision 2023)

  48. arXiv:2203.10163  [pdf, other]

    cs.LG cs.AI cs.CV

    A Closer Look at Knowledge Distillation with Features, Logits, and Gradients

    Authors: Yen-Chang Hsu, James Smith, Yilin Shen, Zsolt Kira, Hongxia Jin

    Abstract: Knowledge distillation (KD) is a substantial strategy for transferring learned knowledge from one neural network model to another. A vast number of methods have been developed for this strategy. While most methods design a more efficient way to facilitate knowledge transfer, less attention has been put on comparing the effect of knowledge sources such as features, logits, and gradients. This work…

    Submitted 18 March, 2022; originally announced March 2022.

  49. arXiv:2203.01929  [pdf, other]

    cs.CV cs.LG cs.RO

    CenterSnap: Single-Shot Multi-Object 3D Shape Reconstruction and Categorical 6D Pose and Size Estimation

    Authors: Muhammad Zubair Irshad, Thomas Kollar, Michael Laskey, Kevin Stone, Zsolt Kira

    Abstract: This paper studies the complex task of simultaneous multi-object 3D reconstruction, 6D pose and size estimation from a single-view RGB-D observation. In contrast to instance-level pose estimation, we focus on a more challenging problem where CAD models are not available at inference time. Existing approaches mainly follow a complex multi-stage pipeline which first localizes and detects each object…

    Submitted 3 March, 2022; originally announced March 2022.

    Comments: Accepted to ICRA 2022, Project page with videos: https://zubair-irshad.github.io/projects/CenterSnap.html

  50. arXiv:2110.15231  [pdf, other]

    cs.LG cs.CV

    Exploring Covariate and Concept Shift for Detection and Calibration of Out-of-Distribution Data

    Authors: Junjiao Tian, Yen-Chang Hsu, Yilin Shen, Hongxia Jin, Zsolt Kira

    Abstract: Moving beyond testing on in-distribution data, works on Out-of-Distribution (OOD) detection have recently increased in popularity. A recent attempt to categorize OOD data introduces the concept of near and far OOD detection. Specifically, prior works define characteristics of OOD data in terms of detection difficulty. We propose to characterize the spectrum of OOD data using two types of distributi…

    Submitted 21 November, 2021; v1 submitted 28 October, 2021; originally announced October 2021.

    Comments: A short version of the paper is accepted to NeurIPS DistShift Workshop 2021