Showing 1–23 of 23 results for author: Agrawal, H

  1. arXiv:2406.07904  [pdf, other]

    cs.LG

    Grounding Multimodal Large Language Models in Actions

    Authors: Andrew Szot, Bogdan Mazoure, Harsh Agrawal, Devon Hjelm, Zsolt Kira, Alexander Toshev

    Abstract: Multimodal Large Language Models (MLLMs) have demonstrated a wide range of capabilities across many domains, including Embodied AI. In this work, we study how to best ground a MLLM into different embodiments and their associated action spaces, with the goal of leveraging the multimodal world knowledge of the MLLM. We first generalize a number of methods through a unified architecture and the lens…

    Submitted 12 June, 2024; originally announced June 2024.

  2. arXiv:2310.17722  [pdf, other]

    cs.LG cs.AI cs.CL

    Large Language Models as Generalizable Policies for Embodied Tasks

    Authors: Andrew Szot, Max Schwarzer, Harsh Agrawal, Bogdan Mazoure, Walter Talbott, Katherine Metcalf, Natalie Mackraz, Devon Hjelm, Alexander Toshev

    Abstract: We show that large language models (LLMs) can be adapted to be generalizable policies for embodied visual tasks. Our approach, called Large LAnguage model Reinforcement Learning Policy (LLaRP), adapts a pre-trained frozen LLM to take as input text instructions and visual egocentric observations and output actions directly in the environment. Using reinforcement learning, we train LLaRP to see and…

    Submitted 16 April, 2024; v1 submitted 26 October, 2023; originally announced October 2023.

  3. Deep learning based projection domain metal segmentation for metal artifact reduction in cone beam computed tomography

    Authors: Harshit Agrawal, Ari Hietanen, Simo Särkkä

    Abstract: Metal artifact correction is a challenging problem in cone beam computed tomography (CBCT) scanning. Metal implants inserted into the anatomy cause severe artifacts in reconstructed images. Widely used inpainting-based metal artifact reduction (MAR) methods require segmentation of metal traces in the projections as a first step, which is a challenging task. One approach is to use a deep learning m…

    Submitted 9 October, 2023; v1 submitted 17 August, 2022; originally announced August 2022.

    Journal ref: IEEE Access, vol. 11, pp. 100371-100382, 2023

  4. arXiv:2207.02726  [pdf, other]

    cs.LG cs.AI cs.HC eess.SP

    Towards the Use of Saliency Maps for Explaining Low-Quality Electrocardiograms to End Users

    Authors: Ana Lucic, Sheeraz Ahmad, Amanda Furtado Brinhosa, Vera Liao, Himani Agrawal, Umang Bhatt, Krishnaram Kenthapadi, Alice Xiang, Maarten de Rijke, Nicholas Drabowski

    Abstract: When using medical images for diagnosis, either by clinicians or artificial intelligence (AI) systems, it is important that the images are of high quality. When an image is of low quality, the medical exam that produced the image often needs to be redone. In telemedicine, a common problem is that the quality issue is only flagged once the patient has left the clinic, meaning they must return in or…

    Submitted 6 July, 2022; originally announced July 2022.

    Comments: Accepted to ICML 2022 Workshop on Interpretable ML in Healthcare

  5. arXiv:2206.06666  [pdf, other]

    cs.SI nlin.AO physics.soc-ph stat.CO

    Effect of money heterogeneity on resource dependency in complex networks

    Authors: Harshit Agrawal, Ashwin Lahorkar, Snehal M. Shekatkar

    Abstract: Exchange of resources among individual components of a system is fundamental to systems like a social network of humans and a network of cities and villages. For various reasons, the human society has come up with the notion of money as a proxy for the resources. Here we extend the model of resource dependencies in networks that was recently proposed by one of us, by incorporating the concept of m…

    Submitted 26 August, 2022; v1 submitted 14 June, 2022; originally announced June 2022.

    Comments: 8 pages, 6 Figures

    Journal ref: Europhys. Lett. 139, 51003 (2022)

  6. arXiv:2205.10712  [pdf, other]

    cs.CV

    Housekeep: Tidying Virtual Households using Commonsense Reasoning

    Authors: Yash Kant, Arun Ramachandran, Sriram Yenamandra, Igor Gilitschenski, Dhruv Batra, Andrew Szot, Harsh Agrawal

    Abstract: We introduce Housekeep, a benchmark to evaluate commonsense reasoning in the home for embodied AI. In Housekeep, an embodied agent must tidy a house by rearranging misplaced objects without explicit instructions specifying which objects need to be rearranged. Instead, the agent must learn from and is evaluated against human preferences of which objects belong where in a tidy house. Specifically, w…

    Submitted 21 May, 2022; originally announced May 2022.

  7. arXiv:2204.02960  [pdf, other]

    cs.CV cs.AI cs.LG

    Simple and Effective Synthesis of Indoor 3D Scenes

    Authors: Jing Yu Koh, Harsh Agrawal, Dhruv Batra, Richard Tucker, Austin Waters, Honglak Lee, Yinfei Yang, Jason Baldridge, Peter Anderson

    Abstract: We study the problem of synthesizing immersive 3D indoor scenes from one or more images. Our aim is to generate high-resolution images and videos from novel viewpoints, including viewpoints that extrapolate far beyond the input images while maintaining 3D consistency. Existing approaches are highly complex, with many separately trained stages and components. We propose a simple alternative: an ima…

    Submitted 1 December, 2022; v1 submitted 6 April, 2022; originally announced April 2022.

    Comments: AAAI 2023

  8. arXiv:2202.13099  [pdf, other]

    cs.CV

    Symmetric Convolutional Filters: A Novel Way to Constrain Parameters in CNN

    Authors: Harish Agrawal, Sumana T., S. K. Nandy

    Abstract: We propose a novel technique to constrain parameters in CNN based on symmetric filters. We investigate the impact on SOTA networks when varying the combinations of symmetricity. We demonstrate that our models offer effective generalisation and a structured elimination of redundancy in parameters. We conclude by comparing our method with other pruning techniques.

    Submitted 26 February, 2022; originally announced February 2022.

    Comments: Copyright 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

  9. arXiv:2110.14143  [pdf, other]

    cs.CV

    SOAT: A Scene- and Object-Aware Transformer for Vision-and-Language Navigation

    Authors: Abhinav Moudgil, Arjun Majumdar, Harsh Agrawal, Stefan Lee, Dhruv Batra

    Abstract: Natural language instructions for visual navigation often use scene descriptions (e.g., "bedroom") and object references (e.g., "green chairs") to provide a breadcrumb trail to a goal location. This work presents a transformer-based vision-and-language navigation (VLN) agent that uses two different visual encoders -- a scene classification network and an object detector -- which produce features t…

    Submitted 26 October, 2021; originally announced October 2021.

    Comments: NeurIPS 2021

  10. arXiv:2108.11550  [pdf, other]

    cs.CV cs.AI cs.LG

    The Surprising Effectiveness of Visual Odometry Techniques for Embodied PointGoal Navigation

    Authors: Xiaoming Zhao, Harsh Agrawal, Dhruv Batra, Alexander Schwing

    Abstract: It is fundamental for personal robots to reliably navigate to a specified goal. To study this task, PointGoal navigation has been introduced in simulated Embodied AI environments. Recent advances solve this PointGoal navigation task with near-perfect accuracy (99.6% success) in photo-realistically simulated environments, assuming noiseless egocentric vision, noiseless actuation, and most important…

    Submitted 25 August, 2021; originally announced August 2021.

    Comments: ICCV 2021

  11. arXiv:2010.06087  [pdf, other]

    cs.CV

    Contrast and Classify: Training Robust VQA Models

    Authors: Yash Kant, Abhinav Moudgil, Dhruv Batra, Devi Parikh, Harsh Agrawal

    Abstract: Recent Visual Question Answering (VQA) models have shown impressive performance on the VQA benchmark but remain sensitive to small linguistic variations in input questions. Existing approaches address this by augmenting the dataset with question paraphrases from visual question generation models or adversarial perturbations. These approaches use the combined data to learn an answer classifier by m…

    Submitted 18 April, 2021; v1 submitted 12 October, 2020; originally announced October 2020.

  12. arXiv:2007.12146  [pdf, other]

    cs.CV

    Spatially Aware Multimodal Transformers for TextVQA

    Authors: Yash Kant, Dhruv Batra, Peter Anderson, Alex Schwing, Devi Parikh, Jiasen Lu, Harsh Agrawal

    Abstract: Textual cues are essential for everyday tasks like buying groceries and using public transport. To develop this assistive technology, we study the TextVQA task, i.e., reasoning about text in images to answer a question. Existing approaches are limited in their use of spatial relations and rely on fully-connected transformer-like architectures to implicitly learn the spatial structure of a scene. I…

    Submitted 22 December, 2020; v1 submitted 23 July, 2020; originally announced July 2020.

    Comments: Accepted at European Conference on Computer Vision, 2020

  13. arXiv:1908.08529  [pdf, other]

    cs.CV cs.CL cs.LG stat.ML

    Sequential Latent Spaces for Modeling the Intention During Diverse Image Captioning

    Authors: Jyoti Aneja, Harsh Agrawal, Dhruv Batra, Alexander Schwing

    Abstract: Diverse and accurate vision+language modeling is an important goal to retain creative freedom and maintain user engagement. However, adequately capturing the intricacies of diversity in language models is challenging. Recent works commonly resort to latent variable models augmented with more or less supervision from object detectors or part-of-speech tags. Common to all those methods is the fact t…

    Submitted 22 August, 2019; originally announced August 2019.

    Comments: Accepted to ICCV 2019

  14. arXiv:1902.03570  [pdf, other]

    cs.AI cs.CL cs.CV cs.LG

    EvalAI: Towards Better Evaluation Systems for AI Agents

    Authors: Deshraj Yadav, Rishabh Jain, Harsh Agrawal, Prithvijit Chattopadhyay, Taranjeet Singh, Akash Jain, Shiv Baran Singh, Stefan Lee, Dhruv Batra

    Abstract: We introduce EvalAI, an open source platform for evaluating and comparing machine learning (ML) and artificial intelligence algorithms (AI) at scale. EvalAI is built to provide a scalable solution to the research community to fulfill the critical need of evaluating machine learning models and agents acting in an environment against annotations or with a human-in-the-loop. This will help researcher…

    Submitted 10 February, 2019; originally announced February 2019.

  15. arXiv:1812.08658  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    nocaps: novel object captioning at scale

    Authors: Harsh Agrawal, Karan Desai, Yufei Wang, Xinlei Chen, Rishabh Jain, Mark Johnson, Dhruv Batra, Devi Parikh, Stefan Lee, Peter Anderson

    Abstract: Image captioning models have achieved impressive results on datasets containing limited visual concepts and large amounts of paired image-caption training data. However, if these models are to ever function in the wild, a much larger variety of visual concepts must be learned, ideally from less supervision. To encourage the development of image captioning models that can learn visual concepts from…

    Submitted 30 September, 2019; v1 submitted 20 December, 2018; originally announced December 2018.

    Journal ref: IEEE International Conference on Computer Vision (ICCV) 2019

  16. arXiv:1810.11649  [pdf, other]

    cs.LG cs.AI cs.CV

    Fabrik: An Online Collaborative Neural Network Editor

    Authors: Utsav Garg, Viraj Prabhu, Deshraj Yadav, Ram Ramrakhya, Harsh Agrawal, Dhruv Batra

    Abstract: We present Fabrik, an online neural network editor that provides tools to visualize, edit, and share neural networks from within a browser. Fabrik provides a simple and intuitive GUI to import neural networks written in popular deep learning frameworks such as Caffe, Keras, and TensorFlow, and allows users to interact with, build, and edit models via simple drag and drop. Fabrik is designed to be…

    Submitted 27 October, 2018; originally announced October 2018.

  17. arXiv:1701.07878  [pdf]

    cs.NI

    Spectrum Allocation in Cognitive Networks

    Authors: Himanshu Agrawal

    Abstract: Cognitive Network is a technique which is used to improve the spectrum utilization. Current network scenario is experiencing the huge spectrum scarcity problem due to the fixed assignment policy so in this method great amount of spectrum remain unused. To overcome this limitation the spectrum allocation must be in dynamic manner. In this paper the spectrum allocation is discussed thoroughly. Inter…

    Submitted 15 December, 2016; originally announced January 2017.

    Comments: 7 pages

  18. arXiv:1612.05865  [pdf]

    cs.NI

    New Architecture for Dynamic Spectrum Allocation in Cognitive Heterogeneous Network using Self Organizing Map

    Authors: Himanshu Agrawal, Krishna Asawa

    Abstract: This paper introduces the Hybrid Architecture of Dynamic Spectrum Allocation in the hierarchical network combining centralized and distributed architecture to get optimum allocation of radio resources. It can limit the interference by interacting dynamically and enhance the spectrum efficiency while maintaining the desired QoS in the network. This paper presented dynamic framework for the interact…

    Submitted 18 December, 2016; originally announced December 2016.

    Comments: 13 pages, 4 figures

  19. arXiv:1606.07493  [pdf, other]

    cs.CL cs.AI cs.CV cs.LG

    Sort Story: Sorting Jumbled Images and Captions into Stories

    Authors: Harsh Agrawal, Arjun Chandrasekaran, Dhruv Batra, Devi Parikh, Mohit Bansal

    Abstract: Temporal common sense has applications in AI tasks such as QA, multi-document summarization, and human-AI communication. We propose the task of sequencing -- given a jumbled set of aligned image-caption pairs that belong to a story, the task is to sort them such that the output sequence forms a coherent story. We present multiple approaches, via unary (position) and pairwise (order) predictions, a…

    Submitted 7 November, 2016; v1 submitted 23 June, 2016; originally announced June 2016.

    Comments: EMNLP 2016

  20. arXiv:1606.05589  [pdf, other]

    stat.ML cs.CV

    Human Attention in Visual Question Answering: Do Humans and Deep Networks Look at the Same Regions?

    Authors: Abhishek Das, Harsh Agrawal, C. Lawrence Zitnick, Devi Parikh, Dhruv Batra

    Abstract: We conduct large-scale studies on `human attention' in Visual Question Answering (VQA) to understand where humans choose to look to answer questions about images. We design and test multiple game-inspired novel attention-annotation interfaces that require the subject to sharpen regions of a blurred image to answer a question. Thus, we introduce the VQA-HAT (Human ATtention) dataset. We evaluate at…

    Submitted 17 June, 2016; originally announced June 2016.

    Comments: 5 pages, 4 figures, 3 tables, presented at 2016 ICML Workshop on Human Interpretability in Machine Learning (WHI 2016), New York, NY. arXiv admin note: substantial text overlap with arXiv:1606.03556

  21. arXiv:1606.03556  [pdf, other]

    cs.CV cs.CL

    Human Attention in Visual Question Answering: Do Humans and Deep Networks Look at the Same Regions?

    Authors: Abhishek Das, Harsh Agrawal, C. Lawrence Zitnick, Devi Parikh, Dhruv Batra

    Abstract: We conduct large-scale studies on `human attention' in Visual Question Answering (VQA) to understand where humans choose to look to answer questions about images. We design and test multiple game-inspired novel attention-annotation interfaces that require the subject to sharpen regions of a blurred image to answer a question. Thus, we introduce the VQA-HAT (Human ATtention) dataset. We evaluate at…

    Submitted 17 June, 2016; v1 submitted 11 June, 2016; originally announced June 2016.

    Comments: 9 pages, 6 figures, 3 tables; Under review at EMNLP 2016

  22. arXiv:1506.04130  [pdf, other]

    cs.CV cs.DC

    CloudCV: Large Scale Distributed Computer Vision as a Cloud Service

    Authors: Harsh Agrawal, Clint Solomon Mathialagan, Yash Goyal, Neelima Chavali, Prakriti Banik, Akrit Mohapatra, Ahmed Osman, Dhruv Batra

    Abstract: We are witnessing a proliferation of massive visual data. Unfortunately scaling existing computer vision algorithms to large datasets leaves researchers repeatedly solving the same algorithmic, logistical, and infrastructural problems. Our goal is to democratize computer vision; one should not have to be a computer vision, big data and distributed computing expert to have access to state-of-the-ar…

    Submitted 13 February, 2017; v1 submitted 12 June, 2015; originally announced June 2015.

  23. arXiv:1505.05836  [pdf, other]

    cs.CV

    Object-Proposal Evaluation Protocol is 'Gameable'

    Authors: Neelima Chavali, Harsh Agrawal, Aroma Mahendru, Dhruv Batra

    Abstract: Object proposals have quickly become the de-facto pre-processing step in a number of vision pipelines (for object detection, object discovery, and other tasks). Their performance is usually evaluated on partially annotated datasets. In this paper, we argue that the choice of using a partially annotated dataset for evaluation of object proposals is problematic -- as we demonstrate via a thought exp…

    Submitted 23 November, 2015; v1 submitted 21 May, 2015; originally announced May 2015.

    Comments: 15 pages, 11 figures, 4 tables