Showing 1–50 of 70 results for author: Kar, A

  1. arXiv:2406.08292  [pdf, other]

    cs.CV

    Outdoor Scene Extrapolation with Hierarchical Generative Cellular Automata

    Authors: Dongsu Zhang, Francis Williams, Zan Gojcic, Karsten Kreis, Sanja Fidler, Young Min Kim, Amlan Kar

    Abstract: We aim to generate fine-grained 3D geometry from large-scale sparse LiDAR scans, abundantly captured by autonomous vehicles (AV). Contrary to prior work on AV scene completion, we aim to extrapolate fine geometry from unlabeled and beyond spatial limits of LiDAR scans, taking a step towards generating realistic, high-resolution simulation-ready 3D street environments. We propose hierarchical Gener…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted to CVPR 2024 as highlight

  2. arXiv:2405.18793  [pdf, other]

    cs.LG

    Adaptive Discretization-based Non-Episodic Reinforcement Learning in Metric Spaces

    Authors: Avik Kar, Rahul Singh

    Abstract: We study non-episodic Reinforcement Learning for Lipschitz MDPs in which state-action space is a metric space, and the transition kernel and rewards are Lipschitz functions. We develop computationally efficient UCB-based algorithm, $\textit{ZoRL-}ε$ that adaptively discretizes the state-action space and show that their regret as compared with $ε$-optimal policy is bounded as…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: 38 pages, 2 figures

  3. arXiv:2404.09591  [pdf, other]

    cs.CV

    3D Gaussian Splatting as Markov Chain Monte Carlo

    Authors: Shakiba Kheradmand, Daniel Rebain, Gopal Sharma, Weiwei Sun, Jeff Tseng, Hossam Isack, Abhishek Kar, Andrea Tagliasacchi, Kwang Moo Yi

    Abstract: While 3D Gaussian Splatting has recently become popular for neural rendering, current methods rely on carefully engineered cloning and splitting strategies for placing Gaussians, which can lead to poor-quality renderings, and reliance on a good initialization. In this work, we rethink the set of 3D Gaussians as a random sample drawn from an underlying probability distribution describing the physic…

    Submitted 16 June, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

  4. arXiv:2404.08636  [pdf, other]

    cs.CV

    Probing the 3D Awareness of Visual Foundation Models

    Authors: Mohamed El Banani, Amit Raj, Kevis-Kokitsi Maninis, Abhishek Kar, Yuanzhen Li, Michael Rubinstein, Deqing Sun, Leonidas Guibas, Justin Johnson, Varun Jampani

    Abstract: Recent advances in large-scale pretraining have yielded visual foundation models with strong capabilities. Not only can recent models generalize to arbitrary images for their training task, their intermediate representations are useful for other visual tasks such as detection and segmentation. Given that such models can classify, delineate, and localize objects in 2D, we ask whether they also repr…

    Submitted 12 April, 2024; originally announced April 2024.

    Comments: Accepted to CVPR 2024. Project page: https://github.com/mbanani/probe3d

  5. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love , et al. (1092 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  6. arXiv:2401.12924  [pdf, other]

    stat.ML cs.LG stat.ME

    Performance Analysis of Support Vector Machine (SVM) on Challenging Datasets for Forest Fire Detection

    Authors: Ankan Kar, Nirjhar Nath, Utpalraj Kemprai, Aman

    Abstract: This article delves into the analysis of performance and utilization of Support Vector Machines (SVMs) for the critical task of forest fire detection using image datasets. With the increasing threat of forest fires to ecosystems and human settlements, the need for rapid and accurate detection systems is of utmost importance. SVMs, renowned for their strong classification capabilities, exhibit prof…

    Submitted 7 March, 2024; v1 submitted 23 January, 2024; originally announced January 2024.

    Comments: 19 pages, 8 figures

    Journal ref: Int. J. Communications, Network and System Sciences, 17, 11-29 (2024)

  7. arXiv:2401.10171  [pdf, other]

    cs.CV cs.GR

    SHINOBI: Shape and Illumination using Neural Object Decomposition via BRDF Optimization In-the-wild

    Authors: Andreas Engelhardt, Amit Raj, Mark Boss, Yunzhi Zhang, Abhishek Kar, Yuanzhen Li, Deqing Sun, Ricardo Martin Brualla, Jonathan T. Barron, Hendrik P. A. Lensch, Varun Jampani

    Abstract: We present SHINOBI, an end-to-end framework for the reconstruction of shape, material, and illumination from object images captured with varying lighting, pose, and background. Inverse rendering of an object based on unconstrained image collections is a long-standing challenge in computer vision and graphics and requires a joint optimization over shape, radiance, and pose. We show that an implicit…

    Submitted 29 March, 2024; v1 submitted 18 January, 2024; originally announced January 2024.

    Comments: Accepted by IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2024). Updated supplementary material and acknowledgements

  8. arXiv:2312.04560  [pdf, other]

    cs.CV cs.AI cs.GR

    NeRFiller: Completing Scenes via Generative 3D Inpainting

    Authors: Ethan Weber, Aleksander Hołyński, Varun Jampani, Saurabh Saxena, Noah Snavely, Abhishek Kar, Angjoo Kanazawa

    Abstract: We propose NeRFiller, an approach that completes missing portions of a 3D capture via generative 3D inpainting using off-the-shelf 2D visual generative models. Often parts of a captured 3D scene or object are missing due to mesh reconstruction failures or a lack of observations (e.g., contact regions, such as the bottom of objects, or hard-to-reach areas). We approach this challenging 3D inpaintin…

    Submitted 7 December, 2023; originally announced December 2023.

    Comments: Project page: https://ethanweber.me/nerfiller

  9. arXiv:2312.00075  [pdf, other]

    cs.CV

    Accelerating Neural Field Training via Soft Mining

    Authors: Shakiba Kheradmand, Daniel Rebain, Gopal Sharma, Hossam Isack, Abhishek Kar, Andrea Tagliasacchi, Kwang Moo Yi

    Abstract: We present an approach to accelerate Neural Field training by efficiently selecting sampling locations. While Neural Fields have recently become popular, it is often trained by uniformly sampling the training domain, or through handcrafted heuristics. We show that improved convergence and final training quality can be achieved by a soft mining technique based on importance sampling: rather than ei…

    Submitted 29 November, 2023; originally announced December 2023.

  10. arXiv:2309.04147  [pdf, other]

    cs.CV

    Robot Localization and Mapping Final Report -- Sequential Adversarial Learning for Self-Supervised Deep Visual Odometry

    Authors: Akankshya Kar, Sajal Maheshwari, Shamit Lal, Vinay Sameer Raja Kad

    Abstract: Visual odometry (VO) and SLAM have been using multi-view geometry via local structure from motion for decades. These methods have a slight disadvantage in challenging scenarios such as low-texture images, dynamic scenarios, etc. Meanwhile, use of deep neural networks to extract high level features is ubiquitous in computer vision. For VO, we can use these deep networks to extract depth and pose es…

    Submitted 8 September, 2023; originally announced September 2023.

  11. arXiv:2308.02945  [pdf, other]

    cs.AR cs.CR

    RV-CURE: A RISC-V Capability Architecture for Full Memory Safety

    Authors: Yonghae Kim, Anurag Kar, Jaewon Lee, Jaekyu Lee, Hyesoon Kim

    Abstract: Despite decades of efforts to resolve, memory safety violations are still persistent and problematic in modern systems. Various defense mechanisms have been proposed, but their deployment in real systems remains challenging because of performance, security, or compatibility concerns. In this paper, we propose RV-CURE, a RISC-V capability architecture that implements full-system support for full me…

    Submitted 5 August, 2023; originally announced August 2023.

  12. arXiv:2307.07487  [pdf, other]

    cs.CV cs.LG

    DreamTeacher: Pretraining Image Backbones with Deep Generative Models

    Authors: Daiqing Li, Huan Ling, Amlan Kar, David Acuna, Seung Wook Kim, Karsten Kreis, Antonio Torralba, Sanja Fidler

    Abstract: In this work, we introduce a self-supervised feature representation learning framework DreamTeacher that utilizes generative networks for pre-training downstream image backbones. We propose to distill knowledge from a trained generative model into standard image backbones that have been well engineered for specific perception tasks. We investigate two types of knowledge distillation: 1) distilling…

    Submitted 14 July, 2023; originally announced July 2023.

    Comments: Project page: https://research.nvidia.com/labs/toronto-ai/DreamTeacher/

  13. arXiv:2306.05410  [pdf, other]

    cs.CV

    LU-NeRF: Scene and Pose Estimation by Synchronizing Local Unposed NeRFs

    Authors: Zezhou Cheng, Carlos Esteves, Varun Jampani, Abhishek Kar, Subhransu Maji, Ameesh Makadia

    Abstract: A critical obstacle preventing NeRF models from being deployed broadly in the wild is their reliance on accurate camera poses. Consequently, there is growing interest in extending NeRF models to jointly optimize camera poses and scene representation, which offers an alternative to off-the-shelf SfM pipelines which have well-understood failure modes. Existing approaches for unposed NeRF operate und…

    Submitted 8 June, 2023; originally announced June 2023.

    Comments: Project website: https://people.cs.umass.edu/~zezhoucheng/lu-nerf/

  14. arXiv:2306.01923  [pdf, other]

    cs.CV

    The Surprising Effectiveness of Diffusion Models for Optical Flow and Monocular Depth Estimation

    Authors: Saurabh Saxena, Charles Herrmann, Junhwa Hur, Abhishek Kar, Mohammad Norouzi, Deqing Sun, David J. Fleet

    Abstract: Denoising diffusion probabilistic models have transformed image generation with their impressive fidelity and diversity. We show that they also excel in estimating optical flow and monocular depth, surprisingly, without task-specific architectures and loss functions that are predominant for these tasks. Compared to the point estimates of conventional regression-based methods, diffusion models also…

    Submitted 5 December, 2023; v1 submitted 2 June, 2023; originally announced June 2023.

    Comments: NeurIPS 2023 (Oral)

  15. arXiv:2305.16974  [pdf, other]

    eess.SY cs.LG

    Finite Time Regret Bounds for Minimum Variance Control of Autoregressive Systems with Exogenous Inputs

    Authors: Rahul Singh, Akshay Mete, Avik Kar, P. R. Kumar

    Abstract: Minimum variance controllers have been employed in a wide-range of industrial applications. A key challenge experienced by many adaptive controllers is their poor empirical performance in the initial stages of learning. In this paper, we address the problem of initializing them so that they provide acceptable transients, and also provide an accompanying finite-time regret analysis, for adaptive mi…

    Submitted 26 May, 2023; originally announced May 2023.

  16. arXiv:2305.15581  [pdf, other]

    cs.CV

    Unsupervised Semantic Correspondence Using Stable Diffusion

    Authors: Eric Hedlin, Gopal Sharma, Shweta Mahajan, Hossam Isack, Abhishek Kar, Andrea Tagliasacchi, Kwang Moo Yi

    Abstract: Text-to-image diffusion models are now capable of generating images that are often indistinguishable from real images. To generate such images, these models must understand the semantics of the objects they are asked to generate. In this work we show that, without any training, one can leverage this semantic knowledge within diffusion models to find semantic correspondences - locations in multiple…

    Submitted 23 December, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: Project website: https://github.com/ubc-vision/LDM_correspondences

  17. arXiv:2304.03285  [pdf, other]

    cs.CV

    $\text{DC}^2$: Dual-Camera Defocus Control by Learning to Refocus

    Authors: Hadi Alzayer, Abdullah Abuolaim, Leung Chun Chan, Yang Yang, Ying Chen Lou, Jia-Bin Huang, Abhishek Kar

    Abstract: Smartphone cameras today are increasingly approaching the versatility and quality of professional cameras through a combination of hardware and software advancements. However, fixed aperture remains a key limitation, preventing users from controlling the depth of field (DoF) of captured images. At the same time, many smartphones now have multiple cameras with different fixed apertures -- specifica…

    Submitted 6 April, 2023; originally announced April 2023.

    Comments: CVPR 2023. See the project page at https://defocus-control.github.io

  18. arXiv:2303.16201  [pdf, other]

    cs.CV cs.AI cs.LG

    ASIC: Aligning Sparse in-the-wild Image Collections

    Authors: Kamal Gupta, Varun Jampani, Carlos Esteves, Abhinav Shrivastava, Ameesh Makadia, Noah Snavely, Abhishek Kar

    Abstract: We present a method for joint alignment of sparse in-the-wild image collections of an object category. Most prior works assume either ground-truth keypoint annotations or a large dataset of images of a single object category. However, neither of the above assumptions hold true for the long-tail of the objects present in the world. We present a self-supervised technique that directly optimizes on a…

    Submitted 28 March, 2023; originally announced March 2023.

    Comments: Web: https://kampta.github.io/asic

  19. arXiv:2302.14816  [pdf, other]

    cs.CV

    Monocular Depth Estimation using Diffusion Models

    Authors: Saurabh Saxena, Abhishek Kar, Mohammad Norouzi, David J. Fleet

    Abstract: We formulate monocular depth estimation using denoising diffusion models, inspired by their recent successes in high fidelity image generation. To that end, we introduce innovations to address problems arising due to noisy, incomplete depth maps in training data, including step-unrolled denoising diffusion, an $L_1$ loss, and depth infilling during training. To cope with the limited availability o…

    Submitted 28 February, 2023; originally announced February 2023.

  20. Visual deep learning-based explanation for neuritic plaques segmentation in Alzheimer's Disease using weakly annotated whole slide histopathological images

    Authors: Gabriel Jimenez, Anuradha Kar, Mehdi Ounissi, Léa Ingrassia, Susana Boluda, Benoît Delatour, Lev Stimmer, Daniel Racoceanu

    Abstract: Quantifying the distribution and morphology of tau protein structures in brain tissues is key to diagnosing Alzheimer's Disease (AD) and its subtypes. Recently, deep learning (DL) models such as UNet have been successfully used for automatic segmentation of histopathological whole slide images (WSI) of biological tissues. In this study, we propose a DL-based methodology for semantic segmentation o…

    Submitted 13 January, 2023; originally announced February 2023.

    Journal ref: Medical Image Computing and Computer Assisted Intervention -- MICCAI 2022, Sep 2022, Singapore (SG), Singapore. pp.336-344

  21. arXiv:2209.13064  [pdf, other]

    cs.CV cs.AI cs.LG

    EPIC-KITCHENS VISOR Benchmark: VIdeo Segmentations and Object Relations

    Authors: Ahmad Darkhalil, Dandan Shan, Bin Zhu, Jian Ma, Amlan Kar, Richard Higgins, Sanja Fidler, David Fouhey, Dima Damen

    Abstract: We introduce VISOR, a new dataset of pixel annotations and a benchmark suite for segmenting hands and active objects in egocentric video. VISOR annotates videos from EPIC-KITCHENS, which comes with a new set of challenges not encountered in current video segmentation datasets. Specifically, we need to ensure both short- and long-term consistency of pixel-level annotations as objects undergo transf…

    Submitted 26 September, 2022; originally announced September 2022.

    Comments: 10 pages main, 38 pages appendix. Accepted at NeurIPS 2022 Track on Datasets and Benchmarks Data, code and leaderboards from: http://epic-kitchens.github.io/VISOR

  22. arXiv:2207.11894  [pdf, other]

    cs.CV eess.IV

    Sub-Aperture Feature Adaptation in Single Image Super-resolution Model for Light Field Imaging

    Authors: Aupendu Kar, Suresh Nehra, Jayanta Mukhopadhyay, Prabir Kumar Biswas

    Abstract: With the availability of commercial Light Field (LF) cameras, LF imaging has emerged as an up and coming technology in computational photography. However, the spatial resolution is significantly constrained in commercial microlens based LF cameras because of the inherent multiplexing of spatial and angular information. Therefore, it becomes the main bottleneck for other applications of light field…

    Submitted 26 July, 2022; v1 submitted 24 July, 2022; originally announced July 2022.

    Comments: © 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

  23. arXiv:2205.15768  [pdf, other]

    cs.CV cs.GR cs.LG

    SAMURAI: Shape And Material from Unconstrained Real-world Arbitrary Image collections

    Authors: Mark Boss, Andreas Engelhardt, Abhishek Kar, Yuanzhen Li, Deqing Sun, Jonathan T. Barron, Hendrik P. A. Lensch, Varun Jampani

    Abstract: Inverse rendering of an object under entirely unknown capture conditions is a fundamental challenge in computer vision and graphics. Neural approaches such as NeRF have achieved photorealistic results on novel view synthesis, but they require known camera poses. Solving this problem with unknown camera poses is highly challenging as it requires joint optimization over shape, radiance, and pose. Th…

    Submitted 31 May, 2022; originally announced May 2022.

  24. arXiv:2204.09171  [pdf, other]

    cs.RO cs.CV

    Learned Monocular Depth Priors in Visual-Inertial Initialization

    Authors: Yunwen Zhou, Abhishek Kar, Eric Turner, Adarsh Kowdle, Chao X. Guo, Ryan C. DuToit, Konstantine Tsotsos

    Abstract: Visual-inertial odometry (VIO) is the pose estimation backbone for most AR/VR and autonomous robotic systems today, in both academia and industry. However, these systems are highly sensitive to the initialization of key parameters such as sensor biases, gravity direction, and metric scale. In practical scenarios where high-parallax or variable acceleration assumptions are rarely met (e.g. hovering…

    Submitted 1 August, 2022; v1 submitted 19 April, 2022; originally announced April 2022.

    Comments: to be published in 2022 European Conference on Computer Vision

  25. arXiv:2202.03651  [pdf, other]

    cs.CV

    Causal Scene BERT: Improving object detection by searching for challenging groups of data

    Authors: Cinjon Resnick, Or Litany, Amlan Kar, Karsten Kreis, James Lucas, Kyunghyun Cho, Sanja Fidler

    Abstract: Modern computer vision applications rely on learning-based perception modules parameterized with neural networks for tasks like object detection. These modules frequently have low expected error overall but high error on atypical groups of data due to biases inherent in the training process. In building autonomous vehicles (AV), this problem is an especially important challenge because their perce…

    Submitted 21 April, 2022; v1 submitted 8 February, 2022; originally announced February 2022.

    Comments: In submission at JMLR; 0xe5110eA3B5014cd9a585Dc76c74Ee509F504Be14

  26. arXiv:2112.00016  [pdf, other]

    hep-th cs.LG math.GT

    Learning knot invariants across dimensions

    Authors: Jessica Craven, Mark Hughes, Vishnu Jejjala, Arjun Kar

    Abstract: We use deep neural networks to machine learn correlations between knot invariants in various dimensions. The three-dimensional invariant of interest is the Jones polynomial $J(q)$, and the four-dimensional invariants are the Khovanov polynomial $\text{Kh}(q,t)$, smooth slice genus $g$, and Rasmussen's $s$-invariant. We find that a two-layer feed-forward neural network can predict $s$ from…

    Submitted 21 October, 2022; v1 submitted 30 November, 2021; originally announced December 2021.

    Comments: v1: 35 pages, 6 figures; v2: 36 pages, 6 figures, figures updated, typos corrected

    Journal ref: SciPost Phys. 14, 021 (2023)

  27. arXiv:2110.03675  [pdf, other]

    cs.CV

    ATISS: Autoregressive Transformers for Indoor Scene Synthesis

    Authors: Despoina Paschalidou, Amlan Kar, Maria Shugrina, Karsten Kreis, Andreas Geiger, Sanja Fidler

    Abstract: The ability to synthesize realistic and diverse indoor furniture layouts automatically or based on partial input, unlocks many applications, from better interactive 3D tools to data synthesis for training and simulation. In this paper, we present ATISS, a novel autoregressive transformer architecture for creating diverse and plausible synthetic indoor environments, given only the room type and its…

    Submitted 7 October, 2021; originally announced October 2021.

    Comments: To appear in NeurIPS 2021, Project Page: https://nv-tlabs.github.io/ATISS/

  28. arXiv:2109.01068  [pdf, other]

    cs.CV cs.GR

    SLIDE: Single Image 3D Photography with Soft Layering and Depth-aware Inpainting

    Authors: Varun Jampani, Huiwen Chang, Kyle Sargent, Abhishek Kar, Richard Tucker, Michael Krainin, Dominik Kaeser, William T. Freeman, David Salesin, Brian Curless, Ce Liu

    Abstract: Single image 3D photography enables viewers to view a still image from novel viewpoints. Recent approaches combine monocular depth networks with inpainting networks to achieve compelling results. A drawback of these techniques is the use of hard depth layering, making them unable to model intricate appearance details such as thin hair-like structures. We present SLIDE, a modular and unified system…

    Submitted 2 September, 2021; originally announced September 2021.

    Comments: ICCV 2021 (Oral); Project page: https://varunjampani.github.io/slide ; Video: https://www.youtube.com/watch?v=RQio7q-ueY8

  29. arXiv:2104.12690  [pdf, other]

    cs.CV

    Towards Good Practices for Efficiently Annotating Large-Scale Image Classification Datasets

    Authors: Yuan-Hong Liao, Amlan Kar, Sanja Fidler

    Abstract: Data is the engine of modern computer vision, which necessitates collecting large-scale datasets. This is expensive, and guaranteeing the quality of the labels is a major challenge. In this paper, we investigate efficient annotation strategies for collecting multi-class classification labels for a large collection of images. While methods that exploit learnt models for labeling exist, a surprising…

    Submitted 26 April, 2021; originally announced April 2021.

    Comments: CVPR 2021 Oral

  30. arXiv:2101.02209  [pdf, other]

    hep-th cs.CC quant-ph

    Complexity Growth in Integrable and Chaotic Models

    Authors: Vijay Balasubramanian, Matthew DeCross, Arjun Kar, Cathy Li, Onkar Parrikar

    Abstract: We use the SYK family of models with $N$ Majorana fermions to study the complexity of time evolution, formulated as the shortest geodesic length on the unitary group manifold between the identity and the time evolution operator, in free, integrable, and chaotic systems. Initially, the shortest geodesic follows the time evolution trajectory, and hence complexity grows linearly in time. We study how…

    Submitted 26 April, 2021; v1 submitted 6 January, 2021; originally announced January 2021.

    Comments: 70+13 pages, 29 figures

  31. arXiv:2012.03955  [pdf, other]

    hep-th cs.LG math.GT

    Disentangling a Deep Learned Volume Formula

    Authors: Jessica Craven, Vishnu Jejjala, Arjun Kar

    Abstract: We present a simple phenomenological formula which approximates the hyperbolic volume of a knot using only a single evaluation of its Jones polynomial at a root of unity. The average error is just $2.86$% on the first $1.7$ million knots, which represents a large improvement over previous formulas of this kind. To find the approximation formula, we use layer-wise relevance propagation to reverse e…

    Submitted 7 June, 2021; v1 submitted 7 December, 2020; originally announced December 2020.

    Comments: v1: 26 + 19 pages, 15 figures; v2: 27 + 18 pages, figures updated, references added, journal version

  32. arXiv:2010.04292  [pdf, other]

    cs.CL cs.LG cs.SI

    comp-syn: Perceptually Grounded Word Embeddings with Color

    Authors: Bhargav Srinivasa Desikan, Tasker Hull, Ethan O. Nadler, Douglas Guilbeault, Aabir Abubaker Kar, Mark Chu, Donald Ruggiero Lo Sardo

    Abstract: Popular approaches to natural language processing create word embeddings based on textual co-occurrence patterns, but often ignore embodied, sensory aspects of language. Here, we introduce the Python package comp-syn, which provides grounded word embeddings based on the perceptually uniform color distributions of Google Image search results. We demonstrate that comp-syn significantly enriches mode…

    Submitted 19 October, 2020; v1 submitted 8 October, 2020; originally announced October 2020.

    Comments: 9 pages, 3 figures, all code and data available at https://github.com/comp-syn/comp-syn. Forthcoming in the Proceedings of the 28th International Conference on Computational Linguistics

  33. arXiv:2009.00668  [pdf, other]

    cs.CV cs.LG eess.IV

    Fed-Sim: Federated Simulation for Medical Imaging

    Authors: Daiqing Li, Amlan Kar, Nishant Ravikumar, Alejandro F Frangi, Sanja Fidler

    Abstract: Labelling data is expensive and time consuming especially for domains such as medical imaging that contain volumetric imaging data and require expert knowledge. Exploiting a larger pool of labeled data available across multiple centers, such as in federated learning, has also seen limited success since current deep learning approaches do not generalize well to images acquired with scanners from di…

    Submitted 1 September, 2020; originally announced September 2020.

    Comments: MICCAI 2020 (Early Accept)

  34. arXiv:2008.10719  [pdf, other]

    cs.CV cs.LG

    Interactive Annotation of 3D Object Geometry using 2D Scribbles

    Authors: Tianchang Shen, Jun Gao, Amlan Kar, Sanja Fidler

    Abstract: Inferring detailed 3D geometry of the scene is crucial for robotics applications, simulation, and 3D content creation. However, such information is hard to obtain, and thus very few datasets support it. In this paper, we propose an interactive framework for annotating 3D object geometry from both point cloud data and RGB imagery. The key idea behind our approach is to exploit strong priors that hu…

    Submitted 25 October, 2020; v1 submitted 24 August, 2020; originally announced August 2020.

    Comments: Accepted to ECCV 2020

  35. arXiv:2008.09092  [pdf, other]

    cs.CV cs.GR cs.LG eess.IV

    Meta-Sim2: Unsupervised Learning of Scene Structure for Synthetic Data Generation

    Authors: Jeevan Devaranjan, Amlan Kar, Sanja Fidler

    Abstract: Procedural models are being widely used to synthesize scenes for graphics, gaming, and to create (labeled) synthetic datasets for ML. In order to produce realistic and diverse scenes, a number of parameters governing the procedural models have to be carefully tuned by experts. These parameters control both the structure of scenes being generated (e.g. how many cars in the scene), as well as parame…

    Submitted 20 August, 2020; originally announced August 2020.

    Comments: ECCV 2020

  36. arXiv:2008.01701  [pdf, other]

    cs.CV

    Progressive Update Guided Interdependent Networks for Single Image Dehazing

    Authors: Aupendu Kar, Sobhan Kanti Dhara, Debashis Sen, Prabir Kumar Biswas

    Abstract: Images with haze of different varieties often pose a significant challenge to dehazing. Therefore, guidance by estimates of haze parameters related to the variety would be beneficial, and their progressive update jointly with haze reduction will allow effective dehazing. To this end, we propose a multi-network dehazing framework containing novel interdependent dehazing and haze parameter updater n…

    Submitted 7 June, 2023; v1 submitted 4 August, 2020; originally announced August 2020.

    Comments: First two authors contributed equally. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible. Project Website: https://aupendu.github.io/progressive-dehaze

  37. arXiv:2005.03795  [pdf]

    eess.SP cs.CV cs.HC cs.LG

    MLGaze: Machine Learning-Based Analysis of Gaze Error Patterns in Consumer Eye Tracking Systems

    Authors: Anuradha Kar

    Abstract: Analyzing the gaze accuracy characteristics of an eye tracker is a critical task as its gaze data is frequently affected by non-ideal operating conditions in various consumer eye tracking applications. In this study, gaze error patterns produced by a commercial eye tracking device were studied with the help of machine learning algorithms, such as classifiers and regression models. Gaze data were c…

    Submitted 7 May, 2020; originally announced May 2020.

    Comments: https://github.com/anuradhakar49/MLGaze

  38. arXiv:2004.08745  [pdf, other]

    cs.CV

    Learning to Evaluate Perception Models Using Planner-Centric Metrics

    Authors: Jonah Philion, Amlan Kar, Sanja Fidler

    Abstract: Variants of accuracy and precision are the gold-standard by which the computer vision community measures progress of perception algorithms. One reason for the ubiquity of these metrics is that they are largely task-agnostic; we in general seek to detect zero false negatives or positives. The downside of these metrics is that, at worst, they penalize all incorrect detections equally without conditi…

    Submitted 18 April, 2020; originally announced April 2020.

    Comments: CVPR 2020 poster

  39. arXiv:2004.05777  [pdf, other]

    cs.DC

    Intelligent Orchestration of ADAS Pipelines on Next Generation Automotive Platforms

    Authors: Anirban Ghose, Srijeeta Maity, Arijit Kar, Kaustubh Maloo, Soumyajit Dey

    Abstract: Advanced Driver-Assistance Systems (ADAS) is one of the primary drivers behind increasing levels of autonomy, driving comfort in this age of connected mobility. However, the performance of such systems is a function of execution rate which demands on-board platform-level support. With GPGPU platforms making their way into automobiles, there exists an opportunity to adaptively support high executio…

    Submitted 13 April, 2020; originally announced April 2020.

  40. arXiv:1910.02055  [pdf, other]

    cs.CV cs.GR cs.LG

    Neural Turtle Graphics for Modeling City Road Layouts

    Authors: Hang Chu, Daiqing Li, David Acuna, Amlan Kar, Maria Shugrina, Xinkai Wei, Ming-Yu Liu, Antonio Torralba, Sanja Fidler

    Abstract: We propose Neural Turtle Graphics (NTG), a novel generative model for spatial graphs, and demonstrate its applications in modeling city road layouts. Specifically, we represent the road layout using a graph where nodes in the graph represent control points and edges in the graph represent road segments. NTG is a sequential generative model parameterized by a neural network. It iteratively generate…

    Submitted 4 October, 2019; originally announced October 2019.

    Comments: ICCV-2019 Oral

  41. arXiv:1905.05765  [pdf, other]

    hep-th cs.CC quant-ph

    Quantum Complexity of Time Evolution with Chaotic Hamiltonians

    Authors: Vijay Balasubramanian, Matthew DeCross, Arjun Kar, Onkar Parrikar

    Abstract: We study the quantum complexity of time evolution in large-$N$ chaotic systems, with the SYK model as our main example. This complexity is expected to increase linearly for exponential time prior to saturating at its maximum value, and is related to the length of minimal geodesics on the manifold of unitary operators that act on Hilbert space. Using the Euler-Arnold formalism, we demonstrate that…

    Submitted 3 June, 2020; v1 submitted 14 May, 2019; originally announced May 2019.

    Comments: 35+11 pages, 13 figures, improved motivation of cost factors, improved discussion of superoperator corrections

  42. arXiv:1905.00889  [pdf, other]

    cs.CV cs.GR

    Local Light Field Fusion: Practical View Synthesis with Prescriptive Sampling Guidelines

    Authors: Ben Mildenhall, Pratul P. Srinivasan, Rodrigo Ortiz-Cayon, Nima Khademi Kalantari, Ravi Ramamoorthi, Ren Ng, Abhishek Kar

    Abstract: We present a practical and robust deep learning solution for capturing and rendering novel views of complex real world scenes for virtual exploration. Previous approaches either require intractably dense view sampling or provide little to no guidance for how users should sample views of a scene to reliably render high-quality novel views. Instead, we propose an algorithm for view synthesis from an…

    Submitted 2 May, 2019; originally announced May 2019.

    Comments: SIGGRAPH 2019. Project page with video and code: http://people.eecs.berkeley.edu/~bmild/llff/

  43. arXiv:1904.11621  [pdf, other]

    cs.CV cs.AI cs.GR

    Meta-Sim: Learning to Generate Synthetic Datasets

    Authors: Amlan Kar, Aayush Prakash, Ming-Yu Liu, Eric Cameracci, Justin Yuan, Matt Rusiniak, David Acuna, Antonio Torralba, Sanja Fidler

    Abstract: Training models to high-end performance requires availability of large labeled datasets, which are expensive to get. The goal of our work is to automatically synthesize labeled datasets that are relevant for a downstream task. We propose Meta-Sim, which learns a generative model of synthetic scenes, and obtain images as well as its corresponding ground-truth via a graphics engine. We parametrize o…

    Submitted 25 April, 2019; originally announced April 2019.

    Comments: Webpage: https://nv-tlabs.github.io/meta-sim/

  44. arXiv:1904.07934  [pdf, other]

    cs.CV cs.AI

    Devil is in the Edges: Learning Semantic Boundaries from Noisy Annotations

    Authors: David Acuna, Amlan Kar, Sanja Fidler

    Abstract: We tackle the problem of semantic boundary prediction, which aims to identify pixels that belong to object(class) boundaries. We notice that relevant datasets consist of a significant level of label noise, reflecting the fact that precise annotations are laborious to get and thus annotators trade-off quality with efficiency. We aim to learn sharp and precise semantic boundaries by explicitly reaso…

    Submitted 9 June, 2019; v1 submitted 16 April, 2019; originally announced April 2019.

    Comments: Accepted as a CVPR 2019 oral paper (Project Page: https://nv-tlabs.github.io/STEAL/)

    Journal ref: CVPR 2019

  45. arXiv:1903.09410  [pdf, other]

    cs.CV

    Fast Bayesian Uncertainty Estimation and Reduction of Batch Normalized Single Image Super-Resolution Network

    Authors: Aupendu Kar, Prabir Kumar Biswas

    Abstract: Convolutional neural network (CNN) has achieved unprecedented success in image super-resolution tasks in recent years. However, the network's performance depends on the distribution of the training sets and degrades on out-of-distribution samples. This paper adopts a Bayesian approach for estimating uncertainty associated with output and applies it in a deep image super-resolution model to address…

    Submitted 19 May, 2021; v1 submitted 22 March, 2019; originally announced March 2019.

    Comments: To appear in the Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2021)

  46. arXiv:1903.06874  [pdf, other]

    cs.CV cs.LG

    Fast Interactive Object Annotation with Curve-GCN

    Authors: Huan Ling, Jun Gao, Amlan Kar, Wenzheng Chen, Sanja Fidler

    Abstract: Manually labeling objects by tracing their boundaries is a laborious process. In Polygon-RNN++ the authors proposed Polygon-RNN that produces polygonal annotations in a recurrent manner using a CNN-RNN architecture, allowing interactive correction via humans-in-the-loop. We propose a new framework that alleviates the sequential nature of Polygon-RNN, by predicting all vertices simultaneously using…

    Submitted 15 March, 2019; originally announced March 2019.

    Comments: In Computer Vision and Pattern Recognition (CVPR), Long Beach, US, 2019

  47. arXiv:1901.05880  [pdf, other]

    cs.CV

    UltraCompression: Framework for High Density Compression of Ultrasound Volumes using Physics Modeling Deep Neural Networks

    Authors: Debarghya China, Francis Tom, Sumanth Nandamuri, Aupendu Kar, Mukundhan Srinivasan, Pabitra Mitra, Debdoot Sheet

    Abstract: Ultrasound image compression by preserving speckle-based key information is a challenging task. In this paper, we introduce an ultrasound image compression framework with the ability to retain realism of speckle appearance despite achieving very high-density compression factors. The compressor employs a tissue segmentation method, transmitting segments along with transducer frequency, number of sa…

    Submitted 17 January, 2019; originally announced January 2019.

    Comments: To appear in the Proceedings of the 2019 IEEE International Symposium on Biomedical Imaging (ISBI 2019); First three authors contributed equally

  48. arXiv:1901.01971  [pdf, other]

    cs.CV

    Learning Independent Object Motion from Unlabelled Stereoscopic Videos

    Authors: Zhe Cao, Abhishek Kar, Christian Haene, Jitendra Malik

    Abstract: We present a system for learning motion of independently moving objects from stereo videos. The only human annotation used in our system are 2D object bounding boxes which introduce the notion of objects to our system. Unlike prior learning based work which has focused on predicting dense pixel-wise optical flow field and/or a depth map for each image, we propose to predict object instance specifi…

    Submitted 8 January, 2019; v1 submitted 7 January, 2019; originally announced January 2019.

  49. arXiv:1812.00235  [pdf, other]

    cs.CV cs.CL

    Learning to Caption Images through a Lifetime by Asking Questions

    Authors: Kevin Shen, Amlan Kar, Sanja Fidler

    Abstract: In order to bring artificial agents into our lives, we will need to go beyond supervised learning on closed datasets to having the ability to continuously expand knowledge. Inspired by a student learning in a classroom, we present an agent that can continuously learn by posing natural language questions to humans. Our agent is composed of three interacting modules, one that performs captioning, an…

    Submitted 21 March, 2019; v1 submitted 1 December, 2018; originally announced December 2018.

    Comments: Fixed typos and added contribution list in intro, results remain the same

  50. arXiv:1806.10890  [pdf, other]

    cs.CV

    Efficient CNN Implementation for Eye-Gaze Estimation on Low-Power/Low-Quality Consumer Imaging Systems

    Authors: Joseph Lemley, Anuradha Kar, Alexandru Drimbarean, Peter Corcoran

    Abstract: Accurate and efficient eye gaze estimation is important for emerging consumer electronic systems such as driver monitoring systems and novel user interfaces. Such systems are required to operate reliably in difficult, unconstrained environments with low power consumption and at minimal cost. In this paper a new hardware friendly, convolutional neural network model with minimal computational requir…

    Submitted 28 June, 2018; originally announced June 2018.