Showing 1–50 of 64 results for author: Hays, J

  1. arXiv:2407.08711  [pdf, other]

    cs.CV cs.RO

    OmniNOCS: A unified NOCS dataset and model for 3D lifting of 2D objects

    Authors: Akshay Krishnan, Abhijit Kundu, Kevis-Kokitsi Maninis, James Hays, Matthew Brown

    Abstract: We propose OmniNOCS, a large-scale monocular dataset with 3D Normalized Object Coordinate Space (NOCS) maps, object masks, and 3D bounding box annotations for indoor and outdoor scenes. OmniNOCS has 20 times more object classes and 200 times more instances than existing NOCS datasets (NOCS-Real275, Wild6D). We use OmniNOCS to train a novel, transformer-based monocular NOCS prediction model (NOCSfo…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024, project website: https://omninocs.github.io

  2. arXiv:2407.04952  [pdf, other]

    cs.CL cs.CV

    Granular Privacy Control for Geolocation with Vision Language Models

    Authors: Ethan Mendes, Yang Chen, James Hays, Sauvik Das, Wei Xu, Alan Ritter

    Abstract: Vision Language Models (VLMs) are rapidly advancing in their capability to answer information-seeking questions. As these models are widely deployed in consumer applications, they could lead to new privacy risks due to emergent abilities to identify people in photos, geolocate images, etc. As we demonstrate, somewhat surprisingly, current open-source and proprietary VLMs are very capable image geo…

    Submitted 6 July, 2024; originally announced July 2024.

  3. arXiv:2406.19390  [pdf, other]

    cs.CV

    SALVe: Semantic Alignment Verification for Floorplan Reconstruction from Sparse Panoramas

    Authors: John Lambert, Yuguang Li, Ivaylo Boyadzhiev, Lambert Wixson, Manjunath Narayana, Will Hutchcroft, James Hays, Frank Dellaert, Sing Bing Kang

    Abstract: We propose a new system for automatic 2D floorplan reconstruction that is enabled by SALVe, our novel pairwise learned alignment verifier. The inputs to our system are sparsely located 360$^\circ$ panoramas, whose semantic features (windows, doors, and openings) are inferred and used to hypothesize pairwise room adjacency or overlap. SALVe initializes a pose graph, which is subsequently optimized…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: Accepted at ECCV 2022

  4. arXiv:2406.10115  [pdf, other]

    cs.CV cs.LG cs.RO

    Shelf-Supervised Multi-Modal Pre-Training for 3D Object Detection

    Authors: Mehar Khurana, Neehar Peri, Deva Ramanan, James Hays

    Abstract: State-of-the-art 3D object detectors are often trained on massive labeled datasets. However, annotating 3D bounding boxes remains prohibitively expensive and time-consuming, particularly for LiDAR. Instead, recent works demonstrate that self-supervised pre-training with unlabeled data can improve detection accuracy with limited labels. Contemporary methods adapt best practices for self-supervised…

    Submitted 14 June, 2024; originally announced June 2024.

  5. arXiv:2405.12978  [pdf, other]

    cs.CV

    Personalized Residuals for Concept-Driven Text-to-Image Generation

    Authors: Cusuh Ham, Matthew Fisher, James Hays, Nicholas Kolkin, Yuchen Liu, Richard Zhang, Tobias Hinz

    Abstract: We present personalized residuals and localized attention-guided sampling for efficient concept-driven generation using text-to-image diffusion models. Our method first represents concepts by freezing the weights of a pretrained text-conditioned diffusion model and learning low-rank residuals for a small subset of the model's layers. The residual-based approach then directly enables application of…

    Submitted 21 May, 2024; originally announced May 2024.

    Comments: CVPR 2024. Project page at https://cusuh.github.io/personalized-residuals

  6. arXiv:2403.04739  [pdf, other]

    cs.CV

    I Can't Believe It's Not Scene Flow!

    Authors: Ishan Khatri, Kyle Vedder, Neehar Peri, Deva Ramanan, James Hays

    Abstract: Current scene flow methods broadly fail to describe motion on small objects, and current scene flow evaluation protocols hide this failure by averaging over many points, with most drawn from larger objects. To fix this evaluation failure, we propose a new evaluation protocol, Bucket Normalized EPE, which is class-aware and speed-normalized, enabling contextualized error comparisons between object types…

    Submitted 7 March, 2024; originally announced March 2024.

    Comments: 13 pages, 3 pages of citations, 2 pages of supplemental

  7. arXiv:2310.12464  [pdf, other]

    cs.CV cs.RO

    Lidar Panoptic Segmentation and Tracking without Bells and Whistles

    Authors: Abhinav Agarwalla, Xuhua Huang, Jason Ziglar, Francesco Ferroni, Laura Leal-Taixé, James Hays, Aljoša Ošep, Deva Ramanan

    Abstract: State-of-the-art lidar panoptic segmentation (LPS) methods follow a bottom-up, segmentation-centric fashion wherein they build upon semantic segmentation networks by utilizing clustering to obtain object instances. In this paper, we re-think this approach and propose a surprisingly simple yet effective detection-centric network for both LPS and tracking. Our network is modular by design and optimized…

    Submitted 19 October, 2023; originally announced October 2023.

    Comments: IROS 2023. Code at https://github.com/abhinavagarwalla/most-lps

  8. arXiv:2310.03743  [pdf, other]

    cs.RO cs.LG

    The Un-Kidnappable Robot: Acoustic Localization of Sneaking People

    Authors: Mengyu Yang, Patrick Grady, Samarth Brahmbhatt, Arun Balajee Vasudevan, Charles C. Kemp, James Hays

    Abstract: How easy is it to sneak up on a robot? We examine whether we can detect people using only the incidental sounds they produce as they move, even when they try to be quiet. We collect a robotic dataset of high-quality 4-channel audio paired with 360 degree RGB data of people moving in different indoor settings. We train models that predict if there is a moving person nearby and their location using…

    Submitted 9 May, 2024; v1 submitted 5 October, 2023; originally announced October 2023.

    Comments: ICRA 2024 camera ready

  9. arXiv:2309.04605  [pdf, other]

    cs.DC cs.CY

    Evaluating Total Environmental Impact for a Computing Infrastructure

    Authors: Adrian Jackson, Jon Hays, Alex Owen, Nicholas Walton, Alison Packer, Anish Mudaraddi

    Abstract: In this paper we outline the results of a project to evaluate the total climate/carbon impact of a digital research infrastructure for a defined snapshot period. We outline the carbon model used to calculate the impact and the data collected to quantify that impact for a defined set of resources. We discuss the variation in potential impact across both the active and embodied carbon for computing…

    Submitted 8 September, 2023; originally announced September 2023.

  10. arXiv:2309.01202  [pdf, other]

    cs.GR cs.CV cs.MM cs.SD eess.AS

    MAGMA: Music Aligned Generative Motion Autodecoder

    Authors: Sohan Anisetty, Amit Raj, James Hays

    Abstract: Mapping music to dance is a challenging problem that requires spatial and temporal coherence along with a continual synchronization with the music's progression. Taking inspiration from large language models, we introduce a 2-step approach for generating dance using a Vector Quantized-Variational Autoencoder (VQ-VAE) to distill motion into primitives and train a Transformer decoder to learn the co…

    Submitted 3 September, 2023; originally announced September 2023.

  11. arXiv:2308.15268  [pdf, other]

    cs.RO

    Collision-Free Inverse Kinematics Through QP Optimization (iKinQP)

    Authors: Julia Ashkanazy, Ariana Spalter, Joe Hays, Laura Hiatt, Roxana Leontie, C. Glen Henshaw

    Abstract: Robotic manipulators are often designed with more actuated degrees-of-freedom than required to fully control an end effector's position and orientation. These "redundant" manipulators can allow infinite joint configurations that satisfy a particular task-space position and orientation, providing more possibilities for the manipulator to traverse a smooth collision-free trajectory. However, finding…

    Submitted 29 August, 2023; originally announced August 2023.

    Comments: 9 pages, 8 figures, 2 tables

  12. arXiv:2308.09105  [pdf, other]

    cs.CV cs.LG

    Learning Lightweight Object Detectors via Multi-Teacher Progressive Distillation

    Authors: Shengcao Cao, Mengtian Li, James Hays, Deva Ramanan, Yu-Xiong Wang, Liang-Yan Gui

    Abstract: Resource-constrained perception systems such as edge computing and vision-for-robotics require vision models to be both accurate and lightweight in computation and memory usage. While knowledge distillation is a proven strategy to enhance the performance of lightweight classification models, its application to structured outputs like object detection and instance segmentation remains a complicated…

    Submitted 17 August, 2023; originally announced August 2023.

    Comments: ICML 2023

  13. arXiv:2308.04054  [pdf, other]

    cs.CV cs.RO

    An Empirical Analysis of Range for 3D Object Detection

    Authors: Neehar Peri, Mengtian Li, Benjamin Wilson, Yu-Xiong Wang, James Hays, Deva Ramanan

    Abstract: LiDAR-based 3D detection plays a vital role in autonomous navigation. Surprisingly, although autonomous vehicles (AVs) must detect both near-field objects (for collision avoidance) and far-field objects (for longer-term planning), contemporary benchmarks focus only on near-field 3D detection. However, AVs must detect far-field objects for safe navigation. In this paper, we present an empirical ana…

    Submitted 8 August, 2023; originally announced August 2023.

    Comments: Accepted to ICCV 2023 Workshop - Robustness and Reliability of Autonomous Vehicles in the Open-World

  14. arXiv:2306.01906  [pdf, other]

    cs.RO cs.AI cs.LG cs.NE

    Synaptic motor adaptation: A three-factor learning rule for adaptive robotic control in spiking neural networks

    Authors: Samuel Schmidgall, Joe Hays

    Abstract: Legged robots operating in real-world environments must possess the ability to rapidly adapt to unexpected conditions, such as changing terrains and varying payloads. This paper introduces the Synaptic Motor Adaptation (SMA) algorithm, a novel approach to achieving real-time online adaptation in quadruped robots through the utilization of neuroscience-derived rules of synaptic plasticity with thre…

    Submitted 2 June, 2023; originally announced June 2023.

  15. arXiv:2305.10424  [pdf, other]

    cs.CV cs.LG

    ZeroFlow: Scalable Scene Flow via Distillation

    Authors: Kyle Vedder, Neehar Peri, Nathaniel Chodosh, Ishan Khatri, Eric Eaton, Dinesh Jayaraman, Yang Liu, Deva Ramanan, James Hays

    Abstract: Scene flow estimation is the task of describing the 3D motion field between temporally successive point clouds. State-of-the-art methods use strong priors and test-time optimization techniques, but require on the order of tens of seconds to process full-size point clouds, making them unusable as computer vision primitives for real-time applications such as open world object detection. Feedforward…

    Submitted 14 March, 2024; v1 submitted 17 May, 2023; originally announced May 2023.

    Comments: Accepted to ICLR 2024. 9 pages, 4 pages of citations, 6 pages of Supplemental. Project page with data releases is at http://vedder.io/zeroflow.html

  16. arXiv:2304.03280  [pdf, other]

    cs.CV

    LANe: Lighting-Aware Neural Fields for Compositional Scene Synthesis

    Authors: Akshay Krishnan, Amit Raj, Xianling Zhang, Alexandra Carlson, Nathan Tseng, Sandhya Sridhar, Nikita Jaipuria, James Hays

    Abstract: Neural fields have recently enjoyed great success in representing and rendering 3D scenes. However, most state-of-the-art implicit representations model static or dynamic scenes as a whole, with minor variations. Existing work on learning disentangled world and object neural fields does not consider the problem of composing objects into different world neural fields in a lighting-aware manner. We pr…

    Submitted 6 April, 2023; originally announced April 2023.

    Comments: Project website: https://lane-composition.github.io

  17. arXiv:2302.12764  [pdf, other]

    cs.CV

    Modulating Pretrained Diffusion Models for Multimodal Image Synthesis

    Authors: Cusuh Ham, James Hays, Jingwan Lu, Krishna Kumar Singh, Zhifei Zhang, Tobias Hinz

    Abstract: We present multimodal conditioning modules (MCM) for enabling conditional image synthesis using pretrained diffusion models. Previous multimodal synthesis works rely on training networks from scratch or fine-tuning pretrained networks, both of which are computationally expensive for large, state-of-the-art diffusion models. Our method uses pretrained networks but \textit{does not require any updat…

    Submitted 18 May, 2023; v1 submitted 24 February, 2023; originally announced February 2023.

    Comments: SIGGRAPH Conference Proceedings 2023. Project page at https://mcm-diffusion.github.io

  18. arXiv:2301.02310  [pdf, other]

    cs.CV

    PressureVision++: Estimating Fingertip Pressure from Diverse RGB Images

    Authors: Patrick Grady, Jeremy A. Collins, Chengcheng Tang, Christopher D. Twigg, Kunal Aneja, James Hays, Charles C. Kemp

    Abstract: Touch plays a fundamental role in manipulation for humans; however, machine perception of contact and pressure typically requires invasive sensors. Recent research has shown that deep models can estimate hand pressure based on a single RGB image. However, evaluations have been limited to controlled settings since collecting diverse data with ground-truth pressure measurements is difficult. We pres…

    Submitted 3 January, 2024; v1 submitted 5 January, 2023; originally announced January 2023.

    Comments: WACV 2024

  19. arXiv:2301.00493  [pdf, other]

    cs.CV cs.AI cs.LG cs.RO

    Argoverse 2: Next Generation Datasets for Self-Driving Perception and Forecasting

    Authors: Benjamin Wilson, William Qi, Tanmay Agarwal, John Lambert, Jagjeet Singh, Siddhesh Khandelwal, Bowen Pan, Ratnesh Kumar, Andrew Hartnett, Jhony Kaesemodel Pontes, Deva Ramanan, Peter Carr, James Hays

    Abstract: We introduce Argoverse 2 (AV2) - a collection of three datasets for perception and forecasting research in the self-driving domain. The annotated Sensor Dataset contains 1,000 sequences of multimodal data, encompassing high-resolution imagery from seven ring cameras, and two stereo cameras in addition to lidar point clouds, and 6-DOF map-aligned pose. Sequences contain 3D cuboid annotations for 26…

    Submitted 1 January, 2023; originally announced January 2023.

    Comments: Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks

  20. arXiv:2212.07312  [pdf, other]

    cs.CV

    Trust, but Verify: Cross-Modality Fusion for HD Map Change Detection

    Authors: John Lambert, James Hays

    Abstract: High-definition (HD) map change detection is the task of determining when sensor data and map data are no longer in agreement with one another due to real-world changes. We collect the first dataset for the task, which we entitle the Trust, but Verify (TbV) dataset, by mining thousands of hours of data from over 9 months of autonomous vehicle fleet operations. We present learning-based formulation…

    Submitted 14 December, 2022; originally announced December 2022.

    Comments: NeurIPS 2021, Track on Datasets and Benchmarks. Project page: https://tbv-dataset.github.io/

  21. arXiv:2211.13858  [pdf, other]

    cs.CV cs.AI cs.LG cs.RO

    Far3Det: Towards Far-Field 3D Detection

    Authors: Shubham Gupta, Jeet Kanjani, Mengtian Li, Francesco Ferroni, James Hays, Deva Ramanan, Shu Kong

    Abstract: We focus on the task of far-field 3D detection (Far3Det) of objects beyond a certain distance from an observer, e.g., $>$50m. Far3Det is particularly important for autonomous vehicles (AVs) operating at highway speeds, which require detections of far-field obstacles to ensure sufficient braking distances. However, contemporary AV benchmarks such as nuScenes underemphasize this problem because they…

    Submitted 24 November, 2022; originally announced November 2022.

    Comments: WACV 2023 12 Pages, 8 Figures, 10 Tables

  22. arXiv:2211.04625  [pdf, other]

    cs.CV

    Soft Augmentation for Image Classification

    Authors: Yang Liu, Shen Yan, Laura Leal-Taixé, James Hays, Deva Ramanan

    Abstract: Modern neural networks are over-parameterized and thus rely on strong regularization such as data augmentation and weight decay to reduce overfitting and improve generalization. The dominant form of data augmentation applies invariant transforms, where the learning target of a sample is invariant to the transform applied to that sample. We draw inspiration from human visual classification studies…

    Submitted 23 January, 2024; v1 submitted 8 November, 2022; originally announced November 2022.

    Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2023 (pp. 16241-16250)

  23. arXiv:2208.03354  [pdf, other]

    cs.CV cs.LG

    A Sketch Is Worth a Thousand Words: Image Retrieval with Text and Sketch

    Authors: Patsorn Sangkloy, Wittawat Jitkrittum, Diyi Yang, James Hays

    Abstract: We address the problem of retrieving images with both a sketch and a text query. We present TASK-former (Text And SKetch transformer), an end-to-end trainable model for image retrieval using a text description and a sketch as input. We argue that both input modalities complement each other in a manner that cannot be achieved easily by either one alone. TASK-former follows the late-fusion dual-enco…

    Submitted 5 August, 2022; originally announced August 2022.

    Comments: ECCV 2022

  24. arXiv:2206.12520  [pdf, other]

    cs.NE cs.LG

    Learning to learn online with neuromodulated synaptic plasticity in spiking neural networks

    Authors: Samuel Schmidgall, Joe Hays

    Abstract: We propose that in order to harness our understanding of neuroscience toward machine learning, we must first have powerful tools for training brain-like models of learning. Although substantial progress has been made toward understanding the dynamics of learning in the brain, neuroscience-derived models of learning have yet to demonstrate the same performance capabilities as methods in deep learni…

    Submitted 27 June, 2022; v1 submitted 24 June, 2022; originally announced June 2022.

  25. arXiv:2204.07268  [pdf, other]

    cs.RO

    Visual Pressure Estimation and Control for Soft Robotic Grippers

    Authors: Patrick Grady, Jeremy A. Collins, Samarth Brahmbhatt, Christopher D. Twigg, Chengcheng Tang, James Hays, Charles C. Kemp

    Abstract: Soft robotic grippers facilitate contact-rich manipulation, including robust grasping of varied objects. Yet the beneficial compliance of a soft gripper also results in significant deformation that can make precision manipulation challenging. We present visual pressure estimation & control (VPEC), a method that infers pressure applied by a soft gripper using an RGB image from an external camera. W…

    Submitted 9 August, 2022; v1 submitted 14 April, 2022; originally announced April 2022.

    Comments: IROS 2022

  26. arXiv:2203.15798  [pdf, other]

    cs.CV

    DRaCoN -- Differentiable Rasterization Conditioned Neural Radiance Fields for Articulated Avatars

    Authors: Amit Raj, Umar Iqbal, Koki Nagano, Sameh Khamis, Pavlo Molchanov, James Hays, Jan Kautz

    Abstract: Acquisition and creation of digital human avatars is an important problem with applications to virtual telepresence, gaming, and human modeling. Most contemporary approaches for avatar generation can be viewed either as 3D-based methods, which use multi-view data to learn a 3D representation with appearance (such as a mesh, implicit surface, or volume), or 2D-based methods which learn photo-realis…

    Submitted 29 March, 2022; originally announced March 2022.

    Comments: Project page at https://dracon-avatars.github.io/

  27. arXiv:2203.10385  [pdf, other]

    cs.CV

    PressureVision: Estimating Hand Pressure from a Single RGB Image

    Authors: Patrick Grady, Chengcheng Tang, Samarth Brahmbhatt, Christopher D. Twigg, Chengde Wan, James Hays, Charles C. Kemp

    Abstract: People often interact with their surroundings by applying pressure with their hands. While hand pressure can be measured by placing pressure sensors between the hand and the environment, doing so can alter contact mechanics, interfere with human tactile perception, require costly sensors, and scale poorly to large environments. We explore the possibility of using a conventional RGB camera to infer…

    Submitted 30 September, 2022; v1 submitted 19 March, 2022; originally announced March 2022.

    Comments: ECCV 2022 oral

  28. arXiv:2203.09554  [pdf, other]

    cs.CV

    CoGS: Controllable Generation and Search from Sketch and Style

    Authors: Cusuh Ham, Gemma Canet Tarres, Tu Bui, James Hays, Zhe Lin, John Collomosse

    Abstract: We present CoGS, a novel method for the style-conditioned, sketch-driven synthesis of images. CoGS enables exploration of diverse appearance possibilities for a given sketched object, enabling decoupled control over the structure and the appearance of the output. Coarse-grained control over object structure and appearance is enabled via an input sketch and an exemplar "style" conditioning image t…

    Submitted 20 July, 2022; v1 submitted 17 March, 2022; originally announced March 2022.

  29. arXiv:2112.13762  [pdf, other]

    cs.CV

    MSeg: A Composite Dataset for Multi-domain Semantic Segmentation

    Authors: John Lambert, Zhuang Liu, Ozan Sener, James Hays, Vladlen Koltun

    Abstract: We present MSeg, a composite dataset that unifies semantic segmentation datasets from different domains. A naive merge of the constituent datasets yields poor performance due to inconsistent taxonomies and annotation practices. We reconcile the taxonomies and bring the pixel-level annotations into alignment by relabeling more than 220,000 object masks in more than 80,000 images, requiring more tha…

    Submitted 27 December, 2021; originally announced December 2021.

  30. arXiv:2111.09930  [pdf, other]

    cs.LG

    Learning To Estimate Regions Of Attraction Of Autonomous Dynamical Systems Using Physics-Informed Neural Networks

    Authors: Cody Scharzenberger, Joe Hays

    Abstract: When learning to perform motor tasks in a simulated environment, neural networks must be allowed to explore their action space to discover new potentially viable solutions. However, in an online learning scenario with physical hardware, this exploration must be constrained by relevant safety considerations in order to avoid damage to the agent's hardware and environment. We aim to address this pro…

    Submitted 18 November, 2021; originally announced November 2021.

    Comments: 31 pages, 17 figures

  31. arXiv:2111.04113  [pdf, other]

    cs.NE

    Stable Lifelong Learning: Spiking neurons as a solution to instability in plastic neural networks

    Authors: Samuel Schmidgall, Joe Hays

    Abstract: Synaptic plasticity poses itself as a powerful method of self-regulated unsupervised learning in neural networks. A recent resurgence of interest has developed in utilizing Artificial Neural Networks (ANNs) together with synaptic plasticity for intra-lifetime learning. Plasticity has been shown to improve the learning capabilities of these networks in generalizing to novel environmental circumstan…

    Submitted 7 November, 2021; originally announced November 2021.

  32. arXiv:2109.08057  [pdf, other]

    cs.NE

    Evolutionary Self-Replication as a Mechanism for Producing Artificial Intelligence

    Authors: Samuel Schmidgall, Joseph Hays

    Abstract: Can reproduction alone in the context of survival produce intelligence in our machines? In this work, self-replication is explored as a mechanism for the emergence of intelligent behavior in modern learning environments. By focusing purely on survival, while undergoing natural selection, evolved organisms are shown to produce meaningful, complex, and intelligent behavior, demonstrating creative so…

    Submitted 23 September, 2022; v1 submitted 16 September, 2021; originally announced September 2021.

  33. SpikePropamine: Differentiable Plasticity in Spiking Neural Networks

    Authors: Samuel Schmidgall, Julia Ashkanazy, Wallace Lawson, Joe Hays

    Abstract: The adaptive changes in synaptic efficacy that occur between spiking neurons have been demonstrated to play a critical role in learning for biological neural networks. Despite this source of inspiration, many learning focused applications using Spiking Neural Networks (SNNs) retain static synaptic connections, preventing additional learning after the initial training period. Here, we introduce a f…

    Submitted 4 June, 2021; originally announced June 2021.

    Journal ref: Frontiers in Neurorobotics, 22 September 2021

  34. arXiv:2101.02697  [pdf, other]

    cs.CV

    PVA: Pixel-aligned Volumetric Avatars

    Authors: Amit Raj, Michael Zollhoefer, Tomas Simon, Jason Saragih, Shunsuke Saito, James Hays, Stephen Lombardi

    Abstract: Acquisition and rendering of photo-realistic human heads is a highly challenging research problem of particular importance for virtual telepresence. Currently, the highest quality is achieved by volumetric approaches trained in a person specific manner on multi-view data. These models better represent fine structure, such as hair, compared to simpler mesh-based models. Volumetric models typically…

    Submitted 7 January, 2021; originally announced January 2021.

    Comments: Project page located at https://volumetric-avatars.github.io/

  35. arXiv:2012.12890  [pdf, other]

    cs.CV

    ANR: Articulated Neural Rendering for Virtual Avatars

    Authors: Amit Raj, Julian Tanke, James Hays, Minh Vo, Carsten Stoll, Christoph Lassner

    Abstract: The combination of traditional rendering with neural networks in Deferred Neural Rendering (DNR) provides a compelling balance between computational complexity and realism of the resulting images. Using skinned meshes for rendering articulating objects is a natural extension for the DNR framework and would open it up to a plethora of applications. However, in this case the neural shading step must…

    Submitted 23 December, 2020; originally announced December 2020.

  36. arXiv:2011.00320  [pdf, other]

    cs.CV cs.LG cs.RO

    Scene Flow from Point Clouds with or without Learning

    Authors: Jhony Kaesemodel Pontes, James Hays, Simon Lucey

    Abstract: Scene flow is the three-dimensional (3D) motion field of a scene. It provides information about the spatial arrangement and rate of change of objects in dynamic environments. Current learning-based approaches seek to estimate the scene flow directly from point clouds and have achieved state-of-the-art performance. However, supervised learning methods are inherently domain specific and require a la…

    Submitted 31 October, 2020; originally announced November 2020.

    Comments: International Conference on 3D Vision (3DV 2020)

  37. arXiv:2008.10592  [pdf, other]

    cs.CV cs.AI cs.LG

    3D for Free: Crossmodal Transfer Learning using HD Maps

    Authors: Benjamin Wilson, Zsolt Kira, James Hays

    Abstract: 3D object detection is a core perceptual challenge for robotics and autonomous driving. However, the class-taxonomies in modern autonomous driving datasets are significantly smaller than many influential 2D detection datasets. In this work, we address the long-tail problem by leveraging both the large class-taxonomies of modern 2D datasets and the robustness of state-of-the-art 2D detection method…

    Submitted 24 August, 2020; originally announced August 2020.

  38. arXiv:2008.08115  [pdf, other]

    cs.CV

    TIDE: A General Toolbox for Identifying Object Detection Errors

    Authors: Daniel Bolya, Sean Foley, James Hays, Judy Hoffman

    Abstract: We introduce TIDE, a framework and associated toolbox for analyzing the sources of error in object detection and instance segmentation algorithms. Importantly, our framework is applicable across datasets and can be applied directly to output prediction files without required knowledge of the underlying prediction system. Thus, our framework can be used as a drop-in replacement for the standard mAP…

    Submitted 31 August, 2020; v1 submitted 18 August, 2020; originally announced August 2020.

    Comments: Updated LVIS results with the v1.0.1 error calculation

  39. arXiv:2007.09545  [pdf, other]

    cs.CV

    ContactPose: A Dataset of Grasps with Object Contact and Hand Pose

    Authors: Samarth Brahmbhatt, Chengcheng Tang, Christopher D. Twigg, Charles C. Kemp, James Hays

    Abstract: Grasping is natural for humans. However, it involves complex hand configurations and soft tissue deformation that can result in complicated regions of contact between the hand and the object. Understanding and modeling this contact can potentially improve hand models, AR/VR experiences, and robotic grasping. Yet, we currently lack datasets of hand-object contact paired with other data modalities,…

    Submitted 18 July, 2020; originally announced July 2020.

    Comments: The European Conference on Computer Vision (ECCV) 2020

  40. arXiv:1911.02620  [pdf, other]

    cs.CV cs.RO

    Argoverse: 3D Tracking and Forecasting with Rich Maps

    Authors: Ming-Fang Chang, John Lambert, Patsorn Sangkloy, Jagjeet Singh, Slawomir Bak, Andrew Hartnett, De Wang, Peter Carr, Simon Lucey, Deva Ramanan, James Hays

    Abstract: We present Argoverse -- two datasets designed to support autonomous vehicle machine learning tasks such as 3D tracking and motion forecasting. Argoverse was collected by a fleet of autonomous vehicles in Pittsburgh and Miami. The Argoverse 3D Tracking dataset includes 360 degree images from 7 cameras with overlapping fields of view, 3D point clouds from long range LiDAR, 6-DOF pose, and 3D track a…

    Submitted 6 November, 2019; originally announced November 2019.

    Comments: CVPR 2019

  41. arXiv:1907.07388  [pdf, other]

    cs.CV

    Towards Markerless Grasp Capture

    Authors: Samarth Brahmbhatt, Charles C. Kemp, James Hays

    Abstract: Humans excel at grasping objects and manipulating them. Capturing human grasps is important for understanding grasping behavior and reconstructing it realistically in Virtual Reality (VR). However, grasp capture - capturing the pose of a hand grasping an object, and orienting it w.r.t. the object - is difficult because of the complexity and diversity of the human hand, and occlusion. Reflective ma…

    Submitted 17 July, 2019; originally announced July 2019.

    Comments: Third Workshop on Computer Vision for AR/VR, CVPR 2019

  42. arXiv:1905.05882  [pdf, other]

    cs.LG cs.CV stat.ML

    Kernel Mean Matching for Content Addressability of GANs

    Authors: Wittawat Jitkrittum, Patsorn Sangkloy, Muhammad Waleed Gondal, Amit Raj, James Hays, Bernhard Schölkopf

    Abstract: We propose a novel procedure which adds "content-addressability" to any given unconditional implicit model e.g., a generative adversarial network (GAN). The procedure allows users to control the generative process by specifying a set (arbitrary size) of desired examples based on which similar samples are generated from the model. The proposed approach, based on kernel mean matching, is applicable…

    Submitted 14 May, 2019; originally announced May 2019.

    Comments: Wittawat Jitkrittum and Patsorn Sangkloy contributed equally to this work

  43. arXiv:1904.06830  [pdf, other]

    cs.CV

    ContactDB: Analyzing and Predicting Grasp Contact via Thermal Imaging

    Authors: Samarth Brahmbhatt, Cusuh Ham, Charles C. Kemp, James Hays

    Abstract: Grasping and manipulating objects is an important human skill. Since hand-object contact is fundamental to grasping, capturing it can lead to important insights. However, observing contact through external sensors is challenging because of occlusion and the complexity of the human hand. We present ContactDB, a novel dataset of contact maps for household objects that captures the rich hand-object c…

    Submitted 14 April, 2019; originally announced April 2019.

    Comments: CVPR 2019 Oral

  44. arXiv:1904.03754  [pdf, other]

    cs.RO cs.CV

    ContactGrasp: Functional Multi-finger Grasp Synthesis from Contact

    Authors: Samarth Brahmbhatt, Ankur Handa, James Hays, Dieter Fox

    Abstract: Grasping and manipulating objects is an important human skill. Since most objects are designed to be manipulated by human hands, anthropomorphic hands can enable richer human-robot interaction. Desirable grasps are not only stable, but also functional: they enable post-grasp actions with the object. However, functional grasp synthesis for high degree-of-freedom anthropomorphic hands from object sh…

    Submitted 25 July, 2019; v1 submitted 7 April, 2019; originally announced April 2019.

    Comments: IROS 2019 camera ready version

  45. arXiv:1903.00793  [pdf, other]

    cs.CV

    Let's Transfer Transformations of Shared Semantic Representations

    Authors: Nam Vo, Lu Jiang, James Hays

    Abstract: With a good image understanding capability, can we manipulate the images high level semantic representation? Such transformation operation can be used to generate or retrieve similar images but with a desired modification (for example changing beach background to street background); similar ability has been demonstrated in zero shot learning, attribute composition and attribute manipulation image…

    Submitted 2 March, 2019; originally announced March 2019.

  46. arXiv:1812.07119  [pdf, other]

    cs.CV

    Composing Text and Image for Image Retrieval - An Empirical Odyssey

    Authors: Nam Vo, Lu Jiang, Chen Sun, Kevin Murphy, Li-Jia Li, Li Fei-Fei, James Hays

    Abstract: In this paper, we study the task of image retrieval, where the input query is specified in the form of an image plus some text that describes desired modifications to the input image. For example, we may present an image of the Eiffel tower, and ask the system to find images which are visually similar but are modified in small ways, such as being taken at nighttime instead of during the day. To ta…

    Submitted 17 December, 2018; originally announced December 2018.

  47. arXiv:1810.11630  [pdf, other]

    stat.ML cs.LG

    Informative Features for Model Comparison

    Authors: Wittawat Jitkrittum, Heishiro Kanagawa, Patsorn Sangkloy, James Hays, Bernhard Schölkopf, Arthur Gretton

    Abstract: Given two candidate models, and a set of target observations, we address the problem of measuring the relative goodness of fit of the two models. We propose two new statistical tests which are nonparametric, computationally efficient (runtime complexity is linear in the sample size), and interpretable. As a unique advantage, our tests can produce a set of examples (informative features) indicating…

    Submitted 27 October, 2018; originally announced October 2018.

    Comments: Accepted to NIPS 2018

    MSC Class: 46E22; 62G10 ACM Class: G.3; I.2.6

  48. arXiv:1803.03310  [pdf, other]

    cs.CV

    Generalization in Metric Learning: Should the Embedding Layer be the Embedding Layer?

    Authors: Nam Vo, James Hays

    Abstract: This work studies deep metric learning under small to medium scale data as we believe that better generalization could be a contributing factor to the improvement of previous fine-grained image retrieval methods; it should be considered when designing future techniques. In particular, we investigate using other layers in a deep metric learning system (besides the embedding layer) for feature extra…

    Submitted 10 December, 2018; v1 submitted 8 March, 2018; originally announced March 2018.

    Comments: new version for WACV

  49. arXiv:1801.07388  [pdf, other]

    cs.CV

    Let's Dance: Learning From Online Dance Videos

    Authors: Daniel Castro, Steven Hickson, Patsorn Sangkloy, Bhavishya Mittal, Sean Dai, James Hays, Irfan Essa

    Abstract: In recent years, deep neural network approaches have naturally extended to the video domain, in their simplest case by aggregating per-frame classifications as a baseline for action recognition. A majority of the work in this area extends from the imaging domain, leading to visual-feature heavy approaches on temporal data. To address this issue we introduce "Let's Dance", a 1000 video dataset (and…

    Submitted 22 January, 2018; originally announced January 2018.

    Comments: first submitted November 2016

    ACM Class: I.4; I.5; I.5.1

  50. arXiv:1801.02753  [pdf, other]

    cs.CV

    SketchyGAN: Towards Diverse and Realistic Sketch to Image Synthesis

    Authors: Wengling Chen, James Hays

    Abstract: Synthesizing realistic images from human drawn sketches is a challenging problem in computer graphics and vision. Existing approaches either need exact edge maps, or rely on retrieval of existing photographs. In this work, we propose a novel Generative Adversarial Network (GAN) approach that synthesizes plausible images from 50 categories including motorcycles, horses and couches. We demonstrate a…

    Submitted 12 April, 2018; v1 submitted 8 January, 2018; originally announced January 2018.

    Comments: Accepted to CVPR 2018