Showing 1–16 of 16 results for author: Casser, V

  1. arXiv:2210.08113  [pdf, other]

    cs.CV

    Instance Segmentation with Cross-Modal Consistency

    Authors: Alex Zihao Zhu, Vincent Casser, Reza Mahjourian, Henrik Kretzschmar, Sören Pirk

    Abstract: Segmenting object instances is a key task in machine perception, with safety-critical applications in robotics and autonomous driving. We introduce a novel approach to instance segmentation that jointly leverages measurements from multiple sensor modalities, such as cameras and LiDAR. Our method learns to predict embeddings for each pixel or point that give rise to a dense segmentation of the scen…

    Submitted 14 October, 2022; originally announced October 2022.

    Comments: 8 pages, 9 figures, 5 tables. Presented at IROS 2022

  2. arXiv:2206.07705  [pdf, other]

    cs.CV

    LET-3D-AP: Longitudinal Error Tolerant 3D Average Precision for Camera-Only 3D Detection

    Authors: Wei-Chih Hung, Vincent Casser, Henrik Kretzschmar, Jyh-Jing Hwang, Dragomir Anguelov

    Abstract: The 3D Average Precision (3D AP) relies on the intersection over union between predictions and ground truth objects. However, camera-only detectors have limited depth accuracy, which may cause otherwise reasonable predictions that suffer from such longitudinal localization errors to be treated as false positives. We therefore propose variants of the 3D AP metric to be more permissive with respect…

    Submitted 3 May, 2024; v1 submitted 15 June, 2022; originally announced June 2022.

    Comments: Find the primary metrics for the 2022 Waymo Open Dataset 3D Camera-Only Detection Challenge at https://waymo.com/open/challenges/2022/3d-camera-only-detection/ . Find the code at https://github.com/waymo-research/waymo-open-dataset

  3. arXiv:2202.05263  [pdf, other]

    cs.CV cs.GR

    Block-NeRF: Scalable Large Scene Neural View Synthesis

    Authors: Matthew Tancik, Vincent Casser, Xinchen Yan, Sabeek Pradhan, Ben Mildenhall, Pratul P. Srinivasan, Jonathan T. Barron, Henrik Kretzschmar

    Abstract: We present Block-NeRF, a variant of Neural Radiance Fields that can represent large-scale environments. Specifically, we demonstrate that when scaling NeRF to render city-scale scenes spanning multiple blocks, it is vital to decompose the scene into individually trained NeRFs. This decomposition decouples rendering time from scene size, enables rendering to scale to arbitrarily large environments,…

    Submitted 10 February, 2022; originally announced February 2022.

    Comments: Project page: https://waymo.com/research/block-nerf/

  4. arXiv:2201.05938  [pdf, other]

    cs.LG cs.CV

    GradTail: Learning Long-Tailed Data Using Gradient-based Sample Weighting

    Authors: Zhao Chen, Vincent Casser, Henrik Kretzschmar, Dragomir Anguelov

    Abstract: We propose GradTail, an algorithm that uses gradients to improve model performance on the fly in the face of long-tailed training data distributions. Unlike conventional long-tail classifiers which operate on converged - and possibly overfit - models, we demonstrate that an approach based on gradient dot product agreement can isolate long-tailed data early on during model training and improve perf…

    Submitted 18 January, 2022; v1 submitted 15 January, 2022; originally announced January 2022.

    Comments: 15 pages (including Appendix), 8 figures

  5. arXiv:2109.01066  [pdf, other]

    cs.CV

    4D-Net for Learned Multi-Modal Alignment

    Authors: AJ Piergiovanni, Vincent Casser, Michael S. Ryoo, Anelia Angelova

    Abstract: We present 4D-Net, a 3D object detection approach, which utilizes 3D Point Cloud and RGB sensing information, both in time. We are able to incorporate the 4D information by performing a novel dynamic connection learning across various feature representations and levels of abstraction, as well as by observing geometric constraints. Our approach outperforms the state-of-the-art and strong baselines…

    Submitted 2 September, 2021; originally announced September 2021.

    Comments: ICCV 2021

  6. arXiv:2010.16404  [pdf, other]

    cs.CV cs.GR cs.LG cs.RO

    Unsupervised Monocular Depth Learning in Dynamic Scenes

    Authors: Hanhan Li, Ariel Gordon, Hang Zhao, Vincent Casser, Anelia Angelova

    Abstract: We present a method for jointly training the estimation of depth, ego-motion, and a dense 3D translation field of objects relative to the scene, with monocular photometric consistency being the sole source of supervision. We show that this apparently heavily underdetermined problem can be regularized by imposing the following prior knowledge about 3D translation fields: they are sparse, since most…

    Submitted 7 November, 2020; v1 submitted 30 October, 2020; originally announced October 2020.

    Comments: Accepted at 4th Conference on Robot Learning (CoRL 2020)

  7. arXiv:2009.02568  [pdf, other]

    cs.CV

    Multimodal Memorability: Modeling Effects of Semantics and Decay on Video Memorability

    Authors: Anelise Newman, Camilo Fosco, Vincent Casser, Allen Lee, Barry McNamara, Aude Oliva

    Abstract: A key capability of an intelligent system is deciding when events from past experience must be remembered and when they can be forgotten. Towards this goal, we develop a predictive model of human visual event memory and how those memories decay over time. We introduce Memento10k, a new, dynamic video memorability dataset containing human annotations at different viewing delays. Based on our findin…

    Submitted 5 September, 2020; originally announced September 2020.

    Comments: European Conference on Computer Vision

  8. arXiv:2008.02912  [pdf, other]

    cs.CV cs.GR cs.HC eess.IV

    Predicting Visual Importance Across Graphic Design Types

    Authors: Camilo Fosco, Vincent Casser, Amish Kumar Bedi, Peter O'Donovan, Aaron Hertzmann, Zoya Bylinskii

    Abstract: This paper introduces a Unified Model of Saliency and Importance (UMSI), which learns to predict visual importance in input graphic designs, and saliency in natural images, along with a new dataset and applications. Previous methods for predicting saliency or visual importance are trained individually on specialized datasets, making them limited in application and leading to poor generalization on…

    Submitted 6 August, 2020; originally announced August 2020.

    Journal ref: Proceedings of UIST 2020

  9. arXiv:2005.07289  [pdf, other]

    cs.CV cs.LG

    Taskology: Utilizing Task Relations at Scale

    Authors: Yao Lu, Sören Pirk, Jan Dlabal, Anthony Brohan, Ankita Pasad, Zhao Chen, Vincent Casser, Anelia Angelova, Ariel Gordon

    Abstract: Many computer vision tasks address the problem of scene understanding and are naturally interrelated, e.g., object classification, detection, scene segmentation, depth estimation, etc. We show that we can leverage the inherent relationships among collections of tasks, as they are trained jointly, supervising each other through their known relationships via consistency losses. Furthermore, explicitly…

    Submitted 17 March, 2021; v1 submitted 14 May, 2020; originally announced May 2020.

    Comments: IEEE Conference on Computer Vision and Pattern Recognition, 2021

  10. arXiv:1906.05717  [pdf, other]

    cs.CV cs.RO

    Unsupervised Monocular Depth and Ego-motion Learning with Structure and Semantics

    Authors: Vincent Casser, Soeren Pirk, Reza Mahjourian, Anelia Angelova

    Abstract: We present an approach which takes advantage of both structure and semantics for unsupervised monocular learning of depth and ego-motion. More specifically, we model the motion of individual objects and learn their 3D motion vector jointly with depth and ego-motion. We obtain more accurate results, especially for challenging dynamic scenes not addressed by previous approaches. This is an extended…

    Submitted 12 June, 2019; originally announced June 2019.

    Comments: CVPR Workshop on Visual Odometry & Computer Vision Applications Based on Location Clues (VOCVALC), 2019. This is an extension of arXiv:1811.06152: Depth Prediction Without the Sensors: Leveraging Structure for Unsupervised Learning from Monocular Videos. Thirty-Third AAAI Conference on Artificial Intelligence (AAAI'19)

  11. arXiv:1904.08801  [pdf, other]

    cs.RO cs.CV cs.LG

    Learning a Controller Fusion Network by Online Trajectory Filtering for Vision-based UAV Racing

    Authors: Matthias Müller, Guohao Li, Vincent Casser, Neil Smith, Dominik L. Michels, Bernard Ghanem

    Abstract: Autonomous UAV racing has recently emerged as an interesting research problem. The dream is to beat humans in this new fast-paced sport. A common approach is to learn an end-to-end policy that directly predicts controls from raw images by imitating an expert. However, such a policy is limited by the expert it imitates and scaling to other environments and vehicle dynamics is difficult. One approac…

    Submitted 18 April, 2019; originally announced April 2019.

    Comments: Accepted at CVPRW'19: UAVision 2019. First two authors contributed equally. Based on the initial work of arXiv:1803.01129 which was eventually split into two separate projects

  12. arXiv:1812.06024  [pdf, other]

    cs.CV

    Fast Mitochondria Detection for Connectomics

    Authors: Vincent Casser, Kai Kang, Hanspeter Pfister, Daniel Haehn

    Abstract: High-resolution connectomics data allows for the identification of dysfunctional mitochondria which are linked to a variety of diseases such as autism or bipolar disorder. However, manual analysis is not feasible since datasets can be petabytes in size. We present a fully automatic mitochondria detector based on a modified U-Net architecture that yields high accuracy and fast processing times. We evaluate…

    Submitted 19 June, 2020; v1 submitted 14 December, 2018; originally announced December 2018.

  13. arXiv:1811.06152  [pdf, other]

    cs.CV

    Depth Prediction Without the Sensors: Leveraging Structure for Unsupervised Learning from Monocular Videos

    Authors: Vincent Casser, Soeren Pirk, Reza Mahjourian, Anelia Angelova

    Abstract: Learning to predict scene depth from RGB inputs is a challenging task both for indoor and outdoor robot navigation. In this work we address unsupervised learning of scene depth and robot ego-motion where supervision is provided by monocular videos, as cameras are the cheapest, least restrictive and most ubiquitous sensor for robotics. Previous work in unsupervised image-to-depth learning has est…

    Submitted 14 November, 2018; originally announced November 2018.

    Comments: Thirty-Third AAAI Conference on Artificial Intelligence (AAAI'19)

  14. arXiv:1803.01129  [pdf, other]

    cs.CV cs.LG cs.RO

    OIL: Observational Imitation Learning

    Authors: Guohao Li, Matthias Müller, Vincent Casser, Neil Smith, Dominik L. Michels, Bernard Ghanem

    Abstract: Recent work has explored the problem of autonomous navigation by imitating a teacher and learning an end-to-end policy, which directly predicts controls from raw images. However, these approaches tend to be sensitive to mistakes by the teacher and do not scale well to other environments or vehicles. To this end, we propose Observational Imitation Learning (OIL), a novel imitation learning variant…

    Submitted 23 May, 2019; v1 submitted 3 March, 2018; originally announced March 2018.

    Comments: Accepted at RSS'19. First two authors contributed equally

  15. arXiv:1708.05884  [pdf, other]

    cs.CV

    Teaching UAVs to Race: End-to-End Regression of Agile Controls in Simulation

    Authors: Matthias Müller, Vincent Casser, Neil Smith, Dominik L. Michels, Bernard Ghanem

    Abstract: Automating the navigation of unmanned aerial vehicles (UAVs) in diverse scenarios has gained much attention in recent years. However, teaching UAVs to fly in challenging environments remains an unsolved problem, mainly due to the lack of training data. In this paper, we train a deep neural network to predict UAV controls from raw image data for the task of autonomous UAV racing in a photo-realisti…

    Submitted 22 November, 2018; v1 submitted 19 August, 2017; originally announced August 2017.

    Comments: Accepted at ECCVW'18

  16. Sim4CV: A Photo-Realistic Simulator for Computer Vision Applications

    Authors: Matthias Müller, Vincent Casser, Jean Lahoud, Neil Smith, Bernard Ghanem

    Abstract: We present a photo-realistic training and evaluation simulator (Sim4CV) with extensive applications across various fields of computer vision. Built on top of the Unreal Engine, the simulator integrates full featured physics based cars, unmanned aerial vehicles (UAVs), and animated human actors in diverse urban and suburban 3D environments. We demonstrate the versatility of the simulator with two c…

    Submitted 24 March, 2018; v1 submitted 19 August, 2017; originally announced August 2017.

    Comments: Published at the International Journal of Computer Vision (IJCV), 2018