Showing 1–50 of 92 results for author: Lepetit, V

  1. arXiv:2405.11977  [pdf, other]

    cs.CV

    GuidedRec: Guiding Ill-Posed Unsupervised Volumetric Recovery

    Authors: Alexandre Cafaro, Amaury Leroy, Guillaume Beldjoudi, Pauline Maury, Charlotte Robert, Eric Deutsch, Vincent Grégoire, Vincent Lepetit, Nikos Paragios

    Abstract: We introduce a novel unsupervised approach to reconstructing a 3D volume from only two planar projections that exploits a previously-captured 3D volume of the patient. Such a volume is readily available in many important medical procedures and previous methods already used such a volume. Earlier methods that work by deforming this volume to match the projections typically fail when the number of p…

    Submitted 20 May, 2024; originally announced May 2024.

  2. arXiv:2404.10620  [pdf, other]

    cs.CV cs.LG

    PyTorchGeoNodes: Enabling Differentiable Shape Programs for 3D Shape Reconstruction

    Authors: Sinisa Stekovic, Stefan Ainetter, Mattia D'Urso, Friedrich Fraundorfer, Vincent Lepetit

    Abstract: We propose PyTorchGeoNodes, a differentiable module for reconstructing 3D objects from images using interpretable shape programs. In comparison to traditional CAD model retrieval methods, the use of shape programs for 3D reconstruction allows for reasoning about the semantic properties of reconstructed objects, editing, low memory footprint, etc. However, the utilization of shape programs for 3D s…

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: In Submission

  3. arXiv:2403.14554  [pdf, other]

    cs.CV cs.GR

    Gaussian Frosting: Editable Complex Radiance Fields with Real-Time Rendering

    Authors: Antoine Guédon, Vincent Lepetit

    Abstract: We propose Gaussian Frosting, a novel mesh-based representation for high-quality rendering and editing of complex 3D effects in real-time. Our approach builds on the recent 3D Gaussian Splatting framework, which optimizes a set of 3D Gaussians to approximate a radiance field from images. We propose first extracting a base mesh from Gaussians during optimization, then building and refining an adapt…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: Project Webpage: https://anttwo.github.io/frosting/

  4. arXiv:2403.09799  [pdf, other]

    cs.CV cs.RO

    BOP Challenge 2023 on Detection, Segmentation and Pose Estimation of Seen and Unseen Rigid Objects

    Authors: Tomas Hodan, Martin Sundermeyer, Yann Labbe, Van Nguyen Nguyen, Gu Wang, Eric Brachmann, Bertram Drost, Vincent Lepetit, Carsten Rother, Jiri Matas

    Abstract: We present the evaluation methodology, datasets and results of the BOP Challenge 2023, the fifth in a series of public competitions organized to capture the state of the art in model-based 6D object pose estimation from an RGB/RGB-D image and related tasks. Besides the three tasks from 2022 (model-based 2D detection, 2D segmentation, and 6D localization of objects seen during training), the 2023 c…

    Submitted 16 April, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2302.13075

  5. arXiv:2312.04527  [pdf, other]

    cs.CV

    Correspondences of the Third Kind: Camera Pose Estimation from Object Reflection

    Authors: Kohei Yamashita, Vincent Lepetit, Ko Nishino

    Abstract: Computer vision has long relied on two kinds of correspondences: pixel correspondences in images and 3D correspondences on object surfaces. Is there another kind, and if there is, what can they do for us? In this paper, we introduce correspondences of the third kind we call reflection correspondences and show that they can help estimate camera pose by just looking at objects without relying on the…

    Submitted 7 December, 2023; originally announced December 2023.

  6. arXiv:2311.14155  [pdf, other]

    cs.CV

    GigaPose: Fast and Robust Novel Object Pose Estimation via One Correspondence

    Authors: Van Nguyen Nguyen, Thibault Groueix, Mathieu Salzmann, Vincent Lepetit

    Abstract: We present GigaPose, a fast, robust, and accurate method for CAD-based novel object pose estimation in RGB images. GigaPose first leverages discriminative "templates", rendered images of the CAD models, to recover the out-of-plane rotation and then uses patch correspondences to estimate the four remaining parameters. Our approach samples templates in only a two-degrees-of-freedom space instead of…

    Submitted 15 March, 2024; v1 submitted 23 November, 2023; originally announced November 2023.

    Comments: CVPR 2024

  7. arXiv:2311.12775  [pdf, other]

    cs.GR cs.CV

    SuGaR: Surface-Aligned Gaussian Splatting for Efficient 3D Mesh Reconstruction and High-Quality Mesh Rendering

    Authors: Antoine Guédon, Vincent Lepetit

    Abstract: We propose a method to allow precise and extremely fast mesh extraction from 3D Gaussian Splatting. Gaussian Splatting has recently become very popular as it yields realistic rendering while being significantly faster to train than NeRFs. It is however challenging to extract a mesh from the millions of tiny 3D gaussians as these gaussians tend to be unorganized after optimization and no method has…

    Submitted 2 December, 2023; v1 submitted 21 November, 2023; originally announced November 2023.

    Comments: We identified a minor typographical error in Equation 6; We updated the paper accordingly. Project Webpage: https://anttwo.github.io/sugar/

  8. arXiv:2310.17281  [pdf, other]

    cs.CV cs.LG

    BEVContrast: Self-Supervision in BEV Space for Automotive Lidar Point Clouds

    Authors: Corentin Sautier, Gilles Puy, Alexandre Boulch, Renaud Marlet, Vincent Lepetit

    Abstract: We present a surprisingly simple and efficient method for self-supervision of 3D backbone on automotive Lidar point clouds. We design a contrastive loss between features of Lidar scans captured in the same scene. Several such approaches have been proposed in the literature from PointContrast, which uses a contrast at the level of points, to the state-of-the-art TARL, which uses a contrast at the…

    Submitted 26 October, 2023; originally announced October 2023.

    Comments: Accepted to 3DV 2024

  9. arXiv:2309.06107  [pdf, other]

    cs.CV

    HOC-Search: Efficient CAD Model and Pose Retrieval from RGB-D Scans

    Authors: Stefan Ainetter, Sinisa Stekovic, Friedrich Fraundorfer, Vincent Lepetit

    Abstract: We present an automated and efficient approach for retrieving high-quality CAD models of objects and their poses in a scene captured by a moving RGB-D camera. We first investigate various objective functions to measure similarity between a candidate CAD object model and the available data, and the best objective function appears to be a "render-and-compare" method comparing depth and mask renderin…

    Submitted 12 September, 2023; originally announced September 2023.

  10. arXiv:2307.11067  [pdf, other]

    cs.CV

    CNOS: A Strong Baseline for CAD-based Novel Object Segmentation

    Authors: Van Nguyen Nguyen, Thibault Groueix, Georgy Ponimatkin, Vincent Lepetit, Tomas Hodan

    Abstract: We propose a simple three-stage approach to segment unseen objects in RGB images using their CAD models. Leveraging recent powerful foundation models, DINOv2 and Segment Anything, we create descriptors and generate proposals, including binary masks for a given input RGB image. By matching proposals with reference descriptors created from CAD models, we achieve precise object ID assignment along wi…

    Submitted 25 August, 2023; v1 submitted 20 July, 2023; originally announced July 2023.

    Comments: ICCV 2023, R6D Workshop

  11. arXiv:2304.11762  [pdf, other]

    cs.CV

    You Never Get a Second Chance To Make a Good First Impression: Seeding Active Learning for 3D Semantic Segmentation

    Authors: Nermin Samet, Oriane Siméoni, Gilles Puy, Georgy Ponimatkin, Renaud Marlet, Vincent Lepetit

    Abstract: We propose SeedAL, a method to seed active learning for efficient annotation of 3D point clouds for semantic segmentation. Active Learning (AL) iteratively selects relevant data fractions to annotate within a given budget, but requires a first fraction of the dataset (a 'seed') to be already annotated to estimate the benefit of annotating other data fractions. We first show that the choice of the…

    Submitted 19 September, 2023; v1 submitted 23 April, 2023; originally announced April 2023.

    Comments: ICCV 2023

  12. arXiv:2303.13612  [pdf, other]

    cs.CV

    NOPE: Novel Object Pose Estimation from a Single Image

    Authors: Van Nguyen Nguyen, Thibault Groueix, Yinlin Hu, Mathieu Salzmann, Vincent Lepetit

    Abstract: The practicality of 3D object pose estimation remains limited for many applications due to the need for prior knowledge of a 3D model and a training period for new objects. To address this limitation, we propose an approach that takes a single image of a new object as input and predicts the relative pose of this object in new images without prior knowledge of the object's 3D model and without requ…

    Submitted 29 March, 2024; v1 submitted 23 March, 2023; originally announced March 2023.

    Comments: CVPR 2024

  13. arXiv:2303.03315  [pdf, other]

    cs.CV cs.AI cs.RO

    MACARONS: Mapping And Coverage Anticipation with RGB Online Self-Supervision

    Authors: Antoine Guédon, Tom Monnier, Pascal Monasse, Vincent Lepetit

    Abstract: We introduce a method that simultaneously learns to explore new large environments and to reconstruct them in 3D from color images only. This is closely related to the Next Best View problem (NBV), where one has to identify where to move the camera next to improve the coverage of an unknown scene. However, most of the current NBV methods rely on depth sensors, need 3D supervision and/or do not sca…

    Submitted 13 June, 2023; v1 submitted 6 March, 2023; originally announced March 2023.

    Comments: To appear at CVPR 2023. Project Webpage: https://imagine.enpc.fr/~guedona/MACARONS/

  14. arXiv:2212.11796  [pdf, other]

    cs.CV

    Automatically Annotating Indoor Images with CAD Models via RGB-D Scans

    Authors: Stefan Ainetter, Sinisa Stekovic, Friedrich Fraundorfer, Vincent Lepetit

    Abstract: We present an automatic method for annotating images of indoor scenes with the CAD models of the objects by relying on RGB-D scans. Through a visual evaluation by 3D experts, we show that our method retrieves annotations that are at least as accurate as manual annotations, and can thus be used as ground truth without the burden of manually annotating 3D data. We do this using an analysis-by-synthe…

    Submitted 22 December, 2022; originally announced December 2022.

  15. arXiv:2211.16193  [pdf, other]

    cs.CV

    In-Hand 3D Object Scanning from an RGB Sequence

    Authors: Shreyas Hampali, Tomas Hodan, Luan Tran, Lingni Ma, Cem Keskin, Vincent Lepetit

    Abstract: We propose a method for in-hand 3D scanning of an unknown object with a monocular camera. Our method relies on a neural implicit surface representation that captures both the geometry and the appearance of the object. However, by contrast with most NeRF-based methods, we do not assume that the camera-object relative poses are known. Instead, we simultaneously optimize both the object shape and the…

    Submitted 22 June, 2023; v1 submitted 28 November, 2022; originally announced November 2022.

    Comments: CVPR 2023

  16. arXiv:2211.07304  [pdf, other]

    cs.RO

    Multi-Finger Grasping Like Humans

    Authors: Yuming Du, Philippe Weinzaepfel, Vincent Lepetit, Romain Brégier

    Abstract: Robots with multi-fingered grippers could perform advanced manipulation tasks for us if we were able to properly specify to them what to do. In this study, we take a step in that direction by making a robot grasp an object like a grasping demonstration performed by a human. We propose a novel optimization-based approach for transferring human grasp demonstrations to any multi-fingered grippers, wh…

    Submitted 14 November, 2022; originally announced November 2022.

    Comments: presented at IROS 2022 conference

    Journal ref: 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

  17. arXiv:2209.10385  [pdf, other]

    cs.CV

    Long-Lived Accurate Keypoints in Event Streams

    Authors: Philippe Chiberre, Etienne Perot, Amos Sironi, Vincent Lepetit

    Abstract: We present a novel end-to-end approach to keypoint detection and tracking in an event stream that provides better precision and much longer keypoint tracks than previous methods. This is made possible by two contributions working together. First, we propose a simple procedure to generate stable keypoint labels, which we use to train a recurrent architecture. This training data results in detecti…

    Submitted 7 October, 2022; v1 submitted 21 September, 2022; originally announced September 2022.

  18. arXiv:2209.09341  [pdf, other]

    cs.CV

    A Simple and Powerful Global Optimization for Unsupervised Video Object Segmentation

    Authors: Georgy Ponimatkin, Nermin Samet, Yang Xiao, Yuming Du, Renaud Marlet, Vincent Lepetit

    Abstract: We propose a simple, yet powerful approach for unsupervised object segmentation in videos. We introduce an objective function whose minimum represents the mask of the main salient object over the input sequence. It only relies on independent image features and optical flows, which can be obtained using off-the-shelf self-supervised methods. It scales with the length of the sequence with no need fo…

    Submitted 19 October, 2022; v1 submitted 19 September, 2022; originally announced September 2022.

    Comments: Accepted to the IEEE Winter Conference on Applications of Computer Vision (WACV) 2023

  19. arXiv:2209.07589  [pdf, other]

    cs.CV

    PIZZA: A Powerful Image-only Zero-Shot Zero-CAD Approach to 6 DoF Tracking

    Authors: Van Nguyen Nguyen, Yuming Du, Yang Xiao, Michael Ramamonjisoa, Vincent Lepetit

    Abstract: Estimating the relative pose of a new object without prior knowledge is a hard problem, while it is an ability very much needed in robotics and Augmented Reality. We present a method for tracking the 6D motion of objects in RGB video sequences when neither the training images nor the 3D geometry of the objects are available. In contrast to previous works, our method can therefore consider unknown…

    Submitted 1 October, 2022; v1 submitted 15 September, 2022; originally announced September 2022.

    Comments: 3DV Oral

  20. arXiv:2208.10449  [pdf, other]

    cs.CV cs.LG cs.RO

    SCONE: Surface Coverage Optimization in Unknown Environments by Volumetric Integration

    Authors: Antoine Guédon, Pascal Monasse, Vincent Lepetit

    Abstract: Next Best View computation (NBV) is a long-standing problem in robotics, and consists in identifying the next most informative sensor position(s) for reconstructing a 3D object or scene efficiently and accurately. Like most current methods, we consider NBV prediction from a depth sensor like Lidar systems. Learning-based methods relying on a volumetric representation of the scene are suitable for…

    Submitted 1 November, 2022; v1 submitted 22 August, 2022; originally announced August 2022.

    Comments: NeurIPS 2022 Camera-Ready. Project Webpage: https://github.com/Anttwo/SCONE

  21. arXiv:2207.14268  [pdf, other]

    cs.CV

    MonteBoxFinder: Detecting and Filtering Primitives to Fit a Noisy Point Cloud

    Authors: Michaël Ramamonjisoa, Sinisa Stekovic, Vincent Lepetit

    Abstract: We present MonteBoxFinder, a method that, given a noisy input point cloud, fits cuboids to the input scene. Our primary contribution is a discrete optimization algorithm that, from a dense set of initially detected cuboids, is able to efficiently filter good boxes from the noisy ones. Inspired by recent applications of MCTS to scene understanding problems, we develop a stochastic algorithm that is…

    Submitted 28 July, 2022; originally announced July 2022.

    Comments: Accepted at ECCV 2022. Project page: https://michaelramamonjisoa.github.io/projects/MonteBoxFinder, Code: https://github.com/MichaelRamamonjisoa/MonteBoxFinder

  22. arXiv:2207.03204  [pdf, other]

    cs.CV cs.AI cs.GT

    MCTS with Refinement for Proposals Selection Games in Scene Understanding

    Authors: Sinisa Stekovic, Mahdi Rad, Alireza Moradi, Friedrich Fraundorfer, Vincent Lepetit

    Abstract: We propose a novel method applicable in many scene understanding problems that adapts the Monte Carlo Tree Search (MCTS) algorithm, originally designed to learn to play games of high-state complexity. From a generated pool of proposals, our method jointly selects and optimizes proposals that minimize the objective term. In our first application for floor plan reconstruction from point clouds, our…

    Submitted 7 July, 2022; originally announced July 2022.

    Comments: Submitted to: TPAMI Special Section on the Best Papers of ICCV2021. GitHub Repository: https://github.com/vevenom/MonteScene. arXiv admin note: substantial text overlap with arXiv:2103.11161

  23. arXiv:2207.01567  [pdf, other]

    cs.CV cs.AI

    Back to MLP: A Simple Baseline for Human Motion Prediction

    Authors: Wen Guo, Yuming Du, Xi Shen, Vincent Lepetit, Xavier Alameda-Pineda, Francesc Moreno-Noguer

    Abstract: This paper tackles the problem of human motion prediction, consisting in forecasting future body poses from historically observed sequences. State-of-the-art approaches provide good results, however, they rely on deep learning architectures of arbitrary complexity, such as Recurrent Neural Networks (RNN), Transformers or Graph Convolutional Networks (GCN), typically requiring multiple training stage…

    Submitted 5 October, 2022; v1 submitted 4 July, 2022; originally announced July 2022.

    Comments: Accepted to WACV 2023; Code available at https://github.com/dulucas/siMLPe

  24. arXiv:2203.17234  [pdf, other]

    cs.CV

    Templates for 3D Object Pose Estimation Revisited: Generalization to New Objects and Robustness to Occlusions

    Authors: Van Nguyen Nguyen, Yinlin Hu, Yang Xiao, Mathieu Salzmann, Vincent Lepetit

    Abstract: We present a method that can recognize new objects and estimate their 3D pose in RGB images even under partial occlusions. Our method requires neither a training phase on these objects nor real images depicting them, only their CAD models. It relies on a small set of training objects to learn local object representations, which allow us to locally match the input image to a set of "templates", ren…

    Submitted 31 March, 2022; originally announced March 2022.

    Comments: CVPR 2022

  25. arXiv:2110.11661  [pdf, other]

    cs.CV

    UVO Challenge on Video-based Open-World Segmentation 2021: 1st Place Solution

    Authors: Yuming Du, Wen Guo, Yang Xiao, Vincent Lepetit

    Abstract: In this report, we introduce our (pretty straightforward) two-step "detect-then-match" video instance segmentation method. The first step performs instance segmentation for each frame to get a large number of instance mask proposals. The second step is to do inter-frame instance mask matching with the help of optical flow. We demonstrate that with high quality mask proposals, a simple matching mech…

    Submitted 1 November, 2021; v1 submitted 22 October, 2021; originally announced October 2021.

    Comments: Code: https://github.com/dulucas/UVO_Challenge. arXiv admin note: substantial text overlap with arXiv:2110.10239

  26. arXiv:2110.10239  [pdf, other]

    cs.CV

    1st Place Solution for the UVO Challenge on Image-based Open-World Segmentation 2021

    Authors: Yuming Du, Wen Guo, Yang Xiao, Vincent Lepetit

    Abstract: We describe the two-stage instance segmentation framework we used to compete in the challenge. The first stage of our framework consists of an object detector, which generates object proposals in the format of bounding boxes. Then, the images and the detected bounding boxes are fed to the second stage, where a segmentation network is applied to segment the objects in the bounding boxes. We train al…

    Submitted 19 October, 2021; originally announced October 2021.

    Comments: Code: https://github.com/dulucas/UVO_Challenge

  27. arXiv:2107.00887  [pdf, other]

    cs.CV cs.HC

    HO-3D_v3: Improving the Accuracy of Hand-Object Annotations of the HO-3D Dataset

    Authors: Shreyas Hampali, Sayan Deb Sarkar, Vincent Lepetit

    Abstract: HO-3D is a dataset providing image sequences of various hand-object interaction scenarios annotated with the 3D pose of the hand and the object and was originally introduced as HO-3D_v2. The annotations were obtained automatically using an optimization method, 'HOnnotate', introduced in the original paper. HO-3D_v3 provides more accurate annotations for both the hand and object poses thus resultin…

    Submitted 2 July, 2021; originally announced July 2021.

  28. arXiv:2106.09711  [pdf, other]

    cs.CV

    Visual Correspondence Hallucination

    Authors: Hugo Germain, Vincent Lepetit, Guillaume Bourmaud

    Abstract: Given a pair of partially overlapping source and target images and a keypoint in the source image, the keypoint's correspondent in the target image can be either visible, occluded or outside the field of view. Local feature matching methods are only able to identify the correspondent's location when it is visible, while humans can also hallucinate its location when it is occluded or outside the fi…

    Submitted 2 February, 2022; v1 submitted 17 June, 2021; originally announced June 2021.

  29. arXiv:2106.02022  [pdf, other]

    cs.CV

    Single Image Depth Prediction with Wavelet Decomposition

    Authors: Michaël Ramamonjisoa, Michael Firman, Jamie Watson, Vincent Lepetit, Daniyar Turmukhambetov

    Abstract: We present a novel method for predicting accurate depths from monocular images with high efficiency. This optimal efficiency is achieved by exploiting wavelet decomposition, which is integrated in a fully differentiable encoder-decoder architecture. We demonstrate that we can reconstruct high-fidelity depth maps by predicting sparse wavelet coefficients. In contrast with previous works, we show th…

    Submitted 16 August, 2021; v1 submitted 3 June, 2021; originally announced June 2021.

    Comments: CVPR 2021

  30. arXiv:2104.14639  [pdf, other]

    cs.CV

    Keypoint Transformer: Solving Joint Identification in Challenging Hands and Object Interactions for Accurate 3D Pose Estimation

    Authors: Shreyas Hampali, Sayan Deb Sarkar, Mahdi Rad, Vincent Lepetit

    Abstract: We propose a robust and accurate method for estimating the 3D poses of two hands in close interaction from a single color image. This is a very challenging problem, as large occlusions and many confusions between the joints may happen. State-of-the-art methods solve this problem by regressing a heatmap for each joint, which requires solving two problems simultaneously: localizing the joints and re…

    Submitted 19 April, 2022; v1 submitted 29 April, 2021; originally announced April 2021.

    Comments: Accepted at CVPR2022

  31. arXiv:2104.12276  [pdf, other]

    cs.CV cs.LG

    Learning to Better Segment Objects from Unseen Classes with Unlabeled Videos

    Authors: Yuming Du, Yang Xiao, Vincent Lepetit

    Abstract: The ability to localize and segment objects from unseen classes would open the door to new applications, such as autonomous object learning in active vision. Nonetheless, improving the performance on unseen classes requires additional training data, while manually annotating the objects of the unseen classes can be labor-intensive and expensive. In this paper, we explore the use of unlabeled video…

    Submitted 20 August, 2021; v1 submitted 25 April, 2021; originally announced April 2021.

    Comments: ICCV2021 Camera Ready. See project page https://dulucas.github.io/Homepage/gbopt/

  32. arXiv:2103.11161  [pdf, other]

    cs.CV cs.AI cs.LG

    MonteFloor: Extending MCTS for Reconstructing Accurate Large-Scale Floor Plans

    Authors: Sinisa Stekovic, Mahdi Rad, Friedrich Fraundorfer, Vincent Lepetit

    Abstract: We propose a novel method for reconstructing floor plans from noisy 3D point clouds. Our main contribution is a principled approach that relies on the Monte Carlo Tree Search (MCTS) algorithm to maximize a suitable objective function efficiently despite the complexity of the problem. Like previous work, we first project the input point cloud to a top view to create a density map and extract room p…

    Submitted 13 September, 2021; v1 submitted 20 March, 2021; originally announced March 2021.

    Comments: Accepted for oral presentation at ICCV 2021

  33. arXiv:2103.09213  [pdf, other]

    cs.CV

    Back to the Feature: Learning Robust Camera Localization from Pixels to Pose

    Authors: Paul-Edouard Sarlin, Ajaykumar Unagar, Måns Larsson, Hugo Germain, Carl Toft, Viktor Larsson, Marc Pollefeys, Vincent Lepetit, Lars Hammarstrand, Fredrik Kahl, Torsten Sattler

    Abstract: Camera pose estimation in known scenes is a 3D geometry task recently tackled by multiple learning algorithms. Many regress precise geometric quantities, like poses or 3D points, from an input image. This either fails to generalize to new viewpoints or ties the model parameters to a specific scene. In this paper, we go Back to the Feature: we argue that deep networks should focus on learning robus…

    Submitted 7 April, 2021; v1 submitted 16 March, 2021; originally announced March 2021.

    Comments: Accepted to CVPR 2021

  34. arXiv:2103.07969  [pdf, other]

    cs.CV cs.AI cs.LG

    Monte Carlo Scene Search for 3D Scene Understanding

    Authors: Shreyas Hampali, Sinisa Stekovic, Sayan Deb Sarkar, Chetan Srinivasa Kumar, Friedrich Fraundorfer, Vincent Lepetit

    Abstract: We explore how a general AI algorithm can be used for 3D scene understanding to reduce the need for training data. More exactly, we propose a modification of the Monte Carlo Tree Search (MCTS) algorithm to retrieve objects and room layouts from noisy RGB-D scans. While MCTS was developed as a game-playing algorithm, we show it can also be used for complex perception problems. Our adapted MCTS algo…

    Submitted 5 May, 2021; v1 submitted 14 March, 2021; originally announced March 2021.

    Comments: To be presented at CVPR 2021

  35. arXiv:2103.07153  [pdf, other]

    cs.CV

    Neural Reprojection Error: Merging Feature Learning and Camera Pose Estimation

    Authors: Hugo Germain, Vincent Lepetit, Guillaume Bourmaud

    Abstract: Absolute camera pose estimation is usually addressed by sequentially solving two distinct subproblems: First a feature matching problem that seeks to establish putative 2D-3D correspondences, and then a Perspective-n-Point problem that minimizes, with respect to the camera pose, the sum of so-called Reprojection Errors (RE). We argue that generating putative 2D-3D correspondences 1) leads to an im…

    Submitted 12 March, 2021; originally announced March 2021.

  36. arXiv:2010.04075  [pdf, other]

    cs.CV

    3D Object Detection and Pose Estimation of Unseen Objects in Color Images with Local Surface Embeddings

    Authors: Giorgia Pitteri, Aurélie Bugeau, Slobodan Ilic, Vincent Lepetit

    Abstract: We present an approach for detecting and estimating the 3D poses of objects in images that requires only an untextured CAD model and no training phase for new objects. Our approach combines Deep Learning and 3D geometry: It relies on an embedding of local 3D geometry to match the CAD models to the input images. For points at the surface of objects, this embedding can be computed directly from the…

    Submitted 8 October, 2020; originally announced October 2020.

  37. arXiv:2007.12107  [pdf, other]

    cs.CV

    Few-Shot Object Detection and Viewpoint Estimation for Objects in the Wild

    Authors: Yang Xiao, Vincent Lepetit, Renaud Marlet

    Abstract: Detecting objects and estimating their viewpoints in images are key tasks of 3D scene understanding. Recent approaches have achieved excellent results on very large benchmarks for object detection and viewpoint estimation. However, performance still lags behind for novel object categories with few samples. In this paper, we tackle the problems of few-shot object detection and few-shot view…

    Submitted 12 October, 2022; v1 submitted 23 July, 2020; originally announced July 2020.

    Comments: Accepted by TPAMI, add experimental results and additional ablation studies

  38. arXiv:2007.08939  [pdf, other]

    cs.CV

    Geometric Correspondence Fields: Learned Differentiable Rendering for 3D Pose Refinement in the Wild

    Authors: Alexander Grabner, Yaming Wang, Peizhao Zhang, Peihong Guo, Tong Xiao, Peter Vajda, Peter M. Roth, Vincent Lepetit

    Abstract: We present a novel 3D pose refinement approach based on differentiable rendering for objects of arbitrary categories in the wild. In contrast to previous methods, we make two main contributions: First, instead of comparing real-world images and synthetic renderings in the RGB or mask space, we compare them in a feature space optimized for 3D pose refinement. Second, we introduce a novel differenti…

    Submitted 17 July, 2020; originally announced July 2020.

    Comments: Accepted to European Conference on Computer Vision (ECCV) 2020

  39. arXiv:2006.05927  [pdf, other]

    cs.CV

    Recent Advances in 3D Object and Hand Pose Estimation

    Authors: Vincent Lepetit

    Abstract: 3D object and hand pose estimation have huge potential for Augmented Reality: they can enable tangible interfaces and natural interfaces, and blur the boundaries between the real and virtual worlds. In this chapter, we present the recent developments in 3D object and hand pose estimation using cameras, and discuss their abilities and limitations and the possible future development of the field.

    Submitted 10 June, 2020; originally announced June 2020.

  40. ALCN: Adaptive Local Contrast Normalization

    Authors: Mahdi Rad, Peter M. Roth, Vincent Lepetit

    Abstract: To make Robotics and Augmented Reality applications robust to illumination changes, the current trend is to train a Deep Network with training images captured under many different lighting conditions. Unfortunately, creating such a training set is a very unwieldy and complex task. We therefore propose a novel illumination normalization method that can easily be used for different problems with cha…

    Submitted 15 April, 2020; originally announced April 2020.

    Comments: This version corresponds to the pre-print of the paper accepted for Computer Vision and Image Understanding (CVIU). arXiv admin note: substantial text overlap with arXiv:1708.09633

  41. arXiv:2004.01673  [pdf, other]

    cs.CV

    S2DNet: Learning Accurate Correspondences for Sparse-to-Dense Feature Matching

    Authors: Hugo Germain, Guillaume Bourmaud, Vincent Lepetit

    Abstract: Establishing robust and accurate correspondences is a fundamental backbone to many computer vision algorithms. While recent learning-based feature matching methods have shown promising results in providing robust correspondences under challenging conditions, they are often limited in terms of precision. In this paper, we introduce S2DNet, a novel feature matching pipeline, designed and trained to…

    Submitted 3 April, 2020; originally announced April 2020.

  42. arXiv:2003.13764  [pdf, other]

    cs.CV

    Measuring Generalisation to Unseen Viewpoints, Articulations, Shapes and Objects for 3D Hand Pose Estimation under Hand-Object Interaction

    Authors: Anil Armagan, Guillermo Garcia-Hernando, Seungryul Baek, Shreyas Hampali, Mahdi Rad, Zhaohui Zhang, Shipeng Xie, MingXiu Chen, Boshen Zhang, Fu Xiong, Yang Xiao, Zhiguo Cao, Junsong Yuan, Pengfei Ren, Weiting Huang, Haifeng Sun, Marek Hrúz, Jakub Kanis, Zdeněk Krňoul, Qingfu Wan, Shile Li, Linlin Yang, Dongheui Lee, Angela Yao, Weiguo Zhou, et al. (10 additional authors not shown)

    Abstract: We study how well different types of approaches generalise in the task of 3D hand pose estimation under single hand scenarios and hand-object interaction. We show that the accuracy of state-of-the-art methods can drop, and that they fail mostly on poses absent from the training set. Unfortunately, since the space of hand poses is highly dimensional, it is inherently not feasible to cover the whole…

    Submitted 10 September, 2020; v1 submitted 30 March, 2020; originally announced March 2020.

    Comments: European Conference on Computer Vision (ECCV), 2020

  43. arXiv:2002.12730  [pdf, other]

    cs.CV

    Predicting Sharp and Accurate Occlusion Boundaries in Monocular Depth Estimation Using Displacement Fields

    Authors: Michael Ramamonjisoa, Yuming Du, Vincent Lepetit

    Abstract: Current methods for depth map prediction from monocular images tend to predict smooth, poorly localized contours for the occlusion boundaries in the input image. This is unfortunate as occlusion boundaries are important cues to recognize objects, and as we show, may lead to a way to discover new objects from scene reconstruction. To improve predicted depth maps, recent methods rely on various form…

    Submitted 10 May, 2020; v1 submitted 28 February, 2020; originally announced February 2020.

    Comments: Accepted to CVPR 2020

  44. arXiv:2001.02149  [pdf, other]

    cs.CV

    General 3D Room Layout from a Single View by Render-and-Compare

    Authors: Sinisa Stekovic, Shreyas Hampali, Mahdi Rad, Sayan Deb Sarkar, Friedrich Fraundorfer, Vincent Lepetit

    Abstract: We present a novel method to reconstruct the 3D layout of a room (walls, floors, ceilings) from a single perspective view in challenging conditions, by contrast with previous single-view methods restricted to cuboid-shaped layouts. This input view can consist of a color image only, but considering a depth map results in a more accurate reconstruction. Our approach is formalized as solving a constr…

    Submitted 21 July, 2020; v1 submitted 7 January, 2020; originally announced January 2020.

  45. arXiv:1911.09098  [pdf]

    eess.IV cs.CV cs.LG

    AssemblyNet: A large ensemble of CNNs for 3D Whole Brain MRI Segmentation

    Authors: Pierrick Coupé, Boris Mansencal, Michaël Clément, Rémi Giraud, Baudouin Denis de Senneville, Vinh-Thong Ta, Vincent Lepetit, José V. Manjon

    Abstract: Whole brain segmentation using deep learning (DL) is a very challenging task since the number of anatomical labels is very high compared to the number of available training images. To address this problem, previous DL methods proposed to use a single convolution neural network (CNN) or few independent CNNs. In this paper, we present a novel ensemble method based on a large number of CNNs processin…

    Submitted 20 November, 2019; originally announced November 2019.

    Comments: arXiv admin note: substantial text overlap with arXiv:1906.01862

  46. arXiv:1910.12257  [pdf, other]

    cs.CV

    Smart Hypothesis Generation for Efficient and Robust Room Layout Estimation

    Authors: Martin Hirzer, Peter M. Roth, Vincent Lepetit

    Abstract: We propose a novel method to efficiently estimate the spatial layout of a room from a single monocular RGB image. As existing approaches based on low-level feature extraction, followed by a vanishing point estimation are very slow and often unreliable in realistic scenarios, we build on semantic segmentation of the input image. To obtain better segmentations, we introduce a robust, accurate and ve…

    Submitted 27 October, 2019; originally announced October 2019.

    Comments: Accepted: Winter Conference on Applications of Computer Vision (WACV) 2020

  47. arXiv:1908.11656  [pdf, other]

    cs.CV

    LU-Net: An Efficient Network for 3D LiDAR Point Cloud Semantic Segmentation Based on End-to-End-Learned 3D Features and U-Net

    Authors: Pierre Biasutti, Vincent Lepetit, Jean-François Aujol, Mathieu Brédif, Aurélie Bugeau

    Abstract: We propose LU-Net -- for LiDAR U-Net, a new method for the semantic segmentation of a 3D LiDAR point cloud. Instead of applying some global 3D segmentation method such as PointNet, we propose an end-to-end architecture for LiDAR point cloud semantic segmentation that efficiently solves the problem as an image processing problem. We first extract high-level 3D features for each point given its 3D n…

    Submitted 30 August, 2019; originally announced August 2019.

    Comments: 9 pages, 9 figures. arXiv admin note: substantial text overlap with arXiv:1905.08748

  48. arXiv:1908.11457  [pdf, other]

    cs.CV

    CorNet: Generic 3D Corners for 6D Pose Estimation of New Objects without Retraining

    Authors: Giorgia Pitteri, Slobodan Ilic, Vincent Lepetit

    Abstract: We present a novel approach to the detection and 3D pose estimation of objects in color images. Its main contribution is that it does not require any training phases nor data for new objects, while state-of-the-art methods typically require hours of training time and hundreds of training registered images. Instead, our method relies only on the objects' geometries. Our method focuses on objects wi…

    Submitted 29 August, 2019; originally announced August 2019.

  49. arXiv:1908.07640  [pdf, other]

    cs.CV

    On Object Symmetries and 6D Pose Estimation from Images

    Authors: Giorgia Pitteri, Michaël Ramamonjisoa, Slobodan Ilic, Vincent Lepetit

    Abstract: Objects with symmetries are common in our daily life and in industrial contexts, but are often ignored in the recent literature on 6D pose estimation from images. In this paper, we study in an analytical way the link between the symmetries of a 3D object and its appearance in images. We explain why symmetrical objects can be a challenge when training machine learning algorithms that aim at estimat…

    Submitted 20 August, 2019; originally announced August 2019.

    Comments: International Conference on 3D Vision

  50. arXiv:1908.02853  [pdf, other]

    cs.CV

    Location Field Descriptors: Single Image 3D Model Retrieval in the Wild

    Authors: Alexander Grabner, Peter M. Roth, Vincent Lepetit

    Abstract: We present Location Field Descriptors, a novel approach for single image 3D model retrieval in the wild. In contrast to previous methods that directly map 3D models and RGB images to an embedding space, we establish a common low-level representation in the form of location fields from which we compute pose invariant 3D shape descriptors. Location fields encode correspondences between 2D pixels and…

    Submitted 7 August, 2019; originally announced August 2019.

    Comments: Accepted to International Conference on 3D Vision (3DV) 2019 (Oral)