Showing 1–39 of 39 results for author: Manhardt, F

  1. arXiv:2405.21066  [pdf, other]

    cs.CV

    Mixed Diffusion for 3D Indoor Scene Synthesis

    Authors: Siyi Hu, Diego Martin Arroyo, Stephanie Debats, Fabian Manhardt, Luca Carlone, Federico Tombari

    Abstract: Realistic conditional 3D scene synthesis significantly enhances and accelerates the creation of virtual environments, which can also provide extensive training data for computer vision and robotics research among other applications. Diffusion models have shown great performance in related applications, e.g., making precise arrangements of unordered sets. However, these models have not been fully e…

    Submitted 31 May, 2024; originally announced May 2024.

    Comments: 19 pages, 14 figures. Under review. Code to be released at: https://github.com/MIT-SPARK/MiDiffusion

  2. arXiv:2404.03650  [pdf, other]

    cs.CV

    OpenNeRF: Open Set 3D Neural Scene Segmentation with Pixel-Wise Features and Rendered Novel Views

    Authors: Francis Engelmann, Fabian Manhardt, Michael Niemeyer, Keisuke Tateno, Marc Pollefeys, Federico Tombari

    Abstract: Large visual-language models (VLMs), like CLIP, enable open-set image segmentation to segment arbitrary concepts from an image in a zero-shot manner. This goes beyond the traditional closed-set assumption, i.e., where models can only segment classes from a pre-defined training set. More recently, first works on open-set segmentation in 3D scenes have appeared in the literature. These methods are h…

    Submitted 4 April, 2024; originally announced April 2024.

    Comments: ICLR 2024, Project page: https://opennerf.github.io

    Journal ref: ICLR 2024

  3. arXiv:2404.01887   

    cs.CV

    3D scene generation from scene graphs and self-attention

    Authors: Pietro Bonazzi, Mengqi Wang, Diego Martin Arroyo, Fabian Manhardt, Nico Messikomer, Federico Tombari, Davide Scaramuzza

    Abstract: Synthesizing realistic and diverse indoor 3D scene layouts in a controllable fashion opens up applications in simulated navigation and virtual reality. As concise and robust representations of a scene, scene graphs have proven to be well-suited as the semantic control on the generated layout. We present a variant of the conditional variational autoencoder (cVAE) model to synthesize 3D scenes from…

    Submitted 23 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: Some authors were not timely informed of the submission

  4. arXiv:2403.13806  [pdf, other]

    cs.CV cs.GR

    RadSplat: Radiance Field-Informed Gaussian Splatting for Robust Real-Time Rendering with 900+ FPS

    Authors: Michael Niemeyer, Fabian Manhardt, Marie-Julie Rakotosaona, Michael Oechsle, Daniel Duckworth, Rama Gosula, Keisuke Tateno, John Bates, Dominik Kaeser, Federico Tombari

    Abstract: Recent advances in view synthesis and real-time rendering have achieved photorealistic quality at impressive rendering speeds. While Radiance Field-based methods achieve state-of-the-art quality in challenging scenarios such as in-the-wild captures and large-scale scenes, they often suffer from excessively high compute requirements linked to volumetric rendering. Gaussian Splatting-based methods,…

    Submitted 20 March, 2024; originally announced March 2024.

    Comments: Project page at https://m-niemeyer.github.io/radsplat/

  5. arXiv:2403.10099  [pdf, other]

    cs.CV

    KP-RED: Exploiting Semantic Keypoints for Joint 3D Shape Retrieval and Deformation

    Authors: Ruida Zhang, Chenyangguang Zhang, Yan Di, Fabian Manhardt, Xingyu Liu, Federico Tombari, Xiangyang Ji

    Abstract: In this paper, we present KP-RED, a unified KeyPoint-driven REtrieval and Deformation framework that takes object scans as input and jointly retrieves and deforms the most geometrically similar CAD models from a pre-processed database to tightly match the target. Unlike existing dense matching based methods that typically struggle with noisy partial scans, we propose to leverage category-consisten…

    Submitted 20 March, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

    Comments: Accepted by CVPR 2024

  6. arXiv:2402.03445  [pdf, other]

    cs.CV cs.GR cs.LG

    Denoising Diffusion via Image-Based Rendering

    Authors: Titas Anciukevičius, Fabian Manhardt, Federico Tombari, Paul Henderson

    Abstract: Generating 3D scenes is a challenging open problem, which requires synthesizing plausible content that is fully consistent in 3D space. While recent methods such as neural radiance fields excel at view synthesis and 3D reconstruction, they cannot synthesize plausible details in unobserved regions since they lack a generative capability. Conversely, existing generative methods are typically not cap…

    Submitted 20 February, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

    Comments: Accepted at ICLR 2024. Project page: https://anciukevicius.github.io/generative-image-based-rendering

  7. arXiv:2311.14189  [pdf, other]

    cs.CV

    D-SCo: Dual-Stream Conditional Diffusion for Monocular Hand-Held Object Reconstruction

    Authors: Bowen Fu, Gu Wang, Chenyangguang Zhang, Yan Di, Ziqin Huang, Zhiying Leng, Fabian Manhardt, Xiangyang Ji, Federico Tombari

    Abstract: Reconstructing hand-held objects from a single RGB image is a challenging task in computer vision. In contrast to prior works that utilize deterministic modeling paradigms, we employ a point cloud denoising diffusion model to account for the probabilistic nature of this problem. In the core, we introduce centroid-fixed dual-stream conditional diffusion for monocular hand-held object reconstruction…

    Submitted 22 March, 2024; v1 submitted 23 November, 2023; originally announced November 2023.

  8. arXiv:2311.12588  [pdf, other]

    cs.CV

    HiPose: Hierarchical Binary Surface Encoding and Correspondence Pruning for RGB-D 6DoF Object Pose Estimation

    Authors: Yongliang Lin, Yongzhi Su, Praveen Nathan, Sandeep Inuganti, Yan Di, Martin Sundermeyer, Fabian Manhardt, Didier Stricker, Jason Rambach, Yu Zhang

    Abstract: In this work, we present a novel dense-correspondence method for 6DoF object pose estimation from a single RGB-D image. While many existing data-driven methods achieve impressive performance, they tend to be time-consuming due to their reliance on rendering-based refinement approaches. To circumvent this limitation, we present HiPose, which establishes 3D-3D correspondences in a coarse-to-fine man…

    Submitted 7 April, 2024; v1 submitted 21 November, 2023; originally announced November 2023.

    Comments: CVPR 2024

  9. arXiv:2311.11125  [pdf, other]

    cs.CV

    SecondPose: SE(3)-Consistent Dual-Stream Feature Fusion for Category-Level Pose Estimation

    Authors: Yamei Chen, Yan Di, Guangyao Zhai, Fabian Manhardt, Chenyangguang Zhang, Ruida Zhang, Federico Tombari, Nassir Navab, Benjamin Busam

    Abstract: Category-level object pose estimation, aiming to predict the 6D pose and 3D size of objects from known categories, typically struggles with large intra-class shape variation. Existing works utilizing mean shapes often fall short of capturing this variation. To address this issue, we present SecondPose, a novel approach integrating object-specific geometric features with semantic category priors fr…

    Submitted 21 March, 2024; v1 submitted 18 November, 2023; originally announced November 2023.

    Comments: CVPR 2024 accepted. Code is available at: https://github.com/NOrangeeroli/SecondPose

  10. arXiv:2310.11696  [pdf, other]

    cs.CV

    MOHO: Learning Single-view Hand-held Object Reconstruction with Multi-view Occlusion-Aware Supervision

    Authors: Chenyangguang Zhang, Guanlong Jiao, Yan Di, Gu Wang, Ziqin Huang, Ruida Zhang, Fabian Manhardt, Bowen Fu, Federico Tombari, Xiangyang Ji

    Abstract: Previous works concerning single-view hand-held object reconstruction typically rely on supervision from 3D ground-truth models, which are hard to collect in real world. In contrast, readily accessible hand-object videos offer a promising training data source, but they only give heavily occluded object observations. In this paper, we present a novel synthetic-to-real framework to exploit Multi-vie…

    Submitted 13 March, 2024; v1 submitted 17 October, 2023; originally announced October 2023.

    Comments: CVPR 2024

  11. arXiv:2309.12188  [pdf, other]

    cs.RO cs.CV

    SG-Bot: Object Rearrangement via Coarse-to-Fine Robotic Imagination on Scene Graphs

    Authors: Guangyao Zhai, Xiaoni Cai, Dianye Huang, Yan Di, Fabian Manhardt, Federico Tombari, Nassir Navab, Benjamin Busam

    Abstract: Object rearrangement is pivotal in robotic-environment interactions, representing a significant capability in embodied AI. In this paper, we present SG-Bot, a novel rearrangement framework that utilizes a coarse-to-fine scheme with a scene graph as the scene representation. Unlike previous methods that rely on either known goal priors or zero-shot large models, SG-Bot exemplifies lightweight, real…

    Submitted 24 March, 2024; v1 submitted 21 September, 2023; originally announced September 2023.

    Comments: ICRA 2024 accepted. Project website: https://sites.google.com/view/sg-bot

  12. arXiv:2308.08231  [pdf, other]

    cs.CV

    DDF-HO: Hand-Held Object Reconstruction via Conditional Directed Distance Field

    Authors: Chenyangguang Zhang, Yan Di, Ruida Zhang, Guangyao Zhai, Fabian Manhardt, Federico Tombari, Xiangyang Ji

    Abstract: Reconstructing hand-held objects from a single RGB image is an important and challenging problem. Existing works utilizing Signed Distance Fields (SDF) reveal limitations in comprehensively capturing the complex hand-object interactions, since SDF is only reliable within the proximity of the target, and hence, infeasible to simultaneously encode local hand and object cues. To address this issue, w…

    Submitted 26 October, 2023; v1 submitted 16 August, 2023; originally announced August 2023.

    Comments: Camera Ready for NeurIPS 2023

  13. arXiv:2308.07837  [pdf, other]

    cs.CV

    CCD-3DR: Consistent Conditioning in Diffusion for Single-Image 3D Reconstruction

    Authors: Yan Di, Chenyangguang Zhang, Pengyuan Wang, Guangyao Zhai, Ruida Zhang, Fabian Manhardt, Benjamin Busam, Xiangyang Ji, Federico Tombari

    Abstract: In this paper, we present a novel shape reconstruction method leveraging diffusion model to generate 3D sparse point cloud for the object captured in a single RGB image. Recent methods typically leverage global embedding or local projection-based features as the condition to guide the diffusion model. However, such strategies fail to consistently align the denoised point cloud with the given image…

    Submitted 15 August, 2023; originally announced August 2023.

    Comments: 11 pages

  14. arXiv:2308.06383  [pdf, other]

    cs.CV

    U-RED: Unsupervised 3D Shape Retrieval and Deformation for Partial Point Clouds

    Authors: Yan Di, Chenyangguang Zhang, Ruida Zhang, Fabian Manhardt, Yongzhi Su, Jason Rambach, Didier Stricker, Xiangyang Ji, Federico Tombari

    Abstract: In this paper, we propose U-RED, an Unsupervised shape REtrieval and Deformation pipeline that takes an arbitrary object observation as input, typically captured by RGB images or scans, and jointly retrieves and deforms the geometrically similar CAD models from a pre-established database to tightly match the target. Considering existing methods typically fail to handle noisy partial observations,…

    Submitted 11 August, 2023; originally announced August 2023.

    Comments: ICCV 2023

  15. arXiv:2305.17972  [pdf, other]

    cs.CV

    View-to-Label: Multi-View Consistency for Self-Supervised 3D Object Detection

    Authors: Issa Mouawad, Nikolas Brasch, Fabian Manhardt, Federico Tombari, Francesca Odone

    Abstract: For autonomous vehicles, driving safely is highly dependent on the capability to correctly perceive the environment in 3D space, hence the task of 3D object detection represents a fundamental aspect of perception. While 3D sensors deliver accurate metric perception, monocular approaches enjoy cost and availability advantages that are valuable in a wide range of applications. Unfortunately, trainin…

    Submitted 29 May, 2023; originally announced May 2023.

  16. arXiv:2304.12439  [pdf, other]

    cs.CV

    TextMesh: Generation of Realistic 3D Meshes From Text Prompts

    Authors: Christina Tsalicoglou, Fabian Manhardt, Alessio Tonioni, Michael Niemeyer, Federico Tombari

    Abstract: The ability to generate highly realistic 2D images from mere text prompts has recently made huge progress in terms of speed and quality, thanks to the advent of image diffusion models. Naturally, the question arises if this can be also achieved in the generation of 3D content from such text prompts. To this end, a new line of methods recently emerged trying to harness diffusion models, trained on…

    Submitted 24 April, 2023; originally announced April 2023.

    Comments: Project Website: https://fabi92.github.io/textmesh/

  17. arXiv:2303.09431  [pdf, other]

    cs.CV

    NeRFMeshing: Distilling Neural Radiance Fields into Geometrically-Accurate 3D Meshes

    Authors: Marie-Julie Rakotosaona, Fabian Manhardt, Diego Martin Arroyo, Michael Niemeyer, Abhijit Kundu, Federico Tombari

    Abstract: With the introduction of Neural Radiance Fields (NeRFs), novel view synthesis has recently made a big leap forward. At the core, NeRF proposes that each 3D point can emit radiance, allowing to conduct view synthesis using differentiable volumetric rendering. While neural radiance fields can accurately represent 3D scenes for computing the image rendering, 3D meshes are still the main scene represe…

    Submitted 16 March, 2023; originally announced March 2023.

  18. arXiv:2303.00575  [pdf, other]

    cs.CV cs.RO

    IPCC-TP: Utilizing Incremental Pearson Correlation Coefficient for Joint Multi-Agent Trajectory Prediction

    Authors: Dekai Zhu, Guangyao Zhai, Yan Di, Fabian Manhardt, Hendrik Berkemeyer, Tuan Tran, Nassir Navab, Federico Tombari, Benjamin Busam

    Abstract: Reliable multi-agent trajectory prediction is crucial for the safe planning and control of autonomous systems. Compared with single-agent cases, the major challenge in simultaneously processing multiple agents lies in modeling complex social interactions caused by various driving intentions and road conditions. Previous methods typically leverage graph-based message propagation or attention mechan…

    Submitted 30 April, 2023; v1 submitted 1 March, 2023; originally announced March 2023.

    Comments: CVPR 2023 accepted

  19. arXiv:2212.12902  [pdf, other]

    cs.CV

    TexPose: Neural Texture Learning for Self-Supervised 6D Object Pose Estimation

    Authors: Hanzhi Chen, Fabian Manhardt, Nassir Navab, Benjamin Busam

    Abstract: In this paper, we introduce neural texture learning for 6D object pose estimation from synthetic data and a few unlabelled real images. Our major contribution is a novel learning scheme which removes the drawbacks of previous works, namely the strong dependency on co-modalities or additional refinement. These have been previously necessary to provide training signals for convergence. We formulate…

    Submitted 14 March, 2023; v1 submitted 25 December, 2022; originally announced December 2022.

    Comments: Accepted to CVPR 2023

  20. arXiv:2211.11738  [pdf, other]

    cs.CV

    SPARF: Neural Radiance Fields from Sparse and Noisy Poses

    Authors: Prune Truong, Marie-Julie Rakotosaona, Fabian Manhardt, Federico Tombari

    Abstract: Neural Radiance Field (NeRF) has recently emerged as a powerful representation to synthesize photorealistic novel views. While showing impressive performance, it relies on the availability of dense input views with highly accurate camera poses, thus limiting its application in real-world scenarios. In this work, we introduce Sparse Pose Adjusting Radiance Field (SPARF), to address the challenge of…

    Submitted 13 June, 2023; v1 submitted 21 November, 2022; originally announced November 2022.

    Comments: Code is released at https://github.com/google-research/sparf. Published at CVPR 2023 as a Highlight

  21. arXiv:2211.01142  [pdf, other]

    cs.CV

    OPA-3D: Occlusion-Aware Pixel-Wise Aggregation for Monocular 3D Object Detection

    Authors: Yongzhi Su, Yan Di, Fabian Manhardt, Guangyao Zhai, Jason Rambach, Benjamin Busam, Didier Stricker, Federico Tombari

    Abstract: Despite monocular 3D object detection having recently made a significant leap forward thanks to the use of pre-trained depth estimators for pseudo-LiDAR recovery, such two-stage methods typically suffer from overfitting and are incapable of explicitly encapsulating the geometric relation between depth and object bounding box. To overcome this limitation, we instead propose OPA-3D, a single-stage,…

    Submitted 2 November, 2022; originally announced November 2022.

  22. arXiv:2209.13036  [pdf, other]

    cs.RO cs.AI cs.CV

    MonoGraspNet: 6-DoF Grasping with a Single RGB Image

    Authors: Guangyao Zhai, Dianye Huang, Shun-Cheng Wu, Hyunjun Jung, Yan Di, Fabian Manhardt, Federico Tombari, Nassir Navab, Benjamin Busam

    Abstract: 6-DoF robotic grasping is a long-lasting but unsolved problem. Recent methods utilize strong 3D networks to extract geometric grasping representations from depth sensors, demonstrating superior accuracy on common objects but perform unsatisfactorily on photometrically challenging objects, e.g., objects in transparent or reflective materials. The bottleneck lies in that the surface of these objects…

    Submitted 1 March, 2023; v1 submitted 26 September, 2022; originally announced September 2022.

    Comments: ICRA 2023 accepted. Project website: https://sites.google.com/view/monograsp

  23. arXiv:2208.06661  [pdf, other]

    cs.CV

    SSP-Pose: Symmetry-Aware Shape Prior Deformation for Direct Category-Level Object Pose Estimation

    Authors: Ruida Zhang, Yan Di, Fabian Manhardt, Federico Tombari, Xiangyang Ji

    Abstract: Category-level pose estimation is a challenging problem due to intra-class shape variations. Recent methods deform pre-computed shape priors to map the observed point cloud into the normalized object coordinate space and then retrieve the pose via post-processing, i.e., Umeyama's Algorithm. The shortcomings of this two-stage strategy lie in two aspects: 1) The surrogate supervision on the intermed…

    Submitted 13 August, 2022; originally announced August 2022.

    Comments: Accepted by IROS 2022

  24. arXiv:2208.00237  [pdf, other]

    cs.CV

    RBP-Pose: Residual Bounding Box Projection for Category-Level Pose Estimation

    Authors: Ruida Zhang, Yan Di, Zhiqiang Lou, Fabian Manhardt, Federico Tombari, Xiangyang Ji

    Abstract: Category-level object pose estimation aims to predict the 6D pose as well as the 3D metric size of arbitrary objects from a known set of categories. Recent methods harness shape prior adaptation to map the observed point cloud into the canonical space and apply Umeyama algorithm to recover the pose and size. However, their shape prior integration strategy boosts pose estimation indirectly, which l…

    Submitted 28 September, 2022; v1 submitted 30 July, 2022; originally announced August 2022.

    Comments: Accepted by ECCV 2022

  25. Occlusion-Aware Self-Supervised Monocular 6D Object Pose Estimation

    Authors: Gu Wang, Fabian Manhardt, Xingyu Liu, Xiangyang Ji, Federico Tombari

    Abstract: 6D object pose estimation is a fundamental yet challenging problem in computer vision. Convolutional Neural Networks (CNNs) have recently proven to be capable of predicting reliable 6D pose estimates even under monocular settings. Nonetheless, CNNs are identified as being extremely data-driven, and acquiring adequate annotations is oftentimes very time-consuming and labor intensive. To overcome th…

    Submitted 19 March, 2022; originally announced March 2022.

    Comments: Accepted to TPAMI 2021, in IEEE Transactions on Pattern Analysis and Machine Intelligence. arXiv admin note: text overlap with arXiv:2004.06468

  26. arXiv:2203.07918  [pdf, other]

    cs.CV

    GPV-Pose: Category-level Object Pose Estimation via Geometry-guided Point-wise Voting

    Authors: Yan Di, Ruida Zhang, Zhiqiang Lou, Fabian Manhardt, Xiangyang Ji, Nassir Navab, Federico Tombari

    Abstract: While 6D object pose estimation has recently made a huge leap forward, most methods can still only handle a single or a handful of different objects, which limits their applications. To circumvent this problem, category-level object pose estimation has recently been revamped, which aims at predicting the 6D pose as well as the 3D metric size for previously unseen instances from a given set of obje…

    Submitted 17 March, 2022; v1 submitted 15 March, 2022; originally announced March 2022.

    Comments: CVPR 2022

  27. Time-to-Label: Temporal Consistency for Self-Supervised Monocular 3D Object Detection

    Authors: Issa Mouawad, Nikolas Brasch, Fabian Manhardt, Federico Tombari, Francesca Odone

    Abstract: Monocular 3D object detection continues to attract attention due to the cost benefits and wider availability of RGB cameras. Despite the recent advances and the ability to acquire data at scale, annotation cost and complexity still limit the size of 3D object detection datasets in the supervised settings. Self-supervised methods, on the other hand, aim at training deep networks relying on pretext…

    Submitted 4 March, 2022; originally announced March 2022.

  28. arXiv:2112.02849  [pdf, other]

    cs.RO cs.CV

    DemoGrasp: Few-Shot Learning for Robotic Grasping with Human Demonstration

    Authors: Pengyuan Wang, Fabian Manhardt, Luca Minciullo, Lorenzo Garattoni, Sven Meier, Nassir Navab, Benjamin Busam

    Abstract: The ability to successfully grasp objects is crucial in robotics, as it enables several interactive downstream applications. To this end, most approaches either compute the full 6D pose for the object of interest or learn to predict a set of grasping points. While the former approaches do not scale well to multiple object instances or classes yet, the latter require large annotated datasets and ar…

    Submitted 6 December, 2021; originally announced December 2021.

    Comments: Accepted by IROS 2021

  29. arXiv:2112.01521  [pdf, other]

    cs.CV cs.LG

    Object-aware Monocular Depth Prediction with Instance Convolutions

    Authors: Enis Simsar, Evin Pınar Örnek, Fabian Manhardt, Helisa Dhamo, Nassir Navab, Federico Tombari

    Abstract: With the advent of deep learning, estimating depth from a single RGB image has recently received a lot of attention, being capable of empowering many different applications ranging from path planning for robotics to computational cinematography. Nevertheless, while the depth maps are in their entirety fairly reliable, the estimates around object discontinuities are still far from satisfactory. Thi…

    Submitted 24 February, 2022; v1 submitted 2 December, 2021; originally announced December 2021.

  30. arXiv:2108.08841  [pdf, other]

    cs.CV

    Graph-to-3D: End-to-End Generation and Manipulation of 3D Scenes Using Scene Graphs

    Authors: Helisa Dhamo, Fabian Manhardt, Nassir Navab, Federico Tombari

    Abstract: Controllable scene synthesis consists of generating 3D information that satisfy underlying specifications. Thereby, these specifications should be abstract, i.e. allowing easy user interaction, whilst providing enough interface for detailed control. Scene graphs are representations of a scene, composed of objects (nodes) and inter-object relationships (edges), proven to be particularly suited for…

    Submitted 19 August, 2021; originally announced August 2021.

    Comments: accepted to ICCV 2021

  31. arXiv:2108.08367  [pdf, other]

    cs.CV

    SO-Pose: Exploiting Self-Occlusion for Direct 6D Pose Estimation

    Authors: Yan Di, Fabian Manhardt, Gu Wang, Xiangyang Ji, Nassir Navab, Federico Tombari

    Abstract: Directly regressing all 6 degrees-of-freedom (6DoF) for the object pose (e.g. the 3D rotation and translation) in a cluttered environment from a single RGB image is a challenging problem. While end-to-end methods have recently demonstrated promising results at high efficiency, they are still inferior when compared with elaborate P$n$P/RANSAC-based approaches in terms of pose accuracy. In this work…

    Submitted 18 August, 2021; originally announced August 2021.

    Comments: ICCV 2021

  32. arXiv:2102.12145  [pdf, other]

    cs.CV cs.RO

    GDR-Net: Geometry-Guided Direct Regression Network for Monocular 6D Object Pose Estimation

    Authors: Gu Wang, Fabian Manhardt, Federico Tombari, Xiangyang Ji

    Abstract: 6D pose estimation from a single RGB image is a fundamental task in computer vision. The current top-performing deep learning-based methods rely on an indirect strategy, i.e., first establishing 2D-3D correspondences between the coordinates in the image plane and object coordinate system, and then applying a variant of the P$n$P/RANSAC algorithm. However, this two-stage pipeline is not end-to-end…

    Submitted 9 March, 2021; v1 submitted 24 February, 2021; originally announced February 2021.

    Comments: CVPR 2021 camera ready, typo fixed

  33. arXiv:2004.06468  [pdf, other]

    cs.CV

    Self6D: Self-Supervised Monocular 6D Object Pose Estimation

    Authors: Gu Wang, Fabian Manhardt, Jianzhun Shao, Xiangyang Ji, Nassir Navab, Federico Tombari

    Abstract: 6D object pose estimation is a fundamental problem in computer vision. Convolutional Neural Networks (CNNs) have recently proven to be capable of predicting reliable 6D pose estimates even from monocular images. Nonetheless, CNNs are identified as being extremely data-driven, and acquiring adequate annotations is oftentimes very time-consuming and labor intensive. To overcome this shortcoming, we…

    Submitted 3 August, 2020; v1 submitted 14 April, 2020; originally announced April 2020.

    Comments: ECCV 2020

  34. arXiv:2003.05848  [pdf, other]

    cs.CV

    CPS++: Improving Class-level 6D Pose and Shape Estimation From Monocular Images With Self-Supervised Learning

    Authors: Fabian Manhardt, Gu Wang, Benjamin Busam, Manuel Nickel, Sven Meier, Luca Minciullo, Xiangyang Ji, Nassir Navab

    Abstract: Contemporary monocular 6D pose estimation methods can only cope with a handful of object instances. This naturally hampers possible applications as, for instance, robots seamlessly integrated in everyday processes necessarily require the ability to work with hundreds of different objects. To tackle this problem of immanent practical relevance, we propose a novel method for class-level monocular 6D…

    Submitted 11 September, 2020; v1 submitted 12 March, 2020; originally announced March 2020.

  35. arXiv:1812.02781  [pdf, other]

    cs.CV

    ROI-10D: Monocular Lifting of 2D Detection to 6D Pose and Metric Shape

    Authors: Fabian Manhardt, Wadim Kehl, Adrien Gaidon

    Abstract: We present a deep learning method for end-to-end monocular 3D object detection and metric shape retrieval. We propose a novel loss formulation by lifting 2D detection, orientation, and scale estimation into 3D space. Instead of optimizing these quantities separately, the 3D instantiation allows to properly measure the metric misalignment of boxes. We experimentally show that our 10D lifting of spa…

    Submitted 10 April, 2019; v1 submitted 6 December, 2018; originally announced December 2018.

    Comments: CVPR 2019

  36. arXiv:1812.00287  [pdf, other]

    cs.CV

    Explaining the Ambiguity of Object Detection and 6D Pose From Visual Data

    Authors: Fabian Manhardt, Diego Martin Arroyo, Christian Rupprecht, Benjamin Busam, Tolga Birdal, Nassir Navab, Federico Tombari

    Abstract: 3D object detection and pose estimation from a single image are two inherently ambiguous problems. Oftentimes, objects appear similar from different viewpoints due to shape symmetries, occlusion and repetitive textures. This ambiguity in both detection and pose estimation means that an object instance can be perfectly described by several different poses and even classes. In this work we propose t…

    Submitted 20 August, 2019; v1 submitted 1 December, 2018; originally announced December 2018.

    Comments: ICCV 2019

  37. arXiv:1810.03065  [pdf, other]

    cs.CV

    Deep Model-Based 6D Pose Refinement in RGB

    Authors: Fabian Manhardt, Wadim Kehl, Nassir Navab, Federico Tombari

    Abstract: We present a novel approach for model-based 6D pose refinement in color data. Building on the established idea of contour-based pose tracking, we teach a deep neural network to predict a translational and rotational update. At the core, we propose a new visual loss that drives the pose update by aligning object contours, thus avoiding the definition of any explicit appearance model. In contrast to…

    Submitted 6 October, 2018; originally announced October 2018.

    Comments: The first two authors contributed equally to this work

  38. arXiv:1808.08319  [pdf, other]

    cs.CV cs.AI cs.RO

    BOP: Benchmark for 6D Object Pose Estimation

    Authors: Tomas Hodan, Frank Michel, Eric Brachmann, Wadim Kehl, Anders Glent Buch, Dirk Kraft, Bertram Drost, Joel Vidal, Stephan Ihrke, Xenophon Zabulis, Caner Sahin, Fabian Manhardt, Federico Tombari, Tae-Kyun Kim, Jiri Matas, Carsten Rother

    Abstract: We propose a benchmark for 6D pose estimation of a rigid object from a single RGB-D input image. The training data consists of a texture-mapped 3D object model or images of the object in known 6D poses. The benchmark comprises of: i) eight datasets in a unified format that cover different practical scenarios, including two new datasets focusing on varying lighting conditions, ii) an evaluation met…

    Submitted 24 August, 2018; originally announced August 2018.

    Comments: ECCV 2018

  39. arXiv:1711.10006  [pdf, other]

    cs.CV

    SSD-6D: Making RGB-based 3D detection and 6D pose estimation great again

    Authors: Wadim Kehl, Fabian Manhardt, Federico Tombari, Slobodan Ilic, Nassir Navab

    Abstract: We present a novel method for detecting 3D model instances and estimating their 6D poses from RGB data in a single shot. To this end, we extend the popular SSD paradigm to cover the full 6D pose space and train on synthetic model data only. Our approach competes or surpasses current state-of-the-art methods that leverage RGB-D data on multiple challenging datasets. Furthermore, our method produces…

    Submitted 27 November, 2017; originally announced November 2017.

    Comments: The first two authors contributed equally to this work