Showing 1–50 of 59 results for author: Sminchisescu, C

  1. arXiv:2406.07516  [pdf, other]

    cs.CV

    Instant 3D Human Avatar Generation using Image Diffusion Models

    Authors: Nikos Kolotouros, Thiemo Alldieck, Enric Corona, Eduard Gabriel Bazavan, Cristian Sminchisescu

    Abstract: We present AvatarPopUp, a method for fast, high quality 3D human avatar generation from different input modalities, such as images and text prompts and with control over the generated pose and shape. The common theme is the use of diffusion-based image generation networks that are specialized for each particular task, followed by a 3D lifting network. We purposefully decouple the generation from t…

    Submitted 12 July, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

    Comments: Camera-ready version

  2. arXiv:2404.00485  [pdf, other]

    cs.CV

    DiffHuman: Probabilistic Photorealistic 3D Reconstruction of Humans

    Authors: Akash Sengupta, Thiemo Alldieck, Nikos Kolotouros, Enric Corona, Andrei Zanfir, Cristian Sminchisescu

    Abstract: We present DiffHuman, a probabilistic method for photorealistic 3D human reconstruction from a single RGB image. Despite the ill-posed nature of this problem, most methods are deterministic and output a single solution, often resulting in a lack of geometric detail and blurriness in unseen or uncertain regions. In contrast, DiffHuman predicts a probability distribution over 3D reconstructions cond…

    Submitted 30 March, 2024; originally announced April 2024.

    Comments: CVPR 2024

  3. arXiv:2403.08764  [pdf, other]

    cs.CV

    VLOGGER: Multimodal Diffusion for Embodied Avatar Synthesis

    Authors: Enric Corona, Andrei Zanfir, Eduard Gabriel Bazavan, Nikos Kolotouros, Thiemo Alldieck, Cristian Sminchisescu

    Abstract: We propose VLOGGER, a method for audio-driven human video generation from a single input image of a person, which builds on the success of recent generative diffusion models. Our method consists of 1) a stochastic human-to-3d-motion diffusion model, and 2) a novel diffusion-based architecture that augments text-to-image models with both spatial and temporal controls. This supports the generation o…

    Submitted 13 March, 2024; originally announced March 2024.

    Comments: Project web: https://enriccorona.github.io/vlogger/

  4. arXiv:2401.05293  [pdf, other]

    cs.CV

    Score Distillation Sampling with Learned Manifold Corrective

    Authors: Thiemo Alldieck, Nikos Kolotouros, Cristian Sminchisescu

    Abstract: Score Distillation Sampling (SDS) is a recent but already widely popular method that relies on an image diffusion model to control optimization problems using text prompts. In this paper, we conduct an in-depth analysis of the SDS loss function, identify an inherent problem with its formulation, and propose a surprisingly easy but effective fix. Specifically, we decompose the loss into different f…

    Submitted 4 July, 2024; v1 submitted 10 January, 2024; originally announced January 2024.

  5. arXiv:2312.03528  [pdf, other]

    cs.CV

    Personalized Pose Forecasting

    Authors: Maria Priisalu, Ted Kronvall, Cristian Sminchisescu

    Abstract: Human pose forecasting is the task of predicting articulated human motion given past human motion. There exists a number of popular benchmarks that evaluate an array of different models performing human pose forecasting. These benchmarks do not reflect that a human interacting system, such as a delivery robot, observes and plans for the motion of the same individual over an extended period of time…

    Submitted 6 December, 2023; originally announced December 2023.

  6. arXiv:2311.02461  [pdf, other]

    cs.CV

    SPHEAR: Spherical Head Registration for Complete Statistical 3D Modeling

    Authors: Eduard Gabriel Bazavan, Andrei Zanfir, Thiemo Alldieck, Teodor Alexandru Szente, Mihai Zanfir, Cristian Sminchisescu

    Abstract: We present SPHEAR, an accurate, differentiable parametric statistical 3D human head model, enabled by a novel 3D registration method based on spherical embeddings. We shift the paradigm away from the classical Non-Rigid Registration methods, which operate under various surface priors, increasing reconstruction fidelity and minimizing required human intervention. Additionally, SPHEAR is a \e…

    Submitted 4 November, 2023; originally announced November 2023.

    Comments: To be published at the International Conference on 3D Vision 2024

  7. arXiv:2309.05782  [pdf, other]

    cs.CV

    Blendshapes GHUM: Real-time Monocular Facial Blendshape Prediction

    Authors: Ivan Grishchenko, Geng Yan, Eduard Gabriel Bazavan, Andrei Zanfir, Nikolai Chinaev, Karthik Raveendran, Matthias Grundmann, Cristian Sminchisescu

    Abstract: We present Blendshapes GHUM, an on-device ML pipeline that predicts 52 facial blendshape coefficients at 30+ FPS on modern mobile phones, from a single monocular RGB image and enables facial motion capture applications like virtual avatars. Our main contributions are: i) an annotation-free offline method for obtaining blendshape coefficients from real-world human scans, ii) a lightweight real-time…

    Submitted 11 September, 2023; originally announced September 2023.

    Comments: 4 pages, 3 figures

  8. arXiv:2308.01854  [pdf, other]

    cs.CV

    Reconstructing Three-Dimensional Models of Interacting Humans

    Authors: Mihai Fieraru, Mihai Zanfir, Elisabeta Oneata, Alin-Ionut Popa, Vlad Olaru, Cristian Sminchisescu

    Abstract: Understanding 3d human interactions is fundamental for fine-grained scene analysis and behavioural modeling. However, most of the existing models predict incorrect, lifeless 3d estimates, that miss the subtle human contact aspects--the essence of the event--and are of little use for detailed behavioral understanding. This paper addresses such issues with several contributions: (1) we introduce mod…

    Submitted 4 August, 2023; v1 submitted 3 August, 2023; originally announced August 2023.

  9. arXiv:2306.09329  [pdf, other]

    cs.CV

    DreamHuman: Animatable 3D Avatars from Text

    Authors: Nikos Kolotouros, Thiemo Alldieck, Andrei Zanfir, Eduard Gabriel Bazavan, Mihai Fieraru, Cristian Sminchisescu

    Abstract: We present DreamHuman, a method to generate realistic animatable 3D human avatar models solely from textual descriptions. Recent text-to-3D methods have made considerable strides in generation, but are still lacking in important aspects. Control and often spatial resolution remain limited, existing methods produce fixed rather than animated 3D human models, and anthropometric consistency for compl…

    Submitted 15 June, 2023; originally announced June 2023.

    Comments: Project website at https://dream-human.github.io/

  10. arXiv:2212.07729  [pdf, other]

    cs.CV

    HUM3DIL: Semi-supervised Multi-modal 3D Human Pose Estimation for Autonomous Driving

    Authors: Andrei Zanfir, Mihai Zanfir, Alexander Gorban, Jingwei Ji, Yin Zhou, Dragomir Anguelov, Cristian Sminchisescu

    Abstract: Autonomous driving is an exciting new industry, posing important research questions. Within the perception module, 3D human pose estimation is an emerging technology, which can enable the autonomous vehicle to perceive and understand the subtle and complex behaviors of pedestrians. While hardware systems and sensors have dramatically improved over the decades -- with cars potentially boasting comp…

    Submitted 15 December, 2022; originally announced December 2022.

    Comments: Published at the 6th Conference on Robot Learning (CoRL 2022), Auckland, New Zealand

  11. arXiv:2212.07275  [pdf, other]

    cs.CV

    PhoMoH: Implicit Photorealistic 3D Models of Human Heads

    Authors: Mihai Zanfir, Thiemo Alldieck, Cristian Sminchisescu

    Abstract: We present PhoMoH, a neural network methodology to construct generative models of photo-realistic 3D geometry and appearance of human heads including hair, beards, an oral cavity, and clothing. In contrast to prior work, PhoMoH models the human head using neural fields, thus supporting complex topology. Instead of learning a head model from scratch, we propose to augment an existing expressive hea…

    Submitted 24 October, 2023; v1 submitted 14 December, 2022; originally announced December 2022.

    Comments: To be published at the International Conference on 3D Vision 2024

  12. arXiv:2212.06820  [pdf, other]

    cs.CV

    Structured 3D Features for Reconstructing Controllable Avatars

    Authors: Enric Corona, Mihai Zanfir, Thiemo Alldieck, Eduard Gabriel Bazavan, Andrei Zanfir, Cristian Sminchisescu

    Abstract: We introduce Structured 3D Features, a model based on a novel implicit 3D representation that pools pixel-aligned image features onto dense 3D points sampled from a parametric, statistical human mesh surface. The 3D points have associated semantics and can move freely in 3D space. This allows for optimal coverage of the person of interest, beyond just the body shape, which in turn, additionally he…

    Submitted 15 April, 2023; v1 submitted 13 December, 2022; originally announced December 2022.

    Comments: Accepted at CVPR 2023. Project page: https://enriccorona.github.io/s3f/, Video: https://www.youtube.com/watch?v=mcZGcQ6L-2s

  13. arXiv:2212.01055  [pdf, other]

    cs.CV

    Transformer-Based Learned Optimization

    Authors: Erik Gärtner, Luke Metz, Mykhaylo Andriluka, C. Daniel Freeman, Cristian Sminchisescu

    Abstract: We propose a new approach to learned optimization where we represent the computation of an optimizer's update step using a neural network. The parameters of the optimizer are then learned by training on a set of optimization tasks with the objective to perform minimization efficiently. Our innovation is a new neural network architecture, Optimus, for the learned optimizer inspired by the classic B…

    Submitted 28 June, 2023; v1 submitted 2 December, 2022; originally announced December 2022.

    Comments: Accepted to the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2023 (CVPR) in Vancouver, Canada

  14. arXiv:2206.11678  [pdf, other]

    cs.CV

    BlazePose GHUM Holistic: Real-time 3D Human Landmarks and Pose Estimation

    Authors: Ivan Grishchenko, Valentin Bazarevsky, Andrei Zanfir, Eduard Gabriel Bazavan, Mihai Zanfir, Richard Yee, Karthik Raveendran, Matsvei Zhdanovich, Matthias Grundmann, Cristian Sminchisescu

    Abstract: We present BlazePose GHUM Holistic, a lightweight neural network pipeline for 3D human body landmarks and pose estimation, specifically tailored to real-time on-device inference. BlazePose GHUM Holistic enables motion capture from a single RGB image including avatar control, fitness tracking and AR/VR effects. Our main contributions include i) a novel method for 3D ground truth data acquisition, i…

    Submitted 23 June, 2022; originally announced June 2022.

    Comments: 4 pages, 4 figures; CVPR Workshop on Computer Vision for Augmented and Virtual Reality, New Orleans, LA, 2022

  15. arXiv:2205.12292  [pdf, other]

    cs.CV

    Trajectory Optimization for Physics-Based Reconstruction of 3d Human Pose from Monocular Video

    Authors: Erik Gärtner, Mykhaylo Andriluka, Hongyi Xu, Cristian Sminchisescu

    Abstract: We focus on the task of estimating a physically plausible articulated human motion from monocular video. Existing approaches that do not consider physics often produce temporally inconsistent output with motion artifacts, while state-of-the-art physics-based approaches have either been shown to work only in controlled laboratory conditions or consider simplified body-ground contact limited to feet…

    Submitted 24 May, 2022; originally announced May 2022.

    Comments: Accepted to CVPR 2022

  16. arXiv:2205.12256  [pdf, other]

    cs.CV

    Differentiable Dynamics for Articulated 3d Human Motion Reconstruction

    Authors: Erik Gärtner, Mykhaylo Andriluka, Erwin Coumans, Cristian Sminchisescu

    Abstract: We introduce DiffPhy, a differentiable physics-based model for articulated 3d human motion reconstruction from video. Applications of physics-based reasoning in human motion analysis have so far been limited, both by the complexity of constructing adequate physical models of articulated human motion, and by the formidable challenges of performing stable and efficient inference with physics in the…

    Submitted 24 May, 2022; originally announced May 2022.

    Comments: Accepted to CVPR 2022

  17. arXiv:2204.08906  [pdf, other]

    cs.CV

    Photorealistic Monocular 3D Reconstruction of Humans Wearing Clothing

    Authors: Thiemo Alldieck, Mihai Zanfir, Cristian Sminchisescu

    Abstract: We present PHORHUM, a novel, end-to-end trainable, deep neural network methodology for photorealistic 3D human reconstruction given just a monocular RGB image. Our pixel-aligned method estimates detailed 3D geometry and, for the first time, the unshaded surface color together with the scene illumination. Observing that 3D supervision alone is not sufficient for high fidelity color reconstruction,…

    Submitted 19 April, 2022; originally announced April 2022.

    Comments: https://phorhum.github.io/

  18. arXiv:2204.06950  [pdf, other]

    cs.CV

    BEHAVE: Dataset and Method for Tracking Human Object Interactions

    Authors: Bharat Lal Bhatnagar, Xianghui Xie, Ilya A. Petrov, Cristian Sminchisescu, Christian Theobalt, Gerard Pons-Moll

    Abstract: Modelling interactions between humans and objects in natural environments is central to many applications including gaming, virtual and mixed reality, as well as human behavior analysis and human-robot collaboration. This challenging operation scenario requires generalization to a vast number of objects, scenes, and human actions. Unfortunately, there exists no such dataset. Moreover, this data needs…

    Submitted 14 April, 2022; originally announced April 2022.

    Comments: Accepted at CVPR'22

    Journal ref: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022

  19. arXiv:2204.03353  [pdf, other]

    cs.CV

    Learning Online Multi-Sensor Depth Fusion

    Authors: Erik Sandström, Martin R. Oswald, Suryansh Kumar, Silvan Weder, Fisher Yu, Cristian Sminchisescu, Luc Van Gool

    Abstract: Many hand-held or mixed reality devices are used with a single sensor for 3D reconstruction, although they often comprise multiple sensors. Multi-sensor depth fusion is able to substantially improve the robustness and accuracy of 3D reconstruction methods, but existing techniques are not robust enough to handle sensors which operate with diverse value ranges as well as noise and outlier statistics…

    Submitted 21 September, 2022; v1 submitted 7 April, 2022; originally announced April 2022.

    Comments: Accepted to ECCV 2022. 31 pages, 17 figures, 15 Tables

  20. arXiv:2203.01035  [pdf, other]

    cs.LG

    Discriminating Against Unrealistic Interpolations in Generative Adversarial Networks

    Authors: Henning Petzka, Ted Kronvall, Cristian Sminchisescu

    Abstract: Interpolation in the latent space of deep generative models is one of the standard tools to synthesize semantically meaningful mixtures of generated samples. As the generator function is non-linear, commonly used linear interpolations in the latent space do not yield the shortest paths in the sample space, resulting in non-smooth interpolations. Recent work has therefore equipped the latent space…

    Submitted 2 March, 2022; originally announced March 2022.

    Comments: The first two authors made equal contribution

  21. arXiv:2112.14084  [pdf, other]

    cs.CV

    Embodied Learning for Lifelong Visual Perception

    Authors: David Nilsson, Aleksis Pirinen, Erik Gärtner, Cristian Sminchisescu

    Abstract: We study lifelong visual perception in an embodied setup, where we develop new models and compare various agents that navigate in buildings and occasionally request annotations which, in turn, are used to refine their visual perception capabilities. The purpose of the agents is to recognize objects and other semantic classes in the whole building at the end of a process that combines exploration a…

    Submitted 28 December, 2021; originally announced December 2021.

  22. arXiv:2112.12867  [pdf, other]

    cs.CV

    HSPACE: Synthetic Parametric Humans Animated in Complex Environments

    Authors: Eduard Gabriel Bazavan, Andrei Zanfir, Mihai Zanfir, William T. Freeman, Rahul Sukthankar, Cristian Sminchisescu

    Abstract: Advances in the state of the art for 3d human sensing are currently limited by the lack of visual datasets with 3d ground truth, including multiple people, in motion, operating in real-world environments, with complex illumination or occlusion, and potentially observed by a moving camera. Sophisticated scene understanding would require estimating human pose and shape as well as gestures, towards r…

    Submitted 6 January, 2022; v1 submitted 23 December, 2021; originally announced December 2021.

  23. arXiv:2110.13746  [pdf, other]

    cs.CV

    H-NeRF: Neural Radiance Fields for Rendering and Temporal Reconstruction of Humans in Motion

    Authors: Hongyi Xu, Thiemo Alldieck, Cristian Sminchisescu

    Abstract: We present neural radiance fields for rendering and temporal (4D) reconstruction of humans in motion (H-NeRF), as captured by a sparse set of cameras or even from a monocular video. Our approach combines ideas from neural scene representation, novel-view synthesis, and implicit statistical geometric human representations, coupled using novel loss functions. Instead of learning a radiance field wit…

    Submitted 2 November, 2021; v1 submitted 26 October, 2021; originally announced October 2021.

  24. arXiv:2108.10842  [pdf, other]

    cs.CV cs.LG

    imGHUM: Implicit Generative Models of 3D Human Shape and Articulated Pose

    Authors: Thiemo Alldieck, Hongyi Xu, Cristian Sminchisescu

    Abstract: We present imGHUM, the first holistic generative model of 3D human shape and articulated pose, represented as a signed distance function. In contrast to prior work, we model the full human body implicitly as a function zero-level-set and without the use of an explicit template mesh. We propose a novel network architecture and a learning paradigm, which make it possible to learn a detailed implicit…

    Submitted 24 August, 2021; originally announced August 2021.

    Comments: ICCV 2021

  25. arXiv:2108.05246  [pdf, other]

    cs.CV

    A Real-Time Online Learning Framework for Joint 3D Reconstruction and Semantic Segmentation of Indoor Scenes

    Authors: Davide Menini, Suryansh Kumar, Martin R. Oswald, Erik Sandstrom, Cristian Sminchisescu, Luc Van Gool

    Abstract: This paper presents a real-time online vision framework to jointly recover an indoor scene's 3D structure and semantic label. Given noisy depth maps, a camera trajectory, and 2D semantic labels at train time, the proposed deep neural network based approach learns to fuse the depth over frames with suitable semantic labels in the scene space. Our approach exploits the joint volumetric representatio…

    Submitted 28 December, 2021; v1 submitted 11 August, 2021; originally announced August 2021.

    Comments: Accepted for publication at IEEE Robotics and Automation Letters (RA-L), 2022. Draft info: 9 pages, 5 figures, 4 tables

  26. arXiv:2106.13365  [pdf, other]

    cs.CV

    RSN: Range Sparse Net for Efficient, Accurate LiDAR 3D Object Detection

    Authors: Pei Sun, Weiyue Wang, Yuning Chai, Gamaleldin Elsayed, Alex Bewley, Xiao Zhang, Cristian Sminchisescu, Dragomir Anguelov

    Abstract: The detection of 3D objects from LiDAR data is a critical component in most autonomous driving systems. Safe, high speed driving needs larger detection ranges, which are enabled by new LiDARs. These larger detection ranges require more efficient and accurate detection models. Towards this goal, we propose Range Sparse Net (RSN), a simple, efficient, and accurate 3D object detector in order to tack…

    Submitted 24 June, 2021; originally announced June 2021.

    Journal ref: CVPR 2021

  27. arXiv:2106.09336  [pdf, other]

    cs.CV

    THUNDR: Transformer-based 3D HUmaN Reconstruction with Markers

    Authors: Mihai Zanfir, Andrei Zanfir, Eduard Gabriel Bazavan, William T. Freeman, Rahul Sukthankar, Cristian Sminchisescu

    Abstract: We present THUNDR, a transformer-based deep neural network methodology to reconstruct the 3d pose and shape of people, given monocular RGB images. Key to our methodology is an intermediate 3d marker representation, where we aim to combine the predictive power of model-free-output architectures and the regularizing, anthropometrically-preserving properties of a statistical human surface model like…

    Submitted 17 June, 2021; originally announced June 2021.

  28. arXiv:2104.02631  [pdf, other]

    cs.CV

    Local Metrics for Multi-Object Tracking

    Authors: Jack Valmadre, Alex Bewley, Jonathan Huang, Chen Sun, Cristian Sminchisescu, Cordelia Schmid

    Abstract: This paper introduces temporally local metrics for Multi-Object Tracking. These metrics are obtained by restricting existing metrics based on track matching to a finite temporal horizon, and provide new insight into the ability of trackers to maintain identity over time. Moreover, the horizon parameter offers a novel, meaningful mechanism by which to define the relative importance of detection and…

    Submitted 6 April, 2021; originally announced April 2021.

  29. arXiv:2012.10366  [pdf, other]

    cs.CV

    Learning Complex 3D Human Self-Contact

    Authors: Mihai Fieraru, Mihai Zanfir, Elisabeta Oneata, Alin-Ionut Popa, Vlad Olaru, Cristian Sminchisescu

    Abstract: Monocular estimation of three dimensional human self-contact is fundamental for detailed scene analysis including body language understanding and behaviour modeling. Existing 3d reconstruction methods do not focus on body regions in self-contact and consequently recover configurations that are either far from each other or self-intersecting, when they should just touch. This leads to perceptually…

    Submitted 18 December, 2020; originally announced December 2020.

    Comments: To be published in the Proceedings of the 35th AAAI Conference on Artificial Intelligence (AAAI-2021)

  30. arXiv:2012.09503  [pdf, other]

    cs.CV

    Embodied Visual Active Learning for Semantic Segmentation

    Authors: David Nilsson, Aleksis Pirinen, Erik Gärtner, Cristian Sminchisescu

    Abstract: We study the task of embodied visual active learning, where an agent is set to explore a 3d environment with the goal to acquire visual scene understanding by actively selecting views for which to request annotation. While accurate on some benchmarks, today's deep visual recognition pipelines tend to not generalize well in certain real-world scenarios, or for unusual viewpoints. Robotic perception…

    Submitted 17 December, 2020; originally announced December 2020.

    Comments: Accepted to AAAI 2021

  31. arXiv:2010.12447  [pdf, other]

    cs.CV

    LoopReg: Self-supervised Learning of Implicit Surface Correspondences, Pose and Shape for 3D Human Mesh Registration

    Authors: Bharat Lal Bhatnagar, Cristian Sminchisescu, Christian Theobalt, Gerard Pons-Moll

    Abstract: We address the problem of fitting 3D human models to 3D scans of dressed humans. Classical methods optimize both the data-to-model correspondences and the human model parameters (pose and shape), but are reliable only when initialized close to the solution. Some methods initialize the optimization based on fully supervised correspondence predictors, which is not differentiable end-to-end, and can…

    Submitted 26 November, 2021; v1 submitted 23 October, 2020; originally announced October 2020.

    Comments: NeurIPS'20 (Oral)

    Journal ref: NeurIPS 2020

  32. arXiv:2008.06910  [pdf, other]

    cs.CV

    Neural Descent for Visual 3D Human Pose and Shape

    Authors: Andrei Zanfir, Eduard Gabriel Bazavan, Mihai Zanfir, William T. Freeman, Rahul Sukthankar, Cristian Sminchisescu

    Abstract: We present deep neural network methodology to reconstruct the 3d pose and shape of people, given an input RGB image. We rely on a recently introduced, expressive full body statistical 3d human model, GHUM, trained end-to-end, and learn to reconstruct its pose and shape state in a self-supervised regime. Central to our methodology is a learning to learn and optimize approach, referred to as HUmanNe…

    Submitted 14 June, 2021; v1 submitted 16 August, 2020; originally announced August 2020.

    Comments: CVPR 2021

  33. arXiv:2007.11432  [pdf, other]

    cs.CV

    Combining Implicit Function Learning and Parametric Models for 3D Human Reconstruction

    Authors: Bharat Lal Bhatnagar, Cristian Sminchisescu, Christian Theobalt, Gerard Pons-Moll

    Abstract: Implicit functions represented as deep learning approximations are powerful for reconstructing 3D surfaces. However, they can only produce static surfaces that are not controllable, which provides limited ability to modify the resulting model by editing its pose or shape parameters. Nevertheless, such features are essential in building flexible models for both computer graphics and computer vision…

    Submitted 26 November, 2021; v1 submitted 22 July, 2020; originally announced July 2020.

    Comments: Accepted at ECCV'20 (Oral)

  34. arXiv:2006.12807  [pdf, other]

    cs.LG cs.CV stat.ML

    Post-hoc Calibration of Neural Networks by g-Layers

    Authors: Amir Rahimi, Thomas Mensink, Kartik Gupta, Thalaiyasingam Ajanthan, Cristian Sminchisescu, Richard Hartley

    Abstract: Calibration of neural networks is a critical aspect to consider when incorporating machine learning models in real-world decision-making systems where the confidence of decisions is as important as the decisions themselves. In recent years, there is a surge of research on neural network calibration and the majority of the works can be categorized into post-hoc calibration methods, defined as…

    Submitted 21 February, 2022; v1 submitted 23 June, 2020; originally announced June 2020.

  35. arXiv:2006.12800  [pdf, other]

    cs.LG cs.CV stat.ML

    Calibration of Neural Networks using Splines

    Authors: Kartik Gupta, Amir Rahimi, Thalaiyasingam Ajanthan, Thomas Mensink, Cristian Sminchisescu, Richard Hartley

    Abstract: Calibrating neural networks is of utmost importance when employing them in safety-critical applications where the downstream decision making depends on the predicted probabilities. Measuring calibration error amounts to comparing two empirical distributions. In this work, we introduce a binning-free calibration measure inspired by the classical Kolmogorov-Smirnov (KS) statistical test in which the…

    Submitted 29 December, 2021; v1 submitted 23 June, 2020; originally announced June 2020.

    Comments: ICLR 2021

  36. arXiv:2006.11242  [pdf, other]

    cs.CV

    Consistency Guided Scene Flow Estimation

    Authors: Yuhua Chen, Luc Van Gool, Cordelia Schmid, Cristian Sminchisescu

    Abstract: Consistency Guided Scene Flow Estimation (CGSF) is a self-supervised framework for the joint reconstruction of 3D scene structure and motion from stereo video. The model takes two temporal stereo pairs as input, and predicts disparity and scene flow. The model self-adapts at test time by iteratively refining its predictions. The refinement process is guided by a consistency loss, which combines st…

    Submitted 17 August, 2020; v1 submitted 19 June, 2020; originally announced June 2020.

  37. arXiv:2005.09927  [pdf, other]

    cs.CV cs.LG cs.RO

    Range Conditioned Dilated Convolutions for Scale Invariant 3D Object Detection

    Authors: Alex Bewley, Pei Sun, Thomas Mensink, Dragomir Anguelov, Cristian Sminchisescu

    Abstract: This paper presents a novel 3D object detection framework that processes LiDAR data directly on its native representation: range images. Benefiting from the compactness of range images, 2D convolutions can efficiently process dense LiDAR data of a scene. To overcome scale sensitivity in this perspective view, a novel range-conditioned dilation (RCD) layer is proposed to dynamically adjust a contin…

    Submitted 22 January, 2021; v1 submitted 20 May, 2020; originally announced May 2020.

    Comments: CoRL 2020

  38. arXiv:2003.10350  [pdf, other]

    cs.CV

    Weakly Supervised 3D Human Pose and Shape Reconstruction with Normalizing Flows

    Authors: Andrei Zanfir, Eduard Gabriel Bazavan, Hongyi Xu, Bill Freeman, Rahul Sukthankar, Cristian Sminchisescu

    Abstract: Monocular 3D human pose and shape estimation is challenging due to the many degrees of freedom of the human body and the difficulty to acquire training data for large-scale supervised learning in complex visual scenes. In this paper we present practical semi-supervised and self-supervised models that support training and good generalization in real-world images and video. Our formulation is based o…

    Submitted 22 August, 2020; v1 submitted 23 March, 2020; originally announced March 2020.

    Journal ref: ECCV 2020

  39. arXiv:2001.02024  [pdf, other]

    cs.CV

    Deep Reinforcement Learning for Active Human Pose Estimation

    Authors: Erik Gärtner, Aleksis Pirinen, Cristian Sminchisescu

    Abstract: Most 3d human pose estimation methods assume that input -- be it images of a scene collected from one or several viewpoints, or from a video -- is given. Consequently, they focus on estimates leveraging prior knowledge and measurement by fusing information spatially and/or temporally, whenever available. In this paper we address the problem of an active observer with freedom to move and explore th…

    Submitted 16 December, 2020; v1 submitted 7 January, 2020; originally announced January 2020.

    Comments: Accepted to The Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20). Submission updated to include supplementary material

  40. arXiv:2001.00939  [pdf, other]

    cs.LG stat.ML

    Relative Flatness and Generalization

    Authors: Henning Petzka, Michael Kamp, Linara Adilova, Cristian Sminchisescu, Mario Boley

    Abstract: Flatness of the loss curve is conjectured to be connected to the generalization ability of machine learning models, in particular neural networks. While it has been empirically observed that flatness measures consistently correlate strongly with generalization, it is still an open theoretical problem why and under which circumstances flatness is connected to generalization, in particular in light…

    Submitted 4 November, 2021; v1 submitted 3 January, 2020; originally announced January 2020.

    Comments: The first two authors made equal contribution; Accepted for publication at NeurIPS 2021; arXiv admin note: substantial text overlap with arXiv:1912.00058

  41. arXiv:1912.00058  [pdf, other]

    cs.LG stat.ML

    A Reparameterization-Invariant Flatness Measure for Deep Neural Networks

    Authors: Henning Petzka, Linara Adilova, Michael Kamp, Cristian Sminchisescu

    Abstract: The performance of deep neural networks is often attributed to their automated, task-related feature construction. It remains an open question, though, why this leads to solutions with good generalization, even in cases where the number of parameters is larger than the number of samples. Back in the 90s, Hochreiter and Schmidhuber observed that flatness of the loss surface around a local minimum c…

    Submitted 29 November, 2019; originally announced December 2019.

    Comments: 14 pages; accepted at Workshop "Science meets Engineering of Deep Learning", 33rd Conference on Neural Information Processing Systems (NeurIPS 2019)

  42. arXiv:1909.10307  [pdf, other]

    cs.CV

    Human Synthesis and Scene Compositing

    Authors: Mihai Zanfir, Elisabeta Oneata, Alin-Ionut Popa, Andrei Zanfir, Cristian Sminchisescu

    Abstract: Generating good quality and geometrically plausible synthetic images of humans with the ability to control appearance, pose and shape parameters, has become increasingly important for a variety of tasks ranging from photo editing, fashion virtual try-on, to special effects and image compression. In this paper, we propose HUSC, a HUman Synthesis and Scene Compositing framework for the realistic syn…

    Submitted 18 October, 2019; v1 submitted 23 September, 2019; originally announced September 2019.

  43. arXiv:1907.05820  [pdf, other]

    cs.CV

    Self-supervised Learning with Geometric Constraints in Monocular Video: Connecting Flow, Depth, and Camera

    Authors: Yuhua Chen, Cordelia Schmid, Cristian Sminchisescu

    Abstract: We present GLNet, a self-supervised framework for learning depth, optical flow, camera pose and intrinsic parameters from monocular video - addressing the difficulty of acquiring realistic ground-truth for such tasks. We propose three contributions: 1) we design new loss functions that capture multiple geometric constraints (e.g. epipolar geometry) as well as an adaptive photometric loss that suppo…

    Submitted 9 September, 2019; v1 submitted 12 July, 2019; originally announced July 2019.

    Comments: ICCV'19 camera ready

  44. arXiv:1812.06486  [pdf, other]

    cs.LG stat.ML

    Non-attracting Regions of Local Minima in Deep and Wide Neural Networks

    Authors: Henning Petzka, Cristian Sminchisescu

    Abstract: Understanding the loss surface of neural networks is essential for the design of models with predictable performance and their success in applications. Experimental results suggest that sufficiently deep and wide neural networks are not negatively impacted by suboptimal local minima. Despite recent progress, the reason for this outcome is not fully understood. Could deep networks have very few, if…

    Submitted 31 August, 2020; v1 submitted 16 December, 2018; originally announced December 2018.

  45. arXiv:1701.08985  [pdf, other]

    cs.CV

    Deep Multitask Architecture for Integrated 2D and 3D Human Sensing

    Authors: Alin-Ionut Popa, Mihai Zanfir, Cristian Sminchisescu

    Abstract: We propose a deep multitask architecture for fully automatic 2d and 3d human sensing (DMHS), including recognition and reconstruction, in monocular images. The system computes the figure-ground segmentation, semantically identifies the human body parts at pixel level, and estimates the 2d and 3d pose of the person. The model supports the joint training of all components by mea…

    Submitted 31 January, 2017; originally announced January 2017.

  46. arXiv:1612.08871  [pdf, other]

    cs.CV

    Semantic Video Segmentation by Gated Recurrent Flow Propagation

    Authors: David Nilsson, Cristian Sminchisescu

    Abstract: Semantic video segmentation is challenging due to the sheer amount of data that needs to be processed and labeled in order to construct accurate models. In this paper we present a deep, end-to-end trainable methodology to video segmentation that is capable of leveraging information present in unlabeled data in order to improve semantic estimates. Our model combines a convolutional architecture and…

    Submitted 2 October, 2017; v1 submitted 28 December, 2016; originally announced December 2016.

    Comments: The experiments section is extended compared to the previous version

  47. arXiv:1610.04997  [pdf, other]

    cs.CV

    Spatio-Temporal Attention Models for Grounded Video Captioning

    Authors: Mihai Zanfir, Elisabeta Marinoiu, Cristian Sminchisescu

    Abstract: Automatic video captioning is challenging due to the complex interactions in dynamic real scenes. A comprehensive system would ultimately localize and track the objects, actions and interactions present in a video and generate a description that relies on temporal localization in order to ground the visual concepts. However, most existing automatic video captioning systems map from raw video data…

    Submitted 18 October, 2016; v1 submitted 17 October, 2016; originally announced October 2016.

    Comments: To appear in Asian Conference on Computer Vision (ACCV), Taipei, Taiwan, 2016

  48. arXiv:1509.07838  [pdf, other]

    cs.CV cs.AI

    Training Deep Networks with Structured Layers by Matrix Backpropagation

    Authors: Catalin Ionescu, Orestis Vantzos, Cristian Sminchisescu

    Abstract: Deep neural network architectures have recently produced excellent results in a variety of areas in artificial intelligence and visual recognition, well surpassing traditional shallow architectures trained using hand-designed features. The power of deep networks stems both from their ability to perform local computations followed by pointwise non-linearities over increasingly larger receptive fiel…

    Submitted 14 April, 2016; v1 submitted 25 September, 2015; originally announced September 2015.

    Comments: This is an extended version of our ICCV 2015 article

  49. arXiv:1509.06004  [pdf, other]

    cs.CV

    A Parallel Framework for Parametric Maximum Flow Problems in Image Segmentation

    Authors: Vlad Olaru, Mihai Florea, Cristian Sminchisescu

    Abstract: This paper presents a framework that supports the implementation of parallel solutions for the widespread parametric maximum flow computational routines used in image segmentation algorithms. The framework is based on supergraphs, a special construction combining several image graphs into a larger one, and works on various architectures (multi-core or GPU), either locally or remotely in a cluster…

    Submitted 7 December, 2015; v1 submitted 20 September, 2015; originally announced September 2015.

  50. arXiv:1502.01761  [pdf, other]

    cs.CV

    A Framework for Symmetric Part Detection in Cluttered Scenes

    Authors: Tom Lee, Sanja Fidler, Alex Levinshtein, Cristian Sminchisescu, Sven Dickinson

    Abstract: The role of symmetry in computer vision has waxed and waned in importance during the evolution of the field from its earliest days. At first figuring prominently in support of bottom-up indexing, it fell out of favor as shape gave way to appearance and recognition gave way to detection. With a strong prior in the form of a target object, the role of the weaker priors offered by perceptual grouping…

    Submitted 5 February, 2015; originally announced February 2015.

    Comments: 10 pages, 8 figures