Showing 1–29 of 29 results for author: Pirk, S

  1. arXiv:2406.09371  [pdf, other]

    cs.CV cs.LG

    LRM-Zero: Training Large Reconstruction Models with Synthesized Data

    Authors: Desai Xie, Sai Bi, Zhixin Shu, Kai Zhang, Zexiang Xu, Yi Zhou, Sören Pirk, Arie Kaufman, Xin Sun, Hao Tan

    Abstract: We present LRM-Zero, a Large Reconstruction Model (LRM) trained entirely on synthesized 3D data, achieving high-quality sparse-view 3D reconstruction. The core of LRM-Zero is our procedural 3D dataset, Zeroverse, which is automatically synthesized from simple primitive shapes with random texturing and augmentations (e.g., height fields, boolean differences, and wireframes). Unlike previous 3D data…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 23 pages, 8 figures. Our code and interactive visualization are available at: https://desaixie.github.io/lrm-zero/

  2. arXiv:2404.16292  [pdf, other]

    cs.GR cs.CV cs.LG

    One Noise to Rule Them All: Learning a Unified Model of Spatially-Varying Noise Patterns

    Authors: Arman Maesumi, Dylan Hu, Krishi Saripalli, Vladimir G. Kim, Matthew Fisher, Sören Pirk, Daniel Ritchie

    Abstract: Procedural noise is a fundamental component of computer graphics pipelines, offering a flexible way to generate textures that exhibit "natural" random variation. Many different types of noise exist, each produced by a separate algorithm. In this paper, we present a single generative model which can learn to generate multiple types of noise as well as blend between them. In addition, it is capable…

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: In ACM Transactions on Graphics (Proceedings of SIGGRAPH) 2024, 21 pages

  3. arXiv:2404.00593  [pdf, other]

    cs.CV cs.AI cs.GR cs.LG

    LAESI: Leaf Area Estimation with Synthetic Imagery

    Authors: Jacek Kałużny, Yannik Schreckenberg, Karol Cyganik, Peter Annighöfer, Sören Pirk, Dominik L. Michels, Mikolaj Cieslak, Farhah Assaad-Gerbert, Bedrich Benes, Wojciech Pałubicki

    Abstract: We introduce LAESI, a Synthetic Leaf Dataset of 100,000 synthetic leaf images on millimeter paper, each with semantic masks and surface area labels. This dataset provides a resource for leaf morphology analysis primarily aimed at beech and oak leaves. We evaluate the applicability of the dataset by training machine learning models for leaf surface area prediction and semantic segmentation, using r…

    Submitted 31 March, 2024; originally announced April 2024.

    Comments: 10 pages, 12 figures, 1 table

    MSC Class: 68T07; 68T45 ACM Class: I.2.10; I.4.6

  4. arXiv:2403.18351  [pdf, other]

    cs.CV cs.AI cs.GR cs.LG

    Generating Diverse Agricultural Data for Vision-Based Farming Applications

    Authors: Mikolaj Cieslak, Umabharathi Govindarajan, Alejandro Garcia, Anuradha Chandrashekar, Torsten Hädrich, Aleksander Mendoza-Drosik, Dominik L. Michels, Sören Pirk, Chia-Chun Fu, Wojciech Pałubicki

    Abstract: We present a specialized procedural model for generating synthetic agricultural scenes, focusing on soybean crops, along with various weeds. This model is capable of simulating distinct growth stages of these plants, diverse soil conditions, and randomized field arrangements under varying lighting conditions. The integration of real-world textures and environmental factors into the procedural gene…

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: 10 pages, 8 figures, 3 tables

    MSC Class: 68T07; 68T45 ACM Class: I.2.10; I.4.6

  5. arXiv:2402.03287  [pdf, other]

    cs.LG physics.comp-ph

    A Lennard-Jones Layer for Distribution Normalization

    Authors: Mulun Na, Jonathan Klein, Biao Zhang, Wojtek Pałubicki, Sören Pirk, Dominik L. Michels

    Abstract: We introduce the Lennard-Jones layer (LJL) for the equalization of the density of 2D and 3D point clouds through systematically rearranging points without destroying their overall structure (distribution normalization). LJL simulates a dissipative process of repulsive and weakly attractive interactions between individual points by considering the nearest neighbor of each point at a given moment in…

    Submitted 5 February, 2024; originally announced February 2024.

    Comments: Upon request, we are happy to share the source code to generate the results presented in this paper. Please contact the first or the last author of this manuscript

    MSC Class: 68T07 ACM Class: I.2; I.3.5

  6. arXiv:2402.02570  [pdf, other]

    cs.RO cs.LG

    Gazebo Plants: Simulating Plant-Robot Interaction with Cosserat Rods

    Authors: Junchen Deng, Samhita Marri, Jonathan Klein, Wojtek Pałubicki, Sören Pirk, Girish Chowdhary, Dominik L. Michels

    Abstract: Robotic harvesting has the potential to positively impact agricultural productivity, reduce costs, improve food quality, enhance sustainability, and to address labor shortage. In the rapidly advancing field of agricultural robotics, the necessity of training robots in a virtual environment has become essential. Generating training data to automatize the underlying computer vision tasks such as ima…

    Submitted 4 February, 2024; originally announced February 2024.

    Comments: Upon request, we are happy to share our GazeboPlants plugin open-source (MPL 2.0)

    ACM Class: I.6.3; I.6.m

  7. arXiv:2312.13980  [pdf, other]

    cs.CV cs.LG

    Carve3D: Improving Multi-view Reconstruction Consistency for Diffusion Models with RL Finetuning

    Authors: Desai Xie, Jiahao Li, Hao Tan, Xin Sun, Zhixin Shu, Yi Zhou, Sai Bi, Sören Pirk, Arie E. Kaufman

    Abstract: Multi-view diffusion models, obtained by applying Supervised Finetuning (SFT) to text-to-image diffusion models, have driven recent breakthroughs in text-to-3D research. However, due to the limited size and quality of existing 3D datasets, they still suffer from multi-view inconsistencies and Neural Radiance Field (NeRF) reconstruction artifacts. We argue that multi-view diffusion models can benef…

    Submitted 9 April, 2024; v1 submitted 21 December, 2023; originally announced December 2023.

    Comments: 22 pages, 16 figures. Our code, training and testing data, and video results are available at: https://desaixie.github.io/carve-3d. This paper has been accepted to CVPR 2024. v2: incorporated changes from the CVPR 2024 camera-ready version

  8. arXiv:2310.19188  [pdf, other]

    cs.CV

    3DMiner: Discovering Shapes from Large-Scale Unannotated Image Datasets

    Authors: Ta-Ying Cheng, Matheus Gadelha, Soren Pirk, Thibault Groueix, Radomir Mech, Andrew Markham, Niki Trigoni

    Abstract: We present 3DMiner -- a pipeline for mining 3D shapes from challenging large-scale unannotated image datasets. Unlike other unsupervised 3D reconstruction methods, we assume that, within a large-enough dataset, there must exist images of objects with similar shapes but varying backgrounds, textures, and viewpoints. Our approach leverages the recent advances in learning self-supervised image repres…

    Submitted 29 October, 2023; originally announced October 2023.

    Comments: In ICCV 2023

  9. arXiv:2308.11617  [pdf, other]

    cs.CV

    GRIP: Generating Interaction Poses Using Spatial Cues and Latent Consistency

    Authors: Omid Taheri, Yi Zhou, Dimitrios Tzionas, Yang Zhou, Duygu Ceylan, Soren Pirk, Michael J. Black

    Abstract: Hands are dexterous and highly versatile manipulators that are central to how humans interact with objects and their environment. Consequently, modeling realistic hand-object interactions, including the subtle motion of individual fingers, is critical for applications in computer graphics, computer vision, and mixed reality. Prior work on capturing and modeling humans interacting with objects in 3…

    Submitted 15 July, 2024; v1 submitted 22 August, 2023; originally announced August 2023.

    Comments: The project has been started during Omid Taheri's internship at Adobe and as a collaboration with the Max Planck Institute for Intelligent Systems

  10. arXiv:2307.01425  [pdf, other]

    cs.CV

    Consistent Multimodal Generation via A Unified GAN Framework

    Authors: Zhen Zhu, Yijun Li, Weijie Lyu, Krishna Kumar Singh, Zhixin Shu, Soeren Pirk, Derek Hoiem

    Abstract: We investigate how to generate multimodal image outputs, such as RGB, depth, and surface normals, with a single generative model. The challenge is to produce outputs that are realistic, and also consistent with each other. Our solution builds on the StyleGAN3 architecture, with a shared backbone and modality-specific branches in the last layers of the synthesis network, and we propose per-modality…

    Submitted 3 July, 2023; originally announced July 2023.

    Comments: In review

  11. arXiv:2306.16740  [pdf, other]

    cs.RO cs.AI cs.HC cs.LG

    Principles and Guidelines for Evaluating Social Robot Navigation Algorithms

    Authors: Anthony Francis, Claudia Pérez-D'Arpino, Chengshu Li, Fei Xia, Alexandre Alahi, Rachid Alami, Aniket Bera, Abhijat Biswas, Joydeep Biswas, Rohan Chandra, Hao-Tien Lewis Chiang, Michael Everett, Sehoon Ha, Justin Hart, Jonathan P. How, Haresh Karnan, Tsang-Wei Edward Lee, Luis J. Manso, Reuth Mirsky, Sören Pirk, Phani Teja Singamaneni, Peter Stone, Ada V. Taylor, Peter Trautman, Nathan Tsoi, et al. (6 additional authors not shown)

    Abstract: A major challenge to deploying robots widely is navigation in human-populated environments, commonly referred to as social robot navigation. While the field of social navigation has advanced tremendously in recent years, the fair evaluation of algorithms that tackle social navigation remains hard because it involves not just robotic agents moving in static environments but also dynamic human agent…

    Submitted 19 September, 2023; v1 submitted 29 June, 2023; originally announced June 2023.

    Comments: 42 pages, 11 figures, 6 tables

    ACM Class: I.2.9

  12. arXiv:2305.05153  [pdf, ps, other]

    cs.LG cs.CV cs.GR

    DeepTree: Modeling Trees with Situated Latents

    Authors: Xiaochen Zhou, Bosheng Li, Bedrich Benes, Songlin Fei, Sören Pirk

    Abstract: In this paper, we propose DeepTree, a novel method for modeling trees based on learning developmental rules for branching structures instead of manually defining them. We call our deep neural model situated latent because its behavior is determined by the intrinsic state -- encoded as a latent space of a deep neural model -- and by the extrinsic (environmental) data that is situated as the locatio…

    Submitted 8 May, 2023; originally announced May 2023.

  13. arXiv:2210.08113  [pdf, other]

    cs.CV

    Instance Segmentation with Cross-Modal Consistency

    Authors: Alex Zihao Zhu, Vincent Casser, Reza Mahjourian, Henrik Kretzschmar, Sören Pirk

    Abstract: Segmenting object instances is a key task in machine perception, with safety-critical applications in robotics and autonomous driving. We introduce a novel approach to instance segmentation that jointly leverages measurements from multiple sensor modalities, such as cameras and LiDAR. Our method learns to predict embeddings for each pixel or point that give rise to a dense segmentation of the scen…

    Submitted 14 October, 2022; originally announced October 2022.

    Comments: 8 pages, 9 figures, 5 tables. Presented at IROS 2022

  14. arXiv:2209.09375  [pdf, other]

    cs.RO cs.CV

    Gesture2Path: Imitation Learning for Gesture-aware Navigation

    Authors: Catie Cuan, Edward Lee, Emre Fisher, Anthony Francis, Leila Takayama, Tingnan Zhang, Alexander Toshev, Sören Pirk

    Abstract: As robots increasingly enter human-centered environments, they must not only be able to navigate safely around humans, but also adhere to complex social norms. Humans often rely on non-verbal communication through gestures and facial expressions when navigating around other people, especially in densely occupied spaces. Consequently, robots also need to be able to interpret gestures as part of sol…

    Submitted 19 September, 2022; originally announced September 2022.

    Comments: 8 pages, 12 figures

  15. arXiv:2204.05443  [pdf, other]

    cs.RO cs.HC

    A Protocol for Validating Social Navigation Policies

    Authors: Sören Pirk, Edward Lee, Xuesu Xiao, Leila Takayama, Anthony Francis, Alexander Toshev

    Abstract: Enabling socially acceptable behavior for situated agents is a major goal of recent robotics research. Robots should not only operate safely around humans, but also abide by complex social norms. A key challenge for developing socially-compliant policies is measuring the quality of their behavior. Social behavior is enormously complex, making it difficult to create reliable metrics to gauge the pe…

    Submitted 29 April, 2022; v1 submitted 11 April, 2022; originally announced April 2022.

    Comments: IEEE International Conference on Robotics and Automation; Workshop: Social Robot Navigation: Advances and Evaluation

  16. arXiv:2203.15041  [pdf, other]

    cs.RO cs.CV cs.LG eess.SY

    Socially Compliant Navigation Dataset (SCAND): A Large-Scale Dataset of Demonstrations for Social Navigation

    Authors: Haresh Karnan, Anirudh Nair, Xuesu Xiao, Garrett Warnell, Soeren Pirk, Alexander Toshev, Justin Hart, Joydeep Biswas, Peter Stone

    Abstract: Social navigation is the capability of an autonomous agent, such as a robot, to navigate in a 'socially compliant' manner in the presence of other intelligent agents such as humans. With the emergence of autonomously navigating mobile robots in human populated environments (e.g., domestic service robots in homes and restaurants and food delivery robots on public sidewalks), incorporating socially…

    Submitted 8 June, 2022; v1 submitted 28 March, 2022; originally announced March 2022.

    Journal ref: Robotics and Automation Letters (RA-L) 2022

  17. arXiv:2008.05567  [pdf, other]

    cs.GR cs.CV cs.LG

    Procedural Urban Forestry

    Authors: Till Niese, Sören Pirk, Matthias Albrecht, Bedrich Benes, Oliver Deussen

    Abstract: The placement of vegetation plays a central role in the realism of virtual scenes. We introduce procedural placement models (PPMs) for vegetation in urban layouts. PPMs are environmentally sensitive to city geometry and allow identifying plausible plant positions based on structural and functional zones in an urban layout. PPMs can either be directly used by defining their parameters or can be lea…

    Submitted 13 August, 2020; v1 submitted 12 August, 2020; originally announced August 2020.

    Comments: 14 pages

  18. arXiv:2006.09322  [pdf, other]

    cs.CV cs.LG eess.IV

    Domain Adaptation with Morphologic Segmentation

    Authors: Jonathan Klein, Sören Pirk, Dominik L. Michels

    Abstract: We present a novel domain adaptation framework that uses morphologic segmentation to translate images from arbitrary input domains (real and synthetic) into a uniform output domain. Our framework is based on an established image-to-image translation pipeline that allows us to first transform the input image into a generalized representation that encodes morphology and semantics - the edge-plus-seg…

    Submitted 16 June, 2020; originally announced June 2020.

    Comments: This work has been supported by KAUST under individual baseline funding

    MSC Class: cs.LG; cs.AI

  19. arXiv:2006.04843  [pdf, other]

    cs.RO cs.LG

    Modeling Long-horizon Tasks as Sequential Interaction Landscapes

    Authors: Sören Pirk, Karol Hausman, Alexander Toshev, Mohi Khansari

    Abstract: Complex object manipulation tasks often span over long sequences of operations. Task planning over long-time horizons is a challenging and open problem in robotics, and its complexity grows exponentially with an increasing number of subtasks. In this paper we present a deep learning network that learns dependencies and transitions across subtasks solely from a set of demonstration videos. We repre…

    Submitted 23 October, 2020; v1 submitted 8 June, 2020; originally announced June 2020.

    Comments: Published at 4th Conference on Robot Learning (CoRL 2020), Cambridge MA, USA. More details available at: http://www.pirk.io

  20. arXiv:2006.03897  [pdf, other]

    physics.comp-ph cs.LG

    Accurately Solving Physical Systems with Graph Learning

    Authors: Han Shao, Tassilo Kugelstadt, Torsten Hädrich, Wojciech Pałubicki, Jan Bender, Sören Pirk, Dominik L. Michels

    Abstract: Iterative solvers are widely used to accurately simulate physical systems. These solvers require initial guesses to generate a sequence of improving approximate solutions. In this contribution, we introduce a novel method to accelerate iterative solvers for physical systems with graph networks (GNs) by predicting the initial guesses to reduce the number of iterations. Unlike existing methods that…

    Submitted 13 January, 2021; v1 submitted 6 June, 2020; originally announced June 2020.

    Comments: This work has been supported by KAUST under individual baseline and center partnership funding

    MSC Class: Machine Learning (cs.LG); Machine Learning (stat.ML)

  21. arXiv:2005.07289  [pdf, other]

    cs.CV cs.LG

    Taskology: Utilizing Task Relations at Scale

    Authors: Yao Lu, Sören Pirk, Jan Dlabal, Anthony Brohan, Ankita Pasad, Zhao Chen, Vincent Casser, Anelia Angelova, Ariel Gordon

    Abstract: Many computer vision tasks address the problem of scene understanding and are naturally interrelated e.g. object classification, detection, scene segmentation, depth estimation, etc. We show that we can leverage the inherent relationships among collections of tasks, as they are trained jointly, supervising each other through their known relationships via consistency losses. Furthermore, explicitly…

    Submitted 17 March, 2021; v1 submitted 14 May, 2020; originally announced May 2020.

    Comments: IEEE Conference on Computer Vision and Pattern Recognition, 2021

  22. arXiv:1906.08989  [pdf, other]

    cs.RO cs.CV

    Data-Efficient Learning for Sim-to-Real Robotic Grasping using Deep Point Cloud Prediction Networks

    Authors: Xinchen Yan, Mohi Khansari, Jasmine Hsu, Yuanzheng Gong, Yunfei Bai, Sören Pirk, Honglak Lee

    Abstract: Training a deep network policy for robot manipulation is notoriously costly and time consuming as it depends on collecting a significant amount of real world data. To work well in the real world, the policy needs to see many instances of the task, including various object arrangements in the scene as well as variations in object geometry, texture, material, and environmental illumination. In thi…

    Submitted 21 June, 2019; originally announced June 2019.

  23. arXiv:1906.05717  [pdf, other]

    cs.CV cs.RO

    Unsupervised Monocular Depth and Ego-motion Learning with Structure and Semantics

    Authors: Vincent Casser, Soeren Pirk, Reza Mahjourian, Anelia Angelova

    Abstract: We present an approach which takes advantage of both structure and semantics for unsupervised monocular learning of depth and ego-motion. More specifically, we model the motion of individual objects and learn their 3D motion vector jointly with depth and ego-motion. We obtain more accurate results, especially for challenging dynamic scenes not addressed by previous approaches. This is an extended…

    Submitted 12 June, 2019; originally announced June 2019.

    Comments: CVPR Workshop on Visual Odometry & Computer Vision Applications Based on Location Clues (VOCVALC), 2019. This is an extension of arXiv:1811.06152: Depth Prediction Without the Sensors: Leveraging Structure for Unsupervised Learning from Monocular Videos. Thirty-Third AAAI Conference on Artificial Intelligence (AAAI'19)

  24. arXiv:1906.04312  [pdf, other]

    cs.CV cs.LG cs.RO

    Online Object Representations with Contrastive Learning

    Authors: Sören Pirk, Mohi Khansari, Yunfei Bai, Corey Lynch, Pierre Sermanet

    Abstract: We propose a self-supervised approach for learning representations of objects from monocular videos and demonstrate it is particularly useful in situated settings such as robotics. The main contributions of this paper are: 1) a self-supervising objective trained with contrastive learning that can discover and disentangle object attributes from video without using any labels; 2) we leverage object…

    Submitted 10 June, 2019; originally announced June 2019.

    Comments: 10 pages

  25. arXiv:1811.11358  [pdf, other]

    cs.CV

    Future Segmentation Using 3D Structure

    Authors: Suhani Vora, Reza Mahjourian, Soeren Pirk, Anelia Angelova

    Abstract: Predicting the future to anticipate the outcome of events and actions is a critical attribute of autonomous agents; particularly for agents which must rely heavily on real time visual data for decision making. Working towards this capability, we address the task of predicting future frame segmentation from a stream of monocular video by leveraging the 3D structure of the scene. Our framework is ba…

    Submitted 27 November, 2018; originally announced November 2018.

  26. arXiv:1811.06152  [pdf, other]

    cs.CV

    Depth Prediction Without the Sensors: Leveraging Structure for Unsupervised Learning from Monocular Videos

    Authors: Vincent Casser, Soeren Pirk, Reza Mahjourian, Anelia Angelova

    Abstract: Learning to predict scene depth from RGB inputs is a challenging task both for indoor and outdoor robot navigation. In this work we address unsupervised learning of scene depth and robot ego-motion where supervision is provided by monocular videos, as cameras are the cheapest, least restrictive and most ubiquitous sensor for robotics. Previous work in unsupervised image-to-depth learning has est…

    Submitted 14 November, 2018; originally announced November 2018.

    Comments: Thirty-Third AAAI Conference on Artificial Intelligence (AAAI'19)

  27. arXiv:1808.01337  [pdf, other]

    cs.CV

    Parsing Geometry Using Structure-Aware Shape Templates

    Authors: Vignesh Ganapathi-Subramanian, Olga Diamanti, Soeren Pirk, Chengcheng Tang, Matthias Niessner, Leonidas J. Guibas

    Abstract: Real-life man-made objects often exhibit strong and easily-identifiable structure, as a direct result of their design or their intended functionality. Structure typically appears in the form of individual parts and their arrangement. Knowing about object structure can be an important cue for object recognition and scene understanding - a key goal for various AR and robotics applications. However,…

    Submitted 4 September, 2018; v1 submitted 3 August, 2018; originally announced August 2018.

  28. arXiv:1609.08685  [pdf, other]

    cs.GR cs.CG cs.CV

    Understanding and Exploiting Object Interaction Landscapes

    Authors: Sören Pirk, Vojtech Krs, Kaimo Hu, Suren Deepak Rajasekaran, Hao Kang, Bedrich Benes, Yusuke Yoshiyasu, Leonidas J. Guibas

    Abstract: Interactions play a key role in understanding objects and scenes, for both virtual and real world agents. We introduce a new general representation for proximal interactions among physical objects that is agnostic to the type of objects or interaction involved. The representation is based on tracking particles on one of the participating objects and then observing them with sensors appropriately p…

    Submitted 8 November, 2016; v1 submitted 27 September, 2016; originally announced September 2016.

    Comments: 14 pages, 19 figures

  29. arXiv:1605.06240  [pdf, other]

    cs.CV

    FPNN: Field Probing Neural Networks for 3D Data

    Authors: Yangyan Li, Soeren Pirk, Hao Su, Charles R. Qi, Leonidas J. Guibas

    Abstract: Building discriminative representations for 3D data has been an important task in computer graphics and computer vision research. Convolutional Neural Networks (CNNs) have shown to operate on 2D images with great success for a variety of tasks. Lifting convolution operators to 3D (3DCNNs) seems like a plausible and promising next step. Unfortunately, the computational complexity of 3D CNNs grows c…

    Submitted 24 October, 2016; v1 submitted 20 May, 2016; originally announced May 2016.

    Comments: To appear in NIPS 2016

    ACM Class: I.5.1; I.2.10