
arXiv:2404.19283 [pdf, other]

MAP-Former: Multi-Agent-Pair Gaussian Joint Prediction

Authors: Marlon Steiner, Marvin Klemp, Christoph Stiller

Abstract: In the risk assessment of trajectories, there is a gap between the trajectory information provided by a traffic motion prediction module and the information actually needed. Closing this gap necessitates advancements in prediction beyond current practices. Existing prediction models yield joint predictions of agents' future trajectories with uncertainty weights or marginal Gaussian probability density functions (PDFs) for single agents. Although these methods achieve highly accurate trajectory predictions, they provide little or no information about the dependencies of interacting agents. Since traffic is a process of highly interdependent agents, whose actions directly influence their mutual behavior, the existing methods are not sufficient to reliably assess the risk of future trajectories. This paper addresses that gap by introducing a novel approach to motion prediction, focusing on predicting agent-pair covariance matrices in a "scene-centric" manner, which can then be used to model Gaussian joint PDFs for all agent-pairs in a scene. We propose a model capable of predicting those agent-pair covariance matrices, leveraging an enhanced awareness of interactions. Utilizing the prediction results of our model, this work forms the foundation for comprehensive risk assessment with statistically based methods for analyzing agents' relations by their joint PDFs.

Submitted 30 April, 2024; originally announced April 2024.

Comments: Accepted for publication in Proceedings of the IEEE Intelligent Vehicles Symposium (IV), Jeju Island - Korea, 2-5 June 2024
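A minimal sketch of how an agent-pair covariance turns into a Gaussian joint PDF, assuming each agent's position at a single time step is 2D and that a model supplies the two marginal 2x2 covariances plus a 2x2 cross-covariance. The function names are hypothetical and not taken from the paper's code. With a zero cross-covariance the joint log-density is just the sum of the two marginal log-densities, so the cross-covariance block carries exactly the interaction information that marginal predictions omit.

```python
import math


def cholesky(a):
    """Lower-triangular Cholesky factor L of a symmetric positive-definite matrix (a = L L^T)."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(l[i][k] * l[j][k] for k in range(j))
            if i == j:
                l[i][j] = math.sqrt(a[i][i] - s)
            else:
                l[i][j] = (a[i][j] - s) / l[j][j]
    return l


def pair_covariance(cov_a, cov_b, cross_ab):
    """Assemble the 4x4 joint covariance of two agents' 2D positions.

    cov_a, cov_b: 2x2 marginal covariances; cross_ab: 2x2 cross-covariance Cov(a, b).
    The lower-left block is Cov(b, a), i.e. the transpose of cross_ab.
    """
    cross_ba = [[cross_ab[c][r] for c in range(2)] for r in range(2)]
    top = [cov_a[r] + cross_ab[r] for r in range(2)]
    bottom = [cross_ba[r] + cov_b[r] for r in range(2)]
    return top + bottom


def gaussian_logpdf(x, mean, cov):
    """Log-density of a multivariate normal distribution, evaluated via Cholesky."""
    n = len(x)
    l = cholesky(cov)
    diff = [x[i] - mean[i] for i in range(n)]
    # Forward substitution solves L y = diff, so the Mahalanobis term equals |y|^2.
    y = [0.0] * n
    for i in range(n):
        y[i] = (diff[i] - sum(l[i][k] * y[k] for k in range(i))) / l[i][i]
    log_det = 2.0 * sum(math.log(l[i][i]) for i in range(n))
    return -0.5 * (n * math.log(2.0 * math.pi) + log_det + sum(v * v for v in y))
```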

arXiv:2403.11728 [pdf, ps, other]

PITA: Physics-Informed Trajectory Autoencoder

Authors: Johannes Fischer, Kevin Rösch, Martin Lauer, Christoph Stiller

Abstract: Validating robotic systems in safety-critical applications requires testing in many scenarios, including rare edge cases that are unlikely to occur, so real-world testing must be complemented with testing in simulation. Generative models can be used to augment real-world datasets with generated data to produce edge case scenarios by sampling in a learned latent space. Autoencoders can learn such a latent representation for a specific domain by learning to reconstruct the input data from a lower-dimensional intermediate representation. However, the resulting trajectories are not necessarily physically plausible, but instead typically contain noise that is not present in the input trajectory. To resolve this issue, we propose the novel Physics-Informed Trajectory Autoencoder (PITA) architecture, which incorporates a physical dynamics model into the loss function of the autoencoder. This results in smooth trajectories that not only reconstruct the input trajectory but also adhere to the physical model. We evaluate PITA on a real-world dataset of vehicle trajectories and compare its performance to a normal autoencoder and a state-of-the-art action-space autoencoder.

Submitted 18 March, 2024; originally announced March 2024.
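The idea of putting a dynamics model into the autoencoder loss can be sketched as a reconstruction term plus a penalty on violations of that model. This is an illustrative assumption, not the PITA loss itself: the unicycle model, a decoder that emits (x, y, heading, speed) states, the time step and the weight are all hypothetical choices. A trajectory that exactly follows the model incurs no physics penalty, while a noisy reconstruction is penalized by both terms.

```python
import math


def unicycle_step(x, y, heading, speed, dt):
    """Propagate a unicycle model by one step (an assumed stand-in dynamics model)."""
    return x + speed * math.cos(heading) * dt, y + speed * math.sin(heading) * dt


def physics_informed_loss(inputs, decoded, dt=0.1, weight=1.0):
    """Reconstruction MSE plus a penalty for violating the dynamics model.

    inputs:  list of (x, y) positions of the input trajectory.
    decoded: list of (x, y, heading, speed) states produced by the decoder.
    weight:  hypothetical trade-off between the two terms.
    """
    n = len(inputs)
    recon = sum((px - dx) ** 2 + (py - dy) ** 2
                for (px, py), (dx, dy, _, _) in zip(inputs, decoded)) / n
    # Compare every decoded position with the one the model predicts from the previous state.
    physics = 0.0
    for prev, cur in zip(decoded, decoded[1:]):
        ex, ey = unicycle_step(*prev, dt)
        physics += (cur[0] - ex) ** 2 + (cur[1] - ey) ** 2
    physics /= max(n - 1, 1)
    return recon + weight * physics
```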

arXiv:2403.01512 [pdf, other]

doi 10.1109/IV55152.2023.10186638

Cooperative Automated Driving for Bottleneck Scenarios in Mixed Traffic

Authors: M. V. Baumann, J. Beyerer, H. S. Buck, B. Deml, S. Ehrhardt, Ch. Frese, D. Kleiser, M. Lauer, M. Roschani, M. Ruf, Ch. Stiller, P. Vortisch, J. R. Ziehn

Abstract: Connected automated vehicles (CAV), which incorporate vehicle-to-vehicle (V2V) communication into their motion planning, are expected to provide a wide range of benefits for individual and overall traffic flow. A frequent constraint or required precondition is that compatible CAVs must already be available in traffic at high penetration rates. Achieving such penetration rates incrementally before providing ample benefits for users presents a chicken-and-egg problem that is common in connected driving development. Based on the example of a cooperative driving function for bottleneck traffic flows (e.g. at a roadblock), we illustrate how such an evolutionary, incremental introduction can be achieved under transparent assumptions and objectives. To this end, we analyze the challenge from the perspectives of automation technology, traffic flow, human factors and market, and present a principle that 1) accounts for individual requirements from each domain; 2) provides benefits for any penetration rate of compatible CAVs between 0 % and 100 % as well as upward-compatibility for expected future developments in traffic; 3) can strictly limit the negative effects of cooperation for any participant and 4) can be implemented with close-to-market technology. We discuss the technical implementation as well as the effect on traffic flow over a wide parameter spectrum for human and technical aspects.

Submitted 3 March, 2024; originally announced March 2024.

Comments: 8 pages, 7 figures

Journal ref: 35th IEEE Intelligent Vehicles Symposium (IV 2023)

arXiv:2402.00989 [pdf, other]

YOLinO++: Single-Shot Estimation of Generic Polylines for Mapless Automated Driving

Authors: Annika Meyer, Christoph Stiller

Abstract: In automated driving, highly accurate maps are commonly used to support and complement perception. These maps are costly to create and quickly become outdated as the traffic world is constantly changing. In order to support or replace the map of an automated system with detections from sensor data, a perception module must be able to detect the map features. We propose a neural network that follows the one-shot philosophy of YOLO but is designed for detection of 1D structures in images, such as lane boundaries. We extend previous ideas by a midpoint-based line representation and anchor definitions. This representation can be used to describe lane borders and markings, but also implicit features such as centerlines of lanes. The broad applicability of the approach is shown with the detection performance on lane centerlines, lane borders, and markings, both on highways and in urban areas. Versatile lane boundaries are detected and can be inherently classified as dashed or solid lines, curb, road boundaries, or implicit delimitation.

Submitted 1 February, 2024; originally announced February 2024.

arXiv:2310.17963 [pdf, other]

Decision-theoretic MPC: Motion Planning with Weighted Maneuver Preferences Under Uncertainty

Authors: Ömer Şahin Taş, Philipp Heinrich Brusius, Christoph Stiller

Abstract: Continuous-optimization-based motion planners require deciding on a maneuver homotopy before optimizing the trajectory. Under uncertainty, maneuver intentions of other participants can be unclear, and the vehicle might not be able to decide on the most suitable maneuver. This work introduces a method that incorporates multiple maneuver preferences in planning. It optimizes the trajectory by considering weighted maneuver preferences together with uncertainties ranging from perception to prediction while ensuring the feasibility of a chance-constrained fallback option. Evaluations in both driving experiments and simulation studies show enhanced interaction capabilities and comfort levels compared to conventional planners, which consider only a single maneuver.

Submitted 27 October, 2023; originally announced October 2023.
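A hedged sketch of two ingredients named in the abstract, weighted maneuver preferences and a chance-constrained fallback. It assumes a discrete set of candidate trajectories (the paper optimizes a trajectory continuously), a Gaussian-distributed gap to another road user, and the standard deterministic reformulation of a Gaussian chance constraint. All function names, costs and numbers are illustrative.

```python
from statistics import NormalDist


def chance_constraint_holds(mean_gap, std_gap, min_gap, risk):
    """P(gap >= min_gap) >= 1 - risk for a Gaussian gap, as a deterministic check.

    Equivalent to mean_gap - z * std_gap >= min_gap with z the (1 - risk) normal quantile.
    """
    z = NormalDist().inv_cdf(1.0 - risk)
    return mean_gap - z * std_gap >= min_gap


def preference_weighted_cost(costs, preferences):
    """Expected cost of one candidate under normalized maneuver preference weights."""
    total = sum(preferences.values())
    return sum(preferences[m] / total * c for m, c in costs.items())


def select_candidate(candidates, preferences, risk=0.05):
    """Pick the cheapest candidate whose fallback satisfies the chance constraint.

    candidates: name -> (per-maneuver costs, (mean_gap, std_gap, min_gap) of its fallback).
    """
    feasible = {name: preference_weighted_cost(costs, preferences)
                for name, (costs, fallback) in candidates.items()
                if chance_constraint_holds(*fallback, risk)}
    return min(feasible, key=feasible.get) if feasible else None
```

Shifting the preference weights changes which candidate is cheapest, while the chance constraint removes candidates whose fallback is too risky regardless of their cost.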

arXiv:2306.10840 [pdf, other]

RedMotion: Motion Prediction via Redundancy Reduction

Authors: Royden Wagner, Omer Sahin Tas, Marvin Klemp, Carlos Fernandez, Christoph Stiller

Abstract: We introduce RedMotion, a transformer model for motion prediction in self-driving vehicles that learns environment representations via redundancy reduction. Our first type of redundancy reduction is induced by an internal transformer decoder and reduces a variable-sized set of local road environment tokens, representing road graphs and agent data, to a fixed-sized global embedding. The second type of redundancy reduction is obtained by self-supervised learning and applies the redundancy reduction principle to embeddings generated from augmented views of road environments. Our experiments reveal that our representation learning approach outperforms PreTraM, Traj-MAE, and GraphDINO in a semi-supervised setting. Moreover, RedMotion achieves competitive results compared to HPTR or MTR++ in the Waymo Motion Prediction Challenge. Our open-source implementation is available at: https://github.com/kit-mrt/future-motion

Submitted 25 May, 2024; v1 submitted 19 June, 2023; originally announced June 2023.

Comments: 17 pages, 8 figures; v2: focus on transformer model; v3: TMLR camera-ready
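The second type of redundancy reduction applies a principle best known from the Barlow Twins objective: standardize two batches of embeddings from augmented views, then push their cross-correlation matrix towards the identity. This pure-Python sketch illustrates that principle only; whether RedMotion uses exactly this loss, and the off-diagonal weight of 5e-3, are assumptions here.

```python
import math


def standardize(column):
    """Zero-mean, unit-variance version of one feature across the batch."""
    mean = sum(column) / len(column)
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / len(column))
    return [(v - mean) / std for v in column]


def redundancy_reduction_loss(z_a, z_b, off_diag_weight=5e-3):
    """Barlow-Twins-style loss for two batches of embeddings of augmented views.

    z_a, z_b: lists of N samples, each a list of D features.
    Drives the D x D cross-correlation matrix towards the identity.
    """
    n, d = len(z_a), len(z_a[0])
    a = [standardize([s[j] for s in z_a]) for j in range(d)]
    b = [standardize([s[j] for s in z_b]) for j in range(d)]
    loss = 0.0
    for i in range(d):
        for j in range(d):
            c_ij = sum(a[i][k] * b[j][k] for k in range(n)) / n
            # Diagonal: invariance to the augmentation; off-diagonal: decorrelated features.
            loss += (1.0 - c_ij) ** 2 if i == j else off_diag_weight * c_ij ** 2
    return loss
```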

arXiv:2305.02080 [pdf, other]

HD Map Generation from Noisy Multi-Route Vehicle Fleet Data on Highways with Expectation Maximization

Authors: Fabian Immel, Richard Fehler, Mohammad M. Ghanaat, Florian Ries, Martin Haueis, Christoph Stiller

Abstract: High Definition (HD) maps are necessary for many applications of automated driving (AD), but their manual creation and maintenance are very costly. Vehicle fleet data from series production vehicles can be used to automatically generate HD maps, but the data is often incomplete and noisy. We propose a system for the generation of HD maps from vehicle fleet data, which is tolerant to missing or misclassified detections and can handle drives with multiple routes, generating a single complete map, model-free and without prior reference lines. Using randomly selected drives as pivot drives, a step-wise lateral sampling of detections is performed. These sampled points are then clustered and aligned using Expectation Maximization (EM), estimating a lateral offset for each drive to compensate localization errors. The clustered points are replaced with the maxima of their probability density function (PDF) and connected to form polylines using a modified rectangular linear assignment algorithm. The data from vehicles on varying routes is then fused into a hierarchical singular map graph. The proposed approach achieves an average error below 0.5 meters with respect to a hand-annotated ground-truth map and correctly resolves lane splits and merges, demonstrating the feasibility of using vehicle fleet data to generate highway HD maps.

Submitted 3 May, 2023; originally announced May 2023.

Comments: Accepted for the 35th IEEE Intelligent Vehicles Symposium (IV 2023), 7 pages
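The clustering step can be illustrated with EM for a one-dimensional Gaussian mixture over lateral detection positions. This is a simplified sketch under stated assumptions: a shared, fixed standard deviation, hand-picked initial centres, and no per-drive lateral offset, which the paper additionally estimates to compensate localization errors. For a single Gaussian component the PDF maximum coincides with its mean, so the returned means play the role of the cluster maxima.

```python
import math


def em_lateral_clusters(points, init_means, sigma=0.5, iterations=50):
    """EM for a 1D Gaussian mixture with a shared, fixed standard deviation.

    points: lateral positions of sampled detections (metres, assumed).
    init_means: initial cluster centres, e.g. one per expected lane line.
    Returns the cluster means and mixture weights.
    """
    means = list(init_means)
    weights = [1.0 / len(means)] * len(means)
    for _ in range(iterations):
        # E-step: responsibility of each cluster for each point.
        resp = []
        for x in points:
            lik = [w * math.exp(-0.5 * ((x - m) / sigma) ** 2) for w, m in zip(weights, means)]
            total = sum(lik) or 1e-300  # guard against underflow far from every centre
            resp.append([v / total for v in lik])
        # M-step: re-estimate mixture weights and means from the responsibilities.
        for k in range(len(means)):
            nk = sum(r[k] for r in resp)
            if nk > 0.0:
                weights[k] = nk / len(points)
                means[k] = sum(r[k] * x for r, x in zip(resp, points)) / nk
    return means, weights
```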

arXiv:2302.05968 [pdf, other]

doi 10.1371/journal.pone.0290561

Self-supervised pseudo-colorizing of masked cells

Authors: Royden Wagner, Carlos Fernandez Lopez, Christoph Stiller

Abstract: Self-supervised learning, which is strikingly referred to as the dark matter of intelligence, is gaining more attention in biomedical applications of deep learning. In this work, we introduce a novel self-supervision objective for the analysis of cells in biomedical microscopy images. We propose training deep learning models to pseudo-colorize masked cells. We use a physics-informed pseudo-spectral colormap that is well suited for colorizing cell topology. Our experiments reveal that approximating semantic segmentation by pseudo-colorization is beneficial for subsequent fine-tuning on cell detection. Inspired by the recent success of masked image modeling, we additionally mask out cell parts and train to reconstruct these parts to further enrich the learned representations. We compare our pre-training method with self-supervised frameworks including contrastive learning (SimCLR), masked autoencoders (MAEs), and edge-based self-supervision. We build upon our previous work and train hybrid models for cell detection, which contain both convolutional and vision transformer modules. Our pre-training method can outperform SimCLR, MAE-like masked image modeling, and edge-based self-supervision when pre-training on a diverse set of six fluorescence microscopy datasets. Code is available at: https://github.com/roydenwa/pseudo-colorize-masked-cells

Submitted 28 August, 2023; v1 submitted 12 February, 2023; originally announced February 2023.

Comments: 14 pages, 3 figures; Published in PLOS ONE

Journal ref: PLoS ONE 18(8): e0290561 (2023)

arXiv:2210.08885 [pdf, other]

Space, Time, and Interaction: A Taxonomy of Corner Cases in Trajectory Datasets for Automated Driving

Authors: Kevin Rösch, Florian Heidecker, Julian Truetsch, Kamil Kowol, Clemens Schicktanz, Maarten Bieshaar, Bernhard Sick, Christoph Stiller

Abstract: Trajectory data analysis is an essential component for highly automated driving. Complex models developed with these data predict other road users' movement and behavior patterns. Based on these predictions - and additional contextual information such as the course of the road, (traffic) rules, and interaction with other road users - the highly automated vehicle (HAV) must be able to reliably and safely perform the task assigned to it, e.g., moving from point A to B. Ideally, the HAV moves safely through its environment, just as we would expect a human driver to do. However, if unusual trajectories occur, so-called trajectory corner cases, a human driver can usually cope well, but an HAV can quickly get into trouble. In the definition of trajectory corner cases, which we provide in this work, we will consider the relevance of unusual trajectories with respect to the task at hand. Based on this, we will also present a taxonomy of different trajectory corner cases. The categorization of corner cases into the taxonomy will be shown with examples and is done by cause and required data sources. To illustrate the complexity between the machine learning (ML) model and the corner case cause, we present a general processing chain underlying the taxonomy.

Submitted 17 October, 2022; originally announced October 2022.

arXiv:2207.14042 [pdf, other]

doi 10.1109/LRA.2022.3216991

Robust Self-Tuning Data Association for Geo-Referencing Using Lane Markings

Authors: Miguel Ángel Muñoz-Bañón, Jan-Hendrik Pauls, Haohao Hu, Christoph Stiller, Francisco A. Candelas, Fernando Torres

Abstract: Localization in aerial imagery-based maps offers many advantages, such as global consistency, geo-referenced maps, and the availability of publicly accessible data. However, the landmarks that can be observed from both aerial imagery and on-board sensors are limited. This leads to ambiguities or aliasing during the data association. Building upon a highly informative representation (that allows efficient data association), this paper presents a complete pipeline for resolving these ambiguities. Its core is a robust self-tuning data association that adapts the search area depending on the entropy of the measurements. Additionally, to smooth the final result, we adjust the information matrix for the associated data as a function of the relative transform produced by the data association process. We evaluate our method on real data from urban and rural scenarios around the city of Karlsruhe in Germany. We compare state-of-the-art outlier mitigation methods with our self-tuning approach, demonstrating a considerable improvement, especially for outer-urban scenarios.

Submitted 20 May, 2024; v1 submitted 28 July, 2022; originally announced July 2022.

Comments: The paper was published in the journal "IEEE Robotics and Automation Letters" (RA-L)

Journal ref: In IEEE Robotics and Automation Letters, vol. 7, no. 4, pp. 12339-12346, Oct. 2022

arXiv:2207.11211 [pdf, other]

Improving Predictive Performance and Calibration by Weight Fusion in Semantic Segmentation

Authors: Timo Sämann, Ahmed Mostafa Hammam, Andrei Bursuc, Christoph Stiller, Horst-Michael Groß

Abstract: Averaging predictions of a deep ensemble of networks is a popular and effective method to improve predictive performance and calibration in various benchmarks and Kaggle competitions. However, the runtime and training cost of deep ensembles grow linearly with the size of the ensemble, making them unsuitable for many applications. Averaging ensemble weights instead of predictions circumvents this disadvantage during inference and is typically applied to intermediate checkpoints of a model to reduce training cost. Although weight averaging is effective, only a few works have improved its understanding and performance. Here, we revisit this approach and show that a simple weight fusion (WF) strategy can lead to a significantly improved predictive performance and calibration. We describe what prerequisites the weights must meet in terms of weight space, functional space and loss. Furthermore, we present a new test method (called oracle test) to measure the functional space between weights. We demonstrate the versatility of our WF strategy across state-of-the-art segmentation CNNs and Transformers as well as real-world datasets such as BDD100K and Cityscapes. We compare WF with similar approaches and show its superiority for in- and out-of-distribution data in terms of predictive performance and calibration.

Submitted 8 November, 2022; v1 submitted 22 July, 2022; originally announced July 2022.
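Weight fusion replaces the K forward passes of an ensemble with a single pass through averaged parameters. This sketch only shows that averaging step, assuming models that share one architecture and parameters stored as flat lists under identical names (a hypothetical layout; a deep learning framework would operate on the tensors of a model's state dict instead). It does not check the weight-space, functional-space and loss prerequisites the paper describes.

```python
def fuse_weights(state_dicts, coefficients=None):
    """Weighted average of the parameters of models sharing one architecture.

    state_dicts: list of {parameter_name: list of floats} with identical names and sizes.
    coefficients: optional per-model weights; defaults to a uniform average.
    """
    names = state_dicts[0].keys()
    if any(sd.keys() != names for sd in state_dicts):
        raise ValueError("all models must share the same parameter names")
    k = len(state_dicts)
    coefficients = coefficients or [1.0 / k] * k
    fused = {}
    for name in names:
        tensors = [sd[name] for sd in state_dicts]
        fused[name] = [sum(c * t[i] for c, t in zip(coefficients, tensors))
                       for i in range(len(tensors[0]))]
    return fused
```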

arXiv:2204.08780 [pdf, other]

Sensor Data Fusion in Top-View Grid Maps using Evidential Reasoning with Advanced Conflict Resolution

Authors: Sven Richter, Frank Bieder, Sascha Wirges, Christoph Stiller

Abstract: We present a new method to combine evidential top-view grid maps estimated based on heterogeneous sensor sources. Dempster's combination rule, which is usually applied in this context, provides undesired results with highly conflicting inputs. Therefore, we use more advanced evidential reasoning techniques and improve the conflict resolution by modeling the reliability of the evidence sources. We propose a data-driven reliability estimation to optimize the fusion quality using the KITTI-360 dataset. We apply the proposed method to the fusion of LiDAR and stereo camera data and evaluate the results qualitatively and quantitatively. The results demonstrate that our proposed method robustly combines measurements from heterogeneous sensors and successfully resolves sensor conflicts.

Submitted 19 April, 2022; originally announced April 2022.
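For reference, Dempster's combination rule and classical (Shafer) discounting by a reliability factor can be sketched on the two-state frame {free, occupied}, with an additional mass on the whole frame for "unknown". This is the baseline the paper improves on, not its advanced conflict resolution, and the reliability here is a hand-set number rather than the paper's data-driven estimate. The example illustrates the problem the abstract mentions: two confident, contradictory sources produce a conflict of 0.81, which the normalization step hides.

```python
def discount(m, reliability):
    """Shafer discounting: move (1 - reliability) of the belief to the unknown set FO."""
    return {"F": reliability * m["F"],
            "O": reliability * m["O"],
            "FO": 1.0 - reliability + reliability * m["FO"]}


def dempster(m1, m2):
    """Dempster's rule on the frame {F (free), O (occupied)}.

    Returns the combined masses and the conflict K (mass on empty intersections).
    """
    sets = {"F": {"F"}, "O": {"O"}, "FO": {"F", "O"}}
    combined = {"F": 0.0, "O": 0.0, "FO": 0.0}
    conflict = 0.0
    for a, ma in m1.items():
        for b, mb in m2.items():
            inter = sets[a] & sets[b]
            if not inter:
                conflict += ma * mb
            else:
                key = "FO" if len(inter) == 2 else next(iter(inter))
                combined[key] += ma * mb
    if conflict >= 1.0:
        raise ValueError("total conflict: Dempster's rule is undefined")
    # Normalization redistributes the conflicting mass, masking the disagreement.
    return {k: v / (1.0 - conflict) for k, v in combined.items()}, conflict
```

Discounting an unreliable source before combination, e.g. `dempster(discount(lidar, 0.9), discount(camera, 0.6))`, shifts part of its mass to "unknown" and therefore lowers the conflict.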

arXiv:2204.07887 [pdf, other]

Mapping LiDAR and Camera Measurements in a Dual Top-View Grid Representation Tailored for Automated Vehicles

Authors: Sven Richter, Frank Bieder, Sascha Wirges, Christoph Stiller

Abstract: We present a generic evidential grid mapping pipeline designed for imaging sensors such as LiDARs and cameras. Our grid-based evidential model contains semantic estimates for cell occupancy and ground separately. We specify the estimation steps for input data represented by point sets, but mainly focus on input data represented by images such as disparity maps or LiDAR range images. Instead of relying on an external ground segmentation only, we deduce occupancy evidence by analyzing the surface orientation around measurements. We conduct experiments and evaluate the presented method using LiDAR and stereo camera data recorded in real traffic scenarios. Our method estimates cell occupancy robustly and with a high level of detail while maximizing efficiency and minimizing the dependency on external processing modules.

Submitted 21 April, 2022; v1 submitted 16 April, 2022; originally announced April 2022.

arXiv:2203.01180 [pdf, other]

Fast and Robust Ground Surface Estimation from LIDAR Measurements using Uniform B-Splines

Authors: Sascha Wirges, Kevin Rösch, Frank Bieder, Christoph Stiller

Abstract: We propose a fast and robust method to estimate the ground surface from LIDAR measurements on an automated vehicle. The ground surface is modeled as a uniform B-spline (UBS), which is robust towards varying measurement densities and has a single parameter controlling the smoothness prior. We model the estimation process as a robust least-squares (LS) optimization problem, which can be reformulated as a linear problem and thus solved efficiently. Using the SemanticKITTI data set, we conduct a quantitative evaluation by classifying the point-wise semantic annotations into ground and non-ground points. Finally, we validate the approach on our research vehicle in real-world scenarios.

Submitted 2 March, 2022; originally announced March 2022.
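A sketch of the modeling idea in one dimension, assuming a ground height profile z(x) along the driving direction instead of the paper's surface over the ground plane. The uniform cubic B-spline weights, a second-difference smoothness prior controlled by a single parameter, and solving the robust problem as a sequence of linear least-squares systems (iteratively reweighted least squares with Cauchy weights) are illustrative choices; the paper's exact formulation may differ. All function names are hypothetical.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def basis(x, x0, h, segments):
    """First active control point and the four uniform cubic B-spline weights at x."""
    t = (x - x0) / h
    i = min(max(int(math.floor(t)), 0), segments - 1)
    u = t - i
    return i, [(1 - u) ** 3 / 6,
               (3 * u ** 3 - 6 * u ** 2 + 4) / 6,
               (-3 * u ** 3 + 3 * u ** 2 + 3 * u + 1) / 6,
               u ** 3 / 6]


def fit_ground(xs, zs, x0, x1, h=5.0, smooth=0.1, iterations=10, scale=0.2):
    """Fit a 1D ground height profile z(x) with a uniform cubic B-spline.

    smooth: weight of the second-difference smoothness prior on the control points.
    iterations > 1 turns plain least squares into IRLS with Cauchy weights (iterations >= 1).
    """
    segments = max(1, math.ceil((x1 - x0) / h))
    n = segments + 3
    weights = [1.0] * len(xs)

    def surface(ctrl, x):
        i, b = basis(x, x0, h, segments)
        return sum(b[p] * ctrl[i + p] for p in range(4))

    for _ in range(iterations):
        ata = [[0.0] * n for _ in range(n)]
        atb = [0.0] * n
        for x, z, w in zip(xs, zs, weights):
            i, b = basis(x, x0, h, segments)
            for p in range(4):
                atb[i + p] += w * b[p] * z
                for q in range(4):
                    ata[i + p][i + q] += w * b[p] * b[q]
        for j in range(1, n - 1):  # smoothness prior on c[j-1] - 2 c[j] + c[j+1]
            stencil = {j - 1: 1.0, j: -2.0, j + 1: 1.0}
            for p, dp in stencil.items():
                for q, dq in stencil.items():
                    ata[p][q] += smooth * dp * dq
        ctrl = solve(ata, atb)
        # Large residuals (e.g. points on obstacles) receive small weights in the next round.
        weights = [1.0 / (1.0 + ((surface(ctrl, x) - z) / scale) ** 2) for x, z in zip(xs, zs)]
    return lambda x: surface(ctrl, x)
```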

arXiv:2203.01151 [pdf, other]

Improving Lidar-Based Semantic Segmentation of Top-View Grid Maps by Learning Features in Complementary Representations

Authors: Frank Bieder, Maximilian Link, Simon Romanski, Haohao Hu, Christoph Stiller

Abstract: In this paper, we introduce a novel way to predict semantic information from sparse, single-shot LiDAR measurements in the context of autonomous driving. In particular, we fuse learned features from complementary representations. The approach is aimed specifically at improving the semantic segmentation of top-view grid maps. Towards this goal, the 3D LiDAR point cloud is projected onto two orthogonal 2D representations. For each representation, a tailored deep learning architecture is developed to effectively extract semantic information, which is fused by a superordinate deep neural network. The contribution of this work is threefold: (1) We examine different stages within the segmentation network for fusion. (2) We quantify the impact of embedding different features. (3) We use the findings of this study to design a tailored deep neural network architecture leveraging the respective advantages of different representations. Our method is evaluated using the SemanticKITTI dataset, which provides a point-wise semantic annotation of more than 23,000 LiDAR measurements.

Submitted 2 March, 2022; originally announced March 2022.

arXiv:2202.13855 [pdf, other]

Large-Scale 3D Semantic Reconstruction for Automated Driving Vehicles with Adaptive Truncated Signed Distance Function

Authors: Haohao Hu, Hexing Yang, Jian Wu, Xiao Lei, Frank Bieder, Jan-Hendrik Pauls, Christoph Stiller

Abstract: Large-scale 3D reconstruction, texturing and semantic mapping are nowadays widely used for automated driving vehicles, virtual reality and automatic data generation. However, most approaches are developed for RGB-D cameras with colored dense point clouds and are not suitable for large-scale outdoor environments using sparse LiDAR point clouds. Since a 3D surface can usually be observed from multiple camera images with different view poses, an optimal image patch selection for the texturing and an optimal semantic class estimation for the semantic mapping are still challenging. To address these problems, we propose a novel 3D reconstruction, texturing and semantic mapping system using LiDAR and camera sensors. An Adaptive Truncated Signed Distance Function is introduced to describe surfaces implicitly, which can deal with different LiDAR point sparsities and improve model quality. The triangle mesh map extracted from this implicit function is then textured from a series of registered camera images by applying an optimal image patch selection strategy. Besides that, a Markov Random Field-based data fusion approach is proposed to estimate the optimal semantic class for each triangle mesh. Our approach is evaluated on a synthetic dataset, the KITTI dataset and a dataset recorded with our experimental vehicle. The results show that the 3D models generated using our approach are more accurate than those of other state-of-the-art approaches. The texturing and semantic mapping also achieve very promising results.

Submitted 28 February, 2022; originally announced February 2022.

Comments: 8 pages
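The implicit surface representation can be illustrated with the classical TSDF update: every voxel along a sensor ray stores a truncated signed distance to the measured surface, fused over measurements by a weighted running average. The rule that enlarges the truncation distance with the local LiDAR point spacing is a hypothetical stand-in for the paper's adaptive mechanism, included only to show where such an adaptation would act; all names and default values are illustrative.

```python
def fuse_ray(tsdf, weights, voxel_depths, measured_depth, point_spacing,
             k=3.0, min_trunc=0.1):
    """Update the TSDF voxels along one sensor ray with a weighted running average.

    tsdf, weights: per-voxel lists, modified in place.
    voxel_depths: distance of each voxel centre from the sensor along the ray.
    The truncation distance grows with the local LiDAR point spacing (hypothetical rule).
    """
    trunc = max(min_trunc, k * point_spacing)
    for idx, depth in enumerate(voxel_depths):
        sdf = measured_depth - depth  # positive in front of the surface, negative behind
        if sdf < -trunc:
            continue  # occluded region far behind the surface: no information
        value = min(1.0, sdf / trunc)  # truncate in front, normalize to [-1, 1]
        w = weights[idx]
        tsdf[idx] = (tsdf[idx] * w + value) / (w + 1.0)
        weights[idx] = w + 1.0
```

The surface is recovered as the zero crossing of the fused values; fusing a second measurement at a slightly different depth moves that crossing to the weighted average of both.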

arXiv:2202.13847 [pdf, other]

TEScalib: Targetless Extrinsic Self-Calibration of LiDAR and Stereo Camera for Automated Driving Vehicles with Uncertainty Analysis

Authors: Haohao Hu, Fengze Han, Frank Bieder, Jan-Hendrik Pauls, Christoph Stiller

Abstract: In this paper, we present TEScalib, a novel extrinsic self-calibration approach of LiDAR and stereo camera using the geometric and photometric information of surrounding environments without any calibration targets for automated driving vehicles. Since LiDAR and stereo camera are widely used for sensor data fusion on automated driving vehicles, their extrinsic calibration is highly important. However, most LiDAR and stereo camera calibration approaches are target-based and therefore time-consuming. Even the targetless approaches developed in recent years are either inaccurate or unsuitable for driving platforms. To address those problems, we introduce TEScalib. By applying a 3D mesh reconstruction-based point cloud registration, the geometric information is used to estimate the LiDAR to stereo camera extrinsic parameters accurately and robustly. To calibrate the stereo camera, a photometric error function is built and the LiDAR depth is used to transform key points from one camera to another. During driving, these two parts are processed iteratively. Besides that, we also propose an uncertainty analysis for reflecting the reliability of the estimated extrinsic parameters. Evaluated on the KITTI dataset, our TEScalib approach achieves very promising results.

Submitted 28 February, 2022; originally announced February 2022.

Comments: 8 pages
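The TEScalib abstract describes a photometric error in which LiDAR depth carries key points from one camera into the other. A minimal sketch of such a residual is given below, assuming identical pinhole intrinsics `K = (fx, fy, cx, cy)` for both cameras, a rotation `R` and translation `t` that map camera-1 coordinates into camera-2 coordinates, grayscale images stored as nested lists, and key points on integer pixels. The function names are hypothetical, and this is not the authors' implementation.

```python
from typing import List, Optional, Tuple

Image = List[List[float]]  # grayscale image indexed as img[row][col]


def bilinear(img: Image, u: float, v: float) -> Optional[float]:
    """Sample img at sub-pixel column u and row v; None if outside the image."""
    if u < 0 or v < 0 or u >= len(img[0]) - 1 or v >= len(img) - 1:
        return None
    u0, v0 = int(u), int(v)
    du, dv = u - u0, v - v0
    top = (1 - du) * img[v0][u0] + du * img[v0][u0 + 1]
    bottom = (1 - du) * img[v0 + 1][u0] + du * img[v0 + 1][u0 + 1]
    return (1 - dv) * top + dv * bottom


def transfer_pixel(u, v, depth, K, R, t) -> Tuple[float, float]:
    """Back-project (u, v) with a LiDAR depth, move it into camera 2, re-project."""
    fx, fy, cx, cy = K
    p = ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)  # camera-1 frame
    q = [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]  # camera-2 frame
    return fx * q[0] / q[2] + cx, fy * q[1] / q[2] + cy


def photometric_error(keypoints, depths, img1: Image, img2: Image, K, R, t) -> float:
    """Sum of squared intensity differences over key points visible in both images."""
    err = 0.0
    for (u, v), d in zip(keypoints, depths):
        u2, v2 = transfer_pixel(u, v, d, K, R, t)
        i2 = bilinear(img2, u2, v2)
        if i2 is None:
            continue  # key point falls outside the second image
        err += (img1[v][u] - i2) ** 2
    return err
```

With correct extrinsics the transferred intensities match and the error vanishes; minimizing it over `R` and `t` (for example with Gauss-Newton) is one way such a term could refine the stereo extrinsics, though robust weighting and convergence details are left open here.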

arXiv:2111.09230 [pdf, other]

doi 10.1109/ICRA46639.2022.9812271

DA-LMR: A Robust Lane Marking Representation for Data Association

Authors: Miguel Ángel Muñoz-Bañón, Jan-Hendrik Pauls, Haohao Hu, Christoph Stiller

Abstract: While complete localization approaches are widely studied in the literature, their data association and data representation subprocesses usually go unnoticed. However, both are a key part of the final pose estimation. In this work, we present DA-LMR (Delta-Angle Lane Marking Representation), a robust data representation in the context of localization approaches. We propose a representation of lane markings that encodes how a curve changes at each point and includes this information in an additional dimension, thus providing a more detailed description of the geometric structure of the data. We also propose DC-SAC (Distance-Compatible Sample Consensus), a data association method. This is a heuristic version of RANSAC that dramatically reduces the hypothesis space through distance compatibility restrictions. We compare the presented methods with state-of-the-art data representation and data association approaches in different noisy scenarios. The combination of DA-LMR and DC-SAC is the most promising among those compared, reaching 98.1% precision and 99.7% recall for noisy data with a standard deviation of 0.5 m.

Submitted 7 March, 2022; v1 submitted 17 November, 2021; originally announced November 2021.

Comments: Accepted ICRA 2022 (camera ready version)

Journal ref: 2022 International Conference on Robotics and Automation (ICRA), Philadelphia, PA, USA, 2022
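The DA-LMR abstract says the representation adds, as an extra dimension, how the lane-marking curve changes at each point. Reading "delta angle" as the change in heading between consecutive polyline segments, a minimal sketch could look as follows; the zero value at the endpoints and the function names are assumptions, not details from the paper.

```python
import math
from typing import List, Sequence, Tuple


def wrap_angle(a: float) -> float:
    """Map an angle to the interval [-pi, pi]."""
    return math.atan2(math.sin(a), math.cos(a))


def delta_angle_representation(
    points: Sequence[Tuple[float, float]],
) -> List[Tuple[float, float, float]]:
    """Append the heading change at each polyline point as a third coordinate."""
    # Heading of each segment between consecutive points.
    headings = [
        math.atan2(y1 - y0, x1 - x0)
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    ]
    # Heading change at interior points; endpoints get 0 by assumption.
    inner = [wrap_angle(h1 - h0) for h0, h1 in zip(headings, headings[1:])]
    deltas = [0.0] + inner + [0.0]
    return [(x, y, d) for (x, y), d in zip(points, deltas)]
```

Left turns yield positive values and right turns negative ones, so the extra coordinate separates straight, left-curving, and right-curving portions of a marking; the angle wrapping keeps a heading near +pi followed by one near -pi from registering as a spurious full turn.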

arXiv:2111.02493 [pdf]

doi 10.1088/1361-6501/ac2dbd

Roadmap on Signal Processing for Next Generation Measurement Systems

Authors: D. K. Iakovidis, M. Ooi, Y. C. Kuang, S. Demidenko, A. Shestakov, V. Sinitsin, M. Henry, A. Sciacchitano, A. Discetti, S. Donati, M. Norgia, A. Menychtas, I. Maglogiannis, S. C. Wriessnegger, L. A. Barradas Chacon, G. Dimas, D. Filos, A. H. Aletras, J. Töger, F. Dong, S. Ren, A. Uhl, J. Paziewski, J. Geng, F. Fioranelli, et al. (9 additional authors not shown)

Abstract: Signal processing is a fundamental component of almost any sensor-enabled system, with a wide range of applications across different scientific disciplines. Time series data, images, and video sequences are representative forms of signals that can be enhanced and analysed for information extraction and quantification. Recent advances in artificial intelligence and machine learning are shifting research attention towards intelligent, data-driven signal processing. This roadmap presents a critical overview of state-of-the-art methods and applications, aiming to highlight future challenges and research opportunities towards next-generation measurement systems. It covers a broad spectrum of topics, ranging from basic to industrial research, organized in concise thematic sections that reflect the trends and impacts of current and future developments in each research field. Furthermore, it offers guidance to researchers and funding agencies in identifying new prospects.

Submitted 28 January, 2022; v1 submitted 3 November, 2021; originally announced November 2021.

Comments: 48 pages, https://iopscience.iop.org/article/10.1088/1361-6501/ac2dbd

Journal ref: Measurement Science and Technology 33(1) (2022) 1-48

arXiv:2110.07322 [pdf, other]

Modeling dynamic target deformation in camera calibration

Authors: Annika Hagemann, Moritz Knorr, Christoph Stiller

Abstract: Most approaches to camera calibration rely on calibration targets of well-known geometry. During data acquisition, the calibration target and camera system are typically moved w.r.t. each other to allow image coverage and perspective versatility. We show that moving the target can lead to small temporary deformations of the target, which can introduce significant errors into the calibration result. While static inaccuracies of calibration targets have been addressed in previous works, to our knowledge, none of the existing approaches can capture time-varying, dynamic deformations. To achieve high-accuracy calibrations despite moving the target, we propose a way to explicitly model dynamic target deformations in camera calibration. This is achieved by using a low-dimensional deformation model with only a few parameters per image, which can be optimized jointly with target poses and intrinsics. We demonstrate the effectiveness of modeling dynamic deformations using different calibration targets and show its significance in a structure-from-motion application.

Submitted 14 October, 2021; originally announced October 2021.

Comments: Accepted for publication at IEEE/CVF, WACV 2022

arXiv:2108.08166 [pdf, other]

Deployment of Deep Neural Networks for Object Detection on Edge AI Devices with Runtime Optimization

Authors: Lukas Stäcker, Juncong Fei, Philipp Heidenreich, Frank Bonarens, Jason Rambach, Didier Stricker, Christoph Stiller

Abstract: Deep neural networks have proven increasingly important for automotive scene understanding, with new algorithms offering constant improvements in detection performance. However, there is little emphasis on experiences and needs for deployment in embedded environments. We therefore perform a case study of the deployment of two representative object detection networks on an edge AI platform. In particular, we consider RetinaNet for image-based 2D object detection and PointPillars for LiDAR-based 3D object detection. We describe the modifications necessary to convert the algorithms from a PyTorch training environment to the deployment environment, taking into account the available tools. We evaluate the runtime of the deployed DNNs using two different libraries, TensorRT and TorchScript. In our experiments, we observe slight advantages of TensorRT for convolutional layers and of TorchScript for fully connected layers. We also study the trade-off between runtime and performance when selecting an optimized setup for deployment, and observe that quantization significantly reduces the runtime while having only a small impact on detection performance.

Submitted 18 August, 2021; originally announced August 2021.

Comments: To present in ICCV 2021 (ERCVAD Workshop)
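The deployment study reports that quantization cuts runtime with little loss in detection performance. To show what that trade-off involves, here is a minimal sketch of symmetric per-tensor int8 quantization with an integer dot product. It is a generic textbook scheme chosen for illustration, not the calibration procedure TensorRT or TorchScript applied in the paper, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def quantize_int8(values: Sequence[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization to the integer range [-127, 127]."""
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize(q: Sequence[int], scale: float) -> List[float]:
    """Map integer codes back to approximate float values."""
    return [qi * scale for qi in q]


def int8_dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Approximate a float dot product with an integer multiply-accumulate."""
    qa, sa = quantize_int8(a)
    qb, sb = quantize_int8(b)
    acc = sum(x * y for x, y in zip(qa, qb))  # integer accumulator (int32 on hardware)
    return acc * sa * sb
```

On hardware the speedup comes from replacing float multiply-adds with int8 operations accumulated in int32; the price is a rounding error of at most half a quantization step per value, which is why the impact on accuracy can stay small.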

arXiv:2107.13484 [pdf, other]

Inferring bias and uncertainty in camera calibration

Authors: Annika Hagemann, Moritz Knorr, Holger Janssen, Christoph Stiller

Abstract: Accurate camera calibration is a precondition for many computer vision applications. Calibration errors, such as wrong model assumptions or imprecise parameter estimation, can deteriorate a system's overall performance, making the reliable detection and quantification of these errors critical. In this work, we introduce an evaluation scheme to capture the fundamental error sources in camera calibration: systematic errors (biases) and uncertainty (variance). The proposed bias detection method uncovers even the smallest systematic errors, thereby revealing imperfections of the calibration setup and providing the basis for camera model selection. A novel resampling-based uncertainty estimator enables uncertainty estimation under non-ideal conditions and thereby extends the classical covariance estimator. Furthermore, we derive a simple uncertainty metric that is independent of the camera model. In combination, the proposed methods can be used to assess the accuracy of individual calibrations, but also to benchmark new calibration algorithms, camera models, or calibration setups. We evaluate the proposed methods with simulations and real cameras.

Submitted 28 July, 2021; originally announced July 2021.
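The abstract above introduces a resampling-based uncertainty estimator that extends the classical covariance estimator. The paper's exact scheme is not described here, so as a generic illustration of resampling-based uncertainty the sketch below applies a nonparametric bootstrap to a least-squares slope. In a calibration setting the resampled units would be images or observations and the estimator a full recalibration; `fit_slope` and `bootstrap_std` are hypothetical stand-ins.

```python
import random
import statistics
from typing import Callable, Sequence


def fit_slope(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Least-squares slope of ys against xs (stand-in for a calibration)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def bootstrap_std(
    xs: Sequence[float],
    ys: Sequence[float],
    estimator: Callable[[Sequence[float], Sequence[float]], float],
    n_resamples: int = 500,
    seed: int = 0,
) -> float:
    """Standard deviation of the estimator over resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(xs)
    estimates = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        if len(set(bx)) < 2:
            continue  # degenerate resample: slope undefined
        estimates.append(estimator(bx, by))
    return statistics.stdev(estimates)
```

Unlike a covariance propagated from an assumed noise model, the bootstrap spread is driven by the variability actually present in the data, which is in the spirit of the abstract's goal of estimating uncertainty under non-ideal conditions.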

arXiv:2107.07316 [pdf, other]

Minimizing Safety Interference for Safe and Comfortable Automated Driving with Distributional Reinforcement Learning

Authors: Danial Kamran, Tizian Engelgeh, Marvin Busch, Johannes Fischer, Christoph Stiller

Abstract: Despite recent advances in reinforcement learning (RL), its application in safety-critical domains like autonomous vehicles is still challenging. Although punishing RL agents for risky situations can help to learn safe policies, it may also lead to highly conservative behavior. In this paper, we propose a distributional RL framework to learn adaptive policies that can tune their level of conservativity at run-time based on the desired comfort and utility. Using a proactive safety verification approach, the proposed framework can guarantee that actions generated by RL are fail-safe according to worst-case assumptions. Concurrently, the policy is encouraged to minimize safety interference and generate more comfortable behavior. We trained and evaluated the proposed approach and baseline policies using a high-level simulator with a variety of randomized scenarios, including several corner cases that rarely happen in reality but are very crucial. Our experiments show that the behavior of policies learned using distributional RL can be adapted at run-time and is robust to environment uncertainty. Quantitatively, the learned distributional RL agent drives on average 8 seconds faster than the normal DQN policy and requires 83% less safety interference than the rule-based policy, while only slightly increasing the average crossing time.
We also study the sensitivity of the learned policy in environments with higher perception noise and show that our algorithm learns policies that still drive reliably when the perception noise is two times higher than in the training configuration for automated merging and crossing at occluded intersections.

Submitted 15 July, 2021; originally announced July 2021.
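The distributional RL abstract describes policies whose conservativity can be tuned at run-time. One common way to obtain that behavior, assumed here rather than taken from the paper, is to score each action by the conditional value at risk (CVaR) of its predicted return quantiles, with the risk level `alpha` chosen at run-time: a small `alpha` looks only at the worst outcomes, while `alpha = 1` recovers the mean. The action names and quantile values below are hypothetical.

```python
import math
from typing import Dict, Sequence


def cvar(quantiles: Sequence[float], alpha: float) -> float:
    """Mean of the lowest ceil(alpha * N) return quantiles (0 < alpha <= 1)."""
    qs = sorted(quantiles)
    k = max(1, math.ceil(alpha * len(qs)))
    return sum(qs[:k]) / k


def select_action(quantiles_per_action: Dict[str, Sequence[float]], alpha: float) -> str:
    """Pick the action with the highest CVaR at risk level alpha."""
    return max(quantiles_per_action, key=lambda a: cvar(quantiles_per_action[a], alpha))


# Hypothetical return quantiles from a distributional Q-network at one state.
q_values = {
    "go": [-10.0, 5.0, 6.0, 7.0, 8.0],  # higher mean, rare bad outcome
    "wait": [1.0, 1.0, 1.0, 1.0, 1.0],  # certain but small return
}
print(select_action(q_values, alpha=1.0))  # risk-neutral: go
print(select_action(q_values, alpha=0.1))  # conservative: wait
```

The same learned distribution serves every risk level, so changing `alpha` at run-time needs no retraining; in the paper's framework, fail-safety is handled separately by the proactive safety verification.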

arXiv:2107.00346 [pdf, other]

MASS: Multi-Attentional Semantic Segmentation of LiDAR Data for Dense Top-View Understanding

Authors: Kunyu Peng, Juncong Fei, Kailun Yang, Alina Roitberg, Jiaming Zhang, Frank Bieder, Philipp Heidenreich, Christoph Stiller, Rainer Stiefelhagen

Abstract: At the heart of all automated driving systems is the ability to sense the surroundings, e.g., through semantic segmentation of LiDAR sequences, which has seen remarkable progress due to the release of large datasets such as SemanticKITTI and nuScenes-LidarSeg. While most previous works focus on sparse segmentation of the LiDAR input, dense output masks provide self-driving cars with almost complete environment information. In this paper, we introduce MASS, a Multi-Attentional Semantic Segmentation model specifically built for dense top-view understanding of driving scenes. Our framework operates on pillar and occupancy features and comprises three attention-based building blocks: (1) a keypoint-driven graph attention, (2) an LSTM-based attention computed from a vector embedding of the spatial input, and (3) a pillar-based attention, resulting in a dense 360-degree segmentation mask. With extensive experiments on both SemanticKITTI and nuScenes-LidarSeg, we quantitatively demonstrate the effectiveness of our model, outperforming the state of the art by 19.0% on SemanticKITTI and reaching 30.4% mIoU on nuScenes-LidarSeg, where MASS is the first work addressing the dense segmentation task. Furthermore, our multi-attention model is shown to be very effective for 3D object detection, as validated on the KITTI-3D dataset, showcasing its high generalizability to other tasks related to 3D vision.

Submitted 20 January, 2022; v1 submitted 1 July, 2021; originally announced July 2021.

Comments: Accepted to IEEE Transactions on Intelligent Transportation Systems (T-ITS). Code is publicly available at https://github.com/KPeng9510/MASS

arXiv:2105.06896 [pdf, other]

doi 10.1109/ISC253183.2021.9562912

Towards Sensor Data Abstraction of Autonomous Vehicle Perception Systems

Authors: Hannes Reichert, Lukas Lang, Kevin Rösch, Daniel Bogdoll, Konrad Doll, Bernhard Sick, Hans-Christian Reuss, Christoph Stiller, J. Marius Zöllner

Abstract: Full-stack autonomous driving perception modules usually consist of data-driven models based on multiple sensor modalities. However, these models might be biased to the sensor setup used for data acquisition. This bias can seriously impair the perception models' transferability to new sensor setups, which continuously occur due to the market's competitive nature. We envision sensor data abstraction as an interface between sensor data and machine learning applications for highly automated driving (HAD). For this purpose, we review the primary sensor modalities (camera, lidar, and radar) published in autonomous-driving-related datasets, examine single-sensor abstraction and abstraction of sensor setups, and identify critical paths towards an abstraction of sensor data from multiple perception configurations.

Submitted 28 September, 2021; v1 submitted 14 May, 2021; originally announced May 2021.

Comments: Hannes Reichert, Lukas Lang, Kevin Rösch and Daniel Bogdoll contributed equally. Accepted for publication at ISC2 2021

arXiv:2105.04169 [pdf, other]

PillarSegNet: Pillar-based Semantic Grid Map Estimation using Sparse LiDAR Data

Authors: Juncong Fei, Kunyu Peng, Philipp Heidenreich, Frank Bieder, Christoph Stiller

Abstract: Semantic understanding of the surrounding environment is essential for automated vehicles. The recent publication of the SemanticKITTI dataset has stimulated research on semantic segmentation of LiDAR point clouds in urban scenarios. While most existing approaches predict sparse pointwise semantic classes for the sparse input LiDAR scan, we propose PillarSegNet, which outputs a dense semantic grid map. In contrast to a previously proposed grid map method, PillarSegNet uses PointNet to learn features directly from the 3D point cloud and then conducts 2D semantic segmentation in the top view. To train and evaluate our approach, we use both sparse and dense ground truth, where the dense ground truth is obtained from multiple superimposed scans. Experimental results on the SemanticKITTI dataset show that PillarSegNet achieves a performance gain of about 10% mIoU over the state-of-the-art grid map method.

Submitted 5 July, 2021; v1 submitted 10 May, 2021; originally announced May 2021.

Comments: Accepted to present in the 2021 IEEE Intelligent Vehicles Symposium (IV21)

arXiv:2103.14420 [pdf, other]

YOLinO: Generic Single Shot Polyline Detection in Real Time

Authors: Annika Meyer, Philipp Skudlik, Jan-Hendrik Pauls, Christoph Stiller

Abstract: The detection of polylines is usually either restricted to branchless polylines or formulated recurrently, which prohibits its use in real-time systems. We propose an approach that builds upon the idea of single-shot object detection. Reformulating polyline detection as a bottom-up composition of small line segments allows detecting bounded, dashed, and continuous polylines with a single head. This has several major advantages over previous methods. At 187 fps, the method is more than suited for real-time applications, while imposing virtually no restrictions on the shapes of the detected polylines. By predicting multiple line segments for each cell, even branching or crossing polylines can be detected. We evaluate our approach on three different applications: road marking, lane border, and center line detection. In doing so, we demonstrate the ability to generalize to different domains as well as to both implicit and explicit polyline detection tasks.

Submitted 5 October, 2021; v1 submitted 26 March, 2021; originally announced March 2021.

Comments: published on ICCV 2021 Workshop

arXiv:2103.03678 [pdf, other]

doi 10.1109/IV48863.2021.9575933

An Application-Driven Conceptualization of Corner Cases for Perception in Highly Automated Driving

Authors: Florian Heidecker, Jasmin Breitenstein, Kevin Rösch, Jonas Löhdefink, Maarten Bieshaar, Christoph Stiller, Tim Fingscheidt, Bernhard Sick

Abstract: Systems and functions that rely on machine learning (ML) are the basis of highly automated driving. An essential task of such ML models is to reliably detect and interpret unusual, new, and potentially dangerous situations. The detection of those situations, which we refer to as corner cases, is highly relevant for successfully developing, applying, and validating automotive perception functions in future vehicles where multiple sensor modalities will be used. A complication for the development of corner case detectors is the lack of consistent definitions, terms, and corner case descriptions, especially when taking into account various automotive sensors. In this work, we provide an application-driven view of corner cases in highly automated driving. To achieve this goal, we first consider existing definitions from general outlier, novelty, anomaly, and out-of-distribution detection to show their relations to and differences from corner cases. Moreover, we extend an existing camera-focused systematization of corner cases by adding RADAR (radio detection and ranging) and LiDAR (light detection and ranging) sensors. For this, we describe an exemplary toolchain for data acquisition and processing, highlighting the interfaces of the corner case detection. We also define a novel level of corner cases, the method layer corner cases, which appear due to uncertainty inherent in the methodology or the data distribution.

Submitted 5 March, 2021; originally announced March 2021.

Comments: This paper is submitted to IEEE Intelligent Vehicles Symposium 2021

arXiv:2012.07170 [pdf, other]

doi 10.1109/ITSC.2018.8569580

Decision-Time Postponing Motion Planning for Combinatorial Uncertain Maneuvering

Authors: Ömer Şahin Taş, Felix Hauser, Christoph Stiller

Abstract: Motion planning involves decision making among combinatorial maneuver variants in urban driving. A planner must consider uncertainties and associated risks of the maneuver variants, and subsequently select a maneuver alternative. In this paper, we present a planning approach that considers the uncertainties in the prediction and, in case of high uncertainty, postpones the combinatorial decision making to a later time within the planning horizon. Our proposed approach plans motion that is safe but at the same time not overly conservative.

Submitted 13 December, 2020; originally announced December 2020.

Comments: 7 pages, 5 figures

Journal ref: 2018 21st International Conference on Intelligent Transportation Systems (ITSC)

arXiv:2009.12276 [pdf, other]

doi 10.1109/MFI49285.2020.9235240

SemanticVoxels: Sequential Fusion for 3D Pedestrian Detection using LiDAR Point Cloud and Semantic Segmentation

Authors: Juncong Fei, Wenbo Chen, Philipp Heidenreich, Sascha Wirges, Christoph Stiller

Abstract: 3D pedestrian detection is a challenging task in automated driving because pedestrians are relatively small, frequently occluded, and easily confused with narrow vertical objects. LiDAR and camera are two commonly used sensor modalities for this task, which should provide complementary information. Unexpectedly, LiDAR-only detection methods tend to outperform multisensor fusion methods in public benchmarks. Recently, PointPainting has been presented to eliminate this performance drop by effectively fusing the output of a semantic segmentation network instead of the raw image information. In this paper, we propose a generalization of PointPainting that can apply fusion at different levels. After the semantic augmentation of the point cloud, we encode raw point data in pillars to obtain geometric features and semantic point data in voxels to obtain semantic features, and fuse them effectively. Experimental results on the KITTI test set show that SemanticVoxels achieves state-of-the-art performance in both 3D and bird's eye view pedestrian detection benchmarks. In particular, our approach demonstrates its strength in detecting challenging pedestrian cases and outperforms current state-of-the-art approaches.

Submitted 25 September, 2020; originally announced September 2020.

Comments: Accepted to present in the 2020 IEEE International Conference on Multisensor Fusion and Integration (MFI 2020)
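SemanticVoxels builds on PointPainting, which appends per-pixel semantic segmentation scores to each LiDAR point before detection. A minimal sketch of that painting step follows, assuming points already expressed in the camera frame (z forward), a 3x4 projection matrix `P`, a nearest-pixel lookup, and that points falling outside the image are dropped. `paint_points` is a hypothetical name, and the paper's subsequent pillar and voxel encoding is not shown.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def paint_points(
    points: Sequence[Point],
    P: Sequence[Sequence[float]],
    seg_scores: Sequence[Sequence[Sequence[float]]],
) -> List[Tuple[float, ...]]:
    """Append the segmentation scores of the pixel each point projects to."""
    h, w = len(seg_scores), len(seg_scores[0])
    painted = []
    for x, y, z in points:
        # Homogeneous image coordinates: P @ [x, y, z, 1].
        uh = [P[i][0] * x + P[i][1] * y + P[i][2] * z + P[i][3] for i in range(3)]
        if uh[2] <= 0:
            continue  # point lies behind the camera
        col = int(round(uh[0] / uh[2]))  # nearest-pixel lookup
        row = int(round(uh[1] / uh[2]))
        if 0 <= row < h and 0 <= col < w:
            painted.append((x, y, z, *seg_scores[row][col]))
    return painted
```

Each painted point carries both geometry and class evidence; per the abstract, SemanticVoxels then encodes the raw points in pillars and the semantically augmented points in voxels and fuses the two feature sets.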

arXiv:2008.11647 [pdf, other]

RNN-based Pedestrian Crossing Prediction using Activity and Pose-related Features

Authors: Javier Lorenzo, Ignacio Parra, Florian Wirth, Christoph Stiller, David Fernandez Llorca, Miguel Angel Sotelo

Abstract: Pedestrian crossing prediction is a crucial task for autonomous driving. Numerous studies show that an early estimation of the pedestrian's intention can decrease or even avoid a high percentage of accidents. In this paper, different variations of a deep learning system are proposed to attempt to solve this problem. The proposed models are composed of two parts: a CNN-based feature extractor and an RNN module. All the models were trained and tested on the JAAD dataset. The results obtained indicate that the choice of feature extraction method, the inclusion of additional variables such as pedestrian gaze direction and discrete orientation, and the chosen RNN type have a significant impact on the final performance.

Submitted 26 August, 2020; originally announced August 2020.

Comments: 6 pages, 5 figures. This work has been accepted for publication at IEEE Intelligent Vehicle Symposium 2020

arXiv:2005.06667 [pdf, other]

Exploiting Multi-Layer Grid Maps for Surround-View Semantic Segmentation of Sparse LiDAR Data

Authors: Frank Bieder, Sascha Wirges, Johannes Janosovits, Sven Richter, Zheyuan Wang, Christoph Stiller

Abstract: In this paper, we consider the transformation of laser range measurements into a top-view grid map representation to approach the task of LiDAR-only semantic segmentation. Since the recent publication of the SemanticKITTI data set, researchers are now able to study semantic segmentation of urban LiDAR sequences based on a reasonable amount of data. While other approaches propose to learn directly on the 3D point clouds, we exploit a grid map framework to extract relevant information and represent it using multi-layer grid maps. This representation allows us to use well-studied deep learning architectures from the image domain to predict a dense semantic grid map using only the sparse input data of a single LiDAR scan. We compare single-layer and multi-layer approaches and demonstrate the benefit of a multi-layer grid map input. Since the grid map representation allows us to predict a dense, 360° semantic environment representation, we further develop a method to combine the semantic information from multiple scans and create dense ground truth grids. This method allows us to evaluate and compare the performance of our models not only on grid cells with a detection, but on the full visible measurement range.

Submitted 13 May, 2020; originally announced May 2020.
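The grid-map paper above encodes a single LiDAR scan as a multi-layer top-view grid so that image-domain networks can process it. The layer set is not listed in the abstract; the sketch below assumes four common per-cell statistics (detection count, minimum and maximum height, mean intensity) and points given as `(x, y, z, intensity)` tuples. The function and layer names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Grid = List[List[float]]


def multi_layer_grid(
    points: Sequence[Tuple[float, float, float, float]],
    cell_size: float,
    x_range: Tuple[float, float],
    y_range: Tuple[float, float],
) -> Dict[str, Grid]:
    """Rasterize (x, y, z, intensity) points into per-cell statistic layers."""
    nx = int(round((x_range[1] - x_range[0]) / cell_size))
    ny = int(round((y_range[1] - y_range[0]) / cell_size))
    count = [[0.0] * nx for _ in range(ny)]
    z_min = [[math.inf] * nx for _ in range(ny)]
    z_max = [[-math.inf] * nx for _ in range(ny)]
    i_sum = [[0.0] * nx for _ in range(ny)]

    for x, y, z, intensity in points:
        col = math.floor((x - x_range[0]) / cell_size)
        row = math.floor((y - y_range[0]) / cell_size)
        if not (0 <= row < ny and 0 <= col < nx):
            continue  # outside the mapped area
        count[row][col] += 1
        z_min[row][col] = min(z_min[row][col], z)
        z_max[row][col] = max(z_max[row][col], z)
        i_sum[row][col] += intensity

    # Empty cells get 0 in the height layers; a zero count marks them as empty.
    def finalize(layer: Grid) -> Grid:
        return [[layer[r][c] if count[r][c] else 0.0 for c in range(nx)] for r in range(ny)]

    mean_i = [[i_sum[r][c] / count[r][c] if count[r][c] else 0.0 for c in range(nx)] for r in range(ny)]
    return {"detections": count, "z_min": finalize(z_min), "z_max": finalize(z_max), "intensity": mean_i}
```

Stacking these layers gives an image-like tensor of shape (layers, ny, nx), which is what allows standard image segmentation architectures to be reused on a single sparse scan.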

arXiv:2004.04450 [pdf, other]

Risk-Aware High-level Decisions for Automated Driving at Occluded Intersections with Reinforcement Learning

Authors: Danial Kamran, Carlos Fernandez Lopez, Martin Lauer, Christoph Stiller

Abstract: Reinforcement learning is nowadays a popular framework for solving different decision making problems in automated driving. However, there are still some crucial challenges that need to be addressed to provide more reliable policies. In this paper, we propose a generic risk-aware DQN approach to learn high-level actions for driving through unsignalized occluded intersections. The proposed state representation provides lane-based information, which allows it to be used in multi-lane scenarios. Moreover, we propose a risk-based reward function that punishes risky situations instead of only collision failures. Such a reward scheme helps to incorporate risk prediction into our deep Q-network and to learn more reliable policies that are safer in challenging situations. The efficiency of the proposed approach is compared with a DQN learned with a conventional collision-based reward scheme and with a rule-based intersection navigation policy. Evaluation results show that the proposed approach outperforms both of these methods. It provides safer actions than the collision-aware DQN approach and is less overcautious than the rule-based policy.

Submitted 9 April, 2020; originally announced April 2020.

arXiv:2003.00710 [pdf, other]

Learned Enrichment of Top-View Grid Maps Improves Object Detection

Authors: Sascha Wirges, Ye Yang, Sven Richter, Haohao Hu, Christoph Stiller

Abstract: We propose an object detector for top-view grid maps that is additionally trained to generate an enriched version of its input. Our goal in the joint model is to improve generalization by regularizing towards structural knowledge in the form of a map fused from multiple adjacent range sensor measurements. This training data can be generated automatically and thus does not require manual annotations. We present an evidential framework to generate training data, investigate different model architectures, and show that predicting enriched inputs as an additional task can improve object detection performance.

Submitted 9 March, 2020; v1 submitted 2 March, 2020; originally announced March 2020.

Comments: 6 pages, 6 figures, 4 tables

arXiv:2002.01254 [pdf, other]

Tackling Existence Probabilities of Objects with Motion Planning for Automated Urban Driving

Authors: Omer Sahin Tas, Christoph Stiller

Abstract: Motion planners take uncertain information about the environment as input. This information is often quite noisy and tends to contain false-positive object detections. State-of-the-art motion planners treat all objects alike, thus producing overcautious behavior. In this paper, we present a planning approach that considers alternative maneuvers jointly and plans a motion that is shaped by the probabilities of those alternatives. The proposed planner can react smoothly to objects with low existence probability while remaining collision-free in case their existence is substantiated. In this way, it tolerates faults arising from perception and prediction, thus reducing their impact on operational reliability.

Submitted 21 October, 2020; v1 submitted 4 February, 2020; originally announced February 2020.

Comments: 5 pages, 5 figures

arXiv:2002.00667 [pdf, other]

Single-Stage Object Detection from Top-View Grid Maps on Custom Sensor Setups

Authors: Sascha Wirges, Shuxiao Ding, Christoph Stiller

Abstract: We present our approach to unsupervised domain adaptation for single-stage object detectors on top-view grid maps in automated driving scenarios. Our goal is to train a robust object detector on grid maps generated from custom sensor data and setups. We first introduce a single-stage object detector for grid maps based on RetinaNet. We then extend our model with image- and instance-level domain classifiers at different feature pyramid levels, which are trained in an adversarial manner. This allows us to train robust object detectors for unlabeled domains. We evaluate our approach quantitatively on the nuScenes and KITTI benchmarks and present qualitative domain adaptation results for unlabeled measurements recorded by our experimental vehicle. Our results demonstrate that object detection accuracy for unlabeled domains can be improved by applying our domain adaptation strategy.

Submitted 3 February, 2020; originally announced February 2020.

Comments: 6 pages, 5 figures, 4 tables

arXiv:1910.03088 [pdf, other]

INTERACTION Dataset: An INTERnational, Adversarial and Cooperative moTION Dataset in Interactive Driving Scenarios with Semantic Maps

Authors: Wei Zhan, Liting Sun, Di Wang, Haojie Shi, Aubrey Clausse, Maximilian Naumann, Julius Kummerle, Hendrik Konigshof, Christoph Stiller, Arnaud de La Fortelle, Masayoshi Tomizuka

Abstract: Behavior-related research areas such as motion prediction/planning, representation/imitation learning, behavior modeling/generation, and algorithm testing require support from high-quality motion datasets containing interactive driving scenarios with different driving cultures. In this paper, we present an INTERnational, Adversarial and Cooperative moTION dataset (INTERACTION dataset) in interactive driving scenarios with semantic maps. Five features of the dataset are highlighted. 1) The interactive driving scenarios are diverse, including urban/highway/ramp merging and lane changes, roundabouts with yield/stop signs, signalized intersections, intersections with one/two/all-way stops, etc. 2) Motion data from different countries and different continents are collected so that driving preferences and styles in different cultures are naturally included. 3) The driving behavior is highly interactive and complex, with adversarial and cooperative motions of various traffic participants. Highly complex behaviors such as negotiations, aggressive/irrational decisions, and traffic rule violations are densely represented in the dataset, while regular behaviors, ranging from cautious car-following, stopping, and left/right/U-turns to rational lane changes, cycling, and pedestrian crossings, can also be found. 4) The levels of criticality span a wide range, from regular safe operations to dangerous, near-collision maneuvers. Real collisions, although relatively minor, are also included. 5) Maps with complete semantic information are provided with physical layers, reference lines, lanelet connections, and traffic rules. The data is recorded from drones and traffic cameras. Statistics of the dataset in terms of the number of entities and interaction density are also provided, along with some utilization examples in a variety of behavior-related research areas. The dataset can be downloaded via https://interaction-dataset.com.

Submitted 30 September, 2019; originally announced October 2019.

arXiv:1906.02495 [pdf, other]

Anytime Lane-Level Intersection Estimation Based on Trajectories of Other Traffic Participants

Authors: Annika Meyer, Jonas Walter, Martin Lauer, Christoph Stiller

Abstract: Estimating and understanding the current scene is an indispensable capability of automated vehicles. Usually, maps are used as a prior for interpreting sensor measurements in order to drive safely and comfortably. Only a few approaches take into account that maps might be outdated and lead to wrong assumptions about the environment. This work estimates a lane-level intersection topology without any map prior by observing the trajectories of other traffic participants. We are able to deliver both a coarse lane-level topology and the lane course inside and outside of the intersection using Markov chain Monte Carlo sampling. The model is limited neither to a particular number of lanes or arms nor to a particular intersection topology. We present our results on an evaluation set of 1000 simulated intersections and achieve 99.9% accuracy on the topology estimation, which takes only 36 ms when utilizing tracked object detections. The precise lane course on these intersections is estimated with an average error of 15 cm after 140 ms. Our approach shows a similar level of precision on 14 real-world intersections, with an average deviation of 18 cm on simple intersections and 27 cm for more complex scenarios. Here, the estimation takes only 113 ms in total.

Submitted 7 August, 2019; v1 submitted 6 June, 2019; originally announced June 2019.

arXiv:1904.12599 [pdf, other]

Self-Supervised Flow Estimation using Geometric Regularization with Applications to Camera Image and Grid Map Sequences

Authors: Sascha Wirges, Johannes Gräter, Qiuhao Zhang, Christoph Stiller

Abstract: We present a self-supervised approach to estimate flow in camera image and top-view grid map sequences using fully convolutional neural networks in the domain of automated driving. We extend existing approaches for self-supervised optical flow estimation by adding a regularizer expressing motion consistency under the assumption of a static environment. However, as this assumption is violated for other moving traffic participants, we also estimate a mask to scale this regularization. Adding a regularization towards motion consistency improves convergence and flow estimation accuracy. Furthermore, we scale the errors due to spatial flow inconsistency by a mask that we derive from the motion mask. Owing to a better separation between static and dynamic environment, this improves accuracy in regions where the flow changes drastically. We apply our approach to optical flow estimation from camera image sequences, validate it on odometry estimation, and suggest a method to iteratively increase optical flow estimation accuracy using the generated motion masks. Finally, we provide quantitative and qualitative results based on the KITTI odometry and tracking benchmarks for scene flow estimation based on grid map sequences. We show that we can improve accuracy and convergence when applying motion and spatial consistency regularization.

Submitted 17 April, 2019; originally announced April 2019.

Comments: 6 pages, 5 figures

arXiv:1903.10205 [pdf, other]

Accurate Global Trajectory Alignment using Poles and Road Markings

Authors: Haohao Hu, Marc Sons, Christoph Stiller

Abstract: Currently, digital maps are indispensable for automated driving. However, due to the low precision and reliability of GNSS, particularly in urban areas, fusing trajectories of independent recording sessions and different regions is a challenging task. To bypass the flaws of directly incorporating GNSS measurements for geo-referencing, the use of aerial imagery seems promising. Furthermore, more accurate geo-referencing improves the global map accuracy and allows the sensor calibration error to be estimated. In this paper, we present a novel geo-referencing approach to align trajectories to aerial imagery using poles and road markings. To robustly match features extracted from sensor observations to aerial imagery landmarks, a RANSAC-based matching approach is applied in a sliding window. For that, we assume that the trajectories are roughly referenced to the imagery, which can be achieved using coarse measurements from a low-cost GNSS receiver. Finally, we align the initial trajectories precisely to the aerial imagery by minimizing a geometric cost function comprising all determined matches. Evaluations performed on data recorded in Karlsruhe, Germany, show that our algorithm yields trajectories that are accurately referenced to the aerial imagery used.

Submitted 25 March, 2019; originally announced March 2019.

Comments: 6 pages, 6 figures, conference
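The abstract names RANSAC-based matching of sensor features to aerial-imagery landmarks. As a hedged illustration of the generic RANSAC idea only, the sketch below assumes that tentative correspondences are already given and that the misalignment within one window is a pure 2D translation. The function name ransac_translation and its parameters are hypothetical; the authors' actual motion model, sliding-window handling and geometric cost function are not specified in the abstract.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]


def ransac_translation(
    pairs: List[Tuple[Point, Point]],
    threshold: float = 0.5,
    iterations: int = 200,
    seed: int = 0,
) -> Tuple[Point, List[int]]:
    """Robustly estimate a 2D translation mapping observed features onto landmarks.

    pairs: tentative (observed_xy, landmark_xy) correspondences, some possibly wrong.
    Returns the refined translation and the indices of the inlier pairs.
    """
    rng = random.Random(seed)
    best_inliers: List[int] = []
    for _ in range(iterations):
        # A pure translation is fixed by a single correspondence (minimal sample).
        (ox, oy), (lx, ly) = rng.choice(pairs)
        tx, ty = lx - ox, ly - oy
        # Consensus set: pairs whose residual under this hypothesis is small.
        inliers = [
            i
            for i, ((px, py), (qx, qy)) in enumerate(pairs)
            if math.hypot(px + tx - qx, py + ty - qy) <= threshold
        ]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # Least-squares refinement: for a translation this is the mean inlier offset.
    n = len(best_inliers)
    tx = sum(pairs[i][1][0] - pairs[i][0][0] for i in best_inliers) / n
    ty = sum(pairs[i][1][1] - pairs[i][0][1] for i in best_inliers) / n
    return (tx, ty), best_inliers


# Five correct matches offset by (2, -1) plus two gross mismatches.
good = [((x, y), (x + 2.0, y - 1.0)) for x, y in [(0, 0), (1, 3), (4, 2), (5, 5), (2, 7)]]
bad = [((1, 1), (9, 9)), ((3, 0), (-4, 6))]
translation, inlier_ids = ransac_translation(good + bad)
print(translation, inlier_ids)  # (2.0, -1.0) [0, 1, 2, 3, 4]
```

A full alignment would replace the translation with a rigid (rotation plus translation) model fitted from two correspondences, but the hypothesize-and-verify loop stays the same.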

arXiv:1901.11284 [pdf, other]

Capturing Object Detection Uncertainty in Multi-Layer Grid Maps

Authors: Sascha Wirges, Marcel Reith-Braun, Martin Lauer, Christoph Stiller

Abstract: We propose a deep convolutional object detector for automated driving applications that also estimates the classification, pose and shape uncertainty of each detected object. The input consists of a multi-layer grid map, which is well-suited for sensor fusion, free-space estimation and machine learning. Based on the estimated pose and shape uncertainty, we approximate object hulls with bounded collision probability, which we find helpful for subsequent trajectory planning tasks. We train our models on the KITTI object detection data set. In a quantitative and qualitative evaluation, some models show similar performance and superior robustness compared to previously developed object detectors. However, our evaluation also points to undesired data set properties that should be addressed when training data-driven models or creating new data sets.

Submitted 31 January, 2019; originally announced January 2019.

Comments: 8 pages, 8 figures, 2 tables
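The abstract mentions approximating object hulls with bounded collision probability from the estimated uncertainty. One standard way to obtain such a bound is sketched below under simplifying assumptions: a disc-shaped object, Gaussian uncertainty on its 2D center only, and no shape or heading uncertainty. It relies on the fact that the squared Mahalanobis distance of a 2D Gaussian follows a chi-square distribution with 2 degrees of freedom. All function names are hypothetical, and this is not necessarily the construction used in the paper.

```python
import math
from typing import Tuple


def mahalanobis_radius(alpha: float) -> float:
    """Radius k such that a 2D Gaussian falls outside its k-sigma ellipse with prob. alpha.

    With 2 degrees of freedom, P(d^2 > k^2) = exp(-k^2 / 2), hence k = sqrt(-2 ln alpha).
    """
    return math.sqrt(-2.0 * math.log(alpha))


def max_eigenvalue_2x2(cov: Tuple[float, float, float]) -> float:
    """Largest eigenvalue of the symmetric covariance [[sxx, sxy], [sxy, syy]]."""
    sxx, sxy, syy = cov
    mean = 0.5 * (sxx + syy)
    return mean + math.sqrt((0.5 * (sxx - syy)) ** 2 + sxy ** 2)


def inflated_radius(object_radius: float, cov: Tuple[float, float, float], alpha: float) -> float:
    """Circular hull that contains the whole object with probability >= 1 - alpha.

    Conservative: the k-sigma ellipse of the center is replaced by its circumscribing
    circle (radius k * sqrt(lambda_max)), and the object's own radius is added on top.
    """
    return object_radius + mahalanobis_radius(alpha) * math.sqrt(max_eigenvalue_2x2(cov))


# Object of radius 1 m, center std. dev. 0.5 m (x) and 0.2 m (y), 1 % allowed miss rate.
r = inflated_radius(1.0, (0.25, 0.0, 0.04), alpha=0.01)
print(round(r, 3))  # 2.517
```

Replacing the ellipse by its circumscribing circle wastes some area for strongly anisotropic covariances, but it keeps the downstream collision check a simple distance test.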

arXiv:1810.13001 [pdf, other]

doi 10.1109/IVS.2018.8500369

Limited Visibility and Uncertainty Aware Motion Planning for Automated Driving

Authors: Omer Sahin Tas, Christoph Stiller

Abstract: Adverse weather conditions and occlusions in urban environments result in impaired perception. The resulting uncertainties are handled in different modules of an automated vehicle, ranging from the sensor level over situation prediction to motion planning. This paper focuses on motion planning given an uncertain environment model with occlusions. We present a method to remain collision-free for the worst-case evolution of the given scene. We define criteria that measure the available margins to a collision while considering visibility and interactions, and we integrate conditions that apply these criteria into an optimization-based motion planner. We show the generality of our method by validating it in several distinct urban scenarios.

Submitted 30 October, 2018; originally announced October 2018.

Journal ref: In Proceedings of the IEEE Intelligent Vehicles Symposium (IV), Changshu-Suzhou, China, 26-30 June 2018, pp. 1171-1178

arXiv:1805.08689 [pdf, other]

Object Detection and Classification in Occupancy Grid Maps using Deep Convolutional Networks

Authors: Sascha Wirges, Tom Fischer, Jesus Balado Frias, Christoph Stiller

Abstract: A detailed environment perception is a crucial component of automated vehicles. However, to deal with the amount of perceived information, we also require segmentation strategies. Based on a grid map environment representation, which is well-suited for sensor fusion, free-space estimation and machine learning, we detect and classify objects using deep convolutional neural networks. As input for our networks, we use a multi-layer grid map efficiently encoding 3D range sensor information. The inference output consists of a list of rotated bounding boxes with associated semantic classes. We conduct extensive ablation studies, highlight important design considerations when using grid maps, and evaluate our models on the KITTI Bird's Eye View benchmark. Qualitative and quantitative benchmark results show that we achieve robust detection and state-of-the-art accuracy solely using top-view grid maps from range sensor data.

Submitted 5 December, 2018; v1 submitted 2 May, 2018; originally announced May 2018.

Comments: 6 pages, 4 tables, 4 figures

Journal ref: 2018 IEEE Intelligent Transportation Systems Conference (ITSC)
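The abstract states that the detector outputs rotated bounding boxes on the top-view grid. A common bird's-eye-view parameterization is center, length, width and heading angle; assuming that parameterization (the paper's exact box encoding is not given in the abstract), the corners of such a box can be recovered as sketched below. The function names are hypothetical.

```python
import math
from typing import List, Tuple


def box_corners(
    cx: float, cy: float, length: float, width: float, yaw: float
) -> List[Tuple[float, float]]:
    """Corners of a rotated rectangle given its center, size and heading (radians).

    Order: front-left, front-right, rear-right, rear-left in the box frame,
    where +x points along the heading and +y to the left.
    """
    c, s = math.cos(yaw), math.sin(yaw)
    hl, hw = 0.5 * length, 0.5 * width
    local = [(hl, hw), (hl, -hw), (-hl, -hw), (-hl, hw)]
    # Rotate each local corner by yaw, then translate it to the box center.
    return [(cx + dx * c - dy * s, cy + dx * s + dy * c) for dx, dy in local]


def polygon_area(pts: List[Tuple[float, float]]) -> float:
    """Shoelace formula; used here as a sanity check that rotation preserves area."""
    n = len(pts)
    twice = sum(
        pts[i][0] * pts[(i + 1) % n][1] - pts[(i + 1) % n][0] * pts[i][1] for i in range(n)
    )
    return abs(twice) / 2.0


print(box_corners(0.0, 0.0, 4.0, 2.0, 0.0))
# [(2.0, 1.0), (2.0, -1.0), (-2.0, -1.0), (-2.0, 1.0)]
rotated = box_corners(10.0, 5.0, 4.0, 2.0, math.radians(30.0))
print(polygon_area(rotated))  # approximately 8.0 for any heading
```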

arXiv:1805.05374 [pdf, other]

doi 10.1109/ITSC.2018.8569658

Generating Comfortable, Safe and Comprehensible Trajectories for Automated Vehicles in Mixed Traffic

Authors: Maximilian Naumann, Martin Lauer, Christoph Stiller

Abstract: While motion planning approaches for automated driving often focus on safety and mathematical optimality with respect to technical parameters, they barely consider comfort, perceived safety for the passenger, and comprehensibility for other traffic participants. For automated driving in mixed traffic, however, these aspects are key to reaching public acceptance. In this paper, we revise the problem statement of motion planning in mixed traffic: instead of largely simplifying the motion planning problem to a convex optimization problem, we keep a more complex probabilistic multi-agent model and strive for a near-optimal solution. We assume cooperation of other traffic participants while remaining aware of violations of this assumption. This approach yields solutions that are provably safe in all situations, and comfortable and comprehensible in situations that are also unambiguous for humans. Thus, it outperforms existing approaches in mixed traffic scenarios, as we show in simulation.

Submitted 10 May, 2019; v1 submitted 14 May, 2018; originally announced May 2018.

Journal ref: Proc. IEEE Intl. Conf. Intelligent Transportation Systems, pp. 575-582, Hawaii, USA, Nov 2018

arXiv:1801.05297 [pdf, other]

doi 10.1109/IVS.2018.8500635

Evidential Occupancy Grid Map Augmentation using Deep Learning

Authors: Sascha Wirges, Felix Hartenbach, Christoph Stiller

Abstract: A detailed environment representation is a crucial component of automated vehicles. Using single range sensor scans, data is often too sparse and subject to occlusions. Therefore, we present a method that uses deep learning to augment occupancy grid maps from single views so that they resemble evidential occupancy maps acquired from different views. To accomplish this, we estimate motion between subsequent range sensor measurements and create an evidential 3D voxel map in an extensive post-processing step. Within this voxel map, we explicitly model uncertainty using evidence theory and create a 2D projection using combination rules. As input for our neural networks, we use a multi-layer grid map consisting of three features: detections, transmissions and intensity, each for ground and non-ground measurements. Finally, we perform a quantitative and qualitative evaluation, which shows that different network architectures accurately infer evidential measures in real time.

Submitted 5 December, 2018; v1 submitted 16 January, 2018; originally announced January 2018.

Comments: 6 pages, 5 figures

Journal ref: 2018 IEEE Intelligent Vehicles Symposium (IV)
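The abstract states that uncertainty is modeled with evidence theory and that a 2D projection is created using combination rules, without naming the rule. Dempster's rule of combination is one common choice. The sketch below assumes it, uses the binary frame {occupied, free}, and treats every voxel in a grid column as an independent source; the names Mass, dempster_combine and project_column are hypothetical and do not describe the authors' pipeline.

```python
from functools import reduce
from typing import Iterable, NamedTuple


class Mass(NamedTuple):
    """Basic belief assignment on the frame {occupied, free}."""

    occupied: float
    free: float
    unknown: float  # mass on the whole frame, i.e. ignorance


VACUOUS = Mass(0.0, 0.0, 1.0)  # no evidence; the neutral element of Dempster's rule


def dempster_combine(a: Mass, b: Mass) -> Mass:
    """Dempster's rule for two independent sources on the binary frame."""
    conflict = a.occupied * b.free + a.free * b.occupied
    if conflict >= 1.0:
        raise ValueError("totally conflicting evidence cannot be combined")
    norm = 1.0 - conflict  # renormalize after discarding conflicting mass
    occ = a.occupied * b.occupied + a.occupied * b.unknown + a.unknown * b.occupied
    free = a.free * b.free + a.free * b.unknown + a.unknown * b.free
    unk = a.unknown * b.unknown
    return Mass(occ / norm, free / norm, unk / norm)


def project_column(column: Iterable[Mass]) -> Mass:
    """Hypothetical 3D-to-2D step: fuse all voxel masses stacked above one grid cell."""
    return reduce(dempster_combine, column, VACUOUS)


cell = project_column([Mass(0.6, 0.1, 0.3), Mass(0.5, 0.2, 0.3)])
print(tuple(round(m, 3) for m in cell))  # (0.759, 0.133, 0.108)
```

Dempster's rule reinforces agreeing evidence and reduces the unknown mass, which is the behavior one would want when stacking many weak voxel observations; other rules (for example Yager's) handle high conflict more cautiously.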

arXiv:1709.05273 [pdf, other]

Cooperative Motion Planning for Non-Holonomic Agents with Value Iteration Networks

Authors: Eike Rehder, Maximilian Naumann, Niels Ole Salscheider, Christoph Stiller

Abstract: Cooperative motion planning is still a challenging task for robots. Recently, Value Iteration Networks (VINs) were proposed to model motion planning tasks as neural networks. In this work, we extend VINs to solve cooperative planning tasks under non-holonomic constraints. For this, we interconnect multiple VINs so that each takes the others' outputs into account. Policies for cooperation are generated via iterative gradient descent. Validation in simulation shows that the resulting networks can resolve non-holonomic motion planning problems that require cooperation.

Submitted 15 September, 2017; originally announced September 2017.
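The abstract builds on Value Iteration Networks, whose core module unrolls value iteration as a differentiable computation. For readers unfamiliar with that operation, a plain, non-learned value iteration on an occupancy grid is sketched below. The uniform step cost, 4-connected moves and all names are assumptions; neither the learned reward and transition kernels of a VIN nor the paper's interconnection of multiple VINs under non-holonomic constraints is reproduced.

```python
import math
from typing import List, Tuple

Grid = List[List[float]]


def value_iteration(
    free: List[List[bool]],
    goal: Tuple[int, int],
    iterations: int = 50,
    step_cost: float = 1.0,
    gamma: float = 1.0,
) -> Grid:
    """Synchronous value iteration on a 4-connected occupancy grid.

    Each sweep applies one Bellman backup V(s) = max_a [-step_cost + gamma * V(s')],
    the operation a VIN implements as a convolution followed by a max over actions.
    Obstacles and unreachable cells keep the value -inf.
    """
    rows, cols = len(free), len(free[0])
    values = [[-math.inf] * cols for _ in range(rows)]
    values[goal[0]][goal[1]] = 0.0
    for _ in range(iterations):
        new = [row[:] for row in values]
        for r in range(rows):
            for c in range(cols):
                if not free[r][c] or (r, c) == goal:
                    continue  # obstacles stay -inf; the goal is absorbing
                best = -math.inf
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < rows and 0 <= nc < cols and free[nr][nc]:
                        best = max(best, -step_cost + gamma * values[nr][nc])
                new[r][c] = best
        values = new
    return values


# 3x3 grid with a blocked center cell; the goal is the top-left corner.
free = [[True, True, True], [True, False, True], [True, True, True]]
V = value_iteration(free, goal=(0, 0))
print(V[2][2], V[0][2])  # -4.0 -2.0
```

With gamma = 1 and a unit step cost, the converged values are the negated shortest-path lengths around the obstacle, so a greedy policy that always moves to the neighbor with the highest value reaches the goal.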

arXiv:1708.06962 [pdf, other]

Towards Cooperative Motion Planning for Automated Vehicles in Mixed Traffic

Authors: Maximilian Naumann, Christoph Stiller

Abstract: While reactive and anticipatory motion planning techniques for automated vehicles have already been widely presented, approaches to cooperative motion planning are still lacking. In this paper, we present an approach to enhance common motion planning algorithms that allows for cooperation with human-driven vehicles. Unlike previous approaches, we integrate the prediction of other traffic participants into the motion planning such that the influence of the ego vehicle's behavior on the other traffic participants can be taken into account. For this purpose, a new cost functional is presented, containing the cost for all relevant traffic participants in the scene. Finally, we propose a sampling-based implementation of our approach with path-velocity decomposition for selected scenarios, which is evaluated in simulation.

Submitted 23 August, 2017; originally announced August 2017.

Comments: Accepted for 9th Workshop on Planning, Perception and Navigation for Intelligent Vehicles at IROS 2017

arXiv:1707.03167 [pdf, other]

RegNet: Multimodal Sensor Registration Using Deep Neural Networks

Authors: Nick Schneider, Florian Piewak, Christoph Stiller, Uwe Franke

Abstract: In this paper, we present RegNet, the first deep convolutional neural network (CNN) to infer a 6 degrees of freedom (DOF) extrinsic calibration between multimodal sensors, exemplified using a scanning LiDAR and a monocular camera. Compared to existing approaches, RegNet casts all three conventional calibration steps (feature extraction, feature matching and global regression) into a single real-time capable CNN. Our method does not require any human interaction and bridges the gap between classical offline and target-less online calibration approaches, as it provides both a stable initial estimation and a continuous online correction of the extrinsic parameters. During training, we randomly decalibrate our system in order to train RegNet to infer the correspondence between projected depth measurements and the RGB image and finally regress the extrinsic calibration. Additionally, with an iterative execution of multiple CNNs that are trained on different magnitudes of decalibration, our approach compares favorably to state-of-the-art methods, with a mean calibration error of 0.28 degrees for the rotational and 6 cm for the translational components, even for large decalibrations of up to 1.5 m and 20 degrees.

Submitted 11 July, 2017; originally announced July 2017.

Comments: published in IEEE Intelligent Vehicles Symposium, 2017
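RegNet learns to regress the extrinsic calibration from projected depth measurements and the RGB image. The geometric operation underneath, projecting LiDAR points into the camera with an extrinsic (R, t) and a pinhole intrinsic matrix K, and perturbing the extrinsic to simulate a decalibration, can be sketched in plain Python as below. The distortion-free pinhole model, the rotation about the camera y-axis used as the perturbation, and all function names are illustrative assumptions; the network itself is not shown.

```python
import math
from typing import List, Sequence, Tuple

Mat3 = Sequence[Sequence[float]]


def mat_vec(m: Mat3, v: Sequence[float]) -> List[float]:
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]


def mat_mul(a: Mat3, b: Mat3) -> List[List[float]]:
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def rot_y(angle: float) -> List[List[float]]:
    """Rotation about the camera y-axis (used here as a simple decalibration)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def project_points(
    points: Sequence[Sequence[float]], R: Mat3, t: Sequence[float], K: Mat3
) -> List[Tuple[float, float, float]]:
    """Map LiDAR points to (u, v, depth) pixels using extrinsics (R, t) and intrinsics K.

    Points at or behind the image plane (depth <= 0) are discarded.
    """
    pixels = []
    for p in points:
        x, y, z = [a + b for a, b in zip(mat_vec(R, p), t)]
        if z <= 0.0:
            continue
        u = K[0][0] * x / z + K[0][2]
        v = K[1][1] * y / z + K[1][2]
        pixels.append((u, v, z))
    return pixels


K = [[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]]
# For simplicity, points are already expressed in camera axes (z forward) and t = 0.
R, t = rot_y(0.0), [0.0, 0.0, 0.0]
points = [(1.0, 0.5, 10.0), (0.0, 0.0, -5.0)]
print(project_points(points, R, t, K))  # [(370.0, 265.0, 10.0)]

# A 1-degree rotational decalibration, as one might apply when generating training pairs.
R_bad = mat_mul(rot_y(math.radians(1.0)), R)
print(project_points(points, R_bad, t, K))
```

Even a 1-degree error shifts the projected point by several pixels at 10 m range, which illustrates why small rotational decalibrations are visible in the projected depth image.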

arXiv:1706.05904 [pdf, other]

Pedestrian Prediction by Planning using Deep Neural Networks

Authors: Eike Rehder, Florian Wirth, Martin Lauer, Christoph Stiller

Abstract: Accurate traffic participant prediction is a prerequisite for collision avoidance of autonomous vehicles. In this work, we predict pedestrians by emulating their own motion planning. From online observations, we infer a mixture density function for possible destinations. We use this result as the goal states of a planning stage that performs motion prediction based on common behavior patterns. The entire system is modeled as one monolithic neural network and trained via inverse reinforcement learning. Experimental validation on real-world data shows the system's ability to accurately predict both destinations and trajectories.

Submitted 20 June, 2017; v1 submitted 19 June, 2017; originally announced June 2017.

arXiv:1608.00753 [pdf, other]

Semantically Guided Depth Upsampling

Authors: Nick Schneider, Lukas Schneider, Peter Pinggera, Uwe Franke, Marc Pollefeys, Christoph Stiller

Abstract: We present a novel method for accurate and efficient upsampling of sparse depth data, guided by high-resolution imagery. Our approach goes beyond the use of intensity cues alone and additionally exploits object boundary cues through structured edge detection and semantic scene labeling for guidance. Both cues are combined within a geodesic distance measure that allows for boundary-preserving depth interpolation while utilizing local context. We model the observed scene structure by locally planar elements and formulate the upsampling task as a global energy minimization problem. Our method determines globally consistent solutions and preserves fine details and sharp depth boundaries. In our experiments on several public datasets at different levels of application, we demonstrate superior performance of our approach over the state of the art, even for very sparse measurements.

Submitted 2 August, 2016; originally announced August 2016.

Comments: German Conference on Pattern Recognition 2016 (Oral)
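The abstract describes combining boundary cues in a geodesic distance measure for boundary-preserving depth interpolation. A minimal sketch of the geodesic part follows. It assumes a single precomputed per-pixel boundary-strength map, a step cost of 1 + lam * boundary for entering a pixel, and exponential distance weighting of sparse depth seeds; these choices and all names are hypothetical, and the paper's locally planar scene model and global energy minimization are not reproduced.

```python
import heapq
import math
from typing import Dict, List, Tuple


def geodesic_distances(
    boundary: List[List[float]], seed: Tuple[int, int], lam: float = 10.0
) -> List[List[float]]:
    """Dijkstra on the 4-connected pixel grid; crossing strong boundaries is expensive."""
    rows, cols = len(boundary), len(boundary[0])
    dist = [[math.inf] * cols for _ in range(rows)]
    dist[seed[0]][seed[1]] = 0.0
    heap = [(0.0, seed[0], seed[1])]
    while heap:
        d, r, c = heapq.heappop(heap)
        if d > dist[r][c]:
            continue  # stale queue entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                nd = d + 1.0 + lam * boundary[nr][nc]  # unit step plus boundary penalty
                if nd < dist[nr][nc]:
                    dist[nr][nc] = nd
                    heapq.heappush(heap, (nd, nr, nc))
    return dist


def interpolate_depth(
    boundary: List[List[float]],
    seeds: Dict[Tuple[int, int], float],
    target: Tuple[int, int],
    lam: float = 10.0,
    sigma: float = 2.0,
) -> float:
    """Weight each sparse depth seed by exp(-geodesic distance / sigma)."""
    num = den = 0.0
    for pos, depth in seeds.items():
        d = geodesic_distances(boundary, pos, lam)[target[0]][target[1]]
        w = math.exp(-d / sigma)
        num += w * depth
        den += w
    return num / den


boundary = [[0.0, 0.0, 1.0, 0.0, 0.0]]  # one image row with a strong edge at column 2
seeds = {(0, 0): 5.0, (0, 4): 20.0}  # sparse depth measurements in metres
print(round(interpolate_depth(boundary, seeds, (0, 1)), 2))  # 5.04
print(round(interpolate_depth(boundary, seeds, (0, 1), lam=0.0), 2))  # 9.03
```

With the boundary penalty, the pixel next to the near seed keeps a depth close to 5 m instead of being blended across the edge towards the far seed, which is the boundary-preserving effect the abstract refers to.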

Showing 1–50 of 50 results for author: Stiller, C