-
Certified Human Trajectory Prediction
Authors:
Mohammadhossein Bahari,
Saeed Saadatnejad,
Amirhossein Asgari Farsangi,
Seyed-Mohsen Moosavi-Dezfooli,
Alexandre Alahi
Abstract:
Trajectory prediction plays an essential role in autonomous vehicles. While numerous strategies have been developed to enhance the robustness of trajectory prediction models, these methods are predominantly heuristic and do not offer guaranteed robustness against adversarial attacks and noisy observations. In this work, we propose a certification approach tailored for the task of trajectory prediction. To this end, we address the inherent challenges associated with trajectory prediction, including unbounded outputs and multi-modality, resulting in a model that provides guaranteed robustness. Furthermore, we integrate a denoiser into our method to further improve its performance. Through comprehensive evaluations, we demonstrate the effectiveness of the proposed technique across various baselines on standard trajectory prediction datasets. The code will be made available online: https://s-attack.github.io/
Submitted 20 March, 2024;
originally announced March 2024.
-
Social-Transmotion: Promptable Human Trajectory Prediction
Authors:
Saeed Saadatnejad,
Yang Gao,
Kaouther Messaoud,
Alexandre Alahi
Abstract:
Accurate human trajectory prediction is crucial for applications such as autonomous vehicles, robotics, and surveillance systems. Yet, existing models often fail to fully leverage the non-verbal social cues humans subconsciously communicate when navigating the space. To address this, we introduce Social-Transmotion, a generic Transformer-based model that exploits diverse and numerous visual cues to predict human behavior. We translate the idea of a prompt from Natural Language Processing (NLP) to the task of human trajectory prediction, where a prompt can be a sequence of x-y coordinates on the ground, bounding boxes in the image plane, or body pose keypoints in either 2D or 3D. This, in turn, augments trajectory data, leading to enhanced human trajectory prediction. Using a masking technique, our model exhibits flexibility and adaptability by capturing spatiotemporal interactions between agents based on the available visual cues. We examine the merits of using 2D versus 3D poses, as well as a limited set of poses. Additionally, we investigate the spatial and temporal attention maps to identify which keypoints and time-steps in the sequence are vital for optimizing human trajectory prediction. Our approach is validated on multiple datasets, including JTA, JRDB, Pedestrians and Cyclists in Road Traffic, and ETH-UCY. The code is publicly available: https://github.com/vita-epfl/social-transmotion.
Submitted 16 April, 2024; v1 submitted 26 December, 2023;
originally announced December 2023.
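The Social-Transmotion abstract states that a masking technique lets the model work with whichever visual cues are available. A minimal sketch of that idea, assuming cues are stored in a dictionary and masking means replacing a whole cue with `None`, is shown below. The function name `mask_cues`, the `"traj"` key, and the drop probability are hypothetical illustrations, not the authors' implementation.

```python
import random
from typing import Dict, List, Optional


def mask_cues(
    sample: Dict[str, List[float]], p_drop: float, rng: random.Random
) -> Dict[str, Optional[List[float]]]:
    """Randomly hide visual cues so a model learns to use whatever is available.

    Hypothetical convention: the trajectory (key "traj") is always kept as the
    minimal input; every other cue (e.g. bounding boxes, 2D/3D poses) is
    replaced by None, i.e. masked, with probability p_drop.
    """
    masked: Dict[str, Optional[List[float]]] = {}
    for name, value in sample.items():
        if name != "traj" and rng.random() < p_drop:
            masked[name] = None  # cue hidden for this training sample
        else:
            masked[name] = value
    return masked
```

Applied independently to each training sample (for instance with `rng = random.Random(0)`), this exposes the model to many combinations of present and missing cues, which is one plausible way to obtain the flexibility the abstract describes.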
-
JRDB-Traj: A Dataset and Benchmark for Trajectory Forecasting in Crowds
Authors:
Saeed Saadatnejad,
Yang Gao,
Hamid Rezatofighi,
Alexandre Alahi
Abstract:
Predicting future trajectories is critical in autonomous navigation, especially in preventing accidents involving humans, where a predictive agent's ability to anticipate in advance is of utmost importance. Trajectory forecasting models, employed in fields such as robotics, autonomous vehicles, and navigation, face challenges in real-world scenarios, often due to the isolation of model components. To address this, we introduce a novel dataset for end-to-end trajectory forecasting, facilitating the evaluation of models in scenarios involving less-than-ideal preceding modules such as tracking. This dataset, an extension of the JRDB dataset, provides comprehensive data, including the locations of all agents, scene images, and point clouds, all from the robot's perspective. The objective is to predict the future positions of agents relative to the robot using raw sensory input data. It bridges the gap between isolated models and practical applications, promoting a deeper understanding of navigation dynamics. Additionally, we introduce a novel metric for assessing trajectory forecasting models in real-world scenarios where ground-truth identities are inaccessible, addressing issues related to undetected or over-detected agents. Researchers are encouraged to use our benchmark for model evaluation and benchmarking.
Submitted 5 November, 2023;
originally announced November 2023.
-
Toward Reliable Human Pose Forecasting with Uncertainty
Authors:
Saeed Saadatnejad,
Mehrshad Mirmohammadi,
Matin Daghyani,
Parham Saremi,
Yashar Zoroofchi Benisi,
Amirhossein Alimohammadi,
Zahra Tehraninasab,
Taylor Mordan,
Alexandre Alahi
Abstract:
Recently, there has been an arms race of pose forecasting methods aimed at solving the spatio-temporal task of predicting a sequence of future 3D poses of a person given a sequence of past observed ones. However, the lack of unified benchmarks and limited uncertainty analysis have hindered progress in the field. To address this, we first develop an open-source library for human pose forecasting that includes multiple models, supports several datasets, and employs standardized evaluation metrics, with the aim of promoting research and moving toward a unified and consistent evaluation. Second, we model two types of uncertainty in the problem to increase performance and better convey trust: 1) we propose a method for modeling aleatoric uncertainty by using uncertainty priors to inject knowledge about the pattern of uncertainty. This focuses the capacity of the model on more meaningful supervision while reducing the number of learned parameters and improving stability; 2) we introduce a novel approach for quantifying the epistemic uncertainty of any model by clustering and measuring the entropy of its assignments. Our experiments demonstrate up to 25% improvement in forecasting at short horizons, with no loss at longer horizons, on the Human3.6M, AMASS, and 3DPW datasets, as well as better performance in uncertainty estimation. The code is available online at https://github.com/vita-epfl/UnPOSed.
Submitted 12 April, 2024; v1 submitted 13 April, 2023;
originally announced April 2023.
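The pose forecasting abstract quantifies epistemic uncertainty "through clustering and measuring the entropy of its assignments." One possible reading, sketched below, assumes cluster centers have already been computed (for example by k-means on training futures), softly assigns a flattened prediction to those centers via a softmax over negative squared distances, and reports the Shannon entropy of that assignment. The function names and the `temperature` parameter are hypothetical; this is an illustration of the general idea, not the paper's exact procedure.

```python
import math
from typing import List, Sequence


def soft_assignments(
    pred: Sequence[float], centers: List[Sequence[float]], temperature: float = 1.0
) -> List[float]:
    """Softmax over negative squared Euclidean distances to each cluster center."""
    dists = [sum((p - c) ** 2 for p, c in zip(pred, center)) for center in centers]
    logits = [-d / temperature for d in dists]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def assignment_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats); higher means the prediction fits no single cluster."""
    return -sum(p * math.log(p) for p in probs if p > 0)
```

A prediction that lies exactly on one center yields entropy close to zero, while a prediction equidistant from two centers yields `ln 2`, so larger values can be read as higher model uncertainty under this assumption.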
-
A generic diffusion-based approach for 3D human pose prediction in the wild
Authors:
Saeed Saadatnejad,
Ali Rasekh,
Mohammadreza Mofayezi,
Yasamin Medghalchi,
Sara Rajabzadeh,
Taylor Mordan,
Alexandre Alahi
Abstract:
Predicting 3D human poses in real-world scenarios, also known as human pose forecasting, is inevitably subject to noisy inputs arising from inaccurate 3D pose estimations and occlusions. To address these challenges, we propose a diffusion-based approach that can make predictions from noisy observations. We frame the prediction task as a denoising problem, where observation and prediction are considered a single sequence containing missing elements (whether in the observation or the prediction horizon). All missing elements are treated as noise and denoised with our conditional diffusion model. To better handle long-term forecasting horizons, we present a temporal cascaded diffusion model. We demonstrate the benefits of our approach on four publicly available datasets (Human3.6M, HumanEva-I, AMASS, and 3DPW), outperforming the state-of-the-art. Additionally, we show that our framework is generic enough to improve any 3D pose prediction model, both as a pre-processing step to repair its inputs and as a post-processing step to refine its outputs. The code is available online: https://github.com/vita-epfl/DePOSit.
Submitted 15 March, 2023; v1 submitted 11 October, 2022;
originally announced October 2022.
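The diffusion-based abstract treats observation and prediction as one sequence in which all missing elements are noise. A minimal sketch of that framing, assuming each frame is a flat list of coordinates and `None` marks a missing frame, follows. It fills missing frames with standard Gaussian noise, keeps a mask of observed frames as the condition, and includes the standard DDPM forward step `x_t = sqrt(a_bar) * x0 + sqrt(1 - a_bar) * eps`. The names `build_masked_sequence` and `q_sample` are hypothetical, and the denoising network itself is omitted.

```python
import math
import random
from typing import List, Optional, Tuple

Frame = List[float]


def build_masked_sequence(
    frames: List[Optional[Frame]], dim: int, rng: random.Random
) -> Tuple[List[Frame], List[bool]]:
    """Merge observation and prediction horizon into one sequence.

    Missing frames (None), whether occluded observations or future frames to be
    predicted, are initialised with standard Gaussian noise. mask[i] is True
    for frames that were actually observed and serve as the condition.
    """
    seq: List[Frame] = []
    mask: List[bool] = []
    for frame in frames:
        if frame is None:
            seq.append([rng.gauss(0.0, 1.0) for _ in range(dim)])
            mask.append(False)
        else:
            seq.append(list(frame))
            mask.append(True)
    return seq, mask


def q_sample(x0: Frame, alpha_bar: float, rng: random.Random) -> Frame:
    """Standard DDPM forward noising of one frame at cumulative level alpha_bar."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * v + b * rng.gauss(0.0, 1.0) for v in x0]
```

Because the same mask covers both occluded past frames and future frames, one denoiser can in principle serve for repairing inputs and for forecasting, which matches the pre- and post-processing use the abstract mentions.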
-
Pedestrian 3D Bounding Box Prediction
Authors:
Saeed Saadatnejad,
Yi Zhou Ju,
Alexandre Alahi
Abstract:
Safety remains the main issue in autonomous driving: to be deployed globally, autonomous vehicles need to predict pedestrians' motions sufficiently far in advance. While there is a lot of research on coarse-grained (human center) and fine-grained (human body keypoint) prediction, we focus on 3D bounding boxes, which are reasonable estimates of humans for autonomous vehicles without modeling complex motion details. This gives the flexibility to predict over longer horizons in real-world settings. We propose this new problem and present a simple yet effective model for pedestrians' 3D bounding box prediction. The method follows an encoder-decoder architecture based on recurrent neural networks, and our experiments show its effectiveness on both the synthetic (JTA) and real-world (NuScenes) datasets. The learned representation contains useful information that can enhance the performance of other tasks, such as action anticipation. Our code is available online: https://github.com/vita-epfl/bounding-box-prediction
Submitted 28 June, 2022;
originally announced June 2022.
-
A Shared Representation for Photorealistic Driving Simulators
Authors:
Saeed Saadatnejad,
Siyuan Li,
Taylor Mordan,
Alexandre Alahi
Abstract:
A powerful simulator greatly reduces the need for real-world tests when training and evaluating autonomous vehicles. Data-driven simulators have flourished with the recent advancement of conditional Generative Adversarial Networks (cGANs), which provide high-fidelity images. The main challenge is synthesizing photorealistic images while following given constraints. In this work, we propose to improve the quality of generated images by rethinking the discriminator architecture. The focus is on the class of problems where images are generated from semantic inputs, such as scene segmentation maps or human body poses. We build on successful cGAN models to propose a new semantically-aware discriminator that better guides the generator. We aim to learn a shared latent representation that encodes enough information to jointly perform semantic segmentation and content reconstruction, along with coarse-to-fine-grained adversarial reasoning. The achieved improvements are generic and simple enough to be applied to any architecture of conditional image synthesis. We demonstrate the strength of our method on the scene, building, and human synthesis tasks across three different datasets. The code is available at https://github.com/vita-epfl/SemDisc.
Submitted 9 December, 2021;
originally announced December 2021.
-
Vehicle trajectory prediction works, but not everywhere
Authors:
Mohammadhossein Bahari,
Saeed Saadatnejad,
Ahmad Rahimi,
Mohammad Shaverdikondori,
Amir-Hossein Shahidzadeh,
Seyed-Mohsen Moosavi-Dezfooli,
Alexandre Alahi
Abstract:
Vehicle trajectory prediction is nowadays a fundamental pillar of self-driving cars. Both the industry and research communities have acknowledged the need for such a pillar by providing public benchmarks. While state-of-the-art methods are impressive, i.e., they produce no off-road predictions, their generalization to cities outside of the benchmark remains unexplored. In this work, we show that those methods do not generalize to new scenes. We present a method that automatically generates realistic scenes causing state-of-the-art models to go off-road. We frame the problem through the lens of adversarial scene generation. The method is a simple yet effective generative model based on atomic scene generation functions along with physical constraints. Our experiments show that more than 60% of existing scenes from the current benchmarks can be modified to make prediction methods fail (i.e., predict off-road). We further show that the generated scenes (i) are realistic since they do exist in the real world, and (ii) can be used to make existing models more robust, yielding 30-40 reductions in the off-road rate. The code is available online: https://s-attack.github.io/.
Submitted 29 March, 2022; v1 submitted 7 December, 2021;
originally announced December 2021.
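The vehicle trajectory abstract measures failure as predicting off-road and reports results as an off-road rate. A minimal sketch of such a metric, assuming the drivable area is available as a point predicate and that a trajectory counts as off-road if any of its points leaves that area, is given below. The function `off_road_rate` and the "any point" criterion are assumptions for illustration; the paper's exact definition may differ.

```python
from typing import Callable, Sequence, Tuple

Point = Tuple[float, float]


def off_road_rate(
    predictions: Sequence[Sequence[Point]], is_drivable: Callable[[Point], bool]
) -> float:
    """Fraction of predicted trajectories with at least one point off the drivable area."""
    if not predictions:
        return 0.0
    failures = sum(
        1 for traj in predictions if any(not is_drivable(p) for p in traj)
    )
    return failures / len(predictions)
```

For example, with a straight road modeled as `lambda p: abs(p[1]) <= 2.0`, one on-road and one off-road prediction give a rate of 0.5. In practice the predicate would come from the scene's map, which is the part an adversarial scene generator would modify.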
-
SVG-Net: An SVG-based Trajectory Prediction Model
Authors:
Mohammadhossein Bahari,
Vahid Zehtab,
Sadegh Khorasani,
Sana Ayromlou,
Saeed Saadatnejad,
Alexandre Alahi
Abstract:
Anticipating motions of vehicles in a scene is an essential problem for safe autonomous driving systems. To this end, the comprehension of the scene's infrastructure is often the main clue for predicting future trajectories. Most of the proposed approaches represent the scene in a rasterized format, and some of the more recent approaches leverage custom vectorized formats. In contrast, we propose representing the scene's information by employing Scalable Vector Graphics (SVG). SVG is a well-established format that matches the problem of trajectory prediction better than rasterized formats while being more general than arbitrary vectorized formats. SVG has the potential to provide the convenience and generality of raster-based solutions if coupled with a powerful tool such as CNNs, for which we introduce SVG-Net. SVG-Net is a Transformer-based Neural Network that can effectively capture the scene's information from SVG inputs. Thanks to the self-attention mechanism in its Transformers, SVG-Net can also adequately capture the relations between the scene and the agents. We demonstrate SVG-Net's effectiveness by evaluating its performance on the publicly available Argoverse forecasting dataset. Finally, we illustrate how, by using SVG, one can benefit from datasets and advancements in other research fronts that also utilize the same input format. Our code is available at https://vita-epfl.github.io/SVGNet/.
Submitted 11 October, 2021; v1 submitted 7 October, 2021;
originally announced October 2021.
-
Are socially-aware trajectory prediction models really socially-aware?
Authors:
Saeed Saadatnejad,
Mohammadhossein Bahari,
Pedram Khorsandi,
Mohammad Saneian,
Seyed-Mohsen Moosavi-Dezfooli,
Alexandre Alahi
Abstract:
Our field has recently witnessed an arms race of neural network-based trajectory predictors. While these predictors are at the core of many applications such as autonomous navigation or pedestrian flow simulations, their adversarial robustness has not been carefully studied. In this paper, we introduce a socially-attended attack to assess the social understanding of prediction models in terms of collision avoidance. An attack is a small yet carefully crafted perturbation designed to make predictors fail. Technically, we define collision as a failure mode of the output, and propose hard- and soft-attention mechanisms to guide our attack. Using our attack, we shed light on the limitations of current models in terms of their social understanding. We demonstrate the strengths of our method on recent trajectory prediction models. Finally, we show that our attack can be employed to increase the social understanding of state-of-the-art models. The code is available online: https://s-attack.github.io/
Submitted 11 February, 2022; v1 submitted 24 August, 2021;
originally announced August 2021.
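The socially-attended attack abstract defines collision as a failure mode of the predicted output. A minimal sketch of a collision check between two time-aligned predicted trajectories, assuming positions in meters and a hypothetical distance threshold of 0.2 m, is shown below. The function name and threshold value are illustrative assumptions; the attention-guided perturbation search itself is not shown.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def collides(
    a: Sequence[Point], b: Sequence[Point], threshold: float = 0.2
) -> bool:
    """True if two time-aligned trajectories come closer than `threshold` at any step.

    The 0.2 m default is a hypothetical choice for illustration only.
    """
    return any(math.dist(p, q) < threshold for p, q in zip(a, b))
```

An attack in this framing would search for a small perturbation of the observed history that makes `collides` return True for the prediction of the attacked agent and one of its neighbors.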
-
Pedestrian Intention Prediction: A Multi-task Perspective
Authors:
Smail Ait Bouhsain,
Saeed Saadatnejad,
Alexandre Alahi
Abstract:
In order to be globally deployed, autonomous cars must guarantee the safety of pedestrians. This is why forecasting pedestrians' intentions sufficiently in advance is one of the most critical and challenging tasks for autonomous vehicles. This work tackles this problem by jointly predicting the intention and visual states of pedestrians. In terms of visual states, whereas previous work focused on x-y coordinates, we also predict the size, and indeed the whole bounding box, of the pedestrian. The method is a recurrent neural network that follows a multi-task learning approach. It has one head that predicts the intention of the pedestrian for each of its future positions and another that predicts the visual states of the pedestrian. Experiments on the JAAD dataset show that our method outperforms previous works on intention prediction. Also, despite its simple architecture (more than 2 times faster), its bounding box prediction performance is comparable to that of much more complex architectures. Our code is available online.
Submitted 20 May, 2021; v1 submitted 20 October, 2020;
originally announced October 2020.
-
LSTM-Based ECG Classification for Continuous Monitoring on Personal Wearable Devices
Authors:
Saeed Saadatnejad,
Mohammadhosein Oveisi,
Matin Hashemi
Abstract:
Objective: A novel ECG classification algorithm is proposed for continuous cardiac monitoring on wearable devices with limited processing capacity. Methods: The proposed solution employs a novel architecture consisting of a wavelet transform and multiple LSTM recurrent neural networks. Results: Experimental evaluations show superior ECG classification performance compared to previous works. Measurements on different hardware platforms show that the proposed algorithm meets the timing requirements for continuous and real-time execution on wearable devices. Conclusion: In contrast to many compute-intensive deep-learning-based approaches, the proposed algorithm is lightweight and therefore brings continuous monitoring with accurate LSTM-based ECG classification to wearable devices. Significance: The proposed algorithm is both accurate and lightweight. The source code is available online [1].
Submitted 11 May, 2019; v1 submitted 12 December, 2018;
originally announced December 2018.
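The ECG abstract names a wavelet transform as the first stage before the LSTM networks but does not say which wavelet is used. As a hedged illustration of that stage only, the sketch below implements one level of the orthonormal Haar wavelet transform in pure Python; Haar is chosen here purely for simplicity and is an assumption, as is the idea that the resulting coefficient bands would be fed to separate LSTMs.

```python
import math
from typing import List, Sequence, Tuple


def haar_dwt(signal: Sequence[float]) -> Tuple[List[float], List[float]]:
    """One level of the orthonormal Haar discrete wavelet transform.

    Returns (approximation, detail) coefficients, each half the input length.
    Approximation captures the low-frequency trend; detail captures changes
    between neighboring samples.
    """
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = 1.0 / math.sqrt(2.0)  # normalization keeps the transform energy-preserving
    approx = [s * (signal[i] + signal[i + 1]) for i in range(0, len(signal), 2)]
    detail = [s * (signal[i] - signal[i + 1]) for i in range(0, len(signal), 2)]
    return approx, detail
```

Applying `haar_dwt` repeatedly to the approximation coefficients gives a multi-level decomposition; because each level halves the sequence length, a design like this can also reduce the number of time steps a downstream recurrent model must process, which is relevant on devices with limited compute.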