neuROSym: Deployment and Evaluation of a ROS-based Neuro-Symbolic Model for Human Motion Prediction

Sariah Mghames¹, Luca Castri¹, Marc Hanheide¹, Nicola Bellotto^1,2 ¹School of Computer Science, University of Lincoln, UK.
²Dept. of Information Engineering, University of Padua, Italy.
This project has received funding from the EU’s Horizon 2020 Research and Innovation programme under grant agreement No 101017274

Abstract

Autonomous mobile robots can rely on several human motion detection and prediction systems for safe and efficient navigation in human environments, but the underline model architectures can have different impacts on the trustworthiness of the robot in the real world. Among existing solutions for context-aware human motion prediction, some approaches have shown the benefit of integrating symbolic knowledge with state-of-the-art neural networks. In particular, a recent neuro-symbolic architecture (NeuroSyM) has successfully embedded context with a Qualitative Trajectory Calculus (QTC) for spatial interactions representation. This work achieved better performance than neural-only baseline architectures on offline datasets. In this paper, we extend the original architecture to provide neuROSym, a ROS package for robot deployment in real-world scenarios, which can run, visualise, and evaluate previous neural-only and neuro-symbolic models for motion prediction online. We evaluated these models, NeuroSyM and a baseline SGAN, on a TIAGo robot in two scenarios with different human motion patterns. We assessed accuracy and runtime performance of the prediction models, showing a general improvement in case our neuro-symbolic architecture is used. We make the neuROSym package¹¹1https://github.com/sariahmghames/neuROSym publicly available to the robotics community.

I Introduction

The integration of autonomous mobile robots in logistics, transportation, and healthcare, is rapidly increasing, as is user trust in the technologies used to build them. One key requirement for mobile robots’ trustworthiness is the ability to navigate safely among humans, in addition to other capabilities such as intelligent interaction and successful task completion. Safe navigation usually requires the detection and prediction of human motion. This is necessary not only for autonomous navigation, but also for action and intent recognition, anomaly detection, and other tasks. Existing methods for human motion detection and prediction can be divided into context-agnostic and (static or dynamic) context-aware. Taking context into account (e.g. knowing whether the robot is in a warehouse or a supermarket) can have a significant impact on the accuracy of the motion prediction system.

Refer to caption — Figure 1: Deployment of *neuROSym* for online and context-aware human motion prediction, with real-time visualisation. The three blocks are ROS nodes, while the filled arrows and the dashed ones represent the online and offline inference, respectively. Each arrows label indicates the type of messages published and/or subscribed to by each node.

Context reasoning can include human-human and/or human-objects spatial interactions. The latter can be described by quantitative or qualitative representations. In the quantitative approach, interactions are typically embeddings of absolute or relative agents pose in a neural network model. The authors in [1] have jointly modeled human-robot and human-human interactions in a deep reinforcement learning framework for mobile robot navigation. In [2], instead, the authors learn an optimal local trajectory from a global plan by fusing human trajectories, LiDAR features, global path, and odometry features in particular attention layer. Context-awareness methods have also been proposed to deal with the challenges faced by the long-term prediction of single human motion [3, 4, 5, 6, 7, 8].

Qualitative representations of spatial interactions, though intuitive and computationally efficient, are less explored for context-aware human motion prediction. Recently, a new approach has been proposed that exploits a qualitative representation of spatial interactions to improve human motion prediction [9]. This consists in a neuro-symbolic model that has been proved to be effective in predicting human motion trajectories. The symbolic part is indeed a qualitative representation of spatial interactions between pairs of agents using the so-called Qualitative Trajectory Calculus (QTC) [10, 11]. QTC-based models of moving agent pairs can be described by different combinations of QTC symbols, which depends on properties of the interaction such as relative distance changes (i.e moving towards/away), velocity (i.e moving faster/slower), and orientation (i.e. moving to the left/right side).

However, deploying and validating these methods for context-aware human motion prediction on real-world robot domains has been only partly explored [5, 9, 7, 12, 6]. While offline experimental evaluation is essential for model comparison and selection, the actual performance of any chosen architecture can only be validated during the deployment phase, since real-world problems such as domain-shift and inference time can significantly affect the human prediction system. Based on the above considerations, and extending our previous NeuroSyM model for context-aware human motion prediction [9], the main contribution of this paper is two-fold:

•

a new ROS package for online human motion prediction and visualisation, called neuROSym, which is publicly available and includes the three blocks (inference, post-processing, and visualisation) depicted in Fig. 1;
•

a performance evaluation of the package on a real-world robot scenario with two different experimental settings: (i) people moving in the same directions, and (ii) people crossing each other’s paths.

The remainder of the paper is as follows: Sec. II presents an overview of the related works; Sec. III explains the approach adopted for human-context reasoning and deployment; Sec. IV illustrates and discusses the experimental results; finally, Sec. V concludes by summarising the main outcomes and suggesting future research directions.

II Related Work

Context-aware human motion prediction. Among the existing works in the area of context-aware human motion prediction [13], some incorporate spatio-temporal dependencies between interactions [14, 15, 16], while others consider spatial relations only [3, 4, 5, 6, 7, 8, 12, 17]. These can be further grouped in solutions that take into account static context [8], dynamic context [3, 5, 6, 15, 16], or both [14, 4, 7, 12, 17].

Two of the most popular architectures for human motion prediction are Social-LSTM (S-LSTM) [3] and Social Generative Adversarial Network (SGAN) [5]. Both use a spatially-aware pooling mechanism for incorporating the hidden states of nearby agents as a way to overcome the problem of variable and (potentially) large number of people in the scene. SGAN, however, is generally better than S-LSTM in terms of accuracy and time complexity, since it avoids grid-based pooling. SGAN features also lower time complexity and number of parameters compared to the Spatial-Temporal Graph Attention (STGAT) network [16]. Recently, we propsed a new neuro-symbolic approach [9] for context-aware human motion prediction, called NeuroSyM, which showed higher prediction accuracy on public datasets compared to SGAN. Other promising approaches have also demonstrated to improve networks performance, like the endpoint conditioned trajectories prediction in [6], the combined future activities and location prediction in [7], and the dynamic and static context-aware motion predictor in [14].

In this paper, we focus on the dynamic aspect of context, since it is typically the most challenging part for a mobile robot. We study in particular the real-world performance in human motion prediction of NeuroSyM against an SGAN baseline, implemented in a common ROS-based software framework for robot deployment.

Human-human interaction modeling: The methods to represent the interactions of nearby agents can be divided into one-to-one modeling and crowd modeling [18, 19, 20]. One-to-one interactions can be described by quantitative or qualitative representations. While the former have recently been modeled by multi-layer perceptrons embedding relative positions or velocities of agent pairs [1, 5, 3], qualitative approaches use symbolic representations, for example QTC-based [11, 21, 22]. Similar models were used in [23] to implement human-aware robot navigation strategies, where the prediction of interactions was limited to a Bayesian temporal model of single human-robot pairs, without taking into account nearby static or dynamic objects.

In our study, we consider one-to-one (i.e. pairwise) interactions through our previous NeuroSyM prediction system [9], which exploits QTC relations to weight the quantitative embedding of spatial interactions. However, we consider all the pairwise interactions in the neighbourhood, not just a single human-robot one.

Human motion prediction deployment. Most of the research on human motion prediction is evaluated on public datasets. In [24], the authors compared four different types of online human motion prediction for safe and efficient human-robot collaboration. Their models use linear regression and neural networks, with or without parameters adaptation. The authors showed that adaptable prediction models parameterized by neural networks achieved the best performance. They did not consider context-aware and long-term predictions though.

In [25], human motion prediction with Social Force Models (SFMs) was exploited for real-world people tracking. The work in [26] proposes a GAN-based solution that teaches the robot to mimic and predict human motion, but with a focus on actions rather than walking trajectories, and without taking into account context. In [27] instead, the authors validated a probabilistic framework, based on SFM and intention estimation, for human motion prediction with moving obstacles. While these solutions worked effectively on real-world systems, they rely significantly on the optimisation of their model parameters for each separate pair of interactions, and on the clustering methods to represent the observed scene.

In [28], the authors proposed real-time human motion prediction for robotics applications using physics-informed neural networks that embed SFM dynamic equations. Their model was trained on synthetic data and validated offline on one person only, without any domain-shift tests. While this latter is perhaps the closest to our current work, we deploy and evaluate both neural-only and neuro-symbolic approaches for context-aware human motion prediction considering multiple moving agents and walking patterns.

III neuROSym Architecture

In our previous work [9], we proposed a neuro-symbolic architecture for context-aware human motion prediction and validated its accuracy performance against baseline models in an offline setting, where public datasets were locally stored and used to train and test offline the models under investigation. It is well known, however, that the prediction performance may degrade when pre-trained models are transferred to real-world settings, especially due to domain-shift changes. Therefore, in this section, we present the neuROSym package for deployment and validation on real robots, which extends our previous work and provides an online evaluation tool, in terms of accuracy and runtime performance, for neural-only and neuro-symbolic prediction models.

III-A Main neuROSym Components

The new ROS package neuROSym, illustrated in Fig. 1, consists of the following three nodes:

•

Inference model node: it implements two subscribers to the same observational data topic whose messages are generated by a human tracker library. In parallel, it implements two publishers for the data visualisation and analytics node. Each pair subscriber-publisher corresponds to either ground truth or predicted samples. The node implements also the inference model for the prediction method under investigation. In this paper, we benchmark two state-of-the-art models: SGAN [5] and NeuroSyM [9]. We re-trained both models on two public datasets: Zara01 from the UCY dataset [29] and, for the first time, on the THOR²²2http://thor.oru.se/ dataset [30], where human motion patterns differ from the previous one.
•

Data visualisation and analytics node: this node runs in parallel to the inference node in order to generate, online, plots of the ground truth and predicted trajectories. It also generates average performance metrics, simultaneously to the visual plots.
•

Data post-processing node: this node is required to perform corrections in case the human tracking system misses some detections. If that happens, some people would be assigned different IDs over time. This node uses an offline ROS-Rviz visualiser to help matching different IDs of the same person in the scene.

III-B Inference Model

The inference model node of neuROSym is based on our previous work [9]. Here the generator part of both the NeuroSyM and the SGAN architectures consists of an encoder-pooling-decoder set of layers. In the following, we present briefly how the pooling mechanism of NeurSyM-SGAN (Fig. 2) incorporates the symbolic QTC knowledge in one of its layers. For a detailed explanation of the NeuroSyM architecture, including its performance on desktop experiments, we remand to the original paper [9].

Qualitative formulation of spatial interactions. A spatial interaction is represented by a vector of $m$ QTC relations [10], which consist of qualitative symbols ${q_{i}\in\{-,0,+\}}$ , for $i=1,...,m$ . Among the different QTC versions, NeuroSyM adopts the double-cross QTC_C1, since it better represents the dynamics of the agents in our application scenario. More specifically, NeuroSyM was tested in [9] with the four symbols $\{q_{1},\,q_{2},\,q_{3},\,q_{4}\}$ , where $q_{1}$ and $q_{2}$ represent the relative motion between a pair of agents (moving towards or away from), while $q_{3}$ and $q_{4}$ represent the side relation (moving to the left or to the right of). An example of QTC_C1 relations is shown in Fig. 3.

Figure 3: An example of QTC_C1 representation of interactions between three body points

P_{h}

P_{r}

, and

P_{s}

Neuro-symbolic architecture. To label interactions, NeuroSyM exploits QTC_C1 and the related concept of Conceptual Neighbourhood Diagram (CND) described in [10, 31]. The nodes of a CND are different QTC states, while edges represent the “closeness” of two QTC states at time $t$ and $t+1$ . In [9], we formulated the interaction label $\alpha_{CND}$ for each QTC state as the likelihood of a transition in the CND as follows:

\alpha_{CND}=P(\mathrm{QTC}_{t+1}|\mathrm{QTC}_{t})=\frac{1}{N_{Tr}}

(1)

where $N_{Tr}$ is the number of possible transitions from the current state. In practice, $\alpha_{CND}$ represents the level of stability, or reliability, of a QTC state. The higher the number of possible neighbour states, the lower the likelihood to transition into one of them. Given an interaction at time $t$ , we associate its label to the next one (observed or predicted) at $t+1$ . Typically, an interaction between agents A and B is calculated as an embedding of their relative pose as follows:

I_{AB}=\emph{Dense}(X_{B}-X_{A})

(2)

where $Dense(\cdot)$ is a fully connected layer. The symbolic processing transforms Eq. 2 into $\alpha_{CND}~{}I_{AB}$ . The QTC knowledge is domain-agnostic and therefore it can be applied to any neural network for time-series data modeling and prediction.

IV Experiments

IV-A Experimental Setup

We used a TIAGo³³3https://pal-robotics.com/robots/tiago/ mobile robot to monitor the motion of two people over a time period of 2 minutes. The robot was positioned at the corner of the experimental room (5m $\times$ 8.2m) and was equipped with a Velodyne VLP-16 3D LiDAR sensor, as shown in Fig. 4(a). This LiDAR features 16 scan channels, providing $360^{\circ}$ horizontal and $30^{\circ}$ vertical fields-of-view. The robot’s torso was set at the minimum height of approximately $1.2$ m to maximise the LiDAR’s chance of detecting nearby individuals. In order to track people in the scene, we run a Bayes People Tracker⁴⁴4https://github.com/LCAS/bayestracking [32] using point-cloud data from the LiDAR at a frequency of 10Hz. Fig. 4(c) shows an RViz screenshot with two humans tracked by the robot.

We conducted two types of experiments, illustrated in Figs. 4(a)-4(b). During these, we recorded the runtime of each inference model (SGAN baseline and NeuroSyM). We registered the rosbag file of each experiment (four in total) for offline processing. The system was running on a computer with 11th Gen Intel® Core™ i7-11800H processor and NVIDIA GeForce RTX 3080 16GB GPU. The two experiment settings are described next.

(a)

(b)

Figure 4: (a) Experimental Scenario A with two humans walking parallel to each other towards their goal (room end) and back, repetitively. The online trajectory prediction is performed by models trained on the UCY-Zara01 dataset. (b) Experimental Scenario B with two humans crossing each other’s path. Here the models are trained on the THOR dataset. (c) RViz visualization of the Bayes People Tracker with human bounding-boxes extracted from the robot’s LiDAR point-clouds.

Scenario A: “all-forward” motion behaviour. Both SGAN and NeuroSyM were trained on the UCY-Zara01 pedestrians dataset [29]. The UCY dataset consists of real pedestrian trajectories with rich multi-human interactions captured at 2.5Hz. It includes three sequences (Zara01, Zara02, and UCY), taken in public spaces from top-view videos. The motion pattern of the pedestrians resembles the all-forward motion pattern replicated in our experiments (i.e. people walking in parallel directions) and illustrated in Fig. 4(a).

Scenario B: “cross-path” motion behaviour. The inference models were trained on the THOR dataset [30], which was recorded by the authors with a motion capture system at 100Hz. This indoor dataset contains motion interactions among people and their environment, including avoidance of static and dynamic obstacles (e.g. humans, robot, static objects) by people trying to reach their goal locations. Similar motion patterns were replicated in our cross-path scenario, as illustrated in Figs. 4(b) and 5.

IV-B Data Processing

The ROS inference node processes the data sequentially, with an observed time window of 8 samples. Human trajectories affected by tracking errors (e.g. because of occlusions) were filtered out and not considered. The ROS inference node and the visualisation node run simultaneously, showing predicted and ground-truth trajectories at runtime. The performance comparison between the baseline SGAN model and the NeuroSyM architecture was conducted on the recorded rosbag files.

IV-C Results and Discussion

We evaluated the average accuracy of each inference model over the 2-minutes sessions of both experimental settings. The results are reported in Tables I and II, which include average displacement error (ADE) and final displacement error (FDE). These tables present 8 results in total, 4 for each scenario (A and B). We can see that, in all of the four cases, the higher accuracy achieved by NeuroSyM significantly reduced both ADE and FDE values compared to the SGAN baseline.

Scenario A	rosbag 1		rosbag 2
	SGAN	NeuroSyM	SGAN	NeuroSyM
Avg. ADE (m)	12.4	7.06	16.32	2.52
Avg. FDE (m)	2.28	1.31	3.24	0.68

TABLE I: Accuracy evaluation for Scenario A, in terms of average displacement error (ADE) and final displacement error (FDE), over 2-minutes long experiments.

Scenario B	rosbag 1		rosbag 2
	SGAN	NeuroSyM	SGAN	NeuroSyM
Avg. ADE (m)	10.88	5.7	24.27	9.87
Avg. FDE (m)	2.67	1.4	5	1.8

TABLE II: Accuracy evaluation for Scenario B, in terms of average displacement error (ADE) and final displacement error (FDE), over 2-minutes long experiments.

Fig. 6 illustrates some examples of ground truth and predicted trajectories in Scenarios A (top plot) and B (bottom plot), with the corresponding ADE and FDE metrics. We can clearly see that ADE and FDE are lower, in both plots, where the NeuroSyM model was used.

We also evaluated the average runtime of each inference model in both experimental scenarios. The results are reported in Table III, showing that NeuroSyM model is slightly slower than, but still comparable to, the SGAN baseline.

	Scenario	SGAN	NeuroSyM
Average time (s)	A	4.17	5.37
Average time (s)	B	5.19	7.36

TABLE III: Runtime evaluation for the two scenarios.

From Tables I, II, and III, we can conclude that, although the NeuroSyM architecture requires more time to predict human trajectories compared to the SGAN baseline, it is still relatively fast and, with some code optimisation, suitable for real-time deployment. In particular, the trade-off between runtime and accuracy is clearly in favour of the NeuroSyM solution, since its QTC-based context-awareness enables more accurate motion predictions.

V Conclusion

In this work, we implemented and deployed a ROS-based architecture, called neuROSym, for neural-only and neuro-symbolic motion prediction on real-world robotic systems. Using this framework, we experimentally evaluated the accuracy and runtime performance of two predictions models, SGAN and NeuroSyM, during online inference in two scenarios with different human motion patterns. The results show a trade-off between accuracy and runtime performance in favor of the NeuroSyM solution, which is particularly suitable for human-aware robot navigation. Our future work will extend the evaluation of neuROSym to more diverse and challenging scenarios, including complex human motion patterns with multiple people. We will also consider more robust people trackers and test against different baseline architectures with static- and dynamic-context awareness.

References

[1] C. Chen, Y. Liu, S. Kreiss, and A. Alahi, “Crowd-robot interaction: Crowd-aware robot navigation with attention-based deep reinforcement learning,” in IEEE Int. Conf. on Robotics and Automation (ICRA), 2019.
[2] A. Pokle, R. Martín-Martín, P. Goebel, V. Chow, H. M. Ewald, J. Yang, Z. Wang, A. Sadeghian, D. Sadigh, S. Savarese, et al., “Deep local trajectory replanning and control for robot navigation,” in IEEE Int. Conf. on Robotics and Automation (ICRA), 2019.
[3] A. Alahi, K. Goel, V. Ramanathan, A. Robicquet, L. Fei-Fei, and S. Savarese, “Social lstm: Human trajectory prediction in crowded spaces,” in Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2016.
[4] M. Lisotto, P. Coscia, and L. Ballan, “Social and scene-aware trajectory prediction in crowded spaces,” in Proc. of the IEEE/CVF Int. Conf. on Computer Vision Workshops, 2019, pp. 0–0.
[5] A. Gupta, J. Johnson, L. Fei-Fei, S. Savarese, and A. Alahi, “Social gan: Socially acceptable trajectories with generative adversarial networks,” in IEEE conf. on computer vision and pattern recognition (CVPR), 2018.
[6] K. Mangalam, H. Girase, S. Agarwal, K.-H. Lee, E. Adeli, J. Malik, and A. Gaidon, “It is not the journey but the destination: Endpoint conditioned trajectory prediction,” in European Conf. on Computer Vision (ECCV, 2020, pp. 759–776.
[7] J. Liang, L. Jiang, J. C. Niebles, A. G. Hauptmann, and L. Fei-Fei, “Peeking into the future: Predicting future person activities and locations in videos,” in IEEE/CVF conf. on computer vision and pattern recognition (CVPR, 2019.
[8] Z. Cao, H. Gao, K. Mangalam, Q.-Z. Cai, M. Vo, and J. Malik, “Long-term human motion prediction with scene context,” in European Conf. on Computer Vision (ECCV), 2020, pp. 387–404.
[9] S. Mghames, L. Castri, M. Hanheide, and N. Bellotto, “A neuro-symbolic approach for enhanced human motion prediction,” in Int. Joint Conf. on Neural Networks (IJCNN), 2023, pp. 1–8.
[10] M. Delafontaine, S. H. Chavoshi, A. G. Cohn, and N. Van de Weghe, “A qualitative trajectory calculus to reason about moving point objects,” in Qualitative spatio-temporal representation and reasoning: Trends and future directions. IGI Global, 2012.
[11] N. Bellotto, M. Hanheide, and N. Van de Weghe, “Qualitative design and implementation of human-robot spatial interactions,” in Int. Conf. on Social Robotics (ICSR), 2013.
[12] N. Lee, W. Choi, P. Vernaza, C. B. Choy, P. H. Torr, and M. Chandraker, “Desire: Distant future prediction in dynamic scenes with interacting agents,” in Proc. of the IEEE Conf. on computer vision and pattern recognition (CVPR), 2017, pp. 336–345.
[13] A. Rudenko, L. Palmieri, M. Herman, K. M. Kitani, D. M. Gavrila, and K. O. Arras, “Human motion trajectory prediction: A survey,” The Int. Journal of Robotics Research, vol. 39, no. 8, pp. 895–935, 2020.
[14] C. Tao, Q. Jiang, L. Duan, and P. Luo, “Dynamic and static context-aware lstm for multi-agent motion prediction,” in European Conf. on Computer Vision (ECCV), 2020.
[15] C. Yu, X. Ma, J. Ren, H. Zhao, and S. Yi, “Spatio-temporal graph transformer networks for pedestrian trajectory prediction,” in European Conf. on Computer Vision, 2020, pp. 507–523.
[16] Y. Huang, H. Bi, Z. Li, T. Mao, and Z. Wang, “Stgat: Modeling spatial-temporal interactions for human trajectory prediction,” in the IEEE/CVF Int. conf. on computer vision, 2019.
[17] N. Bisagno, C. Saltori, B. Zhang, F. G. De Natale, and N. Conci, “Embedding group and obstacle information in lstm networks for human trajectory prediction in crowded scenes,” Computer Vision and Image Understanding, 2021.
[18] K. Ijaz, S. Sohail, and S. Hashish, “A survey of latest approaches for crowd simulation and modeling using hybrid techniques,” in Proc. of the 17th UKSIM-AMSS Int. Conf. on Modelling and Simulation, 2015.
[19] H. Hedayati, A. Muehlbradt, D. J. Szafir, and S. Andrist, “Reform: Recognizing f-formations for social robots,” in IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS), 2020.
[20] S. Thompson, A. Gupta, A. W. Gupta, A. Chen, and M. Vázquez, “Conversational group detection with graph neural networks,” in Proc. of the Int. Conf. on Multimodal Interaction, 2021.
[21] M. Hanheide, A. Peters, and N. Bellotto, “Analysis of human-robot spatial behaviour applying a qualitative trajectory calculus,” in IEEE Int. Symposium on Robot and Human Interactive Communication (RO-MAN), 2012.
[22] C. Dondrup, N. Bellotto, and M. Hanheide, “A probabilistic model of human-robot spatial interaction using a qualitative trajectory calculus,” in AAAI Spring Symposium Series, 2014.
[23] C. Dondrup and M. Hanheide, “Qualitative constraints for human-aware robot navigation using velocity costmaps,” in IEEE Int. Symposium on Robot and Human Interactive Communication (RO-MAN), 2016.
[24] W. Zhao, L. Sun, C. Liu, and M. Tomizuka, “Experimental evaluation of human motion prediction toward safe and efficient human robot collaboration,” in American Control Conf. (ACC), 2020, pp. 4349–4354.
[25] M. Luber, J. A. Stork, G. D. Tipaldi, and K. O. Arras, “People tracking with human motion predictions from social forces,” in IEEE Int. Conf. on Robotics and Automation (ICRA), 2010, pp. 464–469.
[26] L.-Y. Gui, K. Zhang, Y.-X. Wang, X. Liang, J. M. Moura, and M. Veloso, “Teaching robots to predict human motion,” in IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS), 2018, pp. 562–567.
[27] G. Ferrer and A. Sanfeliu, “Behavior estimation for a complete framework for human motion prediction in crowded environments,” in IEEE Int. Conf. on Robotics and Automation (ICRA), 2014, pp. 5940–5945.
[28] A. Antonucci, G. P. R. Papini, P. Bevilacqua, L. Palopoli, and D. Fontanelli, “Efficient prediction of human motion for real-time robotics applications with physics-inspired neural networks,” IEEE Access, vol. 10, pp. 144–157, 2021.
[29] A. Lerner, Y. Chrysanthou, and D. Lischinski, “Crowds by example,” in Computer graphics forum, vol. 26, no. 3, 2007, pp. 655–664.
[30] A. Rudenko, T. P. Kucner, C. S. Swaminathan, R. T. Chadalavada, K. O. Arras, and A. J. Lilienthal, “Thör: Human-robot navigation data collection and accurate motion trajectories dataset,” IEEE Robotics and Automation Letters, vol. 5, no. 2, pp. 676–682, 2020.
[31] N. Van de Weghe and P. De Maeyer, “Conceptual neighbourhood diagrams for representing moving objects,” in Perspectives in Conceptual Modeling, 2005, pp. 228–238.
[32] Z. Yan, T. Duckett, and N. Bellotto, “Online learning for human classification in 3d lidar-based tracking,” in IEEE/RSJ Int. Conf. on Intell. Robots & Systems (IROS), 2017, pp. 864–871.