How to Train Your Quadrotor: A Framework for Consistently Smooth and Responsive Flight Control via Reinforcement Learning

Published: 22 September 2021
    Abstract

    We focus on the problem of reliably training Reinforcement Learning (RL) models (agents) for stable low-level control in embedded systems, and we test our methods on a high-performance, custom-built quadrotor platform. A common but often under-studied issue in developing RL agents for continuous control is that the learned control policies are not always smooth. This lack of smoothness is a serious concern for learned controllers, as it can result in control instability and hardware failure.
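    The abstract does not specify how RE+AL measures or rewards smoothness, so the Python sketch below is purely illustrative and uses hypothetical names. It quantifies one common notion of actuation smoothness, the average step-to-step change in motor commands, which captures the kind of high-frequency output that overheats motors and shakes an airframe apart.

        import numpy as np

        def smoothness_penalty(motor_cmds, weight=1.0):
            # motor_cmds: array of shape (T, 4) with normalized commands in [0, 1],
            # one row per control step. Large step-to-step deltas correspond to the
            # high-frequency actuation that stresses motors and the airframe.
            deltas = np.diff(motor_cmds, axis=0)              # per-step command changes
            return -weight * float(np.mean(np.abs(deltas)))   # smoother -> closer to 0

        # A jittery policy is penalized more heavily than a smooth one.
        t = np.linspace(0.0, 1.0, 500)
        smooth = np.tile(0.5 + 0.1 * np.sin(2.0 * np.pi * t), (4, 1)).T
        jittery = smooth + 0.05 * np.random.default_rng(0).standard_normal(smooth.shape)
        print(smoothness_penalty(smooth), smoothness_penalty(jittery))

    In a training reward, a term of this kind would be traded off against attitude-tracking error, which is exactly the smooth-versus-responsive tension named in the title.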
    Issues of noisy control are further accentuated when training RL agents in simulation, because simulators are ultimately imperfect representations of reality, a discrepancy known as the reality gap. To combat issues of instability in RL agents, we propose a systematic framework, REinforcement-based transferable Agents through Learning (RE+AL), for designing simulated training environments that preserve the quality of trained agents when transferred to real platforms. RE+AL is an evolution of the Neuroflight infrastructure detailed in technical reports prepared by members of our research group. Neuroflight is a state-of-the-art framework for training RL agents for low-level attitude control. RE+AL improves and completes Neuroflight by addressing several important limitations that hindered its deployment on real hardware. We benchmark RE+AL on the NF1 racing quadrotor developed as part of Neuroflight. We demonstrate that RE+AL significantly mitigates the smoothness issues previously observed in RL agents. Additionally, RE+AL consistently trains agents that are flight-capable and suffer minimal degradation in controller quality upon transfer. RE+AL agents also learn to outperform a tuned PID controller, achieving lower tracking error, smoother control, and reduced power consumption. To the best of our knowledge, RE+AL agents are the first RL-based controllers trained in simulation to outperform a well-tuned PID controller on a real-world control problem that is solvable with classical control.
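    For context on the baseline mentioned above, low-level quadrotor attitude control is classically done with a PID loop per axis on angular rate. The single-axis sketch below is a generic illustration in Python; the gains, loop rate, and names are placeholders, not the tuned controller or implementation used in the paper.

        class RatePID:
            # Minimal single-axis PID controller acting on angular rate (deg/s).
            def __init__(self, kp, ki, kd, dt):
                self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
                self.integral = 0.0
                self.prev_error = 0.0

            def step(self, setpoint_dps, measured_dps):
                # Error between commanded and measured angular rate.
                error = setpoint_dps - measured_dps
                self.integral += error * self.dt
                derivative = (error - self.prev_error) / self.dt
                self.prev_error = error
                # The per-axis outputs are later mixed into individual motor commands.
                return self.kp * error + self.ki * self.integral + self.kd * derivative

        # One controller per axis (roll, pitch, yaw), stepped at the control-loop rate.
        roll_pid = RatePID(kp=0.04, ki=0.02, kd=0.001, dt=1.0 / 1000.0)
        print(roll_pid.step(setpoint_dps=200.0, measured_dps=150.0))

    Tuning those gains per axis is what "a well-tuned PID controller" refers to; the learned agent instead replaces this loop with a policy that maps the measured flight state to motor commands.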

    Published In

    ACM Transactions on Cyber-Physical Systems, Volume 5, Issue 4
    October 2021
    312 pages
    ISSN:2378-962X
    EISSN:2378-9638
    DOI:10.1145/3481689
    Editor: Chenyang Lu

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 22 September 2021
    Accepted: 01 May 2021
    Revised: 01 February 2021
    Received: 01 August 2020
    Published in TCPS Volume 5, Issue 4

    Author Tags

    1. Neural networks
    2. continuous control
    3. quadrotor

    Qualifiers

    • Research-article
    • Refereed

    Funding Sources

    • NSF CCF
    • NSF
