subscribe to arXiv mailings

The SwaNNFlight System: On-the-Fly Sim-to-Real Adaptation via Anchored Learning

Authors: Bassel El Mabsout, Shahin Roozkhosh, Siddharth Mysore, Kate Saenko, Renato Mancuso

Abstract: Reinforcement Learning (RL) agents trained in simulated environments and then deployed in the real world are often sensitive to the differences in dynamics presented, commonly termed the sim-to-real gap. With the goal of minimizing this gap on resource-constrained embedded systems, we train and live-adapt agents on quadrotors built from off-the-shelf hardware. In achieving this we developed three… ▽ More Reinforcement Learning (RL) agents trained in simulated environments and then deployed in the real world are often sensitive to the differences in dynamics presented, commonly termed the sim-to-real gap. With the goal of minimizing this gap on resource-constrained embedded systems, we train and live-adapt agents on quadrotors built from off-the-shelf hardware. In achieving this we developed three novel contributions. (i) SwaNNFlight, an open-source firmware enabling wireless data capture and transfer of agents' observations. Fine-tuning agents with new data, and receiving and swapping onboard NN controllers -- all while in flight. We also design SwaNNFlight System (SwaNNFS) allowing new research in training and live-adapting learning agents on similar systems. (ii) Multiplicative value composition, a technique for preserving the importance of each policy optimization criterion, improving training performance and variability in learnt behavior. And (iii) anchor critics to help stabilize the fine-tuning of agents during sim-to-real transfer, online learning from real data while retaining behavior optimized in simulation. We train consistently flight-worthy control policies in simulation and deploy them on real quadrotors. We then achieve live controller adaptation via over-the-air updates of the onboard control policy from a ground station. Our results indicate that live adaptation unlocks a near-50\% reduction in power consumption, attributed to the sim-to-real gap. Finally, we tackle the issues of catastrophic forgetting and controller instability, showing the effectiveness of our novel methods. Project Website: https://github.com/BU-Cyber-Physical-Systems-Lab/SwaNNFS △ Less

Submitted 17 January, 2023; originally announced January 2023.

arXiv:2206.00789 [pdf, other]

doi 10.1145/3552326.3587458

Unikernel Linux (UKL)

Authors: Ali Raza, Thomas Unger, Matthew Boyd, Eric Munson, Parul Sohal, Ulrich Drepper, Richard Jones, Daniel Bristot de Oliveira, Larry Woodman, Renato Mancuso, Jonathan Appavoo, Orran Krieger

Abstract: This paper presents Unikernel Linux (UKL), a path toward integrating unikernel optimization techniques in Linux, a general purpose operating system. UKL adds a configuration option to Linux allowing for a single, optimized process to link with the kernel directly, and run at supervisor privilege. This UKL process does not require application source code modification, only a re-link with our, sligh… ▽ More This paper presents Unikernel Linux (UKL), a path toward integrating unikernel optimization techniques in Linux, a general purpose operating system. UKL adds a configuration option to Linux allowing for a single, optimized process to link with the kernel directly, and run at supervisor privilege. This UKL process does not require application source code modification, only a re-link with our, slightly modified, Linux kernel and glibc. Unmodified applications show modest performance gains out of the box, and developers can further optimize applications for more significant gains (e.g. 26% throughput improvement for Redis). UKL retains support for co-running multiple user level processes capable of communicating with the UKL process using standard IPC. UKL preserves Linux's battle-tested codebase, community, and ecosystem of tools, applications, and hardware support. UKL runs both on bare-metal and virtual servers and supports multi-core execution. The changes to the Linux kernel are modest (1250 LOC). △ Less

Submitted 22 June, 2023; v1 submitted 1 June, 2022; originally announced June 2022.

Comments: Added more results in the evaluation section. Improved overall writing and added diagrams to explain the architecture

Journal ref: Proceedings of the Eighteenth European Conference on Computer Systems (EuroSys 23), May 2023, Pages 590 - 605

arXiv:2203.11423 [pdf, other]

doi 10.1145/3534879.3534888

RT-Bench: an Extensible Benchmark Framework for the Analysis and Management of Real-Time Applications

Authors: Mattia Nicolella, Shahin Roozkhosh, Denis Hoornaert, Andrea Bastoni, Renato Mancuso

Abstract: Benchmarking is crucial for testing and validating any system, even more so in real-time systems. Typical real-time applications adhere to well-understood abstractions: they exhibit a periodic behavior, operate on a well-defined working set, and strive for stable response time avoiding non-predicable factors such as page faults. Unfortunately, available benchmark suites fail to reflect key charact… ▽ More Benchmarking is crucial for testing and validating any system, even more so in real-time systems. Typical real-time applications adhere to well-understood abstractions: they exhibit a periodic behavior, operate on a well-defined working set, and strive for stable response time avoiding non-predicable factors such as page faults. Unfortunately, available benchmark suites fail to reflect key characteristics of real-time applications. Practitioners and researchers must resort to either benchmark heavily approximated real-time environments, or to re-engineer available benchmarks to add -- if possible -- the sought-after features. Additionally, the measuring and logging capabilities provided by most benchmark suites are not tailored "out-of-the-box" to real-time environments, and changing basic parameters such as the scheduling policy often becomes a tiring and error-prone exercise. In this paper, we present RT-bench, an open-source framework adding standard real-time features to virtually any existing benchmark. Furthermore, RT-bench provides an easy-to-use, unified command line interface to customize key aspects of the real-time execution of a set of benchmarks. Our framework is guided by four main criteria: 1) cohesive interface, 2) support for periodic application behavior and deadline semantics, 3) controllable memory footprint, and 4) extensibility and portability. We have integrated within the framework applications from the widely used SD-VBS and IsolBench suites. We showcase a set of use-cases that are representative of typical real-time system evaluation scenarios and that can be easily conducted via RT-Bench. △ Less

Submitted 30 July, 2022; v1 submitted 21 March, 2022; originally announced March 2022.

Comments: 11 pages, 12 figures; code available at https://gitlab.com/rt-bench/rt-bench, documentation available at https://rt-bench.gitlab.io/rt-bench/

ACM Class: C.3; A.2

Journal ref: RTNS 2022: Proceedings of the 30th International Conference on Real-Time Networks and Systems June 2022 Pages 184-195

arXiv:2109.14349 [pdf, other]

Relational Memory: Native In-Memory Accesses on Rows and Columns

Authors: Shahin Roozkhosh, Denis Hoornaert, Ju Hyoung Mun, Tarikul Islam Papon, Ahmed Sanaullah, Ulrich Drepper, Renato Mancuso, Manos Athanassoulis

Abstract: Analytical database systems are typically designed to use a column-first data layout to access only the desired fields. On the other hand, storing data row-first works great for accessing, inserting, or updating entire rows. Transforming rows to columns at runtime is expensive, hence, many analytical systems ingest data in row-first form and transform it in the background to columns to facilitate… ▽ More Analytical database systems are typically designed to use a column-first data layout to access only the desired fields. On the other hand, storing data row-first works great for accessing, inserting, or updating entire rows. Transforming rows to columns at runtime is expensive, hence, many analytical systems ingest data in row-first form and transform it in the background to columns to facilitate future analytical queries. How will this design change if we can always efficiently access only the desired set of columns? To address this question, we present a radically new approach to data transformation from rows to columns. We build upon recent advancements in embedded platforms with re-programmable logic to design native in-memory access on rows and columns. Our approach, termed Relational Memory, relies on an FPGA- based accelerator that sits between the CPU and main memory and transparently transforms base data to any group of columns with minimal overhead at runtime. This design allows accessing any group of columns as if it already exists in memory. We implement and deploy Relational Memory in real hardware, and we show that we can access the desired columns up to 1.63x faster than accessing them from their row-wise counterpart, while matching the performance of a pure columnar access for low projectivity, and outperforming it by up to 1.87x as projectivity (and tuple re-construction cost) increases. Moreover, our approach can be easily extended to support offloading of a number of operations to hardware, e.g., selection, group by, aggregation, and joins, having the potential to vastly simplify the software logic and accelerate the query execution. △ Less

Submitted 6 February, 2022; v1 submitted 29 September, 2021; originally announced September 2021.

arXiv:2102.11893 [pdf, other]

Honey, I Shrunk The Actor: A Case Study on Preserving Performance with Smaller Actors in Actor-Critic RL

Authors: Siddharth Mysore, Bassel Mabsout, Renato Mancuso, Kate Saenko

Abstract: Actors and critics in actor-critic reinforcement learning algorithms are functionally separate, yet they often use the same network architectures. This case study explores the performance impact of network sizes when considering actor and critic architectures independently. By relaxing the assumption of architectural symmetry, it is often possible for smaller actors to achieve comparable policy pe… ▽ More Actors and critics in actor-critic reinforcement learning algorithms are functionally separate, yet they often use the same network architectures. This case study explores the performance impact of network sizes when considering actor and critic architectures independently. By relaxing the assumption of architectural symmetry, it is often possible for smaller actors to achieve comparable policy performance to their symmetric counterparts. Our experiments show up to 99% reduction in the number of network weights with an average reduction of 77% over multiple actor-critic algorithms on 9 independent tasks. Given that reducing actor complexity results in a direct reduction of run-time inference cost, we believe configurations of actors and critics are aspects of actor-critic design that deserve to be considered independently, particularly in resource-constrained applications or when deploying multiple actors simultaneously. △ Less

Submitted 18 June, 2021; v1 submitted 23 February, 2021; originally announced February 2021.

Comments: Accepted to the IEEE Conference on Games 2021

arXiv:2012.06656 [pdf, other]

doi 10.1145/3466618

How to Train your Quadrotor: A Framework for Consistently Smooth and Responsive Flight Control via Reinforcement Learning

Authors: Siddharth Mysore, Bassel Mabsout, Kate Saenko, Renato Mancuso

Abstract: We focus on the problem of reliably training Reinforcement Learning (RL) models (agents) for stable low-level control in embedded systems and test our methods on a high-performance, custom-built quadrotor platform. A common but often under-studied problem in developing RL agents for continuous control is that the control policies developed are not always smooth. This lack of smoothness can be a ma… ▽ More We focus on the problem of reliably training Reinforcement Learning (RL) models (agents) for stable low-level control in embedded systems and test our methods on a high-performance, custom-built quadrotor platform. A common but often under-studied problem in developing RL agents for continuous control is that the control policies developed are not always smooth. This lack of smoothness can be a major problem when learning controllers %intended for deployment on real hardware as it can result in control instability and hardware failure. Issues of noisy control are further accentuated when training RL agents in simulation due to simulators ultimately being imperfect representations of reality - what is known as the reality gap. To combat issues of instability in RL agents, we propose a systematic framework, `REinforcement-based transferable Agents through Learning' (RE+AL), for designing simulated training environments which preserve the quality of trained agents when transferred to real platforms. RE+AL is an evolution of the Neuroflight infrastructure detailed in technical reports prepared by members of our research group. Neuroflight is a state-of-the-art framework for training RL agents for low-level attitude control. RE+AL improves and completes Neuroflight by solving a number of important limitations that hindered the deployment of Neuroflight to real hardware. We benchmark RE+AL on the NF1 racing quadrotor developed as part of Neuroflight. We demonstrate that RE+AL significantly mitigates the previously observed issues of smoothness in RL agents. Additionally, RE+AL is shown to consistently train agents that are flight-capable and with minimal degradation in controller quality upon transfer. RE+AL agents also learn to perform better than a tuned PID controller, with better tracking errors, smoother control and reduced power consumption. △ Less

Submitted 22 February, 2021; v1 submitted 11 December, 2020; originally announced December 2020.

Journal ref: ACM Transactions on Cyber-Physical Systems, Volume 5, Issue 4, October 2021

arXiv:2012.06644 [pdf, other]

Regularizing Action Policies for Smooth Control with Reinforcement Learning

Authors: Siddharth Mysore, Bassel Mabsout, Renato Mancuso, Kate Saenko

Abstract: A critical problem with the practical utility of controllers trained with deep Reinforcement Learning (RL) is the notable lack of smoothness in the actions learned by the RL policies. This trend often presents itself in the form of control signal oscillation and can result in poor control, high power consumption, and undue system wear. We introduce Conditioning for Action Policy Smoothness (CAPS),… ▽ More A critical problem with the practical utility of controllers trained with deep Reinforcement Learning (RL) is the notable lack of smoothness in the actions learned by the RL policies. This trend often presents itself in the form of control signal oscillation and can result in poor control, high power consumption, and undue system wear. We introduce Conditioning for Action Policy Smoothness (CAPS), an effective yet intuitive regularization on action policies, which offers consistent improvement in the smoothness of the learned state-to-action mappings of neural network controllers, reflected in the elimination of high-frequency components in the control signal. Tested on a real system, improvements in controller smoothness on a quadrotor drone resulted in an almost 80% reduction in power consumption while consistently training flight-worthy controllers. Project website: http://ai.bu.edu/caps △ Less

Submitted 26 May, 2021; v1 submitted 11 December, 2020; originally announced December 2020.

Comments: Accepted for publication to ICRA 2021

arXiv:2009.09104 [pdf, other]

Akita: A CPU scheduler for virtualized Clouds

Authors: Esmail Asyabi, Azer Bestavros, Renato Mancuso, Richard West, Erfan Sharafzadeh

Abstract: Clouds inherit CPU scheduling policies of operating systems. These policies enforce fairness while leveraging best-effort mechanisms to enhance responsiveness of all schedulable entities, irrespective of their service level objectives (SLOs). This leads to unpredictable performance that forces cloud providers to enforce strict reservation and isolation policies to prevent high-criticality services… ▽ More Clouds inherit CPU scheduling policies of operating systems. These policies enforce fairness while leveraging best-effort mechanisms to enhance responsiveness of all schedulable entities, irrespective of their service level objectives (SLOs). This leads to unpredictable performance that forces cloud providers to enforce strict reservation and isolation policies to prevent high-criticality services (e.g., Memcached) from being impacted by low-criticality ones (e.g., logging), which results in low utilization. In this paper, we present Akita, a hypervisor CPU scheduler that delivers predictable performance at high utilization. Akita allows virtual machines (VMs) to be categorized into high- and low-criticality VMs. Akita provides strong guarantees on the ability of cloud providers to meet SLOs of high-criticality VMs, by temporarily slowing down low-criticality VMs if necessary. Akita, therefore, allows the co-existence of high and low-criticality VMs on the same physical machine, leading to higher utilization. The effectiveness of Akita is demonstrated by a prototype implementation in the Xen hypervisor. We present experimental results that show the many advantages of adopting Akita as the hypervisor CPU scheduler. In particular, we show that high-criticality Memcached VMs are able to deliver predictable performance despite being co-located with low-criticality CPU-bound VMs. △ Less

Submitted 18 September, 2020; originally announced September 2020.

arXiv:2007.12271 [pdf, other]

doi 10.1109/TC.2021.3060650

Observing the Invisible: Live Cache Inspection for High-Performance Embedded Systems

Authors: Dharmesh Tarapore, Shahin Roozkhosh, Steven Brzozowski, Renato Mancuso

Abstract: The vast majority of high-performance embedded systems implement multi-level CPU cache hierarchies. But the exact behavior of these CPU caches has historically been opaque to system designers. Absent expensive hardware debuggers, an understanding of cache makeup remains tenuous at best. This enduring opacity further obscures the complex interplay among applications and OS-level components, particu… ▽ More The vast majority of high-performance embedded systems implement multi-level CPU cache hierarchies. But the exact behavior of these CPU caches has historically been opaque to system designers. Absent expensive hardware debuggers, an understanding of cache makeup remains tenuous at best. This enduring opacity further obscures the complex interplay among applications and OS-level components, particularly as they compete for the allocation of cache resources. Notwithstanding the relegation of cache comprehension to proxies such as static cache analysis, performance counter-based profiling, and cache hierarchy simulations, the underpinnings of cache structure and evolution continue to elude software-centric solutions. In this paper, we explore a novel method of studying cache contents and their evolution via snapshotting. Our method complements extant approaches for cache profiling to better formulate, validate, and refine hypotheses on the behavior of modern caches. We leverage cache introspection interfaces provided by vendors to perform live cache inspections without the need for external hardware. We present CacheFlow, a proof-of-concept Linux kernel module which snapshots cache contents on an NVIDIA Tegra TX1 SoC (system on chip). △ Less

Submitted 23 July, 2020; originally announced July 2020.

arXiv:1909.05349 [pdf, other]

doi 10.1109/MECO49872.2020.9134262

Cache Where you Want! Reconciling Predictability and Coherent Caching

Authors: Ayoosh Bansal, Jayati Singh, Yifan Hao, Jen-Yang Wen, Renato Mancuso, Marco Caccamo

Abstract: Real-time and cyber-physical systems need to interact with and respond to their physical environment in a predictable time. While multicore platforms provide incredible computational power and throughput, they also introduce new sources of unpredictability. Large fluctuations in latency to access data shared between multiple cores is an important contributor to the overall execution-time variabili… ▽ More Real-time and cyber-physical systems need to interact with and respond to their physical environment in a predictable time. While multicore platforms provide incredible computational power and throughput, they also introduce new sources of unpredictability. Large fluctuations in latency to access data shared between multiple cores is an important contributor to the overall execution-time variability. In addition to the temporal unpredictability introduced by caching, parallel applications with data shared across multiple cores also pay additional latency overheads due to data coherence. Analyzing the impact of data coherence on the worst-case execution-time of real-time applications is challenging because only scarce implementation details are revealed by manufacturers. This paper presents application level control for caching data at different levels of the cache hierarchy. The rationale is that by caching data only in shared cache it is possible to bypass private caches. The access latency to data present in caches becomes independent of its coherence state. We discuss the existing architectural support as well as the required hardware and OS modifications to support the proposed cacheability control. We evaluate the system on an architectural simulator. We show that the worst case execution time for a single memory write request is reduced by 52%. Benchmark evaluations show that proposed technique has a minimal impact on average performance. △ Less

Submitted 27 June, 2021; v1 submitted 11 September, 2019; originally announced September 2019.

Comments: 13 pages, 10 figures, v2 update includes overview section with formal solution definition. This is a long version of a prior publication

ACM Class: C.0; C.3; C.4; D.4.7; J.7

Journal ref: 2020 9th Mediterranean Conference on Embedded Computing (MECO), 2020, pp. 1-6

arXiv:1901.06553 [pdf, other]

Neuroflight: Next Generation Flight Control Firmware

Authors: William Koch, Renato Mancuso, Azer Bestavros

Abstract: Little innovation has been made to low-level attitude flight control used by uncrewed aerial vehicles (UAVs), which still predominantly uses the classical PID controller. In this work we introduce Neuroflight, the first open source neuro-flight controller firmware. We present our toolchain for training a neural network in simulation and compiling it to run on embedded hardware. Challenges faced ju… ▽ More Little innovation has been made to low-level attitude flight control used by uncrewed aerial vehicles (UAVs), which still predominantly uses the classical PID controller. In this work we introduce Neuroflight, the first open source neuro-flight controller firmware. We present our toolchain for training a neural network in simulation and compiling it to run on embedded hardware. Challenges faced jumping from simulation to reality are discussed along with our solutions. Our evaluation shows the neural network can execute at over 2.67kHz on an Arm Cortex-M7 processor and flight tests demonstrate a quadcopter running Neuroflight can achieve stable flight and execute aerobatic maneuvers. △ Less

Submitted 16 September, 2019; v1 submitted 19 January, 2019; originally announced January 2019.

arXiv:1809.05921 [pdf, other]

Analysis of Dynamic Memory Bandwidth Regulation in Multi-core Real-Time Systems

Authors: Ankit Agrawal, Renato Mancuso, Rodolfo Pellizzoni, Gerhard Fohler

Abstract: One of the primary sources of unpredictability in modern multi-core embedded systems is contention over shared memory resources, such as caches, interconnects, and DRAM. Despite significant achievements in the design and analysis of multi-core systems, there is a need for a theoretical framework that can be used to reason on the worst-case behavior of real-time workload when both processors and me… ▽ More One of the primary sources of unpredictability in modern multi-core embedded systems is contention over shared memory resources, such as caches, interconnects, and DRAM. Despite significant achievements in the design and analysis of multi-core systems, there is a need for a theoretical framework that can be used to reason on the worst-case behavior of real-time workload when both processors and memory resources are subject to scheduling decisions. In this paper, we focus our attention on dynamic allocation of main memory bandwidth. In particular, we study how to determine the worst-case response time of tasks spanning through a sequence of time intervals, each with a different bandwidth-to-core assignment. We show that the response time computation can be reduced to a maximization problem over assignment of memory requests to different time intervals, and we provide an efficient way to solve such problem. As a case study, we then demonstrate how our proposed analysis can be used to improve the schedulability of Integrated Modular Avionics systems in the presence of memory-intensive workload. △ Less

Submitted 16 September, 2018; originally announced September 2018.

Comments: Accepted for publication in the IEEE Real-Time Systems Symposium (RTSS) 2018 conference

arXiv:1804.04154 [pdf, other]

Reinforcement Learning for UAV Attitude Control

Authors: William Koch, Renato Mancuso, Richard West, Azer Bestavros

Abstract: Autopilot systems are typically composed of an "inner loop" providing stability and control, while an "outer loop" is responsible for mission-level objectives, e.g. way-point navigation. Autopilot systems for UAVs are predominately implemented using Proportional, Integral Derivative (PID) control systems, which have demonstrated exceptional performance in stable environments. However more sophisti… ▽ More Autopilot systems are typically composed of an "inner loop" providing stability and control, while an "outer loop" is responsible for mission-level objectives, e.g. way-point navigation. Autopilot systems for UAVs are predominately implemented using Proportional, Integral Derivative (PID) control systems, which have demonstrated exceptional performance in stable environments. However more sophisticated control is required to operate in unpredictable, and harsh environments. Intelligent flight control systems is an active area of research addressing limitations of PID control most recently through the use of reinforcement learning (RL) which has had success in other applications such as robotics. However previous work has focused primarily on using RL at the mission-level controller. In this work, we investigate the performance and accuracy of the inner control loop providing attitude control when using intelligent flight control systems trained with the state-of-the-art RL algorithms, Deep Deterministic Gradient Policy (DDGP), Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO). To investigate these unknowns we first developed an open-source high-fidelity simulation environment to train a flight controller attitude control of a quadrotor through RL. We then use our environment to compare their performance to that of a PID controller to identify if using RL is appropriate in high-precision, time-critical flight control. △ Less

Submitted 11 April, 2018; originally announced April 2018.

Comments: 13 pages, 9 figures

arXiv:1707.05260 [pdf, other]

Deterministic Memory Abstraction and Supporting Multicore System Architecture

Authors: Farzad Farshchi, Prathap Kumar Valsan, Renato Mancuso, Heechul Yun

Abstract: Poor time predictability of multicore processors has been a long-standing challenge in the real-time systems community. In this paper, we make a case that a fundamental problem that prevents efficient and predictable real-time computing on multicore is the lack of a proper memory abstraction to express memory criticality, which cuts across various layers of the system: the application, OS, and har… ▽ More Poor time predictability of multicore processors has been a long-standing challenge in the real-time systems community. In this paper, we make a case that a fundamental problem that prevents efficient and predictable real-time computing on multicore is the lack of a proper memory abstraction to express memory criticality, which cuts across various layers of the system: the application, OS, and hardware. We, therefore, propose a new holistic resource management approach driven by a new memory abstraction, which we call Deterministic Memory. The key characteristic of deterministic memory is that the platform - the OS and hardware - guarantees small and tightly bounded worst-case memory access timing. In contrast, we call the conventional memory abstraction as best-effort memory in which only highly pessimistic worst-case bounds can be achieved. We propose to utilize both abstractions to achieve high time predictability but without significantly sacrificing performance. We present deterministic memory-aware OS and architecture designs, including OS-level page allocator, hardware-level cache, and DRAM controller designs. We implement the proposed OS and architecture extensions on Linux and gem5 simulator. Our evaluation results, using a set of synthetic and real-world benchmarks, demonstrate the feasibility and effectiveness of our approach. △ Less

Submitted 18 April, 2018; v1 submitted 17 July, 2017; originally announced July 2017.

Showing 1–14 of 14 results for author: Mancuso, R