-
Safe Reinforcement Learning in Black-Box Environments via Adaptive Shielding
Authors:
Daniel Bethell,
Simos Gerasimou,
Radu Calinescu,
Calum Imrie
Abstract:
Empowering safe exploration of reinforcement learning (RL) agents during training is a critical impediment towards deploying RL agents in many real-world scenarios. Training RL agents in unknown, black-box environments poses an even greater safety risk when prior knowledge of the domain/task is unavailable. We introduce ADVICE (Adaptive Shielding with a Contrastive Autoencoder), a novel post-shiel…
▽ More
Empowering safe exploration of reinforcement learning (RL) agents during training is a critical impediment towards deploying RL agents in many real-world scenarios. Training RL agents in unknown, black-box environments poses an even greater safety risk when prior knowledge of the domain/task is unavailable. We introduce ADVICE (Adaptive Shielding with a Contrastive Autoencoder), a novel post-shielding technique that distinguishes safe and unsafe features of state-action pairs during training, thus protecting the RL agent from executing actions that yield potentially hazardous outcomes. Our comprehensive experimental evaluation against state-of-the-art safe RL exploration techniques demonstrates how ADVICE can significantly reduce safety violations during training while maintaining a competitive outcome reward.
△ Less
Submitted 28 May, 2024;
originally announced May 2024.
-
Tree-Based versus Hybrid Graphical-Textual Model Editors: An Empirical Study of Testing Specifications
Authors:
Ionut Predoaia,
James Harbin,
Simos Gerasimou,
Christina Vasiliou,
Dimitris Kolovos,
Antonio García-Domínguez
Abstract:
Tree-based model editors and hybrid graphical-textual model editors have advantages and limitations when editing domain models. Data is displayed hierarchically in tree-based model editors, whereas hybrid graphical-textual model editors capture high-level domain concepts graphically and low-level domain details textually. We conducted an empirical user study with 22 participants, to evaluate the i…
▽ More
Tree-based model editors and hybrid graphical-textual model editors have advantages and limitations when editing domain models. Data is displayed hierarchically in tree-based model editors, whereas hybrid graphical-textual model editors capture high-level domain concepts graphically and low-level domain details textually. We conducted an empirical user study with 22 participants, to evaluate the implicit assumption of system modellers that hybrid notations are superior, and to investigate the tradeoffs between tree-based and hybrid model editors. The results of the user study indicate that users largely prefer hybrid editors and are more confident with hybrid notations for understanding the meaning of conditions. Furthermore, we found that tree editors provide superior performance for analysing ordered lists of model elements, whereas activities requiring the comprehension or modelling of complex conditions are carried out faster through a hybrid editor.
△ Less
Submitted 8 April, 2024;
originally announced April 2024.
-
DeepKnowledge: Generalisation-Driven Deep Learning Testing
Authors:
Sondess Missaoui,
Simos Gerasimou,
Nikolaos Matragkas
Abstract:
Despite their unprecedented success, DNNs are notoriously fragile to small shifts in data distribution, demanding effective testing techniques that can assess their dependability. Despite recent advances in DNN testing, there is a lack of systematic testing approaches that assess the DNN's capability to generalise and operate comparably beyond data in their training distribution. We address this g…
▽ More
Despite their unprecedented success, DNNs are notoriously fragile to small shifts in data distribution, demanding effective testing techniques that can assess their dependability. Despite recent advances in DNN testing, there is a lack of systematic testing approaches that assess the DNN's capability to generalise and operate comparably beyond data in their training distribution. We address this gap with DeepKnowledge, a systematic testing methodology for DNN-based systems founded on the theory of knowledge generalisation, which aims to enhance DNN robustness and reduce the residual risk of 'black box' models. Conforming to this theory, DeepKnowledge posits that core computational DNN units, termed Transfer Knowledge neurons, can generalise under domain shift. DeepKnowledge provides an objective confidence measurement on testing activities of DNN given data distribution shifts and uses this information to instrument a generalisation-informed test adequacy criterion to check the transfer knowledge capacity of a test set. Our empirical evaluation of several DNNs, across multiple datasets and state-of-the-art adversarial generation techniques demonstrates the usefulness and effectiveness of DeepKnowledge and its ability to support the engineering of more dependable DNNs. We report improvements of up to 10 percentage points over state-of-the-art coverage criteria for detecting adversarial attacks on several benchmarks, including MNIST, SVHN, and CIFAR.
△ Less
Submitted 25 March, 2024;
originally announced March 2024.
-
Quantitative Assurance and Synthesis of Controllers from Activity Diagrams
Authors:
Kangfeng Ye,
Fang Yan,
Simos Gerasimou
Abstract:
Probabilistic model checking is a widely used formal verification technique to automatically verify qualitative and quantitative properties for probabilistic models. However, capturing such systems, writing corresponding properties, and verifying them require domain knowledge. This makes it not accessible for researchers and engineers who may not have the required knowledge. Previous studies have…
▽ More
Probabilistic model checking is a widely used formal verification technique to automatically verify qualitative and quantitative properties for probabilistic models. However, capturing such systems, writing corresponding properties, and verifying them require domain knowledge. This makes it not accessible for researchers and engineers who may not have the required knowledge. Previous studies have extended UML activity diagrams (ADs), developed transformations, and implemented accompanying tools for automation. The research, however, is incomprehensive and not fully open, which makes it hard to be evaluated, extended, adapted, and accessed. In this paper, we propose a comprehensive verification framework for ADs, including a new profile for probability, time, and quality annotations, a semantics interpretation of ADs in three Markov models, and a set of transformation rules from activity diagrams to the PRISM language, supported by PRISM and Storm. Most importantly, we developed algorithms for transformation and implemented them in a tool, called QASCAD, using model-based techniques, for fully automated verification. We evaluated one case study where multiple robots are used for delivery in a hospital and further evaluated six other examples from the literature. With all these together, this work makes noteworthy contributions to the verification of ADs by improving evaluation, extensibility, adaptability, and accessibility.
△ Less
Submitted 29 February, 2024;
originally announced March 2024.
-
Robust Uncertainty Quantification Using Conformalised Monte Carlo Prediction
Authors:
Daniel Bethell,
Simos Gerasimou,
Radu Calinescu
Abstract:
Deploying deep learning models in safety-critical applications remains a very challenging task, mandating the provision of assurances for the dependable operation of these models. Uncertainty quantification (UQ) methods estimate the model's confidence per prediction, informing decision-making by considering the effect of randomness and model misspecification. Despite the advances of state-of-the-a…
▽ More
Deploying deep learning models in safety-critical applications remains a very challenging task, mandating the provision of assurances for the dependable operation of these models. Uncertainty quantification (UQ) methods estimate the model's confidence per prediction, informing decision-making by considering the effect of randomness and model misspecification. Despite the advances of state-of-the-art UQ methods, they are computationally expensive or produce conservative prediction sets/intervals. We introduce MC-CP, a novel hybrid UQ method that combines a new adaptive Monte Carlo (MC) dropout method with conformal prediction (CP). MC-CP adaptively modulates the traditional MC dropout at runtime to save memory and computation resources, enabling predictions to be consumed by CP, yielding robust prediction sets/intervals. Throughout comprehensive experiments, we show that MC-CP delivers significant improvements over advanced UQ methods, like MC dropout, RAPS and CQR, both in classification and regression benchmarks. MC-CP can be easily added to existing models, making its deployment simple.
△ Less
Submitted 22 January, 2024; v1 submitted 18 August, 2023;
originally announced August 2023.
-
Bayesian Learning for the Robust Verification of Autonomous Robots
Authors:
Xingyu Zhao,
Simos Gerasimou,
Radu Calinescu,
Calum Imrie,
Valentin Robu,
David Flynn
Abstract:
Autonomous robots used in infrastructure inspection, space exploration and other critical missions operate in highly dynamic environments. As such, they must continually verify their ability to complete the tasks associated with these missions safely and effectively. Here we present a Bayesian learning framework that enables this runtime verification of autonomous robots. The framework uses prior…
▽ More
Autonomous robots used in infrastructure inspection, space exploration and other critical missions operate in highly dynamic environments. As such, they must continually verify their ability to complete the tasks associated with these missions safely and effectively. Here we present a Bayesian learning framework that enables this runtime verification of autonomous robots. The framework uses prior knowledge and observations of the verified robot to learn expected ranges for the occurrence rates of regular and singular (e.g., catastrophic failure) events. Interval continuous-time Markov models defined using these ranges are then analysed to obtain expected intervals of variation for system properties such as mission duration and success probability. We apply the framework to an autonomous robotic mission for underwater infrastructure inspection and repair. The formal proofs and experiments presented in the paper show that our framework produces results that reflect the uncertainty intrinsic to many real-world systems, enabling the robust verification of their quantitative properties under parametric uncertainty.
△ Less
Submitted 11 December, 2023; v1 submitted 15 March, 2023;
originally announced March 2023.
-
Software Performability Analysis Using Fast Parametric Model Checking
Authors:
Xinwei Fang,
Radu Calinescu,
Simos Gerasimou,
Faisal Alhwikem
Abstract:
We present an efficient parametric model checking (PMC) technique for the analysis of software performability, i.e., of the performance and dependability properties of software systems. The new PMC technique works by automatically decomposing a parametric discrete-time Markov chain (pDTMC) model of the software system under verification into fragments that can be analysed independently, yielding r…
▽ More
We present an efficient parametric model checking (PMC) technique for the analysis of software performability, i.e., of the performance and dependability properties of software systems. The new PMC technique works by automatically decomposing a parametric discrete-time Markov chain (pDTMC) model of the software system under verification into fragments that can be analysed independently, yielding results that are then combined to establish the required software performability properties. Our fast parametric model checking (fPMC) technique enables the formal analysis of software systems modelled by pDTMCs that are too complex to be handled by existing PMC methods. Furthermore, for many pDTMCs that state-of-the-art parametric model checkers can analyse, fPMC produces solutions (i.e., algebraic formulae) that are simpler and much faster to evaluate. We show experimentally that adding fPMC to the existing repertoire of PMC methods improves the efficiency of parametric model checking significantly, and extends its applicability to software systems with more complex behaviour than currently possible.
△ Less
Submitted 23 October, 2022; v1 submitted 25 August, 2022;
originally announced August 2022.
-
Fast Parametric Model Checking through Model Fragmentation
Authors:
Xinwei Fang,
Radu Calinescu,
Simos Gerasimou,
Faisal Alhwikem
Abstract:
Parametric model checking (PMC) computes algebraic formulae that express key non-functional properties of a system (reliability, performance, etc.) as rational functions of the system and environment parameters. In software engineering, PMC formulae can be used during design, e.g., to analyse the sensitivity of different system architectures to parametric variability, or to find optimal system con…
▽ More
Parametric model checking (PMC) computes algebraic formulae that express key non-functional properties of a system (reliability, performance, etc.) as rational functions of the system and environment parameters. In software engineering, PMC formulae can be used during design, e.g., to analyse the sensitivity of different system architectures to parametric variability, or to find optimal system configurations. They can also be used at runtime, e.g., to check if non-functional requirements are still satisfied after environmental changes, or to select new configurations after such changes. However, current PMC techniques do not scale well to systems with complex behaviour and more than a few parameters. Our paper introduces a fast PMC (fPMC) approach that overcomes this limitation, extending the applicability of PMC to a broader class of systems than previously possible. To this end, fPMC partitions the Markov models that PMC operates with into \emph{fragments} whose reachability properties are analysed independently, and obtains PMC reachability formulae by combining the results of these fragment analyses. To demonstrate the effectiveness of fPMC, we show how our fPMC tool can analyse three systems (taken from the research literature, and belonging to different application domains) with which current PMC techniques and tools struggle.
△ Less
Submitted 2 February, 2021;
originally announced February 2021.
-
Learning to Learn in Collective Adaptive Systems: Mining Design Patterns for Data-driven Reasoning
Authors:
Mirko D'Angelo,
Sona Ghahremani,
Simos Gerasimou,
Johannes Grohmann,
Ingrid Nunes,
Sven Tomforde,
Evangelos Pournaras
Abstract:
Engineering collective adaptive systems (CAS) with learning capabilities is a challenging task due to their multi-dimensional and complex design space. Data-driven approaches for CAS design could introduce new insights enabling system engineers to manage the CAS complexity more cost-effectively at the design-phase. This paper introduces a systematic approach to reason about design choices and patt…
▽ More
Engineering collective adaptive systems (CAS) with learning capabilities is a challenging task due to their multi-dimensional and complex design space. Data-driven approaches for CAS design could introduce new insights enabling system engineers to manage the CAS complexity more cost-effectively at the design-phase. This paper introduces a systematic approach to reason about design choices and patterns of learning-based CAS. Using data from a systematic literature review, reasoning is performed with a novel application of data-driven methodologies such as clustering, multiple correspondence analysis and decision trees. The reasoning based on past experience as well as supporting novel and innovative design choices are demonstrated.
△ Less
Submitted 10 August, 2020;
originally announced August 2020.
-
Supporting Robotic Software Migration Using Static Analysis and Model-Driven Engineering
Authors:
Sophie Wood,
Nicholas Matragkas,
Dimitris Kolovos,
Richard Paige,
Simos Gerasimou
Abstract:
The wide use of robotic systems contributed to developing robotic software highly coupled to the hardware platform running the robotic system. Due to increased maintenance cost or changing business priorities, the robotic hardware is infrequently upgraded, thus increasing the risk for technology stagnation. Reducing this risk entails migrating the system and its software to a new hardware platform…
▽ More
The wide use of robotic systems contributed to developing robotic software highly coupled to the hardware platform running the robotic system. Due to increased maintenance cost or changing business priorities, the robotic hardware is infrequently upgraded, thus increasing the risk for technology stagnation. Reducing this risk entails migrating the system and its software to a new hardware platform. Conventional software engineering practices such as complete re-development and code-based migration, albeit useful in mitigating these obsolescence issues, they are time-consuming and overly expensive. Our RoboSMi model-driven approach supports the migration of the software controlling a robotic system between hardware platforms. First, RoboSMi executes static analysis on the robotic software of the source hardware platform to identify platform-dependent and platform-agnostic software constructs. By analysing a model that expresses the architecture of robotic components on the target platform, RoboSMi establishes the hardware configuration of those components and suggests software libraries for each component whose execution will enable the robotic software to control the components. Finally, RoboSMi through code-generation produces software for the target platform and indicates areas that require manual intervention by robotic engineers to complete the migration. We evaluate the applicability of RoboSMi and analyse the level of automation and performance provided from its use by migrating two robotic systems deployed for an environmental monitoring and a line following mission from a Propeller Activity Board to an Arduino Uno.
△ Less
Submitted 5 August, 2020;
originally announced August 2020.
-
Genetic Improvement @ ICSE 2020
Authors:
William B. Langdon,
Westley Weimer,
Justyna Petke,
Erik Fredericks,
Seongmin Lee,
Emily Winter,
Michail Basios,
Myra B. Cohen,
Aymeric Blot,
Markus Wagner,
Bobby R. Bruce,
Shin Yoo,
Simos Gerasimou,
Oliver Krauss,
Yu Huang,
Michael Gerten
Abstract:
Following Prof. Mark Harman of Facebook's keynote and formal presentations (which are recorded in the proceedings) there was a wide ranging discussion at the eighth international Genetic Improvement workshop, GI-2020 @ ICSE (held as part of the 42nd ACM/IEEE International Conference on Software Engineering on Friday 3rd July 2020). Topics included industry take up, human factors, explainabiloity (…
▽ More
Following Prof. Mark Harman of Facebook's keynote and formal presentations (which are recorded in the proceedings) there was a wide ranging discussion at the eighth international Genetic Improvement workshop, GI-2020 @ ICSE (held as part of the 42nd ACM/IEEE International Conference on Software Engineering on Friday 3rd July 2020). Topics included industry take up, human factors, explainabiloity (explainability, justifyability, exploitability) and GI benchmarks. We also contrast various recent online approaches (e.g. SBST 2020) to holding virtual computer science conferences and workshops via the WWW on the Internet without face-2-face interaction. Finally we speculate on how the Coronavirus Covid-19 Pandemic will affect research next year and into the future.
△ Less
Submitted 31 July, 2020;
originally announced July 2020.
-
Importance-Driven Deep Learning System Testing
Authors:
Simos Gerasimou,
Hasan Ferit Eniser,
Alper Sen,
Alper Cakan
Abstract:
Deep Learning (DL) systems are key enablers for engineering intelligent applications due to their ability to solve complex tasks such as image recognition and machine translation. Nevertheless, using DL systems in safety- and security-critical applications requires to provide testing evidence for their dependable operation. Recent research in this direction focuses on adapting testing criteria fro…
▽ More
Deep Learning (DL) systems are key enablers for engineering intelligent applications due to their ability to solve complex tasks such as image recognition and machine translation. Nevertheless, using DL systems in safety- and security-critical applications requires to provide testing evidence for their dependable operation. Recent research in this direction focuses on adapting testing criteria from traditional software engineering as a means of increasing confidence for their correct behaviour. However, they are inadequate in capturing the intrinsic properties exhibited by these systems. We bridge this gap by introducing DeepImportance, a systematic testing methodology accompanied by an Importance-Driven (IDC) test adequacy criterion for DL systems. Applying IDC enables to establish a layer-wise functional understanding of the importance of DL system components and use this information to assess the semantic diversity of a test set. Our empirical evaluation on several DL systems, across multiple DL datasets and with state-of-the-art adversarial generation techniques demonstrates the usefulness and effectiveness of DeepImportance and its ability to support the engineering of more robust DL systems.
△ Less
Submitted 9 February, 2020;
originally announced February 2020.
-
Software Engineering for Intelligent and Autonomous Systems: Report from the GI Dagstuhl Seminar 18343
Authors:
Simos Gerasimou,
Thomas Vogel,
Ada Diaconescu
Abstract:
Software systems are increasingly used in application domains characterised by uncertain environments, evolving requirements and unexpected failures; sudden system malfunctioning raises serious issues of security, safety, loss of comfort or revenue. During operation, these systems will likely need to deal with several unpredictable situations including variations in system performance, sudden chan…
▽ More
Software systems are increasingly used in application domains characterised by uncertain environments, evolving requirements and unexpected failures; sudden system malfunctioning raises serious issues of security, safety, loss of comfort or revenue. During operation, these systems will likely need to deal with several unpredictable situations including variations in system performance, sudden changes in system workload and component failures. These situations can cause deviation from the desired system behaviour and require dynamic adaptation of the system behaviour, parameters or architecture. Through using closed-loop control, typically realized with software, intelligent and autonomous software systems can dynamically adapt themselves, without any or with limited human involvement, by identifying abnormal situations, analysing alternative adaptation options, and finally, self-adapting to a suitable new configuration. This report summarises the research carried out during SEfIAS GI Dagstuhl seminar which provided a forum for strengthening interaction and collaboration for early-career researchers and practitioners from the research communities of SEAMS, ICAC/ICCAC, SASO, Self-Aware Computing and AAMAS.
△ Less
Submitted 2 April, 2019; v1 submitted 2 April, 2019;
originally announced April 2019.
-
DeepFault: Fault Localization for Deep Neural Networks
Authors:
Hasan Ferit Eniser,
Simos Gerasimou,
Alper Sen
Abstract:
Deep Neural Networks (DNNs) are increasingly deployed in safety-critical applications including autonomous vehicles and medical diagnostics. To reduce the residual risk for unexpected DNN behaviour and provide evidence for their trustworthy operation, DNNs should be thoroughly tested. The DeepFault whitebox DNN testing approach presented in our paper addresses this challenge by employing suspiciou…
▽ More
Deep Neural Networks (DNNs) are increasingly deployed in safety-critical applications including autonomous vehicles and medical diagnostics. To reduce the residual risk for unexpected DNN behaviour and provide evidence for their trustworthy operation, DNNs should be thoroughly tested. The DeepFault whitebox DNN testing approach presented in our paper addresses this challenge by employing suspiciousness measures inspired by fault localization to establish the hit spectrum of neurons and identify suspicious neurons whose weights have not been calibrated correctly and thus are considered responsible for inadequate DNN performance. DeepFault also uses a suspiciousness-guided algorithm to synthesize new inputs, from correctly classified inputs, that increase the activation values of suspicious neurons. Our empirical evaluation on several DNN instances trained on MNIST and CIFAR-10 datasets shows that DeepFault is effective in identifying suspicious neurons. Also, the inputs synthesized by DeepFault closely resemble the original inputs, exercise the identified suspicious neurons and are highly adversarial.
△ Less
Submitted 15 February, 2019;
originally announced February 2019.
-
Engineering Trustworthy Self-Adaptive Software with Dynamic Assurance Cases
Authors:
Radu Calinescu,
Danny Weyns,
Simos Gerasimou,
M. Usman Iftikhar,
Ibrahim Habli,
Tim Kelly
Abstract:
Building on concepts drawn from control theory, self-adaptive software handles environmental and internal uncertainties by dynamically adjusting its architecture and parameters in response to events such as workload changes and component failures. Self-adaptive software is increasingly expected to meet strict functional and non-functional requirements in applications from areas as diverse as manuf…
▽ More
Building on concepts drawn from control theory, self-adaptive software handles environmental and internal uncertainties by dynamically adjusting its architecture and parameters in response to events such as workload changes and component failures. Self-adaptive software is increasingly expected to meet strict functional and non-functional requirements in applications from areas as diverse as manufacturing, healthcare and finance. To address this need, we introduce a methodology for the systematic ENgineering of TRUstworthy Self-adaptive sofTware (ENTRUST). ENTRUST uses a combination of (1) design-time and runtime modelling and verification, and (2) industry-adopted assurance processes to develop trustworthy self-adaptive software and assurance cases arguing the suitability of the software for its intended application. To evaluate the effectiveness of our methodology, we present a tool-supported instance of ENTRUST and its use to develop proof-of-concept self-adaptive software for embedded and service-based systems from the oceanic monitoring and e-finance domains, respectively. The experimental results show that ENTRUST can be used to engineer self-adaptive software systems in different application domains and to generate dynamic assurance cases for these systems.
△ Less
Submitted 22 November, 2018; v1 submitted 18 March, 2017;
originally announced March 2017.