
Naming the Pain in Machine Learning-Enabled Systems Engineering

Authors: Marcos Kalinowski, Daniel Mendez, Görkem Giray, Antonio Pedro Santos Alves, Kelly Azevedo, Tatiana Escovedo, Hugo Villamizar, Helio Lopes, Teresa Baldassarre, Stefan Wagner, Stefan Biffl, Jürgen Musil, Michael Felderer, Niklas Lavesson, Tony Gorschek

Abstract: Context: Machine learning (ML)-enabled systems are being increasingly adopted by companies aiming to enhance their products and operational processes. Objective: This paper aims to deliver a comprehensive overview of the current status quo of engineering ML-enabled systems and lay the foundation to steer practically relevant and problem-driven academic research. Method: We conducted an international survey to collect insights from practitioners on the current practices and problems in engineering ML-enabled systems. We received 188 complete responses from 25 countries. We conducted quantitative statistical analyses on contemporary practices using bootstrapping with confidence intervals and qualitative analyses on the reported problems using open and axial coding procedures. Results: Our survey results reinforce and extend existing empirical evidence on engineering ML-enabled systems, providing additional insights into typical ML-enabled systems project contexts, the perceived relevance and complexity of ML life cycle phases, and current practices related to problem understanding, model deployment, and model monitoring. Furthermore, the qualitative analysis provides a detailed map of the problems practitioners face within each ML life cycle phase and the problems causing overall project failure. Conclusions: The results contribute to a better understanding of the status quo and problems in practical environments. We advocate for the further adaptation and dissemination of software engineering practices to enhance the engineering of ML-enabled systems.

Submitted 20 May, 2024; originally announced June 2024.

Comments: arXiv admin note: text overlap with arXiv:2310.06726

arXiv:2405.13244 [pdf, other]

Quantum Software Ecosystem Design

Authors: Achim Basermann, Michael Epping, Benedikt Fauseweh, Michael Felderer, Elisabeth Lobe, Melven Röhrig-Zöllner, Gary Schmiedinghoff, Peter K. Schuhmacher, Yoshinta Setyawati, Alexander Weinert

Abstract: The rapid advancements in quantum computing necessitate a scientific and rigorous approach to the construction of a corresponding software ecosystem, a topic underexplored and primed for systematic investigation. This chapter takes an important step in this direction: It presents scientific considerations essential for building a quantum software ecosystem that makes quantum computing available for scientific and industrial problem solving. Central to this discourse is the concept of hardware-software co-design, which fosters a bidirectional feedback loop from the application layer at the top of the software stack down to the hardware. This approach begins with compilers and low-level software that are specifically designed to align with the unique specifications and constraints of the quantum processor, proceeds with algorithms developed with a clear understanding of underlying hardware and computational model features, and extends to applications that effectively leverage the capabilities to achieve a quantum advantage. We analyze the ecosystem from two critical perspectives: the conceptual view, focusing on theoretical foundations, and the technical infrastructure, addressing practical implementations around real quantum devices necessary for a functional ecosystem. This approach ensures that the focus is towards promising applications with optimized algorithm-circuit synergy, while ensuring a user-friendly design, an effective data management and an overall orchestration. Our chapter thus offers a guide to the essential concepts and practical strategies necessary for developing a scientifically grounded quantum software ecosystem.

Submitted 21 May, 2024; originally announced May 2024.

arXiv:2404.14364 [pdf, other]

Toward Research Software Categories

Authors: Wilhelm Hasselbring, Stephan Druskat, Jan Bernoth, Philine Betker, Michael Felderer, Stephan Ferenz, Anna-Lena Lamprecht, Jan Linxweiler, Bernhard Rumpe

Abstract: Research software has been categorized in different contexts to serve different goals. We start with a look at what research software is, before we discuss the purpose of research software categories. We propose a multi-dimensional categorization of research software. We present a template for characterizing such categories. As selected dimensions, we present our proposed role-based, developer-based, and maturity-based categories. Since our work has been inspired by various previous efforts to categorize research software, we discuss them as related works. We characterize all these categories via the previously introduced template, to enable a systematic comparison.

Submitted 22 April, 2024; originally announced April 2024.

Comments: 10 pages

ACM Class: D.2

arXiv:2402.05333 [pdf]

ML-Enabled Systems Model Deployment and Monitoring: Status Quo and Problems

Authors: Eduardo Zimelewicz, Marcos Kalinowski, Daniel Mendez, Görkem Giray, Antonio Pedro Santos Alves, Niklas Lavesson, Kelly Azevedo, Hugo Villamizar, Tatiana Escovedo, Helio Lopes, Stefan Biffl, Juergen Musil, Michael Felderer, Stefan Wagner, Teresa Baldassarre, Tony Gorschek

Abstract: [Context] Systems incorporating Machine Learning (ML) models, often called ML-enabled systems, have become commonplace. However, empirical evidence on how ML-enabled systems are engineered in practice is still limited, especially for activities surrounding ML model dissemination. [Goal] We investigate contemporary industrial practices and problems related to ML model dissemination, focusing on the model deployment and the monitoring of ML life cycle phases. [Method] We conducted an international survey to gather practitioner insights on how ML-enabled systems are engineered. We gathered a total of 188 complete responses from 25 countries. We analyze the status quo and problems reported for the model deployment and monitoring phases. We analyzed contemporary practices using bootstrapping with confidence intervals and conducted qualitative analyses on the reported problems applying open and axial coding procedures. [Results] Practitioners perceive the model deployment and monitoring phases as relevant and difficult. With respect to model deployment, models are typically deployed as separate services, with limited adoption of MLOps principles. Reported problems include difficulties in designing the architecture of the infrastructure for production deployment and legacy application integration. Concerning model monitoring, many models in production are not monitored. The main monitored aspects are inputs, outputs, and decisions. Reported problems involve the absence of monitoring practices, the need to create custom monitoring tools, and the selection of suitable metrics. [Conclusion] Our results help provide a better understanding of the adopted practices and problems in practice and support guiding ML deployment and monitoring research in a problem-driven manner.

Submitted 7 February, 2024; originally announced February 2024.

Comments: arXiv admin note: text overlap with arXiv:2310.06726

arXiv:2310.06939 [pdf, other]

On the Role of Font Formats in Building Efficient Web Applications

Authors: Benedikt Dornauer, Wolfgang Vigl, Michael Felderer

Abstract: The success of a web application is closely linked to its performance, which positively impacts user satisfaction and contributes to energy-saving efforts. Among the various optimization techniques, one specific subject focuses on improving the utilization of web fonts. This study investigates the impact of different font formats on client-side resource consumption, such as CPU, memory, load time, and energy. In a controlled experiment, we evaluate performance metrics using the four font formats: OTF, TTF, WOFF, and WOFF2. The results of the study show that there are significant differences between all pair-wise format comparisons regarding all performance metrics. Overall, WOFF2 performs best, except in terms of memory allocation. Through the study and examination of literature, this research contributes (1) an overview of methodologies to enhance web performance through font utilization, (2) a specific exploration of the four prevalent font formats in an experimental setup, and (3) practical recommendations for scientific professionals and practitioners.

Submitted 10 October, 2023; originally announced October 2023.

Comments: Preprint: Product-Focused Software Process Improvement 24th International Conference, PROFES 2023, Dornbirn, Austria, December 10-13, 2023, Proceedings

arXiv:2310.06726 [pdf]

Status Quo and Problems of Requirements Engineering for Machine Learning: Results from an International Survey

Authors: Antonio Pedro Santos Alves, Marcos Kalinowski, Görkem Giray, Daniel Mendez, Niklas Lavesson, Kelly Azevedo, Hugo Villamizar, Tatiana Escovedo, Helio Lopes, Stefan Biffl, Jürgen Musil, Michael Felderer, Stefan Wagner, Teresa Baldassarre, Tony Gorschek

Abstract: Systems that use Machine Learning (ML) have become commonplace for companies that want to improve their products and processes. Literature suggests that Requirements Engineering (RE) can help address many problems when engineering ML-enabled systems. However, the state of empirical evidence on how RE is applied in practice in the context of ML-enabled systems is mainly dominated by isolated case studies with limited generalizability. We conducted an international survey to gather practitioner insights into the status quo and problems of RE in ML-enabled systems. We gathered 188 complete responses from 25 countries. We conducted quantitative statistical analyses on contemporary practices using bootstrapping with confidence intervals and qualitative analyses on the reported problems involving open and axial coding procedures. We found significant differences in RE practices within ML projects. For instance, (i) RE-related activities are mostly conducted by project leaders and data scientists, (ii) the prevalent requirements documentation format concerns interactive Notebooks, (iii) the main focus of non-functional requirements includes data quality, model reliability, and model explainability, and (iv) main challenges include managing customer expectations and aligning requirements with data. The qualitative analyses revealed that practitioners face problems related to lack of business domain understanding, unclear goals and requirements, low customer engagement, and communication issues. These results help to provide a better understanding of the adopted practices and of which problems exist in practical environments. We put forward the need to adapt further and disseminate RE-related practices for engineering ML-enabled systems.

Submitted 10 October, 2023; originally announced October 2023.

Comments: Accepted for Publication at PROFES 2023

arXiv:2310.00788 [pdf, other]

Web Image Formats: Assessment of Their Real-World-Usage and Performance across Popular Web Browsers

Authors: Benedikt Dornauer, Michael Felderer

Abstract: In 2023, images on the web make up 41% of transmitted data, significantly impacting the performance of web apps. Fortunately, image formats like WEBP and AVIF could offer advanced compression and faster page loading, but may face performance disparities across browsers. Therefore, we conducted performance evaluations on five major browsers - Chrome, Edge, Safari, Opera, and Firefox - while comparing four image formats. The results indicate that the newer formats exhibited notable performance enhancements across all browsers, leading to shorter loading times. Compared to the compressed JPEG format, WEBP and AVIF improved the Page Load Time by 21% and 15%, respectively. However, web scraping revealed that JPEG and PNG still dominate web image choices, with WEBP at 4% as the most used new format. Through the web scraping and web performance evaluation, this research serves to (1) explore image format preferences in web applications and analyze distribution and characteristics across frequently-visited sites in 2023 and (2) assess the performance impact of distinct web image formats on application load times across popular web browsers.

Submitted 1 October, 2023; originally announced October 2023.

Comments: Preprint: Product-Focused Software Process Improvement 24th International Conference, PROFES 2023, Dornbirn, Austria, December 10-13, 2023, Proceedings

arXiv:2310.00654 [pdf, other]

Streamlining Attack Tree Generation: A Fragment-Based Approach

Authors: Irdin Pekaric, Markus Frick, Jubril Gbolahan Adigun, Raffaela Groner, Thomas Witte, Alexander Raschke, Michael Felderer, Matthias Tichy

Abstract: Attack graphs are a tool for analyzing security vulnerabilities that capture different and prospective attacks on a system. As a threat modeling tool, it shows possible paths that an attacker can exploit to achieve a particular goal. However, due to the large number of vulnerabilities that are published on a daily basis, they have the potential to rapidly expand in size. Consequently, this necessitates a significant amount of resources to generate attack graphs. In addition, generating composited attack models for complex systems such as self-adaptive or AI is very difficult due to their nature to continuously change. In this paper, we present a novel fragment-based attack graph generation approach that utilizes information from publicly available information security databases. Furthermore, we also propose a domain-specific language for attack modeling, which we employ in the proposed attack graph generation approach. Finally, we present a demonstrator example showcasing the attack generator's capability to replicate a verified attack chain, as previously confirmed by security experts.

Submitted 1 October, 2023; originally announced October 2023.

Comments: To appear at the 57th Hawaii International Conference on System Sciences (HICSS-57), Honolulu, Hawaii. 2024

arXiv:2309.09941 [pdf, other]

doi 10.1007/978-3-031-40923-3_9

Model-Based Generation of Attack-Fault Trees

Authors: Raffaela Groner, Thomas Witte, Alexander Raschke, Sophie Hirn, Irdin Pekaric, Markus Frick, Matthias Tichy, Michael Felderer

Abstract: Joint safety and security analysis of cyber-physical systems is a necessary step to correctly capture inter-dependencies between these properties. Attack-Fault Trees represent a combination of dynamic Fault Trees and Attack Trees and can be used to model and model-check a holistic view on both safety and security. Manually creating a complete AFT for the whole system is, however, a daunting task. It needs to span multiple abstraction layers, e.g., abstract application architecture and data flow as well as system and library dependencies that are affected by various vulnerabilities. We present an AFT generation tool-chain that facilitates this task using partial Fault and Attack Trees that are either manually created or mined from vulnerability databases. We semi-automatically create two system models that provide the necessary information to automatically combine these partial Fault and Attack Trees into complete AFTs using graph transformation rules.