subscribe to arXiv mailings

Exploring Human-AI Collaboration in Agile: Customised LLM Meeting Assistants

Authors: Beatriz Cabrero-Daniel, Tomas Herda, Victoria Pichler, Martin Eder

Abstract: This action research study focuses on the integration of "AI assistants" in two Agile software development meetings: the Daily Scrum and a feature refinement, a planning meeting that is part of an in-house Scaled Agile framework. We discuss the critical drivers of success, and establish a link between the use of AI and team collaboration dynamics. We conclude with a list of lessons learnt during t… ▽ More This action research study focuses on the integration of "AI assistants" in two Agile software development meetings: the Daily Scrum and a feature refinement, a planning meeting that is part of an in-house Scaled Agile framework. We discuss the critical drivers of success, and establish a link between the use of AI and team collaboration dynamics. We conclude with a list of lessons learnt during the interventions in an industrial context, and provide a assessment checklist for companies and teams to reflect on their readiness level. This paper is thus a road-map to facilitate the integration of AI tools in Agile setups. △ Less

Submitted 23 April, 2024; originally announced April 2024.

Comments: International Conference on Agile Software Development (XP 2024), 14 pages

arXiv:2403.16289 [pdf, other]

Engineering Safety Requirements for Autonomous Driving with Large Language Models

Authors: Ali Nouri, Beatriz Cabrero-Daniel, Fredrik Törner, Hȧkan Sivencrona, Christian Berger

Abstract: Changes and updates in the requirement artifacts, which can be frequent in the automotive domain, are a challenge for SafetyOps. Large Language Models (LLMs), with their impressive natural language understanding and generating capabilities, can play a key role in automatically refining and decomposing requirements after each update. In this study, we propose a prototype of a pipeline of prompts an… ▽ More Changes and updates in the requirement artifacts, which can be frequent in the automotive domain, are a challenge for SafetyOps. Large Language Models (LLMs), with their impressive natural language understanding and generating capabilities, can play a key role in automatically refining and decomposing requirements after each update. In this study, we propose a prototype of a pipeline of prompts and LLMs that receives an item definition and outputs solutions in the form of safety requirements. This pipeline also performs a review of the requirement dataset and identifies redundant or contradictory requirements. We first identified the necessary characteristics for performing HARA and then defined tests to assess an LLM's capability in meeting these criteria. We used design science with multiple iterations and let experts from different companies evaluate each cycle quantitatively and qualitatively. Finally, the prototype was implemented at a case company and the responsible team evaluated its efficiency. △ Less

Submitted 24 March, 2024; originally announced March 2024.

Comments: Accepted in 32nd IEEE International Requirements Engineering 2024 conference, Iceland

arXiv:2403.09565 [pdf, other]

doi 10.1145/3644815.3644953

Welcome Your New AI Teammate: On Safety Analysis by Leashing Large Language Models

Authors: Ali Nouri, Beatriz Cabrero-Daniel, Fredrik Törner, Hȧkan Sivencrona, Christian Berger

Abstract: DevOps is a necessity in many industries, including the development of Autonomous Vehicles. In those settings, there are iterative activities that reduce the speed of SafetyOps cycles. One of these activities is "Hazard Analysis & Risk Assessment" (HARA), which is an essential step to start the safety requirements specification. As a potential approach to increase the speed of this step in SafetyO… ▽ More DevOps is a necessity in many industries, including the development of Autonomous Vehicles. In those settings, there are iterative activities that reduce the speed of SafetyOps cycles. One of these activities is "Hazard Analysis & Risk Assessment" (HARA), which is an essential step to start the safety requirements specification. As a potential approach to increase the speed of this step in SafetyOps, we have delved into the capabilities of Large Language Models (LLMs). Our objective is to systematically assess their potential for application in the field of safety engineering. To that end, we propose a framework to support a higher degree of automation of HARA with LLMs. Despite our endeavors to automate as much of the process as possible, expert review remains crucial to ensure the validity and correctness of the analysis results, with necessary modifications made accordingly. △ Less

Submitted 14 March, 2024; originally announced March 2024.

Comments: Accepted in CAIN 2024, 6 pages, 1 figure

arXiv:2401.12611 [pdf, other]

Prompt Smells: An Omen for Undesirable Generative AI Outputs

Authors: Krishna Ronanki, Beatriz Cabrero-Daniel, Christian Berger

Abstract: Recent Generative Artificial Intelligence (GenAI) trends focus on various applications, including creating stories, illustrations, poems, articles, computer code, music compositions, and videos. Extrinsic hallucinations are a critical limitation of such GenAI, which can lead to significant challenges in achieving and maintaining the trustworthiness of GenAI. In this paper, we propose two new conce… ▽ More Recent Generative Artificial Intelligence (GenAI) trends focus on various applications, including creating stories, illustrations, poems, articles, computer code, music compositions, and videos. Extrinsic hallucinations are a critical limitation of such GenAI, which can lead to significant challenges in achieving and maintaining the trustworthiness of GenAI. In this paper, we propose two new concepts that we believe will aid the research community in addressing limitations associated with the application of GenAI models. First, we propose a definition for the "desirability" of GenAI outputs and three factors which are observed to influence it. Second, drawing inspiration from Martin Fowler's code smells, we propose the concept of "prompt smells" and the adverse effects they are observed to have on the desirability of GenAI outputs. We expect our work will contribute to the ongoing conversation about the desirability of GenAI outputs and help advance the field in a meaningful way. △ Less

Submitted 23 January, 2024; originally announced January 2024.

Comments: Accepted at CAIN 2024: Poster Track

arXiv:2311.10736 [pdf, other]

Systematic Evaluation of Applying Space-Filling Curves to Automotive Maneuver Detection

Authors: Christian Berger, Beatriz Cabrero-Daniel, M. Cagri Kaya, Maryam Esmaeili Darestani, Hannah Shiels

Abstract: Identifying driving maneuvers plays an essential role on-board vehicles to monitor driving and driver states, as well as off-board to train and evaluate machine learning algorithms for automated driving for example. Maneuvers can be characterized by vehicle kinematics or data from its surroundings including other traffic participants. Extracting relevant maneuvers therefore requires analyzing time… ▽ More Identifying driving maneuvers plays an essential role on-board vehicles to monitor driving and driver states, as well as off-board to train and evaluate machine learning algorithms for automated driving for example. Maneuvers can be characterized by vehicle kinematics or data from its surroundings including other traffic participants. Extracting relevant maneuvers therefore requires analyzing time-series of (i) structured, multi-dimensional kinematic data, and (ii) unstructured, large data samples for video, radar, or LiDAR sensors. However, such data analysis requires scalable and computationally efficient approaches, especially for non-annotated data. In this paper, we are presenting a maneuver detection approach based on two variants of space-filling curves (Z-order and Hilbert) to detect maneuvers when passing roundabouts that do not use GPS data. We systematically evaluate their respective performance by including permutations of selections of kinematic signals at varying frequencies and compare them with two alternative baselines: All manually identified roundabouts, and roundabouts that are marked by geofences. We find that encoding just longitudinal and lateral accelerations sampled at 10Hz using a Hilbert space-filling curve is already successfully identifying roundabout maneuvers, which allows to avoid the use of potentially sensitive signals such as GPS locations to comply with data protection and privacy regulations like GDPR. △ Less

Submitted 23 October, 2023; originally announced November 2023.

Comments: 7 pages, 4 figures

arXiv:2311.03832 [pdf, other]

Requirements Engineering using Generative AI: Prompts and Prompting Patterns

Authors: Krishna Ronanki, Beatriz Cabrero-Daniel, Jennifer Horkoff, Christian Berger

Abstract: [Context]: Companies are increasingly recognizing the importance of automating Requirements Engineering (RE) tasks due to their resource-intensive nature. The advent of GenAI has made these tasks more amenable to automation, thanks to its ability to understand and interpret context effectively. [Problem]: However, in the context of GenAI, prompt engineering is a critical factor for success. Despit… ▽ More [Context]: Companies are increasingly recognizing the importance of automating Requirements Engineering (RE) tasks due to their resource-intensive nature. The advent of GenAI has made these tasks more amenable to automation, thanks to its ability to understand and interpret context effectively. [Problem]: However, in the context of GenAI, prompt engineering is a critical factor for success. Despite this, we currently lack tools and methods to systematically assess and determine the most effective prompt patterns to employ for a particular RE task. [Method]: Two tasks related to requirements, specifically requirement classification and tracing, were automated using the GPT-3.5 turbo API. The performance evaluation involved assessing various prompts created using 5 prompt patterns and implemented programmatically to perform the selected RE tasks, focusing on metrics such as precision, recall, accuracy, and F-Score. [Results]: This paper evaluates the effectiveness of the 5 prompt patterns' ability to make GPT-3.5 turbo perform the selected RE tasks and offers recommendations on which prompt pattern to use for a specific RE task. Additionally, it also provides an evaluation framework as a reference for researchers and practitioners who want to evaluate different prompt patterns for different RE tasks. △ Less

Submitted 7 November, 2023; originally announced November 2023.

arXiv:2310.18648 [pdf, other]

Generative Artificial Intelligence for Software Engineering -- A Research Agenda

Authors: Anh Nguyen-Duc, Beatriz Cabrero-Daniel, Adam Przybylek, Chetan Arora, Dron Khanna, Tomas Herda, Usman Rafiq, Jorge Melegati, Eduardo Guerra, Kai-Kristian Kemell, Mika Saari, Zheying Zhang, Huy Le, Tho Quan, Pekka Abrahamsson

Abstract: Generative Artificial Intelligence (GenAI) tools have become increasingly prevalent in software development, offering assistance to various managerial and technical project activities. Notable examples of these tools include OpenAIs ChatGPT, GitHub Copilot, and Amazon CodeWhisperer. Although many recent publications have explored and evaluated the application of GenAI, a comprehensive understandin… ▽ More Generative Artificial Intelligence (GenAI) tools have become increasingly prevalent in software development, offering assistance to various managerial and technical project activities. Notable examples of these tools include OpenAIs ChatGPT, GitHub Copilot, and Amazon CodeWhisperer. Although many recent publications have explored and evaluated the application of GenAI, a comprehensive understanding of the current development, applications, limitations, and open challenges remains unclear to many. Particularly, we do not have an overall picture of the current state of GenAI technology in practical software engineering usage scenarios. We conducted a literature review and focus groups for a duration of five months to develop a research agenda on GenAI for Software Engineering. We identified 78 open Research Questions (RQs) in 11 areas of Software Engineering. Our results show that it is possible to explore the adoption of GenAI in partial automation and support decision-making in all software development activities. While the current literature is skewed toward software implementation, quality assurance and software maintenance, other areas, such as requirements engineering, software design, and software engineering education, would need further research attention. Common considerations when implementing GenAI include industry-level assessment, dependability and accuracy, data accessibility, transparency, and sustainability aspects associated with the technology. GenAI is bringing significant changes to the field of software engineering. Nevertheless, the state of research on the topic still remains immature. We believe that this research agenda holds significance and practical value for informing both researchers and practitioners about current applications and guiding future research. △ Less

Submitted 28 October, 2023; originally announced October 2023.

arXiv:2306.12132 [pdf, other]

ChatGPT as a tool for User Story Quality Evaluation: Trustworthy Out of the Box?

Authors: Krishna Ronanki, Beatriz Cabrero-Daniel, Christian Berger

Abstract: In Agile software development, user stories play a vital role in capturing and conveying end-user needs, prioritizing features, and facilitating communication and collaboration within development teams. However, automated methods for evaluating user stories require training in NLP tools and can be time-consuming to develop and integrate. This study explores using ChatGPT for user story quality eva… ▽ More In Agile software development, user stories play a vital role in capturing and conveying end-user needs, prioritizing features, and facilitating communication and collaboration within development teams. However, automated methods for evaluating user stories require training in NLP tools and can be time-consuming to develop and integrate. This study explores using ChatGPT for user story quality evaluation and compares its performance with an existing benchmark. Our study shows that ChatGPT's evaluation aligns well with human evaluation, and we propose a ``best of three'' strategy to improve its output stability. We also discuss the concept of trustworthiness in AI and its implications for non-experts using ChatGPT's unprocessed outputs. Our research contributes to understanding the reliability and applicability of AI in user story evaluation and offers recommendations for future research. △ Less

Submitted 21 June, 2023; originally announced June 2023.

Comments: 9 Pages, 2 Tables, 1 Figure. Accepted at AI-Assisted Agile Software Development Workshop (Co-located with XP 2023)

arXiv:2306.10054 [pdf]

doi 10.1162/leon_a_02523

A Shift In Artistic Practices through Artificial Intelligence

Authors: Kıvanç Tatar, Petter Ericson, Kelsey Cotton, Paola Torres Núñez del Prado, Roser Batlle-Roca, Beatriz Cabrero-Daniel, Sara Ljungblad, Georgios Diapoulis, Jabbar Hussain

Abstract: The explosion of content generated by artificial intelligence (AI) models has initiated a cultural shift in arts, music, and media, whereby roles are changing, values are shifting, and conventions are challenged. The vast, readily available dataset of the Internet has created an environment for AI models to be trained on any content on the Web. With AI models shared openly and used by many globall… ▽ More The explosion of content generated by artificial intelligence (AI) models has initiated a cultural shift in arts, music, and media, whereby roles are changing, values are shifting, and conventions are challenged. The vast, readily available dataset of the Internet has created an environment for AI models to be trained on any content on the Web. With AI models shared openly and used by many globally, how does this new paradigm shift challenge the status quo in artistic practices? What kind of changes will AI technology bring to music, arts, and new media? △ Less

Submitted 10 April, 2024; v1 submitted 13 June, 2023; originally announced June 2023.

Journal ref: Leonardo, 2024, 293-97

arXiv:2306.01774 [pdf, other]

doi 10.1145/3597512.3599697

RE-centric Recommendations for the Development of Trustworthy(er) Autonomous Systems

Authors: Krishna Ronanki, Beatriz Cabrero-Daniel, Jennifer Horkoff, Christian Berger

Abstract: Complying with the EU AI Act (AIA) guidelines while developing and implementing AI systems will soon be mandatory within the EU. However, practitioners lack actionable instructions to operationalise ethics during AI systems development. A literature review of different ethical guidelines revealed inconsistencies in the principles addressed and the terminology used to describe them. Furthermore, re… ▽ More Complying with the EU AI Act (AIA) guidelines while developing and implementing AI systems will soon be mandatory within the EU. However, practitioners lack actionable instructions to operationalise ethics during AI systems development. A literature review of different ethical guidelines revealed inconsistencies in the principles addressed and the terminology used to describe them. Furthermore, requirements engineering (RE), which is identified to foster trustworthiness in the AI development process from the early stages was observed to be absent in a lot of frameworks that support the development of ethical and trustworthy AI. This incongruous phrasing combined with a lack of concrete development practices makes trustworthy AI development harder. To address this concern, we formulated a comparison table for the terminology used and the coverage of the ethical AI principles in major ethical AI guidelines. We then examined the applicability of ethical AI development frameworks for performing effective RE during the development of trustworthy AI systems. A tertiary review and meta-analysis of literature discussing ethical AI frameworks revealed their limitations when developing trustworthy AI. Based on our findings, we propose recommendations to address such limitations during the development of trustworthy AI. △ Less

Submitted 5 January, 2024; v1 submitted 29 May, 2023; originally announced June 2023.

Comments: Accepted at [TAS '23]{First International Symposium on Trustworthy Autonomous Systems}

arXiv:2305.18176 [pdf, other]

doi 10.1145/3597512.3599715

Perceived Trustworthiness of Natural Language Generators

Authors: Beatriz Cabrero-Daniel, Andrea Sanagustín Cabrero

Abstract: Natural Language Generation tools, such as chatbots that can generate human-like conversational text, are becoming more common both for personal and professional use. However, there are concerns about their trustworthiness and ethical implications. The paper addresses the problem of understanding how different users (e.g., linguists, engineers) perceive and adopt these tools and their perception o… ▽ More Natural Language Generation tools, such as chatbots that can generate human-like conversational text, are becoming more common both for personal and professional use. However, there are concerns about their trustworthiness and ethical implications. The paper addresses the problem of understanding how different users (e.g., linguists, engineers) perceive and adopt these tools and their perception of machine-generated text quality. It also discusses the perceived advantages and limitations of Natural Language Generation tools, as well as users' beliefs on governance strategies. The main findings of this study include the impact of users' field and level of expertise on the perceived trust and adoption of Natural Language Generation tools, the users' assessment of the accuracy, fluency, and potential biases of machine-generated text in comparison to human-written text, and an analysis of the advantages and ethical risks associated with these tools as identified by the participants. Moreover, this paper discusses the potential implications of these findings for enhancing the AI development process. The paper sheds light on how different user characteristics shape their beliefs on the quality and overall trustworthiness of machine-generated text. Furthermore, it examines the benefits and risks of these tools from the perspectives of different users. △ Less

Submitted 29 May, 2023; originally announced May 2023.

Comments: 16 pages, 5 figures, First International Symposium on Trustworthy Autonomous Systems (TAS '23)

arXiv:2305.08093 [pdf, other]

AI for Agile development: a Meta-Analysis

Authors: Beatriz Cabrero-Daniel

Abstract: This study explores the benefits and challenges of integrating Artificial Intelligence with Agile software development methodologies, focusing on improving continuous integration and delivery. A systematic literature review and longitudinal meta-analysis of the retrieved studies was conducted to analyse the role of Artificial Intelligence and it's future applications within Agile software developm… ▽ More This study explores the benefits and challenges of integrating Artificial Intelligence with Agile software development methodologies, focusing on improving continuous integration and delivery. A systematic literature review and longitudinal meta-analysis of the retrieved studies was conducted to analyse the role of Artificial Intelligence and it's future applications within Agile software development. The review helped identify critical challenges, such as the need for specialised socio-technical expertise. While Artificial Intelligence holds promise for improved software development practices, further research is needed to better understand its impact on processes and practitioners, and to address the indirect challenges associated with its implementation. △ Less

Submitted 14 May, 2023; originally announced May 2023.

Comments: 8 pages, 2 figures, 24th International Conference on Agile Software Development. AI-Assisted Agile Software Development Research Workshop

MSC Class: 94-02

Showing 1–12 of 12 results for author: Cabrero-Daniel, B