arXiv:2405.10250

IntelliExplain: Enhancing Interactive Code Generation through Natural Language Explanations for Non-Professional Programmers

Authors: Hao Yan, Thomas D. LaToza, Ziyu Yao

Abstract: Large language models (LLMs) have exhibited a strong promise in automatically generating executable code from natural language descriptions, particularly with interactive features that allow users to engage in the code-generation process by instructing the LLM with iterative feedback. However, existing interaction paradigms often assume that users have expert knowledge to debug source code and are not optimized for non-professional programmers' use. This raises challenges in making interactive code generation more accessible for individuals with varying levels of programming expertise. To tackle these challenges, we present IntelliExplain, which offers a novel human-LLM interaction paradigm to enhance non-professional programmers' experience by enabling them to interact with source code via natural language explanations. Users interact with IntelliExplain by providing natural language corrective feedback on errors they identify from the explanations. The system uses this feedback to revise the code until the user is satisfied with the system's explanations of the code. Our user study demonstrates that users with IntelliExplain achieve a significantly higher success rate, 11.6% and 25.3% better than with vanilla GPT-3.5, while also requiring 39.0% and 15.6% less time in Text-to-SQL and Python code generation tasks, respectively.

Submitted 16 May, 2024; originally announced May 2024.

arXiv:2302.03287

doi 10.1109/ICSTW58534.2023.00078

ChatGPT and Software Testing Education: Promises & Perils

Authors: Sajed Jalil, Suzzana Rafi, Thomas D. LaToza, Kevin Moran, Wing Lam

Abstract: Over the past decade, predictive language modeling for code has proven to be a valuable tool for enabling new forms of automation for developers. More recently, we have seen the advent of general purpose "large language models", based on neural transformer architectures, that have been trained on massive datasets of human written text spanning code and natural language. However, despite the demonstrated representational power of such models, interacting with them has historically been constrained to specific task settings, limiting their general applicability. Many of these limitations were recently overcome with the introduction of ChatGPT, a language model created by OpenAI and trained to operate as a conversational agent, enabling it to answer questions and respond to a wide variety of commands from end users. The introduction of models, such as ChatGPT, has already spurred fervent discussion from educators, ranging from fear that students could use these AI tools to circumvent learning, to excitement about the new types of learning opportunities that they might unlock. However, given the nascent nature of these tools, we currently lack fundamental knowledge related to how well they perform in different educational settings, and the potential promise (or danger) that they might pose to traditional forms of instruction. As such, in this paper, we examine how well ChatGPT performs when tasked with answering common questions in a popular software testing curriculum.
Our findings indicate that ChatGPT can provide correct or partially correct answers in 55.6% of cases, provide correct or partially correct explanations of answers in 53.0% of cases, and that prompting the tool in a shared question context leads to a marginally higher rate of correct responses. Based on these findings, we discuss the potential promises and perils related to the use of ChatGPT by students and instructors.

Submitted 11 March, 2023; v1 submitted 7 February, 2023; originally announced February 2023.

Comments: 2023 IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW), 8 pages, 2 tables, 6 figures

ACM Class: D.2.5

Journal ref: 2023 IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW)

arXiv:2301.09789

A Qualitative Study on the Implementation Design Decisions of Developers

Authors: Jenny T. Liang, Maryam Arab, Minhyuk Ko, Amy J. Ko, Thomas D. LaToza

Abstract: Decision-making is a key software engineering skill. Developers constantly make choices throughout the software development process, from requirements to implementation. While prior work has studied developer decision-making, the choices made while choosing what solution to write in code remain understudied. In this mixed-methods study, we examine the phenomenon where developers select one specific way to implement a behavior in code, given many potential alternatives. We call these decisions implementation design decisions. Our mixed-methods study includes 46 survey responses and 14 semi-structured interviews with professional developers about their decision types, considerations, processes, and expertise for implementation design decisions. We find that implementation design decisions, rather than being a natural outcome from higher levels of design, require constant monitoring of higher level design choices, such as requirements and architecture. We also show that developers have a consistent general structure to their implementation decision-making process, but no single process is exactly the same. We discuss the implications of our findings on research, education, and practice, including insights on teaching developers how to make implementation design decisions.

Submitted 23 January, 2023; originally announced January 2023.

arXiv:2109.02682

Edit-Run Behavior in Programming and Debugging

Authors: Abdulaziz Alaboudi, Thomas D. LaToza

Abstract: As developers program and debug, they continuously edit and run their code, a behavior known as edit-run cycles. While techniques such as live programming are intended to support this behavior, little is known about the characteristics of edit-run cycles themselves. To bridge this gap, we analyzed 28 hours of programming and debugging work from 11 professional developers which encompassed over three thousand development activities. We mapped activities to edit or run steps, constructing 581 debugging and 207 programming edit-run cycles. We found that edit-run cycles are frequent. Developers edit and run the program, on average, 7 times before fixing a defect and twice before introducing a defect. Developers waited longer before again running the program when programming than debugging, with a mean cycle length of 3 minutes for programming and 1 minute for debugging. Most cycles involved an edit to a single file after which a developer ran the program to observe the impact on the final output. Edit-run cycles which included activities beyond edit and run, such as navigating between files, consulting resources, or interacting with other IDE features, were much longer, with a mean length of 5 minutes, rather than 1.5 minutes. We conclude with a discussion of design recommendations for tools to enable more fluidity in edit-run cycles.

Submitted 6 September, 2021; originally announced September 2021.

Comments: VL/HCC 2021

arXiv:2105.02162

An Exploratory Study of Debugging Episodes

Authors: Abdulaziz Alaboudi, Thomas D. LaToza

Abstract: Many studies have long investigated how developers debug, shaping our understanding of debugging and helping motivate the creation of more effective tools. However, less is known about the typical progression of debugging in real world settings. In this study, we focus on characterizing debugging episodes from the moment at which developers first encounter a defect to the moment at which it is resolved. We investigate the typical duration and frequency of debugging episodes and the typical activities which occur. We observed developers by watching professional developers at work in live-streamed programming sessions. Using this data source, we curated 15 sessions in which 11 professional developers worked for 30 hours. We then systematically coded the debugging episodes and activities that occurred within these videos, yielding a dataset of 2137 debugging activities and 1407 programming activities. We found that debugging was frequent, even in programming work, occurring once every eight minutes. Debugging episodes vary greatly in time, with most being less than a few minutes and a few as more than 100 minutes. However, most debugging time is spent in long debugging episodes. We found no single activity that dominated debugging time, and long debugging episodes often involved many diverse activities. Finally, we found that, in terms of the activities developers did, programming and debugging were remarkably similar, particularly in the frequency of editing and browsing code.

Submitted 5 May, 2021; originally announced May 2021.

arXiv:2009.05207

Can Microtask Programming Work in Industry?

Authors: Shinobu Saito, Yukako Iimura, Emad Aghayi, Thomas D. LaToza

Abstract: A critical issue in software development projects in IT service companies is finding the right people at the right time. By enabling assignments of tasks to people to be more fluid, the use of crowdsourcing approaches within a company offers a potential solution to this challenge. Inside a company, as multiple system development projects are ongoing separately, developers with slack time on one project might use this time to contribute to other projects. In this paper, we report on a case study of the application of crowdsourcing within an industrial web application system development project in a large telecommunications company. Developers worked with system specifications which were organized into a set of microtasks, offering a set of short and self-contained descriptions. When crowd workers in other projects had slack time, they fetched and completed microtasks. Our results offer initial evidence for the potential value of microtask programming in increasing the fluidity of team assignments within a company. Crowd contributors to the project were able to onboard and contribute to a new project in less than 2 hours. After onboarding, the crowd workers were together able to successfully implement a small program which contained only a small number of defects. Interview and survey data gathered from project participants revealed that crowd workers reported that they perceived onboarding costs to be reduced and did not experience issues with the reduced face to face communication, but experienced challenges with motivation.

Submitted 10 September, 2020; originally announced September 2020.

arXiv:2007.05902

doi 10.1109/VLHCC.2019.8818871

Editable AI: Mixed Human-AI Authoring of Code Patterns

Authors: Kartik Chugh, Andrea Y. Solis, Thomas D. LaToza

Abstract: Developers authoring HTML documents define elements following patterns which establish and reflect the visual structure of a document, such as making all images in a footer the same height by applying a class to each. To surface these patterns to developers and support developers in authoring consistent with these patterns, we propose a mixed human-AI technique for creating code patterns. Patterns are first learned from individual HTML documents through a decision tree, generating a representation which developers may view and edit. Code patterns are used to offer developers autocomplete suggestions, list examples, and flag violations. To evaluate our technique, we conducted a user study in which 24 participants wrote, edited, and corrected HTML documents. We found that our technique enabled developers to edit and correct documents more quickly and create, edit, and correct documents more successfully.

Submitted 11 July, 2020; originally announced July 2020.

Journal ref: 2019 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC), Memphis, TN, USA, 2019, pp. 35-43

arXiv:2007.05046

doi 10.1145/3368089.3409751

RulePad: Interactive Authoring of Checkable Design Rules

Authors: Sahar Mehrpour, Thomas D. LaToza, Hamed Sarvari

Abstract: Good documentation offers the promise of enabling developers to easily understand design decisions. Unfortunately, in practice, design documents are often rarely updated, becoming inaccurate, incomplete, and untrustworthy. A better solution is to enable developers to write down design rules which are checked against code for consistency. But existing rule checkers require learning specialized query languages or program analysis frameworks, creating a barrier to writing project-specific rules. We introduce two new techniques for authoring design rules: snippet-based authoring and semi-natural-language authoring. In snippet-based authoring, developers specify characteristics of elements to match by writing partial code snippets. In semi-natural language authoring, a textual representation offers a representation for understanding design rules and resolving ambiguities. We implemented these approaches in RulePad. To evaluate RulePad, we conducted a between-subjects study with 14 participants comparing RulePad to the PMD Designer, a utility for writing rules in a popular rule checker. We found that those with RulePad were able to successfully author 13 times more query elements in significantly less time and reported being significantly more willing to use RulePad in their everyday work.

Submitted 23 November, 2021; v1 submitted 9 July, 2020; originally announced July 2020.

arXiv:2005.13652

Using Hypotheses as a Debugging Aid

Authors: Abdulaziz Alaboudi, Thomas D. LaToza

Abstract: As developers debug, they formulate hypotheses about the cause of the defect and gather evidence to test these hypotheses. To better understand the role of hypotheses in debugging, we conducted two studies. In a preliminary study, we found that, even with the benefit of modern internet resources, incorrect hypotheses can cause developers to investigate irrelevant information and block progress. We then conducted a controlled experiment where 20 developers debugged and recorded their hypotheses. We found that developers have few hypotheses, two per defect. Having a correct hypothesis early strongly predicted later success. We also studied the impact of two debugging aids: fault locations and potential hypotheses. Offering fault locations did not help developers formulate more correct hypotheses or debug more successfully. In contrast, offering potential hypotheses made developers six times more likely to succeed. These results demonstrate the potential of future debugging tools that enable finding and sharing relevant hypotheses.

Submitted 27 May, 2020; originally announced May 2020.

Journal ref: IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC) 2020

arXiv:2004.00701

An Exploratory Study of Writing and Revising Explicit Programming Strategies

Authors: Maryam Arab, Thomas D LaToza, Amy J Ko

Abstract: Knowledge sharing plays a crucial role throughout all software application development activities. When programmers learn and share through media like Stack Overflow, GitHub, Meetups, videos, discussion forums, wikis, and blogs, every developer benefits. However, there is one kind of knowledge that developers share far less often: strategic knowledge for how to approach programming problems (e.g., how to debug server-side Python errors, how to resolve a merge conflict, how to evaluate the stability of an API one is considering for adoption). In this paper, we investigate the feasibility of developers articulating and sharing their strategic knowledge, and the use of these strategies to support other developers in their problem-solving. We specifically investigate challenges that developers face in articulating strategies in a form which other developers can use to increase their productivity. To observe this, we simulated a knowledge-sharing platform, asking experts to articulate one of their own strategies and then asking a second set of developers to try to use the strategies and provide feedback on the strategies to authors. During the study, we asked both strategy authors and users to reflect on the challenges they faced.
In analyzing the strategies authors created, the use of the strategies, the feedback that users provided to authors, and the difficulties that authors faced addressing this feedback, we found that developers can share strategic knowledge, but authoring strategies requires substantial feedback from diverse audiences to be helpful to programmers with varying prior knowledge. Our results also raise challenging questions about how future work should support searching and browsing for strategies that support varying prior knowledge.

Submitted 21 October, 2020; v1 submitted 1 April, 2020; originally announced April 2020.

arXiv:1911.00046

doi 10.1007/s10664-020-09810-1

Explicit Programming Strategies

Authors: Thomas D. LaToza, Maryam Arab, Dastyni Loksa, Amy J. Ko

Abstract: Software developers solve a diverse and wide range of problems. While software engineering research often focuses on tools to support this problem solving, the strategies that developers use to solve problems are at least as important. In this paper, we offer a novel approach for enabling developers to follow explicit programming strategies that describe how an expert tackles a common programming problem. We define explicit programming strategies, grounding our definition in prior work both within software engineering and in other professions which have adopted more explicit procedures for problem solving. We then present a novel notation called Roboto and a novel StrategyTracker tool that explicitly represent programming strategies and frame executing strategies as a collaborative effort between human abilities to make decisions and computer abilities to structure process and persist information. In a formative evaluation, 28 software developers of varying expertise completed a design task and a debugging task. We found that, compared to developers who are free to choose their strategies, developers given explicit strategies experienced their work as more organized, systematic, and predictable, but also more constrained. Developers using explicit strategies were objectively more successful at the design and debugging tasks. We discuss the implications of Roboto and these findings, envisioning a thriving ecosystem of explicit strategies that accelerate and improve developers' programming problem solving.

Submitted 6 November, 2019; v1 submitted 31 October, 2019; originally announced November 2019.

Comments: 48 pages, 8 figures, To appear in the proceedings of Empirical Software Engineering Journal

arXiv:1907.05931

An Exploratory Study of Live-Streamed Programming

Authors: Abdulaziz Alaboudi, Thomas D. LaToza

Abstract: In live-streamed programming, developers broadcast their development work on open source projects using streaming media such as YouTube or Twitch. Sessions are first announced by a developer acting as the streamer, inviting other developers to join and interact as watchers using chat. To better understand the characteristics, motivations, and challenges in live-streamed programming, we analyzed 20 hours of live-streamed programming videos and surveyed 7 streamers about their experiences. The results reveal that live-streamed programming shares some of the characteristics and benefits of pair programming, but differs in the nature of the relationship between the streamer and watchers. We also found that streamers are motivated by knowledge sharing, socializing, and building an online identity, but face challenges with tool limitations and maintaining engagement with watchers. We discuss the implications of these findings, identify limitations with current tools, and propose design recommendations for new forms of tools to better support live-streamed programming.

Submitted 12 July, 2019; originally announced July 2019.

arXiv:1905.11366

Supporting Software Engineering Research and Education by Annotating Public Videos of Developers Programming

Authors: Abdulaziz Alaboudi, Thomas D. LaToza

Abstract: Software engineering has long studied how software developers work, building a body of work which forms the foundation of many software engineering best practices, tools, and theories. Recently, some developers have begun recording videos of themselves engaged in programming tasks contributing to open source projects, enabling them to share knowledge and socialize with other developers. We believe that these videos offer an important opportunity for both software engineering research and education. In this paper, we discuss the potential use of these videos as well as open questions for how to best enable this envisioned use. We propose creating a central repository of programming videos, enabling analyzing and annotating videos to illustrate specific behaviors of interest such as asking and answering questions, employing strategies, and software engineering theories. Such a repository would offer an important new way in which both software engineering researchers and students can understand how software developers work.

Submitted 9 May, 2019; originally announced May 2019.

arXiv:1903.01977

Crowdsourced Behavior-Driven Development: Implementing Microservices through Microtasks

Authors: Emad Aghayi, Thomas D. LaToza, Paurav Surendra, Seyedmeysam Abolghasemi

Abstract: Key to the effectiveness of crowdsourcing approaches for software engineering is workflow design, describing how complex work is organized into small, relatively independent microtasks. In this paper, we introduce a Behavior-Driven Development (BDD) workflow for accomplishing programming work through self-contained microtasks, implemented as a preconfigured environment called Crowd Microservices. In our approach, a client, acting on behalf of a software team, describes a microservice as a set of endpoints with paths, requests, and responses. A crowd then implements the endpoints, identifying individual endpoint behaviors which they test, implement, and debug, creating new functions and interacting with persistence APIs as needed. To evaluate our approach, we conducted a feasibility study in which a small crowd worked to implement a small ToDo microservice. The crowd created an implementation with only four defects, completing 350 microtasks and implementing 13 functions. We discuss the implications of these findings for incorporating crowdsourced programming contributions into traditional software projects.

Submitted 2 September, 2020; v1 submitted 5 March, 2019; originally announced March 2019.

Showing 1–14 of 14 results for author: LaToza, T D