Human Factors in Model-Driven Engineering: Future Research Goals and Initiatives for MDE

Authors: Grischa Liebel, Jil Klünder, Regina Hebig, Christopher Lazik, Inês Nunes, Isabella Graßl, Jan-Philipp Steghöfer, Joeri Exelmans, Julian Oertel, Kai Marquardt, Katharina Juhnke, Kurt Schneider, Lucas Gren, Lucia Happe, Marc Herrmann, Marvin Wyrich, Matthias Tichy, Miguel Goulão, Rebekka Wohlrab, Reyhaneh Kalantari, Robert Heinrich, Sandra Greiner, Satrio Adi Rukmono, Shalini Chakraborty, Silvia Abrahão, et al. (1 additional author not shown)

Abstract: Purpose: Software modelling and Model-Driven Engineering (MDE) is traditionally studied from a technical perspective. However, one of the core motivations behind the use of software models is inherently human-centred. Models aim to enable practitioners to communicate about software designs, make software understandable, or make software easier to write through domain-specific modelling languages. Several recent studies challenge the idea that these aims can always be reached and indicate that human factors play a role in the success of MDE. However, there is an under-representation of research focusing on human factors in modelling. Methods: During a GI-Dagstuhl seminar, topics related to human factors in modelling were discussed by 26 expert participants from research and industry. Results: In breakout groups, five topics were covered in depth, namely modelling human aspects, factors of modeller experience, diversity and inclusion in MDE, collaboration and MDE, and teaching human-aware MDE. Conclusion: We summarise our insights gained during the discussions on the five topics. We formulate research goals, questions, and propositions that support directing future initiatives towards an MDE community that is aware of and supportive of human factors and values.

Submitted 29 April, 2024; originally announced April 2024.

arXiv:2402.08706 [pdf, ps, other]

doi 10.1145/3643664.3648213

Apples, Oranges, and Software Engineering: Study Selection Challenges for Secondary Research on Latent Variables

Authors: Marvin Wyrich, Marvin Muñoz Barón, Justus Bogner

Abstract: Software engineering (SE) is full of abstract concepts that are crucial for both researchers and practitioners, such as programming experience, team productivity, code comprehension, and system security. Secondary studies aimed at summarizing research on the influences and consequences of such concepts would therefore be of great value. However, the inability to measure abstract concepts directly poses a challenge for secondary studies: primary studies in SE can operationalize such concepts in many ways. Standardized measurement instruments are rarely available, and even if they are, many researchers do not use them or do not even provide a definition for the studied concept. SE researchers conducting secondary studies therefore have to decide a) which primary studies intended to measure the same construct, and b) how to compare and aggregate vastly different measurements for the same construct. In this experience report, we discuss the challenge of study selection in SE secondary research on latent variables. We report on two instances where we found it particularly challenging to decide which primary studies should be included for comparison and synthesis, so as not to end up comparing apples with oranges. Our report aims to spark a conversation about developing strategies to address this issue systematically and pave the way for more efficient and rigorous secondary studies in software engineering.

Submitted 13 February, 2024; originally announced February 2024.

Comments: Accepted to WSESE 2024, an ICSE co-located workshop on methodological issues with empirical studies in software engineering

arXiv:2402.08608 [pdf, ps, other]

doi 10.1145/3643664.3648203

Evidence Tetris in the Pixelated World of Validity Threats

Authors: Marvin Wyrich, Sven Apel

Abstract: Valid empirical studies build confidence in scientific findings. Fortunately, it is now common for software engineering researchers to consider threats to validity when designing their studies and to discuss them as part of their publication. Yet, in complex experiments with human participants, there is often an overwhelming number of intuitively plausible threats to validity -- more than a researcher can feasibly cover. Therefore, prioritizing potential threats to validity becomes crucial. We suggest moving away from relying solely on intuition for prioritizing validity threats, and propose that evidence on the actual impact of suspected threats to validity should complement intuition.

Submitted 13 February, 2024; originally announced February 2024.

Comments: Accepted to WSESE 2024, an ICSE co-located workshop on methodological issues with empirical studies in software engineering

arXiv:2401.02268 [pdf, other]

doi 10.1145/3639475.3640113

Beyond Self-Promotion: How Software Engineering Research Is Discussed on LinkedIn

Authors: Marvin Wyrich, Justus Bogner

Abstract: LinkedIn is the largest professional network in the world. As such, it can serve to build bridges between practitioners, whose daily work is software engineering (SE), and researchers, who work to advance the field of software engineering. We know that such a metaphorical bridge exists: SE research findings are sometimes shared on LinkedIn and commented on by software practitioners. Yet, we do not know what state the bridge is in. Therefore, we quantitatively and qualitatively investigate how SE practitioners and researchers approach each other via public LinkedIn discussions and what both sides can contribute to effective science communication. We found that a considerable proportion of LinkedIn posts on SE research are written by people who are not the paper authors (39%). Further, 71% of all comments in our dataset are from people in the industry, but only every second post receives at least one comment at all. Based on our findings, we formulate concrete advice for researchers and practitioners to make sharing new research findings on LinkedIn more fruitful.

Submitted 4 January, 2024; originally announced January 2024.

Comments: Accepted for publication at the 46th International Conference on Software Engineering (ICSE 2024), Software Engineering in Society (SEIS) track

arXiv:2310.11301 [pdf, other]

Source Code Comprehension: A Contemporary Definition and Conceptual Model for Empirical Investigation

Authors: Marvin Wyrich

Abstract: Be it in debugging, testing, code review or, more recently, pair programming with AI assistance: in all these activities, software engineers need to understand source code. Accordingly, plenty of research is taking place in the field to find out, for example, what makes code easy to understand and which tools can best support developers in their comprehension process. And while any code comprehension researcher certainly has a rough idea of what they mean when they mention a developer having a good understanding of a piece of code, to date, the research community has not managed to define source code comprehension as a concept. Instead, in primary research on code comprehension, an implicit definition by task prevails, i.e., code comprehension is what the experimental tasks measure. This approach has two negative consequences. First, it makes it difficult to conduct secondary research. Currently, each code comprehension primary study uses different comprehension tasks and measures, and thus it is not clear whether different studies intend to measure the same construct. Second, authors of a primary study run into the difficulty of justifying their design decisions without a definition of what they attempt to measure. An operationalization of an insufficiently described construct occurs, which poses a threat to construct validity. The task of defining code comprehension considering the theory of the past fifty years is not an easy one. Nor is it a task that every author of a primary study must accomplish on their own.
Therefore, this paper constitutes a reference work that defines source code comprehension and presents a conceptual framework in which researchers can anchor their empirical code comprehension research.

Submitted 17 October, 2023; originally announced October 2023.

Comments: Submission under review

arXiv:2307.02850 [pdf, other]

doi 10.1109/MS.2023.3277034

Resist the Hype! Practical Recommendations to Cope With Résumé-Driven Development

Authors: Jonas Fritzsch, Marvin Wyrich, Justus Bogner, Stefan Wagner

Abstract: Technology trends play an important role in the hiring process for software and IT professionals. In a recent study of 591 software professionals in both hiring (130) and technical (558) roles, we found empirical support for a tendency to overemphasize technology trends in résumés and the application process. 60% of the hiring professionals agreed that such trends would influence their job advertisements. Among the software professionals, 82% believed that using trending technologies in their daily work would make them more attractive for potential future employers. This phenomenon has previously been reported anecdotally and somewhat humorously under the label Résumé-Driven Development (RDD). Our article seeks to initiate a more serious debate about the consequences of RDD on software development practice. We explain how the phenomenon may constitute a harmful self-sustaining dynamic, and provide practical recommendations for both the hiring and applicant perspectives to change the current situation for the better.

Submitted 6 July, 2023; originally announced July 2023.

Comments: 8 pages, 4 figures

arXiv:2301.10563 [pdf, other]

doi 10.1109/ICSE48619.2023.00162

Evidence Profiles for Validity Threats in Program Comprehension Experiments

Authors: Marvin Muñoz Barón, Marvin Wyrich, Daniel Graziotin, Stefan Wagner

Abstract: Searching for clues, gathering evidence, and reviewing case files are all techniques used by criminal investigators to draw sound conclusions and avoid wrongful convictions. Similarly, in software engineering (SE) research, we can develop sound methodologies and mitigate threats to validity by basing study design decisions on evidence. Echoing a recent call for the empirical evaluation of design decisions in program comprehension experiments, we conducted a two-phase study consisting of systematic literature searches, snowballing, and thematic synthesis. We found out (1) which validity threat categories are most often discussed in primary studies of code comprehension, and we collected evidence to build (2) the evidence profiles for the three most commonly reported threats to validity. We discovered that few mentions of validity threats in primary studies (31 of 409) included a reference to supporting evidence. For the three most commonly mentioned threats, namely the influence of programming experience, program length, and the selected comprehension measures, almost all cited studies (17 of 18) did not meet our criteria for evidence. We show that for many threats to validity that are currently assumed to be influential across all studies, their actual impact may depend on the design and context of each specific study. Researchers should discuss threats to validity within the context of their particular study and support their discussions with evidence.
The present paper can be one resource for evidence, and we call for more meta-studies of this type to be conducted, which will then inform design decisions in primary studies. Further, although we have applied our methodology in the context of program comprehension, our approach can also be used in other SE research areas to enable evidence-based experiment design decisions and meaningful discussions of threats to validity.

Submitted 25 January, 2023; originally announced January 2023.

Comments: 13 pages, 4 figures, 5 tables. To be published at ICSE 2023: Proceedings of the 45th IEEE/ACM International Conference on Software Engineering

Journal ref: In Proceedings of the 45th International Conference on Software Engineering (ICSE 2023). IEEE Press, 1907-1919

arXiv:2301.10025 [pdf, other]

Teaching Computer Science Students to Communicate Scientific Findings More Effectively

Authors: Marvin Wyrich, Stefan Wagner

Abstract: Science communication forms the bridge between computer science researchers and their target audience. Researchers who can effectively draw attention to their research findings and communicate them comprehensibly not only help their target audience to actually learn something, but also benefit themselves from the increased visibility of their work and person. However, the necessary skills for good science communication must also be taught, and this has so far been neglected in the field of software engineering education. We therefore designed and implemented a science communication seminar for bachelor students of computer science curricula. Students take the position of a researcher who, shortly after publication, is faced with having to draw attention to the paper and effectively communicate the contents of the paper to one or more target audiences. Based on this scenario, each student develops a communication strategy for an already published software engineering research paper and tests the resulting ideas with the other seminar participants. We explain our design decisions for the seminar, and combine our experiences with responses to a participant survey into lessons learned. With this experience report, we intend to motivate and enable other lecturers to offer a similar seminar at their university. Collectively, university lecturers can prepare the next generation of computer science researchers to not only be experts in their field, but also to communicate research findings more effectively.

Submitted 16 January, 2023; originally announced January 2023.

Comments: To be published in the proceedings of 45th International Conference on Software Engineering: Software Engineering Education and Training (ICSE-SEET '23), May 14-20, 2023, Melbourne, Australia

arXiv:2206.11102 [pdf, other]

doi 10.1145/3626522

40 Years of Designing Code Comprehension Experiments: A Systematic Mapping Study

Authors: Marvin Wyrich, Justus Bogner, Stefan Wagner

Abstract: The relevance of code comprehension in a developer's daily work was recognized more than 40 years ago. Consequently, many experiments were conducted to find out how developers could be supported during code comprehension and which code characteristics contribute to better comprehension. Today, such studies are more common than ever. While this is great for advancing the field, the number of publications makes it difficult to keep an overview. Additionally, designing rigorous code comprehension experiments with human participants is a challenging task, and the multitude of design options can make it difficult for researchers, especially newcomers to the field, to select a suitable design. We therefore conducted a systematic mapping study of 95 source code comprehension experiments published between 1979 and 2019. By structuring the design characteristics of code comprehension studies, we provide a basis for subsequent discussion of the huge diversity of design options in the face of a lack of basic research on their consequences and comparability. We describe what topics have been studied, as well as how these studies have been designed, conducted, and reported. Frequently chosen design options and deficiencies are pointed out to support researchers of all levels of domain expertise in designing their own studies.

Submitted 2 October, 2023; v1 submitted 22 June, 2022; originally announced June 2022.

Comments: Accepted for publication at ACM Computing Surveys

arXiv:2203.13705 [pdf, other]

doi 10.1145/3524610.3527904

Anchoring Code Understandability Evaluations Through Task Descriptions

Authors: Marvin Wyrich, Lasse Merz, Daniel Graziotin

Abstract: In code comprehension experiments, participants are usually told at the beginning what kind of code comprehension task to expect. Describing experiment scenarios and experimental tasks will influence participants in ways that are sometimes hard to predict and control. In particular, describing or even mentioning the difficulty of a code comprehension task might anchor participants and their perception of the task itself. In this study, we investigated in a randomized, controlled experiment with 256 participants (50 software professionals and 206 computer science students) whether a hint about the difficulty of the code to be understood in a task description anchors participants in their own code comprehensibility ratings. Subjective code evaluations are a commonly used measure for how well a developer in a code comprehension study understood code. Accordingly, it is important to understand how robust these measures are to cognitive biases such as the anchoring effect. Our results show that participants are significantly influenced by the initial scenario description in their assessment of code comprehensibility. An initial hint of hard to understand code leads participants to assess the code as harder to understand than participants who received no hint or a hint of easy to understand code. This affects students and professionals alike.
We discuss examples of design decisions and contextual factors in the conduct of code comprehension experiments that can induce an anchoring effect, and recommend the use of more robust comprehension measures in code comprehension studies to enhance the validity of results.

Submitted 25 March, 2022; originally announced March 2022.

Comments: 8 pages, 2 figures. To appear in ICPC '22: IEEE/ACM International Conference on Program Comprehension, May 21-22, 2022, Pittsburgh, Pennsylvania, United States

Journal ref: In Proceedings of the 30th IEEE/ACM International Conference on Program Comprehension (ICPC 2022). Association for Computing Machinery, New York, NY, USA, 133-140

arXiv:2109.13546 [pdf, other]

Code Comprehension Confounders: A Study of Intelligence and Personality

Authors: Stefan Wagner, Marvin Wyrich

Abstract: Literature and intuition suggest that a developer's intelligence and personality have an impact on their performance in comprehending source code. Researchers made this suggestion in the past when discussing threats to validity of their study results. However, the lack of studies investigating the relationship of intelligence and personality to performance in code comprehension makes scientifically sound reasoning about their influence difficult. We conduct the first empirical evaluation, a correlational study with undergraduates, to investigate the correlation of intelligence and personality with performance in code comprehension, that is with correctness in answering comprehension questions on code snippets. We found that personality traits are unlikely to impact code comprehension performance, at least not considered in isolation. Conscientiousness, in combination with other factors, however, explains some of the variance in code comprehension performance. For intelligence, significant small to moderate positive effects on code comprehension performance were found for three of four factors measured, i.e., fluid intelligence, visual perception, and cognitive speed. Crystallized intelligence has a positive but statistically insignificant effect on code comprehension performance. According to our results, several intelligence facets as well as the personality trait conscientiousness are potential confounders that should not be neglected in code comprehension studies of individual performance and should be controlled for via an appropriate study design.
We call for the conduct of further studies on the relationship between intelligence and personality with code comprehension, in part because code comprehension involves more facets than we can measure in a single study and because our regression model explains only a small portion of the variance in code comprehension performance.

Submitted 28 September, 2021; originally announced September 2021.

Comments: 13 pages, 6 figures

ACM Class: D.2.7

arXiv:2103.03591 [pdf, other]

Bots Don't Mind Waiting, Do They? Comparing the Interaction With Automatically and Manually Created Pull Requests

Authors: Marvin Wyrich, Raoul Ghit, Tobias Haller, Christian Müller

Abstract: As a maintainer of an open source software project, you are usually happy about contributions in the form of pull requests that bring the project a step forward. Past studies have shown that when reviewing a pull request, not only its content is taken into account, but also, for example, the social characteristics of the contributor. Whether a contribution is accepted and how long this takes therefore depends not only on the content of the contribution. What we only have indications for so far, however, is that pull requests from bots may be prioritized lower, even if the bots are explicitly deployed by the development team and are considered useful. One goal of the bot research and development community is to design helpful bots to effectively support software development in a variety of ways. To get closer to this goal, in this GitHub mining study, we examine the measurable differences in how maintainers interact with manually created pull requests from humans compared to those created automatically by bots. About one third of all pull requests on GitHub currently come from bots. While pull requests from humans are accepted and merged in 72.53% of all cases, this applies to only 37.38% of bot pull requests. Furthermore, it takes significantly longer for a bot pull request to be interacted with and for it to be merged, even though they contain fewer changes on average than human pull requests. These results suggest that bots have yet to realize their full potential.

Submitted 5 March, 2021; originally announced March 2021.

Comments: To be published in Proceedings of 2021 IEEE/ACM International Workshop on Bots in Software Engineering (BotSE)

arXiv:2101.12703 [pdf, other]

doi 10.1109/ICSE-SEIS52602.2021.00011

Résumé-Driven Development: A Definition and Empirical Characterization

Authors: Jonas Fritzsch, Marvin Wyrich, Justus Bogner, Stefan Wagner

Abstract: Technologies play an important role in the hiring process for software professionals. Within this process, several studies revealed misconceptions and bad practices which lead to suboptimal recruitment experiences. In the same context, grey literature anecdotally coined the term Résumé-Driven Development (RDD), a phenomenon describing the overemphasis of trending technologies in both job offerings and résumés as an interaction between employers and applicants. While RDD has been sporadically mentioned in books and online discussions, there are so far no scientific studies on the topic, despite its potential negative consequences. We therefore empirically investigated this phenomenon by surveying 591 software professionals in both hiring (130) and technical (558) roles and identified RDD facets in substantial parts of our sample: 60% of our hiring professionals agreed that trends influence their job offerings, while 82% of our software professionals believed that using trending technologies in their daily work makes them more attractive for prospective employers. Grounded in the survey results, we conceptualize a theory to frame and explain Résumé-Driven Development. Finally, we discuss influencing factors and consequences and propose a definition of the term. Our contribution provides a foundation for future research and raises awareness for a potentially systemic trend that may broadly affect the software industry.

Submitted 29 January, 2021; originally announced January 2021.

Comments: 10 pages, 5 figures

arXiv:2012.09590 [pdf, other]

doi 10.1109/ICSE43902.2021.00055

The Mind Is a Powerful Place: How Showing Code Comprehensibility Metrics Influences Code Understanding

Authors: Marvin Wyrich, Andreas Preikschat, Daniel Graziotin, Stefan Wagner

Abstract: Static code analysis tools and integrated development environments present developers with quality-related software metrics, some of which describe the understandability of source code. Software metrics influence overarching strategic decisions that impact the future of companies and the prioritization of everyday software development tasks. Several software metrics, however, lack validation: we just choose to trust that they reflect what they are supposed to measure. Some of them were even shown to not measure the quality aspects they intend to measure. Yet, they influence us through biases in our cognitive-driven actions. In particular, they might anchor us in our decisions. Whether the anchoring effect exists with software metrics has not been studied yet. We conducted a randomized and double-blind experiment to investigate the extent to which a displayed metric value for source code comprehensibility anchors developers in their subjective rating of source code comprehensibility, whether performance is affected by the anchoring effect when working on comprehension tasks, and which individual characteristics might play a role in the anchoring effect. We found that the displayed value of a comprehensibility metric has a significant and large anchoring effect on a developer's code comprehensibility rating. The effect does not seem to affect the time or correctness when working on comprehension questions related to the code snippets under study.
Since the anchoring effect is one of the most robust cognitive biases, and we have limited understanding of the consequences of the demonstrated manipulation of developers by non-validated metrics, we call for an increased awareness of the responsibility in code quality reporting and for corresponding tools to be based on scientific evidence.

Submitted 10 February, 2021; v1 submitted 16 December, 2020; originally announced December 2020.

Comments: To appear in: Proceedings of the 43rd International Conference on Software Engineering (ICSE '21), Madrid, Spain. 12 pages, 1 figure. Postprint, after peer review

Journal ref: 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE), 2021 pp. 512-523

arXiv:2011.06244 [pdf, other]

A Fine-grained Data Set and Analysis of Tangling in Bug Fixing Commits

Authors: Steffen Herbold, Alexander Trautsch, Benjamin Ledel, Alireza Aghamohammadi, Taher Ahmed Ghaleb, Kuljit Kaur Chahal, Tim Bossenmaier, Bhaveet Nagaria, Philip Makedonski, Matin Nili Ahmadabadi, Kristof Szabados, Helge Spieker, Matej Madeja, Nathaniel Hoy, Valentina Lenarduzzi, Shangwen Wang, Gema Rodríguez-Pérez, Ricardo Colomo-Palacios, Roberto Verdecchia, Paramvir Singh, Yihao Qin, Debasish Chakroborti, Willard Davis, Vijay Walunj, Hongjun Wu , et al. (23 additional authors not shown)

Abstract: Context: Tangled commits are changes to software that address multiple concerns at once. For researchers interested in bugs, tangled commits mean that they actually study not only bugs, but also other concerns irrelevant for the study of bugs. Objective: We want to improve our understanding of the prevalence of tangling and the types of changes that are tangled within bug fixing commits. Methods: We use a crowd sourcing approach for manual labeling to validate which changes contribute to bug fixes for each line in bug fixing commits. Each line is labeled by four participants. If at least three participants agree on the same label, we have consensus. Results: We estimate that between 17% and 32% of all changes in bug fixing commits modify the source code to fix the underlying problem. However, when we only consider changes to the production code files this ratio increases to 66% to 87%. We find that about 11% of lines are hard to label leading to active disagreements between participants. Due to confirmed tangling and the uncertainty in our data, we estimate that 3% to 47% of data is noisy without manual untangling, depending on the use case. Conclusion: Tangled commits have a high prevalence in bug fixes and can lead to a large amount of noise in the data. Prior research indicates that this noise may alter results. As researchers, we should be skeptics and assume that unvalidated data is likely very noisy, until proven otherwise.

Submitted 13 October, 2021; v1 submitted 12 November, 2020; originally announced November 2020.

Comments: Status: Accepted at Empirical Software Engineering

arXiv:2007.12520 [pdf, other]

doi 10.1145/3382494.3410636

An Empirical Validation of Cognitive Complexity as a Measure of Source Code Understandability

Authors: Marvin Muñoz Barón, Marvin Wyrich, Stefan Wagner

Abstract: Background: Developers spend a lot of their time on understanding source code. Static code analysis tools can draw attention to code that is difficult for developers to understand. However, most of the findings are based on non-validated metrics, which can lead to confusion and code, that is hard to understand, not being identified. Aims: In this work, we validate a metric called Cognitive Complexity which was explicitly designed to measure code understandability and which is already widely used due to its integration in well-known static code analysis tools. Method: We conducted a systematic literature search to obtain data sets from studies which measured code understandability. This way we obtained about 24,000 understandability evaluations of 427 code snippets. We calculated the correlations of these measurements with the corresponding metric values and statistically summarized the correlation coefficients through a meta-analysis. Results: Cognitive Complexity positively correlates with comprehension time and subjective ratings of understandability. The metric showed mixed results for the correlation with the correctness of comprehension tasks and with physiological measures. Conclusions: It is the first validated and solely code-based metric which is able to reflect at least some aspects of code understandability. Moreover, due to its methodology, this work shows that code understanding is currently measured in many different ways, which we also do not know how they are related. This makes it difficult to compare the results of individual studies as well as to develop a metric that measures code understanding in all its facets.

Submitted 24 July, 2020; originally announced July 2020.

Comments: 12 pages. To be published at ESEM '20: ACM / IEEE International Symposium on Empirical Software Engineering and Measurement

arXiv:2001.02553 [pdf, other]

doi 10.5220/0009168803030310

Perception and Acceptance of an Autonomous Refactoring Bot

Authors: Marvin Wyrich, Regina Hebig, Stefan Wagner, Riccardo Scandariato

Abstract: The use of autonomous bots for automatic support in software development tasks is increasing. In the past, however, they were not always perceived positively and sometimes experienced a negative bias compared to their human counterparts. We conducted a qualitative study in which we deployed an autonomous refactoring bot for 41 days in a student software development project. In between and at the end, we conducted semi-structured interviews to find out how developers perceive the bot and whether they are more or less critical when reviewing the contributions of a bot compared to human contributions. Our findings show that the bot was perceived as a useful and unobtrusive contributor, and developers were no more critical of it than they were about their human colleagues, but only a few team members felt responsible for the bot.

Submitted 8 January, 2020; originally announced January 2020.

Comments: 8 pages, 2 figures. To be published at 12th International Conference on Agents and Artificial Intelligence (ICAART 2020)

arXiv:1703.10813 [pdf, other]

Improving Communication in Scrum Teams

Authors: Marvin Wyrich, Ivan Bogicevic, Stefan Wagner

Abstract: Communication in teams is an important but difficult issue. In a Scrum development process, we use the Daily Scrum meetings to inform others about important problems, news and events in the project. When persons are absent due to holiday, illness or travel, they miss relevant information because there is no document that protocols the content of these meetings. We present a concept and a Twitter-like tool that improves communication in a Scrum development process. We take advantage out of the observation that many people do not like to create documentation but they do like to share what they did. We used the tool in industrial practice and observed an improvement in communication.

Submitted 31 March, 2017; originally announced March 2017.

Showing 1–18 of 18 results for author: Wyrich, M