-
Impermanent Identifiers: Enhanced Source Code Comprehension and Refactoring
Authors:
Eduardo Martins Guerra,
Andre A. S. Ivo,
Fernando O. Pereira,
Romain Robbes,
Andrea Janes,
Fabio Fagundes Silveira
Abstract:
In response to the prevailing challenges in contemporary software development, this article introduces an innovative approach to code augmentation centered around Impermanent Identifiers. The primary goal is to enhance the software development experience by introducing dynamic identifiers that adapt to changing contexts, facilitating more efficient interactions between developers and source code, ultimately advancing comprehension, maintenance, and collaboration in software development. Additionally, this study rigorously evaluates the adoption and acceptance of Impermanent Identifiers within the software development landscape. Through a comprehensive empirical examination, we investigate how developers perceive and integrate this approach into their daily programming practices, exploring perceived benefits, potential barriers, and factors influencing its adoption. In summary, this article charts a new course for code augmentation, proposing Impermanent Identifiers as its cornerstone while assessing their feasibility and acceptance among developers. This interdisciplinary research seeks to contribute to the continuous improvement of software development practices and the progress of code augmentation technology.
Submitted 14 June, 2024; v1 submitted 13 June, 2024;
originally announced June 2024.
-
One Microservice per Developer: Is This the Trend in OSS?
Authors:
Dario Amoroso d'Aragona,
Xiaozhou Li,
Tomas Cerny,
Andrea Janes,
Valentina Lenarduzzi,
Davide Taibi
Abstract:
When developing and managing microservice systems, practitioners suggest that each microservice should be owned by a particular team. In effect, there is only one team with the responsibility to manage a given service. Consequently, one developer should belong to only one team. This practice of "one-microservice-per-developer" is especially prevalent in large projects with an extensive development team. Based on the bazaar-style software development model of Open Source Projects, in which different programmers, like vendors at a bazaar, offer to help out developing different parts of the system, this article investigates whether we can observe the "one-microservice-per-developer" behavior, a strategy we assume to be anticipated within microservice-based Open Source Projects. We conducted an empirical study among 38 microservice-based open-source projects. Our findings indicate that the strategy is rarely respected by open-source developers except for projects that have dedicated DevOps teams.
Submitted 5 August, 2023;
originally announced August 2023.
-
On the Empirical Evidence of Microservice Logical Coupling. A Registered Report
Authors:
Dario Amoroso d'Aragona,
Luca Pascarella,
Andrea Janes,
Valentina Lenarduzzi,
Rafael Penaloza,
Davide Taibi
Abstract:
[Context] Coupling is a metric widely discussed by software engineers developing complex software systems, often referred to as a crucial factor and symptom of a poor or good design. Nevertheless, measuring the logical coupling among microservices and analyzing the interactions between services is non-trivial because it demands runtime information in the form of log files, which are not always accessible. [Objective and Method] In this work, we propose the design of a study aimed at empirically validating the Microservice Logical Coupling (MLC) metric presented in our previous study. In particular, we plan to empirically study Open Source Systems (OSS) built using a microservice architecture. [Results] The result of this work aims at corroborating the effectiveness and validity of the MLC metric. Thus, we will gather empirical evidence and develop a methodology to analyze and support the claims regarding the MLC metric. Furthermore, we establish its usefulness in evaluating and understanding the logical coupling among microservices.
Submitted 3 June, 2023;
originally announced June 2023.
-
Breaks and Code Quality: Investigating the Impact of Forgetting on Software Development. A Registered Report
Authors:
Dario Amoroso d'Aragona,
Luca Pascarella,
Andrea Janes,
Valentina Lenarduzzi,
Rafael Penaloza,
Davide Taibi
Abstract:
Developers interrupting their participation in a project might slowly forget critical information about the code, such as its intended purpose, structure, the impact of external dependencies, and the approach used for implementation. Forgetting the implementation details can have detrimental effects on software maintenance, comprehension, knowledge sharing, and developer productivity, resulting in bugs and other issues that can negatively influence the software development process. Therefore, it is crucial to ensure that developers have a clear understanding of the codebase and can work efficiently and effectively even after long interruptions. This registered report proposes an empirical study aimed at investigating the impact of the duration of developers' activity breaks on different code quality properties. In particular, we aim at understanding if the amount of activity in a project impacts the code quality, and if developers with different activity profiles show different impacts on code quality. The results might be useful to understand if it is beneficial to promote the practice of developing multiple projects in parallel, or if it is more beneficial to reduce the number of projects each developer contributes to.
Submitted 28 August, 2023; v1 submitted 1 May, 2023;
originally announced May 2023.
-
Early Career Developers' Perceptions of Code Understandability. A Study of Complexity Metrics
Authors:
Matteo Esposito,
Andrea Janes,
Terhi Kilamo,
Valentina Lenarduzzi
Abstract:
Context. Code understandability is fundamental. Developers need to understand the code they are modifying clearly. A low understandability can increase the amount of coding effort, and misinterpreting code impacts the entire development process. Ideally, developers should write clear and understandable code with the least effort. Aim. Our work investigates whether the McCabe Cyclomatic Complexity or the Cognitive Complexity can be a good predictor for the developers' perceived code understandability to understand which of the two complexities can be used as criteria to evaluate if a piece of code is understandable. Method. We designed and conducted an empirical study among 216 early career developers with professional experience ranging from one to four years. We asked them to manually inspect and rate the understandability of 12 Java classes that exhibit different levels of Cyclomatic and Cognitive Complexity. Results. Our findings showed that while the old-fashioned McCabe Cyclomatic Complexity and the most recent Cognitive Complexity are modest predictors for code understandability when considering the complexity perceived by early-career developers, they are not for problem severity. Conclusions. Based on our results, early-career developers should not be left alone when performing code-reviewing tasks due to their scarce experience. Moreover, low complexity measures indicate good understandability, but having either CoC or CyC high makes understandability unpredictable. Nevertheless, there is no evidence that CyC or CoC are indicators of early-career perceived severity. Future research efforts will focus on expanding the population to experienced developers to confront whether seniority influences the predictive power of the chosen metrics.
Submitted 15 July, 2024; v1 submitted 14 March, 2023;
originally announced March 2023.
-
Lowering Detection in Sport Climbing Based on Orientation of the Sensor Enhanced Quickdraw
Authors:
Sadaf Moaveninejad,
Andrea Janes,
Camillo Porcaro
Abstract:
Tracking climbers' activity to improve services and make the best use of their infrastructure is a concern for climbing gyms. Each climbing session must be analyzed from the beginning until the lowering of the climber. Therefore, spotting the climbers descending is crucial since it indicates when the ascent has come to an end. This problem must be addressed while preserving the privacy and convenience of the climbers and the costs of the gyms. To this aim, a hardware prototype is developed to collect data using accelerometer sensors attached to a piece of climbing equipment mounted on the wall, called quickdraw, that connects the climbing rope to the bolt anchors. The corresponding sensors are configured to be energy-efficient, hence becoming practical in terms of expenses and time consumption for replacement when used in large quantities in a climbing gym. This paper describes hardware specifications, studies data measured by the sensors in ultra-low power mode, detects sensors' orientation patterns during lowering on different routes, and develops a supervised approach to identify lowering.
Submitted 15 March, 2024; v1 submitted 17 January, 2023;
originally announced January 2023.
-
Climbing Routes Clustering Using Energy-Efficient Accelerometers Attached to the Quickdraws
Authors:
Sadaf Moaveninejad,
Andrea Janes,
Camillo Porcaro,
Luca Barletta,
Lorenzo Mucchi,
Massimiliano Pierobon
Abstract:
One of the challenges for climbing gyms is to find out popular routes for the climbers to improve their services and optimally use their infrastructure. This problem must be addressed preserving both the privacy and convenience of the climbers and the costs of the gyms. To this aim, a hardware prototype is developed to collect data using accelerometer sensors attached to a piece of climbing equipment mounted on the wall, called quickdraw, that connects the climbing rope to the bolt anchors. The corresponding sensors are configured to be energy-efficient, hence becoming practical in terms of expenses and time consumption for replacement when used in large quantities in a climbing gym. This paper describes hardware specifications, studies data measured by the sensors in ultra-low power mode, detects patterns in data during climbing of different routes, and develops an unsupervised approach for route clustering.
Submitted 7 March, 2024; v1 submitted 4 November, 2022;
originally announced November 2022.
-
Open Tracing Tools: Overview and Critical Comparison
Authors:
Andrea Janes,
Xiaozhou Li,
Valentina Lenarduzzi
Abstract:
Background. Coping with the rapidly growing complexity in contemporary software architecture, tracing has become an increasingly critical practice and been adopted widely by software engineers. By adopting tracing tools, practitioners are able to monitor, debug, and optimize distributed software architectures easily. However, with an excessive number of valid candidates, researchers and practitioners have a hard time finding and selecting the suitable tracing tools by systematically considering their features and advantages. Objective. To such a purpose, this paper aims to provide an overview of popular Open tracing tools via comparison. Method. Herein, we first identified 30 tools in an objective, systematic, and reproducible manner adopting the Systematic Multivocal Literature Review protocol. Then, we characterized each tool looking at the 1) measured features, 2) popularity both in peer-reviewed literature and online media, and 3) benefits and issues. We used topic modeling and sentiment analysis to extract and summarize the benefits and issues. Specifically, we adopted ChatGPT to support the topic interpretation. Results. As a result, this paper presents a systematic comparison amongst the selected tracing tools in terms of their features, popularity, benefits and issues. Conclusion. The result mainly shows that each tracing tool provides a unique combination of features, each with different pros and cons. The contribution of this paper is to provide practitioners with a better understanding of the tracing tools, facilitating their adoption.
Submitted 23 June, 2023; v1 submitted 14 July, 2022;
originally announced July 2022.
-
CATTO: Just-in-time Test Case Selection and Execution
Authors:
Dario Amoroso d'Aragona,
Fabiano Pecorelli,
Simone Romano,
Giuseppe Scanniello,
Maria Teresa Baldassarre,
Andrea Janes,
Valentina Lenarduzzi
Abstract:
Regression testing ensures a System Under Test (SUT) still works as expected after changes to it. The simplest approach for regression testing consists of re-running the entire test suite against the changed version of the SUT. However, this might result in a time- and resource-consuming process, e.g., when dealing with large and/or complex SUTs and test suites. To work around this problem, Test Case Selection (TCS) strategies can be used. Such strategies seek to build a temporary test suite comprising only those test cases that are relevant to the changes made to the SUT, thus avoiding the execution of those test cases that do not exercise the changed parts. In this paper, we introduce CATTO (Commit Adaptive Tool for Test suite Optimization) and CATTO INTELLIJ PLUGIN. The former is a tool implementing a TCS strategy for SUTs written in Java, while the latter is a wrapper to allow developers to use CATTO directly in IntelliJ. We also conducted a preliminary evaluation of CATTO on seven open-source Java SUTs in terms of reductions in test-suite size, fault-revealing test cases, and fault-detection capability. The results are promising and suggest that CATTO can be of help to developers when performing regression testing. The video demo and the documentation of the tool are available at: https://catto-tool.github.io/
Submitted 17 June, 2022;
originally announced June 2022.
-
Mining Software Repositories with a Collaborative Heuristic Repository
Authors:
Hlib Babii,
Julian Aron Prenner,
Laurin Stricker,
Anjan Karmakar,
Andrea Janes,
Romain Robbes
Abstract:
Many software engineering studies or tasks rely on categorizing software engineering artifacts. In practice, this is done either by defining simple but often imprecise heuristics, or by manual labelling of the artifacts. Unfortunately, errors in these categorizations impact the tasks that rely on them. To improve the precision of these categorizations, we propose to gather heuristics in a collaborative heuristic repository, to which researchers can contribute a large amount of diverse heuristics for a variety of tasks on a variety of SE artifacts. These heuristics are then leveraged by state-of-the-art weak supervision techniques to train high-quality classifiers, thus improving the categorizations. We present an initial version of the heuristic repository, which we applied to the concrete task of commit classification.
Submitted 2 March, 2021;
originally announced March 2021.
-
Improving Predictability of User-Affecting Metrics to Support Anomaly Detection in Cloud Services
Authors:
Vilc Rufino,
Mateus Nogueira,
Alberto Avritzer,
Daniel Menasché,
Barbara Russo,
Andrea Janes,
Vincenzo Ferme,
André Van Hoorn,
Henning Schulz,
Cabral Lima
Abstract:
Anomaly detection systems aim to detect and report attacks or unexpected behavior in networked systems. Previous work has shown that anomalies have an impact on system performance, and that performance signatures can be effectively used for implementing an IDS. In this paper, we present an analytical and an experimental study on the trade-off between anomaly detection based on performance signatures and system scalability. The proposed approach combines analytical modeling and load testing to find optimal configurations for the signature-based IDS. We apply a heavy-tail bi-modal modeling approach, where "long" jobs represent large resource consuming transactions, e.g., generated by DDoS attacks; the model was parametrized using results obtained from controlled experiments. For performance purposes, mean response time is the key metric to be minimized, whereas for security purposes, response time variance and classification accuracy must be taken into account. The key insights from our analysis are: (i) there is an optimal number of servers which minimizes the response time variance, (ii) the sweet-spot number of servers that minimizes response time variance and maximizes classification accuracy is typically smaller than or equal to the one that minimizes mean response time. Therefore, for security purposes, it may be worth slightly sacrificing performance to increase classification accuracy.
Submitted 24 December, 2020;
originally announced December 2020.
-
Big Code != Big Vocabulary: Open-Vocabulary Models for Source Code
Authors:
Rafael-Michael Karampatsis,
Hlib Babii,
Romain Robbes,
Charles Sutton,
Andrea Janes
Abstract:
Statistical language modeling techniques have successfully been applied to large source code corpora, yielding a variety of new software development tools, such as tools for code suggestion, improving readability, and API migration. A major issue with these techniques is that code introduces new vocabulary at a far higher rate than natural language, as new identifier names proliferate. Both large vocabularies and out-of-vocabulary issues severely affect Neural Language Models (NLMs) of source code, degrading their performance and rendering them unable to scale.
In this paper, we address this issue by: 1) studying how various modelling choices impact the resulting vocabulary on a large-scale corpus of 13,362 projects; 2) presenting an open vocabulary source code NLM that can scale to such a corpus, 100 times larger than in previous work; and 3) showing that such models outperform the state of the art on three distinct code corpora (Java, C, Python). To our knowledge, these are the largest NLMs for code that have been reported.
All datasets, code, and trained models used in this work are publicly available.
Submitted 17 March, 2020;
originally announced March 2020.
-
Modeling Vocabulary for Big Code Machine Learning
Authors:
Hlib Babii,
Andrea Janes,
Romain Robbes
Abstract:
When building machine learning models that operate on source code, several decisions have to be made to model source-code vocabulary. These decisions can have a large impact: some can lead to not being able to train models at all, others significantly affect performance, particularly for Neural Language Models. Yet, these decisions are not often fully described. This paper lists important modeling choices for source code vocabulary, and explores their impact on the resulting vocabulary on a large-scale corpus of 14,436 projects. We show that a subset of decisions have decisive characteristics, allowing accurate Neural Language Models to be trained quickly on a large corpus of 10,106 projects.
Submitted 3 April, 2019;
originally announced April 2019.