-
Swarm Debugging: the Collective Intelligence on Interactive Debugging
Authors:
Fabio Petrillo,
Yann-Gaël Guéhéneuc,
Marcelo Pimenta,
Carla Dal Sasso Freitas,
Foutse Khomh
Abstract:
One of the most important tasks in software maintenance is debugging. To start an interactive debugging session, developers usually set breakpoints in an integrated development environment and navigate through different paths in their debuggers. We started our work by asking what debugging information is useful to share among developers and studied two pieces of information: breakpoints (and their locations) and sessions (debugging paths). To answer our question, we introduce the Swarm Debugging concept to frame the sharing of debugging information, the Swarm Debugging Infrastructure (SDI) with which practitioners and researchers can collect and share data about developers' interactive debugging sessions, and the Swarm Debugging Global View (GV) to display debugging paths. Using the SDI, we conducted a large study with professional developers to understand how developers set breakpoints. Using the GV, we also analyzed professional developers in two studies and collected data about their debugging sessions. Our observations and the answers to our research questions suggest that sharing and visualizing debugging data can support debugging activities.
Submitted 9 February, 2019;
originally announced February 2019.
-
On Testing Machine Learning Programs
Authors:
Houssem Ben Braiek,
Foutse Khomh
Abstract:
Nowadays, we are witnessing a wide adoption of machine learning (ML) models in many safety-critical systems, thanks to recent breakthroughs in deep learning and reinforcement learning. Many people are now interacting with systems based on ML every day, e.g., voice recognition systems used by virtual personal assistants like Amazon Alexa or Google Home. As the field of ML continues to grow, we are likely to witness transformative advances in a wide range of areas, from finance and energy to health and transportation. Given this growing importance of ML-based systems in our daily life, it is becoming critically important to ensure their reliability. Recently, software researchers have started adapting concepts from the software testing domain (e.g., code coverage, mutation testing, or property-based testing) to help ML engineers detect and correct faults in ML programs. This paper reviews existing testing practices for ML programs. First, we identify and explain challenges that should be addressed when testing ML programs. Next, we report existing solutions found in the literature for testing ML programs. Finally, we identify gaps in the literature related to the testing of ML programs and recommend future research directions for the scientific community. We hope that this comprehensive review of software testing practices will help ML engineers identify the right approach to improve the reliability of their ML-based systems. We also hope that the research community will act on our proposed research directions to advance the state of the art of testing for ML programs.
Submitted 5 December, 2018;
originally announced December 2018.
-
Is Fragmentation a Threat to the Success of the Internet of Things?
Authors:
Mohab Aly,
Foutse Khomh,
Yann-Gaël Guéhéneuc,
Hironori Washizaki,
Soumaya Yacout
Abstract:
The current revolution in collaborating distributed things is seen as the first phase of IoT to develop various services. Such collaboration is threatened by the fragmentation found in the industry nowadays, as it brings challenges stemming from the difficulty of integrating diverse technologies into a system. Diverse networking technologies induce interoperability issues, hence limiting the possibility of reusing the data to develop new services. Different aspects of handling data collection must be available to provide interoperability to the diverse objects interacting; however, such approaches are challenged as they bring substantial performance impairments in settings with an increasing number of collaborating devices/technologies.
Submitted 2 August, 2018;
originally announced August 2018.
-
RePOR: Mimicking humans on refactoring tasks. Are we there yet?
Authors:
Rodrigo Morales,
Foutse Khomh,
Giuliano Antoniol
Abstract:
Refactoring is a maintenance activity that aims to improve design quality while preserving the behavior of a system. Several (semi)automated approaches have been proposed to support developers in this maintenance activity, based on the correction of anti-patterns, which are 'poor' solutions to recurring design problems. However, little quantitative evidence exists about the impact of automatically refactored code on program comprehension, and in which context automated refactoring can be as effective as manual refactoring. Leveraging RePOR, an automated refactoring approach based on partial order reduction techniques, we performed an empirical study to investigate whether the code structure produced by automated refactoring affects the understandability of systems during comprehension tasks. (1) We surveyed 80 developers, asking them to identify from a set of 20 refactoring changes if they were generated by developers or by a tool, and to rate the refactoring changes according to their design quality; (2) we asked 30 developers to complete code comprehension tasks on 10 systems that were refactored by either a freelancer or an automated refactoring tool. To make the comparison fair, for a subset of refactoring actions that introduce new code entities, only synthetic identifiers were presented to practitioners. We measured developers' performance using the NASA task load index for their effort, the time that they spent performing the tasks, and their percentages of correct answers. Our findings, despite current technology limitations, show that it is reasonable to expect refactoring tools to match developer code.
Submitted 17 May, 2019; v1 submitted 13 August, 2018;
originally announced August 2018.
-
Is It Safe to Uplift This Patch? An Empirical Study on Mozilla Firefox
Authors:
Marco Castelluccio,
Le An,
Foutse Khomh
Abstract:
In rapid release development processes, patches that fix critical issues or implement high-value features are often promoted directly from the development channel to a stabilization channel, potentially skipping one or more stabilization channels. This practice is called patch uplift. Patch uplift is risky, because patches that are rushed through the stabilization phase can end up introducing regressions in the code. This paper examines patch uplift operations at Mozilla, with the aim of identifying the characteristics of uplifted patches that introduce regressions. Through statistical and manual analyses, we quantitatively and qualitatively investigate the reasons behind patch uplift decisions and the characteristics of uplifted patches that introduced regressions. Additionally, we interviewed three Mozilla release managers to understand organizational factors that affect patch uplift decisions and outcomes. Results show that most patches are uplifted because of wrong functionality or a crash. Uplifted patches that lead to faults tend to have a larger patch size, and most of the faults are due to semantic or memory errors in the patches. Also, release managers are more inclined to accept patch uplift requests that concern certain specific components and/or that are submitted by certain specific developers.
Submitted 26 September, 2017;
originally announced September 2017.
-
An App Performance Optimization Advisor for Mobile Device App Marketplaces
Authors:
Rubén Saborido,
Foutse Khomh,
Abram Hindle,
Enrique Alba
Abstract:
On mobile phones, users and developers use official app marketplaces, which serve as repositories of apps. The Google Play Store and Apple App Store are the official marketplaces of Android and Apple products, and they offer more than a million apps. Although both repositories offer descriptions of apps, information concerning performance is not available. Due to the constrained hardware of mobile devices, users and developers have to meticulously manage the resources available, and they should be given access to performance information about apps. Even if this information were available, the selection of apps would still depend on user preferences, and it would require a huge cognitive effort to make optimal decisions. Considering this, we propose APOA, a recommendation system which can be implemented in any marketplace to help users and developers compare apps in terms of performance.
APOA takes as input the metric values of apps and a set of metrics to optimize. It solves an optimization problem and generates optimal sets of apps for different user contexts. We show how APOA works over an Android case study. Out of 140 apps, we define typical usage scenarios and we collect measurements of power, CPU, memory, and network usage to demonstrate the benefit of using APOA.
Submitted 20 May, 2018; v1 submitted 13 September, 2017;
originally announced September 2017.
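The APOA abstract above describes choosing apps by optimizing several performance metrics at once. A minimal sketch of one common way to frame such a selection, assuming Pareto dominance over metrics where lower is better: the app names, the metric values, and the functions `dominates` and `pareto_front` are hypothetical illustrations, not the authors' implementation or data.

```python
from typing import Dict, List, Tuple

# Each app maps to a tuple of metric values (hypothetical units),
# e.g. (power, CPU, memory). Lower is assumed to be better for all metrics.
Metrics = Tuple[float, ...]


def dominates(a: Metrics, b: Metrics) -> bool:
    """Return True if a is no worse than b on every metric
    and strictly better on at least one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(apps: Dict[str, Metrics]) -> List[str]:
    """Return the names of apps that no other app dominates, sorted by name."""
    front = []
    for name, metrics in apps.items():
        dominated = any(
            dominates(other_metrics, metrics)
            for other_name, other_metrics in apps.items()
            if other_name != name
        )
        if not dominated:
            front.append(name)
    return sorted(front)


# Hypothetical measurements for three apps in the same category.
apps = {
    "browser_a": (120.0, 35.0, 210.0),
    "browser_b": (100.0, 30.0, 180.0),
    "browser_c": (90.0, 45.0, 200.0),
}

print(pareto_front(apps))
```

Here `browser_a` is dropped because `browser_b` is better on every metric, while `browser_b` and `browser_c` both remain because each wins on at least one metric. A recommender could then rank the remaining apps according to user preferences.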
-
Stack Overflow: A Code Laundering Platform?
Authors:
Le An,
Ons Mlouki,
Foutse Khomh,
Giuliano Antoniol
Abstract:
Developers use Question and Answer (Q&A) websites to exchange knowledge and expertise. Stack Overflow is a popular Q&A website where developers discuss coding problems and share code examples. Although all Stack Overflow posts are free to access, code examples on Stack Overflow are governed by the Creative Commons Attribution-ShareAlike 3.0 Unported license that developers should obey when reusing code from Stack Overflow or posting code to Stack Overflow. In this paper, we conduct a case study with 399 Android apps to investigate whether developers respect license terms when reusing code from Stack Overflow posts (and the other way around). We found 232 code snippets in 62 Android apps from our dataset that were potentially reused from Stack Overflow, and 1,226 Stack Overflow posts containing code examples that are clones of code released in 68 Android apps, suggesting that developers may have copied the code of these apps to answer Stack Overflow questions. We investigated the licenses of these pieces of code and observed 1,279 cases of potential license violations (related to code posting to Stack Overflow or code reuse from Stack Overflow). This paper aims to raise the awareness of the software engineering community about potential unethical code reuse activities taking place on Q&A websites like Stack Overflow.
Submitted 10 March, 2017;
originally announced March 2017.
-
Comprehension of Ads-supported and Paid Android Applications: Are They Different?
Authors:
Rubén Saborido,
Foutse Khomh,
Yann-Gaël Guéhéneuc,
Giuliano Antoniol
Abstract:
The Android market is a place where developers offer paid and/or free apps to users. Free apps are interesting to users because they can try them immediately without incurring a monetary cost. However, free apps often have limited features and/or contain ads when compared to their paid counterparts. Thus, users may eventually need to pay to get additional features and/or remove ads. While paid apps have clear market values, their ads-supported versions are not entirely free because ads have an impact on performance.
In this paper, first, we perform an exploratory study about ads-supported and paid apps to understand their differences in terms of implementation and development process. We analyze 40 Android apps and we observe that (i) ads-supported apps are preferred by users although paid apps have a better rating, (ii) developers do not usually offer a paid app without a corresponding free version, (iii) ads-supported apps usually have more releases and are released more often than their corresponding paid versions, (iv) there is no clear strategy about the way developers set prices of paid apps, (v) paid apps do not usually include more functionalities than their corresponding ads-supported versions, (vi) developers do not always remove ad networks in paid versions of their ads-supported apps, and (vii) paid apps require fewer permissions than ads-supported apps. Second, we carry out an experimental study to compare the performance of ads-supported and paid apps and we propose four equations to estimate the cost of ads-supported apps. We find that (i) ads-supported apps use more resources than their corresponding paid versions with statistically significant differences and (ii) paid apps could be considered a more cost-effective choice for users because their cost can be amortized in a short period of time, depending on their usage.
Submitted 8 March, 2017;
originally announced March 2017.
-
Anti-patterns and the energy efficiency of Android applications
Authors:
Rodrigo Morales,
Ruben Saborido,
Foutse Khomh,
Francisco Chicano,
Giuliano Antoniol
Abstract:
The boom in mobile apps has changed the traditional landscape of software development by introducing new challenges due to the limited resources of mobile devices, e.g., memory, CPU, network bandwidth and battery. The energy consumption of mobile apps is nowadays a hot topic and researchers are actively investigating the role of coding practices on energy efficiency. Recent studies suggest that design quality can conflict with energy efficiency. Therefore, it is important to take into account energy efficiency when evolving the design of a mobile app. The research community has proposed approaches to detect and remove anti-patterns (i.e., poor solutions to design and implementation problems) in software systems but, to the best of our knowledge, none of these approaches have included anti-patterns that are specific to mobile apps and/or considered the energy efficiency of apps. In this paper, we fill this gap in the literature by analyzing the impact of eight types of anti-patterns on a testbed of 59 Android apps extracted from F-Droid. We (1) analyze the impact of anti-patterns in mobile apps with respect to energy efficiency and then (2) study the impact of different types of anti-patterns on energy efficiency. We found that the energy consumption of apps containing anti-patterns and that of apps without them (refactored apps) are statistically different. Moreover, we find that the impact of refactoring anti-patterns can be positive (7 types of anti-patterns) or negative (2 types of anti-patterns). Therefore, developers should consider the impact on energy efficiency of refactoring when applying maintenance activities.
Submitted 19 October, 2016; v1 submitted 18 October, 2016;
originally announced October 2016.
-
ATLAS: An Adaptive Failure-aware Scheduler for Hadoop
Authors:
Mbarka Soualhia,
Foutse Khomh,
Sofiene Tahar
Abstract:
Hadoop has become the de facto standard for processing large data in today's cloud environment. The performance of Hadoop in the cloud has a direct impact on many important applications ranging from web analytics, web indexing, and image and document processing to high-performance scientific computing. However, because of the scale, complexity and dynamic nature of the cloud, failures are common and these failures often impact the performance of jobs running in Hadoop. Although Hadoop possesses built-in failure detection and recovery mechanisms, several scheduled jobs still fail because of unforeseen events in the cloud environment. A single task failure can cause the failure of the whole job and unpredictable job running times. In this report, we propose ATLAS (AdapTive faiLure-Aware Scheduler), a new scheduler for Hadoop that can adapt its scheduling decisions to events occurring in the cloud environment. Using statistical models, ATLAS predicts task failures and adjusts its scheduling decisions on the fly to reduce task failure occurrences. We implement ATLAS in the Hadoop framework of Amazon Elastic MapReduce (EMR) and perform a case study to compare its performance with those of the FIFO, Fair and Capacity schedulers. Results show that ATLAS can reduce the percentage of failed jobs by up to 28%, the percentage of failed tasks by up to 39%, and the total execution time of jobs by 10 minutes on average. ATLAS also reduces CPU and memory usage.
Submitted 5 November, 2015; v1 submitted 4 November, 2015;
originally announced November 2015.
-
Predicting Scheduling Failures in the Cloud
Authors:
Mbarka Soualhia,
Foutse Khomh,
Sofiene Tahar
Abstract:
Cloud computing has emerged as a key technology to deliver and manage computing, platform, and software services over the Internet. Task scheduling algorithms play an important role in the efficiency of cloud computing services as they aim to reduce the turnaround time of tasks and improve resource utilization. Several task scheduling algorithms have been proposed in the literature for cloud computing systems, the majority relying on the computational complexity of tasks and the distribution of resources. However, several tasks scheduled following these algorithms still fail because of unforeseen changes in the cloud environments. In this paper, using task execution and resource utilization data extracted from the execution traces of real-world applications at Google, we explore the possibility of predicting the scheduling outcome of a task using statistical models. If we can successfully predict task failures, we may be able to reduce the execution time of jobs by rescheduling failed tasks earlier (i.e., before their actual failing time). Our results show that statistical models can predict task failures with a precision of up to 97.4% and a recall of up to 96.2%. We simulate the potential benefits of such predictions using the toolkit GloudSim and find that they can improve the number of finished tasks by up to 40%. We also perform a case study using the Hadoop framework of Amazon Elastic MapReduce (EMR) and the jobs of a gene expression correlations analysis study from breast cancer research. We find that when extending the scheduler of Hadoop with our predictive models, the percentage of failed jobs can be reduced by up to 45%, with an overhead of less than 5 minutes.
Submitted 13 July, 2015;
originally announced July 2015.
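The abstract above reports the quality of task-failure predictions as precision and recall. As a reminder of what these two measures mean, here is a minimal sketch that computes them from actual and predicted outcomes. The function name `precision_recall` and the sample labels are hypothetical and are not taken from the paper's data.

```python
from typing import Sequence, Tuple


def precision_recall(actual: Sequence[bool],
                     predicted: Sequence[bool]) -> Tuple[float, float]:
    """Compute precision and recall, where True means "task failed".

    precision = true positives / all predicted failures
    recall    = true positives / all actual failures
    """
    pairs = list(zip(actual, predicted))
    true_pos = sum(1 for a, p in pairs if a and p)
    false_pos = sum(1 for a, p in pairs if not a and p)
    false_neg = sum(1 for a, p in pairs if a and not p)

    predicted_failures = true_pos + false_pos
    actual_failures = true_pos + false_neg
    # Guard against division by zero when no failures are predicted or occur.
    precision = true_pos / predicted_failures if predicted_failures else 0.0
    recall = true_pos / actual_failures if actual_failures else 0.0
    return precision, recall


# Hypothetical outcomes for five tasks (True = the task failed).
actual = [True, True, False, False, True]
predicted = [True, False, True, False, True]

precision, recall = precision_recall(actual, predicted)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

High precision means few tasks are rescheduled needlessly, and high recall means few failing tasks are missed. Both matter for a scheduler that acts on such predictions.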