Showing 51–100 of 111 results for author: Khomh, F

  1. Dev2vec: Representing Domain Expertise of Developers in an Embedding Space

    Authors: Arghavan Moradi Dakhel, Michel C. Desmarais, Foutse Khomh

    Abstract: Accurate assessment of the domain expertise of developers is important for assigning the proper candidate to contribute to a project or to attend a job role. Since the potential candidate can come from a large pool, the automated assessment of this domain expertise is a desirable goal. While previous methods have had some success within a single software project, the assessment of a developer's do…

    Submitted 11 July, 2022; originally announced July 2022.

    Comments: 30 pages, 5 figures

  2. arXiv:2207.00091  [pdf, other]

    cs.CR cs.AI cs.LG

    Threat Assessment in Machine Learning based Systems

    Authors: Lionel Nganyewou Tidjon, Foutse Khomh

    Abstract: Machine learning is a field of artificial intelligence (AI) that is becoming essential for several critical systems, making it a good target for threat actors. Threat actors exploit different Tactics, Techniques, and Procedures (TTPs) against the confidentiality, integrity, and availability of Machine Learning (ML) systems. During the ML cycle, they exploit adversarial TTPs to poison data and fool…

    Submitted 30 June, 2022; originally announced July 2022.

  3. arXiv:2206.15331  [pdf, other]

    cs.SE cs.LG

    GitHub Copilot AI pair programmer: Asset or Liability?

    Authors: Arghavan Moradi Dakhel, Vahid Majdinasab, Amin Nikanjam, Foutse Khomh, Michel C. Desmarais, Zhen Ming Jiang

    Abstract: Automatic program synthesis is a long-lasting dream in software engineering. Recently, a promising Deep Learning (DL) based solution, called Copilot, has been proposed by OpenAI and Microsoft as an industrial product. Although some studies evaluate the correctness of Copilot solutions and report its issues, more empirical evaluations are necessary to understand how developers can benefit from it e…

    Submitted 14 April, 2023; v1 submitted 30 June, 2022; originally announced June 2022.

    Comments: 27 pages, 8 figures

  4. arXiv:2206.14322  [pdf, other]

    cs.LG

    An Empirical Study of Challenges in Converting Deep Learning Models

    Authors: Moses Openja, Amin Nikanjam, Ahmed Haj Yahmed, Foutse Khomh, Zhen Ming Jiang

    Abstract: There is an increase in deploying Deep Learning (DL)-based software systems in real-world applications. Usually DL models are developed and trained using DL frameworks that have their own internal mechanisms/formats to represent and train DL models, and usually those formats cannot be recognized by other frameworks. Moreover, trained models are usually deployed in environments different from where…

    Submitted 28 June, 2022; originally announced June 2022.

    Comments: Accepted for publication in ICSME 2022

  5. arXiv:2206.12311  [pdf, other]

    cs.SE cs.LG

    Bugs in Machine Learning-based Systems: A Faultload Benchmark

    Authors: Mohammad Mehdi Morovati, Amin Nikanjam, Foutse Khomh, Zhen Ming Jiang

    Abstract: The rapid escalation of applying Machine Learning (ML) in various domains has led to paying more attention to the quality of ML components. There is then a growth of techniques and tools aiming at improving the quality of ML components and integrating them into the ML-based system safely. Although most of these tools use bugs' lifecycle, there is no standard benchmark of bugs to assess their perfo…

    Submitted 16 January, 2023; v1 submitted 24 June, 2022; originally announced June 2022.

  6. arXiv:2206.11981  [pdf, other]

    cs.AI cs.CY

    Never trust, always verify : a roadmap for Trustworthy AI?

    Authors: Lionel Nganyewou Tidjon, Foutse Khomh

    Abstract: Artificial Intelligence (AI) is becoming the cornerstone of many systems used in our daily lives such as autonomous vehicles, healthcare systems, and unmanned aircraft systems. Machine Learning is a field of AI that enables systems to learn from data and make decisions on new data based on models to achieve a given goal. The stochastic nature of AI models makes verification and validation tasks c…

    Submitted 23 June, 2022; originally announced June 2022.

  7. arXiv:2206.03225  [pdf, other]

    cs.CY cs.AI

    The Different Faces of AI Ethics Across the World: A Principle-Implementation Gap Analysis

    Authors: Lionel Nganyewou Tidjon, Foutse Khomh

    Abstract: Artificial Intelligence (AI) is transforming our daily life with several applications in healthcare, space exploration, banking and finance. These rapid progresses in AI have brought increasing attention to the potential impacts of AI technologies on society, with ethically questionable consequences. In recent years, several ethical principles have been released by governments, national and intern…

    Submitted 12 May, 2022; originally announced June 2022.

  8. Studying the Practices of Deploying Machine Learning Projects on Docker

    Authors: Moses Openja, Forough Majidi, Foutse Khomh, Bhagya Chembakottu, Heng Li

    Abstract: Docker is a containerization service that allows for convenient deployment of websites, databases, applications' APIs, and machine learning (ML) models with a few lines of code. Studies have recently explored the use of Docker for deploying general software projects with no specific focus on how Docker is used to deploy ML-based projects. In this study, we conducted an exploratory study to under…

    Submitted 1 June, 2022; originally announced June 2022.

    Journal ref: The International Conference on Evaluation and Assessment in Software Engineering 2022 (EASE 2022), June 13–15, 2022, Gothenburg, Sweden

  9. arXiv:2206.00666  [pdf, other]

    cs.SE

    Technical Debts and Faults in Open-source Quantum Software Systems: An Empirical Study

    Authors: Moses Openja, Mohammad Mehdi Morovati, Le An, Foutse Khomh, Mouna Abidi

    Abstract: Quantum computing is a rapidly growing field attracting the interest of both researchers and software developers. Supported by its numerous open-source tools, developers can now build, test, or run their quantum algorithms. Although the maintenance practices for traditional software systems have been extensively studied, the maintenance of quantum software is still a new field of study but a criti…

    Submitted 1 June, 2022; originally announced June 2022.

  10. arXiv:2205.15419  [pdf, other]

    cs.LG

    Fool SHAP with Stealthily Biased Sampling

    Authors: Gabriel Laberge, Ulrich Aïvodji, Satoshi Hara, Mario Marchand, Foutse Khomh

    Abstract: SHAP explanations aim at identifying which features contribute the most to the difference in model prediction at a specific input versus a background distribution. Recent studies have shown that they can be manipulated by malicious adversaries to produce arbitrary desired explanations. However, existing attacks focus solely on altering the black-box model itself. In this paper, we propose a comple…

    Submitted 3 March, 2023; v1 submitted 30 May, 2022; originally announced May 2022.

  11. arXiv:2205.03181  [pdf, other]

    cs.SE

    Understanding Quantum Software Engineering Challenges: An Empirical Study on Stack Exchange Forums and GitHub Issues

    Authors: Mohamed Raed El aoun, Heng Li, Foutse Khomh, Moses Openja

    Abstract: With the advance in quantum computing, quantum software becomes critical for exploring the full potential of quantum computing systems. Recently, quantum software engineering (QSE) becomes an emerging area attracting more and more attention. However, it is not clear what are the challenges and opportunities of quantum computing facing the software engineering community. This work aims to understan…

    Submitted 6 May, 2022; originally announced May 2022.

  12. arXiv:2204.11965  [pdf, other]

    cs.SE

    Bug Characteristics in Quantum Software Ecosystem

    Authors: Mohamed Raed El aoun, Heng Li, Foutse Khomh, Lionel Tidjon

    Abstract: With the advance in quantum computing in recent years, quantum software becomes vital for exploring the full potential of quantum computing systems. Quantum programming is different from classical programming, for example, the state of a quantum program is probabilistic in nature, and a quantum computer is error-prone due to the instability of quantum mechanisms. Therefore, the characteristics of…

    Submitted 25 April, 2022; originally announced April 2022.

  13. arXiv:2204.00694  [pdf, other]

    cs.SE cs.AI cs.LG

    Testing Feedforward Neural Networks Training Programs

    Authors: Houssem Ben Braiek, Foutse Khomh

    Abstract: Nowadays, we are witnessing an increasing effort to improve the performance and trustworthiness of Deep Neural Networks (DNNs), with the aim to enable their adoption in safety critical systems such as self-driving cars. Multiple testing techniques are proposed to generate test cases that can expose inconsistencies in the behavior of DNN models. These techniques assume implicitly that the training…

    Submitted 1 April, 2022; originally announced April 2022.

  14. arXiv:2203.12138  [pdf, other]

    cs.NE

    A Search-Based Framework for Automatic Generation of Testing Environments for Cyber-Physical Systems

    Authors: Dmytro Humeniuk, Foutse Khomh, Giuliano Antoniol

    Abstract: Many modern cyber physical systems incorporate computer vision technologies, complex sensors and advanced control software, allowing them to interact with the environment autonomously. Testing such systems poses numerous challenges: not only should the system inputs be varied, but also the surrounding environment should be accounted for. A number of tools have been developed to test the system mod…

    Submitted 22 March, 2022; originally announced March 2022.

  15. arXiv:2202.03270  [pdf, other]

    cs.SE

    Do Developers Refactor Data Access Code? An Empirical Study

    Authors: Biruk Asmare Muse, Foutse Khomh, Giuliano Antoniol

    Abstract: Developers often refactor code to improve the maintainability and comprehension of the software. There are many studies on refactoring activities in traditional software systems. However, refactoring in data-intensive systems is not well explored. Understanding the refactoring practices of developers is important to develop efficient tool support. We conducted a longitudinal study of refactoring ac…

    Submitted 7 February, 2022; originally announced February 2022.

    Comments: 29th IEEE International Conference on Software Analysis, Evolution and Reengineering

  16. arXiv:2201.02215  [pdf, other]

    cs.SE

    On the Prevalence, Impact, and Evolution of SQL Code Smells in Data-Intensive Systems

    Authors: Biruk Asmare Muse, Mohammad Masudur Rahman, Csaba Nagy, Anthony Cleve, Foutse Khomh, Giuliano Antoniol

    Abstract: Code smells indicate software design problems that harm software quality. Data-intensive systems that frequently access databases often suffer from SQL code smells besides the traditional smells. While there have been extensive studies on traditional code smells, recently, there has been a growing interest in SQL code smells. In this paper, we conduct an empirical study to investigate the prevalen…

    Submitted 6 January, 2022; originally announced January 2022.

    Journal ref: In Proceedings of the 17th International Conference on Mining Software Repositories (pp. 327-338) 2020

  17. arXiv:2201.02180  [pdf, other]

    cs.SE

    FIXME: Synchronize with Database An Empirical Study of Data Access Self-Admitted Technical Debt

    Authors: Biruk Asmare Muse, Csaba Nagy, Anthony Cleve, Foutse Khomh, Giuliano Antoniol

    Abstract: Developers sometimes choose design and implementation shortcuts due to the pressure from tight release schedules. However, shortcuts introduce technical debt that increases as the software evolves. The debt needs to be repaid as fast as possible to minimize its impact on software development and software quality. Sometimes, technical debt is admitted by developers in comments and commit messages.…

    Submitted 6 January, 2022; originally announced January 2022.

  18. arXiv:2112.15277  [pdf, other]

    cs.SE cs.LG

    Machine Learning Application Development: Practitioners' Insights

    Authors: Md Saidur Rahman, Foutse Khomh, Alaleh Hamidi, Jinghui Cheng, Giuliano Antoniol, Hironori Washizaki

    Abstract: Nowadays, intelligent systems and services are getting increasingly popular as they provide data-driven solutions to diverse real-world problems, thanks to recent breakthroughs in Artificial Intelligence (AI) and Machine Learning (ML). However, machine learning meets software engineering not only with promising potentials but also with some inherent challenges. Despite some recent research efforts…

    Submitted 30 December, 2021; originally announced December 2021.

  19. arXiv:2112.13314  [pdf, other]

    cs.SE cs.LG

    Silent Bugs in Deep Learning Frameworks: An Empirical Study of Keras and TensorFlow

    Authors: Florian Tambon, Amin Nikanjam, Le An, Foutse Khomh, Giuliano Antoniol

    Abstract: Deep Learning (DL) frameworks are now widely used, simplifying the creation of complex models as well as their integration to various applications even to non DL experts. However, like any other programs, they are prone to bugs. This paper deals with the subcategory of bugs named silent bugs: they lead to wrong behavior but they do not cause system crashes or hangs, nor show an error message to th…

    Submitted 1 September, 2023; v1 submitted 25 December, 2021; originally announced December 2021.

  20. arXiv:2111.07101  [pdf]

    cs.SE

    Reputation Gaming in Stack Overflow

    Authors: Iren Mazloomzadeh, Gias Uddin, Foutse Khomh, Ashkan Sami

    Abstract: Stack Overflow incentive system awards users with reputation scores to ensure quality. The decentralized nature of the forum may make the incentive system prone to manipulation. This paper offers, for the first time, a comprehensive study of the reported types of reputation manipulation scenarios that might be exercised in Stack Overflow and the prevalence of such reputation gamers by qualitative…

    Submitted 13 November, 2021; originally announced November 2021.

  21. arXiv:2111.04865  [pdf, other]

    cs.LG

    On Assessing The Safety of Reinforcement Learning algorithms Using Formal Methods

    Authors: Paulina Stevia Nouwou Mindom, Amin Nikanjam, Foutse Khomh, John Mullins

    Abstract: The increasing adoption of Reinforcement Learning in safety-critical systems domains such as autonomous vehicles, health, and aviation raises the need for ensuring their safety. Existing safety mechanisms such as adversarial training, adversarial detection, and robust learning are not always adapted to all disturbances in which the agent is deployed. Those disturbances include moving adversaries w…

    Submitted 9 November, 2021; v1 submitted 8 November, 2021; originally announced November 2021.

  22. arXiv:2111.03196  [pdf, other]

    cs.SE cs.LG

    An Empirical Study of the Effectiveness of an Ensemble of Stand-alone Sentiment Detection Tools for Software Engineering Datasets

    Authors: Gias Uddin, Yann-Gael Gueheneuc, Foutse Khomh, Chanchal K Roy

    Abstract: Sentiment analysis in software engineering (SE) has shown promise to analyze and support diverse development activities. We report the results of an empirical study that we conducted to determine the feasibility of developing an ensemble engine by combining the polarity labels of stand-alone SE-specific sentiment detectors. Our study has two phases. In the first phase, we pick five SE-specific sen…

    Submitted 4 November, 2021; originally announced November 2021.

    Journal ref: ACM Transactions on Software Engineering and Methodology (TOSEM), 2021

  23. arXiv:2110.13369  [pdf, other]

    cs.LG

    Partial Order in Chaos: Consensus on Feature Attributions in the Rashomon Set

    Authors: Gabriel Laberge, Yann Pequignot, Alexandre Mathieu, Foutse Khomh, Mario Marchand

    Abstract: Post-hoc global/local feature attribution methods are progressively being employed to understand the decisions of complex machine learning models. Yet, because of limited amounts of data, it is possible to obtain a diversity of models with good empirical performance but that provide very different explanations for the same prediction, making it hard to derive insight from them. In this work, inste…

    Submitted 28 December, 2023; v1 submitted 25 October, 2021; originally announced October 2021.

    Journal ref: Journal of Machine Learning Research, 2023, vol. 24, no 364, p. 1-50

  24. Failure Analysis of Hadoop Schedulers using an Integration of Model Checking and Simulation

    Authors: Mbarka Soualhia, Foutse Khomh, Sofiene Tahar

    Abstract: The Hadoop scheduler is a centerpiece of Hadoop, the leading processing framework for data-intensive applications in the cloud. Given the impact of failures on the performance of applications running on Hadoop, testing and verifying the performance of the Hadoop scheduler is critical. Existing approaches such as performance simulation and analytical modeling are inadequate because they are not abl…

    Submitted 6 September, 2021; originally announced September 2021.

    Comments: In Proceedings SCSS 2021, arXiv:2109.02501

    Journal ref: EPTCS 342, 2021, pp. 114-128

  25. arXiv:2109.03991  [pdf, other]

    cs.SE cs.LG

    The challenge of reproducible ML: an empirical study on the impact of bugs

    Authors: Emilio Rivera-Landos, Foutse Khomh, Amin Nikanjam

    Abstract: Reproducibility is a crucial requirement in scientific research. When results of research studies and scientific papers have been found difficult or impossible to reproduce, we face a challenge which is called reproducibility crisis. Although the demand for reproducibility in Machine Learning (ML) is acknowledged in the literature, a main barrier is inherent non-determinism in ML training and infe…

    Submitted 8 September, 2021; originally announced September 2021.

  26. arXiv:2108.05341  [pdf, other]

    cs.SE cs.IR cs.LG

    The Forgotten Role of Search Queries in IR-based Bug Localization: An Empirical Study

    Authors: Mohammad Masudur Rahman, Foutse Khomh, Shamima Yeasmin, Chanchal K. Roy

    Abstract: Being light-weight and cost-effective, IR-based approaches for bug localization have shown promise in finding software bugs. However, the accuracy of these approaches heavily depends on their used bug reports. A significant number of bug reports contain only plain natural language texts. According to existing studies, IR-based approaches cannot perform well when they use these bug reports as searc…

    Submitted 11 August, 2021; originally announced August 2021.

    Comments: 57 pages, EMSE (2021)

    ACM Class: D.2; D.2.5; D.2.7

  27. Why are Some Bugs Non-Reproducible? An Empirical Investigation using Data Fusion

    Authors: Mohammad Masudur Rahman, Foutse Khomh, Marco Castelluccio

    Abstract: Software developers attempt to reproduce software bugs to understand their erroneous behaviours and to fix them. Unfortunately, they often fail to reproduce (or fix) them, which leads to faulty, unreliable software systems. However, to date, only a little research has been done to better understand what makes the software bugs non-reproducible. In this paper, we conduct a multimodal study to bette…

    Submitted 11 August, 2021; originally announced August 2021.

    Comments: 12 pages

    ACM Class: D.2; D.2.5; D.2.7

    Journal ref: 2020 IEEE International Conference on Software Maintenance and Evolution (ICSME)

  28. arXiv:2108.02702  [pdf, other]

    cs.SE

    Improved Retrieval of Programming Solutions With Code Examples Using a Multi-featured Score

    Authors: Rodrigo F. Silva, M. Masudur Rahman, Carlos Eduardo Dantas, Chanchal Roy, Foutse Khomh, Marcelo A. Maia

    Abstract: Developers often depend on code search engines to obtain solutions for their programming tasks. However, finding an expected solution containing code examples along with their explanations is challenging due to several issues. There is a vocabulary mismatch between the search keywords (the query) and the appropriate solutions. Semantic gap may increase for similar bag of words due to antonyms and…

    Submitted 5 August, 2021; originally announced August 2021.

    Comments: 31 pages, 5 figures, 9 tables

  29. arXiv:2107.13614  [pdf, other]

    cs.SE

    Clones in Deep Learning Code: What, Where, and Why?

    Authors: Hadhemi Jebnoun, Md Saidur Rahman, Foutse Khomh, Biruk Asmare Muse

    Abstract: Deep Learning applications are becoming increasingly popular. Developers of deep learning systems strive to write more efficient code. Deep learning systems are constantly evolving, imposing tighter development timelines and increasing complexity, which may lead to bad design decisions. A copy-paste approach is widely used among deep learning developers because they rely on common frameworks and d…

    Submitted 28 July, 2021; originally announced July 2021.

  30. arXiv:2107.13491  [pdf, other]

    cs.LG cs.SE

    Models of Computational Profiles to Study the Likelihood of DNN Metamorphic Test Cases

    Authors: Ettore Merlo, Mira Marhaba, Foutse Khomh, Houssem Ben Braiek, Giuliano Antoniol

    Abstract: Neural network test cases are meant to exercise different reasoning paths in an architecture and used to validate the prediction outcomes. In this paper, we introduce "computational profiles" as vectors of neuron activation levels. We investigate the distribution of computational profile likelihood of metamorphic test cases with respect to the likelihood distributions of training, test and error c…

    Submitted 28 July, 2021; originally announced July 2021.

    Comments: 9 pages (10 pages with ref.)

    Journal ref: Published in iMLSE 2020 2nd International Workshop on Machine Learning Systems Engineering https://sig-mlse.wixsite.com/imlse2020

  31. How to Certify Machine Learning Based Safety-critical Systems? A Systematic Literature Review

    Authors: Florian Tambon, Gabriel Laberge, Le An, Amin Nikanjam, Paulina Stevia Nouwou Mindom, Yann Pequignot, Foutse Khomh, Giulio Antoniol, Ettore Merlo, François Laviolette

    Abstract: Context: Machine Learning (ML) has been at the heart of many innovations over the past years. However, including it in so-called 'safety-critical' systems such as automotive or aeronautic has proven to be very challenging, since the shift in paradigm that ML brings completely changes traditional certification approaches. Objective: This paper aims to elucidate challenges related to the certifica…

    Submitted 1 December, 2021; v1 submitted 26 July, 2021; originally announced July 2021.

    Comments: 60 pages (92 pages with references and complements), submitted to a journal (Automated Software Engineering). Changes: Emphasizing difference traditional software engineering / ML approach. Adding Related Works, Threats to Validity and Complementary Materials. Adding a table listing papers reference for each section/subsections

    Journal ref: Autom Softw Eng 29, 38 (2022)

  32. arXiv:2107.04863  [pdf, other]

    cs.LG cs.SE

    HOMRS: High Order Metamorphic Relations Selector for Deep Neural Networks

    Authors: Florian Tambon, Giulio Antoniol, Foutse Khomh

    Abstract: Deep Neural Networks (DNN) applications are increasingly becoming a part of our everyday life, from medical applications to autonomous cars. Traditional validation of DNN relies on accuracy measures, however, the existence of adversarial examples has highlighted the limitations of these accuracy measures, raising concerns especially when DNN are integrated into safety-critical systems. In this p…

    Submitted 21 December, 2021; v1 submitted 10 July, 2021; originally announced July 2021.

    Comments: 33 pages

  33. arXiv:2107.02279  [pdf, other]

    cs.SE cs.LG

    Design Smells in Deep Learning Programs: An Empirical Study

    Authors: Amin Nikanjam, Foutse Khomh

    Abstract: Nowadays, we are witnessing an increasing adoption of Deep Learning (DL) based software systems in many industries. Designing a DL program requires constructing a deep neural network (DNN) and then training it on a dataset. This process requires that developers make multiple architectural (e.g., type, size, number, and order of layers) and configuration (e.g., optimizer, regularization methods, an…

    Submitted 7 July, 2021; v1 submitted 5 July, 2021; originally announced July 2021.

    Comments: Accepted for publication by ICSME 2021

  34. arXiv:2105.08095  [pdf, other]

    cs.SE cs.LG

    Automatic Fault Detection for Deep Learning Programs Using Graph Transformations

    Authors: Amin Nikanjam, Houssem Ben Braiek, Mohammad Mehdi Morovati, Foutse Khomh

    Abstract: Nowadays, we are witnessing an increasing demand in both corporates and academia for exploiting Deep Learning (DL) to solve complex real-world problems. A DL program encodes the network structure of a desirable DL model and the process by which the model learns from the training dataset. Like any software, a DL program can be faulty, which implies substantial challenges of software quality assuran…

    Submitted 30 May, 2021; v1 submitted 17 May, 2021; originally announced May 2021.

  35. arXiv:2104.00058  [pdf, other]

    cs.SE

    Investigating Design Anti-pattern and Design Pattern Mutations and Their Change- and Fault-proneness

    Authors: Zeinab Kermansaravi, Md Saidur Rahman, Foutse Khomh, Fehmi Jaafar, Yann-Gael Gueheneuc

    Abstract: During software evolution, inexperienced developers may introduce design anti-patterns when they modify their software systems to fix bugs or to add new functionalities based on changes in requirements. Developers may also use design patterns to promote software quality or as a possible cure for some design anti-patterns. Thus, design patterns and design anti-patterns are introduced, removed, and…

    Submitted 31 March, 2021; originally announced April 2021.

  36. arXiv:2102.11491  [pdf, other]

    cs.CR cs.NE

    Data Driven Testing of Cyber Physical Systems

    Authors: Dmytro Humeniuk, Giuliano Antoniol, Foutse Khomh

    Abstract: Consumer grade cyber-physical systems (CPS) are becoming an integral part of our life, automatizing and simplifying everyday tasks. Indeed, due to complex interactions between hardware, networking and software, developing and testing such systems is known to be a challenging task. Various quality assurance and testing strategies have been proposed. The most common approach for pre-deployment testi…

    Submitted 23 March, 2021; v1 submitted 22 February, 2021; originally announced February 2021.

    Comments: 4 pages, to be published in SBST2021 workshop proceedings

  37. arXiv:2102.08874  [pdf, other]

    cs.SE

    Mining API Usage Scenarios from Stack Overflow

    Authors: Gias Uddin, Foutse Khomh, Chanchal K Roy

    Abstract: We propose a framework to mine API usage scenarios from Stack Overflow. Each task consists of a code example, the task description, and the reactions of developers towards the code example. First, we present an algorithm to automatically link a code example in a forum post to an API mentioned in the textual contents of the forum post. Second, we generate a natural language description of the task…

    Submitted 17 February, 2021; originally announced February 2021.

    Journal ref: 2020 Information and Software Technology (IST)

  38. arXiv:2102.08502  [pdf, other]

    cs.SE

    Automatic API Usage Scenario Documentation from Technical Q&A Sites

    Authors: Gias Uddin, Foutse Khomh, Chanchal K Roy

    Abstract: The online technical Q&A site Stack Overflow (SO) is popular among developers to support their coding and diverse development needs. To address shortcomings in API official documentation resources, several studies have thus focused on augmenting official API documentation with insights (e.g., code examples) from SO. The techniques propose to add code examples/insights about APIs into its official…

    Submitted 16 February, 2021; originally announced February 2021.

    Journal ref: 2021 ACM Transactions on Software Engineering and Methodology (TOSEM)

  39. arXiv:2102.08495  [pdf, other]

    cs.SE

    Understanding How and Why Developers Seek and Analyze API-related Opinions

    Authors: Gias Uddin, Olga Baysal, Latifa Guerrouj, Foutse Khomh

    Abstract: With the advent and proliferation of online developer forums as informal documentation, developers often share their opinions about the APIs they use. Thus, opinions of others often shape the developer's perception and decisions related to software development. For example, the choice of an API or how to reuse the functionality the API offers are, to a considerable degree, conditioned upon what ot…

    Submitted 16 February, 2021; originally announced February 2021.

    Journal ref: 2019 IEEE Transactions on Software Engineering (TSE)

  40. arXiv:2101.00135  [pdf, other]

    cs.SE cs.LG

    Faults in Deep Reinforcement Learning Programs: A Taxonomy and A Detection Approach

    Authors: Amin Nikanjam, Mohammad Mehdi Morovati, Foutse Khomh, Houssem Ben Braiek

    Abstract: A growing demand is witnessed in both industry and academia for employing Deep Learning (DL) in various domains to solve real-world problems. Deep Reinforcement Learning (DRL) is the application of DL in the domain of Reinforcement Learning (RL). Like any software systems, DRL applications can fail because of faults in their programs. In this paper, we present the first attempt to categorize fault…

    Submitted 28 November, 2021; v1 submitted 31 December, 2020; originally announced January 2021.

  41. arXiv:2010.14331  [pdf, other]

    cs.SE

    Are Multi-language Design Smells Fault-prone? An Empirical Study

    Authors: Mouna Abidi, Md Saidur Rahman, Moses Openja, Foutse Khomh

    Abstract: Nowadays, modern applications are developed using components written in different programming languages. These systems introduce several advantages. However, as the number of languages increases, so do the challenges related to the development and maintenance of these systems. In such situations, developers may introduce design smells (i.e., anti-patterns and code smells) which are symptoms of p…

    Submitted 2 November, 2020; v1 submitted 27 October, 2020; originally announced October 2020.

    Journal ref: ACM Transactions on Software Engineering and Methodology (TOSEM) 2020

  42. A Large Scale Empirical Study of the Impact of Spaghetti Code and Blob Anti-patterns on Program Comprehension

    Authors: Cristiano Politowski, Foutse Khomh, Simone Romano, Giuseppe Scanniello, Fabio Petrillo, Yann-Gaël Guéhéneuc, Abdou Maiga

    Abstract: Context: Several studies investigated the impact of anti-patterns (i.e., "poor" solutions to recurring design problems) during maintenance activities and reported that anti-patterns significantly affect the developers' effort required to edit files. However, before developers edit files, they must understand the source code of the systems. This source code must be easy to understand by developers.…

    Submitted 4 September, 2020; originally announced September 2020.

  43. arXiv:1912.09303  [pdf, other]

    cs.CR cs.LG stat.ML

    SIGMA : Strengthening IDS with GAN and Metaheuristics Attacks

    Authors: Simon Msika, Alejandro Quintero, Foutse Khomh

    Abstract: An Intrusion Detection System (IDS) is a key cybersecurity tool for network administrators as it identifies malicious traffic and cyberattacks. With the recent successes of machine learning techniques such as deep learning, more and more IDS are now using machine learning algorithms to detect attacks faster. However, these systems lack robustness when facing previously unseen types of attacks. Wit…

    Submitted 18 December, 2019; originally announced December 2019.

    Comments: 11 pages, 6 figures

  44. Deep Learning Anti-patterns from Code Metrics History

    Authors: Antoine Barbez, Foutse Khomh, Yann-Gaël Guéhéneuc

    Abstract: Anti-patterns are poor solutions to recurring design problems. A number of empirical studies have highlighted the negative impact of anti-patterns on software maintenance, which motivated the development of various detection techniques. Most of these approaches rely on structural metrics of software systems to identify affected components while others exploit historical information by analyzing co-ch…

    Submitted 16 October, 2019; originally announced October 2019.

    Comments: Preprint. Paper accepted for inclusion in the Research Track of the 35th IEEE International Conference on Software Maintenance and Evolution (ICSME 2019), Cleveland, Ohio, USA

  45. arXiv:1910.04736  [pdf, other]

    cs.SE cs.LG

    Studying Software Engineering Patterns for Designing Machine Learning Systems

    Authors: Hironori Washizaki, Hiromu Uchida, Foutse Khomh, Yann-Gael Gueheneuc

    Abstract: Machine-learning (ML) techniques have become popular in recent years. ML techniques rely on mathematics and on software engineering. Researchers and practitioners are studying best practices for designing ML application systems and software to address the software complexity and quality of ML techniques. Such design practices are often formalized as architecture patterns and design patterns by enc…

    Submitted 11 October, 2019; v1 submitted 10 October, 2019; originally announced October 2019.

  46. arXiv:1910.01321  [pdf]

    cs.SE

    An Empirical Study of C++ Vulnerabilities in Crowd-Sourced Code Examples

    Authors: Morteza Verdi, Ashkan Sami, Jafar Akhondali, Foutse Khomh, Gias Uddin, Alireza Karami Motlagh

    Abstract: Software developers share programming solutions in Q&A sites like Stack Overflow. The reuse of crowd-sourced code snippets can facilitate rapid prototyping. However, recent research shows that the shared code snippets may be of low quality and can even contain vulnerabilities. This paper aims to understand the nature and the prevalence of security vulnerabilities in crowd-sourced code examples. To…

    Submitted 19 January, 2021; v1 submitted 3 October, 2019; originally announced October 2019.

    Comments: 14 pages

  47. arXiv:1909.02563  [pdf, ps, other]

    cs.LG cs.CV stat.ML

    DeepEvolution: A Search-Based Testing Approach for Deep Neural Networks

    Authors: Houssem Ben Braiek, Foutse Khomh

    Abstract: The increasing inclusion of Deep Learning (DL) models in safety-critical systems such as autonomous vehicles has led to the development of multiple model-based DL testing techniques. One common denominator of these testing techniques is the automated generation of test cases, e.g., new inputs transformed from the original training data with the aim to optimize some test adequacy criteria. So far,…

    Submitted 5 September, 2019; originally announced September 2019.

  48. arXiv:1909.02562  [pdf, other]

    cs.LG cs.CV cs.NE stat.ML

    TFCheck : A TensorFlow Library for Detecting Training Issues in Neural Network Programs

    Authors: Houssem Ben Braiek, Foutse Khomh

    Abstract: The increasing inclusion of Machine Learning (ML) models in safety-critical systems like autonomous cars has led to the development of multiple model-based ML testing techniques. One common denominator of these testing techniques is their assumption that training programs are adequate and bug-free. These techniques only focus on assessing the performance of the constructed model using manually la…

    Submitted 5 September, 2019; originally announced September 2019.

  49. arXiv:1906.07154  [pdf, other]

    cs.SE cs.LG

    Machine Learning Software Engineering in Practice: An Industrial Case Study

    Authors: Md Saidur Rahman, Emilio Rivera, Foutse Khomh, Yann-Gaël Guéhéneuc, Bernd Lehnert

    Abstract: SAP is the market leader in enterprise software offering an end-to-end suite of applications and services to enable their customers worldwide to operate their business. Especially, retail customers of SAP deal with millions of sales transactions for their day-to-day business. Transactions are created during retail sales at the point of sale (POS) terminals and then sent to some central servers for…

    Submitted 17 June, 2019; originally announced June 2019.

    Comments: 21 pages, 5 figures

  50. arXiv:1903.01899  [pdf, other]

    cs.SE cs.LG

    A Machine-learning Based Ensemble Method For Anti-patterns Detection

    Authors: Antoine Barbez, Foutse Khomh, Yann-Gaël Guéhéneuc

    Abstract: Anti-patterns are poor solutions to recurring design problems. Several empirical studies have highlighted their negative impact on program comprehension, maintainability, as well as fault-proneness. A variety of detection approaches have been proposed to identify their occurrences in source code. However, these approaches can identify only a subset of the occurrences and report large numbers of fa…

    Submitted 16 October, 2019; v1 submitted 29 January, 2019; originally announced March 2019.

    Comments: Preprint Submitted to Journal of Systems and Software, Elsevier