-
Refining GPT-3 Embeddings with a Siamese Structure for Technical Post Duplicate Detection
Authors:
Xingfang Wu,
Heng Li,
Nobukazu Yoshioka,
Hironori Washizaki,
Foutse Khomh
Abstract:
One goal of technical online communities is to help developers find the right answer in one place. A single question can be asked in different ways with different wordings, leading to the existence of duplicate posts on technical forums. The question of how to discover and link duplicate posts has garnered the attention of both developer communities and researchers. For example, Stack Overflow adopts a voting-based mechanism to mark and close duplicate posts. However, addressing these constantly emerging duplicate posts in a timely manner continues to pose challenges. Therefore, various approaches have been proposed to detect duplicate posts on technical forums automatically. The existing methods suffer from limitations either due to their reliance on handcrafted similarity metrics, which cannot sufficiently capture the semantics of posts, or their lack of supervision to improve the performance. Additionally, the efficiency of these methods is hindered by their dependence on pair-wise feature generation, which can be impractical for large amounts of data. In this work, we attempt to employ and refine the GPT-3 embeddings for the duplicate detection task. We assume that the GPT-3 embeddings can accurately represent the semantics of the posts. In addition, by training a Siamese-based network based on the GPT-3 embeddings, we obtain a latent embedding that accurately captures the duplicate relation in technical forum posts. Our experiment on a benchmark dataset confirms the effectiveness of our approach and demonstrates superior performance compared to baseline methods. When applied to the dataset we constructed with a recent Stack Overflow dump, our approach attains a Top-1, Top-5, and Top-30 accuracy of 23.1%, 43.9%, and 68.9%, respectively. With a manual study, we confirm our approach's potential for finding unlabelled duplicates on technical forums.
Submitted 4 March, 2024; v1 submitted 22 December, 2023;
originally announced December 2023.
-
Identifying Characteristics of the Agile Development Process That Impact User Satisfaction
Authors:
Minshun Yang,
Seiji Sato,
Hironori Washizaki,
Yoshiaki Fukazawa,
Juichi Takahashi
Abstract:
The purpose of this study is to identify the characteristics of Agile development processes that impact user satisfaction. We used user reviews of OSS smartphone apps and various data from version control systems to examine the relationships, especially time-series correlations, between user satisfaction and development metrics that are expected to be related to user satisfaction. Although no metrics conclusively indicate an improved user satisfaction, motivation of the development team, the ability to set appropriate work units, the appropriateness of work rules, and the improvement of code maintainability should be considered as they are correlated with improved user satisfaction. In contrast, changes in the release frequency and workload are not correlated.
Submitted 6 June, 2023;
originally announced June 2023.
-
Systematic Literature Review of Gender and Software Engineering in Asia
Authors:
Hironori Washizaki
Abstract:
It is essential to discuss the role, difficulties, and opportunities concerning people of different genders in the field of software engineering research, education, and industry. Although some literature reviews address software engineering and gender, it is still unclear what research and practices exist in Asia for handling gender aspects in software development and engineering. We conducted a systematic literature review to obtain a comprehensive view of gender research and practices in Asia. We analyzed the 32 identified papers concerning countries and publication years among 463 publications. Researchers and practitioners from various organizations actively work on gender research and practices in some countries, including China, India, and Turkey. We identified topics and classified them into seven categories varying from personal mental health and team building to organization. Future research directions include investigating the synergy between (regional) gender aspects and cultural concerns and considering possible contributions and dependency among different topics to have a solid foundation for accelerating further research and getting actionable practices.
Submitted 16 November, 2022;
originally announced November 2022.
-
Machine Learning Application Development: Practitioners' Insights
Authors:
Md Saidur Rahman,
Foutse Khomh,
Alaleh Hamidi,
Jinghui Cheng,
Giuliano Antoniol,
Hironori Washizaki
Abstract:
Nowadays, intelligent systems and services are getting increasingly popular as they provide data-driven solutions to diverse real-world problems, thanks to recent breakthroughs in Artificial Intelligence (AI) and Machine Learning (ML). However, machine learning meets software engineering not only with promising potential but also with some inherent challenges. Despite some recent research efforts, we still do not have a clear understanding of the challenges of developing ML-based applications and the current industry practices. Moreover, it is unclear where software engineering researchers should focus their efforts to better support ML application developers. In this paper, we report on a survey that aimed to understand the challenges and best practices of ML application development. We synthesize the results obtained from 80 practitioners (with diverse skills, experience, and application domains) into 17 findings outlining challenges and best practices for ML application development. Practitioners involved in the development of ML-based software systems can leverage the summarized best practices to improve the quality of their systems. We hope that the reported challenges will inform the research community about topics that need to be investigated to improve the engineering process and the quality of ML-based applications.
Submitted 30 December, 2021;
originally announced December 2021.
-
Preliminary Systematic Literature Review of Machine Learning System Development Process
Authors:
Yasuhiro Watanabe,
Hironori Washizaki,
Kazunori Sakamoto,
Daisuke Saito,
Kiyoshi Honda,
Naohiko Tsuda,
Yoshiaki Fukazawa,
Nobukazu Yoshioka
Abstract:
Previous machine learning (ML) system development research suggests that emerging software quality attributes are a concern due to the probabilistic behavior of ML systems. We assume that detailed development processes depend on individual developers and are not discussed in detail. To help developers standardize their ML system development processes, we conduct a preliminary systematic literature review on ML system development processes. A search query of 2358 papers identified 7 papers as well as two other papers determined in an ad-hoc review. Our findings include emphasized phases in ML system development, frequently described practices, and tailored traditional software development practices.
Submitted 12 October, 2019;
originally announced October 2019.
-
Studying Software Engineering Patterns for Designing Machine Learning Systems
Authors:
Hironori Washizaki,
Hiromu Uchida,
Foutse Khomh,
Yann-Gael Gueheneuc
Abstract:
Machine-learning (ML) techniques have become popular in recent years. ML techniques rely on mathematics and on software engineering. Researchers and practitioners study best practices for designing ML application systems and software to address the software complexity and quality of ML techniques. Such design practices are often formalized as architecture patterns and design patterns by encapsulating reusable solutions to commonly occurring problems within given contexts. However, to the best of our knowledge, there has been no work collecting, classifying, and discussing these software-engineering (SE) design patterns for ML techniques systematically. Thus, we set out to collect good/bad SE design patterns for ML techniques to provide developers with a comprehensive and ordered classification of such patterns. We report here preliminary results of a systematic literature review (SLR) of good/bad design patterns for ML.
Submitted 11 October, 2019; v1 submitted 10 October, 2019;
originally announced October 2019.
-
Landscape of IoT Patterns
Authors:
Hironori Washizaki,
Nobukazu Yoshioka,
Atsuo Hazeyama,
Takehisa Kato,
Haruhiko Kaiya,
Shinpei Ogata,
Takao Okubo,
Eduardo B. Fernandez
Abstract:
Patterns are encapsulations of problems and solutions under specific contexts. As the industry is realizing many successes (and failures) in IoT systems development and operations, many IoT patterns have been published such as IoT design patterns and IoT architecture patterns. Because these patterns are not well classified, their adoption does not live up to their potential. To understand the reasons, this paper analyzes an extensive set of published IoT architecture and design patterns according to several dimensions and outlines directions for improvements in publishing and adopting IoT patterns.
Submitted 25 February, 2019;
originally announced February 2019.
-
Is Fragmentation a Threat to the Success of the Internet of Things?
Authors:
Mohab Aly,
Foutse Khomh,
Yann-Gaël Guéhéneuc,
Hironori Washizaki,
Soumaya Yacout
Abstract:
The current revolution in collaborating distributed things is seen as the first phase of IoT to develop various services. Such collaboration is threatened by the fragmentation found in the industry nowadays, as it brings challenges stemming from the difficulty of integrating diverse technologies in a system. Diverse networking technologies induce interoperability issues, hence limiting the possibility of reusing the data to develop new services. Different aspects of handling data collection must be available to provide interoperability to the diverse objects interacting; however, such approaches are challenged as they bring substantial performance impairments in settings with an increasing number of collaborating devices/technologies.
Submitted 2 August, 2018;
originally announced August 2018.