Showing 1–22 of 22 results for author: Nagappan, M

  1. arXiv:2407.03093  [pdf, other]

    cs.SE cs.AI cs.CR cs.LG

    Revisiting the Performance of Deep Learning-Based Vulnerability Detection on Realistic Datasets

    Authors: Partha Chakraborty, Krishna Kanth Arumugam, Mahmoud Alfadel, Meiyappan Nagappan, Shane McIntosh

    Abstract: The impact of software vulnerabilities on everyday software systems is significant. Despite deep learning models being proposed for vulnerability detection, their reliability is questionable. Prior evaluations show high recall/F1 scores of up to 99%, but these models underperform in practical scenarios, particularly when assessed on entire codebases rather than just the fixing commit. This paper i…

    Submitted 3 July, 2024; originally announced July 2024.

    ACM Class: D.2; I.2

    Journal ref: 10.1109/TSE.2024.3423712

  2. arXiv:2406.17615  [pdf, other]

    cs.SE cs.AI cs.LG

    Aligning Programming Language and Natural Language: Exploring Design Choices in Multi-Modal Transformer-Based Embedding for Bug Localization

    Authors: Partha Chakraborty, Venkatraman Arumugam, Meiyappan Nagappan

    Abstract: Bug localization refers to the identification of source code files which is in a programming language and also responsible for the unexpected behavior of software using the bug report, which is a natural language. As bug localization is labor-intensive, bug localization models are employed to assist software developers. Due to the domain difference between source code files and bug reports, modern…

    Submitted 25 June, 2024; originally announced June 2024.

    ACM Class: D.2; I.2

  3. Whodunit: Classifying Code as Human Authored or GPT-4 Generated -- A case study on CodeChef problems

    Authors: Oseremen Joy Idialu, Noble Saji Mathews, Rungroj Maipradit, Joanne M. Atlee, Mei Nagappan

    Abstract: Artificial intelligence (AI) assistants such as GitHub Copilot and ChatGPT, built on large language models like GPT-4, are revolutionizing how programming tasks are performed, raising questions about whether code is authored by generative AI models. Such questions are of particular interest to educators, who worry that these tools enable a new form of academic dishonesty, in which students submit…

    Submitted 6 March, 2024; originally announced March 2024.

    Comments: 13 pages, 5 figures, MSR Conference

  4. arXiv:2402.13521  [pdf, other]

    cs.SE cs.AI

    Test-Driven Development for Code Generation

    Authors: Noble Saji Mathews, Meiyappan Nagappan

    Abstract: Recent Large Language Models (LLMs) have demonstrated significant capabilities in generating code snippets directly from problem statements. This increasingly automated process mirrors traditional human-led software development, where code is often written in response to a requirement. Historically, Test-Driven Development (TDD) has proven its merit, requiring developers to write tests before the…

    Submitted 11 June, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

  5. FuzzSlice: Pruning False Positives in Static Analysis Warnings Through Function-Level Fuzzing

    Authors: Aniruddhan Murali, Noble Saji Mathews, Mahmoud Alfadel, Meiyappan Nagappan, Meng Xu

    Abstract: Manual confirmation of static analysis reports is a daunting task. This is due to both the large number of warnings and the high density of false positives among them. Fuzzing techniques have been proposed to verify static analysis warnings. However, a major limitation is that fuzzing the whole project to reach all static analysis warnings is not feasible. This can take several days and exponentia…

    Submitted 2 February, 2024; originally announced February 2024.

    Comments: The paper has been accepted for publication at ICSE 2024 (Research Track)

  6. arXiv:2401.01269  [pdf, other]

    cs.CR cs.AI cs.SE

    LLbezpeky: Leveraging Large Language Models for Vulnerability Detection

    Authors: Noble Saji Mathews, Yelizaveta Brus, Yousra Aafer, Meiyappan Nagappan, Shane McIntosh

    Abstract: Despite the continued research and progress in building secure systems, Android applications continue to be ridden with vulnerabilities, necessitating effective detection methods. Current strategies involving static and dynamic analysis tools come with limitations like overwhelming number of false positives and limited scope of analysis which make either difficult to adopt. Over the past years, ma…

    Submitted 13 February, 2024; v1 submitted 2 January, 2024; originally announced January 2024.

    Comments: This project report was presented as a part of the course CS858 at the University of Waterloo under the supervision of Prof. Yousra Aafer

  7. arXiv:2310.16132  [pdf, other]

    cs.SE

    Diversity in Software Engineering Conferences and Journals

    Authors: Aditya Shankar Narayanan, Dheeraj Vagavolu, Nancy A Day, Meiyappan Nagappan

    Abstract: Diversity with respect to ethnicity and gender has been studied in open-source and industrial settings for software development. Publication avenues such as academic conferences and journals contribute to the growing technology industry. However, there have been very few diversity-related studies conducted in the context of academia. In this paper, we study the ethnic, gender, and geographical div…

    Submitted 24 October, 2023; originally announced October 2023.

    Comments: 13 pages, 10 figures, 4 tables

  8. A User-centered Security Evaluation of Copilot

    Authors: Owura Asare, Meiyappan Nagappan, N. Asokan

    Abstract: Code generation tools driven by artificial intelligence have recently become more popular due to advancements in deep learning and natural language processing that have increased their capabilities. The proliferation of these tools may be a double-edged sword because while they can increase developer productivity by making it easier to write code, research has shown that they can also generate ins…

    Submitted 5 January, 2024; v1 submitted 12 August, 2023; originally announced August 2023.

    Comments: To be published in ICSE 2024 Research Track

  9. arXiv:2305.10233  [pdf, other]

    cs.SE

    Statically Detecting Buffer Overflow in Cross-language Android Applications Written in Java and C/C++

    Authors: Kishanthan Thangarajah, Noble Mathews, Michael Pu, Meiyappan Nagappan, Yousra Aafer, Sridhar Chimalakonda

    Abstract: Many applications are being written in more than one language to take advantage of the features that different languages provide such as native code support, improved performance, and language-specific libraries. However, there are few static analysis tools currently available to analyse the source code of such multilingual applications. Existing work on cross-language (Java and C/C++) analysis fa…

    Submitted 17 May, 2023; v1 submitted 17 May, 2023; originally announced May 2023.

  10. arXiv:2305.06439  [pdf, other]

    cs.SE

    Measuring the Runtime Performance of Code Produced with GitHub Copilot

    Authors: Daniel Erhabor, Sreeharsha Udayashankar, Meiyappan Nagappan, Samer Al-Kiswany

    Abstract: GitHub Copilot is an artificially intelligent programming assistant used by many developers. While a few studies have evaluated the security risks of using Copilot, there has not been any study to show if it aids developers in producing code with better runtime performance. We evaluate the runtime performance of code produced when developers use GitHub Copilot versus when they do not. To this end,…

    Submitted 10 May, 2023; originally announced May 2023.

  11. arXiv:2305.05586  [pdf, other]

    cs.SE cs.AI

    RLocator: Reinforcement Learning for Bug Localization

    Authors: Partha Chakraborty, Mahmoud Alfadel, Meiyappan Nagappan

    Abstract: Software developers spend a significant portion of time fixing bugs in their projects. To streamline this process, bug localization approaches have been proposed to identify the source code files that are likely responsible for a particular bug. Prior work proposed several similarity-based machine-learning techniques for bug localization. Despite significant advances in these techniques, they do n…

    Submitted 2 June, 2023; v1 submitted 9 May, 2023; originally announced May 2023.

  12. arXiv:2204.04741  [pdf, other]

    cs.SE cs.CR

    Is GitHub's Copilot as Bad as Humans at Introducing Vulnerabilities in Code?

    Authors: Owura Asare, Meiyappan Nagappan, N. Asokan

    Abstract: Several advances in deep learning have been successfully applied to the software development process. Of recent interest is the use of neural language models to build tools, such as Copilot, that assist in writing code. In this paper we perform a comparative empirical analysis of Copilot-generated code from a security perspective. The aim of this study is to determine if Copilot is as bad as human…

    Submitted 5 January, 2024; v1 submitted 10 April, 2022; originally announced April 2022.

    Comments: Accepted for publication in Empirical Software Engineering

  13. arXiv:2204.04318  [pdf, other]

    cs.SE cs.CY

    Towards Understanding Barriers and Mitigation Strategies of Software Engineers with Non-traditional Educational and Occupational Backgrounds

    Authors: Tavian Barnes, Ken Jen Lee, Cristina Tavares, Gema Rodríguez-Pérez, Meiyappan Nagappan

    Abstract: The traditional path to a software engineering career involves a post-secondary diploma in Software Engineering, Computer Science, or a related field. However, many software engineers take a non-traditional path to their career, starting from other industries or fields of study. This paper proposes a study on barriers faced by software engineers with non-traditional educational and occupational ba…

    Submitted 8 April, 2022; originally announced April 2022.

    Comments: 8 pages, 5 figures, accepted at the MSR 2022 Registered Reports Track as a Continuity Acceptance (CA)

    ACM Class: D.2; K.4.2

  14. arXiv:2203.00101  [pdf, ps, other]

    cs.SE cs.AI cs.LG cs.PL

    ApacheJIT: A Large Dataset for Just-In-Time Defect Prediction

    Authors: Hossein Keshavarz, Meiyappan Nagappan

    Abstract: In this paper, we present ApacheJIT, a large dataset for Just-In-Time defect prediction. ApacheJIT consists of clean and bug-inducing software changes in popular Apache projects. ApacheJIT has a total of 106,674 commits (28,239 bug-inducing and 78,435 clean commits). Having a large number of commits makes ApacheJIT a suitable dataset for machine learning models, especially deep learning models tha…

    Submitted 29 April, 2022; v1 submitted 28 February, 2022; originally announced March 2022.

  15. arXiv:2104.06143  [pdf, other]

    cs.SE

    On the Relationship Between the Developer's Perceptible Race and Ethnicity and the Evaluation of Contributions in OSS

    Authors: Reza Nadri, Gema Rodríguez-Pérez, Meiyappan Nagappan

    Abstract: Open Source Software (OSS) projects are typically the result of collective efforts performed by developers with different backgrounds. Although the quality of developers' contributions should be the only factor influencing the evaluation of the contributions to OSS projects, recent studies have shown that diversity issues are correlated with the acceptance or rejection of developers' contributions…

    Submitted 13 April, 2021; originally announced April 2021.

  16. Watch out for Extrinsic Bugs! A Case Study of their Impact in Just-In-Time Bug Prediction Models on the OpenStack project

    Authors: Gema Rodriguez-Perez, Meiyappan Nagappan, Gregorio Robles

    Abstract: Intrinsic bugs are bugs for which a bug introducing change can be identified in the version control system of a software. In contrast, extrinsic bugs are caused by external changes to a software, such as errors in external APIs; thereby they do not have an explicit bug introducing change in the version control system. Although most previous research literature has assumed that all bugs are of intr…

    Submitted 28 March, 2021; originally announced March 2021.

    Comments: in IEEE Transactions on Software Engineering, 2020

  17. arXiv:2009.09130  [pdf, other]

    cs.SE

    How are Project-Specific Forums Utilized? A Study of Participation, Content, and Sentiment in the Eclipse Ecosystem

    Authors: Yusuf Sulistyo Nugroho, Syful Islam, Keitaro Nakasai, Ifraz Rehman, Hideaki Hata, Raula Gaikovina Kula, Meiyappan Nagappan, Kenichi Matsumoto

    Abstract: Although many software development projects have moved their developer discussion forums to generic platforms such as Stack Overflow, Eclipse has been steadfast in hosting their self-supported community forums. While recent studies show forums share similarities to generic communication channels, it is unknown how project-specific forums are utilized. In this paper, we analyze 832,058 forum thread…

    Submitted 5 August, 2021; v1 submitted 18 September, 2020; originally announced September 2020.

    Comments: 33 pages, 7 figures

  18. Ammonia: An Approach for Deriving Project-specific Bug Patterns

    Authors: Yoshiki Higo, Shinpei Hayashi, Hideaki Hata, Meiyappan Nagappan

    Abstract: Finding and fixing buggy code is an important and cost-intensive maintenance task, and static analysis (SA) is one of the methods developers use to perform it. SA tools warn developers about potential bugs by scanning their source code for commonly occurring bug patterns, thus giving those developers opportunities to fix the warnings (potential bugs) before they release the software. Typically, SA…

    Submitted 14 March, 2020; v1 submitted 27 January, 2020; originally announced January 2020.

    Comments: 28 pages, Empirical Software Engineering

    Journal ref: Empirical Software Engineering, 25(3):1951-1979, 2020

  19. arXiv:1911.07620  [pdf, other]

    cs.SE cs.CL

    Exploiting Token and Path-based Representations of Code for Identifying Security-Relevant Commits

    Authors: Achyudh Ram, Ji Xin, Meiyappan Nagappan, Yaoliang Yu, Rocío Cabrera Lozoya, Antonino Sabetta, Jimmy Lin

    Abstract: Public vulnerability databases such as CVE and NVD account for only 60% of security vulnerabilities present in open-source projects, and are known to suffer from inconsistent quality. Over the last two years, there has been considerable growth in the number of known vulnerabilities across projects available in various repositories such as NPM and Maven Central. Such an increasing risk calls for a…

    Submitted 14 November, 2019; originally announced November 2019.

  20. arXiv:1812.09653  [pdf, other]

    cs.CL cs.SE

    Supervised Sentiment Classification with CNNs for Diverse SE Datasets

    Authors: Achyudh Ram, Meiyappan Nagappan

    Abstract: Sentiment analysis, a popular technique for opinion mining, has been used by the software engineering research community for tasks such as assessing app reviews, developer emotions in issue trackers and developer opinions on APIs. Past research indicates that state-of-the-art sentiment analysis techniques have poor performance on SE data. This is because sentiment analysis tools are often designed…

    Submitted 22 December, 2018; originally announced December 2018.

  21. A Large-Scale Study on the Usage of Testing Patterns that Address Maintainability Attributes (Patterns for Ease of Modification, Diagnoses, and Comprehension)

    Authors: Danielle Gonzalez, Joanna C. S. Santos, Andrew Popovich, Mehdi Mirakhorli, Mei Nagappan

    Abstract: Test case maintainability is an important concern, especially in open source and distributed development environments where projects typically have high contributor turnover with varying backgrounds and experience, and where code ownership changes often. Similar to design patterns, patterns for unit testing promote maintainability quality attributes such as ease of diagnoses, modifiability, and co…

    Submitted 26 April, 2017; originally announced April 2017.

    Comments: Mining Software Repositories (MSR) 2017 Research Track

    Journal ref: 2017 IEEE/ACM 14th International Conference on Mining Software Repositories (MSR), Buenos Aires, 2017, pp. 391-401

  22. arXiv:1702.07681  [pdf, other]

    cs.CY

    What Aspects of Mobile Ads Do Users Care About? An Empirical Study of Mobile In-app Ad Reviews

    Authors: Jiaping Gui, Meiyappan Nagappan, William G. J. Halfond

    Abstract: In the mobile app ecosystem, developers receive ad revenue by placing ads in their apps and releasing them for free. While there is evidence that users do not like ads, we do not know what are the aspects of ads that users dislike nor if they dislike certain aspects of ads more than others. Therefore, in this paper, we analyzed the different topics of ad related complaints from users. In order to…

    Submitted 22 February, 2017; originally announced February 2017.

    Comments: 10 pages