
Showing 1–3 of 3 results for author: Babii, H

  1. arXiv:2103.01722

    cs.SE

    Mining Software Repositories with a Collaborative Heuristic Repository

    Authors: Hlib Babii, Julian Aron Prenner, Laurin Stricker, Anjan Karmakar, Andrea Janes, Romain Robbes

    Abstract: Many software engineering studies or tasks rely on categorizing software engineering artifacts. In practice, this is done either by defining simple but often imprecise heuristics, or by manual labelling of the artifacts. Unfortunately, errors in these categorizations impact the tasks that rely on them. To improve the precision of these categorizations, we propose to gather heuristics in a collabor…

    Submitted 2 March, 2021; originally announced March 2021.

    Comments: 5 pages; to appear in Proceedings of ICSE NIER 2021

  2. Big Code != Big Vocabulary: Open-Vocabulary Models for Source Code

    Authors: Rafael-Michael Karampatsis, Hlib Babii, Romain Robbes, Charles Sutton, Andrea Janes

    Abstract: Statistical language modeling techniques have successfully been applied to large source code corpora, yielding a variety of new software development tools, such as tools for code suggestion, improving readability, and API migration. A major issue with these techniques is that code introduces new vocabulary at a far higher rate than natural language, as new identifier names proliferate. Both large…

    Submitted 17 March, 2020; originally announced March 2020.

    Comments: 13 pages; to appear in Proceedings of ICSE 2020

  3. arXiv:1904.01873

    cs.CL cs.SE

    Modeling Vocabulary for Big Code Machine Learning

    Authors: Hlib Babii, Andrea Janes, Romain Robbes

    Abstract: When building machine learning models that operate on source code, several decisions have to be made to model source-code vocabulary. These decisions can have a large impact: some can lead to not being able to train models at all, others significantly affect performance, particularly for Neural Language Models. Yet, these decisions are not often fully described. This paper lists important modeling…

    Submitted 3 April, 2019; originally announced April 2019.

    Comments: 12 pages, 1 figure