-
Enhancing Testing at Meta with Rich-State Simulated Populations
Authors:
Nadia Alshahwan,
Arianna Blasi,
Kinga Bojarczuk,
Andrea Ciancone,
Natalija Gucevska,
Mark Harman,
Simon Schellaert,
Inna Harper,
Yue Jia,
Michał Królikowski,
Will Lewis,
Dragos Martac,
Rubmary Rojas,
Kate Ustiuzhanina
Abstract:
This paper reports the results of the deployment of Rich-State Simulated Populations at Meta for both automated and manual testing. We use simulated users (aka test users) to mimic user interactions and acquire state in much the same way that real user accounts acquire state. For automated testing, we present empirical results from deployment on the Facebook, Messenger, and Instagram apps for iOS…
▽ More
This paper reports the results of the deployment of Rich-State Simulated Populations at Meta for both automated and manual testing. We use simulated users (aka test users) to mimic user interactions and acquire state in much the same way that real user accounts acquire state. For automated testing, we present empirical results from deployment on the Facebook, Messenger, and Instagram apps for iOS and Android Platforms. These apps consist of tens of millions of lines of code, communicating with hundreds of millions of lines of backend code, and are used by over 2 billion people every day. Our results reveal that rich state increases average code coverage by 38\%, and endpoint coverage by 61\%. More importantly, it also yields an average increase of 115\% in the faults found by automated testing. The rich-state test user populations are also deployed in a (continually evolving) Test Universe; a web-enabled simulation platform for privacy-safe manual testing, which has been used by over 21,000 Meta engineers since its deployment in November 2022.
△ Less
Submitted 22 March, 2024;
originally announced March 2024.
-
Automated Unit Test Improvement using Large Language Models at Meta
Authors:
Nadia Alshahwan,
Jubin Chheda,
Anastasia Finegenova,
Beliz Gokkaya,
Mark Harman,
Inna Harper,
Alexandru Marginean,
Shubho Sengupta,
Eddy Wang
Abstract:
This paper describes Meta's TestGen-LLM tool, which uses LLMs to automatically improve existing human-written tests. TestGen-LLM verifies that its generated test classes successfully clear a set of filters that assure measurable improvement over the original test suite, thereby eliminating problems due to LLM hallucination. We describe the deployment of TestGen-LLM at Meta test-a-thons for the Ins…
▽ More
This paper describes Meta's TestGen-LLM tool, which uses LLMs to automatically improve existing human-written tests. TestGen-LLM verifies that its generated test classes successfully clear a set of filters that assure measurable improvement over the original test suite, thereby eliminating problems due to LLM hallucination. We describe the deployment of TestGen-LLM at Meta test-a-thons for the Instagram and Facebook platforms. In an evaluation on Reels and Stories products for Instagram, 75% of TestGen-LLM's test cases built correctly, 57% passed reliably, and 25% increased coverage. During Meta's Instagram and Facebook test-a-thons, it improved 11.5% of all classes to which it was applied, with 73% of its recommendations being accepted for production deployment by Meta software engineers. We believe this is the first report on industrial scale deployment of LLM-generated code backed by such assurances of code improvement.
△ Less
Submitted 14 February, 2024;
originally announced February 2024.
-
Assured LLM-Based Software Engineering
Authors:
Nadia Alshahwan,
Mark Harman,
Inna Harper,
Alexandru Marginean,
Shubho Sengupta,
Eddy Wang
Abstract:
In this paper we address the following question: How can we use Large Language Models (LLMs) to improve code independently of a human, while ensuring that the improved code
- does not regress the properties of the original code?
- improves the original in a verifiable and measurable way?
To address this question, we advocate Assured LLM-Based Software Engineering; a generate-and-test approac…
▽ More
In this paper we address the following question: How can we use Large Language Models (LLMs) to improve code independently of a human, while ensuring that the improved code
- does not regress the properties of the original code?
- improves the original in a verifiable and measurable way?
To address this question, we advocate Assured LLM-Based Software Engineering; a generate-and-test approach, inspired by Genetic Improvement. Assured LLMSE applies a series of semantic filters that discard code that fails to meet these twin guarantees. This overcomes the potential problem of LLM's propensity to hallucinate. It allows us to generate code using LLMs, independently of any human. The human plays the role only of final code reviewer, as they would do with code generated by other human engineers.
This paper is an outline of the content of the keynote by Mark Harman at the International Workshop on Interpretability, Robustness, and Benchmarking in Neural Software Engineering, Monday 15th April 2024, Lisbon, Portugal.
△ Less
Submitted 6 February, 2024;
originally announced February 2024.