Showing 1–50 of 145 results for author: Lyu, M R

  1. arXiv:2406.16386

    cs.SE cs.AI

    Automatically Generating UI Code from Screenshot: A Divide-and-Conquer-Based Approach

    Authors: Yuxuan Wan, Chaozheng Wang, Yi Dong, Wenxuan Wang, Shuqing Li, Yintong Huo, Michael R. Lyu

    Abstract: Websites are critical in today's digital world, with over 1.11 billion currently active and approximately 252,000 new sites launched daily. Converting website layout design into functional UI code is a time-consuming yet indispensable step of website development. Manual methods of converting visual designs into functional code present significant challenges, especially for non-experts. To explore…

    Submitted 24 June, 2024; originally announced June 2024.

  2. Less Cybersickness, Please: Demystifying and Detecting Stereoscopic Visual Inconsistencies in VR Apps

    Authors: Shuqing Li, Cuiyun Gao, Jianping Zhang, Yujia Zhang, Yepang Liu, Jiazhen Gu, Yun Peng, Michael R. Lyu

    Abstract: The quality of Virtual Reality (VR) apps is vital, particularly the rendering quality of the VR Graphical User Interface (GUI). Different from traditional 2D apps, VR apps create a 3D digital scene for users, by rendering two distinct 2D images for the user's left and right eyes, respectively. Stereoscopic visual inconsistency (denoted as "SVI") issues, however, undermine the rendering process of…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: This work has been accepted at the ACM International Conference on the Foundations of Software Engineering (FSE) 2024, Porto de Galinhas, Brazil. DOI: https://doi.org/10.1145/3660803

  3. arXiv:2406.07174

    cs.SE

    ULog: Unsupervised Log Parsing with Large Language Models through Log Contrastive Units

    Authors: Junjie Huang, Zhihan Jiang, Zhuangbin Chen, Michael R. Lyu

    Abstract: Log parsing serves as an essential prerequisite for various log analysis tasks. Recent advancements in this field have improved parsing accuracy by leveraging the semantics in logs through fine-tuning large language models (LLMs) or learning from in-context demonstrations. However, these methods heavily depend on labeled examples to achieve optimal performance. In practice, collecting sufficient l…

    Submitted 11 June, 2024; originally announced June 2024.

  4. arXiv:2406.06975

    cs.DC cs.SE

    TraceMesh: Scalable and Streaming Sampling for Distributed Traces

    Authors: Zhuangbin Chen, Zhihan Jiang, Yuxin Su, Michael R. Lyu, Zibin Zheng

    Abstract: Distributed tracing serves as a fundamental element in the monitoring of cloud-based and datacenter systems. It provides visibility into the full lifecycle of a request or operation across multiple services, which is essential for understanding system dependencies and performance bottlenecks. To mitigate computational and storage overheads, most tracing frameworks adopt a uniform sampling strategy…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Accepted by The 2024 IEEE 17th International Conference on Cloud Computing (CLOUD)

  5. arXiv:2405.02213

    cs.SE cs.AI cs.LG

    Automatic Programming: Large Language Models and Beyond

    Authors: Michael R. Lyu, Baishakhi Ray, Abhik Roychoudhury, Shin Hwei Tan, Patanamon Thongtanunam

    Abstract: Automatic programming has seen increasing popularity due to the emergence of tools like GitHub Copilot which rely on Large Language Models (LLMs). At the same time, automatically generated code faces challenges during deployment due to concerns around quality and trust. In this article, we study automated coding in a general sense and study the concerns around code quality, security and related is…

    Submitted 15 May, 2024; v1 submitted 3 May, 2024; originally announced May 2024.

  6. arXiv:2404.19368

    cs.SE

    Exploring Multi-Lingual Bias of Large Code Models in Code Generation

    Authors: Chaozheng Wang, Zongjie Li, Cuiyun Gao, Wenxuan Wang, Ting Peng, Hailiang Huang, Yuetang Deng, Shuai Wang, Michael R. Lyu

    Abstract: Code generation aims to synthesize code and fulfill functional requirements based on natural language (NL) specifications, which can greatly improve development efficiency. In the era of large language models (LLMs), large code models (LCMs) have been recently proposed to generate source code. LCMs can generate highly feasible solutions for programming problems described in natural language. Despi…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: 12 pages

  7. arXiv:2404.17153

    cs.SE

    A Unified Debugging Approach via LLM-Based Multi-Agent Synergy

    Authors: Cheryl Lee, Chunqiu Steven Xia, Jen-tse Huang, Zhouruixin Zhu, Lingming Zhang, Michael R. Lyu

    Abstract: Tremendous efforts have been devoted to automating software debugging, a time-consuming process involving fault localization and repair generation. Recently, Large Language Models (LLMs) have shown great potential in automated debugging. However, we identified three challenges posed to traditional and LLM-based debugging tools: 1) the upstream imperfection of fault localization affects the downstr…

    Submitted 26 April, 2024; originally announced April 2024.

  8. arXiv:2404.13957

    cs.CL

    How Well Can LLMs Echo Us? Evaluating AI Chatbots' Role-Play Ability with ECHO

    Authors: Man Tik Ng, Hui Tung Tse, Jen-tse Huang, Jingjing Li, Wenxuan Wang, Michael R. Lyu

    Abstract: The role-play ability of Large Language Models (LLMs) has emerged as a popular research direction. However, existing studies focus on imitating well-known public figures or fictional characters, overlooking the potential for simulating ordinary individuals. Such an oversight limits the potential for advancements in digital human clones and non-player characters in video games. To bridge this gap,…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: 9 pages

  9. arXiv:2403.19096

    cs.SE cs.CR

    SCALE: Constructing Structured Natural Language Comment Trees for Software Vulnerability Detection

    Authors: Xin-Cheng Wen, Cuiyun Gao, Shuzheng Gao, Yang Xiao, Michael R. Lyu

    Abstract: Recently, there has been a growing interest in automatic software vulnerability detection. Pre-trained model-based approaches have demonstrated superior performance over other Deep Learning (DL)-based approaches in detecting vulnerabilities. However, the existing pre-trained model-based approaches generally employ code sequences as input during prediction, and may ignore vulnerability-related stru…

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: Accepted by ISSTA 2024

  10. arXiv:2403.18252

    cs.CV cs.AI cs.CL cs.LG cs.MM

    Beyond Embeddings: The Promise of Visual Table in Visual Reasoning

    Authors: Yiwu Zhong, Zi-Yuan Hu, Michael R. Lyu, Liwei Wang

    Abstract: Visual representation learning has been a cornerstone in computer vision, involving typical forms such as visual embeddings, structural symbols, and text-based representations. Despite the success of CLIP-type visual embeddings, they often lack access to world knowledge critical for visual reasoning. In this work, we propose Visual Table, a novel form of visual representation tailored for visual r…

    Submitted 17 June, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

    Comments: Project page: https://github.com/LaVi-Lab/Visual-Table

  11. arXiv:2403.17574

    cs.SE cs.DC

    SPES: Towards Optimizing Performance-Resource Trade-Off for Serverless Functions

    Authors: Cheryl Lee, Zhouruixin Zhu, Tianyi Yang, Yintong Huo, Yuxin Su, Pinjia He, Michael R. Lyu

    Abstract: As an emerging cloud computing deployment paradigm, serverless computing is gaining traction due to its efficiency and ability to harness on-demand cloud resources. However, a significant hurdle remains in the form of the cold start problem, causing latency when launching new function instances from scratch. Existing solutions tend to use over-simplistic strategies for function pre-loading/unloadi…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: 12 pages, accepted by ICDE 2024 (40th IEEE International Conference on Data Engineering)

  12. arXiv:2403.11807

    cs.AI cs.CL

    How Far Are We on the Decision-Making of LLMs? Evaluating LLMs' Gaming Ability in Multi-Agent Environments

    Authors: Jen-tse Huang, Eric John Li, Man Ho Lam, Tian Liang, Wenxuan Wang, Youliang Yuan, Wenxiang Jiao, Xing Wang, Zhaopeng Tu, Michael R. Lyu

    Abstract: Decision-making, a complicated task requiring various types of abilities, presents an excellent framework for assessing Large Language Models (LLMs). Our research investigates LLMs' decision-making capabilities through the lens of a well-established field, Game Theory. We focus specifically on games that support the participation of more than two agents simultaneously. Subsequently, we introduce o…

    Submitted 25 April, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

    Comments: 16 pages of main text. 11 pages of appendices. 15 figures, 9 tables. Updated scoring scheme

  13. arXiv:2403.06485

    cs.SE cs.CL cs.LG

    Knowledge-aware Alert Aggregation in Large-scale Cloud Systems: a Hybrid Approach

    Authors: Jinxi Kuang, Jinyang Liu, Junjie Huang, Renyi Zhong, Jiazhen Gu, Lan Yu, Rui Tan, Zengyin Yang, Michael R. Lyu

    Abstract: Due to the scale and complexity of cloud systems, a system failure would trigger an "alert storm", i.e., massive correlated alerts. Although these alerts can be traced back to a few root causes, the overwhelming number makes it infeasible for manual handling. Alert aggregation is thus critical to help engineers concentrate on the root cause and facilitate failure resolution. Existing methods typic…

    Submitted 11 March, 2024; originally announced March 2024.

    Comments: Accepted by Proceedings of the 46th International Conference on Software Engineering: Software Engineering in Practice (ICSE SEIP 2024)

  14. arXiv:2402.17583

    cs.SE cs.CL cs.LG

    FaultProfIT: Hierarchical Fault Profiling of Incident Tickets in Large-scale Cloud Systems

    Authors: Junjie Huang, Jinyang Liu, Zhuangbin Chen, Zhihan Jiang, Yichen LI, Jiazhen Gu, Cong Feng, Zengyin Yang, Yongqiang Yang, Michael R. Lyu

    Abstract: Postmortem analysis is essential in the management of incidents within cloud systems, which provides valuable insights to improve systems' reliability and robustness. At CloudA, fault pattern profiling is performed during the postmortem phase, which involves the classification of incidents' faults into unique categories, referred to as fault patterns. By aggregating and analyzing these fault patter…

    Submitted 27 February, 2024; originally announced February 2024.

    Comments: Accepted by Proceedings of the 46th International Conference on Software Engineering: Software Engineering in Practice (ICSE SEIP 2024)

  15. arXiv:2402.12958

    cs.SE

    Go Static: Contextualized Logging Statement Generation

    Authors: Yichen Li, Yintong Huo, Renyi Zhong, Zhihan Jiang, Jinyang Liu, Junjie Huang, Jiazhen Gu, Pinjia He, Michael R. Lyu

    Abstract: Logging practices have been extensively investigated to assist developers in writing appropriate logging statements for documenting software behaviors. Although numerous automatic logging approaches have been proposed, their performance remains unsatisfactory due to the constraint of the single-method input, without informative programming context outside the method. Specifically, we identify thre…

    Submitted 20 February, 2024; originally announced February 2024.

    Comments: This paper was accepted by The ACM International Conference on the Foundations of Software Engineering (FSE 2024)

  16. arXiv:2402.11217

    cs.CL cs.CV

    Asclepius: A Spectrum Evaluation Benchmark for Medical Multi-Modal Large Language Models

    Authors: Wenxuan Wang, Yihang Su, Jingyuan Huan, Jie Liu, Wenting Chen, Yudi Zhang, Cheng-Yi Li, Kao-Jung Chang, Xiaohan Xin, Linlin Shen, Michael R. Lyu

    Abstract: The significant breakthroughs of Medical Multi-Modal Large Language Models (Med-MLLMs) renovate modern healthcare with robust information synthesis and medical decision support. However, these models are often evaluated on benchmarks that are unsuitable for the Med-MLLMs due to the intricate nature of the real-world diagnostic frameworks, which encompass diverse medical specialties and involve com…

    Submitted 17 February, 2024; originally announced February 2024.

    Comments: 20 pages, 15 figures

  17. arXiv:2402.03630

    cs.SE cs.AI

    Enhancing LLM-Based Coding Tools through Native Integration of IDE-Derived Static Context

    Authors: Yichen Li, Yun Peng, Yintong Huo, Michael R. Lyu

    Abstract: Large Language Models (LLMs) have achieved remarkable success in code completion, as evidenced by their essential roles in developing code assistant services such as Copilot. Being trained on in-file contexts, current LLMs are quite effective in completing code for single source files. However, it is challenging for them to conduct repository-level code completion for large software projects that…

    Submitted 19 February, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

  18. arXiv:2401.06175

    cs.SE cs.AI cs.LG

    MTAD: Tools and Benchmarks for Multivariate Time Series Anomaly Detection

    Authors: Jinyang Liu, Wenwei Gu, Zhuangbin Chen, Yichen Li, Yuxin Su, Michael R. Lyu

    Abstract: Key Performance Indicators (KPIs) are essential time-series metrics for ensuring the reliability and stability of many software systems. They faithfully record runtime states to facilitate the understanding of anomalous system behaviors and provide informative clues for engineers to pinpoint the root causes. The unprecedented scale and complexity of modern software systems, however, make the volum…

    Submitted 10 January, 2024; originally announced January 2024.

    Comments: The code and datasets are available at https://github.com/OpsPAI/MTAD

  19. Learning in the Wild: Towards Leveraging Unlabeled Data for Effectively Tuning Pre-trained Code Models

    Authors: Shuzheng Gao, Wenxin Mao, Cuiyun Gao, Li Li, Xing Hu, Xin Xia, Michael R. Lyu

    Abstract: Pre-trained code models have recently achieved substantial improvements in many code intelligence tasks. These models are first pre-trained on large-scale unlabeled datasets in a task-agnostic manner using self-supervised learning, and then fine-tuned on labeled datasets in downstream tasks. However, the labeled datasets are usually limited in size (since labeling requires intensive human effort), which may hinder…

    Submitted 2 January, 2024; originally announced January 2024.

    Comments: Accepted by ICSE 2024

  20. arXiv:2401.00763

    cs.SE cs.AI cs.CL cs.CV cs.MM

    New Job, New Gender? Measuring the Social Bias in Image Generation Models

    Authors: Wenxuan Wang, Haonan Bai, Jen-tse Huang, Yuxuan Wan, Youliang Yuan, Haoyi Qiu, Nanyun Peng, Michael R. Lyu

    Abstract: Image generation models can generate or edit images from a given text. Recent advancements in image generation technology, exemplified by DALL-E and Midjourney, have been groundbreaking. These advanced models, despite their impressive capabilities, are often trained on massive Internet datasets, making them susceptible to generating content that perpetuates social stereotypes and biases, which can…

    Submitted 1 January, 2024; originally announced January 2024.

  21. arXiv:2401.00761

    cs.SE cs.AI cs.CL

    The Earth is Flat? Unveiling Factual Errors in Large Language Models

    Authors: Wenxuan Wang, Juluan Shi, Zhaopeng Tu, Youliang Yuan, Jen-tse Huang, Wenxiang Jiao, Michael R. Lyu

    Abstract: Large Language Models (LLMs) like ChatGPT are foundational in various applications due to their extensive knowledge from pre-training and fine-tuning. Despite this, they are prone to generating factual and commonsense errors, raising concerns that they may mislead users in critical areas like healthcare, journalism, and education. Current methods for evaluating LLMs' veracity are limited by test data leakage…

    Submitted 1 January, 2024; originally announced January 2024.

  22. arXiv:2401.00757

    cs.SE cs.AI cs.CL cs.LO

    A & B == B & A: Triggering Logical Reasoning Failures in Large Language Models

    Authors: Yuxuan Wan, Wenxuan Wang, Yiliu Yang, Youliang Yuan, Jen-tse Huang, Pinjia He, Wenxiang Jiao, Michael R. Lyu

    Abstract: Recent advancements in large language models (LLMs) have propelled Artificial Intelligence (AI) to new heights, enabling breakthroughs in various tasks such as writing assistance, code generation, and machine translation. A significant distinction of advanced LLMs, such as ChatGPT, is their demonstrated ability to "reason." However, evaluating the reasoning ability of LLMs remains a challenge as m…

    Submitted 1 January, 2024; originally announced January 2024.

  23. arXiv:2310.12598

    cs.SE

    Less is More? An Empirical Study on Configuration Issues in Python PyPI Ecosystem

    Authors: Yun Peng, Ruida Hu, Ruoke Wang, Cuiyun Gao, Shuqing Li, Michael R. Lyu

    Abstract: Python is widely used in the open-source community, largely owing to the extensive support from diverse third-party libraries within the PyPI ecosystem. Nevertheless, the utilization of third-party libraries can potentially lead to conflicts in dependencies, prompting researchers to develop dependency conflict detectors. Moreover, endeavors have been made to automatically infer dependencies. These…

    Submitted 4 January, 2024; v1 submitted 19 October, 2023; originally announced October 2023.

    Comments: This paper has been accepted by ICSE 2024

  24. arXiv:2310.12481

    cs.CL cs.AI

    Not All Countries Celebrate Thanksgiving: On the Cultural Dominance in Large Language Models

    Authors: Wenxuan Wang, Wenxiang Jiao, Jingyuan Huang, Ruyi Dai, Jen-tse Huang, Zhaopeng Tu, Michael R. Lyu

    Abstract: This paper identifies a cultural dominance issue within large language models (LLMs) due to the predominant use of English data in model training (e.g., ChatGPT). LLMs often provide inappropriate English-culture-related answers that are not relevant to the expected culture when users ask in non-English languages. To systematically evaluate the cultural dominance issue, we build a benchmark of conc…

    Submitted 16 February, 2024; v1 submitted 19 October, 2023; originally announced October 2023.

  25. arXiv:2310.01796

    cs.SE

    LILAC: Log Parsing using LLMs with Adaptive Parsing Cache

    Authors: Zhihan Jiang, Jinyang Liu, Zhuangbin Chen, Yichen Li, Junjie Huang, Yintong Huo, Pinjia He, Jiazhen Gu, Michael R. Lyu

    Abstract: Log parsing transforms log messages into structured formats, serving as the prerequisite step for various log analysis tasks. Although a variety of log parsing approaches have been proposed, their performance on complicated log data remains compromised due to the use of human-crafted rules or learning-based models with limited training data. The recent emergence of powerful large language models (…

    Submitted 22 March, 2024; v1 submitted 3 October, 2023; originally announced October 2023.

    Comments: This paper was accepted by The ACM International Conference on the Foundations of Software Engineering (FSE 2024)

  26. arXiv:2310.01386

    cs.CL

    Who is ChatGPT? Benchmarking LLMs' Psychological Portrayal Using PsychoBench

    Authors: Jen-tse Huang, Wenxuan Wang, Eric John Li, Man Ho Lam, Shujie Ren, Youliang Yuan, Wenxiang Jiao, Zhaopeng Tu, Michael R. Lyu

    Abstract: Large Language Models (LLMs) have recently showcased their remarkable capacities, not only in natural language processing tasks but also across diverse domains such as clinical medicine, legal consultation, and education. LLMs become more than mere applications, evolving into assistants capable of addressing diverse user requests. This narrows the distinction between human beings and artificial in…

    Submitted 22 January, 2024; v1 submitted 2 October, 2023; originally announced October 2023.

    Comments: Accepted for ICLR 2024 Oral Presentation. 15 pages (main text) and 5 pages (appendix)

  27. arXiv:2310.00905

    cs.CL cs.AI

    All Languages Matter: On the Multilingual Safety of Large Language Models

    Authors: Wenxuan Wang, Zhaopeng Tu, Chang Chen, Youliang Yuan, Jen-tse Huang, Wenxiang Jiao, Michael R. Lyu

    Abstract: Safety lies at the core of developing and deploying large language models (LLMs). However, previous safety benchmarks only concern the safety in one language, e.g. the majority language in the pretraining data such as English. In this work, we build the first multilingual safety benchmark for LLMs, XSafety, in response to the global deployment of LLMs in practice. XSafety covers 14 kinds of common…

    Submitted 20 June, 2024; v1 submitted 2 October, 2023; originally announced October 2023.

    Comments: Accepted by ACL 2024 Findings. The first multilingual safety benchmark for large language models

  28. arXiv:2310.00677

    cs.SE

    A Roadmap towards Intelligent Operations for Reliable Cloud Computing Systems

    Authors: Yintong Huo, Cheryl Lee, Jinyang Liu, Tianyi Yang, Michael R. Lyu

    Abstract: The increasing complexity and usage of cloud systems have made it challenging for service providers to ensure reliability. This paper highlights two main challenges, namely internal and external factors, that affect the reliability of cloud microservices. Afterward, we discuss the data-driven approach that can resolve these challenges from four key aspects: ticket management, log management, multi…

    Submitted 1 October, 2023; originally announced October 2023.

    Comments: This paper has been accepted by ICDM AIOPS workshop

  29. arXiv:2309.12167

    cs.SE

    Revealing Performance Issues in Server-side WebAssembly Runtimes via Differential Testing

    Authors: Shuyao Jiang, Ruiying Zeng, Zihao Rao, Jiazhen Gu, Yangfan Zhou, Michael R. Lyu

    Abstract: WebAssembly (Wasm) is a bytecode format originally serving as a compilation target for Web applications. It has recently been used increasingly on the server side, e.g., providing a safer, faster, and more portable alternative to Linux containers. With the popularity of server-side Wasm applications, it is essential to study performance issues (i.e., abnormal latency) in Wasm runtimes, as they may…

    Submitted 21 September, 2023; originally announced September 2023.

    Comments: Accepted by the 38th IEEE/ACM International Conference on Automated Software Engineering (ASE 2023)

  30. Ditto: An Elastic and Adaptive Memory-Disaggregated Caching System

    Authors: Jiacheng Shen, Pengfei Zuo, Xuchuan Luo, Yuxin Su, Jiazhen Gu, Hao Feng, Yangfan Zhou, Michael R. Lyu

    Abstract: In-memory caching systems are fundamental building blocks in cloud services. However, due to the coupled CPU and memory on monolithic servers, existing caching systems cannot elastically adjust resources in a resource-efficient and agile manner. To achieve better elasticity, we propose to port in-memory caching systems to the disaggregated memory (DM) architecture, where compute and memory resourc…

    Submitted 18 September, 2023; originally announced September 2023.

  31. arXiv:2309.08115

    cs.SE

    REEF: A Framework for Collecting Real-World Vulnerabilities and Fixes

    Authors: Chaozheng Wang, Zongjie Li, Yun Peng, Shuzheng Gao, Sirong Chen, Shuai Wang, Cuiyun Gao, Michael R. Lyu

    Abstract: Software plays a crucial role in our daily lives, and therefore the quality and security of software systems have become increasingly important. However, vulnerabilities in software still pose a significant threat, as they can have serious consequences. Recent advances in automated program repair have sought to automatically detect and fix bugs using data-driven techniques. Sophisticated deep lear…

    Submitted 14 September, 2023; originally announced September 2023.

    Comments: Accepted by ASE 2023 Industry Challenge (Competition) Track

  32. Your Code Secret Belongs to Me: Neural Code Completion Tools Can Memorize Hard-Coded Credentials

    Authors: Yizhan Huang, Yichen Li, Weibin Wu, Jianping Zhang, Michael R. Lyu

    Abstract: Neural Code Completion Tools (NCCTs), which are built upon language modeling techniques, have reshaped the field of software engineering and can accurately suggest contextually relevant code snippets. However, language models may emit the training data verbatim during inference with appropriate prompts. This memorization property raises privacy concerns of NCCTs about hard-coded credential leaka…

    Submitted 20 May, 2024; v1 submitted 14 September, 2023; originally announced September 2023.

    Comments: Accepted by FSE '24

  33. arXiv:2308.10828

    cs.SE

    A Large-Scale Evaluation for Log Parsing Techniques: How Far Are We?

    Authors: Zhihan Jiang, Jinyang Liu, Junjie Huang, Yichen Li, Yintong Huo, Jiazhen Gu, Zhuangbin Chen, Jieming Zhu, Michael R. Lyu

    Abstract: Log data have facilitated various tasks of software development and maintenance, such as testing, debugging and diagnosing. Due to the unstructured nature of logs, log parsing is typically required to transform log messages into structured data for automated log analysis. Given the abundance of log parsers that employ various techniques, evaluating these tools to comprehend their characteristics a…

    Submitted 22 March, 2024; v1 submitted 21 August, 2023; originally announced August 2023.

    Comments: This paper was accepted by the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA 2024)

  34. arXiv:2308.09937

    cs.SE cs.LG

    Practical Anomaly Detection over Multivariate Monitoring Metrics for Online Services

    Authors: Jinyang Liu, Tianyi Yang, Zhuangbin Chen, Yuxin Su, Cong Feng, Zengyin Yang, Michael R. Lyu

    Abstract: As modern software systems continue to grow in terms of complexity and volume, anomaly detection on multivariate monitoring metrics, which profile systems' health status, becomes more and more critical and challenging. In particular, the dependency between different metrics and their historical patterns plays a critical role in pursuing prompt and accurate anomaly detection. Existing approaches fa…

    Submitted 19 August, 2023; originally announced August 2023.

    Comments: This paper has been accepted by the 34th IEEE International Symposium on Software Reliability Engineering (ISSRE'2023)

  35. arXiv:2308.09810

    cs.SE cs.AI cs.CL cs.CV

    An Image is Worth a Thousand Toxic Words: A Metamorphic Testing Framework for Content Moderation Software

    Authors: Wenxuan Wang, Jingyuan Huang, Jen-tse Huang, Chang Chen, Jiazhen Gu, Pinjia He, Michael R. Lyu

    Abstract: The exponential growth of social media platforms has brought about a revolution in communication and content dissemination in human society. Nevertheless, these platforms are being increasingly misused to spread toxic content, including hate speech, malicious advertising, and pornography, leading to severe negative consequences such as harm to teenagers' mental health. Despite tremendous efforts i…

    Submitted 18 August, 2023; originally announced August 2023.

    Comments: Accepted by ASE 2023. arXiv admin note: substantial text overlap with arXiv:2302.05706

  36. arXiv:2308.09804

    cs.CV cs.AI cs.CL cs.LG

    VL-PET: Vision-and-Language Parameter-Efficient Tuning via Granularity Control

    Authors: Zi-Yuan Hu, Yanyang Li, Michael R. Lyu, Liwei Wang

    Abstract: As the model size of pre-trained language models (PLMs) grows rapidly, full fine-tuning becomes prohibitively expensive for model training and storage. In vision-and-language (VL), parameter-efficient tuning (PET) techniques are proposed to integrate modular modifications (e.g., Adapter and LoRA) into encoder-decoder PLMs. By tuning a small set of trainable parameters, these techniques perform on…

    Submitted 18 August, 2023; originally announced August 2023.

    Comments: ICCV 2023 (17 pages, 6 figures, 22 tables)

  37. arXiv:2308.09324

    cs.SE

    AutoLog: A Log Sequence Synthesis Framework for Anomaly Detection

    Authors: Yintong Huo, Yichen Li, Yuxin Su, Pinjia He, Zifan Xie, Michael R. Lyu

    Abstract: The rapid progress of modern computing systems has led to a growing interest in informative run-time logs. Various log-based anomaly detection techniques have been proposed to ensure software reliability. However, their implementation in the industry has been limited due to the lack of high-quality public log resources as training datasets. While some log datasets are available for anomaly detec…

    Submitted 18 August, 2023; originally announced August 2023.

    Comments: The paper has been accepted by ASE 2023 (Research Track)

  38. arXiv:2308.07676

    cs.SE

    Maat: Performance Metric Anomaly Anticipation for Cloud Services with Conditional Diffusion

    Authors: Cheryl Lee, Tianyi Yang, Zhuangbin Chen, Yuxin Su, Michael R. Lyu

    Abstract: Ensuring the reliability and user satisfaction of cloud services necessitates prompt anomaly detection followed by diagnosis. Existing techniques for anomaly detection focus solely on real-time detection, meaning that anomaly alerts are issued as soon as anomalies occur. However, anomalies can propagate and escalate into failures, making faster-than-real-time anomaly detection highly desirable…

    Submitted 15 August, 2023; originally announced August 2023.

    Comments: This paper has been accepted by the Research track of the 38th IEEE/ACM International Conference on Automated Software Engineering (ASE 2023)

  39. arXiv:2308.07638

    cs.SE

    Prism: Revealing Hidden Functional Clusters from Massive Instances in Cloud Systems

    Authors: Jinyang Liu, Zhihan Jiang, Jiazhen Gu, Junjie Huang, Zhuangbin Chen, Cong Feng, Zengyin Yang, Yongqiang Yang, Michael R. Lyu

    Abstract: Ensuring the reliability of cloud systems is critical for both cloud vendors and customers. Cloud systems often rely on virtualization techniques to create instances of hardware resources, such as virtual machines. However, virtualization hinders the observability of cloud systems, making it challenging to diagnose platform-level issues. To improve system observability, we propose to infer functio…

    Submitted 15 August, 2023; originally announced August 2023.

    Comments: The paper was accepted by the 38th IEEE/ACM International Conference on Automated Software Engineering (ASE 2023)

  40. arXiv:2308.06783

    cs.SE cs.HC

    Towards Modeling Software Quality of Virtual Reality Applications from Users' Perspectives

    Authors: Shuqing Li, Lili Wei, Yepang Liu, Cuiyun Gao, Shing-Chi Cheung, Michael R. Lyu

    Abstract: Virtual Reality (VR) technology has become increasingly popular in recent years as a key enabler of the Metaverse. VR applications have unique characteristics, including the revolutionized human-computer interaction mechanisms, that distinguish them from traditional software. Hence, user expectations for the software quality of VR applications diverge from those for traditional software. Investiga…

    Submitted 13 August, 2023; originally announced August 2023.

    ACM Class: D.2.9; H.5.1

  41. arXiv:2308.04813

    cs.CL

    CLEVA: Chinese Language Models EVAluation Platform

    Authors: Yanyang Li, Jianqiao Zhao, Duo Zheng, Zi-Yuan Hu, Zhi Chen, Xiaohui Su, Yongfeng Huang, Shijia Huang, Dahua Lin, Michael R. Lyu, Liwei Wang

    Abstract: With the continuous emergence of Chinese Large Language Models (LLMs), how to evaluate a model's capabilities has become an increasingly significant issue. The absence of a comprehensive Chinese benchmark that thoroughly assesses a model's performance, the unstandardized and incomparable prompting procedure, and the prevalent risk of contamination pose major challenges in the current evaluation of…

    Submitted 16 October, 2023; v1 submitted 9 August, 2023; originally announced August 2023.

    Comments: EMNLP 2023 System Demonstrations camera-ready

  42. arXiv:2308.03656

    cs.CL

    Emotionally Numb or Empathetic? Evaluating How LLMs Feel Using EmotionBench

    Authors: Jen-tse Huang, Man Ho Lam, Eric John Li, Shujie Ren, Wenxuan Wang, Wenxiang Jiao, Zhaopeng Tu, Michael R. Lyu

    Abstract: Evaluating Large Language Models' (LLMs) anthropomorphic capabilities has become increasingly important in contemporary discourse. Utilizing the emotion appraisal theory from psychology, we propose to evaluate the empathy ability of LLMs, i.e., how their feelings change when presented with specific situations. After a careful and comprehensive survey, we collect a dataset containing over 400 situa…

    Submitted 24 April, 2024; v1 submitted 7 August, 2023; originally announced August 2023.

    Comments: 12 pages of main text; 9 pages of appendices

  43. arXiv:2307.09163

    cs.SE

    Generative Type Inference for Python

    Authors: Yun Peng, Chaozheng Wang, Wenxuan Wang, Cuiyun Gao, Michael R. Lyu

    Abstract: Python is a popular dynamic programming language, evidenced by its ranking as the second most commonly used language on GitHub. However, its dynamic type system can lead to potential type errors, leading researchers to explore automatic type inference approaches for Python programs. The rule-based type inference approaches can ensure the accuracy of predicted variable types, but they suffer from l…

    Submitted 18 July, 2023; originally announced July 2023.

    Comments: This paper has been accepted by ASE'23

  44. arXiv:2307.05950

    cs.SE

    Exploring the Effectiveness of LLMs in Automated Logging Generation: An Empirical Study

    Authors: Yichen Li, Yintong Huo, Zhihan Jiang, Renyi Zhong, Pinjia He, Yuxin Su, Lionel Briand, Michael R. Lyu

    Abstract: Automated logging statement generation supports developers in documenting critical software runtime behavior. Given the great success in natural language generation and programming language comprehension, large language models (LLMs) might help developers generate logging statements, but this has not yet been investigated. To fill the gap, this paper performs the first study on exploring LLMs for…

    Submitted 1 April, 2024; v1 submitted 12 July, 2023; originally announced July 2023.

  45. arXiv:2306.08257

    cs.CV cs.CR

    On the Robustness of Latent Diffusion Models

    Authors: Jianping Zhang, Zhuoer Xu, Shiwen Cui, Changhua Meng, Weibin Wu, Michael R. Lyu

    Abstract: Latent diffusion models achieve state-of-the-art performance on a variety of generative tasks, such as image synthesis and image editing. However, the robustness of latent diffusion models is not well studied. Previous works only focus on the adversarial attacks against the encoder or the output image under white-box settings, regardless of the denoising process. Therefore, in this paper, we aim t…

    Submitted 14 June, 2023; originally announced June 2023.

  46. arXiv:2306.05032

    cs.SE cs.LG

    Log-based Anomaly Detection based on EVT Theory with feedback

    Authors: Jinyang Liu, Junjie Huang, Yintong Huo, Zhihan Jiang, Jiazhen Gu, Zhuangbin Chen, Cong Feng, Minzhi Yan, Michael R. Lyu

    Abstract: System logs play a critical role in maintaining the reliability of software systems. Fruitful studies have explored automatic log-based anomaly detection and achieved notable accuracy on benchmark datasets. However, when applied to large-scale cloud systems, these solutions face limitations due to high resource consumption and lack of adaptability to evolving logs. In this paper, we present an acc…

    Submitted 30 September, 2023; v1 submitted 8 June, 2023; originally announced June 2023.

  47. arXiv:2306.01509

    cs.SE

    EvLog: Identifying Anomalous Logs over Software Evolution

    Authors: Yintong Huo, Cheryl Lee, Yuxin Su, Shiwen Shan, Jinyang Liu, Michael R. Lyu

    Abstract: Software logs record system activities, aiding maintainers in identifying the underlying causes for failures and enabling prompt mitigation actions. However, maintainers need to inspect a large volume of daily logs to identify the anomalous logs that reveal failure details for further diagnosis. Thus, how to automatically distinguish these anomalous logs from normal logs becomes a critical problem…

    Submitted 15 August, 2023; v1 submitted 2 June, 2023; originally announced June 2023.

    Comments: Accepted by ISSRE'23

  48. arXiv:2306.01394

    cs.SE

    Domain Knowledge Matters: Improving Prompts with Fix Templates for Repairing Python Type Errors

    Authors: Yun Peng, Shuzheng Gao, Cuiyun Gao, Yintong Huo, Michael R. Lyu

    Abstract: Although the dynamic type system of Python facilitates the developers in writing Python programs, it also brings type errors at run-time. There exist rule-based approaches for automatically repairing Python type errors. The approaches can generate accurate patches but they require domain experts to design patch synthesis rules and suffer from low template coverage of real-world type errors. Learni…

    Submitted 2 June, 2023; originally announced June 2023.

    Comments: This paper has been accepted by ICSE'24

  49. arXiv:2305.19926

    cs.CL

    Revisiting the Reliability of Psychological Scales on Large Language Models

    Authors: Jen-tse Huang, Wenxuan Wang, Man Ho Lam, Eric John Li, Wenxiang Jiao, Michael R. Lyu

    Abstract: Recent research has extended beyond assessing the performance of Large Language Models (LLMs) to examining their characteristics from a psychological standpoint, acknowledging the necessity of understanding their behavioral characteristics. The administration of personality tests to LLMs has emerged as a noteworthy area in this context. However, the suitability of employing psychological scales, i…

    Submitted 28 December, 2023; v1 submitted 31 May, 2023; originally announced May 2023.

    Comments: 10 pages. Added more comprehensive experiments and analysis

  50. What Makes Good In-context Demonstrations for Code Intelligence Tasks with LLMs?

    Authors: Shuzheng Gao, Xin-Cheng Wen, Cuiyun Gao, Wenxuan Wang, Hongyu Zhang, Michael R. Lyu

    Abstract: Pre-trained models of source code have gained widespread popularity in many code intelligence tasks. Recently, with the scaling of the model and corpus size, large language models have shown the ability of in-context learning (ICL). ICL employs task instructions and a few examples as demonstrations, and then inputs the demonstrations to the language models for making predictions. This new learning…

    Submitted 8 August, 2023; v1 submitted 15 April, 2023; originally announced April 2023.

    Comments: This paper is accepted by ASE 2023