Showing 1–27 of 27 results for author: Nie, P

  1. arXiv:2405.14619  [pdf, other]

    cs.SE

    Generating Exceptional Behavior Tests with Reasoning Augmented Large Language Models

    Authors: Jiyang Zhang, Yu Liu, Pengyu Nie, Junyi Jessy Li, Milos Gligoric

    Abstract: Many popular programming languages, including C#, Java, and Python, support exceptions. Exceptions are thrown during program execution if an unwanted event happens, e.g., a method is invoked with an illegal argument value. Software developers write exceptional behavior tests (EBTs) to check that their code detects unwanted events and throws appropriate exceptions. Prior research studies have shown…

    Submitted 24 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

  2. arXiv:2404.05904  [pdf, other]

    cs.CL

    The Hallucinations Leaderboard -- An Open Effort to Measure Hallucinations in Large Language Models

    Authors: Giwon Hong, Aryo Pradipta Gema, Rohit Saxena, Xiaotang Du, Ping Nie, Yu Zhao, Laura Perez-Beltrachini, Max Ryabinin, Xuanli He, Clémentine Fourrier, Pasquale Minervini

    Abstract: Large Language Models (LLMs) have transformed the Natural Language Processing (NLP) landscape with their remarkable ability to understand and generate human-like text. However, these models are prone to "hallucinations" -- outputs that do not align with factual reality or the input context. This paper introduces the Hallucinations Leaderboard, an open initiative to quantitatively measure and com…

    Submitted 17 April, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

  3. arXiv:2403.04652  [pdf, other]

    cs.CL cs.AI

    Yi: Open Foundation Models by 01.AI

    Authors: 01.AI, Alex Young, Bei Chen, Chao Li, Chengen Huang, Ge Zhang, Guanwei Zhang, Heng Li, Jiangcheng Zhu, Jianqun Chen, Jing Chang, Kaidong Yu, Peng Liu, Qiang Liu, Shawn Yue, Senbin Yang, Shiming Yang, Tao Yu, Wen Xie, Wenhao Huang, Xiaohui Hu, Xiaoyi Ren, Xinyao Niu, Pengcheng Nie, et al. (7 additional authors not shown)

    Abstract: We introduce the Yi model family, a series of language and multimodal models that demonstrate strong multi-dimensional capabilities. The Yi model family is based on 6B and 34B pretrained language models, then we extend them to chat models, 200K long context models, depth-upscaled models, and vision-language models. Our base models achieve strong performance on a wide range of benchmarks like MMLU,…

    Submitted 7 March, 2024; originally announced March 2024.

  4. arXiv:2402.15627  [pdf, other]

    cs.LG cs.DC

    MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs

    Authors: Ziheng Jiang, Haibin Lin, Yinmin Zhong, Qi Huang, Yangrui Chen, Zhi Zhang, Yanghua Peng, Xiang Li, Cong Xie, Shibiao Nong, Yulu Jia, Sun He, Hongmin Chen, Zhihao Bai, Qi Hou, Shipeng Yan, Ding Zhou, Yiyao Sheng, Zhuo Jiang, Haohan Xu, Haoran Wei, Zhang Zhang, Pengfei Nie, Leqi Zou, Sida Zhao, et al. (7 additional authors not shown)

    Abstract: We present the design, implementation and engineering experience in building and deploying MegaScale, a production system for training large language models (LLMs) at the scale of more than 10,000 GPUs. Training LLMs at this scale brings unprecedented challenges to training efficiency and stability. We take a full-stack approach that co-designs the algorithmic and system components across model bl…

    Submitted 23 February, 2024; originally announced February 2024.

  5. arXiv:2312.11678  [pdf, other]

    cs.HC

    Misinformation as a harm: structured approaches for fact-checking prioritization

    Authors: Connie Moon Sehat, Ryan Li, Peipei Nie, Tarunima Prabhakar, Amy X. Zhang

    Abstract: In this work, we examine how fact-checkers prioritize which claims to fact-check and what tools may assist them in their efforts. Through a series of interviews with 23 professional fact-checkers from around the world, we validate that harm assessment is a central component of how fact-checkers triage their work. We also clarify the processes behind fact-checking prioritization, finding that they…

    Submitted 18 March, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

    Comments: Accepted to CSCW 2024, with clean up for typos and figures

  6. arXiv:2307.14991  [pdf, other]

    cs.SE cs.AI

    Multilingual Code Co-Evolution Using Large Language Models

    Authors: Jiyang Zhang, Pengyu Nie, Junyi Jessy Li, Milos Gligoric

    Abstract: Many software projects implement APIs and algorithms in multiple programming languages. Maintaining such projects is tiresome, as developers have to ensure that any change (e.g., a bug fix or a new feature) is being propagated, timely and without errors, to implementations in other programming languages. In the world of ever-changing software, using rule-based translation tools (i.e., transpilers)…

    Submitted 11 September, 2023; v1 submitted 27 July, 2023; originally announced July 2023.

    Comments: FSE 2023 (camera ready)

  7. arXiv:2305.13486  [pdf, other]

    cs.SE

    pytest-inline: An Inline Testing Tool for Python

    Authors: Yu Liu, Zachary Thurston, Alan Han, Pengyu Nie, Milos Gligoric, Owolabi Legunsen

    Abstract: We present pytest-inline, the first inline testing framework for Python. We recently proposed inline tests to make it easier to test individual program statements. But, there is no framework-level support for developers to write inline tests in Python. To fill this gap, we design and implement pytest-inline as a plugin for pytest, the most popular Python testing framework. Using pytest-inline, a d…

    Submitted 22 May, 2023; originally announced May 2023.

    Comments: Accepted as a tool demo paper at ICSE DEMO 2023

  8. arXiv:2302.10166  [pdf, other]

    cs.SE cs.CL cs.LG

    Learning Deep Semantics for Test Completion

    Authors: Pengyu Nie, Rahul Banerjee, Junyi Jessy Li, Raymond J. Mooney, Milos Gligoric

    Abstract: Writing tests is a time-consuming yet essential task during software development. We propose to leverage recent advances in deep learning for text and code generation to assist developers in writing tests. We formalize the novel task of test completion to automatically complete the next statement in a test method based on the context of prior statements and the code under test. We develop TeCo --…

    Submitted 7 March, 2023; v1 submitted 20 February, 2023; originally announced February 2023.

    Comments: Accepted as a conference paper in ICSE 2023

  9. arXiv:2210.16637  [pdf, other]

    cs.CL cs.LG

    Beyond Prompting: Making Pre-trained Language Models Better Zero-shot Learners by Clustering Representations

    Authors: Yu Fei, Ping Nie, Zhao Meng, Roger Wattenhofer, Mrinmaya Sachan

    Abstract: Recent work has demonstrated that pre-trained language models (PLMs) are zero-shot learners. However, most existing zero-shot methods involve heavy human engineering or complicated self-training pipelines, hindering their application to new situations. In this work, we show that zero-shot text classification can be improved simply by clustering texts in the embedding spaces of PLMs. Specifically,…

    Submitted 23 November, 2022; v1 submitted 29 October, 2022; originally announced October 2022.

    Comments: Accepted to EMNLP 2022

  10. arXiv:2209.06315  [pdf, other]

    cs.SE

    Inline Tests

    Authors: Yu Liu, Pengyu Nie, Owolabi Legunsen, Milos Gligoric

    Abstract: Unit tests are widely used to check source code quality, but they can be too coarse-grained or ill-suited for testing individual program statements. We introduce inline tests to make it easier to check for faults in statements. We motivate inline tests through several language features and a common testing scenario in which inline tests could be beneficial. For example, inline tests can allow a de…

    Submitted 13 September, 2022; originally announced September 2022.

    Comments: Accepted as a conference paper in ASE 2022

  11. arXiv:2209.04725  [pdf, other]

    cs.CV cs.CL

    Anticipating the Unseen Discrepancy for Vision and Language Navigation

    Authors: Yujie Lu, Huiliang Zhang, Ping Nie, Weixi Feng, Wenda Xu, Xin Eric Wang, William Yang Wang

    Abstract: Vision-Language Navigation requires the agent to follow natural language instructions to reach a specific target. The large discrepancy between seen and unseen environments makes it challenging for the agent to generalize well. Previous studies propose data augmentation methods to mitigate the data bias explicitly or implicitly and provide improvements in generalization. However, they try to memor…

    Submitted 10 September, 2022; originally announced September 2022.

  12. arXiv:2208.05446  [pdf, ps, other]

    cs.SE cs.LG

    CoditT5: Pretraining for Source Code and Natural Language Editing

    Authors: Jiyang Zhang, Sheena Panthaplackel, Pengyu Nie, Junyi Jessy Li, Milos Gligoric

    Abstract: Pretrained language models have been shown to be effective in many software-related generation tasks; however, they are not well-suited for editing tasks as they are not designed to reason about edits. To address this, we propose a novel pretraining objective which explicitly models edits and use it to build CoditT5, a large language model for software-related editing tasks that is pretrained on l…

    Submitted 14 September, 2022; v1 submitted 10 August, 2022; originally announced August 2022.

    Comments: ASE 2022 (camera ready)

  13. arXiv:2206.03450  [pdf, other]

    cs.HC cs.CY

    A Trade-off-centered Framework of Content Moderation

    Authors: Jialun Aaron Jiang, Peipei Nie, Jed R. Brubaker, Casey Fiesler

    Abstract: Content moderation research typically prioritizes representing and addressing challenges for one group of stakeholders or communities in one type of context. While taking a focused approach is reasonable or even favorable for empirical case studies, it does not address how content moderation works in multiple contexts. Through a systematic literature review of 86 content moderation papers that doc…

    Submitted 7 June, 2022; originally announced June 2022.

    Comments: To appear in ACM TOCHI

    ACM Class: J.4; K.4.2

  14. arXiv:2206.03273  [pdf, other]

    cs.CY

    City-scale synthetic individual-level vehicle trip data

    Authors: Guilong Li, Yixian Chen, Yimin Wang, Zhi Yu, Peilin Nie, Zhaocheng He

    Abstract: Trip data that records each vehicle's trip activity on the road network describes the operation of urban traffic from the individual perspective, and it is extremely valuable for transportation research. However, because of data privacy restrictions, individual-level trip data cannot be opened to all researchers, even though the need for it is urgent. In this paper, we produce a city-scale synthetic…

    Submitted 1 February, 2023; v1 submitted 2 June, 2022; originally announced June 2022.

  15. arXiv:2202.10695  [pdf, other]

    cs.AI

    A Framework for Multi-stage Bonus Allocation in meal delivery Platform

    Authors: Zhuolin Wu, Li Wang, Fangsheng Huang, Linjun Zhou, Yu Song, Chengpeng Ye, Pengyu Nie, Hao Ren, Jinghua Hao, Renqing He, Zhizhao Sun

    Abstract: Online meal delivery is undergoing explosive growth, as this service is becoming increasingly popular. A meal delivery platform aims to provide excellent and stable services for customers and restaurants. However, in reality, several hundred thousand orders are canceled per day on the Meituan meal delivery platform since they are not accepted by the crowdsourcing drivers. The cancellation of the o…

    Submitted 22 February, 2022; originally announced February 2022.

    Comments: 9 pages; submitted to KDD 2022

  16. arXiv:2110.11570  [pdf, other]

    cs.IR

    MIC: Model-agnostic Integrated Cross-channel Recommenders

    Authors: Yujie Lu, Ping Nie, Shengyu Zhang, Ming Zhao, Ruobing Xie, William Yang Wang, Yi Ren

    Abstract: Semantically connecting users and items is a fundamental problem for the matching stage of an industrial recommender system. Recent advances in this topic are based on multi-channel retrieval to efficiently measure users' interest in items from the massive candidate pool. However, existing work is primarily built upon pre-defined retrieval channels, including User-CF (U2U), Item-CF (I2I), and Emb…

    Submitted 13 February, 2022; v1 submitted 21 October, 2021; originally announced October 2021.

    Comments: 10 pages, 4 figures

  17. arXiv:2108.09619  [pdf, other]

    cs.SE cs.LG

    Impact of Evaluation Methodologies on Code Summarization

    Authors: Pengyu Nie, Jiyang Zhang, Junyi Jessy Li, Raymond J. Mooney, Milos Gligoric

    Abstract: There has been a growing interest in developing machine learning (ML) models for code summarization tasks, e.g., comment generation and method naming. Despite substantial increase in the effectiveness of ML models, the evaluation methodologies, i.e., the way people split datasets into training, validation, and test sets, were not well studied. Specifically, no prior work on code summarization cons…

    Submitted 5 April, 2022; v1 submitted 21 August, 2021; originally announced August 2021.

    Comments: Accepted as a conference paper in ACL 2022

  18. arXiv:2103.13426  [pdf, other]

    cs.CL cs.LG cs.SE

    Learning to Generate Code Comments from Class Hierarchies

    Authors: Jiyang Zhang, Sheena Panthaplackel, Pengyu Nie, Raymond J. Mooney, Junyi Jessy Li, Milos Gligoric

    Abstract: Descriptive code comments are essential for supporting code comprehension and maintenance. We propose the task of automatically generating comments for overriding methods. We formulate a novel framework which accommodates the unique contextual and linguistic reasoning that is required for performing this task. Our approach features: (1) incorporating context from the class hierarchy; (2) condition…

    Submitted 17 April, 2021; v1 submitted 24 March, 2021; originally announced March 2021.

  19. arXiv:2103.01346  [pdf, other]

    cs.PL cs.CL cs.SE

    Roosterize: Suggesting Lemma Names for Coq Verification Projects Using Deep Learning

    Authors: Pengyu Nie, Karl Palmskog, Junyi Jessy Li, Milos Gligoric

    Abstract: Naming conventions are an important concern in large verification projects using proof assistants, such as Coq. In particular, lemma names are used by proof engineers to effectively understand and modify Coq code. However, providing accurate and informative lemma names is a complex task, which is currently often carried out manually. Even when lemma naming is automated using rule-based tools, gene…

    Submitted 3 May, 2021; v1 submitted 1 March, 2021; originally announced March 2021.

    Comments: Accepted in International Conference on Software Engineering, Demonstrations Track (ICSE-DEMO 2021)

  20. arXiv:2009.07465  [pdf, other]

    cs.CL

    Answering Any-hop Open-domain Questions with Iterative Document Reranking

    Authors: Ping Nie, Yuyu Zhang, Arun Ramamurthy, Le Song

    Abstract: Existing approaches for open-domain question answering (QA) are typically designed for questions that require either single-hop or multi-hop reasoning, which make strong assumptions about the complexity of questions to be answered. Also, multi-step document retrieval often incurs a higher number of relevant but non-supporting documents, which dampens the downstream noise-sensitive reader module for ans…

    Submitted 24 May, 2021; v1 submitted 16 September, 2020; originally announced September 2020.

    Comments: Accepted by SIGIR 2021

  21. arXiv:2006.16743  [pdf, ps, other]

    cs.HC cs.CL cs.PL cs.SE

    Learning to Format Coq Code Using Language Models

    Authors: Pengyu Nie, Karl Palmskog, Junyi Jessy Li, Milos Gligoric

    Abstract: Should the final right bracket in a record declaration be on a separate line? Should arguments to the rewrite tactic be separated by a single space? Coq code tends to be written in distinct manners by different people and teams. The expressiveness, flexibility, and extensibility of Coq's languages and notations means that Coq projects have a wide variety of recognizable coding styles, sometimes ex…

    Submitted 18 June, 2020; originally announced June 2020.

    Comments: Accepted in the Coq Workshop 2020

  22. A Pairwise Probe for Understanding BERT Fine-Tuning on Machine Reading Comprehension

    Authors: Jie Cai, Zhengzhou Zhu, Ping Nie, Qian Liu

    Abstract: Pre-trained models have brought significant improvements to many NLP tasks and have been extensively analyzed. But little is known about the effect of fine-tuning on specific tasks. Intuitively, people may agree that a pre-trained model already learns semantic representations of words (e.g. synonyms are closer to each other) and fine-tuning further improves its capabilities which require more comp…

    Submitted 1 June, 2020; originally announced June 2020.

    Comments: 4 pages, 1 figure

  23. arXiv:2004.12169  [pdf, other]

    cs.CL cs.LG cs.SE

    Learning to Update Natural Language Comments Based on Code Changes

    Authors: Sheena Panthaplackel, Pengyu Nie, Milos Gligoric, Junyi Jessy Li, Raymond J. Mooney

    Abstract: We formulate the novel task of automatically updating an existing natural language comment based on changes in the body of code it accompanies. We propose an approach that learns to correlate changes across two distinct language representations, to generate a sequence of edits that are applied to the existing comment to reflect the source code modifications. We train and evaluate our model using a…

    Submitted 27 April, 2020; v1 submitted 25 April, 2020; originally announced April 2020.

    Comments: Accepted in Association for Computational Linguistics (ACL) 2020

  24. arXiv:2004.07761  [pdf, other]

    cs.PL cs.CL cs.SE

    Deep Generation of Coq Lemma Names Using Elaborated Terms

    Authors: Pengyu Nie, Karl Palmskog, Junyi Jessy Li, Milos Gligoric

    Abstract: Coding conventions for naming, spacing, and other essentially stylistic properties are necessary for developers to effectively understand, review, and modify source code in large software projects. Consistent conventions in verification projects based on proof assistants, such as Coq, increase in importance as projects grow in size and scope. While conventions can be documented and enforced manual…

    Submitted 22 April, 2020; v1 submitted 16 April, 2020; originally announced April 2020.

    Comments: Accepted in International Joint Conference on Automated Reasoning (IJCAR 2020). With Appendix

  25. arXiv:2002.12591  [pdf, other]

    cs.CL

    DC-BERT: Decoupling Question and Document for Efficient Contextual Encoding

    Authors: Yuyu Zhang, Ping Nie, Xiubo Geng, Arun Ramamurthy, Le Song, Daxin Jiang

    Abstract: Recent studies on open-domain question answering have achieved prominent performance improvement using pre-trained language models such as BERT. State-of-the-art approaches typically follow the "retrieve and read" pipeline and employ a BERT-based reranker to filter retrieved documents before feeding them into the reader module. The BERT retriever takes as input the concatenation of question and each…

    Submitted 28 February, 2020; originally announced February 2020.

  26. arXiv:1901.10125  [pdf, other]

    cs.CL cs.AI cs.CV

    Glyce: Glyph-vectors for Chinese Character Representations

    Authors: Yuxian Meng, Wei Wu, Fei Wang, Xiaoya Li, Ping Nie, Fan Yin, Muyu Li, Qinghong Han, Xiaofei Sun, Jiwei Li

    Abstract: It is intuitive that NLP tasks for logographic languages like Chinese should benefit from the use of the glyph information in those languages. However, due to the lack of rich pictographic evidence in glyphs and the weak generalization ability of standard computer vision models on character data, an effective way to utilize the glyph information remains to be found. In this paper, we address this…

    Submitted 21 May, 2020; v1 submitted 29 January, 2019; originally announced January 2019.

    Comments: Accepted by NeurIPS 2019

  27. arXiv:1808.01729  [pdf, other]

    cs.SE

    Executable Trigger-Action Comments

    Authors: Pengyu Nie, Rishabh Rai, Junyi Jessy Li, Sarfraz Khurshid, Raymond J. Mooney, Milos Gligoric

    Abstract: Natural language elements, e.g., todo comments, are frequently used to communicate among the developers and to describe tasks that need to be performed (actions) when specific conditions hold in the code repository (triggers). As projects evolve, development processes change, and development teams reorganize, these comments, because of their informal nature, frequently become irrelevant or forgott…

    Submitted 6 August, 2018; originally announced August 2018.