Showing 1–25 of 25 results for author: Pyatkin, V

  1. arXiv:2406.09279

    cs.CL

    Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback

    Authors: Hamish Ivison, Yizhong Wang, Jiacheng Liu, Zeqiu Wu, Valentina Pyatkin, Nathan Lambert, Noah A. Smith, Yejin Choi, Hannaneh Hajishirzi

    Abstract: Learning from preference feedback has emerged as an essential step for improving the generation quality and performance of modern language models (LMs). Despite its widespread use, the way preference-based learning is applied varies wildly, with differing data, learning algorithms, and evaluations used, making disentangling the impact of each aspect difficult. In this work, we identify four core a…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Preprint

  2. arXiv:2406.04770

    cs.CL cs.AI

    WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild

    Authors: Bill Yuchen Lin, Yuntian Deng, Khyathi Chandu, Faeze Brahman, Abhilasha Ravichander, Valentina Pyatkin, Nouha Dziri, Ronan Le Bras, Yejin Choi

    Abstract: We introduce WildBench, an automated evaluation framework designed to benchmark large language models (LLMs) using challenging, real-world user queries. WildBench consists of 1,024 tasks carefully selected from over one million human-chatbot conversation logs. For automated evaluation with WildBench, we have developed two metrics, WB-Reward and WB-Score, which are computable using advanced LLMs su…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: Link: https://hf.co/spaces/allenai/WildBench

  3. arXiv:2405.20967

    cs.CL

    Superlatives in Context: Explicit and Implicit Domain Restrictions for Superlative Frames

    Authors: Valentina Pyatkin, Bonnie Webber, Ido Dagan, Reut Tsarfaty

    Abstract: Superlatives are used to single out elements with a maximal/minimal property. Semantically, superlatives perform a set comparison: something (or some things) has the min/max property out of a set. As such, superlatives provide an ideal phenomenon for studying implicit phenomena and discourse restrictions. While this comparison set is often not explicitly defined, its (implicit) restrictions can be…

    Submitted 31 May, 2024; originally announced May 2024.

    Comments: 11 pages

  4. arXiv:2403.13787

    cs.LG

    RewardBench: Evaluating Reward Models for Language Modeling

    Authors: Nathan Lambert, Valentina Pyatkin, Jacob Morrison, LJ Miranda, Bill Yuchen Lin, Khyathi Chandu, Nouha Dziri, Sachin Kumar, Tom Zick, Yejin Choi, Noah A. Smith, Hannaneh Hajishirzi

    Abstract: Reward models (RMs) are at the crux of successfully using RLHF to align pretrained models to human preferences, yet there has been relatively little study that focuses on evaluation of those models. Evaluating reward models presents an opportunity to understand the opaque technologies used for alignment of language models and which values are embedded in them. Resources for reward model training a…

    Submitted 8 June, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

    Comments: 44 pages, 19 figures, 12 tables

  5. arXiv:2402.16786

    cs.CL cs.AI

    Political Compass or Spinning Arrow? Towards More Meaningful Evaluations for Values and Opinions in Large Language Models

    Authors: Paul Röttger, Valentin Hofmann, Valentina Pyatkin, Musashi Hinck, Hannah Rose Kirk, Hinrich Schütze, Dirk Hovy

    Abstract: Much recent work seeks to evaluate values and opinions in large language models (LLMs) using multiple-choice surveys and questionnaires. Most of this work is motivated by concerns around real-world LLM applications. For example, politically-biased LLMs may subtly influence society when they are used by millions of people. Such real-world concerns, however, stand in stark contrast to the artificial…

    Submitted 5 June, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

    Comments: Accepted at ACL 2024 (Main Conference)

  6. arXiv:2402.00838

    cs.CL

    OLMo: Accelerating the Science of Language Models

    Authors: Dirk Groeneveld, Iz Beltagy, Pete Walsh, Akshita Bhagia, Rodney Kinney, Oyvind Tafjord, Ananya Harsh Jha, Hamish Ivison, Ian Magnusson, Yizhong Wang, Shane Arora, David Atkinson, Russell Authur, Khyathi Raghavi Chandu, Arman Cohan, Jennifer Dumas, Yanai Elazar, Yuling Gu, Jack Hessel, Tushar Khot, William Merrill, Jacob Morrison, Niklas Muennighoff, Aakanksha Naik, Crystal Nam, et al. (18 additional authors not shown)

    Abstract: Language models (LMs) have become ubiquitous in both NLP research and in commercial product offerings. As their commercial importance has surged, the most powerful models have become closed off, gated behind proprietary interfaces, with important details of their training data, architectures, and development undisclosed. Given the importance of these details in scientifically studying these models…

    Submitted 7 June, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

  7. arXiv:2401.06877

    cs.CL

    Promptly Predicting Structures: The Return of Inference

    Authors: Maitrey Mehta, Valentina Pyatkin, Vivek Srikumar

    Abstract: Prompt-based methods have been used extensively across NLP to build zero- and few-shot label predictors. Many NLP tasks are naturally structured: that is, their outputs consist of multiple labels which constrain each other. Annotating data for such tasks can be cumbersome. Can the promise of the prompt-based paradigm be extended to such structured outputs? In this paper, we present a framework for…

    Submitted 29 March, 2024; v1 submitted 12 January, 2024; originally announced January 2024.

    Comments: 19 pages, 13 figures. Accepted to NAACL'2024 (Main)

  8. arXiv:2311.10702

    cs.CL

    Camels in a Changing Climate: Enhancing LM Adaptation with Tulu 2

    Authors: Hamish Ivison, Yizhong Wang, Valentina Pyatkin, Nathan Lambert, Matthew Peters, Pradeep Dasigi, Joel Jang, David Wadden, Noah A. Smith, Iz Beltagy, Hannaneh Hajishirzi

    Abstract: Since the release of TÜLU [Wang et al., 2023b], open resources for instruction tuning have developed quickly, from better base models to new finetuning techniques. We test and incorporate a number of these advances into TÜLU, resulting in TÜLU 2, a suite of improved TÜLU models for advancing the understanding and best practices of adapting pretrained language models to downstream tasks and user pr…

    Submitted 19 November, 2023; v1 submitted 17 November, 2023; originally announced November 2023.

    Comments: technical report; fixed zephyr numbers

  9. arXiv:2310.17793

    cs.CL cs.AI

    "You Are An Expert Linguistic Annotator": Limits of LLMs as Analyzers of Abstract Meaning Representation

    Authors: Allyson Ettinger, Jena D. Hwang, Valentina Pyatkin, Chandra Bhagavatula, Yejin Choi

    Abstract: Large language models (LLMs) show amazing proficiency and fluency in the use of language. Does this mean that they have also acquired insightful linguistic knowledge about the language, to an extent that they can serve as an "expert linguistic annotator"? In this paper, we examine the successes and limitations of the GPT-3, ChatGPT, and GPT-4 models in analysis of sentence meaning structure, focus…

    Submitted 11 December, 2023; v1 submitted 26 October, 2023; originally announced October 2023.

    Comments: EMNLP 2023 Findings (short)

  10. arXiv:2310.15431

    cs.CL

    What Makes it Ok to Set a Fire? Iterative Self-distillation of Contexts and Rationales for Disambiguating Defeasible Social and Moral Situations

    Authors: Kavel Rao, Liwei Jiang, Valentina Pyatkin, Yuling Gu, Niket Tandon, Nouha Dziri, Faeze Brahman, Yejin Choi

    Abstract: Moral or ethical judgments rely heavily on the specific contexts in which they occur. Understanding varying shades of defeasible contextualizations (i.e., additional information that strengthens or attenuates the moral acceptability of an action) is critical to accurately represent the subtlety and intricacy of grounded human moral judgment in real-life scenarios. We introduce defeasible moral r…

    Submitted 1 November, 2023; v1 submitted 23 October, 2023; originally announced October 2023.

    Comments: Camera Ready EMNLP Findings 2023. First two authors contributed equally

  11. arXiv:2310.08559

    cs.CL cs.AI

    Phenomenal Yet Puzzling: Testing Inductive Reasoning Capabilities of Language Models with Hypothesis Refinement

    Authors: Linlu Qiu, Liwei Jiang, Ximing Lu, Melanie Sclar, Valentina Pyatkin, Chandra Bhagavatula, Bailin Wang, Yoon Kim, Yejin Choi, Nouha Dziri, Xiang Ren

    Abstract: The ability to derive underlying principles from a handful of observations and then generalize to novel situations -- known as inductive reasoning -- is central to human intelligence. Prior work suggests that language models (LMs) often fall short on inductive reasoning, despite achieving impressive success on research benchmarks. In this work, we conduct a systematic study of the inductive reason…

    Submitted 22 May, 2024; v1 submitted 12 October, 2023; originally announced October 2023.

    Comments: ICLR 2024

  12. Value Kaleidoscope: Engaging AI with Pluralistic Human Values, Rights, and Duties

    Authors: Taylor Sorensen, Liwei Jiang, Jena Hwang, Sydney Levine, Valentina Pyatkin, Peter West, Nouha Dziri, Ximing Lu, Kavel Rao, Chandra Bhagavatula, Maarten Sap, John Tasioulas, Yejin Choi

    Abstract: Human values are crucial to human decision-making. Value pluralism is the view that multiple correct values may be held in tension with one another (e.g., when considering lying to a friend to protect their feelings, how does one balance honesty with friendship?). As statistical learners, AI systems fit to averages by default, washing out these potentially irreducible value conflicts. To improve A…

    Submitted 2 April, 2024; v1 submitted 1 September, 2023; originally announced September 2023.

    Comments: Proceedings of the AAAI Conference on Artificial Intelligence, 38

    Journal ref: Vol. 38 No. 18: AAAI-24 Technical Tracks 18; 2024; 19937-19947

  13. arXiv:2305.19472

    cs.CL cs.AI cs.LG

    PlaSma: Making Small Language Models Better Procedural Knowledge Models for (Counterfactual) Planning

    Authors: Faeze Brahman, Chandra Bhagavatula, Valentina Pyatkin, Jena D. Hwang, Xiang Lorraine Li, Hirona J. Arai, Soumya Sanyal, Keisuke Sakaguchi, Xiang Ren, Yejin Choi

    Abstract: Procedural planning, which entails decomposing a high-level goal into a sequence of temporally ordered steps, is an important yet intricate task for machines. It involves integrating common-sense knowledge to reason about complex contextualized situations that are often counterfactual, e.g. "scheduling a doctor's appointment without a phone". While current approaches show encouraging results using…

    Submitted 26 July, 2023; v1 submitted 30 May, 2023; originally announced May 2023.

    Comments: cited new paper, 27 pages

  14. arXiv:2305.15605

    cs.CL

    Revisiting Sentence Union Generation as a Testbed for Text Consolidation

    Authors: Eran Hirsch, Valentina Pyatkin, Ruben Wolhandler, Avi Caciularu, Asi Shefer, Ido Dagan

    Abstract: Tasks involving text generation based on multiple input texts, such as multi-document summarization, long-form question answering and contemporary dialogue applications, challenge models for their ability to properly consolidate partly-overlapping multi-text information. However, these tasks entangle the consolidation phase with the often subjective and ill-defined content selection requirement, i…

    Submitted 24 May, 2023; originally announced May 2023.

    Comments: Findings of the Association for Computational Linguistics (ACL 2023)

  15. arXiv:2305.12517

    cs.CL cs.IR cs.LG

    Description-Based Text Similarity

    Authors: Shauli Ravfogel, Valentina Pyatkin, Amir DN Cohen, Avshalom Manevich, Yoav Goldberg

    Abstract: Identifying texts with a given semantics is central for many information seeking scenarios. Similarity search over vector embeddings appears to be central to this ability, yet the similarity reflected in current text embeddings is corpus-driven, and is inconsistent and sub-optimal for many use cases. What, then, is a good notion of similarity for effective retrieval of text? We identify the need…

    Submitted 26 April, 2024; v1 submitted 21 May, 2023; originally announced May 2023.

    Comments: A preprint

  16. arXiv:2304.00815

    cs.CL

    Design Choices for Crowdsourcing Implicit Discourse Relations: Revealing the Biases Introduced by Task Design

    Authors: Valentina Pyatkin, Frances Yung, Merel C. J. Scholman, Reut Tsarfaty, Ido Dagan, Vera Demberg

    Abstract: Disagreement in natural language annotation has mostly been studied from a perspective of biases introduced by the annotators and the annotation frameworks. Here, we propose to analyze another source of bias: task design bias, which has a particularly strong impact on crowdsourced linguistic annotations where natural language is used to elicit the interpretation of laymen annotators. For this purp…

    Submitted 3 April, 2023; originally announced April 2023.

    Comments: Accepted to TACL, pre-MIT Press publication version

  17. arXiv:2212.10409

    cs.CL

    ClarifyDelphi: Reinforced Clarification Questions with Defeasibility Rewards for Social and Moral Situations

    Authors: Valentina Pyatkin, Jena D. Hwang, Vivek Srikumar, Ximing Lu, Liwei Jiang, Yejin Choi, Chandra Bhagavatula

    Abstract: Context is everything, even in commonsense moral reasoning. Changing contexts can flip the moral judgment of an action; "Lying to a friend" is wrong in general, but may be morally acceptable if it is intended to protect their life. We present ClarifyDelphi, an interactive system that learns to ask clarification questions (e.g., why did you lie to your friend?) in order to elicit additional salie…

    Submitted 30 May, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

    Comments: Accepted to ACL 2023 main conference, 9 pages + bibliography + appendix

  18. arXiv:2210.16407

    cs.CL

    Just-DREAM-about-it: Figurative Language Understanding with DREAM-FLUTE

    Authors: Yuling Gu, Yao Fu, Valentina Pyatkin, Ian Magnusson, Bhavana Dalvi Mishra, Peter Clark

    Abstract: Figurative language (e.g., "he flew like the wind") is challenging to understand, as it is hard to tell what implicit information is being conveyed from the surface form alone. We hypothesize that to perform this task well, the reader needs to mentally elaborate the scene being described to identify a sensible meaning of the language. We present DREAM-FLUTE, a figurative language understanding sys…

    Submitted 28 October, 2022; originally announced October 2022.

    Comments: Accepted at The Third Workshop on Figurative Language Processing @ EMNLP 2022

  19. arXiv:2205.11413

    cs.CL

    QASem Parsing: Text-to-text Modeling of QA-based Semantics

    Authors: Ayal Klein, Eran Hirsch, Ron Eliav, Valentina Pyatkin, Avi Caciularu, Ido Dagan

    Abstract: Several recent works have suggested to represent semantic relations with questions and answers, decomposing textual information into separate interrogative natural language statements. In this paper, we consider three QA-based semantic tasks - namely, QA-SRL, QANom and QADiscourse, each targeting a certain type of predication - and propose to regard them as jointly providing a comprehensive repres…

    Submitted 14 February, 2023; v1 submitted 23 May, 2022; originally announced May 2022.

  20. arXiv:2109.04832

    cs.CL

    Asking It All: Generating Contextualized Questions for any Semantic Role

    Authors: Valentina Pyatkin, Paul Roit, Julian Michael, Reut Tsarfaty, Yoav Goldberg, Ido Dagan

    Abstract: Asking questions about a situation is an inherent step towards understanding it. To this end, we introduce the task of role question generation, which, given a predicate mention and a passage, requires producing a set of questions asking about all possible semantic roles of the predicate. We develop a two-stage model for this task, which first produces a context-independent question prototype for…

    Submitted 10 September, 2021; originally announced September 2021.

    Comments: Accepted as a long paper to EMNLP 2021, Main Conference

  21. arXiv:2106.14321

    cs.CL

    Draw Me a Flower: Processing and Grounding Abstraction in Natural Language

    Authors: Royi Lachmy, Valentina Pyatkin, Avshalom Manevich, Reut Tsarfaty

    Abstract: Abstraction is a core tenet of human cognition and communication. When composing natural language instructions, humans naturally evoke abstraction to convey complex procedures in an efficient and concise way. Yet, interpreting and grounding abstraction expressed in NL has not yet been systematically studied in NLP, with no accepted benchmarks specifically eliciting abstraction in NL. In this work,…

    Submitted 30 September, 2022; v1 submitted 27 June, 2021; originally announced June 2021.

    Comments: Accepted to the TACL journal. This is a pre-MIT Press publication version

  22. arXiv:2106.08037

    cs.CL

    The Possible, the Plausible, and the Desirable: Event-Based Modality Detection for Language Processing

    Authors: Valentina Pyatkin, Shoval Sadde, Aynat Rubinstein, Paul Portner, Reut Tsarfaty

    Abstract: Modality is the linguistic ability to describe events with added information such as how desirable, plausible, or feasible they are. Modality is important for many NLP downstream tasks such as the detection of hedging, uncertainty, speculation, and more. Previous studies that address modality detection in NLP often restrict modal expressions to a closed syntactic class, and the modal sense labels…

    Submitted 15 June, 2021; originally announced June 2021.

    Comments: ACL 2021

  23. arXiv:2010.02815

    cs.CL

    QADiscourse -- Discourse Relations as QA Pairs: Representation, Crowdsourcing and Baselines

    Authors: Valentina Pyatkin, Ayal Klein, Reut Tsarfaty, Ido Dagan

    Abstract: Discourse relations describe how two propositions relate to one another, and identifying them automatically is an integral part of natural language understanding. However, annotating discourse relations typically requires expert annotators. Recently, different semantic aspects of a sentence have been represented and crowd-sourced via question-and-answer (QA) pairs. This paper proposes a novel repr…

    Submitted 6 October, 2020; originally announced October 2020.

    Comments: To appear at EMNLP 2020

  24. arXiv:1707.06068

    cs.DS cs.CV

    On Finding Maximum Cardinality Subset of Vectors with a Constraint on Normalized Squared Length of Vectors Sum

    Authors: Anton V. Eremeev, Alexander V. Kelmanov, Artem V. Pyatkin, Igor A. Ziegler

    Abstract: In this paper, we consider the problem of finding a maximum cardinality subset of vectors, given a constraint on the normalized squared length of vectors sum. This problem is closely related to Problem 1 from (Eremeev, Kel'manov, Pyatkin, 2016). The main difference consists in swapping the constraint with the optimization criterion. We prove that the problem is NP-hard even in terms of finding a…

    Submitted 19 July, 2017; originally announced July 2017.

    Comments: To appear in Proceedings of the 6th International Conference on Analysis of Images, Social Networks, and Texts (AIST'2017)

  25. arXiv:1603.01191

    cs.DM cs.DS

    A fixed-parameter algorithm for a routing open shop problem: unit processing times, few machines and locations

    Authors: René van Bevern, Artem V. Pyatkin

    Abstract: The open shop problem is to find a minimum makespan schedule to process each job $J_i$ on each machine $M_q$ for $p_{iq}$ time such that, at any time, each machine processes at most one job and each job is processed by at most one machine. We study a problem variant in which the jobs are located in the vertices of an edge-weighted graph. The weights determine the time needed for the machines to tr…

    Submitted 25 April, 2017; v1 submitted 3 March, 2016; originally announced March 2016.

    Comments: Compared to the previous version, gives a description of the algorithm in pseudocode, simplifies many proofs, corrects the incorrect Lemma 5.5 of the previous version

    MSC Class: 90B35

    ACM Class: F.2.2; I.2.8; G.2.1; G.2.2; G.1.6