Showing 1–50 of 57 results for author: Choshen, L

  1. arXiv:2407.10944  [pdf, other]

    cs.CL

    Learning from Naturally Occurring Feedback

    Authors: Shachar Don-Yehiya, Leshem Choshen, Omri Abend

    Abstract: Human feedback data is a critical component in developing language models. However, collecting this feedback is costly and ultimately not scalable. We propose a scalable method for extracting feedback that users naturally include when interacting with chat models, and leveraging it for model training. We are further motivated by previous work that showed there are also qualitative advantages to us…

    Submitted 15 July, 2024; originally announced July 2024.

  2. arXiv:2407.00066  [pdf, other]

    cs.DC cs.AI cs.CL cs.LG

    Compress then Serve: Serving Thousands of LoRA Adapters with Little Overhead

    Authors: Rickard Brüel-Gabrielsson, Jiacheng Zhu, Onkar Bhardwaj, Leshem Choshen, Kristjan Greenewald, Mikhail Yurochkin, Justin Solomon

    Abstract: Fine-tuning large language models (LLMs) with low-rank adapters (LoRAs) has become common practice, often yielding numerous copies of the same LLM differing only in their LoRA updates. This paradigm presents challenges for systems that serve real-time responses to queries that each involve a different LoRA. Prior works optimize the design of such systems but still require continuous loading and of…

    Submitted 17 June, 2024; originally announced July 2024.

  3. arXiv:2405.17202  [pdf, other]

    cs.CL cs.AI cs.LG stat.ML

    Efficient multi-prompt evaluation of LLMs

    Authors: Felipe Maia Polo, Ronald Xu, Lucas Weber, Mírian Silva, Onkar Bhardwaj, Leshem Choshen, Allysson Flavio Melo de Oliveira, Yuekai Sun, Mikhail Yurochkin

    Abstract: Most popular benchmarks for comparing LLMs rely on a limited set of prompt templates, which may not fully capture the LLMs' abilities and can affect the reproducibility of results on leaderboards. Many recent works empirically verify prompt sensitivity and advocate for changes in LLM evaluation. In this paper, we consider the problem of estimating the performance distribution across many prompt va…

    Submitted 7 June, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

  4. arXiv:2405.09605  [pdf, other]

    cs.CL cs.AI cs.LG

    Elements of World Knowledge (EWOK): A cognition-inspired framework for evaluating basic world knowledge in language models

    Authors: Anna A. Ivanova, Aalok Sathe, Benjamin Lipkin, Unnathi Kumar, Setayesh Radkani, Thomas H. Clark, Carina Kauf, Jennifer Hu, R. T. Pramod, Gabriel Grand, Vivian Paulun, Maria Ryskina, Ekin Akyürek, Ethan Wilcox, Nafisa Rashid, Leshem Choshen, Roger Levy, Evelina Fedorenko, Joshua Tenenbaum, Jacob Andreas

    Abstract: The ability to build and leverage world models is essential for a general-purpose AI agent. Testing such capabilities is hard, in part because the building blocks of world models are ill-defined. We present Elements of World Knowledge (EWOK), a framework for evaluating world modeling in language models by testing their ability to use knowledge of a concept to match a target text with a plausible/i…

    Submitted 15 May, 2024; originally announced May 2024.

    Comments: 21 pages (11 main), 7 figures. Authors Anna Ivanova, Aalok Sathe, Benjamin Lipkin contributed equally

  5. arXiv:2404.18923  [pdf, other]

    cs.CL

    Holmes: Benchmark the Linguistic Competence of Language Models

    Authors: Andreas Waldis, Yotam Perlitz, Leshem Choshen, Yufang Hou, Iryna Gurevych

    Abstract: We introduce Holmes, a benchmark to assess the linguistic competence of language models (LMs) - their ability to grasp linguistic phenomena. Unlike prior prompting-based evaluations, Holmes assesses the linguistic competence of LMs via their internal representations using classifier-based probing. In doing so, we disentangle specific phenomena (e.g., part-of-speech of words) from other cognitive a…

    Submitted 22 May, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

  6. arXiv:2404.15198  [pdf, other]

    cs.LG cs.IT

    Lossless and Near-Lossless Compression for Foundation Models

    Authors: Moshik Hershcovitch, Leshem Choshen, Andrew Wood, Ilias Enmouri, Peter Chin, Swaminathan Sundararaman, Danny Harnik

    Abstract: With the growth of model sizes and scale of their deployment, their sheer size burdens the infrastructure requiring more network and more storage to accommodate these. While there is a vast literature about reducing model sizes, we investigate a more traditional type of compression -- one that compresses the model to a smaller form and is coupled with a decompression algorithm that returns it to i…

    Submitted 5 April, 2024; originally announced April 2024.

  7. arXiv:2404.06214  [pdf, other]

    cs.CL

    [Call for Papers] The 2nd BabyLM Challenge: Sample-efficient pretraining on a developmentally plausible corpus

    Authors: Leshem Choshen, Ryan Cotterell, Michael Y. Hu, Tal Linzen, Aaron Mueller, Candace Ross, Alex Warstadt, Ethan Wilcox, Adina Williams, Chengxu Zhuang

    Abstract: After last year's successful BabyLM Challenge, the competition will be hosted again in 2024/2025. The overarching goals of the challenge remain the same; however, some of the competition rules will be different. The big changes for this year's competition are as follows: First, we replace the loose track with a paper track, which allows (for example) non-model-based submissions, novel cognitively-…

    Submitted 9 April, 2024; originally announced April 2024.

  8. arXiv:2404.00459  [pdf, other]

    cs.CL

    NumeroLogic: Number Encoding for Enhanced LLMs' Numerical Reasoning

    Authors: Eli Schwartz, Leshem Choshen, Joseph Shtok, Sivan Doveh, Leonid Karlinsky, Assaf Arbelle

    Abstract: Language models struggle with handling numerical data and performing arithmetic operations. We hypothesize that this limitation can be partially attributed to non-intuitive textual numbers representation. When a digit is read or generated by a causal language model it does not know its place value (e.g. thousands vs. hundreds) until the entire number is processed. To address this issue, we propose…

    Submitted 30 March, 2024; originally announced April 2024.

  9. arXiv:2402.16842  [pdf, other]

    cs.LG

    Asymmetry in Low-Rank Adapters of Foundation Models

    Authors: Jiacheng Zhu, Kristjan Greenewald, Kimia Nadjahi, Haitz Sáez de Ocáriz Borde, Rickard Brüel Gabrielsson, Leshem Choshen, Marzyeh Ghassemi, Mikhail Yurochkin, Justin Solomon

    Abstract: Parameter-efficient fine-tuning optimizes large, pre-trained foundation models by updating a subset of parameters; in this class, Low-Rank Adaptation (LoRA) is particularly effective. Inspired by an effort to investigate the different roles of LoRA matrices during fine-tuning, this paper characterizes and leverages unexpected asymmetry in the importance of low-rank adapter matrices. Specifically,…

    Submitted 27 February, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

    Comments: 17 pages, 2 figures, 9 tables

  10. arXiv:2402.14992  [pdf, other]

    cs.CL cs.AI cs.LG stat.ML

    tinyBenchmarks: evaluating LLMs with fewer examples

    Authors: Felipe Maia Polo, Lucas Weber, Leshem Choshen, Yuekai Sun, Gongjun Xu, Mikhail Yurochkin

    Abstract: The versatility of large language models (LLMs) led to the creation of diverse benchmarks that thoroughly test a variety of language models' abilities. These benchmarks consist of tens of thousands of examples making evaluation of LLMs very expensive. In this paper, we investigate strategies to reduce the number of evaluations needed to assess the performance of an LLM on several key benchmarks. F…

    Submitted 26 May, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

    Comments: Proceedings of the 41st International Conference on Machine Learning (ICML)

  11. arXiv:2402.07891  [pdf, other]

    cs.CL cs.LG

    Label-Efficient Model Selection for Text Generation

    Authors: Shir Ashury-Tahan, Ariel Gera, Benjamin Sznajder, Leshem Choshen, Liat Ein-Dor, Eyal Shnarch

    Abstract: Model selection for a given target task can be costly, as it may entail extensive annotation of the quality of outputs of different models. We introduce DiffUse, an efficient method to make an informed decision between candidate text generation models based on preference annotations. DiffUse reduces the required amount of annotations, thus saving valuable time and resources in performing evaluatio…

    Submitted 6 June, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

    Comments: Accepted to ACL (main conference)

  12. arXiv:2401.14367  [pdf, other]

    cs.CL cs.AI cs.LG

    Genie: Achieving Human Parity in Content-Grounded Datasets Generation

    Authors: Asaf Yehudai, Boaz Carmeli, Yosi Mass, Ofir Arviv, Nathaniel Mills, Assaf Toledo, Eyal Shnarch, Leshem Choshen

    Abstract: The lack of high-quality data for content-grounded generation tasks has been identified as a major obstacle to advancing these tasks. To address this gap, we propose Genie, a novel method for automatically generating high-quality content-grounded data. It consists of three stages: (a) Content Preparation, (b) Generation: creating task-specific examples from the content (e.g., question-answer pairs…

    Submitted 25 January, 2024; originally announced January 2024.

    Comments: Accepted to ICLR24

  13. arXiv:2401.14019  [pdf, other]

    cs.CL cs.AI

    Unitxt: Flexible, Shareable and Reusable Data Preparation and Evaluation for Generative AI

    Authors: Elron Bandel, Yotam Perlitz, Elad Venezian, Roni Friedman-Melamed, Ofir Arviv, Matan Orbach, Shachar Don-Yehiya, Dafna Sheinwald, Ariel Gera, Leshem Choshen, Michal Shmueli-Scheuer, Yoav Katz

    Abstract: In the dynamic landscape of generative NLP, traditional text processing pipelines limit research flexibility and reproducibility, as they are tailored to specific dataset, task, and model combinations. The escalating complexity, involving system prompts, model-specific formats, instructions, and more, calls for a shift to a structured, modular, and customizable solution. Addressing this need, we p…

    Submitted 25 January, 2024; originally announced January 2024.

    Comments: Submitted to NAACL demo track

  14. arXiv:2401.08574  [pdf, other]

    cs.CL

    Deductive Closure Training of Language Models for Coherence, Accuracy, and Updatability

    Authors: Afra Feyza Akyürek, Ekin Akyürek, Leshem Choshen, Derry Wijaya, Jacob Andreas

    Abstract: While language models (LMs) can sometimes generate factually correct text and estimate truth values of individual claims, these generally do not reflect a globally coherent, manipulable model of the world. As a consequence, current LMs also generate incorrect or nonsensical content, and are difficult to edit and bring up to date. We present a method called Deductive Closure Training (DCT) that use…

    Submitted 26 June, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

    Comments: ACL Findings

  15. arXiv:2311.13171  [pdf, other]

    cs.LG cs.AI cs.CL

    ComPEFT: Compression for Communicating Parameter Efficient Updates via Sparsification and Quantization

    Authors: Prateek Yadav, Leshem Choshen, Colin Raffel, Mohit Bansal

    Abstract: Parameter-efficient fine-tuning (PEFT) techniques make it possible to efficiently adapt a language model to create "expert" models that specialize to new tasks or domains. Recent techniques in model merging and compositional generalization leverage these expert models by dynamically composing modules to improve zero/few-shot generalization. Despite the efficiency of PEFT methods, the size of exper…

    Submitted 22 November, 2023; originally announced November 2023.

    Comments: 25 Pages, 6 Figures, 16 Tables

  16. arXiv:2311.12131  [pdf, other]

    cs.CL

    Human Learning by Model Feedback: The Dynamics of Iterative Prompting with Midjourney

    Authors: Shachar Don-Yehiya, Leshem Choshen, Omri Abend

    Abstract: Generating images with a Text-to-Image model often requires multiple trials, where human users iteratively update their prompt based on feedback, namely the output image. Taking inspiration from cognitive work on reference games and dialogue alignment, this paper analyzes the dynamics of the user prompts along such iterations. We compile a dataset of iterative interactions of human users with Midj…

    Submitted 20 November, 2023; originally announced November 2023.

    Comments: EMNLP23

  17. arXiv:2311.07682  [pdf, other]

    cs.CL cs.AI cs.LG

    Fuse to Forget: Bias Reduction and Selective Memorization through Model Fusion

    Authors: Kerem Zaman, Leshem Choshen, Shashank Srivastava

    Abstract: Model fusion research aims to aggregate the knowledge of multiple models to enhance performance by combining their weights. In this work, we study the inverse, investigating whether and how can model fusion interfere and reduce unwanted knowledge. We delve into the effects of model fusion on the evolution of learned shortcuts, social biases, and memorization capabilities in fine-tuned language mod…

    Submitted 13 November, 2023; originally announced November 2023.

    Comments: 16 pages, 9 figures, 6 tables

  18. arXiv:2308.11696  [pdf, other]

    cs.CL cs.AI cs.CV cs.LG

    Efficient Benchmarking of Language Models

    Authors: Yotam Perlitz, Elron Bandel, Ariel Gera, Ofir Arviv, Liat Ein-Dor, Eyal Shnarch, Noam Slonim, Michal Shmueli-Scheuer, Leshem Choshen

    Abstract: The increasing versatility of language models (LMs) has given rise to a new class of benchmarks that comprehensively assess a broad range of capabilities. Such benchmarks are associated with massive computational costs, extending to thousands of GPU hours per model. However, the efficiency aspect of these evaluation efforts had raised little discussion in the literature. In this work, we present t…

    Submitted 1 April, 2024; v1 submitted 22 August, 2023; originally announced August 2023.

    Comments: Accepted to NAACL main track

  19. arXiv:2306.01708  [pdf, other]

    cs.LG cs.AI cs.CL cs.CV

    TIES-Merging: Resolving Interference When Merging Models

    Authors: Prateek Yadav, Derek Tam, Leshem Choshen, Colin Raffel, Mohit Bansal

    Abstract: Transfer learning - i.e., further fine-tuning a pre-trained model on a downstream task - can confer significant advantages, including improved downstream performance, faster convergence, and better sample efficiency. These advantages have led to a proliferation of task-specific fine-tuned models, which typically can only perform a single task and do not benefit from one another. Recently, model me…

    Submitted 26 October, 2023; v1 submitted 2 June, 2023; originally announced June 2023.

    Comments: Published at NeurIPS 2023, 23 Pages, 13 Figures, 14 Tables

  20. arXiv:2305.14991  [pdf, other]

    cs.CL cs.AI

    MuLER: Detailed and Scalable Reference-based Evaluation

    Authors: Taelin Karidi, Leshem Choshen, Gal Patel, Omri Abend

    Abstract: We propose a novel methodology (namely, MuLER) that transforms any reference-based evaluation metric for text generation, such as machine translation (MT) into a fine-grained analysis tool. Given a system and a metric, MuLER quantifies how much the chosen metric penalizes specific error types (e.g., errors in translating names of locations). MuLER thus enables a detailed error analysis which can l…

    Submitted 29 November, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

  21. arXiv:2303.09435  [pdf, other]

    cs.CL

    Jump to Conclusions: Short-Cutting Transformers With Linear Transformations

    Authors: Alexander Yom Din, Taelin Karidi, Leshem Choshen, Mor Geva

    Abstract: Transformer-based language models create hidden representations of their inputs at every layer, but only use final-layer representations for prediction. This obscures the internal decision-making process of the model and the utility of its intermediate representations. One way to elucidate this is to cast the hidden representations as final representations, bypassing the transformer computation in…

    Submitted 18 June, 2024; v1 submitted 16 March, 2023; originally announced March 2023.

    Journal ref: LREC-COLING 2024

  22. arXiv:2302.04863  [pdf, other]

    cs.LG cs.AI cs.CL

    Knowledge is a Region in Weight Space for Fine-tuned Language Models

    Authors: Almog Gueta, Elad Venezian, Colin Raffel, Noam Slonim, Yoav Katz, Leshem Choshen

    Abstract: Research on neural networks has focused on understanding a single model trained on a single dataset. However, relatively little is known about the relationships between different models, particularly those trained or tested on different datasets. We address this by studying how the weight space and the underlying loss landscape of different models are interconnected. Specifically, we demonstrate…

    Submitted 12 October, 2023; v1 submitted 9 February, 2023; originally announced February 2023.

  23. arXiv:2301.11796  [pdf, other]

    cs.CL

    Call for Papers -- The BabyLM Challenge: Sample-efficient pretraining on a developmentally plausible corpus

    Authors: Alex Warstadt, Leshem Choshen, Aaron Mueller, Adina Williams, Ethan Wilcox, Chengxu Zhuang

    Abstract: We present the call for papers for the BabyLM Challenge: Sample-efficient pretraining on a developmentally plausible corpus. This shared task is intended for participants with an interest in small scale language modeling, human language acquisition, low-resource NLP, and cognitive modeling. In partnership with CoNLL and CMCL, we provide a platform for approaches to pretraining with a limited-size…

    Submitted 27 January, 2023; originally announced January 2023.

  24. arXiv:2212.01378  [pdf, other]

    cs.LG cs.CL cs.DC

    ColD Fusion: Collaborative Descent for Distributed Multitask Finetuning

    Authors: Shachar Don-Yehiya, Elad Venezian, Colin Raffel, Noam Slonim, Yoav Katz, Leshem Choshen

    Abstract: We propose a new paradigm to continually evolve pretrained models, denoted ColD Fusion. It provides the benefits of multitask learning but leverages distributed computation with limited communication and eliminates the need for shared data. Consequentially, ColD Fusion can give rise to a synergistic loop, where finetuned models can be recycled to continually improve the pretrained model they are b…

    Submitted 13 September, 2023; v1 submitted 2 December, 2022; originally announced December 2022.

    Comments: ACL 23

  25. arXiv:2211.05655  [pdf, other]

    cs.CL cs.AI cs.LG

    DisentQA: Disentangling Parametric and Contextual Knowledge with Counterfactual Question Answering

    Authors: Ella Neeman, Roee Aharoni, Or Honovich, Leshem Choshen, Idan Szpektor, Omri Abend

    Abstract: Question answering models commonly have access to two sources of "knowledge" during inference time: (1) parametric knowledge - the factual knowledge encoded in the model weights, and (2) contextual knowledge - external knowledge (e.g., a Wikipedia passage) given to the model to generate a grounded answer. Having these two sources of knowledge entangled together is a core issue for generative QA mo…

    Submitted 10 November, 2022; originally announced November 2022.

    Comments: 12 pages, 2 figures

  26. arXiv:2211.00107  [pdf, other]

    cs.CL cs.AI cs.LG

    Where to start? Analyzing the potential value of intermediate models

    Authors: Leshem Choshen, Elad Venezian, Shachar Don-Yehiya, Noam Slonim, Yoav Katz

    Abstract: Previous studies observed that finetuned models may be better base models than the vanilla pretrained model. Such a model, finetuned on some source dataset, may provide a better starting point for a new finetuning process on a desired target dataset. Here, we perform a systematic analysis of this intertraining scheme, over a wide range of English classification tasks. Surprisingly, our analysis su…

    Submitted 10 November, 2022; v1 submitted 31 October, 2022; originally announced November 2022.

    Comments: https://ibm.github.io/model-recycling/

  27. arXiv:2210.03053  [pdf, other]

    cs.CL cs.AI cs.LG

    Reinforcement Learning with Large Action Spaces for Neural Machine Translation

    Authors: Asaf Yehudai, Leshem Choshen, Lior Fox, Omri Abend

    Abstract: Applying Reinforcement learning (RL) following maximum likelihood estimation (MLE) pre-training is a versatile method for enhancing neural machine translation (NMT) performance. However, recent work has argued that the gains produced by RL for NMT are mostly due to promoting tokens that have already received a fairly high probability in pre-training. We hypothesize that the large action space is a…

    Submitted 6 October, 2022; originally announced October 2022.

    Comments: Accepted for Coling

  28. arXiv:2208.01483  [pdf, other]

    cs.CL cs.HC

    Label Sleuth: From Unlabeled Text to a Classifier in a Few Hours

    Authors: Eyal Shnarch, Alon Halfon, Ariel Gera, Marina Danilevsky, Yannis Katsis, Leshem Choshen, Martin Santillan Cooper, Dina Epelboim, Zheng Zhang, Dakuo Wang, Lucy Yip, Liat Ein-Dor, Lena Dankin, Ilya Shnayderman, Ranit Aharonov, Yunyao Li, Naftali Liberman, Philip Levin Slesarev, Gwilym Newton, Shila Ofek-Koifman, Noam Slonim, Yoav Katz

    Abstract: Text classification can be useful in many real-world scenarios, saving a lot of time for end users. However, building a custom classifier typically requires coding skills and ML knowledge, which poses a significant barrier for many potential users. To lift this barrier, we introduce Label Sleuth, a free open source system for labeling and creating text classifiers. This system is unique for (a) be…

    Submitted 31 October, 2022; v1 submitted 2 August, 2022; originally announced August 2022.

    Comments: 7 pages, 2 figures. To be published at EMNLP 2022

  29. arXiv:2205.09178  [pdf, other]

    cs.CL cs.LG

    PreQuEL: Quality Estimation of Machine Translation Outputs in Advance

    Authors: Shachar Don-Yehiya, Leshem Choshen, Omri Abend

    Abstract: We present the task of PreQuEL, Pre-(Quality-Estimation) Learning. A PreQuEL system predicts how well a given sentence will be translated, without recourse to the actual translation, thus eschewing unnecessary resource allocation when translation quality is bound to be low. PreQuEL can be defined relative to a given MT system (e.g., some industry service) or generally relative to the state-of-the-…

    Submitted 4 December, 2022; v1 submitted 18 May, 2022; originally announced May 2022.

    Comments: Accepted to the main conference of EMNLP 2022

  30. arXiv:2205.05730  [pdf, other]

    cs.CL cs.AI cs.CY

    Some Grammatical Errors are Frequent, Others are Important

    Authors: Leshem Choshen, Ofir Shifman, Omri Abend

    Abstract: In Grammatical Error Correction, systems are evaluated by the number of errors they correct. However, no one has assessed whether all error types are equally important. We provide and apply a method to quantify the importance of different grammatical error types to humans. We show that some rare errors are considered disturbing while other common ones are not. This affects possible directions to i…

    Submitted 11 May, 2022; originally announced May 2022.

  31. arXiv:2204.03044  [pdf, other]

    cs.CL cs.CV cs.LG

    Fusing finetuned models for better pretraining

    Authors: Leshem Choshen, Elad Venezian, Noam Slonim, Yoav Katz

    Abstract: Pretrained models are the standard starting point for training. This approach consistently outperforms the use of a random initialization. However, pretraining is a costly endeavour that few can undertake. In this paper, we create better base models at hardly any cost, by fusing multiple existing fine tuned models into one. Specifically, we fuse by averaging the weights of these models. We show…

    Submitted 6 April, 2022; originally announced April 2022.

  32. arXiv:2203.10581  [pdf, other]

    cs.CL cs.LG

    Cluster & Tune: Boost Cold Start Performance in Text Classification

    Authors: Eyal Shnarch, Ariel Gera, Alon Halfon, Lena Dankin, Leshem Choshen, Ranit Aharonov, Noam Slonim

    Abstract: In real-world scenarios, a text classification task often begins with a cold start, when labeled data is scarce. In such cases, the common practice of fine-tuning pre-trained models, such as BERT, for a target classification task, is prone to produce poor performance. We suggest a method to boost the performance of such models by adding an intermediate unsupervised classification task, between the…

    Submitted 20 March, 2022; originally announced March 2022.

    Comments: 9 pages, 6 figures; To be published in ACL 2022

  33. Semantics-aware Attention Improves Neural Machine Translation

    Authors: Aviv Slobodkin, Leshem Choshen, Omri Abend

    Abstract: The integration of syntactic structures into Transformer machine translation has shown positive results, but to our knowledge, no work has attempted to do so with semantic structures. In this work we propose two novel parameter-free methods for injecting semantic information into Transformers, both rely on semantics-aware masking of (some of) the attention heads. One such method operates on the en…

    Submitted 24 May, 2022; v1 submitted 13 October, 2021; originally announced October 2021.

    Comments: Accepted to *SEM 2022

  34. arXiv:2110.03067  [pdf, other]

    cs.CL

    On Neurons Invariant to Sentence Structural Changes in Neural Machine Translation

    Authors: Gal Patel, Leshem Choshen, Omri Abend

    Abstract: We present a methodology that explores how sentence structure is reflected in neural representations of machine translation systems. We demonstrate our model-agnostic approach with the Transformer English-German translation model. We analyze neuron-level correlation of activations between paraphrases while discussing the methodology challenges and the need for confound analysis to isolate the effe…

    Submitted 2 November, 2022; v1 submitted 6 October, 2021; originally announced October 2021.

  35. arXiv:2109.06096  [pdf, other]

    cs.CL cs.AI cs.LG

    The Grammar-Learning Trajectories of Neural Language Models

    Authors: Leshem Choshen, Guy Hacohen, Daphna Weinshall, Omri Abend

    Abstract: The learning trajectories of linguistic phenomena in humans provide insight into linguistic representation, beyond what can be gleaned from inspecting the behavior of an adult speaker. To apply a similar approach to analyze neural language models (NLM), it is first necessary to establish that different models are similar enough in the generalizations they make. In this paper, we show that NLMs wit…

    Submitted 6 April, 2022; v1 submitted 13 September, 2021; originally announced September 2021.

    Comments: ACL camera-ready

  36. arXiv:2108.10763  [pdf, other]

    cs.CL cs.LG cs.SE

    ComSum: Commit Messages Summarization and Meaning Preservation

    Authors: Leshem Choshen, Idan Amit

    Abstract: We present ComSum, a data set of 7 million commit messages for text summarization. When documenting commits, software code changes, both a message and its summary are posted. We gather and filter those to curate developers' work summarization data set. Along with its growing size, practicality and challenging language domain, the data set benefits from the living field of empirical software engine…

    Submitted 23 August, 2021; originally announced August 2021.

  37. arXiv:2106.00745  [pdf]

    cs.CL cs.AI cs.LG

    Part of Speech and Universal Dependency effects on English Arabic Machine Translation

    Authors: Ofek Rafaeli, Omri Abend, Leshem Choshen, Dmitry Nikolaev

    Abstract: In this research paper, I will elaborate on a method to evaluate machine translation models based on their performance on underlying syntactical phenomena between English and Arabic languages. This method is especially important as such "neural" and "machine learning" are hard to fine-tune and change. Thus, finding a way to evaluate them easily and diversely would greatly help the task of betterin…

    Submitted 3 June, 2021; v1 submitted 1 June, 2021; originally announced June 2021.

    Comments: 19 pages

  38. arXiv:2104.08202  [pdf, other]

    cs.CL

    $Q^{2}$: Evaluating Factual Consistency in Knowledge-Grounded Dialogues via Question Generation and Question Answering

    Authors: Or Honovich, Leshem Choshen, Roee Aharoni, Ella Neeman, Idan Szpektor, Omri Abend

    Abstract: Neural knowledge-grounded generative models for dialogue often produce content that is factually inconsistent with the knowledge they rely on, making them unreliable and limiting their applicability. Inspired by recent work on evaluating factual consistency in abstractive summarization, we propose an automatic evaluation metric for factual consistency in knowledge-grounded dialogue using automatic…

    Submitted 9 September, 2021; v1 submitted 16 April, 2021; originally announced April 2021.

    Comments: Accepted to EMNLP 2021

  39. Mediators in Determining what Processing BERT Performs First

    Authors: Aviv Slobodkin, Leshem Choshen, Omri Abend

    Abstract: Probing neural models for the ability to perform downstream tasks using their activation patterns is often used to localize what parts of the network specialize in performing what tasks. However, little work addressed potential mediating factors in such comparisons. As a test-case mediating factor, we consider the prediction's context length, namely the length of the span whose processing is minim…

    Submitted 13 April, 2021; originally announced April 2021.

    Comments: Accepted to NAACL 2021

  40. arXiv:2104.03958  [pdf, other]

    cs.CL cs.AI cs.LG

    GrASP: A Library for Extracting and Exploring Human-Interpretable Textual Patterns

    Authors: Piyawat Lertvittayakumjorn, Leshem Choshen, Eyal Shnarch, Francesca Toni

    Abstract: Data exploration is an important step of every data science and machine learning project, including those involving textual data. We provide a novel language tool, in the form of a publicly available Python library for extracting patterns from textual data. The library integrates a first public implementation of the existing GrASP algorithm. It allows users to extract patterns using a number of ge…

    Submitted 16 June, 2022; v1 submitted 8 April, 2021; originally announced April 2021.

    Comments: Proceedings of Language Resources and Evaluation (LREC), Marseille, France pp 6093-6103 (2022)

  41. arXiv:2104.02310  [pdf, ps, other]

    cs.CL

    SERRANT: a syntactic classifier for English Grammatical Error Types

    Authors: Leshem Choshen, Matanel Oren, Dmitry Nikolaev, Omri Abend

    Abstract: SERRANT is a system and code for automatic classification of English grammatical errors that combines SErCl and ERRANT. SERRANT uses ERRANT's annotations when they are informative and those provided by SErCl otherwise.

    Submitted 7 April, 2021; v1 submitted 6 April, 2021; originally announced April 2021.

    Comments: Code library in: https://github.com/matanel-oren/serrant

  42. arXiv:2101.12640  [pdf, other]

    cs.CL cs.LG

    Enhancing the Transformer Decoder with Transition-based Syntax

    Authors: Leshem Choshen, Omri Abend

    Abstract: Notwithstanding recent advances, syntactic generalization remains a challenge for text decoders. While some studies showed gains from incorporating source-side symbolic syntactic and semantic structure into text generation Transformers, very little work addressed the decoding of such structure. We propose a general approach for tree decoding using a transition-based approach. Examining the challen…

    Submitted 31 October, 2022; v1 submitted 29 January, 2021; originally announced January 2021.

    Comments: Accepted to CoNLL

  43. arXiv:2010.11032  [pdf, other]

    cs.CL

    Classifying Syntactic Errors in Learner Language

    Authors: Leshem Choshen, Dmitry Nikolaev, Yevgeni Berzak, Omri Abend

    Abstract: We present a method for classifying syntactic errors in learner language, namely errors whose correction alters the morphosyntactic structure of a sentence. The methodology builds on the established Universal Dependencies syntactic representation scheme, and provides complementary information to other error-classification systems. Unlike existing error classification methods, our method is app…

    Submitted 27 October, 2020; v1 submitted 21 October, 2020; originally announced October 2020.

    Comments: CoNLL 2020

  44. arXiv:2010.09459  [pdf, other]

    cs.CL cs.HC cs.LG

    Unsupervised Expressive Rules Provide Explainability and Assist Human Experts Grasping New Domains

    Authors: Eyal Shnarch, Leshem Choshen, Guy Moshkowich, Noam Slonim, Ranit Aharonov

    Abstract: Approaching new data can be quite deterrent; you do not know how your categories of interest are realized in it, commonly, there is no labeled data at hand, and the performance of domain adaptation methods is unsatisfactory. Aiming to assist domain experts in their first steps into a new task over a new corpus, we present an unsupervised approach to reveal complex rules which cluster the unexplo…

    Submitted 19 October, 2020; originally announced October 2020.

    Comments: Accepted to Findings of EMNLP

  45. arXiv:1911.10763  [pdf, other]

    cs.CL cs.AI cs.IR

    Corpus Wide Argument Mining -- a Working Solution

    Authors: Liat Ein-Dor, Eyal Shnarch, Lena Dankin, Alon Halfon, Benjamin Sznajder, Ariel Gera, Carlos Alzate, Martin Gleize, Leshem Choshen, Yufang Hou, Yonatan Bilu, Ranit Aharonov, Noam Slonim

    Abstract: One of the main tasks in argument mining is the retrieval of argumentative content pertaining to a given topic. Most previous work addressed this task by retrieving a relatively small number of relevant documents as the initial source for such content. This line of research yielded moderate success, which is of limited use in a real-world system. Furthermore, for such a system to yield a comprehen…

    Submitted 25 November, 2019; originally announced November 2019.

    Journal ref: AAAI 2020

  46. arXiv:1909.06814  [pdf, other]

    cs.CL cs.LG

    Automatically Extracting Challenge Sets for Non local Phenomena in Neural Machine Translation

    Authors: Leshem Choshen, Omri Abend

    Abstract: We show that the state of the art Transformer Machine Translation (MT) model is not biased towards monotonic reordering (unlike previous recurrent neural network models), but that nevertheless, long-distance dependencies remain a challenge for the model. Since most dependencies are short-distance, common evaluation metrics will be little influenced by how well systems perform on them. We, therefor…

    Submitted 25 September, 2019; v1 submitted 15 September, 2019; originally announced September 2019.

    Comments: Accepted for CoNLL

  47. arXiv:1907.08971  [pdf, other]

    cs.LG cs.CL stat.ML

    Are You Convinced? Choosing the More Convincing Evidence with a Siamese Network

    Authors: Martin Gleize, Eyal Shnarch, Leshem Choshen, Lena Dankin, Guy Moshkowich, Ranit Aharonov, Noam Slonim

    Abstract: With the advancement in argument detection, we suggest to pay more attention to the challenging task of identifying the more convincing arguments. Machines capable of responding and interacting with humans in helpful ways have become ubiquitous. We now expect them to discuss with us the more delicate questions in our world, and they should do so armed with effective arguments. But what makes an ar…

    Submitted 23 July, 2019; v1 submitted 21 July, 2019; originally announced July 2019.

    Comments: accepted to ACL 2019 - long paper

  48. arXiv:1907.01752  [pdf, other]

    cs.CL cs.AI cs.LG

    On the Weaknesses of Reinforcement Learning for Neural Machine Translation

    Authors: Leshem Choshen, Lior Fox, Zohar Aizenbud, Omri Abend

    Abstract: Reinforcement learning (RL) is frequently used to increase performance in text generation tasks, including machine translation (MT), notably through the use of Minimum Risk Training (MRT) and Generative Adversarial Networks (GAN). However, little is known about what and how these methods learn in the context of MT. We prove that one of the most common RL methods for MT does not optimize the expect…

    Submitted 15 January, 2020; v1 submitted 3 July, 2019; originally announced July 2019.

    Comments: Accepted to ICLR 2020 (matching content, different style)

  49. arXiv:1906.03897  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    Learning to combine Grammatical Error Corrections

    Authors: Yoav Kantor, Yoav Katz, Leshem Choshen, Edo Cohen-Karlik, Naftali Liberman, Assaf Toledo, Amir Menczel, Noam Slonim

    Abstract: The field of Grammatical Error Correction (GEC) has produced various systems to deal with focused phenomena or general text editing. We propose an automatic way to combine black-box systems. Our method automatically detects the strength of a system or the combination of several systems per error type, improving precision and recall while optimizing $F$ score directly. We show consistent improvemen…

    Submitted 10 June, 2019; originally announced June 2019.

    Comments: BEA 2019

  50. arXiv:1905.10854  [pdf, other]

    cs.LG stat.ML

    Let's Agree to Agree: Neural Networks Share Classification Order on Real Datasets

    Authors: Guy Hacohen, Leshem Choshen, Daphna Weinshall

    Abstract: We report a series of robust empirical observations, demonstrating that deep Neural Networks learn the examples in both the training and test sets in a similar order. This phenomenon is observed in all the commonly used benchmarks we evaluated, including many image classification benchmarks, and one text classification benchmark. While this phenomenon is strongest for models of the same architectu…

    Submitted 20 July, 2020; v1 submitted 26 May, 2019; originally announced May 2019.

    Comments: Published at ICML 2020

    Journal ref: Proceedings: 37th International Conference on Machine Learning (ICML), Vienna, Austria, July 2020