Showing 1–28 of 28 results for author: Ghazvininejad, M

  1. arXiv:2305.14771

    cs.CL

    David helps Goliath: Inference-Time Collaboration Between Small Specialized and Large General Diffusion LMs

    Authors: Xiaochuang Han, Sachin Kumar, Yulia Tsvetkov, Marjan Ghazvininejad

    Abstract: Diffusion-based language models are emerging as a promising alternative to autoregressive LMs: they approach the competence of autoregressive LMs while offering nuanced controllability at inference time. While autoregressive LMs have benefited immensely from scaling and instruction-based learning, existing studies of diffusion LMs have been conducted on a smaller scale. Starting with a recently pr…

    Submitted 14 February, 2024; v1 submitted 24 May, 2023; originally announced May 2023.

  2. arXiv:2302.07856

    cs.CL cs.LG

    Dictionary-based Phrase-level Prompting of Large Language Models for Machine Translation

    Authors: Marjan Ghazvininejad, Hila Gonen, Luke Zettlemoyer

    Abstract: Large language models (LLMs) demonstrate remarkable machine translation (MT) abilities via prompting, even though they were not explicitly trained for this task. However, even given the incredible quantities of data they are trained on, LLMs can struggle to translate inputs with rare words, which are common in low resource or domain transfer scenarios. We show that LLM prompting can provide an eff…

    Submitted 15 February, 2023; originally announced February 2023.

  3. arXiv:2302.02060

    cs.CL cs.LG

    Representation Deficiency in Masked Language Modeling

    Authors: Yu Meng, Jitin Krishnan, Sinong Wang, Qifan Wang, Yuning Mao, Han Fang, Marjan Ghazvininejad, Jiawei Han, Luke Zettlemoyer

    Abstract: Masked Language Modeling (MLM) has been one of the most prominent approaches for pretraining bidirectional text encoders due to its simplicity and effectiveness. One notable concern about MLM is that the special [MASK] symbol causes a discrepancy between pretraining data and downstream data as it is present only in pretraining but not in fine-tuning. In this work, we offer a new perspec…

    Submitted 16 March, 2024; v1 submitted 3 February, 2023; originally announced February 2023.

    Comments: ICLR 2024

  4. arXiv:2301.10472

    cs.CL cs.LG

    XLM-V: Overcoming the Vocabulary Bottleneck in Multilingual Masked Language Models

    Authors: Davis Liang, Hila Gonen, Yuning Mao, Rui Hou, Naman Goyal, Marjan Ghazvininejad, Luke Zettlemoyer, Madian Khabsa

    Abstract: Large multilingual language models typically rely on a single vocabulary shared across 100+ languages. As these models have increased in parameter count and depth, vocabulary size has remained largely unchanged. This vocabulary bottleneck limits the representational capabilities of multilingual models like XLM-R. In this paper, we introduce a new approach for scaling to very large multili…

    Submitted 13 October, 2023; v1 submitted 25 January, 2023; originally announced January 2023.

    Comments: EMNLP 2023

  5. arXiv:2212.02437

    cs.CL

    In-context Examples Selection for Machine Translation

    Authors: Sweta Agrawal, Chunting Zhou, Mike Lewis, Luke Zettlemoyer, Marjan Ghazvininejad

    Abstract: Large-scale generative models show an impressive ability to perform a wide range of Natural Language Processing (NLP) tasks using in-context learning, where a few examples are used to describe a task to the model. For Machine Translation (MT), these examples are typically randomly sampled from the development dataset with a similar distribution as the evaluation set. However, it is unclear how the…

    Submitted 5 December, 2022; originally announced December 2022.

    Comments: 14 pages; 4 figures; 16 tables

  6. arXiv:2204.11454

    cs.CL cs.SE

    Natural Language to Code Translation with Execution

    Authors: Freda Shi, Daniel Fried, Marjan Ghazvininejad, Luke Zettlemoyer, Sida I. Wang

    Abstract: Generative models of code, pretrained on large corpora of programs, have shown great success in translating natural language to code (Chen et al., 2021; Austin et al., 2021; Li et al., 2022, inter alia). While these models do not explicitly incorporate program semantics (i.e., execution results) during training, they are able to generate correct solutions for many problems. However, choosing a sin…

    Submitted 1 November, 2022; v1 submitted 25 April, 2022; originally announced April 2022.

    Comments: EMNLP 2022

  7. arXiv:2204.06031

    cs.CL cs.AI

    A Review on Language Models as Knowledge Bases

    Authors: Badr AlKhamissi, Millicent Li, Asli Celikyilmaz, Mona Diab, Marjan Ghazvininejad

    Abstract: Recently, there has been a surge of interest in the NLP community on the use of pretrained Language Models (LMs) as Knowledge Bases (KBs). Researchers have shown that LMs trained on a sufficiently large (web) corpus will encode a significant amount of knowledge implicitly in their parameters. The resulting LM can be probed for different kinds of knowledge and thus act as a KB. This has a major ad…

    Submitted 12 April, 2022; originally announced April 2022.

    Comments: Preprint

  8. arXiv:2112.05717

    cs.CL cs.LG stat.ML

    Discourse-Aware Soft Prompting for Text Generation

    Authors: Marjan Ghazvininejad, Vladimir Karpukhin, Vera Gor, Asli Celikyilmaz

    Abstract: Current efficient fine-tuning methods (e.g., adapters, prefix-tuning, etc.) have optimized conditional text generation via training a small set of extra parameters of the neural language model, while freezing the rest for efficiency. While showing strong performance on some generation tasks, they don't generalize across all generation tasks. We show that soft-prompt based conditional text generati…

    Submitted 23 May, 2022; v1 submitted 10 December, 2021; originally announced December 2021.

  9. arXiv:2111.06787

    cs.CL

    BitextEdit: Automatic Bitext Editing for Improved Low-Resource Machine Translation

    Authors: Eleftheria Briakou, Sida I. Wang, Luke Zettlemoyer, Marjan Ghazvininejad

    Abstract: Mined bitexts can contain imperfect translations that yield unreliable training signals for Neural Machine Translation (NMT). While filtering such pairs out is known to improve final model quality, we argue that it is suboptimal in low-resource conditions where even mined data can be limited. In our work, we propose instead to refine the mined bitexts via automatic editing: given a sentence in a…

    Submitted 30 May, 2022; v1 submitted 12 November, 2021; originally announced November 2021.

  10. arXiv:2109.04020

    cs.CL cs.AI cs.LG

    Distributionally Robust Multilingual Machine Translation

    Authors: Chunting Zhou, Daniel Levy, Xian Li, Marjan Ghazvininejad, Graham Neubig

    Abstract: Multilingual neural machine translation (MNMT) learns to translate multiple language pairs with a single model, potentially improving both the accuracy and the memory-efficiency of deployed models. However, the heavy data imbalance between languages hinders the model from performing uniformly across language pairs. In this paper, we propose a new learning objective for MNMT based on distributional…

    Submitted 8 September, 2021; originally announced September 2021.

    Comments: Long paper accepted by EMNLP 2021 main conference

  11. arXiv:2106.06823

    cs.CL cs.AI

    Prompting Contrastive Explanations for Commonsense Reasoning Tasks

    Authors: Bhargavi Paranjape, Julian Michael, Marjan Ghazvininejad, Luke Zettlemoyer, Hannaneh Hajishirzi

    Abstract: Many commonsense reasoning NLP tasks involve choosing between one or more possible answers to a question or prompt based on knowledge that is often implicit. Large pretrained language models (PLMs) can achieve near-human performance on such tasks, while providing little human-interpretable evidence of the underlying reasoning they use. In this work, we show how to use these same models to generate…

    Submitted 12 June, 2021; originally announced June 2021.

    Comments: ACL 2021 Findings

  12. arXiv:2105.06982

    cs.CL

    EASE: Extractive-Abstractive Summarization with Explanations

    Authors: Haoran Li, Arash Einolghozati, Srinivasan Iyer, Bhargavi Paranjape, Yashar Mehdad, Sonal Gupta, Marjan Ghazvininejad

    Abstract: Current abstractive summarization systems outperform their extractive counterparts, but their widespread adoption is inhibited by the inherent lack of interpretability. To achieve the best of both worlds, we propose EASE, an extractive-abstractive framework for evidence-based text generation and apply it to document summarization. We present an explainable summarization system based on the Informa…

    Submitted 14 May, 2021; originally announced May 2021.

  13. arXiv:2104.04923

    cs.CL cs.LG

    Non-Autoregressive Semantic Parsing for Compositional Task-Oriented Dialog

    Authors: Arun Babu, Akshat Shrivastava, Armen Aghajanyan, Ahmed Aly, Angela Fan, Marjan Ghazvininejad

    Abstract: Semantic parsing using sequence-to-sequence models allows parsing of deeper representations compared to traditional word tagging based models. In spite of these advantages, widespread adoption of these models for real-time conversational use cases has been stymied by higher compute requirements and thus higher latency. In this work, we propose a non-autoregressive approach to predict semantic pars…

    Submitted 11 April, 2021; originally announced April 2021.

  14. arXiv:2011.02593

    cs.CL cs.AI

    Detecting Hallucinated Content in Conditional Neural Sequence Generation

    Authors: Chunting Zhou, Graham Neubig, Jiatao Gu, Mona Diab, Paco Guzman, Luke Zettlemoyer, Marjan Ghazvininejad

    Abstract: Neural sequence models can generate highly fluent sentences, but recent studies have also shown that they are prone to hallucinate additional content not supported by the input. This variety of fluent but wrong outputs is particularly problematic, as it will not be possible for users to tell they are being presented incorrect content. To detect these errors, we propose a task to predict whe…

    Submitted 2 June, 2021; v1 submitted 4 November, 2020; originally announced November 2020.

    Comments: Accepted by ACL-Finding 2021

  15. arXiv:2010.12836

    cs.CL

    Improving Zero and Few-Shot Abstractive Summarization with Intermediate Fine-tuning and Data Augmentation

    Authors: Alexander R. Fabbri, Simeng Han, Haoyuan Li, Haoran Li, Marjan Ghazvininejad, Shafiq Joty, Dragomir Radev, Yashar Mehdad

    Abstract: Models pretrained with self-supervised objectives on large text corpora achieve state-of-the-art performance on English text summarization tasks. However, these models are typically fine-tuned on hundreds of thousands of data points, an infeasible requirement when applying summarization to new, niche domains. In this work, we introduce a novel and generalizable method, called WikiTransfer, for fin…

    Submitted 11 April, 2021; v1 submitted 24 October, 2020; originally announced October 2020.

    Comments: NAACL 2021

  16. arXiv:2008.00623

    cs.LG cs.CL

    DeLighT: Deep and Light-weight Transformer

    Authors: Sachin Mehta, Marjan Ghazvininejad, Srinivasan Iyer, Luke Zettlemoyer, Hannaneh Hajishirzi

    Abstract: We introduce a deep and light-weight transformer, DeLighT, that delivers similar or better performance than standard transformer-based models with significantly fewer parameters. DeLighT more efficiently allocates parameters both (1) within each Transformer block using the DeLighT transformation, a deep and light-weight transformation, and (2) across blocks using block-wise scaling, which allows f…

    Submitted 11 February, 2021; v1 submitted 2 August, 2020; originally announced August 2020.

    Comments: Accepted at ICLR 2021

  17. arXiv:2006.15020

    cs.CL cs.LG stat.ML

    Pre-training via Paraphrasing

    Authors: Mike Lewis, Marjan Ghazvininejad, Gargi Ghosh, Armen Aghajanyan, Sida Wang, Luke Zettlemoyer

    Abstract: We introduce MARGE, a pre-trained sequence-to-sequence model learned with an unsupervised multi-lingual multi-document paraphrasing objective. MARGE provides an alternative to the dominant masked language modeling paradigm, where we self-supervise the reconstruction of target text by retrieving a set of related texts (in many languages) and conditioning on them to maximize the likelihood of genera…

    Submitted 26 June, 2020; originally announced June 2020.

  18. Recipes for Adapting Pre-trained Monolingual and Multilingual Models to Machine Translation

    Authors: Asa Cooper Stickland, Xian Li, Marjan Ghazvininejad

    Abstract: There has been recent success in pre-training on monolingual data and fine-tuning on Machine Translation (MT), but it remains unclear how to best leverage a pre-trained model for a given MT task. This paper investigates the benefits and drawbacks of freezing parameters, and adding new ones, when fine-tuning a pre-trained model on MT. We focus on 1) Fine-tuning a model trained only on English monol…

    Submitted 20 June, 2022; v1 submitted 30 April, 2020; originally announced April 2020.

    Comments: Accepted for publication at EACL 2021

  19. arXiv:2004.01655

    cs.CL cs.LG stat.ML

    Aligned Cross Entropy for Non-Autoregressive Machine Translation

    Authors: Marjan Ghazvininejad, Vladimir Karpukhin, Luke Zettlemoyer, Omer Levy

    Abstract: Non-autoregressive machine translation models significantly speed up decoding by allowing for parallel prediction of the entire target sequence. However, modeling word order is more challenging due to the lack of autoregressive factors in the model. This difficulty is compounded during training with cross entropy loss, which can highly penalize small shifts in word order. In this paper, we propos…

    Submitted 3 April, 2020; originally announced April 2020.

  20. arXiv:2001.08785

    cs.CL cs.LG stat.ML

    Semi-Autoregressive Training Improves Mask-Predict Decoding

    Authors: Marjan Ghazvininejad, Omer Levy, Luke Zettlemoyer

    Abstract: The recently proposed mask-predict decoding algorithm has narrowed the performance gap between semi-autoregressive machine translation models and the traditional left-to-right approach. We introduce a new training method for conditional masked language models, SMART, which mimics the semi-autoregressive behavior of mask-predict, producing training examples that contain model predictions as part of…

    Submitted 23 January, 2020; originally announced January 2020.

  21. arXiv:2001.08210

    cs.CL

    Multilingual Denoising Pre-training for Neural Machine Translation

    Authors: Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer

    Abstract: This paper demonstrates that multilingual denoising pre-training produces significant performance gains across a wide variety of machine translation (MT) tasks. We present mBART -- a sequence-to-sequence denoising auto-encoder pre-trained on large-scale monolingual corpora in many languages using the BART objective. mBART is one of the first methods for pre-training a complete sequence-to-sequence…

    Submitted 23 January, 2020; v1 submitted 22 January, 2020; originally announced January 2020.

    Comments: Work in progress

  22. arXiv:2001.05136

    cs.CL

    Non-Autoregressive Machine Translation with Disentangled Context Transformer

    Authors: Jungo Kasai, James Cross, Marjan Ghazvininejad, Jiatao Gu

    Abstract: State-of-the-art neural machine translation models generate a translation from left to right and every step is conditioned on the previously generated tokens. The sequential nature of this generation process causes fundamental latency in inference since we cannot generate multiple tokens in each sentence in parallel. We propose an attention-masking based model, called Disentangled Context (DisCo)…

    Submitted 30 June, 2020; v1 submitted 15 January, 2020; originally announced January 2020.

    Comments: ICML 2020

  23. arXiv:1910.13461

    cs.CL cs.LG stat.ML

    BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension

    Authors: Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov, Luke Zettlemoyer

    Abstract: We present BART, a denoising autoencoder for pretraining sequence-to-sequence models. BART is trained by (1) corrupting text with an arbitrary noising function, and (2) learning a model to reconstruct the original text. It uses a standard Transformer-based neural machine translation architecture which, despite its simplicity, can be seen as generalizing BERT (due to the bidirectional encoder), GPT…

    Submitted 29 October, 2019; originally announced October 2019.

  24. arXiv:1906.05683

    cs.CL

    Translating Translationese: A Two-Step Approach to Unsupervised Machine Translation

    Authors: Nima Pourdamghani, Nada Aldarrab, Marjan Ghazvininejad, Kevin Knight, Jonathan May

    Abstract: Given a rough, word-by-word gloss of a source language sentence, target language natives can uncover the latent, fully-fluent rendering of the translation. In this work we explore this intuition by breaking translation into a two step process: generating a rough gloss by means of a dictionary and then 'translating' the resulting pseudo-translation, or 'Translationese' into a fully fluent translati…

    Submitted 11 June, 2019; originally announced June 2019.

    Comments: Accepted in ACL 2019

  25. arXiv:1904.09324

    cs.CL cs.AI cs.LG stat.ML

    Mask-Predict: Parallel Decoding of Conditional Masked Language Models

    Authors: Marjan Ghazvininejad, Omer Levy, Yinhan Liu, Luke Zettlemoyer

    Abstract: Most machine translation systems generate text autoregressively from left to right. We, instead, use a masked language modeling objective to train a model to predict any subset of the target words, conditioned on both the input text and a partially masked target translation. This approach allows for efficient iterative decoding, where we first predict all of the target words non-autoregressively,…

    Submitted 4 September, 2019; v1 submitted 19 April, 2019; originally announced April 2019.

    Comments: EMNLP 2019

  26. arXiv:1902.01509

    cs.CL cs.LG stat.ML

    Training on Synthetic Noise Improves Robustness to Natural Noise in Machine Translation

    Authors: Vladimir Karpukhin, Omer Levy, Jacob Eisenstein, Marjan Ghazvininejad

    Abstract: We consider the problem of making machine translation more robust to character-level variation at the source side, such as typos. Existing methods achieve greater coverage by applying subword models such as byte-pair encoding (BPE) and character-level encoders, but these methods are highly sensitive to spelling mistakes. We show how training on a mild amount of random synthetic noise can dramatica…

    Submitted 4 February, 2019; originally announced February 2019.

  27. arXiv:1702.01932

    cs.CL

    A Knowledge-Grounded Neural Conversation Model

    Authors: Marjan Ghazvininejad, Chris Brockett, Ming-Wei Chang, Bill Dolan, Jianfeng Gao, Wen-tau Yih, Michel Galley

    Abstract: Neural network models are capable of generating extremely natural sounding conversational interactions. Nevertheless, these models have yet to demonstrate that they can incorporate content in the form of factual information or entity-grounded opinion that would enable them to serve in more task-oriented conversational applications. This paper presents a novel, fully data-driven, and knowledge-grou…

    Submitted 15 November, 2018; v1 submitted 7 February, 2017; originally announced February 2017.

    Comments: AAAI 2018 (9 pages)

  28. arXiv:1505.01576

    cs.LG

    Learning and Optimization with Submodular Functions

    Authors: Bharath Sankaran, Marjan Ghazvininejad, Xinran He, David Kale, Liron Cohen

    Abstract: In many naturally occurring optimization problems one needs to ensure that the definition of the optimization problem lends itself to solutions that are tractable to compute. In cases where exact solutions cannot be computed tractably, it is beneficial to have strong guarantees on the tractable approximate solutions. In order to operate under these criteria most optimization problems are cast under…

    Submitted 7 May, 2015; originally announced May 2015.

    Comments: Tech Report - USC Computer Science CS-599, Convex and Combinatorial Optimization