Showing 1–8 of 8 results for author: Arthur, P

  1. arXiv:2205.04651  [pdf, other]

    cs.CL

    ParaCotta: Synthetic Multilingual Paraphrase Corpora from the Most Diverse Translation Sample Pair

    Authors: Alham Fikri Aji, Tirana Noor Fatyanosa, Radityo Eko Prasojo, Philip Arthur, Suci Fitriany, Salma Qonitah, Nadhifa Zulfa, Tomi Santoso, Mahendra Data

    Abstract: We release our synthetic parallel paraphrase corpus across 17 languages: Arabic, Catalan, Czech, German, English, Spanish, Estonian, French, Hindi, Indonesian, Italian, Dutch, Romanian, Russian, Swedish, Vietnamese, and Chinese. Our method relies only on monolingual data and a neural machine translation system to generate paraphrases, hence simple to apply. We generate multiple translation samples…

    Submitted 9 May, 2022; originally announced May 2022.

    Comments: 10 pages, 3 figures, 6 tables. Accepted at PACLIC 2021. (ACL Anthology link: https://aclanthology.org/2021.paclic-1.56/)

    MSC Class: 68T50 ACM Class: I.2.7; I.2.6

  2. arXiv:2110.05213  [pdf, other]

    cs.CL cs.LG

    It is Not as Good as You Think! Evaluating Simultaneous Machine Translation on Interpretation Data

    Authors: Jinming Zhao, Philip Arthur, Gholamreza Haffari, Trevor Cohn, Ehsan Shareghi

    Abstract: Most existing simultaneous machine translation (SiMT) systems are trained and evaluated on offline translation corpora. We argue that SiMT systems should be trained and tested on real interpretation data. To illustrate this argument, we propose an interpretation test set and conduct a realistic evaluation of SiMT trained on offline translations. Our results, on our test set along with 3 existing s…

    Submitted 11 October, 2021; originally announced October 2021.

    Comments: EMNLP 2021

  3. arXiv:2002.04306  [pdf, other]

    cs.CL cs.AI cs.LG

    Learning Coupled Policies for Simultaneous Machine Translation using Imitation Learning

    Authors: Philip Arthur, Trevor Cohn, Gholamreza Haffari

    Abstract: We present a novel approach to efficiently learn a simultaneous translation model with coupled programmer-interpreter policies. First, we present an algorithmic oracle to produce oracle READ/WRITE actions for training bilingual sentence-pairs using the notion of word alignments. These oracle actions are designed to capture enough information from the partial input before writing the output. Next, we…

    Submitted 25 January, 2021; v1 submitted 11 February, 2020; originally announced February 2020.

    Comments: 9 pages

  4. arXiv:1902.03499  [pdf, ps, other]

    cs.CL

    Multilingual Neural Machine Translation With Soft Decoupled Encoding

    Authors: Xinyi Wang, Hieu Pham, Philip Arthur, Graham Neubig

    Abstract: Multilingual training of neural machine translation (NMT) systems has led to impressive accuracy improvements on low-resource languages. However, there are still significant challenges in efficiently learning word representations in the face of paucity of data. In this paper, we propose Soft Decoupled Encoding (SDE), a multilingual lexicon encoding framework specifically designed to share lexical-…

    Submitted 9 February, 2019; originally announced February 2019.

    Comments: accepted at ICLR 2019

  5. arXiv:1803.00188  [pdf, ps, other]

    cs.CL

    XNMT: The eXtensible Neural Machine Translation Toolkit

    Authors: Graham Neubig, Matthias Sperber, Xinyi Wang, Matthieu Felix, Austin Matthews, Sarguna Padmanabhan, Ye Qi, Devendra Singh Sachan, Philip Arthur, Pierre Godard, John Hewitt, Rachid Riad, Liming Wang

    Abstract: This paper describes XNMT, the eXtensible Neural Machine Translation toolkit. XNMT distinguishes itself from other open-source NMT toolkits by its focus on modular code design, with the purpose of enabling fast iteration in research and replicable, reliable results. In this paper we describe the design of XNMT and its experiment configuration system, and demonstrate its utility on the tasks of m…

    Submitted 28 February, 2018; originally announced March 2018.

    Comments: To be presented at AMTA 2018 Open Source Software Showcase

  6. arXiv:1802.05092  [pdf, other]

    cs.CL

    Linguistic unit discovery from multi-modal inputs in unwritten languages: Summary of the "Speaking Rosetta" JSALT 2017 Workshop

    Authors: Odette Scharenborg, Laurent Besacier, Alan Black, Mark Hasegawa-Johnson, Florian Metze, Graham Neubig, Sebastian Stueker, Pierre Godard, Markus Mueller, Lucas Ondel, Shruti Palaskar, Philip Arthur, Francesco Ciannella, Mingxing Du, Elin Larsen, Danny Merkx, Rachid Riad, Liming Wang, Emmanuel Dupoux

    Abstract: We summarize the accomplishments of a multi-disciplinary workshop exploring the computational and scientific issues surrounding the discovery of linguistic units (subwords and words) in a language without orthography. We study the replacement of orthographic transcriptions by images and/or translated text in a well-resourced language to help unsupervised discovery from raw speech.

    Submitted 14 February, 2018; originally announced February 2018.

    Comments: Accepted to ICASSP 2018

  7. arXiv:1704.06918  [pdf, ps, other]

    cs.CL

    Neural Machine Translation via Binary Code Prediction

    Authors: Yusuke Oda, Philip Arthur, Graham Neubig, Koichiro Yoshino, Satoshi Nakamura

    Abstract: In this paper, we propose a new method for calculating the output layer in neural machine translation systems. The method is based on predicting a binary code for each word and can reduce computation time/memory requirements of the output layer to be logarithmic in vocabulary size in the best case. In addition, we also introduce two advanced approaches to improve the robustness of the proposed mod…

    Submitted 23 April, 2017; originally announced April 2017.

    Comments: Accepted as a long paper at ACL 2017

  8. arXiv:1606.02006  [pdf, ps, other]

    cs.CL

    Incorporating Discrete Translation Lexicons into Neural Machine Translation

    Authors: Philip Arthur, Graham Neubig, Satoshi Nakamura

    Abstract: Neural machine translation (NMT) often makes mistakes in translating low-frequency content words that are essential to understanding the meaning of the sentence. We propose a method to alleviate this problem by augmenting NMT systems with discrete translation lexicons that efficiently encode translations of these low-frequency words. We describe a method to calculate the lexicon probability of the…

    Submitted 4 October, 2016; v1 submitted 6 June, 2016; originally announced June 2016.

    Comments: Accepted at EMNLP 2016