Showing 1–43 of 43 results for author: Ott, M

  1. arXiv:2405.00078

    cs.CR cs.OS

    Mitigating Spectre-PHT using Speculation Barriers in Linux BPF

    Authors: Luis Gerhorst, Henriette Herzog, Peter Wägemann, Maximilian Ott, Rüdiger Kapitza, Timo Hönig

    Abstract: High-performance IO demands low-overhead communication between user- and kernel space. This demand can no longer be fulfilled by traditional system calls. Linux's extended Berkeley Packet Filter (BPF) avoids user-/kernel transitions by just-in-time compiling user-provided bytecode and executing it in kernel mode with near-native speed. To still isolate BPF programs from the kernel, they are static…

    Submitted 30 April, 2024; originally announced May 2024.

    MSC Class: 68M25 ACM Class: D.4.6

  2. arXiv:2401.16971

    cs.DC

    Autonomy Loops for Monitoring, Operational Data Analytics, Feedback, and Response in HPC Operations

    Authors: Francieli Boito, Jim Brandt, Valeria Cardellini, Philip Carns, Florina M. Ciorba, Hilary Egan, Ahmed Eleliemy, Ann Gentile, Thomas Gruber, Jeff Hanson, Utz-Uwe Haus, Kevin Huck, Thomas Ilsche, Thomas Jakobsche, Terry Jones, Sven Karlsson, Abdullah Mueen, Michael Ott, Tapasya Patki, Ivy Peng, Krishnan Raghavan, Stephen Simms, Kathleen Shoga, Michael Showerman, Devesh Tiwari, et al. (2 additional authors not shown)

    Abstract: Many High Performance Computing (HPC) facilities have developed and deployed frameworks in support of continuous monitoring and operational data analytics (MODA) to help improve efficiency and throughput. Because of the complexity and scale of systems and workflows and the need for low-latency response to address dynamic circumstances, automated feedback and response have the potential to be more…

    Submitted 30 January, 2024; originally announced January 2024.

  3. arXiv:2304.11277

    cs.DC cs.AI cs.LG cs.PF

    PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel

    Authors: Yanli Zhao, Andrew Gu, Rohan Varma, Liang Luo, Chien-Chin Huang, Min Xu, Less Wright, Hamid Shojanazeri, Myle Ott, Sam Shleifer, Alban Desmaison, Can Balioglu, Pritam Damania, Bernard Nguyen, Geeta Chauhan, Yuchen Hao, Ajit Mathews, Shen Li

    Abstract: It is widely acknowledged that large models have the potential to deliver superior performance across a broad range of domains. Despite the remarkable progress made in the field of machine learning systems research, which has enabled the development and exploration of large models, such abilities remain confined to a small group of advanced users and industry leaders, resulting in an implicit tech…

    Submitted 12 September, 2023; v1 submitted 21 April, 2023; originally announced April 2023.

  4. arXiv:2205.01068

    cs.CL cs.LG

    OPT: Open Pre-trained Transformer Language Models

    Authors: Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen, Christopher Dewan, Mona Diab, Xian Li, Xi Victoria Lin, Todor Mihaylov, Myle Ott, Sam Shleifer, Kurt Shuster, Daniel Simig, Punit Singh Koura, Anjali Sridhar, Tianlu Wang, Luke Zettlemoyer

    Abstract: Large language models, which are often trained for hundreds of thousands of compute days, have shown remarkable capabilities for zero- and few-shot learning. Given their computational cost, these models are difficult to replicate without significant capital. For the few that are available through APIs, no access is granted to the full model weights, making them difficult to study. We present Open…

    Submitted 21 June, 2022; v1 submitted 2 May, 2022; originally announced May 2022.

  5. arXiv:2203.06850

    cs.CL cs.AI

    Efficient Language Modeling with Sparse all-MLP

    Authors: Ping Yu, Mikel Artetxe, Myle Ott, Sam Shleifer, Hongyu Gong, Ves Stoyanov, Xian Li

    Abstract: All-MLP architectures have attracted increasing interest as an alternative to attention-based models. In NLP, recent work like gMLP shows that all-MLPs can match Transformers in language modeling, but still lag behind in downstream tasks. In this work, we analyze the limitations of MLPs in expressiveness, and propose sparsely activated MLPs with mixture-of-experts (MoEs) in both feature and input…

    Submitted 31 May, 2022; v1 submitted 14 March, 2022; originally announced March 2022.

  6. arXiv:2112.10684

    cs.CL cs.AI cs.LG

    Efficient Large Scale Language Modeling with Mixtures of Experts

    Authors: Mikel Artetxe, Shruti Bhosale, Naman Goyal, Todor Mihaylov, Myle Ott, Sam Shleifer, Xi Victoria Lin, Jingfei Du, Srinivasan Iyer, Ramakanth Pasunuru, Giri Anantharaman, Xian Li, Shuohui Chen, Halil Akin, Mandeep Baines, Louis Martin, Xing Zhou, Punit Singh Koura, Brian O'Horo, Jeff Wang, Luke Zettlemoyer, Mona Diab, Zornitsa Kozareva, Ves Stoyanov

    Abstract: Mixture of Experts layers (MoEs) enable efficient scaling of language models through conditional computation. This paper presents a detailed empirical study of how autoregressive MoE language models scale in comparison with dense models in a wide range of settings: in- and out-of-domain language modeling, zero- and few-shot priming, and full-shot fine-tuning. With the exception of fine-tuning, we…

    Submitted 26 October, 2022; v1 submitted 20 December, 2021; originally announced December 2021.

    Comments: EMNLP 2022

  7. arXiv:2112.10668

    cs.CL cs.AI

    Few-shot Learning with Multilingual Language Models

    Authors: Xi Victoria Lin, Todor Mihaylov, Mikel Artetxe, Tianlu Wang, Shuohui Chen, Daniel Simig, Myle Ott, Naman Goyal, Shruti Bhosale, Jingfei Du, Ramakanth Pasunuru, Sam Shleifer, Punit Singh Koura, Vishrav Chaudhary, Brian O'Horo, Jeff Wang, Luke Zettlemoyer, Zornitsa Kozareva, Mona Diab, Veselin Stoyanov, Xian Li

    Abstract: Large-scale generative language models such as GPT-3 are competitive few-shot learners. While these models are known to be able to jointly represent many different languages, their training data is dominated by English, potentially limiting their cross-lingual generalization. In this work, we train multilingual generative language models on a corpus covering a diverse set of languages, and study t…

    Submitted 10 November, 2022; v1 submitted 20 December, 2021; originally announced December 2021.

    Comments: Accepted to EMNLP 2022; 34 pages

  8. arXiv:2111.00364

    cs.LG cs.AI cs.AR

    Sustainable AI: Environmental Implications, Challenges and Opportunities

    Authors: Carole-Jean Wu, Ramya Raghavendra, Udit Gupta, Bilge Acun, Newsha Ardalani, Kiwan Maeng, Gloria Chang, Fiona Aga Behram, James Huang, Charles Bai, Michael Gschwind, Anurag Gupta, Myle Ott, Anastasia Melnikov, Salvatore Candido, David Brooks, Geeta Chauhan, Benjamin Lee, Hsien-Hsin S. Lee, Bugra Akyildiz, Maximilian Balandat, Joe Spisak, Ravi Jain, Mike Rabbat, Kim Hazelwood

    Abstract: This paper explores the environmental impact of the super-linear growth trends for AI from a holistic perspective, spanning Data, Algorithms, and System Hardware. We characterize the carbon footprint of AI computing by examining the model development cycle across industry-scale machine learning use cases and, at the same time, considering the life cycle of system hardware. Taking a step further, w…

    Submitted 9 January, 2022; v1 submitted 30 October, 2021; originally announced November 2021.

  9. arXiv:2110.09456

    cs.CL cs.AI

    NormFormer: Improved Transformer Pretraining with Extra Normalization

    Authors: Sam Shleifer, Jason Weston, Myle Ott

    Abstract: During pretraining, the Pre-LayerNorm transformer suffers from a gradient magnitude mismatch: gradients at early layers are much larger than at later layers. These issues can be alleviated by our proposed NormFormer architecture, which adds three normalization operations to each layer: a Layer Norm after self attention, head-wise scaling of self-attention outputs, and a Layer Norm after the first…

    Submitted 1 November, 2021; v1 submitted 18 October, 2021; originally announced October 2021.

  10. arXiv:2106.14423

    cs.DC eess.SY

    Operational Data Analytics in Practice: Experiences from Design to Deployment in Production HPC Environments

    Authors: Alessio Netti, Michael Ott, Carla Guillen, Daniele Tafani, Martin Schulz

    Abstract: As HPC systems grow in complexity, efficient and manageable operation is increasingly critical. Many centers are thus starting to explore the use of Operational Data Analytics (ODA) techniques, which extract knowledge from massive amounts of monitoring data and use it for control and visualization purposes. As ODA is a multi-faceted problem, much effort has gone into researching its separate aspec…

    Submitted 28 June, 2021; originally announced June 2021.

    Comments: Preliminary version of the article

  11. arXiv:2106.09563

    cs.LG cs.CV

    On Anytime Learning at Macroscale

    Authors: Lucas Caccia, Jing Xu, Myle Ott, Marc'Aurelio Ranzato, Ludovic Denoyer

    Abstract: In many practical applications of machine learning, data arrives sequentially over time in large chunks. Practitioners then have to decide how to allocate their computational budget in order to obtain the best performance at any point in time. Online learning theory for convex optimization suggests that the best strategy is to use data as soon as it arrives. However, this might not be the best stra…

    Submitted 2 August, 2022; v1 submitted 17 June, 2021; originally announced June 2021.

    Comments: Accepted at the Conference on Lifelong Learning Agents (CoLLAs) 2022

  12. arXiv:2105.00572

    cs.CL

    Larger-Scale Transformers for Multilingual Masked Language Modeling

    Authors: Naman Goyal, Jingfei Du, Myle Ott, Giri Anantharaman, Alexis Conneau

    Abstract: Recent work has demonstrated the effectiveness of cross-lingual language model pretraining for cross-lingual understanding. In this study, we present the results of two larger multilingual masked language models, with 3.5B and 10.7B parameters. Our two new models dubbed XLM-R XL and XLM-R XXL outperform XLM-R by 1.8% and 2.4% average accuracy on XNLI. Our model also outperforms the RoBERTa-Large m…

    Submitted 2 May, 2021; originally announced May 2021.

    Comments: 4 pages

  13. arXiv:2012.09543

    cs.LG

    Few-shot Sequence Learning with Transformers

    Authors: Lajanugen Logeswaran, Ann Lee, Myle Ott, Honglak Lee, Marc'Aurelio Ranzato, Arthur Szlam

    Abstract: Few-shot algorithms aim at learning new tasks provided only a handful of training examples. In this work we investigate few-shot learning in the setting where the data points are sequences of tokens and propose an efficient learning algorithm based on Transformers. In the simplest setting, we append a token to an input sequence which represents the particular task to be undertaken, and show that t…

    Submitted 17 December, 2020; originally announced December 2020.

    Comments: NeurIPS Meta-Learning Workshop 2020

  14. arXiv:2010.06186

    cs.DC cs.LG eess.SY

    Correlation-wise Smoothing: Lightweight Knowledge Extraction for HPC Monitoring Data

    Authors: Alessio Netti, Daniele Tafani, Michael Ott, Martin Schulz

    Abstract: Modern High-Performance Computing (HPC) and data center operators rely more and more on data analytics techniques to improve the efficiency and reliability of their operations. They employ models that ingest time-series monitoring sensor data and transform it into actionable knowledge for system tuning: a process known as Operational Data Analytics (ODA). However, monitoring data has a high dimens…

    Submitted 19 February, 2021; v1 submitted 13 October, 2020; originally announced October 2020.

    Comments: Accepted for publication at the 35th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2021)

  15. arXiv:2004.14287

    cs.CL

    General Purpose Text Embeddings from Pre-trained Language Models for Scalable Inference

    Authors: Jingfei Du, Myle Ott, Haoran Li, Xing Zhou, Veselin Stoyanov

    Abstract: The state of the art on many NLP tasks is currently achieved by large pre-trained language models, which require a considerable amount of computation. We explore a setting where many different predictions are made on a single piece of text. In that case, some of the computational cost during inference can be amortized over the different tasks using a shared text encoder. We compare approaches for…

    Submitted 29 April, 2020; originally announced April 2020.

  16. arXiv:2004.13637

    cs.CL cs.AI

    Recipes for building an open-domain chatbot

    Authors: Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston

    Abstract: Building open-domain chatbots is a challenging area for machine learning research. While prior work has shown that scaling neural models in the number of parameters and the size of the data they are trained on gives improved results, we show that other ingredients are important for a high-performing chatbot. Good conversation requires a number of skills that an expert conversationalist blends in a…

    Submitted 30 April, 2020; v1 submitted 28 April, 2020; originally announced April 2020.

  17. arXiv:2004.11714

    cs.CL cs.LG

    Residual Energy-Based Models for Text Generation

    Authors: Yuntian Deng, Anton Bakhtin, Myle Ott, Arthur Szlam, Marc'Aurelio Ranzato

    Abstract: Text generation is ubiquitous in many NLP tasks, from summarization, to dialogue and machine translation. The dominant parametric approach is based on locally normalized models which predict one word at a time. While these work remarkably well, they are plagued by exposure bias due to the greedy nature of the generation process. In this work, we investigate un-normalized energy-based models (EBMs)…

    Submitted 22 April, 2020; originally announced April 2020.

    Comments: published at ICLR 2020. arXiv admin note: substantial text overlap with arXiv:2004.10188

    Journal ref: ICLR 2020

  18. arXiv:2004.10188

    cs.CL cs.LG stat.ML

    Residual Energy-Based Models for Text

    Authors: Anton Bakhtin, Yuntian Deng, Sam Gross, Myle Ott, Marc'Aurelio Ranzato, Arthur Szlam

    Abstract: Current large-scale auto-regressive language models display impressive fluency and can generate convincing text. In this work we start by asking the question: Can the generations of these models be reliably distinguished from real text by statistical discriminators? We find experimentally that the answer is affirmative when we have access to the training data for the model, and guardedly affirmati…

    Submitted 21 December, 2020; v1 submitted 6 April, 2020; originally announced April 2020.

    Comments: long journal version

    Journal ref: Journal of Machine Learning Research 21 (2020) 1-41

  19. arXiv:2001.07649

    stat.OT cs.OH

    Integrating data science ethics into an undergraduate major: A case study

    Authors: Benjamin S. Baumer, Randi L. Garcia, Albert Y. Kim, Katherine M. Kinnaird, Miles Q. Ott

    Abstract: We present a programmatic approach to incorporating ethics into an undergraduate major in statistical and data sciences. We discuss departmental-level initiatives designed to meet the National Academy of Sciences recommendation for integrating ethics into the curriculum from top-to-bottom as our majors progress from our introductory courses to our senior capstone course, as well as from side-to-si…

    Submitted 31 January, 2022; v1 submitted 21 January, 2020; originally announced January 2020.

    MSC Class: 00A05 ACM Class: K.7.4; K.3.2

  20. arXiv:1911.03587

    cs.CL

    How Decoding Strategies Affect the Verifiability of Generated Text

    Authors: Luca Massarelli, Fabio Petroni, Aleksandra Piktus, Myle Ott, Tim Rocktäschel, Vassilis Plachouras, Fabrizio Silvestri, Sebastian Riedel

    Abstract: Recent progress in pre-trained language models led to systems that are able to generate text of an increasingly high quality. While several works have investigated the fluency and grammatical correctness of such models, it is still unclear to what extent the generated text is consistent with factual world knowledge. Here, we go beyond fluency and also investigate the verifiability of text generat…

    Submitted 29 September, 2020; v1 submitted 8 November, 2019; originally announced November 2019.

    Comments: accepted at Findings of EMNLP 2020

  21. arXiv:1911.02116

    cs.CL

    Unsupervised Cross-lingual Representation Learning at Scale

    Authors: Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer, Veselin Stoyanov

    Abstract: This paper shows that pretraining multilingual language models at scale leads to significant performance gains for a wide range of cross-lingual transfer tasks. We train a Transformer-based masked language model on one hundred languages, using more than two terabytes of filtered CommonCrawl data. Our model, dubbed XLM-R, significantly outperforms multilingual BERT (mBERT) on a variety of cross-lin…

    Submitted 7 April, 2020; v1 submitted 5 November, 2019; originally announced November 2019.

    Comments: ACL 2020 (+ updated results)

  22. arXiv:1910.07117

    cs.CL cs.AI cs.LG

    Analyzing the Forgetting Problem in the Pretrain-Finetuning of Dialogue Response Models

    Authors: Tianxing He, Jun Liu, Kyunghyun Cho, Myle Ott, Bing Liu, James Glass, Fuchun Peng

    Abstract: In this work, we study how the finetuning stage in the pretrain-finetune framework changes the behavior of a pretrained neural language generator. We focus on the transformer encoder-decoder model for the open-domain dialogue response generation task. Our major finding is that after standard finetuning, the model forgets some of the important language generation skills acquired during large-scale…

    Submitted 16 January, 2021; v1 submitted 15 October, 2019; originally announced October 2019.

    Journal ref: EACL 2021

  23. arXiv:1910.06848

    cs.CL

    Facebook AI's WAT19 Myanmar-English Translation Task Submission

    Authors: Peng-Jen Chen, Jiajun Shen, Matt Le, Vishrav Chaudhary, Ahmed El-Kishky, Guillaume Wenzek, Myle Ott, Marc'Aurelio Ranzato

    Abstract: This paper describes Facebook AI's submission to the WAT 2019 Myanmar-English translation task. Our baseline systems are BPE-based transformer models. We explore methods to leverage monolingual data to improve generalization, including self-training, back-translation and their combination. We further improve results by using noisy channel re-ranking and ensembling. We demonstrate that these techni…

    Submitted 15 October, 2019; originally announced October 2019.

    Comments: The 6th Workshop on Asian Translation

  24. DCDB Wintermute: Enabling Online and Holistic Operational Data Analytics on HPC Systems

    Authors: Alessio Netti, Micha Mueller, Carla Guillen, Michael Ott, Daniele Tafani, Gence Ozer, Martin Schulz

    Abstract: As we approach the exascale era, the size and complexity of HPC systems continues to increase, raising concerns about their manageability and sustainability. For this reason, more and more HPC centers are experimenting with fine-grained monitoring coupled with Operational Data Analytics (ODA) to optimize efficiency and effectiveness of system operations. However, while monitoring is a common reali…

    Submitted 18 April, 2020; v1 submitted 14 October, 2019; originally announced October 2019.

    Comments: Accepted for publication at the 29th ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC 2020)

  25. arXiv:1909.13151

    cs.CL

    The Source-Target Domain Mismatch Problem in Machine Translation

    Authors: Jiajun Shen, Peng-Jen Chen, Matt Le, Junxian He, Jiatao Gu, Myle Ott, Michael Auli, Marc'Aurelio Ranzato

    Abstract: While we live in an increasingly interconnected world, different places still exhibit strikingly different cultures and many events we experience in our everyday life pertain only to the specific place we live in. As a result, people often talk about different things in different parts of the world. In this work we study the effect of local context in machine translation and postulate that partic…

    Submitted 16 June, 2020; v1 submitted 28 September, 2019; originally announced September 2019.

  26. arXiv:1908.05204

    cs.CL

    On The Evaluation of Machine Translation Systems Trained With Back-Translation

    Authors: Sergey Edunov, Myle Ott, Marc'Aurelio Ranzato, Michael Auli

    Abstract: Back-translation is a widely used data augmentation technique which leverages target monolingual data. However, its effectiveness has been challenged since automatic metrics such as BLEU only show significant improvements for test examples where the source itself is a translation, or translationese. This is believed to be due to translationese inputs better matching the back-translated training da…

    Submitted 18 August, 2020; v1 submitted 14 August, 2019; originally announced August 2019.

    Comments: ACL 2020

  27. arXiv:1907.11692

    cs.CL

    RoBERTa: A Robustly Optimized BERT Pretraining Approach

    Authors: Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov

    Abstract: Language model pretraining has led to significant performance gains but careful comparison between different approaches is challenging. Training is computationally expensive, often done on private datasets of different sizes, and, as we will show, hyperparameter choices have significant impact on the final results. We present a replication study of BERT pretraining (Devlin et al., 2019) that caref…

    Submitted 26 July, 2019; originally announced July 2019.

  28. arXiv:1907.06616

    cs.CL

    Facebook FAIR's WMT19 News Translation Task Submission

    Authors: Nathan Ng, Kyra Yee, Alexei Baevski, Myle Ott, Michael Auli, Sergey Edunov

    Abstract: This paper describes Facebook FAIR's submission to the WMT19 shared news translation task. We participate in two language pairs and four language directions, English <-> German and English <-> Russian. Following our submission from last year, our baseline systems are large BPE-based transformer models trained with the Fairseq sequence modeling toolkit which rely on sampled back-translations. This…

    Submitted 15 July, 2019; originally announced July 2019.

    Comments: 7 pages; WMT

  29. From Facility to Application Sensor Data: Modular, Continuous and Holistic Monitoring with DCDB

    Authors: Alessio Netti, Micha Mueller, Axel Auweter, Carla Guillen, Michael Ott, Daniele Tafani, Martin Schulz

    Abstract: Today's HPC installations are highly-complex systems, and their complexity will only increase as we move to exascale and beyond. At each layer, from facilities to systems, from runtimes to applications, a wide range of tuning decisions must be made in order to achieve efficient operation. This, however, requires systematic and continuous monitoring of system and user data. While many insular solut…

    Submitted 14 August, 2019; v1 submitted 18 June, 2019; originally announced June 2019.

    Comments: Accepted at The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC) 2019

  30. arXiv:1906.03351

    cs.LG cs.CL stat.ML

    Real or Fake? Learning to Discriminate Machine from Human Generated Text

    Authors: Anton Bakhtin, Sam Gross, Myle Ott, Yuntian Deng, Marc'Aurelio Ranzato, Arthur Szlam

    Abstract: Energy-based models (EBMs), a.k.a. un-normalized models, have had recent successes in continuous spaces. However, they have not been successfully applied to model text sequences. While decreasing the energy at training samples is straightforward, mining (negative) samples where the energy should be increased is difficult. In part, this is because standard gradient-based methods are not readily app…

    Submitted 25 November, 2019; v1 submitted 7 June, 2019; originally announced June 2019.

  31. arXiv:1904.01038

    cs.CL

    fairseq: A Fast, Extensible Toolkit for Sequence Modeling

    Authors: Myle Ott, Sergey Edunov, Alexei Baevski, Angela Fan, Sam Gross, Nathan Ng, David Grangier, Michael Auli

    Abstract: fairseq is an open-source sequence modeling toolkit that allows researchers and developers to train custom models for translation, summarization, language modeling, and other text generation tasks. The toolkit is based on PyTorch and supports distributed training across multiple GPUs and machines. We also support fast mixed-precision training and inference on modern GPUs. A demo video can be found…

    Submitted 1 April, 2019; originally announced April 2019.

    Comments: NAACL 2019 Demo paper

  32. arXiv:1902.07816

    cs.CL cs.LG

    Mixture Models for Diverse Machine Translation: Tricks of the Trade

    Authors: Tianxiao Shen, Myle Ott, Michael Auli, Marc'Aurelio Ranzato

    Abstract: Mixture models trained via EM are among the simplest, most widely used and well understood latent variable models in the machine learning literature. Surprisingly, these models have hardly been explored in text generation applications such as machine translation. In principle, they provide a latent variable to control generation and produce a diverse set of hypotheses. In practice, however, mixtur…

    Submitted 24 May, 2019; v1 submitted 20 February, 2019; originally announced February 2019.

    Comments: ICML 2019 camera-ready

  33. arXiv:1902.01382

    cs.CL

    The FLoRes Evaluation Datasets for Low-Resource Machine Translation: Nepali-English and Sinhala-English

    Authors: Francisco Guzmán, Peng-Jen Chen, Myle Ott, Juan Pino, Guillaume Lample, Philipp Koehn, Vishrav Chaudhary, Marc'Aurelio Ranzato

    Abstract: For machine translation, a vast majority of language pairs in the world are considered low-resource because they have little parallel data available. Besides the technical challenges of learning with limited supervision, it is difficult to evaluate methods trained on low-resource language pairs because of the lack of freely and publicly available benchmarks. In this work, we introduce the FLoRes e…

    Submitted 14 September, 2019; v1 submitted 4 February, 2019; originally announced February 2019.

    Comments: EMNLP 2019

  34. arXiv:1808.09381

    cs.CL

    Understanding Back-Translation at Scale

    Authors: Sergey Edunov, Myle Ott, Michael Auli, David Grangier

    Abstract: An effective method to improve neural machine translation with monolingual data is to augment the parallel training corpus with back-translations of target language sentences. This work broadens the understanding of back-translation and investigates a number of methods to generate synthetic source sentences. We find that in all but resource poor settings back-translations obtained via sampling or…

    Submitted 2 October, 2018; v1 submitted 28 August, 2018; originally announced August 2018.

    Comments: 12 pages; EMNLP 2018

  35. arXiv:1806.00187

    cs.CL

    Scaling Neural Machine Translation

    Authors: Myle Ott, Sergey Edunov, David Grangier, Michael Auli

    Abstract: Sequence to sequence learning models still require several days to reach state of the art performance on large benchmark datasets using a single machine. This paper shows that reduced precision and large batch training can speed up training by nearly 5x on a single 8-GPU machine with careful tuning and implementation. On WMT'14 English-German translation, we match the accuracy of Vaswani et al. (20…

    Submitted 4 September, 2018; v1 submitted 1 June, 2018; originally announced June 2018.

    Comments: WMT 2018

  36. arXiv:1804.07755

    cs.CL

    Phrase-Based & Neural Unsupervised Machine Translation

    Authors: Guillaume Lample, Myle Ott, Alexis Conneau, Ludovic Denoyer, Marc'Aurelio Ranzato

    Abstract: Machine translation systems achieve near human-level performance on some languages, yet their effectiveness strongly relies on the availability of large amounts of parallel sentences, which hinders their applicability to the majority of language pairs. This work investigates how to learn to translate when having access to only large monolingual corpora in each language. We propose two model varian…

    Submitted 13 August, 2018; v1 submitted 20 April, 2018; originally announced April 2018.

    Comments: EMNLP 2018

  37. arXiv:1803.00047

    cs.CL

    Analyzing Uncertainty in Neural Machine Translation

    Authors: Myle Ott, Michael Auli, David Grangier, Marc'Aurelio Ranzato

    Abstract: Machine translation is a popular test bed for research in neural sequence-to-sequence models but despite much recent research, there is still a lack of understanding of these models. Practitioners report performance degradation with large beams, the under-estimation of rare words and a lack of diversity in the final translations. Our study relates some of these issues to the inherent uncertainty o…

    Submitted 13 August, 2018; v1 submitted 28 February, 2018; originally announced March 2018.

    Comments: ICML 2018

  38. arXiv:1711.04956

    cs.CL

    Classical Structured Prediction Losses for Sequence to Sequence Learning

    Authors: Sergey Edunov, Myle Ott, Michael Auli, David Grangier, Marc'Aurelio Ranzato

    Abstract: There has been much recent work on training neural attention models at the sequence-level using either reinforcement learning-style methods or by optimizing the beam. In this paper, we survey a range of classical objective functions that have been widely used to train linear models for structured prediction and apply them to neural sequence to sequence models. Our experiments show that these losse…

    Submitted 5 October, 2018; v1 submitted 14 November, 2017; originally announced November 2017.

    Comments: 10 pages, NAACL 2018

  39. arXiv:1410.1681

    cs.NI

    Repeatable Experiments with LabWiki

    Authors: Thierry Rakotoarivelo, Guillaume Jourjon, Olivier Mehani, Maximilian Ott, Mike Zink

    Abstract: The ability to repeat the experiments from a research study and obtain similar results is a cornerstone in experiment-based scientific discovery. This essential feature has often been ignored by the distributed computing and networking community. There are many reasons for that, such as the complexity of provisioning, configuring, and orchestrating the resources used by experiments, their multipl…

    Submitted 7 October, 2014; originally announced October 2014.

  40. arXiv:1208.1569

    cs.OH

    Design For Change: Information-Centric Architecture to Support Agile Disaster Response

    Authors: Yan Shvartzshnaider, Maximilian Ott

    Abstract: This paper presents a case for the adoption of an information-centric architecture for a global disaster management system. Drawing from a case study of the 2010/2011 Queensland floods, we describe the challenges in providing every participant with relevant and actionable information. We use various examples to argue for a more flexible information dissemination framework which is designed from th…

    Submitted 7 August, 2012; originally announced August 2012.

  41. arXiv:1204.2804

    cs.SI cs.CL cs.CY

    Estimating the Prevalence of Deception in Online Review Communities

    Authors: Myle Ott, Claire Cardie, Jeff Hancock

    Abstract: Consumers' purchase decisions are increasingly influenced by user-generated online reviews. Accordingly, there has been growing concern about the potential for posting "deceptive opinion spam" -- fictitious reviews that have been deliberately written to sound authentic, to deceive the reader. But while this practice has received considerable public attention and concern, relatively little is known…

    Submitted 12 April, 2012; originally announced April 2012.

    Comments: 10 pages, 4 figures, 3 tables, to appear at WWW 2012

    ACM Class: I.2.7; J.4; K.4.1; K.4.4

  42. arXiv:1107.4557

    cs.CL cs.CY

    Finding Deceptive Opinion Spam by Any Stretch of the Imagination

    Authors: Myle Ott, Yejin Choi, Claire Cardie, Jeffrey T. Hancock

    Abstract: Consumers increasingly rate, review and research products online. Consequently, websites containing consumer reviews are becoming targets of opinion spam. While recent work has focused primarily on manually identifiable instances of opinion spam, in this work we study deceptive opinion spam---fictitious opinions that have been deliberately written to sound authentic. Integrating work from psycholo…

    Submitted 22 July, 2011; originally announced July 2011.

    Comments: 11 pages, 5 tables, data available at: http://www.cs.cornell.edu/~myleott

    ACM Class: I.2.7; J.4; K.4.2

    Journal ref: Proceedings of ACL 2011: HLT, pp. 309-319

  43. arXiv:1104.2134

    cs.NI

    A Case for a Global Information Network

    Authors: Maximilian Ott, Yan Shvartzshnaider

    Abstract: This paper argues for the adoption of an information-centric system model instead of the current service-oriented one. We present an architecture for a global information storage and dissemination network which provides for efficient interaction and coordination among autonomous actors through a shared information space. We believe that the resulting, loosely coupled systems, while probabilistic in…

    Submitted 12 April, 2011; originally announced April 2011.

    Report number: TR-4783