Showing 1–15 of 15 results for author: Renduchintala, A

  1. arXiv:2402.07792  [pdf, other]

    cs.LG cs.DC

    Empowering Federated Learning for Massive Models with NVIDIA FLARE

    Authors: Holger R. Roth, Ziyue Xu, Yuan-Ting Hsieh, Adithya Renduchintala, Isaac Yang, Zhihong Zhang, Yuhong Wen, Sean Yang, Kevin Lu, Kristopher Kersten, Camir Ricketts, Daguang Xu, Chester Chen, Yan Cheng, Andrew Feng

    Abstract: In the ever-evolving landscape of artificial intelligence (AI) and large language models (LLMs), handling and leveraging data effectively has become a critical challenge. Most state-of-the-art machine learning algorithms are data-centric. However, as the lifeblood of model performance, necessary data cannot always be centralized due to various factors such as privacy, regulation, geopolitics, copy…

    Submitted 12 February, 2024; originally announced February 2024.

  2. arXiv:2311.09578  [pdf, other]

    cs.CL cs.AI cs.LG

    Tied-Lora: Enhancing parameter efficiency of LoRA with weight tying

    Authors: Adithya Renduchintala, Tugrul Konuk, Oleksii Kuchaiev

    Abstract: We introduce Tied-LoRA, a novel paradigm leveraging weight tying and selective training to enhance the parameter efficiency of Low-rank Adaptation (LoRA). Our exploration encompasses different plausible combinations of parameter training and freezing, coupled with weight tying, aimed at identifying the optimal trade-off between performance and the count of trainable parameters. Across $5$ diverse…

    Submitted 12 April, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

    Comments: 8 pages, 4 figures

  3. arXiv:2211.12615  [pdf, other]

    cs.CL cs.AI

    AutoReply: Detecting Nonsense in Dialogue Introspectively with Discriminative Replies

    Authors: Weiyan Shi, Emily Dinan, Adi Renduchintala, Daniel Fried, Athul Paul Jacob, Zhou Yu, Mike Lewis

    Abstract: Existing approaches built separate classifiers to detect nonsense in dialogues. In this paper, we show that without external classifiers, dialogue models can detect errors in their own messages introspectively, by calculating the likelihood of replies that are indicative of poor messages. For example, if an agent believes its partner is likely to respond "I don't understand" to a candidate message…

    Submitted 22 November, 2022; originally announced November 2022.

  4. arXiv:2206.02079  [pdf, other]

    cs.CL

    Multilingual Neural Machine Translation with Deep Encoder and Multiple Shallow Decoders

    Authors: Xiang Kong, Adithya Renduchintala, James Cross, Yuqing Tang, Jiatao Gu, Xian Li

    Abstract: Recent work in multilingual translation advances translation quality surpassing bilingual baselines using deep transformer models with increased capacity. However, the extra latency and memory costs introduced by this approach may make it unacceptable for efficiency-constrained applications. It has recently been shown for bilingual translation that using a deep encoder and shallow decoder (DESD) c…

    Submitted 4 June, 2022; originally announced June 2022.

    Comments: EACL 2021

  5. arXiv:2106.00169  [pdf, other]

    cs.CL

    Gender Bias Amplification During Speed-Quality Optimization in Neural Machine Translation

    Authors: Adithya Renduchintala, Denise Diaz, Kenneth Heafield, Xian Li, Mona Diab

    Abstract: Is bias amplified when neural machine translation (NMT) models are optimized for speed and evaluated on generic test sets using BLEU? We investigate architectures and techniques commonly used to speed up decoding in Transformer-based models, such as greedy search, quantization, average attention networks (AANs) and shallow decoder models and show their effect on gendered noun translation. We const…

    Submitted 31 May, 2021; originally announced June 2021.

    Comments: Accepted at ACL 2021

  6. arXiv:2105.15071  [pdf, other]

    cs.CL

    Adapting High-resource NMT Models to Translate Low-resource Related Languages without Parallel Data

    Authors: Wei-Jen Ko, Ahmed El-Kishky, Adithya Renduchintala, Vishrav Chaudhary, Naman Goyal, Francisco Guzmán, Pascale Fung, Philipp Koehn, Mona Diab

    Abstract: The scarcity of parallel data is a major obstacle for training high-quality machine translation systems for low-resource languages. Fortunately, some low-resource languages are linguistically related or similar to high-resource languages; these related languages may share many lexical or syntactic structures. In this work, we exploit this linguistic overlap to facilitate translating to and from a…

    Submitted 1 June, 2021; v1 submitted 31 May, 2021; originally announced May 2021.

    Comments: ACL 2021

  7. arXiv:2104.08597  [pdf, other]

    cs.CL

    XLEnt: Mining a Large Cross-lingual Entity Dataset with Lexical-Semantic-Phonetic Word Alignment

    Authors: Ahmed El-Kishky, Adithya Renduchintala, James Cross, Francisco Guzmán, Philipp Koehn

    Abstract: Cross-lingual named-entity lexica are an important resource to multilingual NLP tasks such as machine translation and cross-lingual wikification. While knowledge bases contain a large number of entities in high-resource languages such as English and French, corresponding entities for lower-resource languages are often missing. To address this, we propose Lexical-Semantic-Phonetic Align (LSP-Align)…

    Submitted 10 September, 2021; v1 submitted 17 April, 2021; originally announced April 2021.

  8. arXiv:2104.07838  [pdf, ps, other]

    cs.CL

    Investigating Failures of Automatic Translation in the Case of Unambiguous Gender

    Authors: Adithya Renduchintala, Adina Williams

    Abstract: Transformer based models are the modern work horses for neural machine translation (NMT), reaching state of the art across several benchmarks. Despite their impressive accuracy, we observe a systemic and rudimentary class of errors made by transformer based models with regards to translating from a language that doesn't mark gender on nouns into others that do. We find that even when the surroundi…

    Submitted 15 April, 2021; originally announced April 2021.

    Comments: 10 pages, 2 figures, 4 tables, submitting to EMNLP 2021

  9. arXiv:2102.04020  [pdf, other]

    cs.CL

    Quality Estimation without Human-labeled Data

    Authors: Yi-Lin Tuan, Ahmed El-Kishky, Adithya Renduchintala, Vishrav Chaudhary, Francisco Guzmán, Lucia Specia

    Abstract: Quality estimation aims to measure the quality of translated content without access to a reference translation. This is crucial for machine translation systems in real-world scenarios where high-quality translation is needed. While many approaches exist for quality estimation, they are based on supervised machine learning requiring costly human labelled data. As an alternative, we propose a techni…

    Submitted 8 February, 2021; originally announced February 2021.

    Comments: Accepted by EACL 2021

  10. arXiv:2101.00977  [pdf, other]

    cs.LG

    Towards Understanding the Behaviors of Optimal Deep Active Learning Algorithms

    Authors: Yilun Zhou, Adithya Renduchintala, Xian Li, Sida Wang, Yashar Mehdad, Asish Ghoshal

    Abstract: Active learning (AL) algorithms may achieve better performance with fewer data because the model guides the data selection process. While many algorithms have been proposed, there is little study on what the optimal AL algorithm looks like, which would help researchers understand where their models fall short and iterate on the design. In this paper, we present a simulated annealing algorithm to s…

    Submitted 20 February, 2021; v1 submitted 29 December, 2020; originally announced January 2021.

    Comments: AISTATS 2021

  11. arXiv:1905.10453  [pdf, ps, other]

    cs.CL

    A Call for Prudent Choice of Subword Merge Operations in Neural Machine Translation

    Authors: Shuoyang Ding, Adithya Renduchintala, Kevin Duh

    Abstract: Most neural machine translation systems are built upon subword units extracted by methods such as Byte-Pair Encoding (BPE) or wordpiece. However, the choice of number of merge operations is generally made by following existing recipes. In this paper, we conduct a systematic exploration on different numbers of BPE merge operations to understand how it interacts with the model architecture, the stra…

    Submitted 24 June, 2019; v1 submitted 24 May, 2019; originally announced May 2019.

    Comments: Accepted to MT Summit 2019

  12. arXiv:1812.03919  [pdf, other]

    eess.AS cs.CL cs.SD

    Pretraining by Backtranslation for End-to-end ASR in Low-Resource Settings

    Authors: Matthew Wiesner, Adithya Renduchintala, Shinji Watanabe, Chunxi Liu, Najim Dehak, Sanjeev Khudanpur

    Abstract: We explore training attention-based encoder-decoder ASR in low-resource settings. These models perform poorly when trained on small amounts of transcribed speech, in part because they depend on having sufficient target-side text to train the attention and decoder networks. In this paper we address this shortcoming by pretraining our network parameters using only text-based data and transcribed spe…

    Submitted 2 August, 2019; v1 submitted 10 December, 2018; originally announced December 2018.

  13. arXiv:1809.02223  [pdf, other]

    cs.CL

    Character-Aware Decoder for Translation into Morphologically Rich Languages

    Authors: Adithya Renduchintala, Pamela Shapiro, Kevin Duh, Philipp Koehn

    Abstract: Neural machine translation (NMT) systems operate primarily on words (or sub-words), ignoring lower-level patterns of morphology. We present a character-aware decoder designed to capture such patterns when translating into morphologically rich languages. We achieve character-awareness by augmenting both the softmax and embedding layers of an attention-based encoder-decoder model with convolutional…

    Submitted 18 June, 2019; v1 submitted 6 September, 2018; originally announced September 2018.

    Comments: 9 pages (12 including Appendix), 5 figures, Accepted at MT Summit 2019

  14. arXiv:1804.00015  [pdf, other]

    cs.CL

    ESPnet: End-to-End Speech Processing Toolkit

    Authors: Shinji Watanabe, Takaaki Hori, Shigeki Karita, Tomoki Hayashi, Jiro Nishitoba, Yuya Unno, Nelson Enrique Yalta Soplin, Jahn Heymann, Matthew Wiesner, Nanxin Chen, Adithya Renduchintala, Tsubasa Ochiai

    Abstract: This paper introduces a new open source platform for end-to-end speech processing named ESPnet. ESPnet mainly focuses on end-to-end automatic speech recognition (ASR), and adopts widely-used dynamic neural network toolkits, Chainer and PyTorch, as a main deep learning engine. ESPnet also follows the Kaldi ASR toolkit style for data processing, feature extraction/format, and recipes to provide a co…

    Submitted 30 March, 2018; originally announced April 2018.

  15. arXiv:1803.10299  [pdf, other]

    cs.CL cs.SD eess.AS

    Multi-Modal Data Augmentation for End-to-End ASR

    Authors: Adithya Renduchintala, Shuoyang Ding, Matthew Wiesner, Shinji Watanabe

    Abstract: We present a new end-to-end architecture for automatic speech recognition (ASR) that can be trained using \emph{symbolic} input in addition to the traditional acoustic input. This architecture utilizes two separate encoders: one for acoustic input and another for symbolic input, both sharing the attention and decoder parameters. We call this architecture a multi-modal data augmentation network (MM…

    Submitted 18 June, 2018; v1 submitted 27 March, 2018; originally announced March 2018.

    Comments: 5 Pages, 1 Figure, accepted at INTERSPEECH 2018