Showing 1–11 of 11 results for author: Adebara, I

  1. arXiv:2407.04796  [pdf, other]

    cs.CL

    Toucan: Many-to-Many Translation for 150 African Language Pairs

    Authors: AbdelRahim Elmadany, Ife Adebara, Muhammad Abdul-Mageed

    Abstract: We address a notable gap in Natural Language Processing (NLP) by introducing a collection of resources designed to improve Machine Translation (MT) for low-resource languages, with a specific focus on African languages. First, we introduce two language models (LMs), Cheetah-1.2B and Cheetah-3.7B, with 1.2 billion and 3.7 billion parameters respectively. Next, we finetune the aforementioned models…

    Submitted 12 July, 2024; v1 submitted 5 July, 2024; originally announced July 2024.

  2. arXiv:2404.05943  [pdf, other]

    cs.CL cs.AI

    Interplay of Machine Translation, Diacritics, and Diacritization

    Authors: Wei-Rui Chen, Ife Adebara, Muhammad Abdul-Mageed

    Abstract: We investigate two research questions: (1) how machine translation (MT) and diacritization influence each other's performance in a multi-task learning setting, and (2) the effect of keeping (vs. removing) diacritics on MT performance. We examine these two questions in both high-resource (HR) and low-resource (LR) settings across 55 different languages (36 African languages and 19 European langu…

    Submitted 8 April, 2024; originally announced April 2024.

    Comments: Accepted to NAACL 2024 Main Conference

  3. arXiv:2401.01053  [pdf, other]

    cs.CL

    Cheetah: Natural Language Generation for 517 African Languages

    Authors: Ife Adebara, AbdelRahim Elmadany, Muhammad Abdul-Mageed

    Abstract: Low-resource African languages pose unique challenges for natural language processing (NLP) tasks, including natural language generation (NLG). In this paper, we develop Cheetah, a massively multilingual NLG language model for African languages. Cheetah supports 517 African languages and language varieties, allowing us to address the scarcity of NLG resources and provide a solution to foster lingu…

    Submitted 10 January, 2024; v1 submitted 2 January, 2024; originally announced January 2024.

  4. arXiv:2311.09696  [pdf, other]

    cs.CL

    Fumbling in Babel: An Investigation into ChatGPT's Language Identification Ability

    Authors: Wei-Rui Chen, Ife Adebara, Khai Duy Doan, Qisheng Liao, Muhammad Abdul-Mageed

    Abstract: ChatGPT has recently emerged as a powerful NLP tool that can carry out a variety of tasks. However, the range of languages ChatGPT can handle remains largely a mystery. To uncover which languages ChatGPT 'knows', we investigate its language identification (LID) abilities. For this purpose, we compile Babel-670, a benchmark comprising 670 languages representing 24 language families spoken in five c…

    Submitted 8 April, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

    Comments: Accepted to NAACL 2024 Findings

  5. arXiv:2304.11256  [pdf, other]

    cs.CL

    UBC-DLNLP at SemEval-2023 Task 12: Impact of Transfer Learning on African Sentiment Analysis

    Authors: Gagan Bhatia, Ife Adebara, AbdelRahim Elmadany, Muhammad Abdul-Mageed

    Abstract: We describe our contribution to the SemEval 2023 AfriSenti-SemEval shared task, where we tackle the task of sentiment analysis in 14 different African languages. We develop both monolingual and multilingual models under a fully supervised setting (subtasks A and B). We also develop models for the zero-shot setting (subtask C). Our approach involves experimenting with transfer learning using six lan…

    Submitted 25 April, 2023; v1 submitted 21 April, 2023; originally announced April 2023.

    Comments: AfriSenti 2023 @ ACL 2023

  6. arXiv:2212.10785  [pdf, other]

    cs.CL cs.AI

    SERENGETI: Massively Multilingual Language Models for Africa

    Authors: Ife Adebara, AbdelRahim Elmadany, Muhammad Abdul-Mageed, Alcides Alcoba Inciarte

    Abstract: Multilingual pretrained language models (mPLMs) acquire valuable, generalizable linguistic information during pretraining and have advanced the state of the art on task-specific finetuning. To date, only ~31 out of ~2,000 African languages are covered in existing language models. We ameliorate this limitation by developing SERENGETI, a massively multilingual language model that covers 517 African…

    Submitted 26 May, 2023; v1 submitted 21 December, 2022; originally announced December 2022.

    Comments: To appear in Findings of ACL 2023

  7. arXiv:2210.11744  [pdf, other]

    cs.CL cs.LG

    AfroLID: A Neural Language Identification Tool for African Languages

    Authors: Ife Adebara, AbdelRahim Elmadany, Muhammad Abdul-Mageed, Alcides Alcoba Inciarte

    Abstract: Language identification (LID) is a crucial precursor for NLP, especially for mining web data. Problematically, most of the world's 7000+ languages today are not covered by LID technologies. We address this pressing issue for Africa by introducing AfroLID, a neural LID toolkit for 517 African languages and varieties. AfroLID exploits a multi-domain web dataset manually curated from across 14 lang…

    Submitted 6 December, 2022; v1 submitted 21 October, 2022; originally announced October 2022.

    Comments: To appear at EMNLP 2022 Main conference

  8. arXiv:2203.08351  [pdf, other]

    cs.CL cs.AI

    Towards Afrocentric NLP for African Languages: Where We Are and Where We Can Go

    Authors: Ife Adebara, Muhammad Abdul-Mageed

    Abstract: Aligning with ACL 2022 special Theme on "Language Diversity: from Low Resource to Endangered Languages", we discuss the major linguistic and sociopolitical challenges facing development of NLP technologies for African languages. Situating African languages in a typological framework, we discuss how the particulars of these languages can be harnessed. To facilitate future research, we also highligh…

    Submitted 17 March, 2022; v1 submitted 15 March, 2022; originally announced March 2022.

    Comments: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL 2022)

  9. arXiv:2108.03533  [pdf, ps, other]

    cs.AI cs.CL

    Improving Similar Language Translation With Transfer Learning

    Authors: Ife Adebara, Muhammad Abdul-Mageed

    Abstract: We investigate transfer learning based on pre-trained neural machine translation models to translate between (low-resource) similar languages. This work is part of our contribution to the WMT 2021 Similar Languages Translation Shared Task where we submitted models for different language pairs, including French-Bambara, Spanish-Catalan, and Spanish-Portuguese in both directions. Our models for Cata…

    Submitted 6 October, 2021; v1 submitted 7 August, 2021; originally announced August 2021.

    Comments: Submitted to WMT 2021 Similar Language Task

  10. arXiv:2103.04225  [pdf, other]

    cs.CL cs.AI cs.LG

    Translating the Unseen? Yoruba-English MT in Low-Resource, Morphologically-Unmarked Settings

    Authors: Ife Adebara, Muhammad Abdul-Mageed, Miikka Silfverberg

    Abstract: Translating between languages where certain features are marked morphologically in one but absent or marked contextually in the other is an important test case for machine translation. When translating into English, which marks (in)definiteness morphologically, from Yorùbá, which uses bare nouns but marks these features contextually, ambiguities arise. In this work, we perform fine-grained analysis…

    Submitted 6 April, 2021; v1 submitted 6 March, 2021; originally announced March 2021.

    Comments: Accepted at AfricanNLP @ EACL 2021

  11. arXiv:2011.05037  [pdf, other]

    cs.CL cs.LG

    Translating Similar Languages: Role of Mutual Intelligibility in Multilingual Transformers

    Authors: Ife Adebara, El Moatez Billah Nagoudi, Muhammad Abdul-Mageed

    Abstract: We investigate different approaches to translate between similar languages under low resource conditions, as part of our contribution to the WMT 2020 Similar Languages Translation Shared Task. We submitted Transformer-based bilingual and multilingual systems for all language pairs, in the two directions. We also leverage back-translation for one of the language pairs, acquiring an improvement of m…

    Submitted 10 November, 2020; originally announced November 2020.