Skip to main content

Showing 1–7 of 7 results for author: Siminyu, K

  1. arXiv:2104.02516  [pdf, other

    cs.CL

    AI4D -- African Language Program

    Authors: Kathleen Siminyu, Godson Kalipe, Davor Orlic, Jade Abbott, Vukosi Marivate, Sackey Freshia, Prateek Sibal, Bhanu Neupane, David I. Adelani, Amelia Taylor, Jamiil Toure ALI, Kevin Degila, Momboladji Balogoun, Thierno Ibrahima DIOP, Davis David, Chayma Fourati, Hatem Haddad, Malek Naski

    Abstract: Advances in speech and language technologies enable tools such as voice-search, text-to-speech, speech recognition and machine translation. These are however only available for high resource languages like English, French or Chinese. Without foundational digital resources for African languages, which are considered low-resource in the digital context, these advanced tools remain out of reach. This… ▽ More

    Submitted 6 April, 2021; originally announced April 2021.

  2. arXiv:2104.01624  [pdf, other

    cs.CL

    Phoneme Recognition through Fine Tuning of Phonetic Representations: a Case Study on Luhya Language Varieties

    Authors: Kathleen Siminyu, Xinjian Li, Antonios Anastasopoulos, David Mortensen, Michael R. Marlo, Graham Neubig

    Abstract: Models pre-trained on multiple languages have shown significant promise for improving speech recognition, particularly for low-resource languages. In this work, we focus on phoneme recognition using Allosaurus, a method for multilingual recognition based on phonetic annotation, which incorporates phonological knowledge through a language-dependent allophone layer that associates a universal narrow… ▽ More

    Submitted 4 April, 2021; originally announced April 2021.

  3. arXiv:2104.00824  [pdf

    cs.CL cs.SD eess.AS

    Tusom2021: A Phonetically Transcribed Speech Dataset from an Endangered Language for Universal Phone Recognition Experiments

    Authors: David R. Mortensen, Jordan Picone, Xinjian Li, Kathleen Siminyu

    Abstract: There is growing interest in ASR systems that can recognize phones in a language-independent fashion. There is additionally interest in building language technologies for low-resource and endangered languages. However, there is a paucity of realistic data that can be used to test such systems and technologies. This paper presents a publicly available, phonetically transcribed corpus of 2255 uttera… ▽ More

    Submitted 1 April, 2021; originally announced April 2021.

    Comments: 4 pages, 3 figures

  4. arXiv:2011.10361   

    cs.CL

    1st AfricaNLP Workshop Proceedings, 2020

    Authors: Kathleen Siminyu, Laura Martinus, Vukosi Marivate

    Abstract: Proceedings of the 1st AfricaNLP Workshop held on 26th April alongside ICLR 2020, Virtual Conference, Formerly Addis Ababa Ethiopia.

    Submitted 20 November, 2020; originally announced November 2020.

  5. arXiv:2010.02353  [pdf, other

    cs.CL cs.AI cs.LG

    Participatory Research for Low-resourced Machine Translation: A Case Study in African Languages

    Authors: Wilhelmina Nekoto, Vukosi Marivate, Tshinondiwa Matsila, Timi Fasubaa, Tajudeen Kolawole, Taiwo Fagbohungbe, Solomon Oluwole Akinola, Shamsuddeen Hassan Muhammad, Salomon Kabongo, Salomey Osei, Sackey Freshia, Rubungo Andre Niyongabo, Ricky Macharm, Perez Ogayo, Orevaoghene Ahia, Musie Meressa, Mofe Adeyemi, Masabata Mokgesi-Selinga, Lawrence Okegbemi, Laura Jane Martinus, Kolawole Tajudeen, Kevin Degila, Kelechi Ogueji, Kathleen Siminyu, Julia Kreutzer , et al. (23 additional authors not shown)

    Abstract: Research in NLP lacks geographic diversity, and the question of how NLP can be scaled to low-resourced languages has not yet been adequately solved. "Low-resourced"-ness is a complex problem going beyond data availability and reflects systemic problems in society. In this paper, we focus on the task of Machine Translation (MT), that plays a crucial role for information accessibility and communicat… ▽ More

    Submitted 6 November, 2020; v1 submitted 5 October, 2020; originally announced October 2020.

    Comments: Findings of EMNLP 2020; updated benchmarks

  6. arXiv:2007.11865  [pdf, ps, other

    cs.CL

    AI4D -- African Language Dataset Challenge

    Authors: Kathleen Siminyu, Sackey Freshia, Jade Abbott, Vukosi Marivate

    Abstract: As language and speech technologies become more advanced, the lack of fundamental digital resources for African languages, such as data, spell checkers and Part of Speech taggers, means that the digital divide between these languages and others keeps growing. This work details the organisation of the AI4D - African Language Dataset Challenge, an effort to incentivize the creation, organization and… ▽ More

    Submitted 23 July, 2020; originally announced July 2020.

  7. arXiv:2003.11529  [pdf, other

    cs.CL

    Masakhane -- Machine Translation For Africa

    Authors: Iroro Orife, Julia Kreutzer, Blessing Sibanda, Daniel Whitenack, Kathleen Siminyu, Laura Martinus, Jamiil Toure Ali, Jade Abbott, Vukosi Marivate, Salomon Kabongo, Musie Meressa, Espoir Murhabazi, Orevaoghene Ahia, Elan van Biljon, Arshath Ramkilowan, Adewale Akinfaderin, Alp Öktem, Wole Akin, Ghollah Kioko, Kevin Degila, Herman Kamper, Bonaventure Dossou, Chris Emezue, Kelechi Ogueji, Abdallah Bashir

    Abstract: Africa has over 2000 languages. Despite this, African languages account for a small portion of available resources and publications in Natural Language Processing (NLP). This is due to multiple factors, including: a lack of focus from government and funding, discoverability, a lack of community, sheer language complexity, difficulty in reproducing papers and no benchmarks to compare techniques. To… ▽ More

    Submitted 13 March, 2020; originally announced March 2020.

    Comments: Accepted for the AfricaNLP Workshop, ICLR 2020