Skip to main content

Showing 1–3 of 3 results for author: Zariquiey, R

  1. arXiv:2210.02509  [pdf, other

    cs.CL

    Revisiting Syllables in Language Modelling and their Application on Low-Resource Machine Translation

    Authors: Arturo Oncevay, Kervy Dante Rivas Rojas, Liz Karen Chavez Sanchez, Roberto Zariquiey

    Abstract: Language modelling and machine translation tasks mostly use subword or character inputs, but syllables are seldom used. Syllables provide shorter sequences than characters, require less-specialised extracting rules than morphemes, and their segmentation is not impacted by the corpus size. In this study, we first explore the potential of syllables for open-vocabulary language modelling in 21 langua… ▽ More

    Submitted 5 October, 2022; originally announced October 2022.

    Comments: COLING 2022, short-paper

  2. arXiv:2206.10343  [pdf, other

    cs.CL

    Building an Endangered Language Resource in the Classroom: Universal Dependencies for Kakataibo

    Authors: Roberto Zariquiey, Claudia Alvarado, Ximena Echevarria, Luisa Gomez, Rosa Gonzales, Mariana Illescas, Sabina Oporto, Frederic Blum, Arturo Oncevay, Javier Vera

    Abstract: In this paper, we launch a new Universal Dependencies treebank for an endangered language from Amazonia: Kakataibo, a Panoan language spoken in Peru. We first discuss the collaborative methodology implemented, which proved effective to create a treebank in the context of a Computational Linguistic course for undergraduates. Then, we describe the general details of the treebank and the language-spe… ▽ More

    Submitted 21 June, 2022; originally announced June 2022.

    Comments: Accepted to LREC 2022

  3. arXiv:2205.03608  [pdf, other

    cs.CL

    UniMorph 4.0: Universal Morphology

    Authors: Khuyagbaatar Batsuren, Omer Goldman, Salam Khalifa, Nizar Habash, Witold Kieraś, Gábor Bella, Brian Leonard, Garrett Nicolai, Kyle Gorman, Yustinus Ghanggo Ate, Maria Ryskina, Sabrina J. Mielke, Elena Budianskaya, Charbel El-Khaissi, Tiago Pimentel, Michael Gasser, William Lane, Mohit Raj, Matt Coler, Jaime Rafael Montoya Samame, Delio Siticonatzi Camaiteri, Benoît Sagot, Esaú Zumaeta Rojas, Didier López Francis, Arturo Oncevay , et al. (71 additional authors not shown)

    Abstract: The Universal Morphology (UniMorph) project is a collaborative effort providing broad-coverage instantiated normalized morphological inflection tables for hundreds of diverse world languages. The project comprises two major thrusts: a language-independent feature schema for rich morphological annotation and a type-level resource of annotated data in diverse languages realizing that schema. This pa… ▽ More

    Submitted 19 June, 2022; v1 submitted 7 May, 2022; originally announced May 2022.

    Comments: LREC 2022; The first two authors made equal contributions