Showing 1–12 of 12 results for author: Hamed, I

  1. arXiv:2407.04910

    cs.CL cs.AI

    NADI 2024: The Fifth Nuanced Arabic Dialect Identification Shared Task

    Authors: Muhammad Abdul-Mageed, Amr Keleg, AbdelRahim Elmadany, Chiyu Zhang, Injy Hamed, Walid Magdy, Houda Bouamor, Nizar Habash

    Abstract: We describe the findings of the fifth Nuanced Arabic Dialect Identification Shared Task (NADI 2024). NADI's objective is to help advance SoTA Arabic NLP by providing guidance, datasets, modeling opportunities, and standardized evaluation conditions that allow researchers to collaboratively compete on pre-specified tasks. NADI 2024 targeted both dialect identification cast as a multi-label task (Su…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: Accepted by The Second Arabic Natural Language Processing Conference

  2. arXiv:2406.05967

    cs.CV cs.AI cs.CL cs.LG

    CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark

    Authors: David Romero, Chenyang Lyu, Haryo Akbarianto Wibowo, Teresa Lynn, Injy Hamed, Aditya Nanda Kishore, Aishik Mandal, Alina Dragonetti, Artem Abzaliev, Atnafu Lambebo Tonja, Bontu Fufa Balcha, Chenxi Whitehouse, Christian Salamea, Dan John Velasco, David Ifeoluwa Adelani, David Le Meur, Emilio Villa-Cueva, Fajri Koto, Fauzan Farooqui, Frederico Belcavello, Ganzorig Batnasan, Gisela Vallejo, Grainne Caulfield, Guido Ivetta, Haiyue Song, et al. (50 additional authors not shown)

    Abstract: Visual Question Answering (VQA) is an important task in multimodal AI, and it is often used to test the ability of vision-language models to understand and reason on knowledge present in both visual and textual data. However, most of the current VQA models use datasets that are primarily focused on English and a few major world languages, with images that are typically Western-centric. While recen…

    Submitted 9 June, 2024; originally announced June 2024.

  3. arXiv:2403.18182

    cs.CL

    ZAEBUC-Spoken: A Multilingual Multidialectal Arabic-English Speech Corpus

    Authors: Injy Hamed, Fadhl Eryani, David Palfreyman, Nizar Habash

    Abstract: We present ZAEBUC-Spoken, a multilingual multidialectal Arabic-English speech corpus. The corpus comprises twelve hours of Zoom meetings involving multiple speakers role-playing a work situation where Students brainstorm ideas for a certain topic and then discuss it with an Interlocutor. The meetings cover different topics and are divided into phases with different language setups. The corpus pres…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: Accepted to LREC-COLING 2024

  4. arXiv:2310.15262

    cs.CL

    Data Augmentation Techniques for Machine Translation of Code-Switched Texts: A Comparative Study

    Authors: Injy Hamed, Nizar Habash, Ngoc Thang Vu

    Abstract: Code-switching (CSW) text generation has been receiving increasing attention as a solution to address data scarcity. In light of this growing interest, we need more comprehensive studies comparing different augmentation approaches. In this work, we compare three popular approaches: lexical replacements, linguistic theories, and back-translation (BT), in the context of Egyptian Arabic-English CSW.…

    Submitted 23 October, 2023; originally announced October 2023.

    Comments: Findings of EMNLP 2023

  5. arXiv:2211.16319

    eess.AS cs.CL cs.SD

    Benchmarking Evaluation Metrics for Code-Switching Automatic Speech Recognition

    Authors: Injy Hamed, Amir Hussein, Oumnia Chellah, Shammur Chowdhury, Hamdy Mubarak, Sunayana Sitaram, Nizar Habash, Ahmed Ali

    Abstract: Code-switching poses a number of challenges and opportunities for multilingual automatic speech recognition. In this paper, we focus on the question of robust and fair evaluation metrics. To that end, we develop a reference benchmark data set of code-switching speech recognition hypotheses with human judgments. We define clear guidelines for minimal editing of automatic hypotheses. We validate the…

    Submitted 22 November, 2022; originally announced November 2022.

    Comments: Accepted to SLT 2022

  6. arXiv:2211.12000

    cs.CL

    ArzEn-ST: A Three-way Speech Translation Corpus for Code-Switched Egyptian Arabic - English

    Authors: Injy Hamed, Nizar Habash, Slim Abdennadher, Ngoc Thang Vu

    Abstract: We present our work on collecting ArzEn-ST, a code-switched Egyptian Arabic - English Speech Translation Corpus. This corpus is an extension of the ArzEn speech corpus, which was collected through informal interviews with bilingual speakers. In this work, we collect translations in both directions, monolingual Egyptian Arabic and monolingual English, forming a three-way speech translation corpus.…

    Submitted 21 November, 2022; originally announced November 2022.

    Comments: Accepted to the Seventh Arabic Natural Language Processing Workshop (WANLP 2022)

  7. arXiv:2210.06990

    cs.CL

    Exploring Segmentation Approaches for Neural Machine Translation of Code-Switched Egyptian Arabic-English Text

    Authors: Marwa Gaser, Manuel Mager, Injy Hamed, Nizar Habash, Slim Abdennadher, Ngoc Thang Vu

    Abstract: Data sparsity is one of the main challenges posed by code-switching (CS), which is further exacerbated in the case of morphologically rich languages. For the task of machine translation (MT), morphological segmentation has proven successful in alleviating data sparsity in monolingual contexts; however, it has not been investigated for CS settings. In this paper, we study the effectiveness of diffe…

    Submitted 30 April, 2023; v1 submitted 11 October, 2022; originally announced October 2022.

    Comments: Accepted to EACL 2023

  8. arXiv:2208.00433

    cs.CL

    The Who in Code-Switching: A Case Study for Predicting Egyptian Arabic-English Code-Switching Levels based on Character Profiles

    Authors: Injy Hamed, Alia El Bolock, Cornelia Herbert, Slim Abdennadher, Ngoc Thang Vu

    Abstract: Code-switching (CS) is a common linguistic phenomenon exhibited by multilingual individuals, where they tend to alternate between languages within one single conversation. CS is a complex phenomenon that not only encompasses linguistic challenges, but also contains a great deal of complexity in terms of its dynamic behaviour across speakers. Given that the factors giving rise to CS vary from one c…

    Submitted 31 July, 2022; originally announced August 2022.

    Comments: To be published in the International Journal of Asian Language Processing. arXiv admin note: substantial text overlap with arXiv:2112.06462

  9. arXiv:2205.12649

    cs.CL

    Investigating Lexical Replacements for Arabic-English Code-Switched Data Augmentation

    Authors: Injy Hamed, Nizar Habash, Slim Abdennadher, Ngoc Thang Vu

    Abstract: Data sparsity is a main problem hindering the development of code-switching (CS) NLP systems. In this paper, we investigate data augmentation techniques for synthesizing dialectal Arabic-English CS text. We perform lexical replacements using word-aligned parallel corpora where CS points are either randomly chosen or learnt using a sequence-to-sequence model. We compare these approaches against dic…

    Submitted 4 April, 2023; v1 submitted 25 May, 2022; originally announced May 2022.

    Comments: Accepted to LoResMT 2023

  10. arXiv:2112.06462

    cs.CL

    Predicting User Code-Switching Level from Sociological and Psychological Profiles

    Authors: Injy Hamed, Alia El Bolock, Nader Rizk, Cornelia Herbert, Slim Abdennadher, Ngoc Thang Vu

    Abstract: Multilingual speakers tend to alternate between languages within a conversation, a phenomenon referred to as "code-switching" (CS). CS is a complex phenomenon that not only encompasses linguistic challenges, but also contains a great deal of complexity in terms of its dynamic behaviour across speakers. This dynamic behaviour has been studied by sociologists and psychologists, identifying factors a…

    Submitted 13 December, 2021; originally announced December 2021.

    Comments: To be published in the proceedings of the International Conference on Asian Language Information Processing

  11. arXiv:2108.12881

    cs.CL

    Investigations on Speech Recognition Systems for Low-Resource Dialectal Arabic-English Code-Switching Speech

    Authors: Injy Hamed, Pavel Denisov, Chia-Yu Li, Mohamed Elmahdy, Slim Abdennadher, Ngoc Thang Vu

    Abstract: Code-switching (CS), defined as the mixing of languages in conversations, has become a worldwide phenomenon. The prevalence of CS has been recently met with a growing demand and interest to build CS ASR systems. In this paper, we present our work on code-switched Egyptian Arabic-English automatic speech recognition (ASR). We first contribute in filling the huge gap in resources by collecting, anal…

    Submitted 29 August, 2021; originally announced August 2021.

    Comments: To be published in Computer Speech and Language Journal

  12. arXiv:1909.10892

    cs.CL

    Code-switching Language Modeling With Bilingual Word Embeddings: A Case Study for Egyptian Arabic-English

    Authors: Injy Hamed, Moritz Zhu, Mohamed Elmahdy, Slim Abdennadher, Ngoc Thang Vu

    Abstract: Code-switching (CS) is a widespread phenomenon among bilingual and multilingual societies. The lack of CS resources hinders the performance of many NLP tasks. In this work, we explore the potential use of bilingual word embeddings for code-switching (CS) language modeling (LM) in the low resource Egyptian Arabic-English language. We evaluate different state-of-the-art bilingual word embeddings app…

    Submitted 24 September, 2019; originally announced September 2019.

    Comments: 11 pages, 1 figure (having 2 sub-figures), submitted to the 21st International Conference on Speech and Computer (SPECOM'19)

    Journal ref: Proceedings of the 21st International Conference on Speech and Computer (SPECOM'19), Istanbul, Turkey, August 20-25, 2019 https://link.springer.com/book/10.1007/978-3-030-26061-3