
Showing 1–50 of 126 results for author: Ahmad, S

  1. arXiv:2407.10152  [pdf, other]

    cs.CL

    Mitigating Translationese in Low-resource Languages: The Storyboard Approach

    Authors: Garry Kuwanto, Eno-Abasi E. Urua, Priscilla Amondi Amuok, Shamsuddeen Hassan Muhammad, Anuoluwapo Aremu, Verrah Otiende, Loice Emma Nanyanga, Teresiah W. Nyoike, Aniefon D. Akpan, Nsima Ab Udouboh, Idongesit Udeme Archibong, Idara Effiong Moses, Ifeoluwatayo A. Ige, Benjamin Ajibade, Olumide Benjamin Awokoya, Idris Abdulmumin, Saminu Mohammad Aliyu, Ruqayya Nasir Iro, Ibrahim Said Ahmad, Deontae Smith, Praise-EL Michaels, David Ifeoluwa Adelani, Derry Tanti Wijaya, Anietie Andy

    Abstract: Low-resource languages often face challenges in acquiring high-quality language data due to the reliance on translation-based methods, which can introduce the translationese effect. This phenomenon results in translated sentences that lack fluency and naturalness in the target language. In this paper, we propose a novel approach for data collection by leveraging storyboards to elicit more fluent a…

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: published at LREC-COLING 2024

    ACM Class: I.2.7

    Journal ref: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024) 11349-11360

  2. Loki: A System for Serving ML Inference Pipelines with Hardware and Accuracy Scaling

    Authors: Sohaib Ahmad, Hui Guan, Ramesh K. Sitaraman

    Abstract: The rapid adoption of machine learning (ML) has underscored the importance of serving ML models with high throughput and resource efficiency. Traditional approaches to managing increasing query demands have predominantly focused on hardware scaling, which involves increasing server count or computing power. However, this strategy can often be impractical due to limitations in the available budget…

    Submitted 3 July, 2024; originally announced July 2024.

  3. arXiv:2407.02631  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    Nollywood: Let's Go to the Movies!

    Authors: John E. Ortega, Ibrahim Said Ahmad, William Chen

    Abstract: Nollywood, based on the idea of Bollywood from India, is a series of outstanding movies that originate from Nigeria. Unfortunately, while the movies are in English, they are hard to understand for many native speakers due to the dialect of English that is spoken. In this article, we accomplish two goals: (1) create a phonetic sub-title model that is able to translate Nigerian English speech to Ame…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: 8 pages, 4 figures, 2 tables

  4. arXiv:2406.19504  [pdf, other]

    cs.CL

    Are Generative Language Models Multicultural? A Study on Hausa Culture and Emotions using ChatGPT

    Authors: Ibrahim Said Ahmad, Shiran Dudy, Resmi Ramachandranpillai, Kenneth Church

    Abstract: Large Language Models (LLMs), such as ChatGPT, are widely used to generate content for various purposes and audiences. However, these models may not reflect the cultural and emotional diversity of their users, especially for low-resource languages. In this paper, we investigate how ChatGPT represents Hausa's culture and emotions. We compare responses generated by ChatGPT with those provided by nat…

    Submitted 27 June, 2024; originally announced June 2024.

  5. arXiv:2404.18981  [pdf, other]

    eess.IV cs.AI

    Decoding Radiologists' Intentions: A Novel System for Accurate Region Identification in Chest X-ray Image Analysis

    Authors: Akash Awasthi, Safwan Ahmad, Bryant Le, Hien Van Nguyen

    Abstract: In the realm of chest X-ray (CXR) image analysis, radiologists meticulously examine various regions, documenting their observations in reports. The prevalence of errors in CXR diagnoses, particularly among inexperienced radiologists and hospital residents, underscores the importance of understanding radiologists' intentions and the corresponding regions of interest. This understanding is crucial f…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: Accepted in ISBI 2024

  6. arXiv:2404.03188  [pdf]

    eess.IV cs.CV cs.LG

    Classification of Nasopharyngeal Cases using DenseNet Deep Learning Architecture

    Authors: W. S. H. M. W. Ahmad, M. F. A. Fauzi, M. K. Abdullahi, Jenny T. H. Lee, N. S. A. Basry, A Yahaya, A. M. Ismail, A. Adam, Elaine W. L. Chan, F. S. Abas

    Abstract: Nasopharyngeal carcinoma (NPC) is one of the understudied yet deadliest cancers in South East Asia. In Malaysia, the prevalence is identified mainly in Sarawak, among the ethnic of Bidayuh. NPC is often late-diagnosed because it is asymptomatic at the early stage. There are several tissue representations from the nasopharynx biopsy, such as nasopharyngeal inflammation (NPI), lymphoid hyperplasia (…

    Submitted 4 April, 2024; originally announced April 2024.

    Comments: This article has been accepted in the Journal of Engineering Science and Technology (JESTEC) and awaiting publication

  7. arXiv:2403.18933  [pdf, other]

    cs.CL

    SemEval-2024 Task 1: Semantic Textual Relatedness for African and Asian Languages

    Authors: Nedjma Ousidhoum, Shamsuddeen Hassan Muhammad, Mohamed Abdalla, Idris Abdulmumin, Ibrahim Said Ahmad, Sanchit Ahuja, Alham Fikri Aji, Vladimir Araujo, Meriem Beloucif, Christine De Kock, Oumaima Hourrane, Manish Shrivastava, Thamar Solorio, Nirmal Surange, Krishnapriya Vishnubhotla, Seid Muhie Yimam, Saif M. Mohammad

    Abstract: We present the first shared task on Semantic Textual Relatedness (STR). While earlier shared tasks primarily focused on semantic similarity, we instead investigate the broader phenomenon of semantic relatedness across 14 languages: Afrikaans, Algerian Arabic, Amharic, English, Hausa, Hindi, Indonesian, Kinyarwanda, Marathi, Moroccan Arabic, Modern Standard Arabic, Punjabi, Spanish, and Telugu. The…

    Submitted 17 April, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

    Comments: SemEval 2024 Task Description Paper. arXiv admin note: text overlap with arXiv:2402.08638

  8. arXiv:2403.17338  [pdf, other]

    eess.SY cs.AI

    Reinforcement Learning-based Receding Horizon Control using Adaptive Control Barrier Functions for Safety-Critical Systems

    Authors: Ehsan Sabouni, H. M. Sabbir Ahmad, Vittorio Giammarino, Christos G. Cassandras, Ioannis Ch. Paschalidis, Wenchao Li

    Abstract: Optimal control methods provide solutions to safety-critical problems but easily become intractable. Control Barrier Functions (CBFs) have emerged as a popular technique that facilitates their solution by provably guaranteeing safety, through their forward invariance property, at the expense of some performance loss. This approach involves defining a performance objective alongside CBF-based safet…

    Submitted 25 March, 2024; originally announced March 2024.

  9. arXiv:2403.02473  [pdf, other]

    cs.CV

    When do Convolutional Neural Networks Stop Learning?

    Authors: Sahan Ahmad, Gabriel Trahan, Aminul Islam

    Abstract: Convolutional Neural Networks (CNNs) have demonstrated outstanding performance in computer vision tasks such as image classification, detection, segmentation, and medical image analysis. In general, an arbitrary number of epochs is used to train such neural networks. In a single epoch, the entire training data -- divided by batch size -- are fed to the network. In practice, validation error with t…

    Submitted 4 March, 2024; originally announced March 2024.

  10. arXiv:2402.08638  [pdf, other]

    cs.CL

    SemRel2024: A Collection of Semantic Textual Relatedness Datasets for 13 Languages

    Authors: Nedjma Ousidhoum, Shamsuddeen Hassan Muhammad, Mohamed Abdalla, Idris Abdulmumin, Ibrahim Said Ahmad, Sanchit Ahuja, Alham Fikri Aji, Vladimir Araujo, Abinew Ali Ayele, Pavan Baswani, Meriem Beloucif, Chris Biemann, Sofia Bourhim, Christine De Kock, Genet Shanko Dekebo, Oumaima Hourrane, Gopichand Kanumolu, Lokesh Madasu, Samuel Rutunda, Manish Shrivastava, Thamar Solorio, Nirmal Surange, Hailegnaw Getaneh Tilaye, Krishnapriya Vishnubhotla, Genta Winata, et al. (2 additional authors not shown)

    Abstract: Exploring and quantifying semantic relatedness is central to representing language and holds significant implications across various NLP tasks. While earlier NLP research primarily focused on semantic similarity, often within the English language context, we instead investigate the broader phenomenon of semantic relatedness. In this paper, we present SemRel, a new semantic relatedness dat…

    Submitted 31 May, 2024; v1 submitted 13 February, 2024; originally announced February 2024.

    Comments: Accepted to the Findings of ACL 2024

  11. arXiv:2401.13133  [pdf, other]

    cs.CL cs.SI

    Analyzing COVID-19 Vaccination Sentiments in Nigerian Cyberspace: Insights from a Manually Annotated Twitter Dataset

    Authors: Ibrahim Said Ahmad, Lukman Jibril Aliyu, Abubakar Auwal Khalid, Saminu Muhammad Aliyu, Shamsuddeen Hassan Muhammad, Idris Abdulmumin, Bala Mairiga Abduljalil, Bello Shehu Bello, Amina Imam Abubakar

    Abstract: Numerous successes have been achieved in combating the COVID-19 pandemic, initially using various precautionary measures like lockdowns, social distancing, and the use of face masks. More recently, various vaccinations have been developed to aid in the prevention or reduction of the severity of the COVID-19 infection. Despite the effectiveness of the precautionary measures and the vaccines, there…

    Submitted 23 January, 2024; originally announced January 2024.

  12. arXiv:2401.02723   

    cs.LG cs.CV

    Predicting Traffic Flow with Federated Learning and Graph Neural with Asynchronous Computations Network

    Authors: Muhammad Yaqub, Shahzad Ahmad, Malik Abdul Manan, Imran Shabir Chuhan

    Abstract: Real-time traffic flow prediction holds significant importance within the domain of Intelligent Transportation Systems (ITS). The task of achieving a balance between prediction precision and computational efficiency presents a significant challenge. In this article, we present a novel deep-learning method called Federated Learning and Asynchronous Graph Convolutional Network (FLAGCN). Our framewor…

    Submitted 5 April, 2024; v1 submitted 5 January, 2024; originally announced January 2024.

    Comments: I request to withdraw my paper from arXiv due to significant updates and improvements identified post-submission. These enhancements will substantially elevate the work's quality and impact. I plan to resubmit the revised paper upon completion of these updates. Thank you for accommodating this request

  13. arXiv:2401.01511  [pdf, other]

    cs.IR

    Enhancing Multilingual Information Retrieval in Mixed Human Resources Environments: A RAG Model Implementation for Multicultural Enterprise

    Authors: Syed Rameel Ahmad

    Abstract: The advent of Large Language Models has revolutionized information retrieval, ushering in a new era of expansive knowledge accessibility. While these models excel in providing open-world knowledge, effectively extracting answers in diverse linguistic environments with varying levels of literacy remains a formidable challenge. Retrieval Augmented Generation (RAG) emerges as a promising solution, br…

    Submitted 2 January, 2024; originally announced January 2024.

  14. arXiv:2312.08010  [pdf, other]

    cs.CV cs.LG

    EZ-CLIP: Efficient Zeroshot Video Action Recognition

    Authors: Shahzad Ahmad, Sukalpa Chanda, Yogesh S Rawat

    Abstract: Recent advancements in large-scale pre-training of visual-language models on paired image-text data have demonstrated impressive generalization capabilities for zero-shot tasks. Building on this success, efforts have been made to adapt these image-based visual-language models, such as CLIP, for videos extending their zero-shot capabilities to the video domain. While these adaptations have shown pr…

    Submitted 19 January, 2024; v1 submitted 13 December, 2023; originally announced December 2023.

  15. arXiv:2312.05986  [pdf, other]

    eess.IV cs.CV cs.LG

    Reconstruction of Cortical Surfaces with Spherical Topology from Infant Brain MRI via Recurrent Deformation Learning

    Authors: Xiaoyang Chen, Junjie Zhao, Siyuan Liu, Sahar Ahmad, Pew-Thian Yap

    Abstract: Cortical surface reconstruction (CSR) from MRI is key to investigating brain structure and function. While recent deep learning approaches have significantly improved the speed of CSR, a substantial amount of runtime is still needed to map the cortex to a topologically-correct spherical manifold to facilitate downstream geometric analyses. Moreover, this mapping is possible only if the topology of…

    Submitted 10 December, 2023; originally announced December 2023.

  16. arXiv:2311.12179  [pdf, other]

    cs.CL

    Leveraging Closed-Access Multilingual Embedding for Automatic Sentence Alignment in Low Resource Languages

    Authors: Idris Abdulmumin, Auwal Abubakar Khalid, Shamsuddeen Hassan Muhammad, Ibrahim Said Ahmad, Lukman Jibril Aliyu, Babangida Sani, Bala Mairiga Abduljalil, Sani Ahmad Hassan

    Abstract: The importance of qualitative parallel data in machine translation has long been determined but it has always been very difficult to obtain such in sufficient quantity for the majority of world languages, mainly because of the associated cost and also the lack of accessibility to these languages. Despite the potential for obtaining parallel datasets from online articles using automatic approaches,…

    Submitted 20 November, 2023; originally announced November 2023.

    Comments: To appear in the proceedings of ICCAIT 2023. 6 pages, 2 figures

  17. arXiv:2311.05903  [pdf, other]

    cs.IR cs.AI

    Establishing Performance Baselines in Fine-Tuning, Retrieval-Augmented Generation and Soft-Prompting for Non-Specialist LLM Users

    Authors: Jennifer Dodgson, Lin Nanzheng, Julian Peh, Akira Rafhael Janson Pattirane, Alfath Daryl Alhajir, Eko Ridho Dinarto, Joseph Lim, Syed Danyal Ahmad

    Abstract: Research into methods for improving the performance of large language models (LLMs) through fine-tuning, retrieval-augmented generation (RAG) and soft-prompting has tended to focus on the use of highly technical or high-cost techniques, making many of the newly discovered approaches comparatively inaccessible to non-technical users. In this paper we tested an unmodified version of GPT 3.5, a fine-…

    Submitted 19 March, 2024; v1 submitted 10 November, 2023; originally announced November 2023.

    Comments: 10 pages, LaTeX; typos corrected, using the correct term 'system prompting' instead of 'soft prompting'

  18. arXiv:2308.04402  [pdf]

    cs.CV

    Person Re-Identification without Identification via Event Anonymization

    Authors: Shafiq Ahmad, Pietro Morerio, Alessio Del Bue

    Abstract: Wide-scale use of visual surveillance in public spaces puts individual privacy at stake while increasing resource consumption (energy, bandwidth, and computation). Neuromorphic vision sensors (event-cameras) have been recently considered a valid solution to the privacy issue because they do not capture detailed RGB visual information of the subjects in the scene. However, recent deep learning arch…

    Submitted 17 August, 2023; v1 submitted 8 August, 2023; originally announced August 2023.

    Comments: Accepted at International Conference on Computer Vision (ICCV), 2023

  19. arXiv:2307.15846  [pdf, other]

    cs.CY

    Education 5.0: Requirements, Enabling Technologies, and Future Directions

    Authors: Shabir Ahmad, Sabina Umirzakova, Ghulam Mujtaba, Muhammad Sadiq Amin, Taegkeun Whangbo

    Abstract: We are currently in a post-pandemic era in which life has shifted to a digital world. This has affected many aspects of life, including education and learning. Education 5.0 refers to the fifth industrial revolution in education by leveraging digital technologies to eliminate barriers to learning, enhance learning methods, and promote overall well-being. The concept of Education 5.0 represents a n…

    Submitted 28 July, 2023; originally announced July 2023.

  20. arXiv:2306.03217  [pdf, other]

    cs.GR

    Zero-shot CAD Program Re-Parameterization for Interactive Manipulation

    Authors: Milin Kodnongbua, Benjamin T. Jones, Maaz Bin Safeer Ahmad, Vladimir G. Kim, Adriana Schulz

    Abstract: Parametric CAD models encode entire families of shapes that should, in principle, be easy for designers to explore. However, in practice, parametric CAD models can be difficult to manipulate due to implicit semantic constraints among parameter values. Finding and enforcing these semantic constraints solely from geometry or programmatic shape representations is not possible because these constraint…

    Submitted 5 June, 2023; originally announced June 2023.

  21. arXiv:2306.01871  [pdf, other]

    cs.RO

    Optimal Control of Connected Automated Vehicles with Event-Triggered Control Barrier Functions: a Test Bed for Safe Optimal Merging

    Authors: Ehsan Sabouni, H. M. Sabbir Ahmad, Wei Xiao, Christos G. Cassandras, Wenchao Li

    Abstract: We address the problem of controlling Connected and Automated Vehicles (CAVs) in conflict areas of a traffic network subject to hard safety constraints. It has been shown that such problems can be solved through a combination of tractable optimal control problems and Control Barrier Functions (CBFs) that guarantee the satisfaction of all constraints. These solutions can be reduced to a sequence of…

    Submitted 2 June, 2023; originally announced June 2023.

    Comments: arXiv admin note: substantial text overlap with arXiv:2203.12089, arXiv:2209.13053

  22. arXiv:2306.00932  [pdf]

    cs.AI cs.DB

    Cross Modal Data Discovery over Structured and Unstructured Data Lakes

    Authors: Mohamed Y. Eltabakh, Mayuresh Kunjir, Ahmed Elmagarmid, Mohammad Shahmeer Ahmad

    Abstract: Organizations are collecting increasingly large amounts of data for data driven decision making. These data are often dumped into a centralized repository, e.g., a data lake, consisting of thousands of structured and unstructured datasets. Perversely, such mixture of datasets makes the problem of discovering elements (e.g., tables or documents) that are relevant to a user's query or an analytical…

    Submitted 16 July, 2023; v1 submitted 1 June, 2023; originally announced June 2023.

    Report number: 17

  23. arXiv:2305.17690  [pdf, other]

    cs.CL

    HaVQA: A Dataset for Visual Question Answering and Multimodal Research in Hausa Language

    Authors: Shantipriya Parida, Idris Abdulmumin, Shamsuddeen Hassan Muhammad, Aneesh Bose, Guneet Singh Kohli, Ibrahim Said Ahmad, Ketan Kotwal, Sayan Deb Sarkar, Ondřej Bojar, Habeebah Adamu Kakudi

    Abstract: This paper presents HaVQA, the first multimodal dataset for visual question-answering (VQA) tasks in the Hausa language. The dataset was created by manually translating 6,022 English question-answer pairs, which are associated with 1,555 unique images from the Visual Genome dataset. As a result, the dataset provides 12,044 gold standard English-Hausa parallel sentences that were translated in a fa…

    Submitted 28 May, 2023; originally announced May 2023.

    Comments: Accepted at ACL 2023 as a long paper (Findings)

  24. arXiv:2305.16818  [pdf, other]

    cs.MA cs.AI eess.SY

    Trust-Aware Resilient Control and Coordination of Connected and Automated Vehicles

    Authors: H M Sabbir Ahmad, Ehsan Sabouni, Wei Xiao, Christos G. Cassandras, Wenchao Li

    Abstract: We address the security of a network of Connected and Automated Vehicles (CAVs) cooperating to navigate through a conflict area. Adversarial attacks such as Sybil attacks can cause safety violations resulting in collisions and traffic jams. In addition, uncooperative (but not necessarily adversarial) CAVs can also induce similar adversarial effects on the traffic network. We propose a decentralize…

    Submitted 2 June, 2023; v1 submitted 26 May, 2023; originally announced May 2023.

    Comments: Keywords: Resilient control and coordination, Cybersecurity, Safety guaranteed coordination, Connected And Autonomous Vehicles

  25. arXiv:2305.06897  [pdf, other]

    cs.CL cs.AI cs.IR

    AfriQA: Cross-lingual Open-Retrieval Question Answering for African Languages

    Authors: Odunayo Ogundepo, Tajuddeen R. Gwadabe, Clara E. Rivera, Jonathan H. Clark, Sebastian Ruder, David Ifeoluwa Adelani, Bonaventure F. P. Dossou, Abdou Aziz DIOP, Claytone Sikasote, Gilles Hacheme, Happy Buzaaba, Ignatius Ezeani, Rooweither Mabuya, Salomey Osei, Chris Emezue, Albert Njoroge Kahira, Shamsuddeen H. Muhammad, Akintunde Oladipo, Abraham Toluwase Owodunni, Atnafu Lambebo Tonja, Iyanuoluwa Shode, Akari Asai, Tunde Oluwaseyi Ajayi, Clemencia Siro, Steven Arthur, et al. (27 additional authors not shown)

    Abstract: African languages have far less in-language content available digitally, making it challenging for question answering systems to satisfy the information needs of users. Cross-lingual open-retrieval question answering (XOR QA) systems -- those that retrieve answer content from other languages while serving people in their native language -- offer a means of filling this gap. To this end, we create…

    Submitted 11 May, 2023; originally announced May 2023.

  26. arXiv:2305.00076  [pdf, other]

    cs.CL

    HausaNLP at SemEval-2023 Task 10: Transfer Learning, Synthetic Data and Side-Information for Multi-Level Sexism Classification

    Authors: Saminu Mohammad Aliyu, Idris Abdulmumin, Shamsuddeen Hassan Muhammad, Ibrahim Said Ahmad, Saheed Abdullahi Salahudeen, Aliyu Yusuf, Falalu Ibrahim Lawan

    Abstract: We present the findings of our participation in the SemEval-2023 Task 10: Explainable Detection of Online Sexism (EDOS) task, a shared task on offensive language (sexism) detection on English Gab and Reddit dataset. We investigated the effects of transferring two language models: XLM-T (sentiment classification) and HateBERT (same domain -- Reddit) for multi-level classification into Sexist or not…

    Submitted 28 April, 2023; originally announced May 2023.

    Comments: 5 pages, 3 figures

  27. arXiv:2304.13634  [pdf, other]

    cs.CL

    HausaNLP at SemEval-2023 Task 12: Leveraging African Low Resource TweetData for Sentiment Analysis

    Authors: Saheed Abdullahi Salahudeen, Falalu Ibrahim Lawan, Ahmad Mustapha Wali, Amina Abubakar Imam, Aliyu Rabiu Shuaibu, Aliyu Yusuf, Nur Bala Rabiu, Musa Bello, Shamsuddeen Umaru Adamu, Saminu Mohammad Aliyu, Murja Sani Gadanya, Sanah Abdullahi Muaz, Mahmoud Said Ahmad, Abdulkadir Abdullahi, Abdulmalik Yusuf Jamoh

    Abstract: We present the findings of SemEval-2023 Task 12, a shared task on sentiment analysis for low-resource African languages using Twitter dataset. The task featured three subtasks; subtask A is monolingual sentiment classification with 12 tracks which are all monolingual languages, subtask B is multilingual sentiment classification using the tracks in subtask A and subtask C is a zero-shot sentiment c…

    Submitted 26 April, 2023; originally announced April 2023.

  28. arXiv:2304.06845  [pdf, other]

    cs.CL

    SemEval-2023 Task 12: Sentiment Analysis for African Languages (AfriSenti-SemEval)

    Authors: Shamsuddeen Hassan Muhammad, Idris Abdulmumin, Seid Muhie Yimam, David Ifeoluwa Adelani, Ibrahim Sa'id Ahmad, Nedjma Ousidhoum, Abinew Ayele, Saif M. Mohammad, Meriem Beloucif, Sebastian Ruder

    Abstract: We present the first Africentric SemEval Shared task, Sentiment Analysis for African Languages (AfriSenti-SemEval) - The dataset is available at https://github.com/afrisenti-semeval/afrisent-semeval-2023. AfriSenti-SemEval is a sentiment classification challenge in 14 African languages: Amharic, Algerian Arabic, Hausa, Igbo, Kinyarwanda, Moroccan Arabic, Mozambican Portuguese, Nigerian Pidgin, Oro…

    Submitted 1 May, 2023; v1 submitted 13 April, 2023; originally announced April 2023.

    Comments: 19 pages, 5 figures, 6 tables

  29. arXiv:2303.16909  [pdf, other]

    cs.DB cs.AI

    RetClean: Retrieval-Based Data Cleaning Using Foundation Models and Data Lakes

    Authors: Mohammad Shahmeer Ahmad, Zan Ahmad Naeem, Mohamed Eltabakh, Mourad Ouzzani, Nan Tang

    Abstract: Can foundation models (such as ChatGPT) clean your data? In this proposal, we demonstrate that indeed ChatGPT can assist in data cleaning by suggesting corrections for specific cells in a data table (scenario 1). However, ChatGPT may struggle with datasets it has never encountered before (e.g., local enterprise data) or when the user requires an explanation of the source of the suggested clean val…

    Submitted 29 March, 2023; originally announced March 2023.

  30. arXiv:2302.08956  [pdf, other]

    cs.CL

    AfriSenti: A Twitter Sentiment Analysis Benchmark for African Languages

    Authors: Shamsuddeen Hassan Muhammad, Idris Abdulmumin, Abinew Ali Ayele, Nedjma Ousidhoum, David Ifeoluwa Adelani, Seid Muhie Yimam, Ibrahim Sa'id Ahmad, Meriem Beloucif, Saif M. Mohammad, Sebastian Ruder, Oumaima Hourrane, Pavel Brazdil, Felermino Dário Mário António Ali, Davis David, Salomey Osei, Bello Shehu Bello, Falalu Ibrahim, Tajuddeen Gwadabe, Samuel Rutunda, Tadesse Belay, Wendimu Baye Messelle, Hailu Beshada Balcha, Sisay Adugna Chala, Hagos Tesfahun Gebremichael, Bernard Opoku, et al. (1 additional author not shown)

    Abstract: Africa is home to over 2,000 languages from more than six language families and has the highest linguistic diversity among all continents. These include 75 languages with at least one million speakers each. Yet, there is little NLP research conducted on African languages. Crucial to enabling such research is the availability of high-quality annotated datasets. In this paper, we introduce AfriSenti…

    Submitted 4 November, 2023; v1 submitted 17 February, 2023; originally announced February 2023.

    Comments: 14 pages, 3 Figures, 10 Tables

  31. A Bayesian Generative Adversarial Network (GAN) to Generate Synthetic Time-Series Data, Application in Combined Sewer Flow Prediction

    Authors: Amin E. Bakhshipour, Alireza Koochali, Ulrich Dittmer, Ali Haghighi, Sheraz Ahmad, Andreas Dengel

    Abstract: Despite various breakthroughs in machine learning and data analysis techniques for improving smart operation and management of urban water infrastructures, some key limitations obstruct this progress. Among these shortcomings, the absence of freely available data due to data privacy or high costs of data gathering and the nonexistence of adequate rare or extreme events in the available data plays…

    Submitted 31 January, 2023; originally announced January 2023.

    Comments: Accepted in WDSA/CCWI 2022 Conference

  32. A Comparative Study of Pretrained Language Models for Long Clinical Text

    Authors: Yikuan Li, Ramsey M. Wehbe, Faraz S. Ahmad, Hanyin Wang, Yuan Luo

    Abstract: Objective: Clinical knowledge enriched transformer models (e.g., ClinicalBERT) have state-of-the-art results on clinical NLP (natural language processing) tasks. One of the core limitations of these transformer models is the substantial memory consumption due to their full self-attention mechanism, which leads to the performance degradation in long clinical texts. To overcome this, we propose to l…

    Submitted 27 January, 2023; originally announced January 2023.

    Comments: arXiv admin note: substantial text overlap with arXiv:2201.11838

  33. arXiv:2301.05264  [pdf, other]

    cs.CR cs.AI cs.ET

    Security-Aware Approximate Spiking Neural Networks

    Authors: Syed Tihaam Ahmad, Ayesha Siddique, Khaza Anuarul Hoque

    Abstract: Deep Neural Networks (DNNs) and Spiking Neural Networks (SNNs) are both known for their susceptibility to adversarial attacks. Therefore, researchers in the recent past have extensively studied the robustness and defense of DNNs and SNNs under adversarial attacks. Compared to accurate SNNs (AccSNN), approximate SNNs (AxSNNs) are known to be up to 4X more energy-efficient for ultra-low power applic…

    Submitted 12 January, 2023; originally announced January 2023.

    Comments: Accepted full paper in DATE 2023

  34. arXiv:2301.01369  [pdf, other]

    eess.IV cs.CV cs.LG

    Brain Tissue Segmentation Across the Human Lifespan via Supervised Contrastive Learning

    Authors: Xiaoyang Chen, Jinjian Wu, Wenjiao Lyu, Yicheng Zou, Kim-Han Thung, Siyuan Liu, Ye Wu, Sahar Ahmad, Pew-Thian Yap

    Abstract: Automatic segmentation of brain MR images into white matter (WM), gray matter (GM), and cerebrospinal fluid (CSF) is critical for tissue volumetric analysis and cortical surface reconstruction. Due to dramatic structural and appearance changes associated with developmental and aging processes, existing brain tissue segmentation methods are only viable for specific age groups. Consequently, methods…

    Submitted 3 January, 2023; originally announced January 2023.

  35. Flexible Supervised Autonomy for Exploration in Subterranean Environments

    Authors: Harel Biggie, Eugene R. Rush, Danny G. Riley, Shakeeb Ahmad, Michael T. Ohradzansky, Kyle Harlow, Michael J. Miles, Daniel Torres, Steve McGuire, Eric W. Frew, Christoffer Heckman, J. Sean Humbert

    Abstract: While the capabilities of autonomous systems have been steadily improving in recent years, these systems still struggle to rapidly explore previously unknown environments without the aid of GPS-assisted navigation. The DARPA Subterranean (SubT) Challenge aimed to fast track the development of autonomous exploration systems by evaluating their performance in real-world underground search-and-rescue…

    Submitted 11 April, 2023; v1 submitted 2 January, 2023; originally announced January 2023.

    Comments: Field Robotics special issue: DARPA Subterranean Challenge, Advancement and Lessons Learned from the Finals

  36. arXiv:2211.15262  [pdf, other]

    cs.CL

    HERDPhobia: A Dataset for Hate Speech against Fulani in Nigeria

    Authors: Saminu Mohammad Aliyu, Gregory Maksha Wajiga, Muhammad Murtala, Shamsuddeen Hassan Muhammad, Idris Abdulmumin, Ibrahim Said Ahmad

    Abstract: Social media platforms allow users to freely share their opinions about issues or anything they feel like. However, they also make it easier to spread hate and abusive content. The Fulani ethnic group has been the victim of this unfortunate phenomenon. This paper introduces the HERDPhobia - the first annotated hate speech dataset on Fulani herders in Nigeria - in three languages: English, Nigerian…

    Submitted 28 November, 2022; originally announced November 2022.

    Comments: To appear in the Proceedings of the Sixth Workshop on Widening Natural Language Processing at EMNLP2022

  37. arXiv:2211.14669  [pdf, other]

    cs.LG cs.AI cs.GT

    Game Theoretic Mixed Experts for Combinational Adversarial Machine Learning

    Authors: Ethan Rathbun, Kaleel Mahmood, Sohaib Ahmad, Caiwen Ding, Marten van Dijk

    Abstract: Recent advances in adversarial machine learning have shown that defenses considered to be robust are actually susceptible to adversarial attacks which are specifically customized to target their weaknesses. These defenses include Barrage of Random Transforms (BaRT), Friendly Adversarial Training (FAT), Trash is Treasure (TiT) and ensemble models made up of Vision Transformers (ViTs), Big Transfer…

    Submitted 29 April, 2023; v1 submitted 26 November, 2022; originally announced November 2022.

    Comments: 17 pages, 10 figures

    ACM Class: I.2; I.4

  38. arXiv:2211.09392  [pdf, other]

    cs.CV cs.AI cs.LG

    Data Dimension Reduction makes ML Algorithms efficient

    Authors: Wisal Khan, Muhammad Turab, Waqas Ahmad, Syed Hasnat Ahmad, Kelash Kumar, Bin Luo

    Abstract: Data dimension reduction (DDR) maps data from high dimensions to low dimensions. Various DDR techniques are used for image dimension reduction, such as Random Projections, Principal Component Analysis (PCA), the Variance approach, LSA-Transform, the Combined and Direct approaches, and the New Random Approach. Auto-encoders (AE) are used to learn end-to-end mapping. In this pap…

    Submitted 17 November, 2022; originally announced November 2022.

    Comments: Accepted at the International Conference on Emerging Technologies in Electronics, Computing and Communication (ICETECC) 2022

  39. arXiv:2211.03674  [pdf, other]

    cs.CG cs.LG

    Metricizing the Euclidean Space towards Desired Distance Relations in Point Clouds

    Authors: Stefan Rass, Sandra König, Shahzad Ahmad, Maksim Goman

    Abstract: Given a set of points in the Euclidean space $\mathbb{R}^\ell$ with $\ell>1$, the pairwise distances between the points are determined by their spatial location and the metric $d$ that we endow $\mathbb{R}^\ell$ with. Hence, the distance $d(\mathbf x,\mathbf y)=\delta$ between two points is fixed by the choice of $\mathbf x$ and $\mathbf y$ and $d$. We study the related problem of fixing the value $\delta$,…

    Submitted 25 April, 2023; v1 submitted 7 November, 2022; originally announced November 2022.

  40. Vision-Based Robust Lane Detection and Tracking under Different Challenging Environmental Conditions

    Authors: Samia Sultana, Boshir Ahmed, Manoranjan Paul, Muhammad Rafiqul Islam, Shamim Ahmad

    Abstract: Lane marking detection is fundamental for advanced driving assistance systems. However, detecting lanes is highly challenging when the visibility of road lane markings is low due to challenging real-life environments and adverse weather. Most lane detection methods suffer from four types of challenges: (i) light effects, i.e., shadow, glare of light, reflection, etc.; (ii) Obscured visibi…

    Submitted 14 June, 2023; v1 submitted 18 October, 2022; originally announced October 2022.

    Comments: 19 pages, 11 figures, submitted to IEEE Access

  41. arXiv:2210.03072  [pdf, ps, other]

    cs.CV cs.HC

    IJCB 2022 Mobile Behavioral Biometrics Competition (MobileB2C)

    Authors: Giuseppe Stragapede, Ruben Vera-Rodriguez, Ruben Tolosana, Aythami Morales, Julian Fierrez, Javier Ortega-Garcia, Sanka Rasnayaka, Sachith Seneviratne, Vipula Dissanayake, Jonathan Liebers, Ashhadul Islam, Samir Brahim Belhaouari, Sumaiya Ahmad, Suraiya Jabin

    Abstract: This paper describes the experimental framework and results of the IJCB 2022 Mobile Behavioral Biometrics Competition (MobileB2C). The aim of MobileB2C is benchmarking mobile user authentication systems based on behavioral biometric traits transparently acquired by mobile devices during ordinary Human-Computer Interaction (HCI), using a novel public database, BehavePassDB, and a standard experimen…

    Submitted 6 October, 2022; originally announced October 2022.

  42. arXiv:2209.11020  [pdf, other]

    cs.CV cs.CR

    Privacy Attacks Against Biometric Models with Fewer Samples: Incorporating the Output of Multiple Models

    Authors: Sohaib Ahmad, Benjamin Fuller, Kaleel Mahmood

    Abstract: Authentication systems are vulnerable to model inversion attacks where an adversary is able to approximate the inverse of a target machine learning model. Biometric models are a prime candidate for this type of attack. This is because inverting a biometric model allows the attacker to produce a realistic biometric input to spoof biometric authentication systems. One of the main constraints in co…

    Submitted 22 September, 2022; originally announced September 2022.

    Comments: This is a major revision of a paper titled "Inverting Biometric Models with Fewer Samples: Incorporating the Output of Multiple Models" by the same authors that appears at IJCB 2022

  43. arXiv:2208.04825  [pdf, other]

    eess.IV cs.CV

    Longitudinal Prediction of Postnatal Brain Magnetic Resonance Images via a Metamorphic Generative Adversarial Network

    Authors: Yunzhi Huang, Sahar Ahmad, Luyi Han, Shuai Wang, Zhengwang Wu, Weili Lin, Gang Li, Li Wang, Pew-Thian Yap

    Abstract: Missing scans are inevitable in longitudinal studies due to either subject dropouts or failed scans. In this paper, we propose a deep learning framework to predict missing scans from acquired scans, catering to longitudinal infant studies. Prediction of infant brain MRI is challenging owing to the rapid contrast and structural changes particularly during the first year of life. We introduce a trus…

    Submitted 9 August, 2022; originally announced August 2022.

  44. arXiv:2207.02726  [pdf, other]

    cs.LG cs.AI cs.HC eess.SP

    Towards the Use of Saliency Maps for Explaining Low-Quality Electrocardiograms to End Users

    Authors: Ana Lucic, Sheeraz Ahmad, Amanda Furtado Brinhosa, Vera Liao, Himani Agrawal, Umang Bhatt, Krishnaram Kenthapadi, Alice Xiang, Maarten de Rijke, Nicholas Drabowski

    Abstract: When using medical images for diagnosis, either by clinicians or artificial intelligence (AI) systems, it is important that the images are of high quality. When an image is of low quality, the medical exam that produced the image often needs to be redone. In telemedicine, a common problem is that the quality issue is only flagged once the patient has left the clinic, meaning they must return in or…

    Submitted 6 July, 2022; originally announced July 2022.

    Comments: Accepted to ICML 2022 Workshop on Interpretable ML in Healthcare

  45. arXiv:2205.01133  [pdf, other]

    cs.CL cs.CV cs.LG

    Hausa Visual Genome: A Dataset for Multi-Modal English to Hausa Machine Translation

    Authors: Idris Abdulmumin, Satya Ranjan Dash, Musa Abdullahi Dawud, Shantipriya Parida, Shamsuddeen Hassan Muhammad, Ibrahim Sa'id Ahmad, Subhadarshi Panda, Ondřej Bojar, Bashir Shehu Galadanci, Bello Shehu Bello

    Abstract: Multi-modal Machine Translation (MMT) enables the use of visual information to enhance the quality of translations. The visual information can serve as a valuable piece of context information to decrease the ambiguity of input sentences. Despite the increasing popularity of such a technique, good and sizeable datasets are scarce, limiting the full extent of their potential. Hausa, a Chadic languag…

    Submitted 6 May, 2022; v1 submitted 2 May, 2022; originally announced May 2022.

    Comments: Accepted at Language Resources and Evaluation Conference 2022 (LREC2022)

  46. arXiv:2201.11838  [pdf, ps, other]

    cs.CL cs.AI

    Clinical-Longformer and Clinical-BigBird: Transformers for long clinical sequences

    Authors: Yikuan Li, Ramsey M. Wehbe, Faraz S. Ahmad, Hanyin Wang, Yuan Luo

    Abstract: Transformer-based models, such as BERT, have dramatically improved the performance for various natural language processing tasks. The clinical knowledge enriched model, namely ClinicalBERT, also achieved state-of-the-art results when applied to clinical named entity recognition and natural language inference tasks. One of the core limitations of these transformers is the substantial memory cons…

    Submitted 15 April, 2022; v1 submitted 27 January, 2022; originally announced January 2022.

  47. arXiv:2201.08277  [pdf, other]

    cs.CL cs.AI

    NaijaSenti: A Nigerian Twitter Sentiment Corpus for Multilingual Sentiment Analysis

    Authors: Shamsuddeen Hassan Muhammad, David Ifeoluwa Adelani, Sebastian Ruder, Ibrahim Said Ahmad, Idris Abdulmumin, Bello Shehu Bello, Monojit Choudhury, Chris Chinenye Emezue, Saheed Salahudeen Abdullahi, Anuoluwapo Aremu, Alipio Jeorge, Pavel Brazdil

    Abstract: Sentiment analysis is one of the most widely studied applications in NLP, but most work focuses on languages with large amounts of data. We introduce the first large-scale human-annotated Twitter sentiment dataset for the four most widely spoken languages in Nigeria (Hausa, Igbo, Nigerian-Pidgin, and Yorùbá) consisting of around 30,000 annotated tweets per language (and 14,000 for Nigerian-Pidgin…

    Submitted 18 June, 2022; v1 submitted 20 January, 2022; originally announced January 2022.

    Comments: Submitted to LREC 2022, 13 pages, 2 figures

  48. arXiv:2201.00042  [pdf, other]

    cs.NE cs.AI cs.LG q-bio.NC

    Avoiding Catastrophe: Active Dendrites Enable Multi-Task Learning in Dynamic Environments

    Authors: Abhiram Iyer, Karan Grewal, Akash Velu, Lucas Oliveira Souza, Jeremy Forest, Subutai Ahmad

    Abstract: A key challenge for AI is to build embodied systems that operate in dynamically changing environments. Such systems must adapt to changing task contexts and learn continuously. Although standard deep learning systems achieve state of the art results on static benchmarks, they often struggle in dynamic scenarios. In these settings, error signals from multiple contexts can interfere with one another…

    Submitted 25 April, 2022; v1 submitted 31 December, 2021; originally announced January 2022.

    Comments: 31 pages, 17 figures

    Journal ref: Frontiers in Neurorobotics 16 2022 (1-23)

  49. arXiv:2112.13896  [pdf, other]

    cs.LG cs.AI cs.AR cs.NE

    Two Sparsities Are Better Than One: Unlocking the Performance Benefits of Sparse-Sparse Networks

    Authors: Kevin Lee Hunter, Lawrence Spracklen, Subutai Ahmad

    Abstract: In principle, sparse neural networks should be significantly more efficient than traditional dense networks. Neurons in the brain exhibit two types of sparsity; they are sparsely interconnected and sparsely active. These two types of sparsity, called weight sparsity and activation sparsity, when combined, offer the potential to reduce the computational cost of neural networks by two orders of magn…

    Submitted 27 December, 2021; originally announced December 2021.

    Comments: 32 pages and 20 figures

  50. arXiv:2110.04390  [pdf, other]

    cs.RO

    Multi-Agent Autonomy: Advancements and Challenges in Subterranean Exploration

    Authors: Michael T. Ohradzansky, Eugene R. Rush, Danny G. Riley, Andrew B. Mills, Shakeeb Ahmad, Steve McGuire, Harel Biggie, Kyle Harlow, Michael J. Miles, Eric W. Frew, Christoffer Heckman, J. Sean Humbert

    Abstract: Artificial intelligence has undergone immense growth and maturation in recent years, though autonomous systems have traditionally struggled when fielded in diverse and previously unknown environments. DARPA is seeking to change that with the Subterranean Challenge, by providing roboticists the opportunity to support civilian and military first responders in complex and high-risk underground scenar…

    Submitted 8 October, 2021; originally announced October 2021.

    Comments: 39 pages, 21 figures, Field Robotics special issue: Advancements and lessons learned during Phase I & II of the DARPA Subterranean Challenge