subscribe to arXiv mailings

arXiv:2407.08754 [pdf]

Unraveling the Versatility and Impact of Multi-Objective Optimization: Algorithms, Applications, and Trends for Solving Complex Real-World Problems

Authors: Noor A. Rashed, Yossra H. Ali, Tarik A. Rashid, A. Salih

Abstract: Multi-Objective Optimization (MOO) techniques have become increasingly popular in recent years due to their potential for solving real-world problems in various fields, such as logistics, finance, environmental management, and engineering. These techniques offer comprehensive solutions that traditional single-objective approaches fail to provide. Due to the many innovative algorithms, it has been… ▽ More Multi-Objective Optimization (MOO) techniques have become increasingly popular in recent years due to their potential for solving real-world problems in various fields, such as logistics, finance, environmental management, and engineering. These techniques offer comprehensive solutions that traditional single-objective approaches fail to provide. Due to the many innovative algorithms, it has been challenging for researchers to choose the optimal algorithms for solving their problems. This paper examines recently developed MOO-based algorithms. MOO is introduced along with Pareto optimality and trade-off analysis. In real-world case studies, MOO algorithms address complicated decision-making challenges. This paper examines algorithmic methods, applications, trends, and issues in multi-objective optimization research. This exhaustive review explains MOO algorithms, their methods, and their applications to real-world problems. This paper aims to contribute further advancements in MOO research. No singular strategy is superior; instead, selecting a particular method depends on the natural optimization problem, the computational resources available, and the specific objectives of the optimization tasks. △ Less

Submitted 29 June, 2024; originally announced July 2024.

Comments: 21 pages

arXiv:2407.03369 [pdf]

FOXANN: A Method for Boosting Neural Network Performance

Authors: Mahmood A. Jumaah, Yossra H. Ali, Tarik A. Rashid, S. Vimal

Abstract: Artificial neural networks play a crucial role in machine learning and there is a need to improve their performance. This paper presents FOXANN, a novel classification model that combines the recently developed Fox optimizer with ANN to solve ML problems. Fox optimizer replaces the backpropagation algorithm in ANN; optimizes synaptic weights; and achieves high classification accuracy with a minimu… ▽ More Artificial neural networks play a crucial role in machine learning and there is a need to improve their performance. This paper presents FOXANN, a novel classification model that combines the recently developed Fox optimizer with ANN to solve ML problems. Fox optimizer replaces the backpropagation algorithm in ANN; optimizes synaptic weights; and achieves high classification accuracy with a minimum loss, improved model generalization, and interpretability. The performance of FOXANN is evaluated on three standard datasets: Iris Flower, Breast Cancer Wisconsin, and Wine. The results presented in this paper are derived from 100 epochs using 10-fold cross-validation, ensuring that all dataset samples are involved in both the training and validation stages. Moreover, the results show that FOXANN outperforms traditional ANN and logistic regression methods as well as other models proposed in the literature such as ABC-ANN, ABC-MNN, CROANN, and PSO-DNN, achieving a higher accuracy of 0.9969 and a lower validation loss of 0.0028. These results demonstrate that FOXANN is more effective than traditional methods and other proposed models across standard datasets. Thus, FOXANN effectively addresses the challenges in ML algorithms and improves classification performance. △ Less

Submitted 29 June, 2024; originally announced July 2024.

Comments: 12 pages

arXiv:2407.00518 [pdf, other]

When Robots Get Chatty: Grounding Multimodal Human-Robot Conversation and Collaboration

Authors: Philipp Allgeuer, Hassan Ali, Stefan Wermter

Abstract: We investigate the use of Large Language Models (LLMs) to equip neural robotic agents with human-like social and cognitive competencies, for the purpose of open-ended human-robot conversation and collaboration. We introduce a modular and extensible methodology for grounding an LLM with the sensory perceptions and capabilities of a physical robot, and integrate multiple deep learning models through… ▽ More We investigate the use of Large Language Models (LLMs) to equip neural robotic agents with human-like social and cognitive competencies, for the purpose of open-ended human-robot conversation and collaboration. We introduce a modular and extensible methodology for grounding an LLM with the sensory perceptions and capabilities of a physical robot, and integrate multiple deep learning models throughout the architecture in a form of system integration. The integrated models encompass various functions such as speech recognition, speech generation, open-vocabulary object detection, human pose estimation, and gesture detection, with the LLM serving as the central text-based coordinating unit. The qualitative and quantitative results demonstrate the huge potential of LLMs in providing emergent cognition and interactive language-oriented control of robots in a natural and social manner. △ Less

Submitted 29 June, 2024; originally announced July 2024.

Journal ref: International Conference on Artificial Neural Networks 2024

arXiv:2406.08344 [pdf, other]

Blind Image Deblurring using FFT-ReLU with Deep Learning Pipeline Integration

Authors: Abdul Mohaimen Al Radi, Prothito Shovon Majumder, Syed Mumtahin Mahmud, Mahdi Mohd Hossain Noki, Md. Haider Ali, Md. Mosaddek Khan

Abstract: Blind image deblurring is the process of deriving a sharp image and a blur kernel from a blurred image. Blurry images are typically modeled as the convolution of a sharp image with a blur kernel, necessitating the estimation of the unknown blur kernel to perform blind image deblurring effectively. Existing approaches primarily focus on domain-specific features of images, such as salient edges, dar… ▽ More Blind image deblurring is the process of deriving a sharp image and a blur kernel from a blurred image. Blurry images are typically modeled as the convolution of a sharp image with a blur kernel, necessitating the estimation of the unknown blur kernel to perform blind image deblurring effectively. Existing approaches primarily focus on domain-specific features of images, such as salient edges, dark channels, and light streaks. These features serve as probabilistic priors to enhance the estimation of the blur kernel. For improved generality, we propose a novel prior (ReLU sparsity prior) that estimates blur kernel effectively across all distributions of images (natural, facial, text, low-light, saturated etc). Our approach demonstrates superior efficiency, with inference times up to three times faster, while maintaining high accuracy in PSNR, SSIM, and error ratio metrics. We also observe noticeable improvement in the performance of the state-of-the-art architectures (in terms of aforementioned metrics) in deep learning based approaches when our method is used as a post-processing unit. △ Less

Submitted 12 June, 2024; originally announced June 2024.

Comments: 20 pages, 13 figures

arXiv:2406.00667 [pdf, other]

An Early Investigation into the Utility of Multimodal Large Language Models in Medical Imaging

Authors: Sulaiman Khan, Md. Rafiul Biswas, Alina Murad, Hazrat Ali, Zubair Shah

Abstract: Recent developments in multimodal large language models (MLLMs) have spurred significant interest in their potential applications across various medical imaging domains. On the one hand, there is a temptation to use these generative models to synthesize realistic-looking medical image data, while on the other hand, the ability to identify synthetic image data in a pool of data is also significantl… ▽ More Recent developments in multimodal large language models (MLLMs) have spurred significant interest in their potential applications across various medical imaging domains. On the one hand, there is a temptation to use these generative models to synthesize realistic-looking medical image data, while on the other hand, the ability to identify synthetic image data in a pool of data is also significantly important. In this study, we explore the potential of the Gemini (\textit{gemini-1.0-pro-vision-latest}) and GPT-4V (gpt-4-vision-preview) models for medical image analysis using two modalities of medical image data. Utilizing synthetic and real imaging data, both Gemini AI and GPT-4V are first used to classify real versus synthetic images, followed by an interpretation and analysis of the input images. Experimental results demonstrate that both Gemini and GPT-4 could perform some interpretation of the input images. In this specific experiment, Gemini was able to perform slightly better than the GPT-4V on the classification task. In contrast, responses associated with GPT-4V were mostly generic in nature. Our early investigation presented in this work provides insights into the potential of MLLMs to assist with the classification and interpretation of retinal fundoscopy and lung X-ray images. We also identify key limitations associated with the early investigation study on MLLMs for specialized tasks in medical image analysis. △ Less

Submitted 2 June, 2024; originally announced June 2024.

Comments: Accepted in Fifth IEEE Workshop on Artificial Intelligence for HealthCare, IEEE 25th International Conference on Information Reuse and Integration for Data Science

arXiv:2405.01612 [pdf]

Effective Delegation and Leadership in Software Management

Authors: Star Dawood Mirkhan, Skala Kamaran Omer, Hussein Mohammed Ali, Mahmood Yashar Hamza, Tarik Ahmed Rashid, Poornima Nedunchezhian

Abstract: Delegation and leadership are critical components of software management, as they play a crucial role in determining the success of the software development process. This study examined the relationship between delegation and leadership in software management and the impact of these factors on project outcomes. Results showed that effective delegation and transformational leadership styles can imp… ▽ More Delegation and leadership are critical components of software management, as they play a crucial role in determining the success of the software development process. This study examined the relationship between delegation and leadership in software management and the impact of these factors on project outcomes. Results showed that effective delegation and transformational leadership styles can improve workflow, enhance team motivation and productivity, and ultimately lead to successful software development projects. The findings of this study have important implications for software management practices, as they suggest that organizations and software managers should prioritize the development of effective delegation and leadership practices to ensure the success of their software development initiatives. Further research is needed to explore the complex interplay between delegation and leadership in software management and to identify best practices for improving these processes. △ Less

Submitted 2 May, 2024; originally announced May 2024.

Comments: 9 pages

arXiv:2405.01608 [pdf]

A Comprehensive Study on Automated Testing with the Software Lifecycle

Authors: Hussein Mohammed Ali, Mahmood Yashar Hamza, Tarik Ahmed Rashid

Abstract: The software development lifecycle depends heavily on the testing process, which is an essential part of finding issues and reviewing the quality of software. Software testing can be done in two ways: manually and automatically. With an emphasis on its primary function within the software lifecycle, the relevance of testing in general, and the advantages that come with it, this article aims to giv… ▽ More The software development lifecycle depends heavily on the testing process, which is an essential part of finding issues and reviewing the quality of software. Software testing can be done in two ways: manually and automatically. With an emphasis on its primary function within the software lifecycle, the relevance of testing in general, and the advantages that come with it, this article aims to give a thorough review of automated testing. Finding time- and cost-effective methods for software testing. The research examines how automated testing makes it easier to evaluate software quality, how it saves time as compared to manual testing, and how it differs from each of them in terms of benefits and drawbacks. The process of testing software applications is simplified, customized to certain testing situations, and can be successfully carried out by using automated testing tools. △ Less

Submitted 2 May, 2024; originally announced May 2024.

Comments: 9

arXiv:2404.15168 [pdf, other]

Artificial Neural Networks to Recognize Speakers Division from Continuous Bengali Speech

Authors: Hasmot Ali, Md. Fahad Hossain, Md. Mehedi Hasan, Sheikh Abujar, Sheak Rashed Haider Noori

Abstract: Voice based applications are ruling over the era of automation because speech has a lot of factors that determine a speakers information as well as speech. Modern Automatic Speech Recognition (ASR) is a blessing in the field of Human-Computer Interaction (HCI) for efficient communication among humans and devices using Artificial Intelligence technology. Speech is one of the easiest mediums of comm… ▽ More Voice based applications are ruling over the era of automation because speech has a lot of factors that determine a speakers information as well as speech. Modern Automatic Speech Recognition (ASR) is a blessing in the field of Human-Computer Interaction (HCI) for efficient communication among humans and devices using Artificial Intelligence technology. Speech is one of the easiest mediums of communication because it has a lot of identical features for different speakers. Nowadays it is possible to determine speakers and their identity using their speech in terms of speaker recognition. In this paper, we presented a method that will provide a speakers geographical identity in a certain region using continuous Bengali speech. We consider eight different divisions of Bangladesh as the geographical region. We applied the Mel Frequency Cepstral Coefficient (MFCC) and Delta features on an Artificial Neural Network to classify speakers division. We performed some preprocessing tasks like noise reduction and 8-10 second segmentation of raw audio before feature extraction. We used our dataset of more than 45 hours of audio data from 633 individual male and female speakers. We recorded the highest accuracy of 85.44%. △ Less

Submitted 18 April, 2024; originally announced April 2024.

arXiv:2404.08424 [pdf, other]

Comparing Apples to Oranges: LLM-powered Multimodal Intention Prediction in an Object Categorization Task

Authors: Hassan Ali, Philipp Allgeuer, Stefan Wermter

Abstract: Intention-based Human-Robot Interaction (HRI) systems allow robots to perceive and interpret user actions to proactively interact with humans and adapt to their behavior. Therefore, intention prediction is pivotal in creating a natural interactive collaboration between humans and robots. In this paper, we examine the use of Large Language Models (LLMs) for inferring human intention during a collab… ▽ More Intention-based Human-Robot Interaction (HRI) systems allow robots to perceive and interpret user actions to proactively interact with humans and adapt to their behavior. Therefore, intention prediction is pivotal in creating a natural interactive collaboration between humans and robots. In this paper, we examine the use of Large Language Models (LLMs) for inferring human intention during a collaborative object categorization task with a physical robot. We introduce a hierarchical approach for interpreting user non-verbal cues, like hand gestures, body poses, and facial expressions and combining them with environment states and user verbal cues captured using an existing Automatic Speech Recognition (ASR) system. Our evaluation demonstrates the potential of LLMs to interpret non-verbal cues and to combine them with their context-understanding capabilities and real-world knowledge to support intention prediction during human-robot interaction. △ Less

Submitted 12 April, 2024; originally announced April 2024.

arXiv:2403.05210 [pdf, other]

TIPS: Threat Sharing Information Platform for Enhanced Security

Authors: Lakshmi Rama Kiran Pasumarthy, Hisham Ali, William J Buchanan, Jawad Ahmad, Audun Josang, Vasileios Mavroeidis, Mouad Lemoudden

Abstract: There is an increasing need to share threat information for the prevention of widespread cyber-attacks. While threat-related information sharing can be conducted through traditional information exchange methods, such as email communications etc., these methods are often weak in terms of their trustworthiness and privacy. Additionally, the absence of a trust infrastructure between different informa… ▽ More There is an increasing need to share threat information for the prevention of widespread cyber-attacks. While threat-related information sharing can be conducted through traditional information exchange methods, such as email communications etc., these methods are often weak in terms of their trustworthiness and privacy. Additionally, the absence of a trust infrastructure between different information-sharing domains also poses significant challenges. These challenges include redactment of information, the Right-to-be-forgotten, and access control to the information-sharing elements. These access issues could be related to time bounds, the trusted deletion of data, and the location of accesses. This paper presents an abstraction of a trusted information-sharing process which integrates Attribute-Based Encryption (ABE), Homomorphic Encryption (HE) and Zero Knowledge Proof (ZKP) integrated into a permissioned ledger, specifically Hyperledger Fabric (HLF). It then provides a protocol exchange between two threat-sharing agents that share encrypted messages through a trusted channel. This trusted channel can only be accessed by those trusted in the sharing and could be enabled for each data-sharing element or set up for long-term sharing. △ Less

Submitted 8 March, 2024; originally announced March 2024.

arXiv:2402.18614 [pdf, other]

Deep Neural Network Models Trained With A Fixed Random Classifier Transfer Better Across Domains

Authors: Hafiz Tiomoko Ali, Umberto Michieli, Ji Joong Moon, Daehyun Kim, Mete Ozay

Abstract: The recently discovered Neural collapse (NC) phenomenon states that the last-layer weights of Deep Neural Networks (DNN), converge to the so-called Equiangular Tight Frame (ETF) simplex, at the terminal phase of their training. This ETF geometry is equivalent to vanishing within-class variability of the last layer activations. Inspired by NC properties, we explore in this paper the transferability… ▽ More The recently discovered Neural collapse (NC) phenomenon states that the last-layer weights of Deep Neural Networks (DNN), converge to the so-called Equiangular Tight Frame (ETF) simplex, at the terminal phase of their training. This ETF geometry is equivalent to vanishing within-class variability of the last layer activations. Inspired by NC properties, we explore in this paper the transferability of DNN models trained with their last layer weight fixed according to ETF. This enforces class separation by eliminating class covariance information, effectively providing implicit regularization. We show that DNN models trained with such a fixed classifier significantly improve transfer performance, particularly on out-of-domain datasets. On a broad range of fine-grained image classification datasets, our approach outperforms i) baseline methods that do not perform any covariance regularization (up to 22%), as well as ii) methods that explicitly whiten covariance of activations throughout training (up to 19%). Our findings suggest that DNNs trained with fixed ETF classifiers offer a powerful mechanism for improving transfer learning across domains. △ Less

Submitted 28 February, 2024; originally announced February 2024.

Comments: ICASSP 2024. Copyright 2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other

arXiv:2402.16562 [pdf]

Q-FOX Learning: Breaking Tradition in Reinforcement Learning

Authors: Mahmood A. Jumaah, Yossra H. Ali, Tarik A. Rashid

Abstract: Reinforcement learning (RL) is a subset of artificial intelligence (AI) where agents learn the best action by interacting with the environment, making it suitable for tasks that do not require labeled data or direct supervision. Hyperparameters (HP) tuning refers to choosing the best parameter that leads to optimal solutions in RL algorithms. Manual or random tuning of the HP may be a crucial proc… ▽ More Reinforcement learning (RL) is a subset of artificial intelligence (AI) where agents learn the best action by interacting with the environment, making it suitable for tasks that do not require labeled data or direct supervision. Hyperparameters (HP) tuning refers to choosing the best parameter that leads to optimal solutions in RL algorithms. Manual or random tuning of the HP may be a crucial process because variations in this parameter lead to changes in the overall learning aspects and different rewards. In this paper, a novel and automatic HP-tuning method called Q-FOX is proposed. This uses both the FOX optimizer, a new optimization method inspired by nature that mimics red foxes' hunting behavior, and the commonly used, easy-to-implement RL Q-learning algorithm to solve the problem of HP tuning. Moreover, a new objective function is proposed which prioritizes the reward over the mean squared error (MSE) and learning time (steps). Q-FOX has been evaluated on two OpenAI Gym environment control tasks: Cart Pole and Frozen Lake. It exposed greater cumulative rewards than HP tuning with other optimizers, such as PSO, GA, Bee, or randomly selected HP. The cumulative reward for the Cart Pole task was 32.08, and for the Frozen Lake task was 0.95. Despite the robustness of Q-FOX, it has limitations. It cannot be used directly in real-word problems before choosing the HP in a simulation environment because its processes work iteratively, making it time-consuming. The results indicate that Q-FOX has played an essential role in HP tuning for RL algorithms to effectively solve different control tasks. △ Less

Submitted 29 March, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

arXiv:2402.11050 [pdf, other]

Adaptive Constellation Multiple Access for Beyond 5G Wireless Systems

Authors: Indu L. Shakya, Falah H. Ali

Abstract: We propose a novel nonorthogonal multiple access (NOMA) scheme referred as adaptive constellation multiple access (ACMA) which addresses key limitations of existing NOMA schemes for beyond 5G wireless systems. Unlike the latter, that are often constrained in choices of allocation of power, modulations and phases to allow enough separation of clusters from users combined signals, ACMA is power, mod… ▽ More We propose a novel nonorthogonal multiple access (NOMA) scheme referred as adaptive constellation multiple access (ACMA) which addresses key limitations of existing NOMA schemes for beyond 5G wireless systems. Unlike the latter, that are often constrained in choices of allocation of power, modulations and phases to allow enough separation of clusters from users combined signals, ACMA is power, modulation and phase agnostic forming unified constellations instead where distances of all possible neighbouring points are optimized. It includes an algorithm at basestation (BS) calculating phase offsets for users signals such that, when combined, it gives best minimum Euclidean distance of points from all possibilities. The BS adaptively changes the phase offsets whenever system parameters change. We also propose an enhanced receiver using a modified maximum likelihood (MML) method that dynamically exploits information from the BS to blindly estimate correct phase offsets and exploit them to enhance data rate and error performances. Superiority of this scheme, which may also be referred to as AC NOMA, is verified through extensive analyses and simulations. △ Less

Submitted 28 February, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

Comments: 5 pages, 6 figures, Submission to an IEEE Journal

arXiv:2402.05158 [pdf, other]

Enhancement of Bengali OCR by Specialized Models and Advanced Techniques for Diverse Document Types

Authors: AKM Shahariar Azad Rabby, Hasmot Ali, Md. Majedul Islam, Sheikh Abujar, Fuad Rahman

Abstract: This research paper presents a unique Bengali OCR system with some capabilities. The system excels in reconstructing document layouts while preserving structure, alignment, and images. It incorporates advanced image and signature detection for accurate extraction. Specialized models for word segmentation cater to diverse document types, including computer-composed, letterpress, typewriter, and han… ▽ More This research paper presents a unique Bengali OCR system with some capabilities. The system excels in reconstructing document layouts while preserving structure, alignment, and images. It incorporates advanced image and signature detection for accurate extraction. Specialized models for word segmentation cater to diverse document types, including computer-composed, letterpress, typewriter, and handwritten documents. The system handles static and dynamic handwritten inputs, recognizing various writing styles. Furthermore, it has the ability to recognize compound characters in Bengali. Extensive data collection efforts provide a diverse corpus, while advanced technical components optimize character and word recognition. Additional contributions include image, logo, signature and table recognition, perspective correction, layout reconstruction, and a queuing module for efficient and scalable processing. The system demonstrates outstanding performance in efficient and accurate text extraction and analysis. △ Less

Submitted 7 February, 2024; originally announced February 2024.

Comments: 8 pages, 7 figures, 4 table Link of the paper https://openaccess.thecvf.com/content/WACV2024W/WVLL/html/Rabby_Enhancement_of_Bengali_OCR_by_Specialized_Models_and_Advanced_Techniques_WACVW_2024_paper.html

Journal ref: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) Workshops, 2024, pp. 1102-1109

arXiv:2401.15417 [pdf, other]

Fault Diagnosis on Induction Motor using Machine Learning and Signal Processing

Authors: Muhammad Samiullah, Hasan Ali, Shehryar Zahoor, Anas Ali

Abstract: The detection and identification of induction motor faults using machine learning and signal processing is a valuable approach to avoiding plant disturbances and shutdowns in the context of Industry 4.0. In this work, we present a study on the detection and identification of induction motor faults using machine learning and signal processing with MATLAB Simulink. We developed a model of a three-ph… ▽ More The detection and identification of induction motor faults using machine learning and signal processing is a valuable approach to avoiding plant disturbances and shutdowns in the context of Industry 4.0. In this work, we present a study on the detection and identification of induction motor faults using machine learning and signal processing with MATLAB Simulink. We developed a model of a three-phase induction motor in MATLAB Simulink to generate healthy and faulty motor data. The data collected included stator currents, rotor currents, input power, slip, rotor speed, and efficiency. We generated four faults in the induction motor: open circuit fault, short circuit fault, overload, and broken rotor bars. We collected a total of 150,000 data points with a 60-40% ratio of healthy to faulty motor data. We applied Fast Fourier Transform (FFT) to detect and identify healthy and unhealthy conditions and added a distinctive feature in our data. The generated dataset was trained different machine learning models. On comparing the accuracy of the models on the test set, we concluded that the Decision Tree algorithm performed the best with an accuracy of about 92%. Our study contributes to the literature by providing a valuable approach to fault detection and classification with machine learning models for industrial applications. △ Less

Submitted 27 January, 2024; originally announced January 2024.

Comments: 6 pages, 17 figures, 2 tables

arXiv:2401.07216 [pdf, other]

doi 10.1145/3627508.3638309

Walert: Putting Conversational Search Knowledge into Action by Building and Evaluating a Large Language Model-Powered Chatbot

Authors: Sachin Pathiyan Cherumanal, Lin Tian, Futoon M. Abushaqra, Angel Felipe Magnossao de Paula, Kaixin Ji, Danula Hettiachchi, Johanne R. Trippas, Halil Ali, Falk Scholer, Damiano Spina

Abstract: Creating and deploying customized applications is crucial for operational success and enriching user experiences in the rapidly evolving modern business world. A prominent facet of modern user experiences is the integration of chatbots or voice assistants. The rapid evolution of Large Language Models (LLMs) has provided a powerful tool to build conversational applications. We present Walert, a cus… ▽ More Creating and deploying customized applications is crucial for operational success and enriching user experiences in the rapidly evolving modern business world. A prominent facet of modern user experiences is the integration of chatbots or voice assistants. The rapid evolution of Large Language Models (LLMs) has provided a powerful tool to build conversational applications. We present Walert, a customized LLM-based conversational agent able to answer frequently asked questions about computer science degrees and programs at RMIT University. Our demo aims to showcase how conversational information-seeking researchers can effectively communicate the benefits of using best practices to stakeholders interested in developing and deploying LLM-based chatbots. These practices are well-known in our community but often overlooked by practitioners who may not have access to this knowledge. The methodology and resources used in this demo serve as a bridge to facilitate knowledge transfer from experts, address industry professionals' practical needs, and foster a collaborative environment. The data and code of the demo are available at https://github.com/rmit-ir/walert. △ Less

Submitted 14 January, 2024; originally announced January 2024.

Comments: Accepted at 2024 ACM SIGIR CHIIR

arXiv:2401.03389 [pdf]

doi 10.33166/AETiC.2024.01.001

Optimisation and Performance Computation of a Phase Frequency Detector Module for IoT Devices

Authors: Md. Shahriar Khan Hemel, Mamun Bin Ibne Reaz, Sawal Hamid Bin Md Ali, Mohammad Arif Sobhan Bhuiyan, Mahdi H. Miraz

Abstract: The Internet of Things (IoT) is pivotal in transforming the way we live and interact with our surroundings. To cope with the advancement in technologies, it is vital to acquire accuracy with the speed. A phase frequency detector (PFD) is a critical device to regulate and provide accurate frequency in IoT devices. Designing a PFD poses challenges in achieving precise phase detection, minimising dea… ▽ More The Internet of Things (IoT) is pivotal in transforming the way we live and interact with our surroundings. To cope with the advancement in technologies, it is vital to acquire accuracy with the speed. A phase frequency detector (PFD) is a critical device to regulate and provide accurate frequency in IoT devices. Designing a PFD poses challenges in achieving precise phase detection, minimising dead zones, optimising power consumption, and ensuring robust performance across various operational frequencies, necessitating complex engineering and innovative solutions. This study delves into optimising a PFD circuit, designed using 90 nm standard CMOS technology, aiming to achieve superior operational frequencies. An efficient and high-frequency PFD design is crafted and analysed using cadence virtuoso. The study focused on investigating the impact of optimising PFD design. With the optimised PFD, an operational frequency of 5 GHz has been achieved, along with a power consumption of only 29 μW. The dead zone of the PFD was only 25 ps. △ Less

Submitted 6 January, 2024; originally announced January 2024.

Journal ref: Annals of Emerging Technologies in Computing (AETiC), Print ISSN: 2516-0281, Online ISSN: 2516-029X, pp. 13-21, Vol. 8, No. 1, 1st January 2024, Available: http://aetic.theiaer.org/archive/v8/v8n1/p1.html

arXiv:2401.03191 [pdf, other]

DistFormer: Enhancing Local and Global Features for Monocular Per-Object Distance Estimation

Authors: Aniello Panariello, Gianluca Mancusi, Fedy Haj Ali, Angelo Porrello, Simone Calderara, Rita Cucchiara

Abstract: Accurate per-object distance estimation is crucial in safety-critical applications such as autonomous driving, surveillance, and robotics. Existing approaches rely on two scales: local information (i.e., the bounding box proportions) or global information, which encodes the semantics of the scene as well as the spatial relations with neighboring objects. However, these approaches may struggle with… ▽ More Accurate per-object distance estimation is crucial in safety-critical applications such as autonomous driving, surveillance, and robotics. Existing approaches rely on two scales: local information (i.e., the bounding box proportions) or global information, which encodes the semantics of the scene as well as the spatial relations with neighboring objects. However, these approaches may struggle with long-range objects and in the presence of strong occlusions or unusual visual patterns. In this respect, our work aims to strengthen both local and global cues. Our architecture -- named DistFormer -- builds upon three major components acting jointly: i) a robust context encoder extracting fine-grained per-object representations; ii) a masked encoder-decoder module exploiting self-supervision to promote the learning of useful per-object features; iii) a global refinement module that aggregates object representations and computes a joint, spatially-consistent estimation. To evaluate the effectiveness of DistFormer, we conduct experiments on the standard KITTI dataset and the large-scale NuScenes and MOTSynth datasets. Such datasets cover various indoor/outdoor environments, changing weather conditions, appearances, and camera viewpoints. Our comprehensive analysis shows that DistFormer outperforms existing methods. Moreover, we further delve into its generalization capabilities, showing its regularization benefits in zero-shot synth-to-real transfer. △ Less

Submitted 6 January, 2024; originally announced January 2024.

arXiv:2312.09162 [pdf, other]

Approximation Algorithms for Preference Aggregation Using CP-Nets

Authors: Abu Mohammmad Hammad Ali, Boting Yang, Sandra Zilles

Abstract: This paper studies the design and analysis of approximation algorithms for aggregating preferences over combinatorial domains, represented using Conditional Preference Networks (CP-nets). Its focus is on aggregating preferences over so-called \emph{swaps}, for which optimal solutions in general are already known to be of exponential size. We first analyze a trivial 2-approximation algorithm that s… ▽ More This paper studies the design and analysis of approximation algorithms for aggregating preferences over combinatorial domains, represented using Conditional Preference Networks (CP-nets). Its focus is on aggregating preferences over so-called \emph{swaps}, for which optimal solutions in general are already known to be of exponential size. We first analyze a trivial 2-approximation algorithm that simply outputs the best of the given input preferences, and establish a structural condition under which the approximation ratio of this algorithm is improved to $4/3$. We then propose a polynomial-time approximation algorithm whose outputs are provably no worse than those of the trivial algorithm, but often substantially better. A family of problem instances is presented for which our improved algorithm produces optimal solutions, while, for any $\varepsilon$, the trivial algorithm can\emph{not}\/ attain a $(2-\varepsilon)$-approximation. These results may lead to the first polynomial-time approximation algorithm that solves the CP-net aggregation problem for swaps with an approximation ratio substantially better than $2$. △ Less

Submitted 15 December, 2023; v1 submitted 14 December, 2023; originally announced December 2023.

Comments: 11 pages, main body and appendix. Full version of a paper accepted at the 38th Annual AAAI Conference on Artificial Intelligence

arXiv:2312.02699 [pdf, other]

Enhancing Vehicle Entrance and Parking Management: Deep Learning Solutions for Efficiency and Security

Authors: Muhammad Umer Ramzan, Usman Ali, Syed Haider Abbas Naqvi, Zeeshan Aslam, Tehseen, Husnain Ali, Muhammad Faheem

Abstract: The auto-management of vehicle entrance and parking in any organization is a complex challenge encompassing record-keeping, efficiency, and security concerns. Manual methods for tracking vehicles and finding parking spaces are slow and a waste of time. To solve the problem of auto management of vehicle entrance and parking, we have utilized state-of-the-art deep learning models and automated the p… ▽ More The auto-management of vehicle entrance and parking in any organization is a complex challenge encompassing record-keeping, efficiency, and security concerns. Manual methods for tracking vehicles and finding parking spaces are slow and a waste of time. To solve the problem of auto management of vehicle entrance and parking, we have utilized state-of-the-art deep learning models and automated the process of vehicle entrance and parking into any organization. To ensure security, our system integrated vehicle detection, license number plate verification, and face detection and recognition models to ensure that the person and vehicle are registered with the organization. We have trained multiple deep-learning models for vehicle detection, license number plate detection, face detection, and recognition, however, the YOLOv8n model outperformed all the other models. Furthermore, License plate recognition is facilitated by Google's Tesseract-OCR Engine. By integrating these technologies, the system offers efficient vehicle detection, precise identification, streamlined record keeping, and optimized parking slot allocation in buildings, thereby enhancing convenience, accuracy, and security. Future research opportunities lie in fine-tuning system performance for a wide range of real-world applications. △ Less

Submitted 5 December, 2023; originally announced December 2023.

Comments: Accepted for publication in the 25th International Multitopic Conference (INMIC) IEEE 2023, 6 Pages, 3 figures

arXiv:2311.16895 [pdf, other]

Optimization Theory Based Deep Reinforcement Learning for Resource Allocation in Ultra-Reliable Wireless Networked Control Systems

Authors: Hamida Qumber Ali, Amirhassan Babazadeh Darabi, Sinem Coleri

Abstract: The design of Wireless Networked Control System (WNCS) requires addressing critical interactions between control and communication systems with minimal complexity and communication overhead while providing ultra-high reliability. This paper introduces a novel optimization theory based deep reinforcement learning (DRL) framework for the joint design of controller and communication systems. The obje… ▽ More The design of Wireless Networked Control System (WNCS) requires addressing critical interactions between control and communication systems with minimal complexity and communication overhead while providing ultra-high reliability. This paper introduces a novel optimization theory based deep reinforcement learning (DRL) framework for the joint design of controller and communication systems. The objective of minimum power consumption is targeted while satisfying the schedulability and rate constraints of the communication system in the finite blocklength regime and stability constraint of the control system. Decision variables include the sampling period in the control system, and blocklength and packet error probability in the communication system. The proposed framework contains two stages: optimization theory and DRL. In the optimization theory stage, following the formulation of the joint optimization problem, optimality conditions are derived to find the mathematical relations between the optimal values of the decision variables. These relations allow the decomposition of the problem into multiple building blocks. In the DRL stage, the blocks that are simplified but not tractable are replaced by DRL. Via extensive simulations, the proposed optimization theory based DRL approach is demonstrated to outperform the optimization theory and pure DRL based approaches, with close to optimal performance and much lower complexity. △ Less

Submitted 19 December, 2023; v1 submitted 28 November, 2023; originally announced November 2023.

Comments: 14 pages, 11 figures

arXiv:2311.08774 [pdf, other]

Two-stage Joint Transductive and Inductive learning for Nuclei Segmentation

Authors: Hesham Ali, Idriss Tondji, Mennatullah Siam

Abstract: AI-assisted nuclei segmentation in histopathological images is a crucial task in the diagnosis and treatment of cancer diseases. It decreases the time required to manually screen microscopic tissue images and can resolve the conflict between pathologists during diagnosis. Deep Learning has proven useful in such a task. However, lack of labeled data is a significant barrier for deep learning-based… ▽ More AI-assisted nuclei segmentation in histopathological images is a crucial task in the diagnosis and treatment of cancer diseases. It decreases the time required to manually screen microscopic tissue images and can resolve the conflict between pathologists during diagnosis. Deep Learning has proven useful in such a task. However, lack of labeled data is a significant barrier for deep learning-based approaches. In this study, we propose a novel approach to nuclei segmentation that leverages the available labelled and unlabelled data. The proposed method combines the strengths of both transductive and inductive learning, which have been previously attempted separately, into a single framework. Inductive learning aims at approximating the general function and generalizing to unseen test data, while transductive learning has the potential of leveraging the unlabelled test data to improve the classification. To the best of our knowledge, this is the first study to propose such a hybrid approach for medical image segmentation. Moreover, we propose a novel two-stage transductive inference scheme. We evaluate our approach on MoNuSeg benchmark to demonstrate the efficacy and potential of our method. △ Less

Submitted 17 November, 2023; v1 submitted 15 November, 2023; originally announced November 2023.

Comments: 5 pages

arXiv:2310.05032 [pdf, other]

PASSION: Permissioned Access Control for Segmented Devices and Identity for IoT Networks

Authors: Hisham Ali, Mwrwan Abubakar, Jawad Ahmad, William J. Buchanan, Zakwan Jaroucheh

Abstract: In recent years, there has been a significant proliferation of industrial Internet of Things (IoT) applications, with a wide variety of use cases being developed and put into operation. As the industrial IoT landscape expands, the establishment of secure and reliable infrastructure becomes crucial to instil trust among users and stakeholders, particularly in addressing fundamental concerns such as… ▽ More In recent years, there has been a significant proliferation of industrial Internet of Things (IoT) applications, with a wide variety of use cases being developed and put into operation. As the industrial IoT landscape expands, the establishment of secure and reliable infrastructure becomes crucial to instil trust among users and stakeholders, particularly in addressing fundamental concerns such as traceability, integrity protection, and privacy that some industries still encounter today. This paper introduces a privacy-preserving method in the industry's IoT systems using blockchain-based data access control for remote industry safety monitoring and maintaining event information confidentiality, integrity and authenticity. △ Less

Submitted 8 October, 2023; originally announced October 2023.

arXiv:2310.03614 [pdf]

Adversarial Machine Learning for Social Good: Reframing the Adversary as an Ally

Authors: Shawqi Al-Maliki, Adnan Qayyum, Hassan Ali, Mohamed Abdallah, Junaid Qadir, Dinh Thai Hoang, Dusit Niyato, Ala Al-Fuqaha

Abstract: Deep Neural Networks (DNNs) have been the driving force behind many of the recent advances in machine learning. However, research has shown that DNNs are vulnerable to adversarial examples -- input samples that have been perturbed to force DNN-based models to make errors. As a result, Adversarial Machine Learning (AdvML) has gained a lot of attention, and researchers have investigated these vulner… ▽ More Deep Neural Networks (DNNs) have been the driving force behind many of the recent advances in machine learning. However, research has shown that DNNs are vulnerable to adversarial examples -- input samples that have been perturbed to force DNN-based models to make errors. As a result, Adversarial Machine Learning (AdvML) has gained a lot of attention, and researchers have investigated these vulnerabilities in various settings and modalities. In addition, DNNs have also been found to incorporate embedded bias and often produce unexplainable predictions, which can result in anti-social AI applications. The emergence of new AI technologies that leverage Large Language Models (LLMs), such as ChatGPT and GPT-4, increases the risk of producing anti-social applications at scale. AdvML for Social Good (AdvML4G) is an emerging field that repurposes the AdvML bug to invent pro-social applications. Regulators, practitioners, and researchers should collaborate to encourage the development of pro-social applications and hinder the development of anti-social ones. In this work, we provide the first comprehensive review of the emerging field of AdvML4G. This paper encompasses a taxonomy that highlights the emergence of AdvML4G, a discussion of the differences and similarities between AdvML4G and AdvML, a taxonomy covering social good-related concepts and aspects, an exploration of the motivations behind the emergence of AdvML4G at the intersection of ML4G and AdvML, and an extensive summary of the works that utilize AdvML4G as an auxiliary tool for innovating pro-social applications. Finally, we elaborate upon various challenges and open research issues that require significant attention from the research community. △ Less

Submitted 5 October, 2023; originally announced October 2023.

arXiv:2310.01426 [pdf, other]

REMEDI: REinforcement learning-driven adaptive MEtabolism modeling of primary sclerosing cholangitis DIsease progression

Authors: Chang Hu, Krishnakant V. Saboo, Ahmad H. Ali, Brian D. Juran, Konstantinos N. Lazaridis, Ravishankar K. Iyer

Abstract: Primary sclerosing cholangitis (PSC) is a rare disease wherein altered bile acid metabolism contributes to sustained liver injury. This paper introduces REMEDI, a framework that captures bile acid dynamics and the body's adaptive response during PSC progression that can assist in exploring treatments. REMEDI merges a differential equation (DE)-based mechanistic model that describes bile acid metab… ▽ More Primary sclerosing cholangitis (PSC) is a rare disease wherein altered bile acid metabolism contributes to sustained liver injury. This paper introduces REMEDI, a framework that captures bile acid dynamics and the body's adaptive response during PSC progression that can assist in exploring treatments. REMEDI merges a differential equation (DE)-based mechanistic model that describes bile acid metabolism with reinforcement learning (RL) to emulate the body's adaptations to PSC continuously. An objective of adaptation is to maintain homeostasis by regulating enzymes involved in bile acid metabolism. These enzymes correspond to the parameters of the DEs. REMEDI leverages RL to approximate adaptations in PSC, treating homeostasis as a reward signal and the adjustment of the DE parameters as the corresponding actions. On real-world data, REMEDI generated bile acid dynamics and parameter adjustments consistent with published findings. Also, our results support discussions in the literature that early administration of drugs that suppress bile acid synthesis may be effective in PSC treatment. △ Less

Submitted 2 October, 2023; originally announced October 2023.

Comments: 8 pages, 5 figures, 4 appendices

arXiv:2309.11476 [pdf]

CellSecure: Securing Image Data in Industrial Internet-of-Things via Cellular Automata and Chaos-Based Encryption

Authors: Hassan Ali, Muhammad Shahbaz Khan, Maha Driss, Jawad Ahmad, William J. Buchanan, Nikolaos Pitropakis

Abstract: In the era of Industrial IoT (IIoT) and Industry 4.0, ensuring secure data transmission has become a critical concern. Among other data types, images are widely transmitted and utilized across various IIoT applications, ranging from sensor-generated visual data and real-time remote monitoring to quality control in production lines. The encryption of these images is essential for maintaining operat… ▽ More In the era of Industrial IoT (IIoT) and Industry 4.0, ensuring secure data transmission has become a critical concern. Among other data types, images are widely transmitted and utilized across various IIoT applications, ranging from sensor-generated visual data and real-time remote monitoring to quality control in production lines. The encryption of these images is essential for maintaining operational integrity, data confidentiality, and seamless integration with analytics platforms. This paper addresses these critical concerns by proposing a robust image encryption algorithm tailored for IIoT and Cyber-Physical Systems (CPS). The algorithm combines Rule-30 cellular automata with chaotic scrambling and substitution. The Rule 30 cellular automata serves as an efficient mechanism for generating pseudo-random sequences that enable fast encryption and decryption cycles suitable for real-time sensor data in industrial settings. Most importantly, it induces non-linearity in the encryption algorithm. Furthermore, to increase the chaotic range and keyspace of the algorithm, which is vital for security in distributed industrial networks, a hybrid chaotic map, i.e., logistic-sine map is utilized. Extensive security analysis has been carried out to validate the efficacy of the proposed algorithm. Results indicate that our algorithm achieves close-to-ideal values, with an entropy of 7.99 and a correlation of 0.002. This enhances the algorithm's resilience against potential cyber-attacks in the industrial domain. △ Less

Submitted 20 September, 2023; originally announced September 2023.

arXiv:2309.02783 [pdf]

Improving diagnosis and prognosis of lung cancer using vision transformers: A scoping review

Authors: Hazrat Ali, Farida Mohsen, Zubair Shah

Abstract: Vision transformer-based methods are advancing the field of medical artificial intelligence and cancer imaging, including lung cancer applications. Recently, many researchers have developed vision transformer-based AI methods for lung cancer diagnosis and prognosis. This scoping review aims to identify the recent developments on vision transformer-based AI methods for lung cancer imaging applicati… ▽ More Vision transformer-based methods are advancing the field of medical artificial intelligence and cancer imaging, including lung cancer applications. Recently, many researchers have developed vision transformer-based AI methods for lung cancer diagnosis and prognosis. This scoping review aims to identify the recent developments on vision transformer-based AI methods for lung cancer imaging applications. It provides key insights into how vision transformers complemented the performance of AI and deep learning methods for lung cancer. Furthermore, the review also identifies the datasets that contributed to advancing the field. Of the 314 retrieved studies, this review included 34 studies published from 2020 to 2022. The most commonly addressed task in these studies was the classification of lung cancer types, such as lung squamous cell carcinoma versus lung adenocarcinoma, and identifying benign versus malignant pulmonary nodules. Other applications included survival prediction of lung cancer patients and segmentation of lungs. The studies lacked clear strategies for clinical transformation. SWIN transformer was a popular choice of the researchers; however, many other architectures were also reported where vision transformer was combined with convolutional neural networks or UNet model. It can be concluded that vision transformer-based models are increasingly in popularity for developing AI methods for lung cancer applications. However, their computational complexity and clinical relevance are important factors to be considered for future research work. This review provides valuable insights for researchers in the field of AI and healthcare to advance the state-of-the-art in lung cancer diagnosis and prognosis. We provide an interactive dashboard on lung-cancer.onrender.com/. △ Less

Submitted 6 September, 2023; originally announced September 2023.

Comments: submitted to BMC Medical Imaging journal

arXiv:2308.10834 [pdf, other]

SRSS: A New Chaos-Based Single-Round Single S-Box Image Encryption Scheme for Highly Auto-Correlated Data

Authors: Muhammad Shahbaz Khan, Jawad Ahmad, Hisham Ali, Nikolaos Pitropakis, Ahmed Al-Dubai, Baraq Ghaleb, William J. Buchanan

Abstract: With the advent of digital communication, securing digital images during transmission and storage has become a critical concern. The traditional s-box substitution methods often fail to effectively conceal the information within highly auto-correlated regions of an image. This paper addresses the security issues presented by three prevalent S-box substitution methods, i.e., single S-box, multiple… ▽ More With the advent of digital communication, securing digital images during transmission and storage has become a critical concern. The traditional s-box substitution methods often fail to effectively conceal the information within highly auto-correlated regions of an image. This paper addresses the security issues presented by three prevalent S-box substitution methods, i.e., single S-box, multiple S-boxes, and multiple rounds with multiple S-boxes, especially when handling images with highly auto-correlated pixels. To resolve the addressed security issues, this paper proposes a new scheme SRSS-the Single Round Single S-Box encryption scheme. SRSS uses a single S-box for substitution in just one round to break the pixel correlations and encrypt the plaintext image effectively. Additionally, this paper introduces a new Chaos-based Random Operation Selection System-CROSS, which nullifies the requirement for multiple S-boxes, thus reducing the encryption scheme's complexity. By randomly selecting the operation to be performed on each pixel, driven by a chaotic sequence, the proposed scheme effectively scrambles even high auto-correlation areas. When compared to the substitution methods mentioned above, the proposed encryption scheme exhibited exceptionally well in just a single round with a single S-box. The close-to-ideal statistical security analysis results, i.e., an entropy of 7.89 and a correlation coefficient of 0.007, validate the effectiveness of the proposed scheme. This research offers an innovative path forward for securing images in applications requiring low computational complexity and fast encryption and decryption speeds. △ Less

Submitted 21 August, 2023; originally announced August 2023.

Comments: 6 Pages

arXiv:2308.06393 [pdf, other]

R2S100K: Road-Region Segmentation Dataset For Semi-Supervised Autonomous Driving in the Wild

Authors: Muhammad Atif Butt, Hassan Ali, Adnan Qayyum, Waqas Sultani, Ala Al-Fuqaha, Junaid Qadir

Abstract: Semantic understanding of roadways is a key enabling factor for safe autonomous driving. However, existing autonomous driving datasets provide well-structured urban roads while ignoring unstructured roadways containing distress, potholes, water puddles, and various kinds of road patches i.e., earthen, gravel etc. To this end, we introduce Road Region Segmentation dataset (R2S100K) -- a large-scale… ▽ More Semantic understanding of roadways is a key enabling factor for safe autonomous driving. However, existing autonomous driving datasets provide well-structured urban roads while ignoring unstructured roadways containing distress, potholes, water puddles, and various kinds of road patches i.e., earthen, gravel etc. To this end, we introduce Road Region Segmentation dataset (R2S100K) -- a large-scale dataset and benchmark for training and evaluation of road segmentation in aforementioned challenging unstructured roadways. R2S100K comprises 100K images extracted from a large and diverse set of video sequences covering more than 1000 KM of roadways. Out of these 100K privacy respecting images, 14,000 images have fine pixel-labeling of road regions, with 86,000 unlabeled images that can be leveraged through semi-supervised learning methods. Alongside, we present an Efficient Data Sampling (EDS) based self-training framework to improve learning by leveraging unlabeled data. Our experimental results demonstrate that the proposed method significantly improves learning methods in generalizability and reduces the labeling cost for semantic segmentation tasks. Our benchmark will be publicly available to facilitate future research at https://r2s100k.github.io/. △ Less

Submitted 11 August, 2023; originally announced August 2023.

arXiv:2308.01270 [pdf]

BCDDO: Binary Child Drawing Development Optimization

Authors: Abubakr S. Issa, Yossra H. Ali, Tarik A. Rashid

Abstract: A lately created metaheuristic algorithm called Child Drawing Development Optimization (CDDO) has proven to be effective in a number of benchmark tests. A Binary Child Drawing Development Optimization (BCDDO) is suggested for choosing the wrapper features in this study. To achieve the best classification accuracy, a subset of crucial features is selected using the suggested BCDDO. The proposed fea… ▽ More A lately created metaheuristic algorithm called Child Drawing Development Optimization (CDDO) has proven to be effective in a number of benchmark tests. A Binary Child Drawing Development Optimization (BCDDO) is suggested for choosing the wrapper features in this study. To achieve the best classification accuracy, a subset of crucial features is selected using the suggested BCDDO. The proposed feature selection technique's efficiency and effectiveness are assessed using the Harris Hawk, Grey Wolf, Salp, and Whale optimization algorithms. The suggested approach has significantly outperformed the previously discussed techniques in the area of feature selection to increase classification accuracy. Moderate COVID, breast cancer, and big COVID are the three datasets utilized in this study. The classification accuracy for each of the three datasets was (98.75, 98.83%, and 99.36) accordingly. △ Less

Submitted 11 April, 2024; v1 submitted 19 July, 2023; originally announced August 2023.

Comments: 13 pages

arXiv:2307.05193 [pdf, other]

Membership Inference Attacks on DNNs using Adversarial Perturbations

Authors: Hassan Ali, Adnan Qayyum, Ala Al-Fuqaha, Junaid Qadir

Abstract: Several membership inference (MI) attacks have been proposed to audit a target DNN. Given a set of subjects, MI attacks tell which subjects the target DNN has seen during training. This work focuses on the post-training MI attacks emphasizing high confidence membership detection -- True Positive Rates (TPR) at low False Positive Rates (FPR). Current works in this category -- likelihood ratio attac… ▽ More Several membership inference (MI) attacks have been proposed to audit a target DNN. Given a set of subjects, MI attacks tell which subjects the target DNN has seen during training. This work focuses on the post-training MI attacks emphasizing high confidence membership detection -- True Positive Rates (TPR) at low False Positive Rates (FPR). Current works in this category -- likelihood ratio attack (LiRA) and enhanced MI attack (EMIA) -- only perform well on complex datasets (e.g., CIFAR-10 and Imagenet) where the target DNN overfits its train set, but perform poorly on simpler datasets (0% TPR by both attacks on Fashion-MNIST, 2% and 0% TPR respectively by LiRA and EMIA on MNIST at 1% FPR). To address this, firstly, we unify current MI attacks by presenting a framework divided into three stages -- preparation, indication and decision. Secondly, we utilize the framework to propose two novel attacks: (1) Adversarial Membership Inference Attack (AMIA) efficiently utilizes the membership and the non-membership information of the subjects while adversarially minimizing a novel loss function, achieving 6% TPR on both Fashion-MNIST and MNIST datasets; and (2) Enhanced AMIA (E-AMIA) combines EMIA and AMIA to achieve 8% and 4% TPRs on Fashion-MNIST and MNIST datasets respectively, at 1% FPR. Thirdly, we introduce two novel augmented indicators that positively leverage the loss information in the Gaussian neighborhood of a subject. This improves TPR of all four attacks on average by 2.5% and 0.25% respectively on Fashion-MNIST and MNIST datasets at 1% FPR. Finally, we propose simple, yet novel, evaluation metric, the running TPR average (RTA) at a given FPR, that better distinguishes different MI attacks in the low FPR region. We also show that AMIA and E-AMIA are more transferable to the unknown DNNs (other than the target DNN) and are more robust to DP-SGD training as compared to LiRA and EMIA. △ Less

Submitted 11 July, 2023; originally announced July 2023.

arXiv:2307.01232 [pdf, other]

Robust Surgical Tools Detection in Endoscopic Videos with Noisy Data

Authors: Adnan Qayyum, Hassan Ali, Massimo Caputo, Hunaid Vohra, Taofeek Akinosho, Sofiat Abioye, Ilhem Berrou, Paweł Capik, Junaid Qadir, Muhammad Bilal

Abstract: Over the past few years, surgical data science has attracted substantial interest from the machine learning (ML) community. Various studies have demonstrated the efficacy of emerging ML techniques in analysing surgical data, particularly recordings of procedures, for digitizing clinical and non-clinical functions like preoperative planning, context-aware decision-making, and operating skill assess… ▽ More Over the past few years, surgical data science has attracted substantial interest from the machine learning (ML) community. Various studies have demonstrated the efficacy of emerging ML techniques in analysing surgical data, particularly recordings of procedures, for digitizing clinical and non-clinical functions like preoperative planning, context-aware decision-making, and operating skill assessment. However, this field is still in its infancy and lacks representative, well-annotated datasets for training robust models in intermediate ML tasks. Also, existing datasets suffer from inaccurate labels, hindering the development of reliable models. In this paper, we propose a systematic methodology for developing robust models for surgical tool detection using noisy data. Our methodology introduces two key innovations: (1) an intelligent active learning strategy for minimal dataset identification and label correction by human experts; and (2) an assembling strategy for a student-teacher model-based self-training framework to achieve the robust classification of 14 surgical tools in a semi-supervised fashion. Furthermore, we employ weighted data loaders to handle difficult class labels and address class imbalance issues. The proposed methodology achieves an average F1-score of 85.88\% for the ensemble model-based self-training with class weights, and 80.88\% without class weights for noisy labels. Also, our proposed method significantly outperforms existing approaches, which effectively demonstrates its effectiveness. △ Less

Submitted 3 July, 2023; originally announced July 2023.

arXiv:2304.07514 [pdf, other]

PI-FL: Personalized and Incentivized Federated Learning

Authors: Ahmad Faraz Khan, Xinran Wang, Qi Le, Azal Ahmad Khan, Haider Ali, Jie Ding, Ali Butt, Ali Anwar

Abstract: Personalized FL has been widely used to cater to heterogeneity challenges with non-IID data. A primary obstacle is considering the personalization process from the client's perspective to preserve their autonomy. Allowing the clients to participate in personalized FL decisions becomes significant due to privacy and security concerns, where the clients may not be at liberty to share private informa… ▽ More Personalized FL has been widely used to cater to heterogeneity challenges with non-IID data. A primary obstacle is considering the personalization process from the client's perspective to preserve their autonomy. Allowing the clients to participate in personalized FL decisions becomes significant due to privacy and security concerns, where the clients may not be at liberty to share private information necessary for producing good quality personalized models. Moreover, clients with high-quality data and resources are reluctant to participate in the FL process without reasonable incentive. In this paper, we propose PI-FL, a one-shot personalization solution complemented by a token-based incentive mechanism that rewards personalized training. PI-FL outperforms other state-of-the-art approaches and can generate good-quality personalized models while respecting clients' privacy. △ Less

Submitted 27 April, 2023; v1 submitted 15 April, 2023; originally announced April 2023.

arXiv:2304.06020 [pdf, other]

VidStyleODE: Disentangled Video Editing via StyleGAN and NeuralODEs

Authors: Moayed Haji Ali, Andrew Bond, Tolga Birdal, Duygu Ceylan, Levent Karacan, Erkut Erdem, Aykut Erdem

Abstract: We propose $\textbf{VidStyleODE}$, a spatiotemporally continuous disentangled $\textbf{Vid}$eo representation based upon $\textbf{Style}$GAN and Neural-$\textbf{ODE}$s. Effective traversal of the latent space learned by Generative Adversarial Networks (GANs) has been the basis for recent breakthroughs in image editing. However, the applicability of such advancements to the video domain has been hi… ▽ More We propose $\textbf{VidStyleODE}$, a spatiotemporally continuous disentangled $\textbf{Vid}$eo representation based upon $\textbf{Style}$GAN and Neural-$\textbf{ODE}$s. Effective traversal of the latent space learned by Generative Adversarial Networks (GANs) has been the basis for recent breakthroughs in image editing. However, the applicability of such advancements to the video domain has been hindered by the difficulty of representing and controlling videos in the latent space of GANs. In particular, videos are composed of content (i.e., appearance) and complex motion components that require a special mechanism to disentangle and control. To achieve this, VidStyleODE encodes the video content in a pre-trained StyleGAN $\mathcal{W}_+$ space and benefits from a latent ODE component to summarize the spatiotemporal dynamics of the input video. Our novel continuous video generation process then combines the two to generate high-quality and temporally consistent videos with varying frame rates. We show that our proposed method enables a variety of applications on real videos: text-guided appearance manipulation, motion manipulation, image animation, and video interpolation and extrapolation. Project website: https://cyberiada.github.io/VidStyleODE △ Less

Submitted 12 April, 2023; originally announced April 2023.

Journal ref: ICCV 2023

arXiv:2304.03536 [pdf]

Leveraging GANs for data scarcity of COVID-19: Beyond the hype

Authors: Hazrat Ali, Christer Gronlund, Zubair Shah

Abstract: Artificial Intelligence (AI)-based models can help in diagnosing COVID-19 from lung CT scans and X-ray images; however, these models require large amounts of data for training and validation. Many researchers studied Generative Adversarial Networks (GANs) for producing synthetic lung CT scans and X-Ray images to improve the performance of AI-based models. It is not well explored how good GAN-based… ▽ More Artificial Intelligence (AI)-based models can help in diagnosing COVID-19 from lung CT scans and X-ray images; however, these models require large amounts of data for training and validation. Many researchers studied Generative Adversarial Networks (GANs) for producing synthetic lung CT scans and X-Ray images to improve the performance of AI-based models. It is not well explored how good GAN-based methods performed to generate reliable synthetic data. This work analyzes 43 published studies that reported GANs for synthetic data generation. Many of these studies suffered data bias, lack of reproducibility, and lack of feedback from the radiologists or other domain experts. A common issue in these studies is the unavailability of the source code, hindering reproducibility. The included studies reported rescaling of the input images to train the existing GANs architecture without providing clinical insights on how the rescaling was motivated. Finally, even though GAN-based methods have the potential for data augmentation and improving the training of AI-based models, these methods fall short in terms of their use in clinical practice. This paper highlights research hotspots in countering the data scarcity problem, identifies various issues as well as potentials, and provides recommendations to guide future research. These recommendations might be useful to improve acceptability for the GAN-based approaches for data augmentation as GANs for data augmentation are increasingly becoming popular in the AI and medical imaging research community. △ Less

Submitted 7 April, 2023; originally announced April 2023.

Comments: submitted to 2023 CVPR workshop on Generative Models for Computer Vision

arXiv:2303.09063 [pdf, other]

Plant Disease Detection using Region-Based Convolutional Neural Network

Authors: Hasin Rehana, Muhammad Ibrahim, Md. Haider Ali

Abstract: Agriculture plays an important role in the food and economy of Bangladesh. The rapid growth of population over the years also has increased the demand for food production. One of the major reasons behind low crop production is numerous bacteria, virus and fungal plant diseases. Early detection of plant diseases and proper usage of pesticides and fertilizers are vital for preventing the diseases an… ▽ More Agriculture plays an important role in the food and economy of Bangladesh. The rapid growth of population over the years also has increased the demand for food production. One of the major reasons behind low crop production is numerous bacteria, virus and fungal plant diseases. Early detection of plant diseases and proper usage of pesticides and fertilizers are vital for preventing the diseases and boost the yield. Most of the farmers use generalized pesticides and fertilizers in the entire fields without specifically knowing the condition of the plants. Thus the production cost oftentimes increases, and, not only that, sometimes this becomes detrimental to the yield. Deep Learning models are found to be very effective to automatically detect plant diseases from images of plants, thereby reducing the need for human specialists. This paper aims at building a lightweight deep learning model for predicting leaf disease in tomato plants. By modifying the region-based convolutional neural network, we design an efficient and effective model that demonstrates satisfactory empirical performance on a benchmark dataset. Our proposed model can easily be deployed in a larger system where drones take images of leaves and these images will be fed into our model to know the health condition. △ Less

Submitted 12 September, 2023; v1 submitted 15 March, 2023; originally announced March 2023.

Comments: 23 pages

arXiv:2303.02669 [pdf, other]

doi 10.1109/tits.2023.3343971

Consistent Valid Physically-Realizable Adversarial Attack against Crowd-flow Prediction Models

Authors: Hassan Ali, Muhammad Atif Butt, Fethi Filali, Ala Al-Fuqaha, Junaid Qadir

Abstract: Recent works have shown that deep learning (DL) models can effectively learn city-wide crowd-flow patterns, which can be used for more effective urban planning and smart city management. However, DL models have been known to perform poorly on inconspicuous adversarial perturbations. Although many works have studied these adversarial perturbations in general, the adversarial vulnerabilities of deep… ▽ More Recent works have shown that deep learning (DL) models can effectively learn city-wide crowd-flow patterns, which can be used for more effective urban planning and smart city management. However, DL models have been known to perform poorly on inconspicuous adversarial perturbations. Although many works have studied these adversarial perturbations in general, the adversarial vulnerabilities of deep crowd-flow prediction models in particular have remained largely unexplored. In this paper, we perform a rigorous analysis of the adversarial vulnerabilities of DL-based crowd-flow prediction models under multiple threat settings, making three-fold contributions. (1) We propose CaV-detect by formally identifying two novel properties - Consistency and Validity - of the crowd-flow prediction inputs that enable the detection of standard adversarial inputs with 0% false acceptance rate (FAR). (2) We leverage universal adversarial perturbations and an adaptive adversarial loss to present adaptive adversarial attacks to evade CaV-detect defense. (3) We propose CVPR, a Consistent, Valid and Physically-Realizable adversarial attack, that explicitly inducts the consistency and validity priors in the perturbation generation mechanism. We find out that although the crowd-flow models are vulnerable to adversarial perturbations, it is extremely challenging to simulate these perturbations in physical settings, notably when CaV-detect is in place. We also show that CVPR attack considerably outperforms the adaptively modified standard attacks in FAR and adversarial loss metrics. We conclude with useful insights emerging from our work and highlight promising future research directions. △ Less

Submitted 5 March, 2023; originally announced March 2023.

Journal ref: IEEE Transactions on Intelligent Transportation Systems (2023)

arXiv:2212.01772 [pdf, other]

doi 10.1007/978-3-031-26438-2_12

Brain Tumor Synthetic Data Generation with Adaptive StyleGANs

Authors: Usama Tariq, Rizwan Qureshi, Anas Zafar, Danyal Aftab, Jia Wu, Tanvir Alam, Zubair Shah, Hazrat Ali

Abstract: Generative models have been very successful over the years and have received significant attention for synthetic data generation. As deep learning models are getting more and more complex, they require large amounts of data to perform accurately. In medical image analysis, such generative models play a crucial role as the available data is limited due to challenges related to data privacy, lack of… ▽ More Generative models have been very successful over the years and have received significant attention for synthetic data generation. As deep learning models are getting more and more complex, they require large amounts of data to perform accurately. In medical image analysis, such generative models play a crucial role as the available data is limited due to challenges related to data privacy, lack of data diversity, or uneven data distributions. In this paper, we present a method to generate brain tumor MRI images using generative adversarial networks. We have utilized StyleGAN2 with ADA methodology to generate high-quality brain MRI with tumors while using a significantly smaller amount of training data when compared to the existing approaches. We use three pre-trained models for transfer learning. Results demonstrate that the proposed method can learn the distributions of brain tumors. Furthermore, the model can generate high-quality synthetic brain MRI with a tumor that can limit the small sample size issues. The approach can addresses the limited data availability by generating realistic-looking brain MRI with tumors. The code is available at: ~\url{https://github.com/rizwanqureshi123/Brain-Tumor-Synthetic-Data}. △ Less

Submitted 4 December, 2022; originally announced December 2022.

Comments: Accepted in AICS conference

arXiv:2211.07290 [pdf, other]

AI-Based Emotion Recognition: Promise, Peril, and Prescriptions for Prosocial Path

Authors: Siddique Latif, Hafiz Shehbaz Ali, Muhammad Usama, Rajib Rana, Björn Schuller, Junaid Qadir

Abstract: Automated emotion recognition (AER) technology can detect humans' emotional states in real-time using facial expressions, voice attributes, text, body movements, and neurological signals and has a broad range of applications across many sectors. It helps businesses get a much deeper understanding of their customers, enables monitoring of individuals' moods in healthcare, education, or the automoti… ▽ More Automated emotion recognition (AER) technology can detect humans' emotional states in real-time using facial expressions, voice attributes, text, body movements, and neurological signals and has a broad range of applications across many sectors. It helps businesses get a much deeper understanding of their customers, enables monitoring of individuals' moods in healthcare, education, or the automotive industry, and enables identification of violence and threat in forensics, to name a few. However, AER technology also risks using artificial intelligence (AI) to interpret sensitive human emotions. It can be used for economic and political power and against individual rights. Human emotions are highly personal, and users have justifiable concerns about privacy invasion, emotional manipulation, and bias. In this paper, we present the promises and perils of AER applications. We discuss the ethical challenges related to the data and AER systems and highlight the prescriptions for prosocial perspectives for future AER applications. We hope this work will help AI researchers and developers design prosocial AER applications. △ Less

Submitted 14 November, 2022; originally announced November 2022.

Comments: Under review in IEEE TAC

arXiv:2211.00902 [pdf, other]

Spot the fake lungs: Generating Synthetic Medical Images using Neural Diffusion Models

Authors: Hazrat Ali, Shafaq Murad, Zubair Shah

Abstract: Generative models are becoming popular for the synthesis of medical images. Recently, neural diffusion models have demonstrated the potential to generate photo-realistic images of objects. However, their potential to generate medical images is not explored yet. In this work, we explore the possibilities of synthesis of medical images using neural diffusion models. First, we use a pre-trained DALLE… ▽ More Generative models are becoming popular for the synthesis of medical images. Recently, neural diffusion models have demonstrated the potential to generate photo-realistic images of objects. However, their potential to generate medical images is not explored yet. In this work, we explore the possibilities of synthesis of medical images using neural diffusion models. First, we use a pre-trained DALLE2 model to generate lungs X-Ray and CT images from an input text prompt. Second, we train a stable diffusion model with 3165 X-Ray images and generate synthetic images. We evaluate the synthetic image data through a qualitative analysis where two independent radiologists label randomly chosen samples from the generated data as real, fake, or unsure. Results demonstrate that images generated with the diffusion model can translate characteristics that are otherwise very specific to certain medical conditions in chest X-Ray or CT images. Careful tuning of the model can be very promising. To the best of our knowledge, this is the first attempt to generate lungs X-Ray and CT images using neural diffusion models. This work aims to introduce a new dimension in artificial intelligence for medical imaging. Given that this is a new topic, the paper will serve as an introduction and motivation for the research community to explore the potential of diffusion models for medical image synthesis. We have released the synthetic images on https://www.kaggle.com/datasets/hazrat/awesomelungs. △ Less

Submitted 2 November, 2022; originally announced November 2022.

Comments: 8 pages. Submitted to AICS 2022 conference

arXiv:2210.13462 [pdf, other]

doi 10.1038/s41598-022-22514-4

Artificial Intelligence-Based Methods for Fusion of Electronic Health Records and Imaging Data

Authors: Farida Mohsen, Hazrat Ali, Nady El Hajj, Zubair Shah

Abstract: Healthcare data are inherently multimodal, including electronic health records (EHR), medical images, and multi-omics data. Combining these multimodal data sources contributes to a better understanding of human health and provides optimal personalized healthcare. Advances in artificial intelligence (AI) technologies, particularly machine learning (ML), enable the fusion of these different data mod… ▽ More Healthcare data are inherently multimodal, including electronic health records (EHR), medical images, and multi-omics data. Combining these multimodal data sources contributes to a better understanding of human health and provides optimal personalized healthcare. Advances in artificial intelligence (AI) technologies, particularly machine learning (ML), enable the fusion of these different data modalities to provide multimodal insights. To this end, in this scoping review, we focus on synthesizing and analyzing the literature that uses AI techniques to fuse multimodal medical data for different clinical applications. More specifically, we focus on studies that only fused EHR with medical imaging data to develop various AI methods for clinical applications. We present a comprehensive analysis of the various fusion strategies, the diseases and clinical outcomes for which multimodal fusion was used, the ML algorithms used to perform multimodal fusion for each clinical application, and the available multimodal medical datasets. We followed the PRISMA-ScR guidelines. We searched Embase, PubMed, Scopus, and Google Scholar to retrieve relevant studies. We extracted data from 34 studies that fulfilled the inclusion criteria. In our analysis, a typical workflow was observed: feeding raw data, fusing different data modalities by applying conventional machine learning (ML) or deep learning (DL) algorithms, and finally, evaluating the multimodal fusion through clinical outcome predictions. Specifically, early fusion was the most used technique in most applications for multimodal learning (22 out of 34 studies). We found that multimodality fusion models outperformed traditional single-modality models for the same task. Disease diagnosis and prediction were the most common clinical outcomes (reported in 20 and 10 studies, respectively) from a clinical outcome perspective. △ Less

Submitted 23 October, 2022; originally announced October 2022.

Comments: Accepted in Nature Scientific Reports. 20 pages

Journal ref: Sci Rep 12, 17981 (2022)

arXiv:2210.13289 [pdf, other]

doi 10.1145/3614426

Secure and Trustworthy Artificial Intelligence-Extended Reality (AI-XR) for Metaverses

Authors: Adnan Qayyum, Muhammad Atif Butt, Hassan Ali, Muhammad Usman, Osama Halabi, Ala Al-Fuqaha, Qammer H. Abbasi, Muhammad Ali Imran, Junaid Qadir

Abstract: Metaverse is expected to emerge as a new paradigm for the next-generation Internet, providing fully immersive and personalised experiences to socialize, work, and play in self-sustaining and hyper-spatio-temporal virtual world(s). The advancements in different technologies like augmented reality, virtual reality, extended reality (XR), artificial intelligence (AI), and 5G/6G communication will be… ▽ More Metaverse is expected to emerge as a new paradigm for the next-generation Internet, providing fully immersive and personalised experiences to socialize, work, and play in self-sustaining and hyper-spatio-temporal virtual world(s). The advancements in different technologies like augmented reality, virtual reality, extended reality (XR), artificial intelligence (AI), and 5G/6G communication will be the key enablers behind the realization of AI-XR metaverse applications. While AI itself has many potential applications in the aforementioned technologies (e.g., avatar generation, network optimization, etc.), ensuring the security of AI in critical applications like AI-XR metaverse applications is profoundly crucial to avoid undesirable actions that could undermine users' privacy and safety, consequently putting their lives in danger. To this end, we attempt to analyze the security, privacy, and trustworthiness aspects associated with the use of various AI techniques in AI-XR metaverse applications. Specifically, we discuss numerous such challenges and present a taxonomy of potential solutions that could be leveraged to develop secure, private, robust, and trustworthy AI-XR applications. To highlight the real implications of AI-associated adversarial threats, we designed a metaverse-specific case study and analyzed it through the adversarial lens. Finally, we elaborate upon various open issues that require further research interest from the community. △ Less

Submitted 24 October, 2022; originally announced October 2022.

Comments: 24 pages, 11 figures

Journal ref: ACM Computing Surveys (2023)

arXiv:2210.06040 [pdf, other]

Question Answering Over Biological Knowledge Graph via Amazon Alexa

Authors: Md. Rezaul Karim, Hussain Ali, Prinon Das, Mohamed Abdelwaheb, Stefan Decker

Abstract: Structured and unstructured data and facts about drugs, genes, protein, viruses, and their mechanism are spread across a huge number of scientific articles. These articles are a large-scale knowledge source and can have a huge impact on disseminating knowledge about the mechanisms of certain biological processes. A knowledge graph (KG) can be constructed by integrating such facts and data and be u… ▽ More Structured and unstructured data and facts about drugs, genes, protein, viruses, and their mechanism are spread across a huge number of scientific articles. These articles are a large-scale knowledge source and can have a huge impact on disseminating knowledge about the mechanisms of certain biological processes. A knowledge graph (KG) can be constructed by integrating such facts and data and be used for data integration, exploration, and federated queries. However, exploration and querying large-scale KGs is tedious for certain groups of users due to a lack of knowledge about underlying data assets or semantic technologies. A question-answering (QA) system allows the answer of natural language questions over KGs automatically using triples contained in a KG. Recently, the use and adaption of digital assistants are getting wider owing to their capability at enabling users to voice commands to control smart systems or devices. This paper is about using Amazon Alexa's voice-enabled interface for QA over KGs. As a proof-of-concept, we use the well-known DisgeNET KG, which contains knowledge covering 1.13 million gene-disease associations between 21,671 genes and 30,170 diseases, disorders, and clinical or abnormal human phenotypes. Our study shows how Alex could be of help to find facts about certain biological entities from large-scale knowledge bases. △ Less

Submitted 12 October, 2022; originally announced October 2022.

Comments: This paper is based on the Knowledge Graph Lab course (https://dbis.rwth-aachen.de/dbis/index.php/) offered at Computer Science 5 - Information Systems and Databases, RWTH Aachen University, Germany, and a joint collaboration with Osthus GmbH (https://www.osthus.com/), Aachen, Germany

arXiv:2209.14022 [pdf, other]

Leveraging machine learning for less developed languages: Progress on Urdu text detection

Authors: Hazrat Ali

Abstract: Text detection in natural scene images has applications for autonomous driving, navigation help for elderly and blind people. However, the research on Urdu text detection is usually hindered by lack of data resources. We have developed a dataset of scene images with Urdu text. We present the use of machine learning methods to perform detection of Urdu text from the scene images. We extract text re… ▽ More Text detection in natural scene images has applications for autonomous driving, navigation help for elderly and blind people. However, the research on Urdu text detection is usually hindered by lack of data resources. We have developed a dataset of scene images with Urdu text. We present the use of machine learning methods to perform detection of Urdu text from the scene images. We extract text regions using channel enhanced Maximally Stable Extremal Region (MSER) method. First, we classify text and noise based on their geometric properties. Next, we use a support vector machine for early discarding of non-text regions. To further remove the non-text regions, we use histogram of oriented gradients (HoG) features obtained and train a second SVM classifier. This improves the overall performance on text region detection within the scene images. To support research on Urdu text, We aim to make the data freely available for research use. We also aim to highlight the challenges and the research gap for Urdu text detection. △ Less

Submitted 28 September, 2022; originally announced September 2022.

Comments: Accepted at NeurIPS ML4D workshop. arXiv admin note: text overlap with arXiv:2109.08060

Journal ref: NeurIPS ML4D 2021

arXiv:2208.09269 [pdf, other]

Feature Selection Enhancement and Feature Space Visualization for Speech-Based Emotion Recognition

Authors: Sofia Kanwal, Sohail Asghar, Hazrat Ali

Abstract: Robust speech emotion recognition relies on the quality of the speech features. We present speech features enhancement strategy that improves speech emotion recognition. We used the INTERSPEECH 2010 challenge feature-set. We identified subsets from the features set and applied Principle Component Analysis to the subsets. Finally, the features are fused horizontally. The resulting feature set is an… ▽ More Robust speech emotion recognition relies on the quality of the speech features. We present speech features enhancement strategy that improves speech emotion recognition. We used the INTERSPEECH 2010 challenge feature-set. We identified subsets from the features set and applied Principle Component Analysis to the subsets. Finally, the features are fused horizontally. The resulting feature set is analyzed using t-distributed neighbour embeddings (t-SNE) before the application of features for emotion recognition. The method is compared with the state-of-the-art methods used in the literature. The empirical evidence is drawn using two well-known datasets: Emotional Speech Dataset (EMO-DB) and Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) for two languages, German and English, respectively. Our method achieved an average recognition gain of 11.5\% for six out of seven emotions for the EMO-DB dataset, and 13.8\% for seven out of eight emotions for the RAVDESS dataset as compared to the baseline study. △ Less

Submitted 19 August, 2022; originally announced August 2022.

Comments: Accepted at PeerJ Computer Science

arXiv:2208.04947 [pdf, other]

Visual Heart Rate Estimation from RGB Facial Video using Spectral Reflectance

Authors: Bharath Ramakrishnan, Ruijia Deng, Hassan Ali

Abstract: Estimation of the Heart rate from the facial video has a number of applications in the medical and fitness industries. Additionally, it has become useful in the field of gaming as well. Several approaches have been proposed to seamlessly obtain the Heart rate from the facial video, but these approaches have had issues in dealing with motion and illumination artifacts. In this work, we propose a re… ▽ More Estimation of the Heart rate from the facial video has a number of applications in the medical and fitness industries. Additionally, it has become useful in the field of gaming as well. Several approaches have been proposed to seamlessly obtain the Heart rate from the facial video, but these approaches have had issues in dealing with motion and illumination artifacts. In this work, we propose a reliable HR estimation framework using the spectral reflectance of the user, which makes it robust to motion and illumination disturbances. We employ deep learning-based frameworks such as Faster RCNNs to perform face detection as opposed to the Viola Jones algorithm employed by previous approaches. We evaluate our method on the MAHNOB HCI dataset and found that the proposed method is able to outperform previous approaches.Estimation of the Heart rate from facial video has a number of applications in the medical and the fitness industries. Additionally, it has become useful in the field of gaming as well. Several approaches have been proposed to seamlessly obtain the Heart rate from the facial video, but these approaches have had issues in dealing with motion and illumination artifacts. In this work, we propose a reliable HR estimation framework using the spectral reflectance of the user, which makes it robust to motion and illumination disturbances. We employ deep learning-based frameworks such as Faster RCNNs to perform face detection as opposed to the Viola-Jones algorithm employed by previous approaches. We evaluate our method on the MAHNOB HCI dataset and found that the proposed method is able to outperform previous approaches. △ Less

Submitted 9 August, 2022; originally announced August 2022.

Comments: Submitted as a student abstract to AAAI 2023

arXiv:2207.10807 [pdf]

A Machine Learning Approach for Driver Identification Based on CAN-BUS Sensor Data

Authors: Md. Abbas Ali Khan, Mphammad Hanif Ali, AKM Fazlul Haque, Md. Tarek Habib

Abstract: Driver identification is a momentous field of modern decorated vehicles in the controller area network (CAN-BUS) perspective. Many conventional systems are used to identify the driver. One step ahead, most of the researchers use sensor data of CAN-BUS but there are some difficulties because of the variation of the protocol of different models of vehicle. Our aim is to identify the driver through s… ▽ More Driver identification is a momentous field of modern decorated vehicles in the controller area network (CAN-BUS) perspective. Many conventional systems are used to identify the driver. One step ahead, most of the researchers use sensor data of CAN-BUS but there are some difficulties because of the variation of the protocol of different models of vehicle. Our aim is to identify the driver through supervised learning algorithms based on driving behavior analysis. To determine the driver, a driver verification technique is proposed that evaluate driving pattern using the measurement of CAN sensor data. In this paper on-board diagnostic (OBD-II) is used to capture the data from the CAN-BUS sensor and the sensors are listed under SAE J1979 statement. According to the service of OBD-II, drive identification is possible. However, we have gained two types of accuracy on a complete data set with 10 drivers and a partial data set with two drivers. The accuracy is good with less number of drivers compared to the higher number of drivers. We have achieved statistically significant results in terms of accuracy in contrast to the baseline algorithm △ Less

Submitted 15 July, 2022; originally announced July 2022.

arXiv:2206.02358 [pdf, other]

doi 10.1109/TCSII.2022.3181132

Implementation of a Modified U-Net for Medical Image Segmentation on Edge Devices

Authors: Owais Ali, Hazrat Ali, Syed Ayaz Ali Shah, Aamir Shahzad

Abstract: Deep learning techniques, particularly convolutional neural networks, have shown great potential in computer vision and medical imaging applications. However, deep learning models are computationally demanding as they require enormous computational power and specialized processing hardware for model training. To make these models portable and compatible for prototyping, their implementation on low… ▽ More Deep learning techniques, particularly convolutional neural networks, have shown great potential in computer vision and medical imaging applications. However, deep learning models are computationally demanding as they require enormous computational power and specialized processing hardware for model training. To make these models portable and compatible for prototyping, their implementation on low-power devices is imperative. In this work, we present the implementation of Modified U-Net on Intel Movidius Neural Compute Stick 2 (NCS-2) for the segmentation of medical images. We selected U-Net because, in medical image segmentation, U-Net is a prominent model that provides improved performance for medical image segmentation even if the dataset size is small. The modified U-Net model is evaluated for performance in terms of dice score. Experiments are reported for segmentation task on three medical imaging datasets: BraTs dataset of brain MRI, heart MRI dataset, and Ziehl-Neelsen sputum smear microscopy image (ZNSDB) dataset. For the proposed model, we reduced the number of parameters from 30 million in the U-Net model to 0.49 million in the proposed architecture. Experimental results show that the modified U-Net provides comparable performance while requiring significantly lower resources and provides inference on the NCS-2. The maximum dice scores recorded are 0.96 for the BraTs dataset, 0.94 for the heart MRI dataset, and 0.74 for the ZNSDB dataset. △ Less

Submitted 6 June, 2022; originally announced June 2022.

Comments: Preprint of paper accepted in IEEE Transactions on Circuits and Systems II: Express Brief

arXiv:2206.01793 [pdf]

doi 10.1007/s00521-022-07419-7

R2U++: A Multiscale Recurrent Residual U-Net with Dense Skip Connections for Medical Image Segmentation

Authors: Mehreen Mubashar, Hazrat Ali, Christer Gronlund, Shoaib Azmat

Abstract: U-Net is a widely adopted neural network in the domain of medical image segmentation. Despite its quick embracement by the medical imaging community, its performance suffers on complicated datasets. The problem can be ascribed to its simple feature extracting blocks: encoder/decoder, and the semantic gap between encoder and decoder. Variants of U-Net (such as R2U-Net) have been proposed to address… ▽ More U-Net is a widely adopted neural network in the domain of medical image segmentation. Despite its quick embracement by the medical imaging community, its performance suffers on complicated datasets. The problem can be ascribed to its simple feature extracting blocks: encoder/decoder, and the semantic gap between encoder and decoder. Variants of U-Net (such as R2U-Net) have been proposed to address the problem of simple feature extracting blocks by making the network deeper, but it does not deal with the semantic gap problem. On the other hand, another variant UNET++ deals with the semantic gap problem by introducing dense skip connections but has simple feature extraction blocks. To overcome these issues, we propose a new U-Net based medical image segmentation architecture R2U++. In the proposed architecture, the adapted changes from vanilla U-Net are: (1) the plain convolutional backbone is replaced by a deeper recurrent residual convolution block. The increased field of view with these blocks aids in extracting crucial features for segmentation which is proven by improvement in the overall performance of the network. (2) The semantic gap between encoder and decoder is reduced by dense skip pathways. These pathways accumulate features coming from multiple scales and apply concatenation accordingly. The modified architecture has embedded multi-depth models, and an ensemble of outputs taken from varying depths improves the performance on foreground objects appearing at various scales in the images. The performance of R2U++ is evaluated on four distinct medical imaging modalities: electron microscopy (EM), X-rays, fundus, and computed tomography (CT). The average gain achieved in IoU score is 1.5+-0.37% and in dice score is 0.9+-0.33% over UNET++, whereas, 4.21+-2.72 in IoU and 3.47+-1.89 in dice score over R2U-Net across different medical imaging segmentation datasets. △ Less

Submitted 3 June, 2022; originally announced June 2022.

Comments: Paper accepted in Neural Computing and Applications (2022). Please cite the final version available from Springer website https://link.springer.com/article/10.1007/s00521-022-07419-7

arXiv:2205.15862 [pdf, ps, other]

doi 10.1007/s12559-023-10174-z

Snapture -- A Novel Neural Architecture for Combined Static and Dynamic Hand Gesture Recognition

Authors: Hassan Ali, Doreen Jirak, Stefan Wermter

Abstract: As robots are expected to get more involved in people's everyday lives, frameworks that enable intuitive user interfaces are in demand. Hand gesture recognition systems provide a natural way of communication and, thus, are an integral part of seamless Human-Robot Interaction (HRI). Recent years have witnessed an immense evolution of computational models powered by deep learning. However, state-of-… ▽ More As robots are expected to get more involved in people's everyday lives, frameworks that enable intuitive user interfaces are in demand. Hand gesture recognition systems provide a natural way of communication and, thus, are an integral part of seamless Human-Robot Interaction (HRI). Recent years have witnessed an immense evolution of computational models powered by deep learning. However, state-of-the-art models fall short in expanding across different gesture domains, such as emblems and co-speech. In this paper, we propose a novel hybrid hand gesture recognition system. Our architecture enables learning both static and dynamic gestures: by capturing a so-called "snapshot" of the gesture performance at its peak, we integrate the hand pose along with the dynamic movement. Moreover, we present a method for analyzing the motion profile of a gesture to uncover its dynamic characteristics and which allows regulating a static channel based on the amount of motion. Our evaluation demonstrates the superiority of our approach on two gesture benchmarks compared to a CNNLSTM baseline. We also provide an analysis on a gesture class basis that unveils the potential of our Snapture architecture for performance improvements. Thanks to its modular implementation, our framework allows the integration of other multimodal data like facial expressions and head tracking, which are important cues in HRI scenarios, into one architecture. Thus, our work contributes both to gesture recognition research and machine learning applications for non-verbal communication with robots. △ Less

Submitted 27 February, 2024; v1 submitted 28 May, 2022; originally announced May 2022.

Comments: In Cognitive Computation(Accepted:30/06/2023, Published:17/07/2023),20 pages,20 figures,4 tables;Please find the published version/info to cite: https://doi.org/10.1007/s12559-023-10174-z;Repositories: https://zenodo.org/doi/10.5281/zenodo.10679196, https://zenodo.org/doi/10.5281/zenodo.10693816;This work was co-funded by Horizon Europe project TERAIS under Grant agreement number 101079338

ACM Class: I.2.10; I.5.4; I.4.9

Journal ref: Cognitive Computation 15, 2014-2033 (2023)

Showing 1–50 of 114 results for author: Ali, H