3DEgo: 3D Editing on the Go!

Authors: Umar Khalid, Hasan Iqbal, Azib Farooq, Jing Hua, Chen Chen

Abstract: We introduce 3DEgo to address a novel problem of directly synthesizing photorealistic 3D scenes from monocular videos guided by textual prompts. Conventional methods construct a text-conditioned 3D scene through a three-stage process, involving pose estimation using Structure-from-Motion (SfM) libraries like COLMAP, initializing the 3D model with unedited images, and iteratively updating the dataset with edited images to achieve a 3D scene with text fidelity. Our framework streamlines the conventional multi-stage 3D editing process into a single-stage workflow by overcoming the reliance on COLMAP and eliminating the cost of model initialization. We apply a diffusion model to edit video frames prior to 3D scene creation by incorporating our designed noise blender module for enhancing multi-view editing consistency, a step that does not require additional training or fine-tuning of T2I diffusion models. 3DEgo utilizes 3D Gaussian Splatting to create 3D scenes from the multi-view consistent edited frames, capitalizing on the inherent temporal continuity and explicit point cloud data. 3DEgo demonstrates remarkable editing precision, speed, and adaptability across a variety of video sources, as validated by extensive evaluations on six datasets, including our own prepared GS25 dataset. Project Page: https://3dego.github.io/

Submitted 14 July, 2024; originally announced July 2024.

Comments: ECCV 2024 Accepted Paper

arXiv:2406.11592 [pdf, other]

ChildDiffusion: Unlocking the Potential of Generative AI and Controllable Augmentations for Child Facial Data using Stable Diffusion and Large Language Models

Authors: Muhammad Ali Farooq, Wang Yao, Peter Corcoran

Abstract: In this research work we have proposed a high-level ChildDiffusion framework capable of generating photorealistic child facial samples and further embedding several intelligent augmentations on child facial data using short text prompts, detailed textual guidance from LLMs, and further image-to-image transformation using text guidance control conditioning, thus providing an opportunity to curate fully synthetic large-scale child datasets. The framework is validated by rendering high-quality child faces representing ethnicity data, micro expressions, face pose variations, eye blinking effects, facial accessories, different hair colours and styles, aging, and multiple child subjects of different genders in a single frame. Addressing privacy concerns regarding child data acquisition requires a comprehensive approach that involves legal, ethical, and technological considerations. Keeping this in view, this framework can be adapted to synthesise child facial data which can be effectively used for numerous downstream machine learning tasks. The proposed method circumvents common issues encountered in generative AI tools, such as temporal inconsistency and limited control over the rendered outputs. As an exemplary use case, we have open-sourced child ethnicity data consisting of 2.5k child facial samples of five different classes, which include African, Asian, White, South Asian/Indian, and Hispanic races, by deploying the model in production inference phase. The rendered data undergoes rigorous qualitative as well as quantitative tests to cross-validate its efficacy, and is further used to fine-tune a YOLO architecture for detecting and classifying child ethnicity as an exemplary downstream machine learning task.

Submitted 17 June, 2024; originally announced June 2024.

Comments: This work has been submitted to the IEEE Transactions Journal for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2406.07777 [pdf, other]

Unifying Interpretability and Explainability for Alzheimer's Disease Progression Prediction

Authors: Raja Farrukh Ali, Stephanie Milani, John Woods, Emmanuel Adenij, Ayesha Farooq, Clayton Mansel, Jeffrey Burns, William Hsu

Abstract: Reinforcement learning (RL) has recently shown promise in predicting Alzheimer's disease (AD) progression due to its unique ability to model domain knowledge. However, it is not clear which RL algorithms are well-suited for this task. Furthermore, these methods are not inherently explainable, limiting their applicability in real-world clinical scenarios. Our work addresses these two important questions. Using a causal, interpretable model of AD, we first compare the performance of four contemporary RL algorithms in predicting brain cognition over 10 years using only baseline (year 0) data. We then apply SHAP (SHapley Additive exPlanations) to explain the decisions made by each algorithm in the model. Our approach combines interpretability with explainability to provide insights into the key factors influencing AD progression, offering both global and individual, patient-level analysis. Our findings show that only one of the RL methods is able to satisfactorily model disease progression, but the post-hoc explanations indicate that all methods fail to properly capture the importance of amyloid accumulation, one of the pathological hallmarks of Alzheimer's disease. Our work aims to merge predictive accuracy with transparency, assisting clinicians and researchers in enhancing disease progression modeling for informed healthcare decisions. Code is available at https://github.com/rfali/xrlad.

Submitted 11 June, 2024; originally announced June 2024.

Comments: Previous versions accepted to NeurIPS 2023's XAIA and AAAI 2024's XAI4DRL workshops

arXiv:2406.06932 [pdf, other]

Synthetic Face Ageing: Evaluation, Analysis and Facilitation of Age-Robust Facial Recognition Algorithms

Authors: Wang Yao, Muhammad Ali Farooq, Joseph Lemley, Peter Corcoran

Abstract: The ability to accurately recognize an individual's face with respect to the human aging factor holds significant importance for various private as well as government sectors such as customs and public security bureaus, passport offices, and national database systems. Therefore, developing a robust age-invariant face recognition system is of crucial importance to address the challenges posed by ageing and maintain the reliability and accuracy of facial recognition technology. In this research work, the focus is to explore the feasibility of utilizing synthetic ageing data to improve the robustness of face recognition models that can eventually help in recognizing people at broader age intervals. To achieve this, we first design a set of experiments to evaluate state-of-the-art synthetic ageing methods. In the next stage we explore the effect of age intervals on a current deep learning-based face recognition algorithm by using synthetic ageing data as well as real ageing data to perform rigorous training and validation. Moreover, these synthetic age data have been used in facilitating face recognition algorithms. Experimental results show that the recognition rate of the model trained on synthetic ageing images is 3.33% higher than that of the baseline model when tested on images with an age gap of 40 years, which demonstrates and quantifies the potential of synthetic age data to enhance the performance of age-invariant face recognition systems.

Submitted 10 June, 2024; originally announced June 2024.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2402.06969 [pdf, other]

Synthesizing CTA Image Data for Type-B Aortic Dissection using Stable Diffusion Models

Authors: Ayman Abaid, Muhammad Ali Farooq, Niamh Hynes, Peter Corcoran, Ihsan Ullah

Abstract: Stable Diffusion (SD) has gained a lot of attention in recent years in the field of Generative AI, thus helping in synthesizing medical imaging data with distinct features. The aim is to contribute to the ongoing effort focused on overcoming the limitations of data scarcity and improving the capabilities of ML algorithms for cardiovascular image processing. Therefore, in this study, the possibility of generating synthetic cardiac CTA images was explored by fine-tuning stable diffusion models based on user-defined text prompts, using only a limited number of CTA images as input. A comprehensive evaluation of the synthetic data was conducted by incorporating both quantitative analysis and qualitative assessment, where a clinician assessed the quality of the generated data. It has been shown that cardiac CTA images can be successfully generated using a Text to Image (T2I) stable diffusion model. The results demonstrate that the tuned T2I CTA diffusion model was able to generate images with features that are typically unique to acute type B aortic dissection (TBAD) medical conditions.

Submitted 10 February, 2024; originally announced February 2024.

Comments: Submitted in IEEE EMBC 2024 Conference

arXiv:2402.00544 [pdf, other]

Quantum-Assisted Hilbert-Space Gaussian Process Regression

Authors: Ahmad Farooq, Cristian A. Galvis-Florez, Simo Särkkä

Abstract: Gaussian processes are probabilistic models that are commonly used as functional priors in machine learning. Due to their probabilistic nature, they can be used to capture the prior information on the statistics of noise, smoothness of the functions, and training data uncertainty. However, their computational complexity quickly becomes intractable as the size of the data set grows. We propose a Hilbert space approximation-based quantum algorithm for Gaussian process regression to overcome this limitation. Our method consists of a combination of classical basis function expansion with quantum computing techniques of quantum principal component analysis, conditional rotations, and Hadamard and Swap tests. The quantum principal component analysis is used to estimate the eigenvalues while the conditional rotations and the Hadamard and Swap tests are employed to evaluate the posterior mean and variance of the Gaussian process. Our method provides polynomial computational complexity reduction over the classical method.

Submitted 1 February, 2024; originally announced February 2024.

Comments: 9 pages, 5 figures

arXiv:2401.05159 [pdf, other]

Derm-T2IM: Harnessing Synthetic Skin Lesion Data via Stable Diffusion Models for Enhanced Skin Disease Classification using ViT and CNN

Authors: Muhammad Ali Farooq, Wang Yao, Michael Schukat, Mark A Little, Peter Corcoran

Abstract: This study explores the utilization of Dermatoscopic synthetic data generated through stable diffusion models as a strategy for enhancing the robustness of machine learning model training. Synthetic data generation plays a pivotal role in mitigating challenges associated with limited labeled datasets, thereby facilitating more effective model training. In this context, we aim to incorporate enhanced data transformation techniques by extending the recent success of few-shot learning and a small amount of data representation in text-to-image latent diffusion models. The optimally tuned model is further used for rendering high-quality skin lesion synthetic data with diverse and realistic characteristics, providing a valuable supplement and diversity to the existing training data. We investigate the impact of incorporating newly generated synthetic data into the training pipeline of state-of-the-art machine learning models, assessing its effectiveness in enhancing model performance and generalization to unseen real-world data. Our experimental results demonstrate that the synthetic data generated through stable diffusion models helps improve the robustness and adaptability of end-to-end CNN and vision transformer models on two different real-world skin lesion datasets.

Submitted 10 January, 2024; originally announced January 2024.

Comments: Paper is submitted in EMBC 2024 Conference

arXiv:2311.06307 [pdf]

Synthetic Speaking Children -- Why We Need Them and How to Make Them

Authors: Muhammad Ali Farooq, Dan Bigioi, Rishabh Jain, Wang Yao, Mariam Yiwere, Peter Corcoran

Abstract: Contemporary Human Computer Interaction (HCI) research relies primarily on neural network models for machine vision and speech understanding of a system user. Such models require extensively annotated training datasets for optimal performance and when building interfaces for users from a vulnerable population such as young children, GDPR introduces significant complexities in data collection, management, and processing. Motivated by the training needs of an Edge AI smart toy platform this research explores the latest advances in generative neural technologies and provides a working proof of concept of a controllable data generation pipeline for speech driven facial training data at scale. In this context, we demonstrate how StyleGAN2 can be finetuned to create a gender balanced dataset of children's faces. This dataset includes a variety of controllable factors such as facial expressions, age variations, facial poses, and even speech-driven animations with realistic lip synchronization. By combining generative text to speech models for child voice synthesis and a 3D landmark based talking heads pipeline, we can generate highly realistic, entirely synthetic, talking child video clips. These video clips can provide valuable, and controllable, synthetic training data for neural network models, bridging the gap when real data is scarce or restricted due to privacy regulations.

Submitted 8 November, 2023; originally announced November 2023.

Comments: Presented at SpeD 23

arXiv:2308.04232 [pdf, other]

doi 10.5281/zenodo.8208491

A Comparative Study of Image-to-Image Translation Using GANs for Synthetic Child Race Data

Authors: Wang Yao, Muhammad Ali Farooq, Joseph Lemley, Peter Corcoran

Abstract: The lack of ethnic diversity in data has been a limiting factor of face recognition techniques in the literature. This is particularly the case for children where data samples are scarce and presents a challenge when seeking to adapt machine vision algorithms that are trained on adult data to work on children. This work proposes the utilization of image-to-image transformation to synthesize data of different races and thus adjust the ethnicity of children's face data. We consider ethnicity as a style and compare three different Image-to-Image neural network based methods, specifically pix2pix, CycleGAN, and CUT networks to implement Caucasian child data and Asian child data conversion. Experimental validation results on synthetic data demonstrate the feasibility of using image-to-image transformation methods to generate various synthetic child data samples with broader ethnic diversity.

Submitted 8 August, 2023; originally announced August 2023.

Comments: The Paper is accepted in 25th Irish Machine Vision and Image Processing Conference (IMVIP23)

arXiv:2308.04224 [pdf, other]

doi 10.5281/zenodo.8208368

Will your Doorbell Camera still recognize you as you grow old

Authors: Wang Yao, Muhammad Ali Farooq, Joseph Lemley, Peter Corcoran

Abstract: Robust authentication for low-power consumer devices such as doorbell cameras poses a valuable and unique challenge. This work explores the effect of age and aging on the performance of facial authentication methods. Two public age datasets, AgeDB and Morph-II have been used as baselines in this work. A photo-realistic age transformation method has been employed to augment a set of high-quality facial images with various age effects. Then the effect of these synthetic aging data on the high-performance deep-learning-based face recognition model is quantified by using various metrics including Receiver Operating Characteristic (ROC) curves and match score distributions. Experimental results demonstrate that long-term age effects are still a significant challenge for the state-of-the-art facial authentication method.

Submitted 8 August, 2023; originally announced August 2023.

Comments: The Paper is accepted in 25th Irish Machine Vision and Image Processing Conference (IMVIP23)

arXiv:2307.13746 [pdf, other]

doi 10.1109/ACCESS.2023.3321149

ChildGAN: Large Scale Synthetic Child Facial Data Using Domain Adaptation in StyleGAN

Authors: Muhammad Ali Farooq, Wang Yao, Gabriel Costache, Peter Corcoran

Abstract: In this research work, we proposed a novel ChildGAN, a pair of GAN networks for generating synthetic boys and girls facial data derived from StyleGAN2. ChildGAN is built by performing smooth domain transfer using transfer learning. It provides photo-realistic, high-quality data samples. A large-scale dataset is rendered with a variety of smart facial transformations: facial expressions, age progression, eye blink effects, head pose, skin and hair color variations, and variable lighting conditions. The dataset comprises more than 300k distinct data samples. Further, the uniqueness and characteristics of the rendered facial features are validated by running different computer vision application tests which include CNN-based child gender classifier, face localization and facial landmarks detection test, identity similarity evaluation using ArcFace, and lastly running eye detection and eye aspect ratio tests. The results demonstrate that synthetic child facial data of high quality offers an alternative to the cost and complexity of collecting a large-scale dataset from real children.

Submitted 25 July, 2023; originally announced July 2023.

Comments: The Paper is submitted in IEEE Access Journal

arXiv:2307.13600 [pdf, other]

doi 10.5281/zenodo.8160053

Decisive Data using Multi-Modality Optical Sensors for Advanced Vehicular Systems

Authors: Muhammad Ali Farooq, Waseem Shariff, Mehdi Sefidgar Dilmaghani, Wang Yao, Moazam Soomro, Peter Corcoran

Abstract: Optical sensors have played a pivotal role in acquiring real-world data for critical applications. This data, when integrated with advanced machine learning algorithms, provides meaningful information, thus enhancing human vision. This paper focuses on various optical technologies for the design and development of state-of-the-art out-cabin forward vision systems and in-cabin driver monitoring systems. The optical sensors in focus include Longwave Thermal Imaging (LWIR) cameras, Near Infrared (NIR) cameras, Neuromorphic/event cameras, Visible CMOS cameras and Depth cameras. Further, the paper discusses different potential applications which can be realized using the unique strengths of each of these optical modalities in a real-time environment.

Submitted 25 July, 2023; originally announced July 2023.

Comments: The Paper is accepted in 25th Irish Machine Vision and Image Processing Conference (IMVIP23)

arXiv:2301.07613 [pdf]

Development, Optimization, and Deployment of Thermal Forward Vision Systems for Advance Vehicular Applications on Edge Devices

Authors: Muhammad Ali Farooq, Waseem Shariff, Faisal Khan, Peter Corcoran

Abstract: In this research work, we have proposed a thermal tiny-YOLO multi-class object detection (TTYMOD) system as a smart forward sensing system that should remain effective in all weather and harsh environmental conditions using an end-to-end YOLO deep learning framework. It provides enhanced safety and improved awareness features for driver assistance. The system is trained on large-scale thermal public datasets as well as a newly gathered novel open-sourced dataset comprising more than 35,000 distinct thermal frames. For optimal training and convergence of the YOLO-v5 tiny network variant on thermal data, we have employed different optimizers which include stochastic gradient descent (SGD), Adam, and its variant AdamW, which has an improved implementation of weight decay. The performance of the thermally tuned tiny architecture is further evaluated on the public as well as locally gathered test data in diversified and challenging weather and environmental conditions. The efficacy of a thermally tuned nano network is quantified using various quantitative metrics which include mean average precision, frames per second rate, and average inference time. Experimental outcomes show that the network achieved the best mAP of 56.4% with an average inference time per frame of 4 milliseconds. The study further incorporates optimization of the tiny network variant using the TensorFlow Lite quantization tool, which is beneficial for the deployment of deep learning architectures on edge and mobile devices. For this study, we have used a Raspberry Pi 4 computing board for evaluating the real-time feasibility performance of an optimized version of the thermal object detection network for the automotive sensor suite. The source code, trained and optimized models and complete validation/testing results are publicly available at https://github.com/MAli-Farooq/Thermal-YOLO-And-Model-Optimization-Using-TensorFlowLite.

Submitted 18 January, 2023; originally announced January 2023.

Comments: The paper is accepted and in the publication phase at ICMV 2022 Conference. Link: http://icmv.org/

arXiv:2212.07181 [pdf]

Event-based YOLO Object Detection: Proof of Concept for Forward Perception System

Authors: Waseem Shariff, Muhammad Ali Farooq, Joe Lemley, Peter Corcoran

Abstract: Neuromorphic vision or event vision is an advanced vision technology, where in contrast to the visible camera that outputs pixels, the event vision generates neuromorphic events every time there is a brightness change which exceeds a specific threshold in the field of view (FOV). This study focuses on leveraging neuromorphic event data for roadside object detection. This is a proof of concept towards building artificial intelligence (AI) based pipelines which can be used for forward perception systems for advanced vehicular applications. The focus is on building efficient state-of-the-art object detection networks with better inference results for fast-moving forward perception using an event camera. In this article, the event-simulated A2D2 dataset is manually annotated and trained on two different YOLOv5 networks (small and large variants). To further assess its robustness, single model testing and ensemble model testing are carried out.

Submitted 10 January, 2023; v1 submitted 14 December, 2022; originally announced December 2022.

Comments: 7 pages, 9 figures, ICMV conference 2022

ACM Class: I.2.10
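
The event-generation rule described in the abstract above (an event fires whenever a pixel's brightness change exceeds a threshold) can be sketched as follows. This is a minimal illustration and not the simulator used in the paper: the function name `events_from_frames`, the use of log-intensity differences, and the default threshold are all assumptions made for the example.

```python
import math

def events_from_frames(prev_frame, next_frame, threshold=0.2, eps=1e-3):
    """Return (x, y, polarity) events between two grayscale frames.

    Frames are nested lists of intensities in [0, 1]. Comparing
    log-intensities is an assumption of this sketch, as are the
    default threshold and the eps offset that avoids log(0).
    """
    events = []
    for y, (row_prev, row_next) in enumerate(zip(prev_frame, next_frame)):
        for x, (a, b) in enumerate(zip(row_prev, row_next)):
            delta = math.log(b + eps) - math.log(a + eps)
            if abs(delta) > threshold:
                # Polarity +1 for a brightness increase, -1 for a decrease.
                events.append((x, y, 1 if delta > 0 else -1))
    return events

prev = [[0.5, 0.5], [0.5, 0.5]]
nxt = [[0.5, 1.0], [0.1, 0.5]]
print(events_from_frames(prev, nxt))
```

Unchanged pixels produce no output, which is why event data is sparse compared with the dense pixel arrays of a conventional camera.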

arXiv:2209.10489 [pdf, other]

doi 10.56541/UAOV9084

Recurrent Super-Resolution Method for Enhancing Low Quality Thermal Facial Data

Authors: David O'Callaghan, Cian Ryan, Waseem Shariff, Muhammad Ali Farooq, Joseph Lemley, Peter Corcoran

Abstract: The process of obtaining high-resolution images from single or multiple low-resolution images of the same scene is of great interest for real-world image and signal processing applications. This study is about exploring the potential usage of deep learning based image super-resolution algorithms on thermal data for producing high quality thermal imaging results for in-cabin vehicular driver monitoring systems. In this work we have proposed and developed a novel multi-image super-resolution recurrent neural network to enhance the resolution and improve the quality of low-resolution thermal imaging data captured from uncooled thermal cameras. The end-to-end fully convolutional neural network is trained from scratch on newly acquired thermal data of 30 different subjects in indoor environmental conditions. The effectiveness of the thermally tuned super-resolution network is validated quantitatively as well as qualitatively on test data of 6 distinct subjects. The network was able to achieve a mean peak signal to noise ratio of 39.24 on the validation dataset for 4x super-resolution, outperforming bicubic interpolation both quantitatively and qualitatively.

Submitted 21 September, 2022; originally announced September 2022.

Comments: In proceedings of the 24th Irish Machine Vision and Image Processing Conference, Belfast Ireland, 31 August - 2nd September 2022
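
For readers unfamiliar with the peak signal-to-noise ratio reported in the abstract above, the standard definition is 10 * log10(MAX^2 / MSE). The sketch below computes it for two small images; the function name `psnr`, the nested-list image representation, and the default peak value of 255 are assumptions for illustration, and the paper's own evaluation settings are not stated in the abstract.

```python
import math

def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images.

    Images are nested lists of pixel values; max_value is the assumed
    peak intensity (255 for 8-bit data).
    """
    ref = [p for row in reference for p in row]
    est = [p for row in estimate for p in row]
    if not ref or len(ref) != len(est):
        raise ValueError("images must be non-empty and the same size")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(ref, est)) / len(ref)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_value ** 2 / mse)

print(round(psnr([[10, 20]], [[11, 21]]), 2))
```

Higher values indicate a reconstruction closer to the reference, so a super-resolution network is typically compared against a baseline such as bicubic interpolation on the same test images.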

arXiv:2205.13801 [pdf, other]

Quantitative and Qualitative Assessment of Indoor Exploration Algorithms for Autonomous UAVs

Authors: Adil Farooq, Christos Laoudias, Panayiotis S. Kolios, Theocharis Theocharides

Abstract: Indoor exploration is an important task in disaster relief, emergency response scenarios, and Search And Rescue (SAR) missions. Unmanned Aerial Vehicle (UAV) systems can aid first responders by maneuvering autonomously in areas inside buildings dangerous for humans to traverse, exploring the interior, and providing an accurate and reliable indoor map before the emergency response team takes action. Due to the challenging conditions in such scenarios and the inherent battery limitations and time constraints, we investigate 2D autonomous exploration strategies (e.g., based on 2D LiDAR) for mapping 3D indoor environments. First, we introduce a battery consumption model to consider the battery life aspect for the first time as a critical factor for evaluating the flight endurance of exploration strategies. Second, we perform extensive simulation experiments in diverse indoor environments using various state-of-the-art 2D LiDAR-based exploration strategies. We evaluate our findings in terms of various quantitative and qualitative performance indicators, suggesting that these strategies behave differently depending on the complexity of the environment and initial conditions, e.g., the entry point.

Submitted 27 May, 2022; originally announced May 2022.

Comments: 2022 International Conference on Unmanned Aircraft Systems (ICUAS), June 21-24, 2022, Dubrovnik, Croatia (accepted)

arXiv:2203.13461 [pdf]

Interpretation of Chest x-rays affected by bullets using deep transfer learning

Authors: Shaheer Khan, Azib Farooq, Israr Khan, Muhammad Gulraiz Khan, Abdul Razzaq

Abstract: The potential of deep learning, especially in medical imaging, initiated astonishing results and improved the methodologies after every passing day. Deep learning in radiology provides the opportunity to classify, detect and segment different diseases automatically. In the proposed study, we worked on a non-trivial aspect of medical imaging where we classified and localized the X-rays affected by bullets. We tested images on different classification and localization models to get considerable accuracy. The replicated data set used in the study was replicated on different images of chest X-rays. The proposed model worked not only on chest radiographs but also on X-rays of other body parts such as the leg, abdomen, and head, even though the training dataset was based on chest radiographs. Custom models have been used for classification and localization purposes after tuning parameters. Finally, the results of our findings are presented using different frameworks. This might assist further research in this field. To the best of our knowledge, this is the first study on the detection and classification of radiographs affected by bullets using deep learning.

Submitted 25 March, 2022; originally announced March 2022.

Comments: 10 pages, 8 figures, 1 table, Paper is intended to publish in journal and first author reserves all copyrights of reproduction

arXiv:2201.01661 [pdf]

doi 10.1109/TIV.2022.3158094

Evaluation of Thermal Imaging on Embedded GPU Platforms for Application in Vehicular Assistance Systems

Authors: Muhammad Ali Farooq, Waseem Shariff, Peter Corcoran

Abstract: This study is focused on evaluating the real-time performance of thermal object detection for smart and safe vehicular systems by deploying the trained networks on GPU & single-board EDGE-GPU computing platforms for onboard automotive sensor suite testing. A novel large-scale thermal dataset comprising > 35,000 distinct frames is acquired, processed, and open-sourced in challenging weather and environmental scenarios. The dataset is recorded from a low-cost yet effective uncooled LWIR thermal camera, mounted stand-alone and on an electric vehicle to minimize mechanical vibrations. State-of-the-art YOLO-V5 network variants are trained using four different public datasets as well as a newly acquired local dataset for optimal generalization of the DNN by employing the SGD optimizer. The effectiveness of the trained networks is validated on extensive test data using various quantitative metrics which include precision, recall curve, mean average precision, and frames per second. The smaller network variant of YOLO is further optimized using the TensorRT inference accelerator to explicitly boost the frames per second rate. The optimized network engine increases the frames per second rate by 3.5 times when testing on low power edge devices, thus achieving 11 fps on Nvidia Jetson Nano and 60 fps on Nvidia Xavier NX development boards.

Submitted 5 January, 2022; originally announced January 2022.

Comments: 14 pages, 9 tables, and 27 figures

Journal ref: Published in IEEE-TIV Journal in 2023

arXiv:2111.15340 [pdf, other]

MC-SSL0.0: Towards Multi-Concept Self-Supervised Learning

Authors: Sara Atito, Muhammad Awais, Ammarah Farooq, Zhenhua Feng, Josef Kittler

Abstract: Self-supervised pretraining is the method of choice for natural language processing models and is rapidly gaining popularity in many vision tasks. Recently, self-supervised pretraining has been shown to outperform supervised pretraining for many downstream vision applications, marking a milestone in the area. This superiority is attributed to the negative impact of incomplete labelling of the training images, which convey multiple concepts, but are annotated using a single dominant class label. Although Self-Supervised Learning (SSL), in principle, is free of this limitation, the choice of pretext task facilitating SSL is perpetuating this shortcoming by driving the learning process towards a single concept output. This study aims to investigate the possibility of modelling all the concepts present in an image without using labels. In this respect, the proposed SSL framework MC-SSL0.0 is a step towards Multi-Concept Self-Supervised Learning (MC-SSL) that goes beyond modelling a single dominant label in an image to effectively utilise the information from all the concepts present in it. MC-SSL0.0 consists of two core design concepts: group masked model learning, and learning of pseudo-concepts for data tokens using a momentum encoder (teacher-student) framework. The experimental results on multi-label and multi-class image classification downstream tasks demonstrate that MC-SSL0.0 not only surpasses existing SSL methods but also outperforms supervised transfer learning. The source code will be made publicly available for the community to train on bigger corpora.

Submitted 30 November, 2021; originally announced November 2021.

arXiv:2111.13156 [pdf, other]

Global Interaction Modelling in Vision Transformer via Super Tokens

Authors: Ammarah Farooq, Muhammad Awais, Sara Ahmed, Josef Kittler

Abstract: With the popularity of Transformer architectures in computer vision, the research focus has shifted towards developing computationally efficient designs. Window-based local attention is one of the major techniques being adopted in recent works. These methods begin with very small patch size and small embedding dimensions and then perform strided convolution (patch merging) in order to reduce the feature map size and increase embedding dimensions, hence, forming a pyramidal Convolutional Neural Network (CNN) like design. In this work, we investigate local and global information modelling in transformers by presenting a novel isotropic architecture that adopts local windows and special tokens, called Super tokens, for self-attention. Specifically, a single Super token is assigned to each image window which captures the rich local details for that window. These tokens are then employed for cross-window communication and global representation learning. Hence, most of the learning is independent of the image patches $(N)$ in the higher layers, and the class embedding is learned solely based on the Super tokens $(N/M^2)$ where $M^2$ is the window size. In standard image classification on Imagenet-1K, the proposed Super tokens based transformer (STT-S25) achieves 83.5% accuracy which is equivalent to Swin transformer (Swin-B) with circa half the number of parameters (49M) and double the inference time throughput. The proposed Super token transformer offers a lightweight and promising backbone for visual recognition tasks.

Submitted 25 November, 2021; originally announced November 2021.
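
The token arithmetic in the abstract above ($N$ image patches, $N/M^2$ Super tokens, one per window of $M^2$ patches) can be checked with a small sketch. The image size, patch size, and window side used here are illustrative assumptions, not values taken from the paper, and `super_token_count` is a hypothetical helper name.

```python
def super_token_count(image_size: int, patch_size: int, window_side: int):
    """Return (N, N / M^2): patch tokens and Super tokens for a square image.

    Assumes non-overlapping square patches and non-overlapping square
    windows of window_side x window_side patches (M = window_side).
    """
    patches_per_side = image_size // patch_size
    if patches_per_side % window_side != 0:
        raise ValueError("window side must divide the number of patches per side")
    n_patches = patches_per_side ** 2      # N
    window_size = window_side ** 2         # M^2 patches per window
    return n_patches, n_patches // window_size


# Assumed example: 224x224 image, 4x4 patches, 7x7-patch windows.
n, n_super = super_token_count(224, 4, 7)
print(n, n_super)
```

Under these assumed settings the higher layers would attend over 64 Super tokens instead of 3136 patch tokens, which is the kind of reduction the abstract refers to.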

arXiv:2109.09854 [pdf]

Object Detection in Thermal Spectrum for Advanced Driver-Assistance Systems (ADAS)

Authors: Muhammad Ali Farooq, Peter Corcoran, Cosmin Rotariu, Waseem Shariff

Abstract: Object detection in the thermal infrared spectrum provides a more reliable data source in low-lighting conditions and different weather conditions, as it is useful both in-cabin and outside for pedestrian, animal, and vehicular detection as well as for detecting street-signs & lighting poles. This paper is about exploring and adapting state-of-the-art object detection and classifier framework on thermal vision with seven distinct classes for advanced driver-assistance systems (ADAS). The trained network variants on public datasets are validated on test data with three different test approaches which include test-time with no augmentation, test-time augmentation, and test-time with model ensembling. Additionally, the efficacy of trained networks is tested on locally gathered novel test-data captured with an uncooled LWIR prototype thermal camera in challenging weather and environmental scenarios. The performance analysis of trained models is investigated by computing precision, recall, and mean average precision scores (mAP). Furthermore, the trained model architecture is optimized using TensorRT inference accelerator and deployed on resource-constrained edge hardware Nvidia Jetson Nano to explicitly reduce the inference time on GPU as well as edge devices for further real-time onboard installations.

Submitted 27 October, 2021; v1 submitted 20 September, 2021; originally announced September 2021.

Comments: This work is carried under EU funded project (https://www.heliaus.eu/)

arXiv:2107.04133 [pdf]

doi 10.1007/978-3-030-74728-2_8

Effectiveness of State-of-the-Art Super Resolution Algorithms in Surveillance Environment

Authors: Muhammad Ali Farooq, Ammar Ali Khan, Ansar Ahmad, Rana Hammad Raza

Abstract: Image Super Resolution (SR) finds applications in areas where images need to be closely inspected by the observer to extract enhanced information. One such focused application is an offline forensic analysis of surveillance feeds. Due to the limitations of camera hardware, camera pose, limited bandwidth, varying illumination conditions, and occlusions, the quality of the surveillance feed is significantly degraded at times, thereby compromising monitoring of behavior, activities, and other sporadic information in the scene. For the proposed research work, we have inspected the effectiveness of four conventional yet effective SR algorithms and three deep learning-based SR algorithms to seek the finest method that executes well in a surveillance environment with limited training data options. These algorithms generate an enhanced resolution output image from a single low-resolution (LR) input image. For performance analysis, a subset of 220 images from six surveillance datasets has been used, consisting of individuals with varying distances from the camera, changing illumination conditions, and complex backgrounds. The performance of these algorithms has been evaluated and compared using both qualitative and quantitative metrics. These SR algorithms have also been compared based on face detection accuracy.
By analyzing and comparing the performance of all the algorithms, a Convolutional Neural Network (CNN) based SR technique using an external dictionary proved to be best by achieving robust face detection accuracy and scoring optimal quantitative metric results under different surveillance conditions. This is because the CNN layers progressively learn more complex features using an external dictionary.

Submitted 8 July, 2021; originally announced July 2021.

Journal ref: Springer, Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 1376), 2021

arXiv:2103.03503 [pdf]

NPT-Loss: A Metric Loss with Implicit Mining for Face Recognition

Authors: Syed Safwan Khalid, Muhammad Awais, Chi-Ho Chan, Zhenhua Feng, Ammarah Farooq, Ali Akbari, Josef Kittler

Abstract: Face recognition (FR) using deep convolutional neural networks (DCNNs) has seen remarkable success in recent years. One key ingredient of DCNN-based FR is the appropriate design of a loss function that ensures discrimination between various identities. The state-of-the-art (SOTA) solutions utilise normalised Softmax loss with additive and/or multiplicative margins. Despite being popular, these Softmax+margin based losses are not theoretically motivated and the effectiveness of a margin is justified only intuitively. In this work, we utilise an alternative framework that offers a more direct mechanism of achieving discrimination among the features of various identities. We propose a novel loss that is equivalent to a triplet loss with proxies and an implicit mechanism of hard-negative mining. We give theoretical justification that minimising the proposed loss ensures a minimum separability between all identities. The proposed loss is simple to implement and does not require heavy hyper-parameter tuning as in the SOTA solutions. We give empirical evidence that despite its simplicity, the proposed loss consistently achieves SOTA performance in various benchmarks for both high-resolution and low-resolution FR tasks.

Submitted 5 March, 2021; originally announced March 2021.
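
To make the phrase "triplet loss with proxies and hard-negative mining" in the abstract above concrete, here is a generic sketch of a proxy-based triplet loss that compares a sample to its own class proxy and to the nearest wrong-class proxy. This is an illustration of the general idea only; it is not the NPT-Loss formulation from the paper, and the function names, the squared Euclidean distance, and the margin value are all assumptions.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def proxy_triplet_loss(embedding, proxies, label, margin=0.2):
    """Generic proxy triplet loss with the hardest (nearest) negative proxy.

    proxies[k] is a learnable class-centre vector for identity k (assumed).
    The anchor is the sample embedding, the positive is its own class proxy,
    and the negative is the closest proxy belonging to any other class.
    """
    anchor = l2_normalize(embedding)
    distances = [squared_distance(anchor, l2_normalize(p)) for p in proxies]
    positive = distances[label]
    hardest_negative = min(d for k, d in enumerate(distances) if k != label)
    return max(0.0, positive - hardest_negative + margin)


proxies = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
print(proxy_triplet_loss([2.0, 0.0], proxies, label=0))  # sample sits on its proxy
print(proxy_triplet_loss([2.0, 0.0], proxies, label=1))  # sample sits on a wrong proxy
```

Because only the nearest negative proxy enters the loss, no separate triplet sampling step is needed, which is the practical appeal of proxy-based formulations.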

arXiv:2103.01756 [pdf]

Current eHealth Challenges and recent trends in eHealth applications

Authors: Muhammad Mudassar Qureshi, Amjad Farooq, Muhammad Mazhar Qureshi

Abstract: The eHealth (Health Informatics/Medical Informatics) field is growing worldwide due to acknowledgement by reputable organizations such as the World Health Organization, the Institute of Medicine in the USA and several others. This field is facing a number of challenges, and there is a need to classify the challenges mentioned by different researchers in this area. The purpose of this study is to classify different eHealth challenges into broader categories. We also analyzed recent eHealth applications to identify current trends of such applications. In this paper, we identify the stakeholders who are responsible for contributing to a particular eHealth challenge. Through eHealth application analysis, we categorize these applications based on different factors. We identify different socio-economic benefits which these applications can provide. We also present the ecosystem of an eHealth application. We give recommendations for eHealth challenges relevant to the Information Technology domain. We conclude our discussion by specifying areas for future research and recommending that researchers work on identifying which types of disease can be controlled and managed by different eHealth applications.

Submitted 28 February, 2021; originally announced March 2021.

arXiv:2101.08238 [pdf, other]

AXM-Net: Implicit Cross-Modal Feature Alignment for Person Re-identification

Authors: Ammarah Farooq, Muhammad Awais, Josef Kittler, Syed Safwan Khalid

Abstract: Cross-modal person re-identification (Re-ID) is critical for modern video surveillance systems. The key challenge is to align cross-modality representations induced by the semantic information present for a person and ignore background information. This work presents a novel convolutional neural network (CNN) based architecture designed to learn semantically aligned cross-modal visual and textual representations. The underlying building block, named AXM-Block, is a unified multi-layer network that dynamically exploits the multi-scale knowledge from both modalities and re-calibrates each modality according to shared semantics. To complement the convolutional design, contextual attention is applied in the text branch to manipulate long-term dependencies. Moreover, we propose a unique design to enhance visual part-based feature coherence and locality information. Our framework is novel in its ability to implicitly learn aligned semantics between modalities during the feature learning stage. The unified feature learning effectively utilizes textual data as a super-annotation signal for visual representation learning and automatically rejects irrelevant information. The entire AXM-Net is trained end-to-end on CUHK-PEDES data. We report results on two tasks, person search and cross-modal Re-ID. The AXM-Net outperforms the current state-of-the-art (SOTA) methods and achieves 64.44% Rank@1 on the CUHK-PEDES test set. It also outperforms its competitors by >10% in cross-viewpoint text-to-image Re-ID scenarios on CrossRe-ID and CUHK-SYSU datasets.

Submitted 20 July, 2022; v1 submitted 19 January, 2021; originally announced January 2021.

Comments: AAAI-2022 (Oral Paper)

arXiv:2010.10368 [pdf, other]

A Flatter Loss for Bias Mitigation in Cross-dataset Facial Age Estimation

Authors: Ali Akbari, Muhammad Awais, Zhen-Hua Feng, Ammarah Farooq, Josef Kittler

Abstract: Most existing studies in facial age estimation assume training and test images are captured under similar shooting conditions. However, this is rarely valid in real-world applications, where training and test sets usually have different characteristics. In this paper, we advocate a cross-dataset protocol for age estimation benchmarking. In order to improve the cross-dataset age estimation performance, we mitigate the inherent bias caused by the learning algorithm itself. To this end, we propose a novel loss function that is more effective for neural network training. The relative smoothness of the proposed loss function is its advantage with regards to the optimisation process performed by stochastic gradient descent (SGD). Compared with existing loss functions, the lower gradient of the proposed loss function leads to the convergence of SGD to a better optimum point, and consequently a better generalisation. The cross-dataset experimental results demonstrate the superiority of the proposed method over the state-of-the-art algorithms in terms of accuracy and generalisation capability.

Submitted 26 October, 2020; v1 submitted 20 October, 2020; originally announced October 2020.

arXiv:2006.12369 [pdf, other]

Controlling for sparsity in sparse factor analysis models: adaptive latent feature sharing for piecewise linear dimensionality reduction

Authors: Adam Farooq, Yordan P. Raykov, Petar Raykov, Max A. Little

Abstract: Ubiquitous linear Gaussian exploratory tools such as principal component analysis (PCA) and factor analysis (FA) remain widely used as tools for exploratory analysis, pre-processing, data visualization and related tasks. However, due to their rigid assumptions including crowding of high dimensional data, they have been replaced in many settings by more flexible and still interpretable latent feature models. The feature allocation is usually modelled using discrete latent variables assumed to follow either a parametric Beta-Bernoulli distribution or a Bayesian nonparametric prior. In this work we propose a simple and tractable parametric feature allocation model which can address key limitations of current latent feature decomposition techniques. The new framework allows for explicit control over the number of features used to express each point and enables a more flexible set of allocation distributions including feature allocations with different sparsity levels. This approach is used to derive a novel adaptive factor analysis (aFA), as well as an adaptive probabilistic principal component analysis (aPPCA) capable of flexible structure discovery and dimensionality reduction in a wide range of scenarios. We derive both a standard Gibbs sampler and an expectation-maximization inference algorithm that converges orders of magnitude faster to a reasonable point estimate solution. The utility of the proposed aPPCA model is demonstrated for standard PCA tasks such as feature learning, data visualization and data whitening.
We show that aPPCA and aFA can infer interpretable high level features both when applied on raw MNIST and when applied for interpreting autoencoder features. We also demonstrate an application of the aPPCA to more robust blind source separation for functional magnetic resonance imaging (fMRI).

Submitted 28 February, 2021; v1 submitted 22 June, 2020; originally announced June 2020.

Comments: Interactive demo available at https://colab.research.google.com/drive/1KrrHmAu6mV7tutZtYnpEbVibxs4GCwIo?usp=sharing

ACM Class: I.5.1

arXiv:2005.01923 [pdf]

doi 10.1109/QoMEX48832.2020.9123079

Generating Thermal Image Data Samples using 3D Facial Modelling Techniques and Deep Learning Methodologies

Authors: Muhammad Ali Farooq, Peter Corcoran

Abstract: Methods for generating synthetic data have become of increasing importance to build large datasets required for Convolution Neural Networks (CNN) based deep learning techniques for a wide range of computer vision applications. In this work, we extend existing methodologies to show how 2D thermal facial data can be mapped to provide 3D facial models. For the proposed research work we have used Tufts datasets for generating 3D varying face poses by using a single frontal face pose. The system works by refining the existing image quality by performing fusion based image preprocessing operations. The refined outputs have better contrast adjustments, decreased noise level and higher exposedness of the dark regions. It makes the facial landmarks and temperature patterns on the human face more discernible and visible when compared to original raw data. Different image quality metrics are used to compare the refined version of images with original images. In the next phase of the proposed study, the refined version of images is used to create 3D facial geometry structures by using Convolution Neural Networks (CNN). The generated outputs are then imported into Blender software to finally extract the 3D thermal facial outputs of both males and females. The same technique is also used on our thermal face data acquired using a prototype thermal camera (developed under the Heliaus EU project) in an indoor lab environment, which is then used for generating synthetic 3D face data along with varying yaw face angles, and lastly a facial depth map is generated.

Submitted 7 May, 2020; v1 submitted 4 May, 2020; originally announced May 2020.

Comments: Paper accepted in QOMEX IEEE 2020 Conference; copyright submitted to IEEE

arXiv:2003.06794 [pdf]

doi 10.1109/FIT.2017.00071

Performance Evaluation of Advanced Deep Learning Architectures for Offline Handwritten Character Recognition

Authors: Moazam Soomro, Muhammad Ali Farooq, Rana Hammad Raza

Abstract: This paper presents a hand-written character recognition comparison and performance evaluation for robust and precise classification of different hand-written characters. The system utilizes an advanced multilayer deep neural network by collecting features from raw pixel values. The hidden layers stack deep hierarchies of non-linear features since learning complex features from conventional neural networks is very challenging. Two state-of-the-art deep learning architectures were used, namely the Caffe AlexNet and GoogleNet models in NVIDIA DIGITS. The frameworks were trained and tested on two different datasets for incorporating diversity and complexity. One of them is the publicly available dataset, i.e. Chars74K, comprising 7705 characters and containing upper and lowercase English alphabets along with numerical digits. The other dataset, created locally, consists of 4320 characters. The local dataset consists of 62 classes and was created by 40 subjects. It also consists of upper and lowercase English alphabets along with numerical digits. The overall dataset is divided in the ratio of 80% for the training phase and 20% for the testing phase. The time required for the training phase is approximately 90 minutes. For the validation part, the results obtained were compared with the ground truth. The accuracy level achieved with AlexNet was 77.77%, and 88.89% with GoogleNet. The higher accuracy level of GoogleNet is due to its unique combination of inception modules, each including pooling, convolutions at various scales and concatenation procedures.

Submitted 15 March, 2020; originally announced March 2020.

Comments: FIT 2017 Paper Published in IEEE FIT 2017

arXiv:2003.06356 [pdf]

Advanced Deep Learning Methodologies for Skin Cancer Classification in Prodromal Stages

Authors: Muhammad Ali Farooq, Asma Khatoon, Viktor Varkarakis, Peter Corcoran

Abstract: Technology-assisted platforms provide reliable solutions in almost every field these days. One such important application in the medical field is skin cancer classification in preliminary stages, which needs sensitive and precise data analysis. For the proposed study the Kaggle skin cancer dataset is utilized. The proposed study consists of two main phases. In the first phase, the images are preprocessed to remove clutter, thus producing a refined version of the training images. To achieve that, a sharpening filter is applied followed by a hair removal algorithm. Different image quality measurement metrics including Peak Signal to Noise Ratio (PSNR), Mean Square Error (MSE), Maximum Absolute Squared Deviation (MXERR) and Energy Ratio/Ratio of Squared Norms (L2RAT) are used to compare the overall image quality before and after applying preprocessing operations. The results from the aforementioned image quality metrics indicate that image quality is not compromised but rather improved by applying the preprocessing operations. The second phase of the proposed research work incorporates deep learning methodologies that play an imperative role in accurate, precise and robust classification of the lesion mole. This has been reflected by using two state-of-the-art deep learning models: Inception-v3 and MobileNet.
The experimental results demonstrate notable improvement in training and validation accuracy for both networks when using the refined version of the images; however, the Inception-v3 network achieved better validation accuracy and was therefore selected for evaluation on test data. The final test accuracy using the state-of-the-art Inception-v3 network was 86%.

Submitted 13 March, 2020; originally announced March 2020.

Comments: Paper Published in AICS 2019
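
The four image quality metrics named in the abstract above (PSNR, MSE, MXERR, L2RAT) can be sketched as follows. This is a minimal illustration using the commonly cited definitions and an assumed 8-bit peak value of 255; it is not the authors' code, `quality_metrics` is a hypothetical helper name, and the pixel values are made up for the example.

```python
import math


def quality_metrics(original, processed, peak=255.0):
    """Compare two equal-length, non-empty sequences of pixel intensities.

    MSE   : mean of squared differences
    PSNR  : 10 * log10(peak^2 / MSE), infinite when the images are identical
    MXERR : largest squared difference at any pixel
    L2RAT : energy (sum of squares) of processed divided by that of original
    """
    if len(original) != len(processed) or not original:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - b) ** 2 for a, b in zip(original, processed)]
    mse = sum(squared_errors) / len(squared_errors)
    psnr = math.inf if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)
    original_energy = sum(a * a for a in original)
    if original_energy == 0:
        raise ValueError("original image has zero energy; L2RAT is undefined")
    l2rat = sum(b * b for b in processed) / original_energy
    return {"mse": mse, "psnr": psnr, "mxerr": max(squared_errors), "l2rat": l2rat}


# Made-up flattened 2x2 grayscale patches, before and after preprocessing.
print(quality_metrics([0, 100, 200, 255], [0, 110, 190, 255]))
```

A higher PSNR and an L2RAT close to 1 indicate that the processed image stays close to the original, which is how such metrics are typically read when checking that preprocessing has not degraded the input.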

arXiv:2003.06276 [pdf]

doi 10.1109/BIBE.2016.53

Automatic Lesion Detection System (ALDS) for Skin Cancer Classification Using SVM and Neural Classifiers

Authors: Muhammad Ali Farooq, Muhammad Aatif Mobeen Azhar, Rana Hammad Raza

Abstract: Technology aided platforms provide reliable tools in almost every field these days. These tools, being supported by computational power, are significant for applications that need sensitive and precise data analysis. One such important application in the medical field is the Automatic Lesion Detection System (ALDS) for skin cancer classification. Computer aided diagnosis helps physicians and dermatologists to obtain a second opinion for proper analysis and treatment of skin cancer. Precise segmentation of the cancerous mole along with the surrounding area is essential for proper analysis and diagnosis. This paper is focused on the development of an improved ALDS framework based on a probabilistic approach that initially utilizes active contours and a watershed merged mask for segmenting out the mole, and later applies SVM and Neural Classifier for the classification of the segmented mole. After lesion segmentation, the selected features are classified to ascertain whether the case under consideration is melanoma or non-melanoma. The approach is tested on varying datasets, and a comparative analysis is performed that reflects the effectiveness of the proposed system.

Submitted 13 March, 2020; originally announced March 2020.

arXiv:2003.00808 [pdf, other]

A Convolutional Baseline for Person Re-Identification Using Vision and Language Descriptions

Authors: Ammarah Farooq, Muhammad Awais, Fei Yan, Josef Kittler, Ali Akbari, Syed Safwan Khalid

Abstract: Classical person re-identification approaches assume that a person of interest has appeared across different cameras and can be queried by one of the existing images. However, in real-world surveillance scenarios, frequently no visual information will be available about the queried person. In such scenarios, a natural language description of the person by a witness will provide the only source of information for retrieval. In this work, person re-identification using both vision and language information is addressed under all possible gallery and query scenarios. A two stream deep convolutional neural network framework supervised by cross entropy loss is presented. The weights connecting the second last layer to the last layer with class probabilities, i.e., logits of softmax layer are shared in both networks. Canonical Correlation Analysis is performed to enhance the correlation between the two modalities in a joint latent embedding space. To investigate the benefits of the proposed approach, a new testing protocol under a multi modal ReID setting is proposed for the test split of the CUHK-PEDES and CUHK-SYSU benchmarks. The experimental results verify the merits of the proposed system. The learnt visual representations are more robust and perform 22% better during retrieval as compared to a single modality system. The retrieval with a multi modal query greatly enhances the re-identification capability of the system quantitatively as well as qualitatively.

Submitted 20 February, 2020; originally announced March 2020.

Comments: 12 pages including references, currently under review in IEEE transactions on Image Processing

arXiv:1911.06304 [pdf, other]

Detecting Safety and Security Faults in PLC Systems with Data Provenance

Authors: Abdullah Al Farooq, Jessica Marquard, Kripa George, Thomas Moyer

Abstract: Programmable Logic Controllers are an integral component for managing many different industrial processes (e.g., smart building management, power generation, water and wastewater management, and traffic control systems), and manufacturing and control industries (e.g., oil and natural gas, chemical, pharmaceutical, pulp and paper, food and beverage, automotive, and aerospace). Despite being used widely in many critical infrastructures, PLCs use protocols which make these control systems vulnerable to many common attacks, including man-in-the-middle attacks, denial of service attacks, and memory corruption attacks (e.g., array, stack, and heap overflows, integer overflows, and pointer corruption). In this paper, we propose PLC-PROV, a system for tracking the inputs and outputs of the control system to detect violations in the safety and security policies of the system. We consider a smart building as an example of a PLC-based system and show how PLC-PROV can be applied to ensure that the inputs and outputs are consistent with the intended safety and security policies.

Submitted 14 November, 2019; originally announced November 2019.

Journal ref: 2019 IEEE International Symposium on Technologies for Homeland Security

arXiv:1905.11010 [pdf, ps, other]

Adaptive probabilistic principal component analysis

Authors: Adam Farooq, Yordan P. Raykov, Luc Evers, Max A. Little

Abstract: Using the linear Gaussian latent variable model as a starting point we relax some of the constraints it imposes by deriving a nonparametric latent feature Gaussian variable model. This model introduces additional discrete latent variables to the original structure. The Bayesian nonparametric nature of this new model allows it to adapt complexity as more data is observed and project each data point onto a varying number of subspaces. The linear relationship between the continuous latent and observed variables makes the proposed model straightforward to interpret, resembling a locally adaptive probabilistic PCA (A-PPCA). We propose two alternative Gibbs sampling procedures for inference in the new model and demonstrate its applicability on sensor data for passive health monitoring.

Submitted 27 May, 2019; originally announced May 2019.

arXiv:1812.03966 [pdf, other]

IoTC2: A Formal Method Approach for Detecting Conflicts in Large Scale IoT Systems

Authors: Abdullah Al Farooq, Ehab Al-Shaer, Thomas Moyer, Krishna Kant

Abstract: Internet of Things (IoT) has become a common paradigm for different domains such as health care, transportation infrastructure, smart home, smart shopping, and e-commerce. With its interoperable functionality, it is now possible to connect all domains of IoT together for providing competent services to the users. Because numerous IoT devices can connect and communicate at the same time, there can be events that trigger conflicting actions to an actuator or an environmental feature. However, there have been very few research efforts made to detect conflicting situations in IoT systems using formal methods. This paper provides a formal method approach, IoT Conflict Checker (IoTC2), to ensure safety of controller and actuators' behavior with respect to conflicts. Any policy violation results in detection of the conflicts. We defined the safety policies for controller, actions, and triggering events and implemented them in Prolog to prove the logical completeness and soundness. In addition, we have implemented the detection policies in the Matlab Simulink environment with its built-in Model Verification blocks. We created a smart home environment in Simulink and showed how the conflicts affect actions and corresponding features. We have also evaluated the scalability, efficiency, and accuracy of our method in the simulated environment.

Submitted 10 December, 2018; originally announced December 2018.

arXiv:1610.02575 [pdf, ps, other]

doi 10.1371/journal.pone.0175876

Properties of Healthcare Teaming Networks as a Function of Network Construction Algorithms

Authors: Martin S. Zand, Melissa Trayhan, Samir A. Farooq, Christopher Fucile, Gourab Ghoshal, Robert J. White, Caroline M. Quill, Alexander Rosenberg, Hugo Serrano, Hassan Chafi, Timothy Boudreau

Abstract: Network models of healthcare systems can be used to examine how providers collaborate, communicate, and refer patients to each other. Most healthcare service network models have been constructed from patient claims data, using billing claims to link patients with providers. The data sets can be quite large, making standard methods for network construction computationally challenging and thus requiring the use of alternate construction algorithms. While these alternate methods have seen increasing use in generating healthcare networks, there is little to no literature comparing the differences in the structural properties of the generated networks. To address this issue, we compared the properties of healthcare networks constructed using different algorithms and the 2013 Medicare Part B outpatient claims data. Three different algorithms were compared: binning, sliding frame, and trace-route. Unipartite networks linking either providers or healthcare organizations by shared patients were built using each method. We found that each algorithm produced networks with substantially different topological properties. Provider networks adhered to a power law, and organization networks to a power law with exponential cutoff. Censoring networks to exclude edges with fewer than 11 shared patients, a common de-identification practice for healthcare network data, markedly reduced edge numbers and greatly altered measures of vertex prominence such as the betweenness centrality.
We identified patterns in the distance patients travel between network providers, and most strikingly between providers in the Northeast United States and Florida. We conclude that the choice of network construction algorithm is critical for healthcare network analysis, and discuss the implications for selecting the algorithm best suited to the type of analysis to be performed.

Submitted 8 October, 2016; originally announced October 2016.

Comments: With links to comprehensive, high resolution figures and networks via figshare.com

arXiv:1112.6219 [pdf]

doi 10.5120/1640-2204

Document Clustering based on Topic Maps

Authors: Muhammad Rafi, M. Shahid Shaikh, Amir Farooq

Abstract: The importance of document clustering is now widely acknowledged by researchers for better management, smart navigation, efficient filtering, and concise summarization of large collections of documents like the World Wide Web (WWW). The next challenge lies in performing clustering based on the semantic contents of the document. The problem of document clustering has two main components: (1) to represent the document in such a form that inherently captures the semantics of the text, which may also help to reduce the dimensionality of the document, and (2) to define a similarity measure based on the semantic representation such that it assigns higher numerical values to document pairs which have a higher semantic relationship. The feature space of the documents can be very challenging for document clustering. A document may contain multiple topics, a large set of class-independent general-words, and a handful of class-specific core-words. With these features in mind, traditional agglomerative clustering algorithms, which are based on either Document Vector model (DVM) or Suffix Tree model (STC), are less efficient in producing results with high cluster quality. This paper introduces a new approach for document clustering based on the Topic Map representation of the documents. The document is transformed into a compact form. A similarity measure is proposed based upon the information inferred through topic maps data and structures.
The suggested method is implemented using agglomerative hierarchical clustering and tested on standard information retrieval (IR) datasets. The comparative experiment reveals that the proposed approach is effective in improving the cluster quality.

Submitted 28 December, 2011; originally announced December 2011.

Journal ref: International Journal of Computer Applications 12(1):32-36, December 2010

arXiv:1006.4562 [pdf]

Engineering Semantic Web Applications by Using Object-Oriented Paradigm

Authors: Amjad Farooq, Syed Ahsan, Abad Shah

Abstract: Web information resources are growing explosively in number and volume. Retrieving relevant data from the web has become very difficult and time-consuming. The Semantic Web envisions that these web resources should be developed in a machine-processable way in order to handle irrelevancy and manual processing problems. The Semantic Web is an extension of the current web, in which web resources are equipped with formal semantics about their interpretation through machines. These web resources are usually contained in web applications and systems, and their formal semantics are normally represented in the form of web-ontologies. In this research paper, an object-oriented design methodology (OODM) is upgraded for developing semantic web applications. OODM has been developed for designing web applications for the current web. This methodology is good enough to develop web applications. It also provides a systematic approach to web application development, but it is not helpful in generating machine-processable content of web applications during their development. Therefore, this methodology needs to be extended. In this paper, we propose that extension in OODM. This new extended version is referred to as the semantic web object-oriented design methodology (SW-OODM).

Submitted 23 June, 2010; originally announced June 2010.

Comments: IEEE Publication Format, https://sites.google.com/site/journalofcomputing/

Journal ref: Journal of Computing, Vol. 2, No. 6, June 2010, NY, USA, ISSN 2151-9617

arXiv:1006.4561 [pdf]

An Efficient Technique for Similarity Identification between Ontologies

Authors: Amjad Farooq, Syed Ahsan, Abad Shah

Abstract: Ontologies usually suffer from the semantic heterogeneity when simultaneously used in information sharing, merging, integrating and querying processes. Therefore, the similarity identification between ontologies being used becomes a mandatory task for all these processes to handle the problem of semantic heterogeneity. In this paper, we propose an efficient technique for similarity measurement between two ontologies. The proposed technique identifies all candidate pairs of similar concepts without omitting any similar pair. The proposed technique can be used in different types of operations on ontologies such as merging, mapping and aligning. By analyzing its results a reasonable improvement in terms of completeness, correctness and overall quality of the results has been found.

Submitted 23 June, 2010; originally announced June 2010.

Comments: IEEE Publication Format, https://sites.google.com/site/journalofcomputing/

Journal ref: Journal of Computing, Vol. 2, No. 6, June 2010, NY, USA, ISSN 2151-9617

Showing 1–39 of 39 results for author: Farooq, A