subscribe to arXiv mailings

Robust ADAS: Enhancing Robustness of Machine Learning-based Advanced Driver Assistance Systems for Adverse Weather

Authors: Muhammad Zaeem Shahzad, Muhammad Abdullah Hanif, Muhammad Shafique

Abstract: In the realm of deploying Machine Learning-based Advanced Driver Assistance Systems (ML-ADAS) into real-world scenarios, adverse weather conditions pose a significant challenge. Conventional ML models trained on clear weather data falter when faced with scenarios like extreme fog or heavy rain, potentially leading to accidents and safety hazards. This paper addresses this issue by proposing a nove… ▽ More In the realm of deploying Machine Learning-based Advanced Driver Assistance Systems (ML-ADAS) into real-world scenarios, adverse weather conditions pose a significant challenge. Conventional ML models trained on clear weather data falter when faced with scenarios like extreme fog or heavy rain, potentially leading to accidents and safety hazards. This paper addresses this issue by proposing a novel approach: employing a Denoising Deep Neural Network as a preprocessing step to transform adverse weather images into clear weather images, thereby enhancing the robustness of ML-ADAS systems. The proposed method eliminates the need for retraining all subsequent Depp Neural Networks (DNN) in the ML-ADAS pipeline, thus saving computational resources and time. Moreover, it improves driver visualization, which is critical for safe navigation in adverse weather conditions. By leveraging the UNet architecture trained on an augmented KITTI dataset with synthetic adverse weather images, we develop the Weather UNet (WUNet) DNN to remove weather artifacts. Our study demonstrates substantial performance improvements in object detection with WUNet preprocessing under adverse weather conditions. Notably, in scenarios involving extreme fog, our proposed solution improves the mean Average Precision (mAP) score of the YOLOv8n from 4% to 70%. △ Less

Submitted 2 July, 2024; originally announced July 2024.

Comments: 7 pages, 10 figures, 1 table

arXiv:2405.03244 [pdf, other]

Examining Changes in Internal Representations of Continual Learning Models Through Tensor Decomposition

Authors: Nishant Suresh Aswani, Amira Guesmi, Muhammad Abdullah Hanif, Muhammad Shafique

Abstract: Continual learning (CL) has spurred the development of several methods aimed at consolidating previous knowledge across sequential learning. Yet, the evaluations of these methods have primarily focused on the final output, such as changes in the accuracy of predicted classes, overlooking the issue of representational forgetting within the model. In this paper, we propose a novel representation-bas… ▽ More Continual learning (CL) has spurred the development of several methods aimed at consolidating previous knowledge across sequential learning. Yet, the evaluations of these methods have primarily focused on the final output, such as changes in the accuracy of predicted classes, overlooking the issue of representational forgetting within the model. In this paper, we propose a novel representation-based evaluation framework for CL models. This approach involves gathering internal representations from throughout the continual learning process and formulating three-dimensional tensors. The tensors are formed by stacking representations, such as layer activations, generated from several inputs and model `snapshots', throughout the learning process. By conducting tensor component analysis (TCA), we aim to uncover meaningful patterns about how the internal representations evolve, expecting to highlight the merits or shortcomings of examined CL strategies. We conduct our analyses across different model architectures and importance-based continual learning strategies, with a curated task selection. While the results of our approach mirror the difference in performance of various CL strategies, we found that our methodology did not directly highlight specialized clusters of neurons, nor provide an immediate understanding the evolution of filters. We believe a scaled down version of our approach will provide insight into the benefits and pitfalls of using TCA to study continual learning dynamics. △ Less

Submitted 6 May, 2024; originally announced May 2024.

arXiv:2404.13915 [pdf, other]

Angle-Aware Coverage with Camera Rotational Motion Control

Authors: Zhiyuan Lu, Muhammad Hanif, Takumi Shimizu, Takeshi Hatanaka

Abstract: This paper presents a novel control strategy for drone networks to improve the quality of 3D structures reconstructed from aerial images by drones. Unlike the existing coverage control strategies for this purpose, our proposed approach simultaneously controls both the camera orientation and drone translational motion, enabling more comprehensive perspectives and enhancing the map's overall quality… ▽ More This paper presents a novel control strategy for drone networks to improve the quality of 3D structures reconstructed from aerial images by drones. Unlike the existing coverage control strategies for this purpose, our proposed approach simultaneously controls both the camera orientation and drone translational motion, enabling more comprehensive perspectives and enhancing the map's overall quality. Subsequently, we present a novel problem formulation, including a new performance function to evaluate the drone positions and camera orientations. We then design a QP-based controller with a control barrier-like function for a constraint on the decay rate of the objective function. The present problem formulation poses a new challenge, requiring significantly greater computational efforts than the case involving only translational motion control. We approach this issue technologically, namely by introducing JAX, utilizing just-in-time (JIT) compilation and Graphical Processing Unit (GPU) acceleration. We finally conduct extensive verifications through simulation in ROS (Robot Operating System) and show the real-time feasibility of the controller and the superiority of the present controller to the conventional method. △ Less

Submitted 22 April, 2024; originally announced April 2024.

Comments: 17 pages, 8 figures, 2 tables

arXiv:2403.11515 [pdf, other]

SSAP: A Shape-Sensitive Adversarial Patch for Comprehensive Disruption of Monocular Depth Estimation in Autonomous Navigation Applications

Authors: Amira Guesmi, Muhammad Abdullah Hanif, Ihsen Alouani, Bassem Ouni, Muhammad Shafique

Abstract: Monocular depth estimation (MDE) has advanced significantly, primarily through the integration of convolutional neural networks (CNNs) and more recently, Transformers. However, concerns about their susceptibility to adversarial attacks have emerged, especially in safety-critical domains like autonomous driving and robotic navigation. Existing approaches for assessing CNN-based depth prediction met… ▽ More Monocular depth estimation (MDE) has advanced significantly, primarily through the integration of convolutional neural networks (CNNs) and more recently, Transformers. However, concerns about their susceptibility to adversarial attacks have emerged, especially in safety-critical domains like autonomous driving and robotic navigation. Existing approaches for assessing CNN-based depth prediction methods have fallen short in inducing comprehensive disruptions to the vision system, often limited to specific local areas. In this paper, we introduce SSAP (Shape-Sensitive Adversarial Patch), a novel approach designed to comprehensively disrupt monocular depth estimation (MDE) in autonomous navigation applications. Our patch is crafted to selectively undermine MDE in two distinct ways: by distorting estimated distances or by creating the illusion of an object disappearing from the system's perspective. Notably, our patch is shape-sensitive, meaning it considers the specific shape and scale of the target object, thereby extending its influence beyond immediate proximity. Furthermore, our patch is trained to effectively address different scales and distances from the camera. Experimental results demonstrate that our approach induces a mean depth estimation error surpassing 0.5, impacting up to 99% of the targeted region for CNN-based MDE models. Additionally, we investigate the vulnerability of Transformer-based MDE models to patch-based attacks, revealing that SSAP yields a significant error of 0.59 and exerts substantial influence over 99% of the target region on these models. △ Less

Submitted 18 March, 2024; originally announced March 2024.

arXiv:2403.00830 [pdf, other]

MedAide: Leveraging Large Language Models for On-Premise Medical Assistance on Edge Devices

Authors: Abdul Basit, Khizar Hussain, Muhammad Abdullah Hanif, Muhammad Shafique

Abstract: Large language models (LLMs) are revolutionizing various domains with their remarkable natural language processing (NLP) abilities. However, deploying LLMs in resource-constrained edge computing and embedded systems presents significant challenges. Another challenge lies in delivering medical assistance in remote areas with limited healthcare facilities and infrastructure. To address this, we intr… ▽ More Large language models (LLMs) are revolutionizing various domains with their remarkable natural language processing (NLP) abilities. However, deploying LLMs in resource-constrained edge computing and embedded systems presents significant challenges. Another challenge lies in delivering medical assistance in remote areas with limited healthcare facilities and infrastructure. To address this, we introduce MedAide, an on-premise healthcare chatbot. It leverages tiny-LLMs integrated with LangChain, providing efficient edge-based preliminary medical diagnostics and support. MedAide employs model optimizations for minimal memory footprint and latency on embedded edge devices without server infrastructure. The training process is optimized using low-rank adaptation (LoRA). Additionally, the model is trained on diverse medical datasets, employing reinforcement learning from human feedback (RLHF) to enhance its domain-specific capabilities. The system is implemented on various consumer GPUs and Nvidia Jetson development board. MedAide achieves 77\% accuracy in medical consultations and scores 56 in USMLE benchmark, enabling an energy-efficient healthcare assistance platform that alleviates privacy concerns due to edge-based deployment, thereby empowering the community. △ Less

Submitted 28 February, 2024; originally announced March 2024.

Comments: 7 pages, 11 figures, ACM conference paper, 33 references

ACM Class: I.2.7

arXiv:2311.12211 [pdf, other]

DefensiveDR: Defending against Adversarial Patches using Dimensionality Reduction

Authors: Nandish Chattopadhyay, Amira Guesmi, Muhammad Abdullah Hanif, Bassem Ouni, Muhammad Shafique

Abstract: Adversarial patch-based attacks have shown to be a major deterrent towards the reliable use of machine learning models. These attacks involve the strategic modification of localized patches or specific image areas to deceive trained machine learning models. In this paper, we propose \textit{DefensiveDR}, a practical mechanism using a dimensionality reduction technique to thwart such patch-based at… ▽ More Adversarial patch-based attacks have shown to be a major deterrent towards the reliable use of machine learning models. These attacks involve the strategic modification of localized patches or specific image areas to deceive trained machine learning models. In this paper, we propose \textit{DefensiveDR}, a practical mechanism using a dimensionality reduction technique to thwart such patch-based attacks. Our method involves projecting the sample images onto a lower-dimensional space while retaining essential information or variability for effective machine learning tasks. We perform this using two techniques, Singular Value Decomposition and t-Distributed Stochastic Neighbor Embedding. We experimentally tune the variability to be preserved for optimal performance as a hyper-parameter. This dimension reduction substantially mitigates adversarial perturbations, thereby enhancing the robustness of the given machine learning model. Our defense is model-agnostic and operates without assumptions about access to model decisions or model architectures, making it effective in both black-box and white-box settings. Furthermore, it maintains accuracy across various models and remains robust against several unseen patch-based attacks. The proposed defensive approach improves the accuracy from 38.8\% (without defense) to 66.2\% (with defense) when performing LaVAN and GoogleAp attacks, which supersedes that of the prominent state-of-the-art like LGS (53.86\%) and Jujutsu (60\%). △ Less

Submitted 20 November, 2023; originally announced November 2023.

arXiv:2311.12084 [pdf, other]

ODDR: Outlier Detection & Dimension Reduction Based Defense Against Adversarial Patches

Authors: Nandish Chattopadhyay, Amira Guesmi, Muhammad Abdullah Hanif, Bassem Ouni, Muhammad Shafique

Abstract: Adversarial attacks are a major deterrent towards the reliable use of machine learning models. A powerful type of adversarial attacks is the patch-based attack, wherein the adversarial perturbations modify localized patches or specific areas within the images to deceive the trained machine learning model. In this paper, we introduce Outlier Detection and Dimension Reduction (ODDR), a holistic defe… ▽ More Adversarial attacks are a major deterrent towards the reliable use of machine learning models. A powerful type of adversarial attacks is the patch-based attack, wherein the adversarial perturbations modify localized patches or specific areas within the images to deceive the trained machine learning model. In this paper, we introduce Outlier Detection and Dimension Reduction (ODDR), a holistic defense mechanism designed to effectively mitigate patch-based adversarial attacks. In our approach, we posit that input features corresponding to adversarial patches, whether naturalistic or otherwise, deviate from the inherent distribution of the remaining image sample and can be identified as outliers or anomalies. ODDR employs a three-stage pipeline: Fragmentation, Segregation, and Neutralization, providing a model-agnostic solution applicable to both image classification and object detection tasks. The Fragmentation stage parses the samples into chunks for the subsequent Segregation process. Here, outlier detection techniques identify and segregate the anomalous features associated with adversarial perturbations. The Neutralization stage utilizes dimension reduction methods on the outliers to mitigate the impact of adversarial perturbations without sacrificing pertinent information necessary for the machine learning task. Extensive testing on benchmark datasets and state-of-the-art adversarial patches demonstrates the effectiveness of ODDR. Results indicate robust accuracies matching and lying within a small range of clean accuracies (1%-3% for classification and 3%-5% for object detection), with only a marginal compromise of 1%-2% in performance on clean samples, thereby significantly outperforming other defenses. △ Less

Submitted 20 November, 2023; originally announced November 2023.

arXiv:2310.10315 [pdf, other]

A Survey on Quantum Machine Learning: Current Trends, Challenges, Opportunities, and the Road Ahead

Authors: Kamila Zaman, Alberto Marchisio, Muhammad Abdullah Hanif, Muhammad Shafique

Abstract: Quantum Computing (QC) claims to improve the efficiency of solving complex problems, compared to classical computing. When QC is applied to Machine Learning (ML) applications, it forms a Quantum Machine Learning (QML) system. After discussing the basic concepts of QC and its advantages over classical computing, this paper reviews the key aspects of QML in a comprehensive manner. We discuss differe… ▽ More Quantum Computing (QC) claims to improve the efficiency of solving complex problems, compared to classical computing. When QC is applied to Machine Learning (ML) applications, it forms a Quantum Machine Learning (QML) system. After discussing the basic concepts of QC and its advantages over classical computing, this paper reviews the key aspects of QML in a comprehensive manner. We discuss different QML algorithms and their domain applicability, quantum datasets, hardware technologies, software tools, simulators, and applications. In this survey, we provide valuable information and resources for readers to jumpstart into the current state-of-the-art techniques in the QML field. △ Less

Submitted 16 October, 2023; originally announced October 2023.

arXiv:2308.06173 [pdf, other]

Physical Adversarial Attacks For Camera-based Smart Systems: Current Trends, Categorization, Applications, Research Challenges, and Future Outlook

Authors: Amira Guesmi, Muhammad Abdullah Hanif, Bassem Ouni, Muhammed Shafique

Abstract: In this paper, we present a comprehensive survey of the current trends focusing specifically on physical adversarial attacks. We aim to provide a thorough understanding of the concept of physical adversarial attacks, analyzing their key characteristics and distinguishing features. Furthermore, we explore the specific requirements and challenges associated with executing attacks in the physical wor… ▽ More In this paper, we present a comprehensive survey of the current trends focusing specifically on physical adversarial attacks. We aim to provide a thorough understanding of the concept of physical adversarial attacks, analyzing their key characteristics and distinguishing features. Furthermore, we explore the specific requirements and challenges associated with executing attacks in the physical world. Our article delves into various physical adversarial attack methods, categorized according to their target tasks in different applications, including classification, detection, face recognition, semantic segmentation and depth estimation. We assess the performance of these attack methods in terms of their effectiveness, stealthiness, and robustness. We examine how each technique strives to ensure the successful manipulation of DNNs while mitigating the risk of detection and withstanding real-world distortions. Lastly, we discuss the current challenges and outline potential future research directions in the field of physical adversarial attacks. We highlight the need for enhanced defense mechanisms, the exploration of novel attack strategies, the evaluation of attacks in different application domains, and the establishment of standardized benchmarks and evaluation criteria for physical adversarial attacks. Through this comprehensive survey, we aim to provide a valuable resource for researchers, practitioners, and policymakers to gain a holistic understanding of physical adversarial attacks in computer vision and facilitate the development of robust and secure DNN-based systems. △ Less

Submitted 11 August, 2023; originally announced August 2023.

arXiv:2308.03108 [pdf, other]

SAAM: Stealthy Adversarial Attack on Monocular Depth Estimation

Authors: Amira Guesmi, Muhammad Abdullah Hanif, Bassem Ouni, Muhammad Shafique

Abstract: In this paper, we investigate the vulnerability of MDE to adversarial patches. We propose a novel \underline{S}tealthy \underline{A}dversarial \underline{A}ttacks on \underline{M}DE (SAAM) that compromises MDE by either corrupting the estimated distance or causing an object to seamlessly blend into its surroundings. Our experiments, demonstrate that the designed stealthy patch successfully causes… ▽ More In this paper, we investigate the vulnerability of MDE to adversarial patches. We propose a novel \underline{S}tealthy \underline{A}dversarial \underline{A}ttacks on \underline{M}DE (SAAM) that compromises MDE by either corrupting the estimated distance or causing an object to seamlessly blend into its surroundings. Our experiments, demonstrate that the designed stealthy patch successfully causes a DNN-based MDE to misestimate the depth of objects. In fact, our proposed adversarial patch achieves a significant 60\% depth error with 99\% ratio of the affected region. Importantly, despite its adversarial nature, the patch maintains a naturalistic appearance, making it inconspicuous to human observers. We believe that this work sheds light on the threat of adversarial attacks in the context of MDE on edge devices. We hope it raises awareness within the community about the potential real-life harm of such attacks and encourages further research into developing more robust and adaptive defense mechanisms. △ Less

Submitted 20 December, 2023; v1 submitted 6 August, 2023; originally announced August 2023.

arXiv:2307.11128 [pdf, other]

Approximate Computing Survey, Part II: Application-Specific & Architectural Approximation Techniques and Applications

Authors: Vasileios Leon, Muhammad Abdullah Hanif, Giorgos Armeniakos, Xun Jiao, Muhammad Shafique, Kiamal Pekmestzi, Dimitrios Soudris

Abstract: The challenging deployment of compute-intensive applications from domains such Artificial Intelligence (AI) and Digital Signal Processing (DSP), forces the community of computing systems to explore new design approaches. Approximate Computing appears as an emerging solution, allowing to tune the quality of results in the design of a system in order to improve the energy efficiency and/or performan… ▽ More The challenging deployment of compute-intensive applications from domains such Artificial Intelligence (AI) and Digital Signal Processing (DSP), forces the community of computing systems to explore new design approaches. Approximate Computing appears as an emerging solution, allowing to tune the quality of results in the design of a system in order to improve the energy efficiency and/or performance. This radical paradigm shift has attracted interest from both academia and industry, resulting in significant research on approximation techniques and methodologies at different design layers (from system down to integrated circuits). Motivated by the wide appeal of Approximate Computing over the last 10 years, we conduct a two-part survey to cover key aspects (e.g., terminology and applications) and review the state-of-the art approximation techniques from all layers of the traditional computing stack. In Part II of our survey, we classify and present the technical details of application-specific and architectural approximation techniques, which both target the design of resource-efficient processors/accelerators & systems. Moreover, we present a detailed analysis of the application spectrum of Approximate Computing and discuss open challenges and future directions. △ Less

Submitted 20 July, 2023; originally announced July 2023.

Comments: Under Review at ACM Computing Surveys

arXiv:2307.11124 [pdf, other]

Approximate Computing Survey, Part I: Terminology and Software & Hardware Approximation Techniques

Authors: Vasileios Leon, Muhammad Abdullah Hanif, Giorgos Armeniakos, Xun Jiao, Muhammad Shafique, Kiamal Pekmestzi, Dimitrios Soudris

Abstract: The rapid growth of demanding applications in domains applying multimedia processing and machine learning has marked a new era for edge and cloud computing. These applications involve massive data and compute-intensive tasks, and thus, typical computing paradigms in embedded systems and data centers are stressed to meet the worldwide demand for high performance. Concurrently, the landscape of the… ▽ More The rapid growth of demanding applications in domains applying multimedia processing and machine learning has marked a new era for edge and cloud computing. These applications involve massive data and compute-intensive tasks, and thus, typical computing paradigms in embedded systems and data centers are stressed to meet the worldwide demand for high performance. Concurrently, the landscape of the semiconductor field in the last 15 years has constituted power as a first-class design concern. As a result, the community of computing systems is forced to find alternative design approaches to facilitate high-performance and/or power-efficient computing. Among the examined solutions, Approximate Computing has attracted an ever-increasing interest, with research works applying approximations across the entire traditional computing stack, i.e., at software, hardware, and architectural levels. Over the last decade, there is a plethora of approximation techniques in software (programs, frameworks, compilers, runtimes, languages), hardware (circuits, accelerators), and architectures (processors, memories). The current article is Part I of our comprehensive survey on Approximate Computing, and it reviews its motivation, terminology and principles, as well it classifies and presents the technical details of the state-of-the-art software and hardware approximation techniques. △ Less

Submitted 20 July, 2023; originally announced July 2023.

Comments: Under Review at ACM Computing Surveys

arXiv:2305.12595 [pdf, other]

Reduce: A Framework for Reducing the Overheads of Fault-Aware Retraining

Authors: Muhammad Abdullah Hanif, Muhammad Shafique

Abstract: Fault-aware retraining has emerged as a prominent technique for mitigating permanent faults in Deep Neural Network (DNN) hardware accelerators. However, retraining leads to huge overheads, specifically when used for fine-tuning large DNNs designed for solving complex problems. Moreover, as each fabricated chip can have a distinct fault pattern, fault-aware retraining is required to be performed fo… ▽ More Fault-aware retraining has emerged as a prominent technique for mitigating permanent faults in Deep Neural Network (DNN) hardware accelerators. However, retraining leads to huge overheads, specifically when used for fine-tuning large DNNs designed for solving complex problems. Moreover, as each fabricated chip can have a distinct fault pattern, fault-aware retraining is required to be performed for each chip individually considering its unique fault map, which further aggravates the problem. To reduce the overall retraining cost, in this work, we introduce the concept of resilience-driven retraining amount selection. To realize this concept, we propose a novel framework, Reduce, that, at first, computes the resilience of the given DNN to faults at different fault rates and with different amounts of retraining. Then, based on the resilience, it computes the amount of retraining required for each chip considering its unique fault map. We demonstrate the effectiveness of our methodology for a systolic array-based DNN accelerator experiencing permanent faults in the computational array. △ Less

Submitted 21 May, 2023; originally announced May 2023.

Comments: 2 pages, 3 figures. arXiv admin note: substantial text overlap with arXiv:2304.12949

arXiv:2305.12590 [pdf, other]

FAQ: Mitigating the Impact of Faults in the Weight Memory of DNN Accelerators through Fault-Aware Quantization

Authors: Muhammad Abdullah Hanif, Muhammad Shafique

Abstract: Permanent faults induced due to imperfections in the manufacturing process of Deep Neural Network (DNN) accelerators are a major concern, as they negatively impact the manufacturing yield of the chip fabrication process. Fault-aware training is the state-of-the-art approach for mitigating such faults. However, it incurs huge retraining overheads, specifically when used for large DNNs trained on co… ▽ More Permanent faults induced due to imperfections in the manufacturing process of Deep Neural Network (DNN) accelerators are a major concern, as they negatively impact the manufacturing yield of the chip fabrication process. Fault-aware training is the state-of-the-art approach for mitigating such faults. However, it incurs huge retraining overheads, specifically when used for large DNNs trained on complex datasets. To address this issue, we propose a novel Fault-Aware Quantization (FAQ) technique for mitigating the effects of stuck-at permanent faults in the on-chip weight memory of DNN accelerators at a negligible overhead cost compared to fault-aware retraining while offering comparable accuracy results. We propose a lookup table-based algorithm to achieve ultra-low model conversion time. We present extensive evaluation of the proposed approach using five different DNNs, i.e., ResNet-18, VGG11, VGG16, AlexNet and MobileNetV2, and three different datasets, i.e., CIFAR-10, CIFAR-100 and ImageNet. The results demonstrate that FAQ helps in maintaining the baseline accuracy of the DNNs at low and moderate fault rates without involving costly fault-aware training. For example, for ResNet-18 trained on the CIFAR-10 dataset, at 0.04 fault rate FAQ offers (on average) an increase of 76.38% in accuracy. Similarly, for VGG11 trained on the CIFAR-10 dataset, at 0.04 fault rate FAQ offers (on average) an increase of 70.47% in accuracy. The results also show that FAQ incurs negligible overheads, i.e., less than 5% of the time required to run 1 epoch of retraining. We additionally demonstrate the efficacy of our technique when used in conjunction with fault-aware retraining and show that the use of FAQ inside fault-aware retraining enables fast accuracy recovery. △ Less

Submitted 21 May, 2023; originally announced May 2023.

Comments: 8 pages, 15 figures

arXiv:2305.11618 [pdf, other]

DAP: A Dynamic Adversarial Patch for Evading Person Detectors

Authors: Amira Guesmi, Ruitian Ding, Muhammad Abdullah Hanif, Ihsen Alouani, Muhammad Shafique

Abstract: Patch-based adversarial attacks were proven to compromise the robustness and reliability of computer vision systems. However, their conspicuous and easily detectable nature challenge their practicality in real-world setting. To address this, recent work has proposed using Generative Adversarial Networks (GANs) to generate naturalistic patches that may not attract human attention. However, such app… ▽ More Patch-based adversarial attacks were proven to compromise the robustness and reliability of computer vision systems. However, their conspicuous and easily detectable nature challenge their practicality in real-world setting. To address this, recent work has proposed using Generative Adversarial Networks (GANs) to generate naturalistic patches that may not attract human attention. However, such approaches suffer from a limited latent space making it challenging to produce a patch that is efficient, stealthy, and robust to multiple real-world transformations. This paper introduces a novel approach that produces a Dynamic Adversarial Patch (DAP) designed to overcome these limitations. DAP maintains a naturalistic appearance while optimizing attack efficiency and robustness to real-world transformations. The approach involves redefining the optimization problem and introducing a novel objective function that incorporates a similarity metric to guide the patch's creation. Unlike GAN-based techniques, the DAP directly modifies pixel values within the patch, providing increased flexibility and adaptability to multiple transformations. Furthermore, most clothing-based physical attacks assume static objects and ignore the possible transformations caused by non-rigid deformation due to changes in a person's pose. To address this limitation, a 'Creases Transformation' (CT) block is introduced, enhancing the patch's resilience to a variety of real-world distortions. Experimental results demonstrate that the proposed approach outperforms state-of-the-art attacks, achieving a success rate of up to 82.28% in the digital world when targeting the YOLOv7 detector and 65% in the physical world when targeting YOLOv3tiny detector deployed in edge-based smart cameras. △ Less

Submitted 20 November, 2023; v1 submitted 19 May, 2023; originally announced May 2023.

arXiv:2304.12949 [pdf, other]

eFAT: Improving the Effectiveness of Fault-Aware Training for Mitigating Permanent Faults in DNN Hardware Accelerators

Authors: Muhammad Abdullah Hanif, Muhammad Shafique

Abstract: Fault-Aware Training (FAT) has emerged as a highly effective technique for addressing permanent faults in DNN accelerators, as it offers fault mitigation without significant performance or accuracy loss, specifically at low and moderate fault rates. However, it leads to very high retraining overheads, especially when used for large DNNs designed for complex AI applications. Moreover, as each fabri… ▽ More Fault-Aware Training (FAT) has emerged as a highly effective technique for addressing permanent faults in DNN accelerators, as it offers fault mitigation without significant performance or accuracy loss, specifically at low and moderate fault rates. However, it leads to very high retraining overheads, especially when used for large DNNs designed for complex AI applications. Moreover, as each fabricated chip can have a distinct fault pattern, FAT is required to be performed for each faulty chip individually, considering its unique fault map, which further aggravates the problem. To reduce the overheads of FAT while maintaining its benefits, we propose (1) the concepts of resilience-driven retraining amount selection, and (2) resilience-driven grouping and fusion of multiple fault maps (belonging to different chips) to perform consolidated retraining for a group of faulty chips. To realize these concepts, in this work, we present a novel framework, eFAT, that computes the resilience of a given DNN to faults at different fault rates and with different levels of retraining, and it uses that knowledge to build a resilience map given a user-defined accuracy constraint. Then, it uses the resilience map to compute the amount of retraining required for each chip, considering its unique fault map. Afterward, it performs resilience and reward-driven grouping and fusion of fault maps to further reduce the number of retraining iterations required for tuning the given DNN for the given set of faulty chips. We demonstrate the effectiveness of our framework for a systolic array-based DNN accelerator experiencing permanent faults in the computational array. Our extensive results for numerous chips show that the proposed technique significantly reduces the retraining cost when used for tuning a DNN for multiple faulty chips. △ Less

Submitted 19 April, 2023; originally announced April 2023.

Comments: 8 pages, 13 figures

arXiv:2304.04041 [pdf, other]

doi 10.3389/fnins.2023.1159440

RescueSNN: Enabling Reliable Executions on Spiking Neural Network Accelerators under Permanent Faults

Authors: Rachmad Vidya Wicaksana Putra, Muhammad Abdullah Hanif, Muhammad Shafique

Abstract: To maximize the performance and energy efficiency of Spiking Neural Network (SNN) processing on resource-constrained embedded systems, specialized hardware accelerators/chips are employed. However, these SNN chips may suffer from permanent faults which can affect the functionality of weight memory and neuron behavior, thereby causing potentially significant accuracy degradation and system malfunct… ▽ More To maximize the performance and energy efficiency of Spiking Neural Network (SNN) processing on resource-constrained embedded systems, specialized hardware accelerators/chips are employed. However, these SNN chips may suffer from permanent faults which can affect the functionality of weight memory and neuron behavior, thereby causing potentially significant accuracy degradation and system malfunctioning. Such permanent faults may come from manufacturing defects during the fabrication process, and/or from device/transistor damages (e.g., due to wear out) during the run-time operation. However, the impact of permanent faults in SNN chips and the respective mitigation techniques have not been thoroughly investigated yet. Toward this, we propose RescueSNN, a novel methodology to mitigate permanent faults in the compute engine of SNN chips without requiring additional retraining, thereby significantly cutting down the design time and retraining costs, while maintaining the throughput and quality. The key ideas of our RescueSNN methodology are (1) analyzing the characteristics of SNN under permanent faults; (2) leveraging this analysis to improve the SNN fault-tolerance through effective fault-aware mapping (FAM); and (3) devising lightweight hardware enhancements to support FAM. Our FAM technique leverages the fault map of SNN compute engine for (i) minimizing weight corruption when mapping weight bits on the faulty memory cells, and (ii) selectively employing faulty neurons that do not cause significant accuracy degradation to maintain accuracy and throughput, while considering the SNN operations and processing dataflow. The experimental results show that our RescueSNN improves accuracy by up to 80% while maintaining the throughput reduction below 25% in high fault rate (e.g., 0.5 of the potential fault locations), as compared to running SNNs on the faulty chip without mitigation. △ Less

Submitted 8 April, 2023; originally announced April 2023.

Comments: Accepted for publication at Frontiers in Neuroscience - Section Neuromorphic Engineering

arXiv:2304.04039 [pdf, other]

doi 10.3389/fnins.2022.937782

EnforceSNN: Enabling Resilient and Energy-Efficient Spiking Neural Network Inference considering Approximate DRAMs for Embedded Systems

Authors: Rachmad Vidya Wicaksana Putra, Muhammad Abdullah Hanif, Muhammad Shafique

Abstract: Spiking Neural Networks (SNNs) have shown capabilities of achieving high accuracy under unsupervised settings and low operational power/energy due to their bio-plausible computations. Previous studies identified that DRAM-based off-chip memory accesses dominate the energy consumption of SNN processing. However, state-of-the-art works do not optimize the DRAM energy-per-access, thereby hindering th… ▽ More Spiking Neural Networks (SNNs) have shown capabilities of achieving high accuracy under unsupervised settings and low operational power/energy due to their bio-plausible computations. Previous studies identified that DRAM-based off-chip memory accesses dominate the energy consumption of SNN processing. However, state-of-the-art works do not optimize the DRAM energy-per-access, thereby hindering the SNN-based systems from achieving further energy efficiency gains. To substantially reduce the DRAM energy-per-access, an effective solution is to decrease the DRAM supply voltage, but it may lead to errors in DRAM cells (i.e., so-called approximate DRAM). Towards this, we propose \textit{EnforceSNN}, a novel design framework that provides a solution for resilient and energy-efficient SNN inference using reduced-voltage DRAM for embedded systems. The key mechanisms of our EnforceSNN are: (1) employing quantized weights to reduce the DRAM access energy; (2) devising an efficient DRAM mapping policy to minimize the DRAM energy-per-access; (3) analyzing the SNN error tolerance to understand its accuracy profile considering different bit error rate (BER) values; (4) leveraging the information for developing an efficient fault-aware training (FAT) that considers different BER values and bit error locations in DRAM to improve the SNN error tolerance; and (5) developing an algorithm to select the SNN model that offers good trade-offs among accuracy, memory, and energy consumption. The experimental results show that our EnforceSNN maintains the accuracy (i.e., no accuracy loss for BER less-or-equal 10^-3) as compared to the baseline SNN with accurate DRAM, while achieving up to 84.9\% of DRAM energy saving and up to 4.1x speed-up of DRAM data throughput across different network sizes. △ Less

Submitted 8 April, 2023; originally announced April 2023.

Comments: Accepted for publication at Frontiers in Neuroscience - Section Neuromorphic Engineering

arXiv:2303.14009 [pdf, other]

PoisonedGNN: Backdoor Attack on Graph Neural Networks-based Hardware Security Systems

Authors: Lilas Alrahis, Satwik Patnaik, Muhammad Abdullah Hanif, Muhammad Shafique, Ozgur Sinanoglu

Abstract: Graph neural networks (GNNs) have shown great success in detecting intellectual property (IP) piracy and hardware Trojans (HTs). However, the machine learning community has demonstrated that GNNs are susceptible to data poisoning attacks, which result in GNNs performing abnormally on graphs with pre-defined backdoor triggers (realized using crafted subgraphs). Thus, it is imperative to ensure that… ▽ More Graph neural networks (GNNs) have shown great success in detecting intellectual property (IP) piracy and hardware Trojans (HTs). However, the machine learning community has demonstrated that GNNs are susceptible to data poisoning attacks, which result in GNNs performing abnormally on graphs with pre-defined backdoor triggers (realized using crafted subgraphs). Thus, it is imperative to ensure that the adoption of GNNs should not introduce security vulnerabilities in critical security frameworks. Existing backdoor attacks on GNNs generate random subgraphs with specific sizes/densities to act as backdoor triggers. However, for Boolean circuits, backdoor triggers cannot be randomized since the added structures should not affect the functionality of a design. We explore this threat and develop PoisonedGNN as the first backdoor attack on GNNs in the context of hardware design. We design and inject backdoor triggers into the register-transfer- or the gate-level representation of a given design without affecting the functionality to evade some GNN-based detection procedures. To demonstrate the effectiveness of PoisonedGNN, we consider two case studies: (i) Hiding HTs and (ii) IP piracy. Our experiments on TrustHub datasets demonstrate that PoisonedGNN can hide HTs and IP piracy from advanced GNN-based detection platforms with an attack success rate of up to 100%. △ Less

Submitted 24 March, 2023; originally announced March 2023.

Comments: This manuscript is currently under review at IEEE Transactions on Computers

arXiv:2303.02495 [pdf, other]

scaleTRIM: Scalable TRuncation-Based Integer Approximate Multiplier with Linearization and Compensation

Authors: Ebrahim Farahmand, Ali Mahani, Behnam Ghavami, Muhammad Abdullah Hanif, Muhammad Shafique

Abstract: Approximate computing (AC) has become a prominent solution to improve the performance, area, and power/energy efficiency of a digital design at the cost of output accuracy. We propose a novel scalable approximate multiplier that utilizes a lookup table-based compensation unit. To improve energy-efficiency, input operands are truncated to a reduced bitwidth representation (e.g., h bits) based on th… ▽ More Approximate computing (AC) has become a prominent solution to improve the performance, area, and power/energy efficiency of a digital design at the cost of output accuracy. We propose a novel scalable approximate multiplier that utilizes a lookup table-based compensation unit. To improve energy-efficiency, input operands are truncated to a reduced bitwidth representation (e.g., h bits) based on their leading one positions. Then, a curve-fitting method is employed to map the product term to a linear function, and a piecewise constant error-correction term is used to reduce the approximation error. For computing the piecewise constant error-compensation term, we partition the function space into M segments and compute the compensation factor for each segment by averaging the errors in the segment. The multiplier supports various degrees of truncation and error-compensation to exploit accuracy-efficiency trade-off. The proposed approximate multiplier offers better error metrics such as mean and standard deviation of absolute relative error (MARED and StdARED) compare to a state-of-the-art integer approximate multiplier. The proposed approximate multiplier improves the MARED and StdARED by about 38% and 32% when its energy consumption is about equal to the state-of-the-art approximate multiplier. Moreover, the performance of the proposed approximate multiplier is evaluated in image classification applications using a Deep Neural Network (DNN). The results indicate that the degradation of DNN accuracy is negligible especially due to the compensation properties of our approximate multiplier. △ Less

Submitted 4 May, 2023; v1 submitted 4 March, 2023; originally announced March 2023.

arXiv:2303.01819 [pdf, other]

Exploring Machine Learning Privacy/Utility trade-off from a hyperparameters Lens

Authors: Ayoub Arous, Amira Guesmi, Muhammad Abdullah Hanif, Ihsen Alouani, Muhammad Shafique

Abstract: Machine Learning (ML) architectures have been applied to several applications that involve sensitive data, where a guarantee of users' data privacy is required. Differentially Private Stochastic Gradient Descent (DPSGD) is the state-of-the-art method to train privacy-preserving models. However, DPSGD comes at a considerable accuracy loss leading to sub-optimal privacy/utility trade-offs. Towards i… ▽ More Machine Learning (ML) architectures have been applied to several applications that involve sensitive data, where a guarantee of users' data privacy is required. Differentially Private Stochastic Gradient Descent (DPSGD) is the state-of-the-art method to train privacy-preserving models. However, DPSGD comes at a considerable accuracy loss leading to sub-optimal privacy/utility trade-offs. Towards investigating new ground for better privacy-utility trade-off, this work questions; (i) if models' hyperparameters have any inherent impact on ML models' privacy-preserving properties, and (ii) if models' hyperparameters have any impact on the privacy/utility trade-off of differentially private models. We propose a comprehensive design space exploration of different hyperparameters such as the choice of activation functions, the learning rate and the use of batch normalization. Interestingly, we found that utility can be improved by using Bounded RELU as activation functions with the same privacy-preserving characteristics. With a drop-in replacement of the activation function, we achieve new state-of-the-art accuracy on MNIST (96.02\%), FashionMnist (84.76\%), and CIFAR-10 (44.42\%) without any modification of the learning procedure fundamentals of DPSGD. △ Less

Submitted 3 March, 2023; originally announced March 2023.

arXiv:2303.01351 [pdf, other]

APARATE: Adaptive Adversarial Patch for CNN-based Monocular Depth Estimation for Autonomous Navigation

Authors: Amira Guesmi, Muhammad Abdullah Hanif, Ihsen Alouani, Muhammad Shafique

Abstract: In recent times, monocular depth estimation (MDE) has experienced significant advancements in performance, largely attributed to the integration of innovative architectures, i.e., convolutional neural networks (CNNs) and Transformers. Nevertheless, the susceptibility of these models to adversarial attacks has emerged as a noteworthy concern, especially in domains where safety and security are para… ▽ More In recent times, monocular depth estimation (MDE) has experienced significant advancements in performance, largely attributed to the integration of innovative architectures, i.e., convolutional neural networks (CNNs) and Transformers. Nevertheless, the susceptibility of these models to adversarial attacks has emerged as a noteworthy concern, especially in domains where safety and security are paramount. This concern holds particular weight for MDE due to its critical role in applications like autonomous driving and robotic navigation, where accurate scene understanding is pivotal. To assess the vulnerability of CNN-based depth prediction methods, recent work tries to design adversarial patches against MDE. However, the existing approaches fall short of inducing a comprehensive and substantially disruptive impact on the vision system. Instead, their influence is partial and confined to specific local areas. These methods lead to erroneous depth predictions only within the overlapping region with the input image, without considering the characteristics of the target object, such as its size, shape, and position. In this paper, we introduce a novel adversarial patch named APARATE. This patch possesses the ability to selectively undermine MDE in two distinct ways: by distorting the estimated distances or by creating the illusion of an object disappearing from the perspective of the autonomous system. Notably, APARATE is designed to be sensitive to the shape and scale of the target object, and its influence extends beyond immediate proximity. APARATE, results in a mean depth estimation error surpassing $0.5$, significantly impacting as much as $99\%$ of the targeted region when applied to CNN-based MDE models. Furthermore, it yields a significant error of $0.34$ and exerts substantial influence over $94\%$ of the target region in the context of Transformer-based MDE. △ Less

Submitted 20 November, 2023; v1 submitted 2 March, 2023; originally announced March 2023.

arXiv:2303.01338 [pdf, other]

AdvRain: Adversarial Raindrops to Attack Camera-based Smart Vision Systems

Authors: Amira Guesmi, Muhammad Abdullah Hanif, Muhammad Shafique

Abstract: Vision-based perception modules are increasingly deployed in many applications, especially autonomous vehicles and intelligent robots. These modules are being used to acquire information about the surroundings and identify obstacles. Hence, accurate detection and classification are essential to reach appropriate decisions and take appropriate and safe actions at all times. Current studies have dem… ▽ More Vision-based perception modules are increasingly deployed in many applications, especially autonomous vehicles and intelligent robots. These modules are being used to acquire information about the surroundings and identify obstacles. Hence, accurate detection and classification are essential to reach appropriate decisions and take appropriate and safe actions at all times. Current studies have demonstrated that "printed adversarial attacks", known as physical adversarial attacks, can successfully mislead perception models such as object detectors and image classifiers. However, most of these physical attacks are based on noticeable and eye-catching patterns for generated perturbations making them identifiable/detectable by human eye or in test drives. In this paper, we propose a camera-based inconspicuous adversarial attack (\textbf{AdvRain}) capable of fooling camera-based perception systems over all objects of the same class. Unlike mask based fake-weather attacks that require access to the underlying computing hardware or image memory, our attack is based on emulating the effects of a natural weather condition (i.e., Raindrops) that can be printed on a translucent sticker, which is externally placed over the lens of a camera. To accomplish this, we provide an iterative process based on performing a random search aiming to identify critical positions to make sure that the performed transformation is adversarial for a target classifier. Our transformation is based on blurring predefined parts of the captured image corresponding to the areas covered by the raindrop. We achieve a drop in average model accuracy of more than $45\%$ and $40\%$ on VGG19 for ImageNet and Resnet34 for Caltech-101, respectively, using only $20$ raindrops. △ Less

Submitted 5 October, 2023; v1 submitted 2 March, 2023; originally announced March 2023.

arXiv:2208.00331 [pdf, other]

CoNLoCNN: Exploiting Correlation and Non-Uniform Quantization for Energy-Efficient Low-precision Deep Convolutional Neural Networks

Authors: Muhammad Abdullah Hanif, Giuseppe Maria Sarda, Alberto Marchisio, Guido Masera, Maurizio Martina, Muhammad Shafique

Abstract: In today's era of smart cyber-physical systems, Deep Neural Networks (DNNs) have become ubiquitous due to their state-of-the-art performance in complex real-world applications. The high computational complexity of these networks, which translates to increased energy consumption, is the foremost obstacle towards deploying large DNNs in resource-constrained systems. Fixed-Point (FP) implementations… ▽ More In today's era of smart cyber-physical systems, Deep Neural Networks (DNNs) have become ubiquitous due to their state-of-the-art performance in complex real-world applications. The high computational complexity of these networks, which translates to increased energy consumption, is the foremost obstacle towards deploying large DNNs in resource-constrained systems. Fixed-Point (FP) implementations achieved through post-training quantization are commonly used to curtail the energy consumption of these networks. However, the uniform quantization intervals in FP restrict the bit-width of data structures to large values due to the need to represent most of the numbers with sufficient resolution and avoid high quantization errors. In this paper, we leverage the key insight that (in most of the scenarios) DNN weights and activations are mostly concentrated near zero and only a few of them have large magnitudes. We propose CoNLoCNN, a framework to enable energy-efficient low-precision deep convolutional neural network inference by exploiting: (1) non-uniform quantization of weights enabling simplification of complex multiplication operations; and (2) correlation between activation values enabling partial compensation of quantization errors at low cost without any run-time overheads. To significantly benefit from non-uniform quantization, we also propose a novel data representation format, Encoded Low-Precision Binary Signed Digit, to compress the bit-width of weights while ensuring direct use of the encoded weight for processing using a novel multiply-and-accumulate (MAC) unit design. △ Less

Submitted 30 July, 2022; originally announced August 2022.

Comments: 8 pages, 15 figures, 2 tables

arXiv:2204.09514 [pdf, other]

doi 10.1109/VTS52500.2021.9794253

Special Session: Towards an Agile Design Methodology for Efficient, Reliable, and Secure ML Systems

Authors: Shail Dave, Alberto Marchisio, Muhammad Abdullah Hanif, Amira Guesmi, Aviral Shrivastava, Ihsen Alouani, Muhammad Shafique

Abstract: The real-world use cases of Machine Learning (ML) have exploded over the past few years. However, the current computing infrastructure is insufficient to support all real-world applications and scenarios. Apart from high efficiency requirements, modern ML systems are expected to be highly reliable against hardware failures as well as secure against adversarial and IP stealing attacks. Privacy conc… ▽ More The real-world use cases of Machine Learning (ML) have exploded over the past few years. However, the current computing infrastructure is insufficient to support all real-world applications and scenarios. Apart from high efficiency requirements, modern ML systems are expected to be highly reliable against hardware failures as well as secure against adversarial and IP stealing attacks. Privacy concerns are also becoming a first-order issue. This article summarizes the main challenges in agile development of efficient, reliable and secure ML systems, and then presents an outline of an agile design methodology to generate efficient, reliable and secure ML systems based on user-defined constraints and objectives. △ Less

Submitted 18 April, 2022; originally announced April 2022.

Comments: Appears at 40th IEEE VLSI Test Symposium (VTS 2022), 14 pages

arXiv:2203.05523 [pdf, other]

doi 10.1145/3489517.3530657

SoftSNN: Low-Cost Fault Tolerance for Spiking Neural Network Accelerators under Soft Errors

Authors: Rachmad Vidya Wicaksana Putra, Muhammad Abdullah Hanif, Muhammad Shafique

Abstract: Specialized hardware accelerators have been designed and employed to maximize the performance efficiency of Spiking Neural Networks (SNNs). However, such accelerators are vulnerable to transient faults (i.e., soft errors), which occur due to high-energy particle strikes, and manifest as bit flips at the hardware layer. These errors can change the weight values and neuron operations in the compute… ▽ More Specialized hardware accelerators have been designed and employed to maximize the performance efficiency of Spiking Neural Networks (SNNs). However, such accelerators are vulnerable to transient faults (i.e., soft errors), which occur due to high-energy particle strikes, and manifest as bit flips at the hardware layer. These errors can change the weight values and neuron operations in the compute engine of SNN accelerators, thereby leading to incorrect outputs and accuracy degradation. However, the impact of soft errors in the compute engine and the respective mitigation techniques have not been thoroughly studied yet for SNNs. A potential solution is employing redundant executions (re-execution) for ensuring correct outputs, but it leads to huge latency and energy overheads. Toward this, we propose SoftSNN, a novel methodology to mitigate soft errors in the weight registers (synapses) and neurons of SNN accelerators without re-execution, thereby maintaining the accuracy with low latency and energy overheads. Our SoftSNN methodology employs the following key steps: (1) analyzing the SNN characteristics under soft errors to identify faulty weights and neuron operations, which are required for recognizing faulty SNN behavior; (2) a Bound-and-Protect technique that leverages this analysis to improve the SNN fault tolerance by bounding the weight values and protecting the neurons from faulty operations; and (3) devising lightweight hardware enhancements for the neural hardware accelerator to efficiently support the proposed technique. The experimental results show that, for a 900-neuron network with even a high fault rate, our SoftSNN maintains the accuracy degradation below 3%, while reducing latency and energy by up to 3x and 2.3x respectively, as compared to the re-execution technique. △ Less

Submitted 11 March, 2022; v1 submitted 10 March, 2022; originally announced March 2022.

Comments: To appear at the 59th IEEE/ACM Design Automation Conference (DAC), July 2022, San Francisco, CA, USA

arXiv:2111.07062 [pdf, other]

UNTANGLE: Unlocking Routing and Logic Obfuscation Using Graph Neural Networks-based Link Prediction

Authors: Lilas Alrahis, Satwik Patnaik, Muhammad Abdullah Hanif, Muhammad Shafique, Ozgur Sinanoglu

Abstract: Logic locking aims to prevent intellectual property (IP) piracy and unauthorized overproduction of integrated circuits (ICs). However, initial logic locking techniques were vulnerable to the Boolean satisfiability (SAT)-based attacks. In response, researchers proposed various SAT-resistant locking techniques such as point function-based locking and symmetric interconnection (SAT-hard) obfuscation.… ▽ More Logic locking aims to prevent intellectual property (IP) piracy and unauthorized overproduction of integrated circuits (ICs). However, initial logic locking techniques were vulnerable to the Boolean satisfiability (SAT)-based attacks. In response, researchers proposed various SAT-resistant locking techniques such as point function-based locking and symmetric interconnection (SAT-hard) obfuscation. We focus on the latter since point function-based locking suffers from various structural vulnerabilities. The SAT-hard logic locking technique, InterLock [1], achieves a unified logic and routing obfuscation that thwarts state-of-the-art attacks on logic locking. In this work, we propose a novel link prediction-based attack, UNTANGLE, that successfully breaks InterLock in an oracle-less setting without having access to an activated IC (oracle). Since InterLock hides selected timing paths in key-controlled routing blocks, UNTANGLE reveals the gates and interconnections hidden in the routing blocks upon formulating this task as a link prediction problem. The intuition behind our approach is that ICs contain a large amount of repetition and reuse cores. Hence, UNTANGLE can infer the hidden timing paths by learning the composition of gates in the observed locked netlist or a circuit library leveraging graph neural networks. We show that circuits withstanding SAT-based and other attacks can be unlocked in seconds with 100% precision using UNTANGLE in an oracle-less setting. UNTANGLE is a generic attack platform (which we also open source [2]) that applies to multiplexer (MUX)-based obfuscation, as demonstrated through our experiments on ISCAS-85 and ITC-99 benchmarks locked using InterLock and random MUX-based locking. △ Less

Submitted 13 November, 2021; originally announced November 2021.

Comments: Published in 2021 International Conference On Computer-Aided Design (ICCAD)

arXiv:2109.09829 [pdf, other]

doi 10.1109/ICCAD51958.2021.9643539

Towards Energy-Efficient and Secure Edge AI: A Cross-Layer Framework

Authors: Muhammad Shafique, Alberto Marchisio, Rachmad Vidya Wicaksana Putra, Muhammad Abdullah Hanif

Abstract: The security and privacy concerns along with the amount of data that is required to be processed on regular basis has pushed processing to the edge of the computing systems. Deploying advanced Neural Networks (NN), such as deep neural networks (DNNs) and spiking neural networks (SNNs), that offer state-of-the-art results on resource-constrained edge devices is challenging due to the stringent memo… ▽ More The security and privacy concerns along with the amount of data that is required to be processed on regular basis has pushed processing to the edge of the computing systems. Deploying advanced Neural Networks (NN), such as deep neural networks (DNNs) and spiking neural networks (SNNs), that offer state-of-the-art results on resource-constrained edge devices is challenging due to the stringent memory and power/energy constraints. Moreover, these systems are required to maintain correct functionality under diverse security and reliability threats. This paper first discusses existing approaches to address energy efficiency, reliability, and security issues at different system layers, i.e., hardware (HW) and software (SW). Afterward, we discuss how to further improve the performance (latency) and the energy efficiency of Edge AI systems through HW/SW-level optimizations, such as pruning, quantization, and approximation. To address reliability threats (like permanent and transient faults), we highlight cost-effective mitigation techniques, like fault-aware training and mapping. Moreover, we briefly discuss effective detection and protection techniques to address security threats (like model and data corruption). Towards the end, we discuss how these techniques can be combined in an integrated cross-layer framework for realizing robust and energy-efficient Edge AI systems. △ Less

Submitted 20 September, 2021; originally announced September 2021.

Comments: To appear at the 40th IEEE/ACM International Conference on Computer-Aided Design (ICCAD), November 2021, Virtual Event

arXiv:2109.01239 [pdf, other]

doi 10.1109/TVT.2023.3263791

A Max-Min Task Offloading Algorithm for Mobile Edge Computing Using Non-Orthogonal Multiple Access

Authors: Vaibhav Kumar, Muhammad Fainan Hanif, Markku Juntti, Le-Nam Tran

Abstract: To mitigate computational power gap between the network core and edges, mobile edge computing (MEC) is poised to play a fundamental role in future generations of wireless networks. In this letter, we consider a non-orthogonal multiple access (NOMA) transmission model to maximize the worst task to be offloaded among all users to the network edge server. A provably convergent and efficient algorithm… ▽ More To mitigate computational power gap between the network core and edges, mobile edge computing (MEC) is poised to play a fundamental role in future generations of wireless networks. In this letter, we consider a non-orthogonal multiple access (NOMA) transmission model to maximize the worst task to be offloaded among all users to the network edge server. A provably convergent and efficient algorithm is developed to solve the considered non-convex optimization problem for maximizing the minimum number of offloaded bits in a multi-user NOMAMEC system. Compared to the approach of optimized orthogonal multiple access (OMA), for given MEC delay, power and energy limits, the NOMA-based system considerably outperforms its OMA-based counterpart in MEC settings. Numerical results demonstrate that the proposed algorithm for NOMA-based MEC is particularly useful for delay sensitive applications. △ Less

Submitted 12 October, 2023; v1 submitted 2 September, 2021; originally announced September 2021.

Comments: 5 pages, 5 figures

Journal ref: IEEE Transactions on Vehicular Technology, vol. 72, no. 9, pp. 12332-12337, Sept. 2023

arXiv:2108.10271 [pdf, other]

doi 10.1109/ICCAD51958.2021.9643524

ReSpawn: Energy-Efficient Fault-Tolerance for Spiking Neural Networks considering Unreliable Memories

Authors: Rachmad Vidya Wicaksana Putra, Muhammad Abdullah Hanif, Muhammad Shafique

Abstract: Spiking neural networks (SNNs) have shown a potential for having low energy with unsupervised learning capabilities due to their biologically-inspired computation. However, they may suffer from accuracy degradation if their processing is performed under the presence of hardware-induced faults in memories, which can come from manufacturing defects or voltage-induced approximation errors. Since rece… ▽ More Spiking neural networks (SNNs) have shown a potential for having low energy with unsupervised learning capabilities due to their biologically-inspired computation. However, they may suffer from accuracy degradation if their processing is performed under the presence of hardware-induced faults in memories, which can come from manufacturing defects or voltage-induced approximation errors. Since recent works still focus on the fault-modeling and random fault injection in SNNs, the impact of memory faults in SNN hardware architectures on accuracy and the respective fault-mitigation techniques are not thoroughly explored. Toward this, we propose ReSpawn, a novel framework for mitigating the negative impacts of faults in both the off-chip and on-chip memories for resilient and energy-efficient SNNs. The key mechanisms of ReSpawn are: (1) analyzing the fault tolerance of SNNs; and (2) improving the SNN fault tolerance through (a) fault-aware mapping (FAM) in memories, and (b) fault-aware training-and-mapping (FATM). If the training dataset is not fully available, FAM is employed through efficient bit-shuffling techniques that place the significant bits on the non-faulty memory cells and the insignificant bits on the faulty ones, while minimizing the memory access energy. Meanwhile, if the training dataset is fully available, FATM is employed by considering the faulty memory cells in the data mapping and training processes. The experimental results show that, compared to the baseline SNN without fault-mitigation techniques, ReSpawn with a fault-aware mapping scheme improves the accuracy by up to 70% for a network with 900 neurons without retraining. △ Less

Submitted 23 August, 2021; originally announced August 2021.

Comments: To appear at the 40th IEEE/ACM International Conference on Computer-Aided Design (ICCAD), November 2021, Virtual Event

arXiv:2106.08800 [pdf, other]

Design and Analysis of High Performance Heterogeneous Block-based Approximate Adders

Authors: Ebrahim Farahmand, Ali Mahani, Muhammad Abdullah Hanif, Muhammad Shafique

Abstract: Approximate computing is an emerging paradigm to improve the power and performance efficiency of error-resilient applications. As adders are one of the key components in almost all processing systems, a significant amount of research has been carried out towards designing approximate adders that can offer better efficiency than conventional designs, however, at the cost of some accuracy loss. In t… ▽ More Approximate computing is an emerging paradigm to improve the power and performance efficiency of error-resilient applications. As adders are one of the key components in almost all processing systems, a significant amount of research has been carried out towards designing approximate adders that can offer better efficiency than conventional designs, however, at the cost of some accuracy loss. In this paper, we highlight a new class of energy-efficient approximate adders, namely Heterogeneous Block-based Approximate Adders (HBAA), and propose a generic configurable adder model that can be configured to represent a particular HBAA configuration. An HBAA, in general, is composed of heterogeneous sub-adder blocks of equal length, where each sub-adder can be an approximate sub-adder and have a different configuration. The sub-adders are mainly approximated through inexact logic and carry truncation. Compared to the existing design space, HBAAs provide additional design points that fall on the Pareto-front and offer a better quality-efficiency trade-off in certain scenarios. Furthermore, to enable efficient design space exploration based on user-defined constraints, we propose an analytical model to efficiently evaluate the Probability Mass Function (PMF) of approximation error and other error metrics, such as Mean Error Distance (MED), Normalized Mean Error Distance (NMED) and Error Rate (ER) of HBAAs. The results show that HBAA configurations can provide around 15% reduction in area and up to 17% reduction in energy compared to state-of-the-art approximate adders. △ Less

Submitted 14 September, 2023; v1 submitted 16 June, 2021; originally announced June 2021.

Comments: Accepted for publication in ACM Transactions on Embedded Computing Systems (TECS)

arXiv:2105.12374 [pdf, other]

Continual Learning for Real-World Autonomous Systems: Algorithms, Challenges and Frameworks

Authors: Khadija Shaheen, Muhammad Abdullah Hanif, Osman Hasan, Muhammad Shafique

Abstract: Continual learning is essential for all real-world applications, as frozen pre-trained models cannot effectively deal with non-stationary data distributions. The purpose of this study is to review the state-of-the-art methods that allow continuous learning of computational models over time. We primarily focus on the learning algorithms that perform continuous learning in an online fashion from con… ▽ More Continual learning is essential for all real-world applications, as frozen pre-trained models cannot effectively deal with non-stationary data distributions. The purpose of this study is to review the state-of-the-art methods that allow continuous learning of computational models over time. We primarily focus on the learning algorithms that perform continuous learning in an online fashion from considerably large (or infinite) sequential data and require substantially low computational and memory resources. We critically analyze the key challenges associated with continual learning for autonomous real-world systems and compare current methods in terms of computations, memory, and network/model complexity. We also briefly describe the implementations of continuous learning algorithms under three main autonomous systems, i.e., self-driving vehicles, unmanned aerial vehicles, and urban robots. The learning methods of these autonomous systems and their strengths and limitations are extensively explored in this article. △ Less

Submitted 24 February, 2022; v1 submitted 26 May, 2021; originally announced May 2021.

arXiv:2105.03251 [pdf, other]

Exploiting Vulnerabilities in Deep Neural Networks: Adversarial and Fault-Injection Attacks

Authors: Faiq Khalid, Muhammad Abdullah Hanif, Muhammad Shafique

Abstract: From tiny pacemaker chips to aircraft collision avoidance systems, the state-of-the-art Cyber-Physical Systems (CPS) have increasingly started to rely on Deep Neural Networks (DNNs). However, as concluded in various studies, DNNs are highly susceptible to security threats, including adversarial attacks. In this paper, we first discuss different vulnerabilities that can be exploited for generating… ▽ More From tiny pacemaker chips to aircraft collision avoidance systems, the state-of-the-art Cyber-Physical Systems (CPS) have increasingly started to rely on Deep Neural Networks (DNNs). However, as concluded in various studies, DNNs are highly susceptible to security threats, including adversarial attacks. In this paper, we first discuss different vulnerabilities that can be exploited for generating security attacks for neural network-based systems. We then provide an overview of existing adversarial and fault-injection-based attacks on DNNs. We also present a brief analysis to highlight different challenges in the practical implementation of adversarial attacks. Finally, we also discuss various prospective ways to develop robust DNN-based systems that are resilient to adversarial and fault-injection attacks. △ Less

Submitted 5 May, 2021; originally announced May 2021.

Comments: CYBER 2020, The Fifth International Conference on Cyber-Technologies and Cyber-Systems

arXiv:2103.00421 [pdf, other]

doi 10.1109/DAC18074.2021.9586332

SparkXD: A Framework for Resilient and Energy-Efficient Spiking Neural Network Inference using Approximate DRAM

Authors: Rachmad Vidya Wicaksana Putra, Muhammad Abdullah Hanif, Muhammad Shafique

Abstract: Spiking Neural Networks (SNNs) have the potential for achieving low energy consumption due to their biologically sparse computation. Several studies have shown that the off-chip memory (DRAM) accesses are the most energy-consuming operations in SNN processing. However, state-of-the-art in SNN systems do not optimize the DRAM energy-per-access, thereby hindering achieving high energy-efficiency. To… ▽ More Spiking Neural Networks (SNNs) have the potential for achieving low energy consumption due to their biologically sparse computation. Several studies have shown that the off-chip memory (DRAM) accesses are the most energy-consuming operations in SNN processing. However, state-of-the-art in SNN systems do not optimize the DRAM energy-per-access, thereby hindering achieving high energy-efficiency. To substantially minimize the DRAM energy-per-access, a key knob is to reduce the DRAM supply voltage but this may lead to DRAM errors (i.e., the so-called approximate DRAM). Towards this, we propose SparkXD, a novel framework that provides a comprehensive conjoint solution for resilient and energy-efficient SNN inference using low-power DRAMs subjected to voltage-induced errors. The key mechanisms of SparkXD are: (1) improving the SNN error tolerance through fault-aware training that considers bit errors from approximate DRAM, (2) analyzing the error tolerance of the improved SNN model to find the maximum tolerable bit error rate (BER) that meets the targeted accuracy constraint, and (3) energy-efficient DRAM data mapping for the resilient SNN model that maps the weights in the appropriate DRAM location to minimize the DRAM access energy. Through these mechanisms, SparkXD mitigates the negative impact of DRAM (approximation) errors, and provides the required accuracy. The experimental results show that, for a target accuracy within 1% of the baseline design (i.e., SNN without DRAM errors), SparkXD reduces the DRAM energy by ca. 40% on average across different network sizes. △ Less

Submitted 28 February, 2021; originally announced March 2021.

Comments: To appear at the 58th IEEE/ACM Design Automation Conference (DAC), December 2021, San Francisco, CA, USA

arXiv:2102.04642 [pdf, other]

Frequency-Shift Chirp Spread Spectrum Communications with Index Modulation

Authors: Muhammad Hanif, Ha H. Nguyen

Abstract: This paper introduces a novel frequency-shift chirp spread spectrum (FSCSS) system with index modulation (IM). By using combinations of orthogonal chirp signals for message representation, the proposed FSCSS-IM system is very flexible to design and can achieve much higher data rates than the conventional FSCSS system under the same bandwidth. The paper presents optimal detection algorithms, both c… ▽ More This paper introduces a novel frequency-shift chirp spread spectrum (FSCSS) system with index modulation (IM). By using combinations of orthogonal chirp signals for message representation, the proposed FSCSS-IM system is very flexible to design and can achieve much higher data rates than the conventional FSCSS system under the same bandwidth. The paper presents optimal detection algorithms, both coherently and non-coherently, for the proposed FSCSS-IM system. Furthermore, a low-complexity non-coherent detection algorithm is also developed to reduce the computational complexity of the receiver, which is shown to achieve near-optimal performance. Results are presented to demonstrate that the proposed system, while enabling much higher data rates, enjoys similar bit-error performance as that of the conventional FSCSS system. △ Less

Submitted 19 May, 2021; v1 submitted 8 February, 2021; originally announced February 2021.

Comments: The first version of this paper was submitted to IEEE Internet of Things Journal on July 14, 2020. The revised version was submitted on October 28, 2020, accepted May 15, 2021. The main idea and results of this work are documented in United States Patent #10,778,282, Sept. 2020. (https://researchers.usask.ca/ha-nguyen/documents/patents/us10778282b1-hanif.pdf)

arXiv:2101.12351 [pdf, other]

DNN-Life: An Energy-Efficient Aging Mitigation Framework for Improving the Lifetime of On-Chip Weight Memories in Deep Neural Network Hardware Architectures

Authors: Muhammad Abdullah Hanif, Muhammad Shafique

Abstract: Negative Biased Temperature Instability (NBTI)-induced aging is one of the critical reliability threats in nano-scale devices. This paper makes the first attempt to study the NBTI aging in the on-chip weight memories of deep neural network (DNN) hardware accelerators, subjected to complex DNN workloads. We propose DNN-Life, a specialized aging analysis and mitigation framework for DNNs, which join… ▽ More Negative Biased Temperature Instability (NBTI)-induced aging is one of the critical reliability threats in nano-scale devices. This paper makes the first attempt to study the NBTI aging in the on-chip weight memories of deep neural network (DNN) hardware accelerators, subjected to complex DNN workloads. We propose DNN-Life, a specialized aging analysis and mitigation framework for DNNs, which jointly exploits hardware- and software-level knowledge to improve the lifetime of a DNN weight memory with reduced energy overhead. At the software-level, we analyze the effects of different DNN quantization methods on the distribution of the bits of weight values. Based on the insights gained from this analysis, we propose a micro-architecture that employs low-cost memory-write (and read) transducers to achieve an optimal duty-cycle at run time in the weight memory cells, thereby balancing their aging. As a result, our DNN-Life framework enables efficient aging mitigation of weight memory of the given DNN hardware at minimal energy overhead during the inference process. △ Less

Submitted 28 January, 2021; originally announced January 2021.

arXiv:2012.05948 [pdf, other]

GNNUnlock: Graph Neural Networks-based Oracle-less Unlocking Scheme for Provably Secure Logic Locking

Authors: Lilas Alrahis, Satwik Patnaik, Faiq Khalid, Muhammad Abdullah Hanif, Hani Saleh, Muhammad Shafique, Ozgur Sinanoglu

Abstract: In this paper, we propose GNNUnlock, the first-of-its-kind oracle-less machine learning-based attack on provably secure logic locking that can identify any desired protection logic without focusing on a specific syntactic topology. The key is to leverage a well-trained graph neural network (GNN) to identify all the gates in a given locked netlist that belong to the targeted protection logic, witho… ▽ More In this paper, we propose GNNUnlock, the first-of-its-kind oracle-less machine learning-based attack on provably secure logic locking that can identify any desired protection logic without focusing on a specific syntactic topology. The key is to leverage a well-trained graph neural network (GNN) to identify all the gates in a given locked netlist that belong to the targeted protection logic, without requiring an oracle. This approach fits perfectly with the targeted problem since a circuit is a graph with an inherent structure and the protection logic is a sub-graph of nodes (gates) with specific and common characteristics. GNNs are powerful in capturing the nodes' neighborhood properties, facilitating the detection of the protection logic. To rectify any misclassifications induced by the GNN, we additionally propose a connectivity analysis-based post-processing algorithm to successfully remove the predicted protection logic, thereby retrieving the original design. Our extensive experimental evaluation demonstrates that GNNUnlock is 99.24%-100% successful in breaking various benchmarks locked using stripped-functionality logic locking, tenacious and traceless logic locking, and Anti-SAT. Our proposed post-processing enhances the detection accuracy, reaching 100% for all of our tested locked benchmarks. Analysis of the results corroborates that GNNUnlock is powerful enough to break the considered schemes under different parameters, synthesis settings, and technology nodes. The evaluation further shows that GNNUnlock successfully breaks corner cases where even the most advanced state-of-the-art attacks fail. △ Less

Submitted 10 December, 2020; originally announced December 2020.

Comments: 6 pages, 4 figures, 6 tables, conference

arXiv:2010.05754 [pdf, other]

doi 10.1109/TCAD.2020.3030610

DESCNet: Developing Efficient Scratchpad Memories for Capsule Network Hardware

Authors: Alberto Marchisio, Vojtech Mrazek, Muhammad Abdullah Hanif, Muhammad Shafique

Abstract: Deep Neural Networks (DNNs) have been established as the state-of-the-art algorithm for advanced machine learning applications. Recently proposed by the Google Brain's team, the Capsule Networks (CapsNets) have improved the generalization ability, as compared to DNNs, due to their multi-dimensional capsules and preserving the spatial relationship between different objects. However, they pose signi… ▽ More Deep Neural Networks (DNNs) have been established as the state-of-the-art algorithm for advanced machine learning applications. Recently proposed by the Google Brain's team, the Capsule Networks (CapsNets) have improved the generalization ability, as compared to DNNs, due to their multi-dimensional capsules and preserving the spatial relationship between different objects. However, they pose significantly high computational and memory requirements, making their energy-efficient inference a challenging task. This paper provides, for the first time, an in-depth analysis to highlight the design and management related challenges for the (on-chip) memories deployed in hardware accelerators executing fast CapsNets inference. To enable an efficient design, we propose an application-specific memory hierarchy, which minimizes the off-chip memory accesses, while efficiently feeding the data to the hardware accelerator. We analyze the corresponding on-chip memory requirements and leverage it to propose a novel methodology to explore different scratchpad memory designs and their energy/area trade-offs. Afterwards, an application-specific power-gating technique is proposed to further reduce the energy consumption, depending upon the utilization across different operations of the CapsNets. Our results for a selected Pareto-optimal solution demonstrate no performance loss and an energy reduction of 79% for the complete accelerator, including computational units and memories, when compared to a state-of-the-art design executing Google's CapsNet model for the MNIST dataset. △ Less

Submitted 12 October, 2020; originally announced October 2020.

Comments: Accepted for publication at the IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

arXiv:2008.01191 [pdf, other]

Deep Learning Techniques for Future Intelligent Cross-Media Retrieval

Authors: Sadaqat ur Rehman, Muhammad Waqas, Shanshan Tu, Anis Koubaa, Obaid ur Rehman, Jawad Ahmad, Muhammad Hanif, Zhu Han

Abstract: With the advancement in technology and the expansion of broadcasting, cross-media retrieval has gained much attention. It plays a significant role in big data applications and consists in searching and finding data from different types of media. In this paper, we provide a novel taxonomy according to the challenges faced by multi-modal deep learning approaches in solving cross-media retrieval, nam… ▽ More With the advancement in technology and the expansion of broadcasting, cross-media retrieval has gained much attention. It plays a significant role in big data applications and consists in searching and finding data from different types of media. In this paper, we provide a novel taxonomy according to the challenges faced by multi-modal deep learning approaches in solving cross-media retrieval, namely: representation, alignment, and translation. These challenges are evaluated on deep learning (DL) based methods, which are categorized into four main groups: 1) unsupervised methods, 2) supervised methods, 3) pairwise based methods, and 4) rank based methods. Then, we present some well-known cross-media datasets used for retrieval, considering the importance of these datasets in the context in of deep learning based cross-media retrieval approaches. Moreover, we also present an extensive review of the state-of-the-art problems and its corresponding solutions for encouraging deep learning in cross-media retrieval. The fundamental objective of this work is to exploit Deep Neural Networks (DNNs) for bridging the "media gap", and provide researchers and developers with a better understanding of the underlying problems and the potential solutions of deep learning assisted cross-media retrieval. To the best of our knowledge, this is the first comprehensive survey to address cross-media retrieval under deep learning methods. △ Less

Submitted 21 July, 2020; originally announced August 2020.

Comments: arXiv admin note: text overlap with arXiv:1804.09539 by other authors

arXiv:2004.10341 [pdf, other]

doi 10.1109/DAC18072.2020.9218672

DRMap: A Generic DRAM Data Mapping Policy for Energy-Efficient Processing of Convolutional Neural Networks

Authors: Rachmad Vidya Wicaksana Putra, Muhammad Abdullah Hanif, Muhammad Shafique

Abstract: Many convolutional neural network (CNN) accelerators face performance- and energy-efficiency challenges which are crucial for embedded implementations, due to high DRAM access latency and energy. Recently, some DRAM architectures have been proposed to exploit subarray-level parallelism for decreasing the access latency. Towards this, we present a design space exploration methodology to study the l… ▽ More Many convolutional neural network (CNN) accelerators face performance- and energy-efficiency challenges which are crucial for embedded implementations, due to high DRAM access latency and energy. Recently, some DRAM architectures have been proposed to exploit subarray-level parallelism for decreasing the access latency. Towards this, we present a design space exploration methodology to study the latency and energy of different mapping policies on different DRAM architectures, and identify the pareto-optimal design choices. The results show that the energy-efficient DRAM accesses can be achieved by a mapping policy that orderly prioritizes to maximize the row buffer hits, bank- and subarray-level parallelism. △ Less

Submitted 21 April, 2020; originally announced April 2020.

Comments: To appear at the 57th Design Automation Conference (DAC), July 2020, San Francisco, CA, USA

arXiv:1912.01978 [pdf, other]

FANNet: Formal Analysis of Noise Tolerance, Training Bias and Input Sensitivity in Neural Networks

Authors: Mahum Naseer, Mishal Fatima Minhas, Faiq Khalid, Muhammad Abdullah Hanif, Osman Hasan, Muhammad Shafique

Abstract: With a constant improvement in the network architectures and training methodologies, Neural Networks (NNs) are increasingly being deployed in real-world Machine Learning systems. However, despite their impressive performance on "known inputs", these NNs can fail absurdly on the "unseen inputs", especially if these real-time inputs deviate from the training dataset distributions, or contain certain… ▽ More With a constant improvement in the network architectures and training methodologies, Neural Networks (NNs) are increasingly being deployed in real-world Machine Learning systems. However, despite their impressive performance on "known inputs", these NNs can fail absurdly on the "unseen inputs", especially if these real-time inputs deviate from the training dataset distributions, or contain certain types of input noise. This indicates the low noise tolerance of NNs, which is a major reason for the recent increase of adversarial attacks. This is a serious concern, particularly for safety-critical applications, where inaccurate results lead to dire consequences. We propose a novel methodology that leverages model checking for the Formal Analysis of Neural Network (FANNet) under different input noise ranges. Our methodology allows us to rigorously analyze the noise tolerance of NNs, their input node sensitivity, and the effects of training bias on their performance, e.g., in terms of classification accuracy. For evaluation, we use a feed-forward fully-connected NN architecture trained for the Leukemia classification. Our experimental results show $\pm 11\%$ noise tolerance for the given trained network, identify the most sensitive input nodes, and confirm the biasness of the available training dataset. △ Less

Submitted 14 May, 2020; v1 submitted 3 December, 2019; originally announced December 2019.

Comments: To appear at the 23rd Design, Automation and Test in Europe (DATE 2020). Grenoble, France

arXiv:1912.00941 [pdf, other]

FT-ClipAct: Resilience Analysis of Deep Neural Networks and Improving their Fault Tolerance using Clipped Activation

Authors: Le-Ha Hoang, Muhammad Abdullah Hanif, Muhammad Shafique

Abstract: Deep Neural Networks (DNNs) are widely being adopted for safety-critical applications, e.g., healthcare and autonomous driving. Inherently, they are considered to be highly error-tolerant. However, recent studies have shown that hardware faults that impact the parameters of a DNN (e.g., weights) can have drastic impacts on its classification accuracy. In this paper, we perform a comprehensive erro… ▽ More Deep Neural Networks (DNNs) are widely being adopted for safety-critical applications, e.g., healthcare and autonomous driving. Inherently, they are considered to be highly error-tolerant. However, recent studies have shown that hardware faults that impact the parameters of a DNN (e.g., weights) can have drastic impacts on its classification accuracy. In this paper, we perform a comprehensive error resilience analysis of DNNs subjected to hardware faults (e.g., permanent faults) in the weight memory. The outcome of this analysis is leveraged to propose a novel error mitigation technique which squashes the high-intensity faulty activation values to alleviate their impact. We achieve this by replacing the unbounded activation functions with their clipped versions. We also present a method to systematically define the clipping values of the activation functions that result in increased resilience of the networks against faults. We evaluate our technique on the AlexNet and the VGG-16 DNNs trained for the CIFAR-10 dataset. The experimental results show that our mitigation technique significantly improves the resilience of the DNNs to faults. For example, the proposed technique offers on average 68.92% improvement in the classification accuracy of resilience-optimized VGG-16 model at 1e-5 fault rate, when compared to the base network without any fault mitigation. △ Less

Submitted 2 December, 2019; originally announced December 2019.

Comments: The 23rd Design, Automation and Test in Europe (DATE 2020)

arXiv:1912.00700 [pdf, other]

doi 10.23919/DATE48585.2020.9116393

ReD-CaNe: A Systematic Methodology for Resilience Analysis and Design of Capsule Networks under Approximations

Authors: Alberto Marchisio, Vojtech Mrazek, Muhammad Abudllah Hanif, Muhammad Shafique

Abstract: Recent advances in Capsule Networks (CapsNets) have shown their superior learning capability, compared to the traditional Convolutional Neural Networks (CNNs). However, the extremely high complexity of CapsNets limits their fast deployment in real-world applications. Moreover, while the resilience of CNNs have been extensively investigated to enable their energy-efficient implementations, the anal… ▽ More Recent advances in Capsule Networks (CapsNets) have shown their superior learning capability, compared to the traditional Convolutional Neural Networks (CNNs). However, the extremely high complexity of CapsNets limits their fast deployment in real-world applications. Moreover, while the resilience of CNNs have been extensively investigated to enable their energy-efficient implementations, the analysis of CapsNets' resilience is a largely unexplored area, that can provide a strong foundation to investigate techniques to overcome the CapsNets' complexity challenge. Following the trend of Approximate Computing to enable energy-efficient designs, we perform an extensive resilience analysis of the CapsNets inference subjected to the approximation errors. Our methodology models the errors arising from the approximate components (like multipliers), and analyze their impact on the classification accuracy of CapsNets. This enables the selection of approximate components based on the resilience of each operation of the CapsNet inference. We modify the TensorFlow framework to simulate the injection of approximation noise (based on the models of the approximate components) at different computational operations of the CapsNet inference. Our results show that the CapsNets are more resilient to the errors injected in the computations that occur during the dynamic routing (the softmax and the update of the coefficients), rather than other stages like convolutions and activation functions. Our analysis is extremely useful towards designing efficient CapsNet hardware accelerators with approximate components. To the best of our knowledge, this is the first proof-of-concept for employing approximations on the specialized CapsNet hardware. △ Less

Submitted 2 December, 2019; originally announced December 2019.

Comments: To appear at the 23rd Design, Automation and Test in Europe (DATE 2020). Grenoble, France

arXiv:1907.07229 [pdf, other]

doi 10.1109/ICCAD45719.2019.8942068

ALWANN: Automatic Layer-Wise Approximation of Deep Neural Network Accelerators without Retraining

Authors: Vojtech Mrazek, Zdenek Vasicek, Lukas Sekanina, Muhammad Abdullah Hanif, Muhammad Shafique

Abstract: The state-of-the-art approaches employ approximate computing to reduce the energy consumption of DNN hardware. Approximate DNNs then require extensive retraining afterwards to recover from the accuracy loss caused by the use of approximate operations. However, retraining of complex DNNs does not scale well. In this paper, we demonstrate that efficient approximations can be introduced into the comp… ▽ More The state-of-the-art approaches employ approximate computing to reduce the energy consumption of DNN hardware. Approximate DNNs then require extensive retraining afterwards to recover from the accuracy loss caused by the use of approximate operations. However, retraining of complex DNNs does not scale well. In this paper, we demonstrate that efficient approximations can be introduced into the computational path of DNN accelerators while retraining can completely be avoided. ALWANN provides highly optimized implementations of DNNs for custom low-power accelerators in which the number of computing units is lower than the number of DNN layers. First, a fully trained DNN is converted to operate with 8-bit weights and 8-bit multipliers in convolutional layers. A suitable approximate multiplier is then selected for each computing element from a library of approximate multipliers in such a way that (i) one approximate multiplier serves several layers, and (ii) the overall classification error and energy consumption are minimized. The optimizations including the multiplier selection problem are solved by means of a multiobjective optimization NSGA-II algorithm. In order to completely avoid the computationally expensive retraining of DNN, which is usually employed to improve the classification accuracy, we propose a simple weight updating scheme that compensates the inaccuracy introduced by employing approximate multipliers. The proposed approach is evaluated for two architectures of DNN accelerators with approximate multipliers from the open-source "EvoApprox" library. We report that the proposed approach saves 30% of energy needed for multiplication in convolutional layers of ResNet-50 while the accuracy is degraded by only 0.6%. The proposed technique and approximate layers are available as an open-source extension of TensorFlow at https://github.com/ehw-fit/tf-approximate. △ Less

Submitted 25 July, 2019; v1 submitted 11 June, 2019; originally announced July 2019.

Comments: Accepted for 2019 IEEE/ACM International Conference On Computer-Aided Design (ICCAD'19)

arXiv:1905.10142 [pdf, other]

doi 10.1109/IJCNN48605.2020.9207533

FasTrCaps: An Integrated Framework for Fast yet Accurate Training of Capsule Networks

Authors: Alberto Marchisio, Beatrice Bussolino, Alessio Colucci, Muhammad Abdullah Hanif, Maurizio Martina, Guido Masera, Muhammad Shafique

Abstract: Recently, Capsule Networks (CapsNets) have shown improved performance compared to the traditional Convolutional Neural Networks (CNNs), by encoding and preserving spatial relationships between the detected features in a better way. This is achieved through the so-called Capsules (i.e., groups of neurons) that encode both the instantiation probability and the spatial information. However, one of th… ▽ More Recently, Capsule Networks (CapsNets) have shown improved performance compared to the traditional Convolutional Neural Networks (CNNs), by encoding and preserving spatial relationships between the detected features in a better way. This is achieved through the so-called Capsules (i.e., groups of neurons) that encode both the instantiation probability and the spatial information. However, one of the major hurdles in the wide adoption of CapsNets is their gigantic training time, which is primarily due to the relatively higher complexity of their new constituting elements that are different from CNNs. In this paper, we implement different optimizations in the training loop of the CapsNets, and investigate how these optimizations affect their training speed and the accuracy. Towards this, we propose a novel framework FasTrCaps that integrates multiple lightweight optimizations and a novel learning rate policy called WarmAdaBatch (that jointly performs warm restarts and adaptive batch size), and steers them in an appropriate way to provide high training-loop speedup at minimal accuracy loss. We also propose weight sharing for capsule layers. The goal is to reduce the hardware requirements of CapsNets by removing unused/redundant connections and capsules, while keeping high accuracy through tests of different learning rate policies and batch sizes. We demonstrate that one of the solutions generated by the FasTrCaps framework can achieve 58.6% reduction in the training time, while preserving the accuracy (even 0.12% accuracy improvement for the MNIST dataset), compared to the CapsNet by Google Brain. The Pareto-optimal solutions generated by FasTrCaps can be leveraged to realize trade-offs between training time and achieved accuracy. We have open-sourced our framework on https://github.com/Alexei95/FasTrCaps. △ Less

Submitted 18 May, 2020; v1 submitted 24 May, 2019; originally announced May 2019.

Comments: Accepted for publication at the 2020 International Joint Conference on Neural Networks (IJCNN)

arXiv:1902.10807 [pdf, other]

doi 10.1145/3316781.3317781

autoAx: An Automatic Design Space Exploration and Circuit Building Methodology utilizing Libraries of Approximate Components

Authors: Vojtech Mrazek, Muhammad Abdullah Hanif, Zdenek Vasicek, Lukas Sekanina, Muhammad Shafique

Abstract: Approximate computing is an emerging paradigm for developing highly energy-efficient computing systems such as various accelerators. In the literature, many libraries of elementary approximate circuits have already been proposed to simplify the design process of approximate accelerators. Because these libraries contain from tens to thousands of approximate implementations for a single arithmetic o… ▽ More Approximate computing is an emerging paradigm for developing highly energy-efficient computing systems such as various accelerators. In the literature, many libraries of elementary approximate circuits have already been proposed to simplify the design process of approximate accelerators. Because these libraries contain from tens to thousands of approximate implementations for a single arithmetic operation it is intractable to find an optimal combination of approximate circuits in the library even for an application consisting of a few operations. An open problem is "how to effectively combine circuits from these libraries to construct complex approximate accelerators". This paper proposes a novel methodology for searching, selecting and combining the most suitable approximate circuits from a set of available libraries to generate an approximate accelerator for a given application. To enable fast design space generation and exploration, the methodology utilizes machine learning techniques to create computational models estimating the overall quality of processing and hardware cost without performing full synthesis at the accelerator level. Using the methodology, we construct hundreds of approximate accelerators (for a Sobel edge detector) showing different but relevant tradeoffs between the quality of processing and hardware cost and identify a corresponding Pareto-frontier. Furthermore, when searching for approximate implementations of a generic Gaussian filter consisting of 17 arithmetic operations, the proposed approach allows us to identify approximately $10^3$ highly important implementations from $10^{23}$ possible solutions in a few hours, while the exhaustive search would take four months on a high-end processor. △ Less

Submitted 1 April, 2019; v1 submitted 22 February, 2019; originally announced February 2019.

Comments: Accepted for publication at the Design Automation Conference 2019 (DAC'19), Las Vegas, Nevada, USA

arXiv:1902.10222 [pdf, other]

doi 10.1109/TVLSI.2021.3060509

ROMANet: Fine-Grained Reuse-Driven Off-Chip Memory Access Management and Data Organization for Deep Neural Network Accelerators

Authors: Rachmad Vidya Wicaksana Putra, Muhammad Abdullah Hanif, Muhammad Shafique

Abstract: Enabling high energy efficiency is crucial for embedded implementations of deep learning. Several studies have shown that the DRAM-based off-chip memory accesses are one of the most energy-consuming operations in deep neural network (DNN) accelerators, and thereby limit the designs from achieving efficiency gains at the full potential. DRAM access energy varies depending upon the number of accesse… ▽ More Enabling high energy efficiency is crucial for embedded implementations of deep learning. Several studies have shown that the DRAM-based off-chip memory accesses are one of the most energy-consuming operations in deep neural network (DNN) accelerators, and thereby limit the designs from achieving efficiency gains at the full potential. DRAM access energy varies depending upon the number of accesses required as well as the energy consumed per-access. Therefore, searching for a solution towards the minimum DRAM access energy is an important optimization problem. Towards this, we propose the ROMANet methodology that aims at reducing the number of memory accesses, by searching for the appropriate data partitioning and scheduling for each layer of a network using a design space exploration, based on the knowledge of the available on-chip memory and the data reuse factors. Moreover, ROMANet also targets decreasing the number of DRAM row buffer conflicts and misses, by exploiting the DRAM multi-bank burst feature to improve the energy-per-access. Besides providing the energy benefits, our proposed DRAM data mapping also results in an increased effective DRAM throughput, which is useful for latency-constraint scenarios. Our experimental results show that the ROMANet saves DRAM access energy by 12% for the AlexNet, by 36% for the VGG-16, and by 46% for the MobileNet, while also improving the DRAM throughput by 10%, as compared to the state-of-the-art. △ Less

Submitted 2 August, 2020; v1 submitted 4 February, 2019; originally announced February 2019.

Comments: Submitted to the IEEE-TVLSI journal, 14 pages, 26 figures

arXiv:1902.01151 [pdf, other]

CapStore: Energy-Efficient Design and Management of the On-Chip Memory for CapsuleNet Inference Accelerators

Authors: Alberto Marchisio, Muhammad Abdullah Hanif, Mohammad Taghi Teimoori, Muhammad Shafique

Abstract: Deep Neural Networks (DNNs) have been established as the state-of-the-art algorithm for advanced machine learning applications. Recently, CapsuleNets have improved the generalization ability, as compared to DNNs, due to their multi-dimensional capsules. However, they pose high computational and memory requirements, which makes energy-efficient inference a challenging task. In this paper, we perfor… ▽ More Deep Neural Networks (DNNs) have been established as the state-of-the-art algorithm for advanced machine learning applications. Recently, CapsuleNets have improved the generalization ability, as compared to DNNs, due to their multi-dimensional capsules. However, they pose high computational and memory requirements, which makes energy-efficient inference a challenging task. In this paper, we perform an extensive analysis to demonstrate their key limitations due to intense memory accesses and large on-chip memory requirements. To enable efficient CaspuleNet inference accelerators, we propose a specialized on-chip memory hierarchy which minimizes the off-chip memory accesses, while efficiently feeding the data to the accelerator. We analyze the on-chip memory requirements for each memory component of the architecture. By leveraging this analysis, we propose a methodology to explore different on-chip memory designs and a power-gating technique to further reduce the energy consumption, depending upon the utilization across different operations of a CapsuleNet. Our memory designs can significantly reduce the energy consumption of the on-chip memory by up to 86%, when compared to a state-of-the-art memory design. Since the power consumption of the memory elements is the major contributor in the power breakdown of the CapsuleNet accelerator, as we will also show in our analyses, the proposed memory design can effectively reduce the overall energy consumption of the complete CapsuleNet accelerator architecture. △ Less

Submitted 12 April, 2019; v1 submitted 4 February, 2019; originally announced February 2019.

arXiv:1902.01147 [pdf, other]

doi 10.1109/IJCNN48605.2020.9207297

Is Spiking Secure? A Comparative Study on the Security Vulnerabilities of Spiking and Deep Neural Networks

Authors: Alberto Marchisio, Giorgio Nanfa, Faiq Khalid, Muhammad Abdullah Hanif, Maurizio Martina, Muhammad Shafique

Abstract: Spiking Neural Networks (SNNs) claim to present many advantages in terms of biological plausibility and energy efficiency compared to standard Deep Neural Networks (DNNs). Recent works have shown that DNNs are vulnerable to adversarial attacks, i.e., small perturbations added to the input data can lead to targeted or random misclassifications. In this paper, we aim at investigating the key researc… ▽ More Spiking Neural Networks (SNNs) claim to present many advantages in terms of biological plausibility and energy efficiency compared to standard Deep Neural Networks (DNNs). Recent works have shown that DNNs are vulnerable to adversarial attacks, i.e., small perturbations added to the input data can lead to targeted or random misclassifications. In this paper, we aim at investigating the key research question: ``Are SNNs secure?'' Towards this, we perform a comparative study of the security vulnerabilities in SNNs and DNNs w.r.t. the adversarial noise. Afterwards, we propose a novel black-box attack methodology, i.e., without the knowledge of the internal structure of the SNN, which employs a greedy heuristic to automatically generate imperceptible and robust adversarial examples (i.e., attack images) for the given SNN. We perform an in-depth evaluation for a Spiking Deep Belief Network (SDBN) and a DNN having the same number of layers and neurons (to obtain a fair comparison), in order to study the efficiency of our methodology and to understand the differences between SNNs and DNNs w.r.t. the adversarial examples. Our work opens new avenues of research towards the robustness of the SNNs, considering their similarities to the human brain's functionality. △ Less

Submitted 18 May, 2020; v1 submitted 4 February, 2019; originally announced February 2019.

Comments: Accepted for publication at the 2020 International Joint Conference on Neural Networks (IJCNN)

arXiv:1901.10258 [pdf, other]

RED-Attack: Resource Efficient Decision based Attack for Machine Learning

Authors: Faiq Khalid, Hassan Ali, Muhammad Abdullah Hanif, Semeen Rehman, Rehan Ahmed, Muhammad Shafique

Abstract: Due to data dependency and model leakage properties, Deep Neural Networks (DNNs) exhibit several security vulnerabilities. Several security attacks exploited them but most of them require the output probability vector. These attacks can be mitigated by concealing the output probability vector. To address this limitation, decision-based attacks have been proposed which can estimate the model but th… ▽ More Due to data dependency and model leakage properties, Deep Neural Networks (DNNs) exhibit several security vulnerabilities. Several security attacks exploited them but most of them require the output probability vector. These attacks can be mitigated by concealing the output probability vector. To address this limitation, decision-based attacks have been proposed which can estimate the model but they require several thousand queries to generate a single untargeted attack image. However, in real-time attacks, resources and attack time are very crucial parameters. Therefore, in resource-constrained systems, e.g., autonomous vehicles where an untargeted attack can have a catastrophic effect, these attacks may not work efficiently. To address this limitation, we propose a resource efficient decision-based methodology which generates the imperceptible attack, i.e., the RED-Attack, for a given black-box model. The proposed methodology follows two main steps to generate the imperceptible attack, i.e., classification boundary estimation and adversarial noise optimization. Firstly, we propose a half-interval search-based algorithm for estimating a sample on the classification boundary using a target image and a randomly selected image from another class. Secondly, we propose an optimization algorithm which first, introduces a small perturbation in some randomly selected pixels of the estimated sample. Then to ensure imperceptibility, it optimizes the distance between the perturbed and target samples. For illustration, we evaluate it for CFAR-10 and German Traffic Sign Recognition (GTSR) using state-of-the-art networks. △ Less

Submitted 30 January, 2019; v1 submitted 29 January, 2019; originally announced January 2019.

Showing 1–50 of 73 results for author: Hanif, M