Multi-person eye tracking for real-world scene perception in social settings

Authors: Shreshth Saxena, Areez Visram, Neil Lobo, Zahid Mirza, Mehak Rafi Khan, Biranugan Pirabaharan, Alexander Nguyen, Lauren K. Fink

Abstract: Eye movements provide a window into human behaviour, attention, and interaction dynamics. Previous research suggests that eye movements are highly influenced by task, setting, and social others; however, most eye tracking research is conducted in single-person, in-lab settings and is yet to be validated in multi-person, naturalistic contexts. One such prevalent real-world context is the collective viewing of a shared scene in social settings, for example, viewing a concert, film, lecture, sports, etc. Here, we apply mobile eye tracking in a real-world multi-person setup and develop a system to stream, record, and analyse synchronised data. We tested our proposed, open-source system while participants (N=60) watched a live concert and a documentary film screening during a public event. We tackled challenges related to networking bandwidth requirements, real-time monitoring, and gaze projection from individual egocentric perspectives to a common coordinate space for shared gaze analysis. Our system achieves precise time synchronisation and accurate gaze projection in challenging dynamic scenes. Further, to illustrate the potential of collective eye-tracking data, we introduce and evaluate novel analysis metrics and visualisations. Overall, our approach contributes to the development and application of versatile multi-person eye tracking systems in real-world social settings. This advancement enables insight into collaborative behaviour, group dynamics, and social interaction, with high ecological validity. Moreover, it paves the path for innovative, interactive tools that promote collaboration and coordination in social contexts.

Submitted 8 July, 2024; originally announced July 2024.

Comments: Please refer to the supplementary video illustrating the proposed approach in this paper here: https://tinyurl.com/multipersonET

ACM Class: I.4.8; J.4; J.5; C.4; D.2.10

arXiv:2406.13439 [pdf, other]

Finding Blind Spots in Evaluator LLMs with Interpretable Checklists

Authors: Sumanth Doddapaneni, Mohammed Safi Ur Rahman Khan, Sshubam Verma, Mitesh M. Khapra

Abstract: Large Language Models (LLMs) are increasingly relied upon to evaluate text outputs of other LLMs, thereby influencing leaderboards and development decisions. However, concerns persist over the accuracy of these assessments and the potential for misleading conclusions. In this work, we investigate the effectiveness of LLMs as evaluators for text generation tasks. We propose FBI, a novel framework designed to examine the proficiency of Evaluator LLMs in assessing four critical abilities in other LLMs: factual accuracy, instruction following, coherence in long-form writing, and reasoning proficiency. By introducing targeted perturbations in answers generated by LLMs that clearly impact one of these key capabilities, we test whether an Evaluator LLM can detect these quality drops. By creating a total of 2400 perturbed answers covering 22 perturbation categories, we conduct a comprehensive study using different evaluation strategies on five prominent LLMs commonly used as evaluators in the literature. Our findings reveal significant shortcomings in current Evaluator LLMs, which failed to identify quality drops in over 50% of cases on average. Single-answer and pairwise evaluations demonstrated notable limitations, whereas reference-based evaluations showed comparatively better performance. These results underscore the unreliable nature of current Evaluator LLMs and advocate for cautious implementation in practical applications. Code and data are available at https://github.com/AI4Bharat/FBI.

Submitted 19 June, 2024; originally announced June 2024.

arXiv:2406.06638 [pdf, other]

Particle Multi-Axis Transformer for Jet Tagging

Authors: Muhammad Usman, M Husnain Shahid, Maheen Ejaz, Ummay Hani, Nayab Fatima, Abdul Rehman Khan, Asifullah Khan, Nasir Majid Mirza

Abstract: Jet tagging is an essential categorization problem in high energy physics. In recent times, Deep Learning has not only risen to the challenge of jet tagging but also significantly improved its performance. In this article, we propose a new architecture, the Particle Multi-Axis Transformer (ParMAT), which is a modified version of the Particle Transformer (ParT). ParMAT contains local and global spatial interactions within a single unit, which improves its ability to handle various input lengths. We trained our model on JETCLASS, a publicly available large dataset that contains 100M jets of 10 different classes of particles. By integrating a parallel attention mechanism and pairwise interactions of particles in the attention mechanism, ParMAT achieves robustness and higher accuracy than ParT and ParticleNet. The scalability of the model to huge datasets and its ability to automatically extract essential features demonstrate its potential for enhancing jet tagging.

Submitted 16 July, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

arXiv:2406.00532 [pdf, other]

Breast Cancer Diagnosis: A Comprehensive Exploration of Explainable Artificial Intelligence (XAI) Techniques

Authors: Samita Bai, Sidra Nasir, Rizwan Ahmed Khan, Sheeraz Arif, Alexandre Meyer, Hubert Konik

Abstract: Breast cancer (BC) stands as one of the most common malignancies affecting women worldwide, necessitating advancements in diagnostic methodologies for better clinical outcomes. This article provides a comprehensive exploration of the application of Explainable Artificial Intelligence (XAI) techniques in the detection and diagnosis of breast cancer. As Artificial Intelligence (AI) technologies continue to permeate the healthcare sector, particularly in oncology, the need for transparent and interpretable models becomes imperative to enhance clinical decision-making and patient care. This review discusses the integration of various XAI approaches, such as SHAP, LIME, Grad-CAM, and others, with machine learning and deep learning models utilized in breast cancer detection and classification. By investigating the modalities of breast cancer datasets, including mammograms, ultrasounds and their processing with AI, the paper highlights how XAI can lead to more accurate diagnoses and personalized treatment plans. It also examines the challenges in implementing these techniques and the importance of developing standardized metrics for evaluating XAI's effectiveness in clinical settings. Through detailed analysis and discussion, this article aims to highlight the potential of XAI in bridging the gap between complex AI models and practical healthcare applications, thereby fostering trust and understanding among medical professionals and improving patient outcomes.

Submitted 1 June, 2024; originally announced June 2024.

arXiv:2405.20363 [pdf, other]

LLMGeo: Benchmarking Large Language Models on Image Geolocation In-the-wild

Authors: Zhiqiang Wang, Dejia Xu, Rana Muhammad Shahroz Khan, Yanbin Lin, Zhiwen Fan, Xingquan Zhu

Abstract: Image geolocation is a critical task in various image-understanding applications. However, existing methods often fail when analyzing challenging, in-the-wild images. Inspired by the exceptional background knowledge of multimodal language models, we systematically evaluate their geolocation capabilities using a novel image dataset and a comprehensive evaluation framework. We first collect images from various countries via Google Street View. Then, we conduct both training-free and training-based evaluations on closed-source and open-source multimodal language models. Our findings indicate that closed-source models demonstrate superior geolocation abilities, while open-source models can achieve comparable performance through fine-tuning.

Submitted 30 May, 2024; originally announced May 2024.

Comments: 7 pages, 3 figures, 5 tables, CVPR 2024 Workshop on Computer Vision in the Wild

arXiv:2404.03892 [pdf, other]

Enhancing Breast Cancer Diagnosis in Mammography: Evaluation and Integration of Convolutional Neural Networks and Explainable AI

Authors: Maryam Ahmed, Tooba Bibi, Rizwan Ahmed Khan, Sidra Nasir

Abstract: Deep learning (DL) models for diagnosing breast cancer from mammographic images often operate as "black boxes", making it difficult for healthcare professionals to trust and understand their decision-making processes. The study presents an integrated framework combining Convolutional Neural Networks (CNNs) and Explainable Artificial Intelligence (XAI) for the enhanced diagnosis of breast cancer using the CBIS-DDSM dataset. The methodology encompasses an elaborate data preprocessing pipeline and advanced data augmentation techniques to counteract dataset limitations, and employs transfer learning using pre-trained networks such as VGG-16, Inception-V3 and ResNet. A focal point of our study is the evaluation of XAI's effectiveness in interpreting model predictions, highlighted by utilizing the Hausdorff measure to assess the alignment between AI-generated explanations and expert annotations quantitatively. This approach is critical for XAI in promoting trustworthiness and ethical fairness in AI-assisted diagnostics. The findings from our research illustrate the effective collaboration between CNNs and XAI in advancing diagnostic methods for breast cancer, thereby facilitating a more seamless integration of advanced AI technologies within clinical settings. By enhancing the interpretability of AI-driven decisions, this work lays the groundwork for improved collaboration between AI systems and medical practitioners, ultimately enriching patient care. Furthermore, the implications of our research extend well beyond the current methodologies. It encourages further research into how to combine multimodal data and improve AI explanations to meet the needs of clinical practice.

Submitted 27 April, 2024; v1 submitted 5 April, 2024; originally announced April 2024.

arXiv:2404.01878 [pdf, other]

Real, fake and synthetic faces -- does the coin have three sides?

Authors: Shahzeb Naeem, Ramzi Al-Sharawi, Muhammad Riyyan Khan, Usman Tariq, Abhinav Dhall, Hasan Al-Nashash

Abstract: With the ever-growing power of generative artificial intelligence, deepfake and artificially generated (synthetic) media have continued to spread online, which creates various ethical and moral concerns regarding their usage. To tackle this, we thus present a novel exploration of the trends and patterns observed in real, deepfake and synthetic facial images. The proposed analysis is done in two parts: firstly, we incorporate eight deep learning models and analyze their performances in distinguishing between the three classes of images. Next, we look to further delve into the similarities and differences between these three sets of images by investigating their image properties both in the context of the entire image as well as in the context of specific regions within the image. An ANOVA test was also performed and provided further clarity amongst the patterns associated between the images of the three classes. From our findings, we observe that the investigated deep learning models found it easier to detect synthetic facial images, with the ViT Patch-16 model performing best on this task with a class-averaged sensitivity, specificity, precision, and accuracy of 97.37%, 98.69%, 97.48%, and 98.25%, respectively. This observation was supported by further analysis of various image properties. We saw noticeable differences across the three categories of images. This analysis can help us build better algorithms for facial image generation, and also shows that synthetic, deepfake and real face images are indeed three different classes.

Submitted 2 April, 2024; originally announced April 2024.

arXiv:2404.01438 [pdf]

Generation and Detection of Sign Language Deepfakes -- A Linguistic and Visual Analysis

Authors: Shahzeb Naeem, Muhammad Riyyan Khan, Usman Tariq, Abhinav Dhall, Carlos Ivan Colon, Hasan Al-Nashash

Abstract: A question in the realm of deepfakes is slowly emerging pertaining to whether we can go beyond facial deepfakes and whether it would be beneficial to society. Therefore, this research presents a positive application of deepfake technology in upper body generation, while performing sign language for the Deaf and Hard of Hearing (DHoH) community. The resulting videos are later vetted with a sign language expert. This is particularly helpful, given the intricate nature of sign language, a scarcity of sign language experts, and potential benefits for health and education. The objectives of this work encompass constructing a reliable deepfake dataset, evaluating its technical and visual credibility through computer vision and natural language processing models, and assessing the plausibility of the generated content. With over 1200 videos, featuring both previously seen and unseen individuals for the generation model, and with the help of a sign language expert, we establish a deepfake dataset in sign language that can further be utilized to detect fake videos that may target certain people of determination.

Submitted 1 April, 2024; originally announced April 2024.

Comments: 13 pages, 13 figures, Computer Vision and Image Understanding Journal

arXiv:2403.06350 [pdf, other]

IndicLLMSuite: A Blueprint for Creating Pre-training and Fine-Tuning Datasets for Indian Languages

Authors: Mohammed Safi Ur Rahman Khan, Priyam Mehta, Ananth Sankar, Umashankar Kumaravelan, Sumanth Doddapaneni, Suriyaprasaad G, Varun Balan G, Sparsh Jain, Anoop Kunchukuttan, Pratyush Kumar, Raj Dabre, Mitesh M. Khapra

Abstract: Despite the considerable advancements in English LLMs, the progress in building comparable models for other languages has been hindered due to the scarcity of tailored resources. Our work aims to bridge this divide by introducing an expansive suite of resources specifically designed for the development of Indic LLMs, covering 22 languages, containing a total of 251B tokens and 74.8M instruction-response pairs. Recognizing the importance of both data quality and quantity, our approach combines highly curated manually verified data, unverified yet valuable data, and synthetic data. We build a clean, open-source pipeline for curating pre-training data from diverse sources, including websites, PDFs, and videos, incorporating best practices for crawling, cleaning, flagging, and deduplication. For instruction fine-tuning, we amalgamate existing Indic datasets, translate/transliterate English datasets into Indian languages, and utilize LLaMa2 and Mixtral models to create conversations grounded in articles from Indian Wikipedia and Wikihow. Additionally, we address toxicity alignment by generating toxic prompts for multiple scenarios and then generating non-toxic responses by feeding these toxic prompts to an aligned LLaMa2 model. We hope that the datasets, tools, and resources released as a part of this work will not only propel the research and development of Indic LLMs but also establish an open-source blueprint for extending such efforts to other languages. The data and other artifacts created as part of this work are released with permissive licenses.

Submitted 10 March, 2024; originally announced March 2024.

arXiv:2402.09573 [pdf, other]

Changes by Butterflies: Farsighted Forecasting with Group Reservoir Transformer

Authors: Md Kowsher, Abdul Rafae Khan, Jia Xu

Abstract: In Chaos, a minor divergence between two initial conditions exhibits exponential amplification over time, leading to far-away outcomes, known as the butterfly effect. Thus, the distant future is full of uncertainty and hard to forecast. We introduce Group Reservoir Transformer to predict long-term events more accurately and robustly by overcoming two challenges in Chaos: (1) the extensive historical sequences and (2) the sensitivity to initial conditions. A reservoir is attached to a Transformer to efficiently handle arbitrarily long historical lengths, with an extension of a group of reservoirs to reduce the sensitivity to the initialization variations. Our architecture consistently outperforms state-of-the-art models in multivariate time series, including TimeLLM, GPT2TS, PatchTST, DLinear, TimeNet, and the baseline Transformer, with an error reduction of up to -59% in various fields such as ETTh, ETTm, and air quality, demonstrating that an ensemble of butterfly learning can improve the adequacy and certainty of event prediction, despite the traveling time to the unknown future.

Submitted 13 June, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

arXiv:2401.15006 [pdf, other]

Airavata: Introducing Hindi Instruction-tuned LLM

Authors: Jay Gala, Thanmay Jayakumar, Jaavid Aktar Husain, Aswanth Kumar M, Mohammed Safi Ur Rahman Khan, Diptesh Kanojia, Ratish Puduppully, Mitesh M. Khapra, Raj Dabre, Rudra Murthy, Anoop Kunchukuttan

Abstract: We announce the initial release of "Airavata," an instruction-tuned LLM for Hindi. Airavata was created by fine-tuning OpenHathi with diverse, instruction-tuning Hindi datasets to make it better suited for assistive tasks. Along with the model, we also share the IndicInstruct dataset, which is a collection of diverse instruction-tuning datasets to enable further research for Indic LLMs. Additionally, we present evaluation benchmarks and a framework for assessing LLM performance across tasks in Hindi. Currently, Airavata supports Hindi, but we plan to expand this to all 22 scheduled Indic languages. You can access all artifacts at https://ai4bharat.github.io/airavata.

Submitted 26 February, 2024; v1 submitted 26 January, 2024; originally announced January 2024.

Comments: Work in progress

arXiv:2312.15058 [pdf, other]

doi 10.1109/MS.2024.3366111

The State of Documentation Practices of Third-party Machine Learning Models and Datasets

Authors: Ernesto Lang Oreamuno, Rohan Faiyaz Khan, Abdul Ali Bangash, Catherine Stinson, Bram Adams

Abstract: Model stores offer third-party ML models and datasets for easy project integration, minimizing coding efforts. One might hope to find detailed specifications of these models and datasets in the documentation, leveraging documentation standards such as model and dataset cards. In this study, we use statistical analysis and hybrid card sorting to assess the state of the practice of documenting model cards and dataset cards in one of the largest model stores in use today--Hugging Face (HF). Our findings show that only 21,902 models (39.62%) and 1,925 datasets (28.48%) have documentation. Furthermore, we observe inconsistency in ethics and transparency-related documentation for ML models and datasets.

Submitted 22 December, 2023; originally announced December 2023.

Comments: 7 pages, 4 figures, IEEESoftware format

Journal ref: IEEE Software 2024

arXiv:2312.13041 [pdf, other]

Advancing SQL Injection Detection for High-Speed Data Centers: A Novel Approach Using Cascaded NLP

Authors: Kasim Tasdemir, Rafiullah Khan, Fahad Siddiqui, Sakir Sezer, Fatih Kurugollu, Sena Busra Yengec-Tasdemir, Alperen Bolat

Abstract: Detecting SQL Injection (SQLi) attacks is crucial for web-based data center security, but it is challenging to balance accuracy and computational efficiency, especially in high-speed networks. Traditional methods struggle with this balance, while NLP-based approaches, although accurate, are computationally intensive. We introduce a novel cascade SQLi detection method, blending classical and transformer-based NLP models, achieving a 99.86% detection accuracy with significantly lower computational demands (20 times faster than using transformer-based models alone). Our approach is tested in a realistic setting and compared with 35 other methods, including Machine Learning-based and transformer models like BERT, on a dataset of over 30,000 SQL sentences. Our results show that this hybrid method effectively detects SQLi in high-traffic environments, offering efficient and accurate protection against SQLi vulnerabilities. The code is available at https://github.com/gdrlab/cascaded-sqli-detection .

Submitted 20 December, 2023; originally announced December 2023.

Comments: 11 pages. The code is available at https://github.com/gdrlab/cascaded-sqli-detection . This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.

arXiv:2312.00634 [pdf]

A Recent Survey of Vision Transformers for Medical Image Segmentation

Authors: Asifullah Khan, Zunaira Rauf, Abdul Rehman Khan, Saima Rathore, Saddam Hussain Khan, Najmus Saher Shah, Umair Farooq, Hifsa Asif, Aqsa Asif, Umme Zahoora, Rafi Ullah Khalil, Suleman Qamar, Umme Hani Asif, Faiza Babar Khan, Abdul Majid, Jeonghwan Gwak

Abstract: Medical image segmentation plays a crucial role in various healthcare applications, enabling accurate diagnosis, treatment planning, and disease monitoring. Traditionally, convolutional neural networks (CNNs) dominated this domain, excelling at local feature extraction. However, their limitations in capturing long-range dependencies across image regions pose challenges for segmenting complex, interconnected structures often encountered in medical data. In recent years, Vision Transformers (ViTs) have emerged as a promising technique for addressing the challenges in medical image segmentation. Their multi-scale attention mechanism enables effective modeling of long-range dependencies between distant structures, crucial for segmenting organs or lesions spanning the image. Additionally, ViTs' ability to discern subtle pattern heterogeneity allows for the precise delineation of intricate boundaries and edges, a critical aspect of accurate medical image segmentation. However, they do lack image-related inductive bias and translational invariance, potentially impacting their performance. Recently, researchers have come up with various ViT-based approaches that incorporate CNNs in their architectures, known as Hybrid Vision Transformers (HVTs) to capture local correlation in addition to the global information in the images. This survey paper provides a detailed review of the recent advancements in ViTs and HVTs for medical image segmentation. Along with the categorization of ViT and HVT-based medical image segmentation approaches, we also present a detailed overview of their real-time applications in several medical image modalities. This survey may serve as a valuable resource for researchers, healthcare practitioners, and students in understanding the state-of-the-art approaches for ViT-based medical image segmentation.

Submitted 18 December, 2023; v1 submitted 1 December, 2023; originally announced December 2023.

arXiv:2310.17729 [pdf]

Improving Traffic Density Forecasting in Intelligent Transportation Systems Using Gated Graph Neural Networks

Authors: Razib Hayat Khan, Jonayet Miah, S M Yasir Arafat, M M Mahbubul Syeed, Duc M Ca

Abstract: This study delves into the application of graph neural networks in the realm of traffic forecasting, a crucial facet of intelligent transportation systems. Accurate traffic predictions are vital for functions like trip planning, traffic control, and vehicle routing in such systems. Three prominent GNN architectures, Graph Convolutional Networks (GCNs), GraphSAGE (Graph Sample and Aggregation), and Gated Graph Neural Networks (GGNNs), are explored within the context of traffic prediction. Each architecture's methodology is thoroughly examined, including layer configurations, activation functions, and hyperparameters. The primary goal is to minimize prediction errors, with GGNNs emerging as the most effective choice among the three models. The research outlines outcomes for each architecture, elucidating their predictive performance through root mean squared error (RMSE) and mean absolute error (MAE). Hypothetical results reveal intriguing insights: GCNs display an RMSE of 9.10 and an MAE of 8.00, while GraphSAGE shows improvement with an RMSE of 8.3 and an MAE of 7.5. Gated Graph Neural Networks (GGNNs) exhibit the lowest RMSE at 9.15 and an impressive MAE of 7.1, positioning them as the frontrunner.

Submitted 26 October, 2023; originally announced October 2023.

arXiv:2310.07252 [pdf]

A Comparative Study of Pre-trained CNNs and GRU-Based Attention for Image Caption Generation

Authors: Rashid Khan, Bingding Huang, Haseeb Hassan, Asim Zaman, Zhongfu Ye

Abstract: Image captioning is a challenging task involving generating a textual description for an image using computer vision and natural language processing techniques. This paper proposes a deep neural framework for image caption generation using a GRU-based attention mechanism. Our approach employs multiple pre-trained convolutional neural networks as the encoder to extract features from the image and a GRU-based language model as the decoder to generate descriptive sentences. To improve performance, we integrate the Bahdanau attention model with the GRU decoder to enable learning to focus on specific image parts. We evaluate our approach using the MSCOCO and Flickr30k datasets and show that it achieves competitive scores compared to state-of-the-art methods. Our proposed framework can bridge the gap between computer vision and natural language and can be extended to specific domains.

Submitted 11 October, 2023; originally announced October 2023.

Comments: 15 pages, 10 figures, 5 tables. 2023 the 5th International Conference on Robotics and Computer Vision (ICRCV 2023). arXiv admin note: substantial text overlap with arXiv:2203.01594

arXiv:2309.00064 [pdf, other]

doi 10.1109/ACCESS.2024.3369912

Ethical Framework for Harnessing the Power of AI in Healthcare and Beyond

Authors: Sidra Nasir, Rizwan Ahmed Khan, Samita Bai

Abstract: In the past decade, the deployment of deep learning (Artificial Intelligence (AI)) methods has become pervasive across a spectrum of real-world applications, often in safety-critical contexts. This comprehensive research article rigorously investigates the ethical dimensions intricately linked to the rapid evolution of AI technologies, with a particular focus on the healthcare domain. Delving deeply, it explores a multitude of facets including transparency, adept data management, human oversight, educational imperatives, and international collaboration within the realm of AI advancement. Central to this article is the proposition of a conscientious AI framework, meticulously crafted to accentuate values of transparency, equity, answerability, and a human-centric orientation. The second contribution of the article is the in-depth and thorough discussion of the limitations inherent to AI systems. It astutely identifies potential biases and the intricate challenges of navigating multifaceted contexts. Lastly, the article unequivocally accentuates the pressing need for globally standardized AI ethics principles and frameworks. Simultaneously, it aptly illustrates the adaptability of the ethical framework proposed herein, positioned skillfully to surmount emergent challenges.

Submitted 31 August, 2023; originally announced September 2023.

Journal ref: IEEE Access 2024

arXiv:2308.16571 [pdf, ps, other]

Document Layout Analysis on BaDLAD Dataset: A Comprehensive MViTv2 Based Approach

Authors: Ashrafur Rahman Khan, Asif Azad

Abstract: In the rapidly evolving digital era, the analysis of document layouts plays a pivotal role in automated information extraction and interpretation. In our work, we have trained MViTv2 transformer model architecture with cascaded mask R-CNN on BaDLAD dataset to extract text box, paragraphs, images and tables from a document. After training on 20365 document images for 36 epochs in a 3 phase cycle, we achieved a training loss of 0.2125 and a mask loss of 0.19. Our work extends beyond training, delving into the exploration of potential enhancement avenues. We investigate the impact of rotation and flip augmentation, the effectiveness of slicing input images pre-inference, the implications of varying the resolution of the transformer backbone, and the potential of employing a dual-pass inference to uncover missed text-boxes. Through these explorations, we observe a spectrum of outcomes, where some modifications result in tangible performance improvements, while others offer unique insights for future endeavors.

Submitted 31 August, 2023; originally announced August 2023.

arXiv:2308.01760 [pdf, other]

NuInsSeg: A Fully Annotated Dataset for Nuclei Instance Segmentation in H&E-Stained Histological Images

Authors: Amirreza Mahbod, Christine Polak, Katharina Feldmann, Rumsha Khan, Katharina Gelles, Georg Dorffner, Ramona Woitek, Sepideh Hatamikia, Isabella Ellinger

Abstract: In computational pathology, automatic nuclei instance segmentation plays an essential role in whole slide image analysis. While many computerized approaches have been proposed for this task, supervised deep learning (DL) methods have shown superior segmentation performances compared to classical machine learning and image processing techniques. However, these models need fully annotated datasets for training which is challenging to acquire, especially in the medical domain. In this work, we release one of the biggest fully manually annotated datasets of nuclei in Hematoxylin and Eosin (H&E)-stained histological images, called NuInsSeg. This dataset contains 665 image patches with more than 30,000 manually segmented nuclei from 31 human and mouse organs. Moreover, for the first time, we provide additional ambiguous area masks for the entire dataset. These vague areas represent the parts of the images where precise and deterministic manual annotations are impossible, even for human experts. The dataset and detailed step-by-step instructions to generate related segmentation masks are publicly available at https://www.kaggle.com/datasets/ipateam/nuinsseg and https://github.com/masih4/NuInsSeg, respectively.

Submitted 3 August, 2023; originally announced August 2023.

Comments: 7 pages, 1 Figure

arXiv:2307.06824 [pdf]

CLAIMED -- the open source framework for building coarse-grained operators for accelerated discovery in science

Authors: Romeo Kienzler, Rafflesia Khan, Jerome Nilmeier, Ivan Nesic, Ibrahim Haddad

Abstract: In modern data-driven science, reproducibility and reusability are key challenges. Scientists are well skilled in the process from data to publication. Although some publication channels require source code and data to be made accessible, rerunning and verifying experiments is usually hard due to a lack of standards. Therefore, reusing existing scientific data processing code from state-of-the-art research is hard as well. This is why we introduce CLAIMED, which has a proven track record in scientific research for addressing the repeatability and reusability issues in modern data-driven science. CLAIMED is a framework to build reusable operators and scalable scientific workflows by supporting the scientist to draw from previous work by re-composing workflows from existing libraries of coarse-grained scientific operators. Although various implementations exist, CLAIMED is programming language, scientific library, and execution environment agnostic.

Submitted 12 July, 2023; originally announced July 2023.

Comments: Received IEEE OSS Award 2023 - https://conferences.computer.org/services/2023/symposia/oss.html

arXiv:2307.04479 [pdf, other]

A Linear Time Quantum Algorithm for Pairwise Sequence Alignment

Authors: Md. Rabiul Islam Khan, Shadman Shahriar, Shaikh Farhan Rafid

Abstract: Sequence Alignment is the process of aligning biological sequences in order to identify similarities between multiple sequences. In this paper, a Quantum Algorithm for finding the optimal alignment between DNA sequences has been demonstrated which works by mapping the sequence alignment problem into a path-searching problem through a 2D graph. The transition, which converges to a fixed path on the graph, is based on a proposed oracle for profit calculation. By implementing Grover's search algorithm, our proposed approach is able to align a pair of sequences and figure out the optimal alignment within linear time, which hasn't been attained by any classical deterministic algorithm. In addition to that, the proposed algorithm is capable of quadratic speeding up to any unstructured search problem by finding out the optimal paths accurately in a deterministic manner, in contrast to existing randomized algorithms that frequently sort out the sub-optimal alignments, therefore, don't always guarantee of finding out the optimal solutions.

Submitted 10 July, 2023; originally announced July 2023.

arXiv:2305.08396 [pdf, other]

MaxViT-UNet: Multi-Axis Attention for Medical Image Segmentation

Authors: Abdul Rehman Khan, Asifullah Khan

Abstract: Since their emergence, Convolutional Neural Networks (CNNs) have made significant strides in medical image analysis. However, the local nature of the convolution operator may pose a limitation for capturing global and long-range interactions in CNNs. Recently, Transformers have gained popularity in the computer vision community and also in medical image segmentation due to their ability to process global features effectively. The scalability issues of the self-attention mechanism and lack of the CNN-like inductive bias may have limited their adoption. Therefore, hybrid Vision transformers (CNN-Transformer), exploiting the advantages of both Convolution and Self-attention Mechanisms, have gained importance. In this work, we present MaxViT-UNet, a new Encoder-Decoder based UNet type hybrid vision transformer (CNN-Transformer) for medical image segmentation. The proposed Hybrid Decoder is designed to harness the power of both the convolution and self-attention mechanisms at each decoding stage with a nominal memory and computational burden. The inclusion of multi-axis self-attention, within each decoder stage, significantly enhances the discriminating capacity between the object and background regions, thereby helping in improving the segmentation efficiency. In the Hybrid Decoder, a new block is also proposed. The fusion process commences by integrating the upsampled lower-level decoder features, obtained through transpose convolution, with the skip-connection features derived from the hybrid encoder. Subsequently, the fused features undergo refinement through the utilization of a multi-axis attention mechanism. The proposed decoder block is repeated multiple times to segment the nuclei regions progressively. Experimental results on MoNuSeg18 and MoNuSAC20 datasets demonstrate the effectiveness of the proposed technique.

Submitted 29 March, 2024; v1 submitted 15 May, 2023; originally announced May 2023.

Comments: 19 pages, 6 figures, 5 tables

arXiv:2302.03232 [pdf, other]

Linear Optimal Partial Transport Embedding

Authors: Yikun Bai, Ivan Medri, Rocio Diaz Martin, Rana Muhammad Shahroz Khan, Soheil Kolouri

Abstract: Optimal transport (OT) has gained popularity due to its various applications in fields such as machine learning, statistics, and signal processing. However, the balanced mass requirement limits its performance in practical problems. To address these limitations, variants of the OT problem, including unbalanced OT, Optimal partial transport (OPT), and Hellinger Kantorovich (HK), have been proposed. In this paper, we propose the Linear optimal partial transport (LOPT) embedding, which extends the (local) linearization technique on OT and HK to the OPT problem. The proposed embedding allows for faster computation of OPT distance between pairs of positive measures. Besides our theoretical contributions, we demonstrate the LOPT embedding technique in point-cloud interpolation and PCA analysis.

Submitted 23 April, 2024; v1 submitted 6 February, 2023; originally announced February 2023.

arXiv:2301.08479 [pdf, other]

Pneumonia Detection in Chest X-Ray Images : Handling Class Imbalance

Authors: Wardah Ali, Eesha Qureshi, Omama Ahmed Farooqi, Rizwan Ahmed Khan

Abstract: People all over the globe are affected by pneumonia but deaths due to it are highest in Sub-Saharan Asia and South Asia. In recent years, the overall incidence and mortality rate of pneumonia regardless of the utilization of effective vaccines and compelling antibiotics has escalated. Thus, pneumonia remains a disease that needs spry prevention and treatment. The widespread prevalence of pneumonia has caused the research community to come up with a framework that helps detect, diagnose and analyze diseases accurately and promptly. One of the major hurdles faced by the Artificial Intelligence (AI) research community is the lack of publicly available datasets for chest diseases, including pneumonia. Secondly, few of the available datasets are highly imbalanced (normal examples are over sampled, while samples with ailment are in severe minority) making the problem even more challenging. In this article we present a novel framework for the detection of pneumonia. The novelty of the proposed methodology lies in the tackling of class imbalance problem. The Generative Adversarial Network (GAN), specifically a combination of Deep Convolutional Generative Adversarial Network (DCGAN) and Wasserstein GAN gradient penalty (WGAN-GP) was applied on the minority class "Pneumonia" for augmentation, whereas Random Under-Sampling (RUS) was done on the majority class "No Findings" to deal with the imbalance problem. The ChestX-Ray8 dataset, one of the biggest datasets, is used to validate the performance of the proposed framework. The learning phase is completed using transfer learning on state-of-the-art deep learning models i.e. ResNet-50, Xception, and VGG-16. Results obtained exceed state-of-the-art.

Submitted 20 January, 2023; originally announced January 2023.

arXiv:2212.02230 [pdf]

doi 10.1109/AIBT53261.2021.00015

A Hybrid Evolutionary Approach to Solve University Course Allocation Problem

Authors: Dibyo Fabian Dofadar, Riyo Hayat Khan, Shafqat Hasan, Towshik Anam Taj, Arif Shakil, Mahbub Majumdar

Abstract: This paper discusses various types of constraints, difficulties and solutions to overcome the challenges regarding university course allocation problem. A hybrid evolutionary algorithm has been defined combining Local Repair Algorithm and Modified Genetic Algorithm to generate the best course assignment. After analyzing the collected dataset, all the necessary constraints were formulated. These constraints manage to cover the aspects needed to be kept in mind while preparing clash free and efficient class schedules for every faculty member. The goal is to generate an optimized solution which will fulfill those constraints while maintaining time efficiency and also reduce the workload of handling this task manually. The proposed algorithm was compared with some base level optimization algorithms to show the better efficiency in terms of accuracy and time.

Submitted 24 July, 2023; v1 submitted 15 November, 2022; originally announced December 2022.

arXiv:2211.06642 [pdf, other]

ConceptX: A Framework for Latent Concept Analysis

Authors: Firoj Alam, Fahim Dalvi, Nadir Durrani, Hassan Sajjad, Abdul Rafae Khan, Jia Xu

Abstract: The opacity of deep neural networks remains a challenge in deploying solutions where explanation is as important as precision. We present ConceptX, a human-in-the-loop framework for interpreting and annotating latent representational space in pre-trained Language Models (pLMs). We use an unsupervised method to discover concepts learned in these models and enable a graphical interface for humans to generate explanations for the concepts. To facilitate the process, we provide auto-annotations of the concepts (based on traditional linguistic ontologies). Such annotations enable development of a linguistic resource that directly represents latent concepts learned within deep NLP models. These include not just traditional linguistic concepts, but also task-specific or sensitive concepts (words grouped based on gender or religious connotation) that helps the annotators to mark bias in the model. The framework consists of two parts (i) concept discovery and (ii) annotation platform.

Submitted 12 November, 2022; originally announced November 2022.

Comments: AAAI 23

arXiv:2210.11670 [pdf, ps, other]

SIT at MixMT 2022: Fluent Translation Built on Giant Pre-trained Models

Authors: Abdul Rafae Khan, Hrishikesh Kanade, Girish Amar Budhrani, Preet Jhanglani, Jia Xu

Abstract: This paper describes the Stevens Institute of Technology's submission for the WMT 2022 Shared Task: Code-mixed Machine Translation (MixMT). The task consisted of two subtasks, subtask $1$ Hindi/English to Hinglish and subtask $2$ Hinglish to English translation. Our findings lie in the improvements made through the use of large pre-trained multilingual NMT models and in-domain datasets, as well as back-translation and ensemble techniques. The translation output is automatically evaluated against the reference translations using ROUGE-L and WER. Our system achieves the $1^{st}$ position on subtask $2$ according to ROUGE-L, WER, and human evaluation, $1^{st}$ position on subtask $1$ according to WER and human evaluation, and $3^{rd}$ position on subtask $1$ with respect to ROUGE-L metric.

Submitted 16 November, 2022; v1 submitted 20 October, 2022; originally announced October 2022.

arXiv:2208.01355 [pdf, other]

doi 10.1109/ISIEA54517.2022.9873797

A Comparative Study on COVID-19 Fake News Detection Using Different Transformer Based Models

Authors: Sajib Kumar Saha Joy, Dibyo Fabian Dofadar, Riyo Hayat Khan, Md. Sabbir Ahmed, Rafeed Rahman

Abstract: The rapid advancement of social networks and the convenience of internet availability have accelerated the rampant spread of false news and rumors on social media sites. Amid the COVID 19 epidemic, this misleading information has aggravated the situation by putting peoples mental and physical lives in danger. To limit the spread of such inaccuracies, identifying the fake news from online platforms could be the first and foremost step. In this research, the authors have conducted a comparative analysis by implementing five transformer based models such as BERT, BERT without LSTM, ALBERT, RoBERTa, and a Hybrid of BERT & ALBERT in order to detect the fraudulent news of COVID 19 from the internet. COVID 19 Fake News Dataset has been used for training and testing the models. Among all these models, the RoBERTa model has performed better than other models by obtaining an F1 score of 0.98 in both real and fake classes.

Submitted 2 August, 2022; originally announced August 2022.

arXiv:2206.13289 [pdf, other]

Analyzing Encoded Concepts in Transformer Language Models

Authors: Hassan Sajjad, Nadir Durrani, Fahim Dalvi, Firoj Alam, Abdul Rafae Khan, Jia Xu

Abstract: We propose a novel framework ConceptX, to analyze how latent concepts are encoded in representations learned within pre-trained language models. It uses clustering to discover the encoded concepts and explains them by aligning with a large set of human-defined concepts. Our analysis on seven transformer language models reveal interesting insights: i) the latent space within the learned representations overlap with different linguistic concepts to a varying degree, ii) the lower layers in the model are dominated by lexical concepts (e.g., affixation), whereas the core-linguistic concepts (e.g., morphological or syntactic relations) are better represented in the middle and higher layers, iii) some encoded concepts are multi-faceted and cannot be adequately explained using the existing human-defined concepts.

Submitted 27 June, 2022; originally announced June 2022.

Comments: 20 pages, 10 figures

Journal ref: 2022 Annual Conference of the North American Chapter of the Association for Computational Linguistics

arXiv:2206.12815 [pdf, other]

doi 10.1016/j.bspc.2023.105353

Breast Cancer Classification using Deep Learned Features Boosted with Handcrafted Features

Authors: Unaiza Sajid, Rizwan Ahmed Khan, Shahid Munir Shah, Sheeraz Arif

Abstract: Breast cancer is one of the leading causes of death among women across the globe. It is difficult to treat if detected at advanced stages, however, early detection can significantly increase chances of survival and improves lives of millions of women. Given the widespread prevalence of breast cancer, it is of utmost importance for the research community to come up with the framework for early detection, classification and diagnosis. Artificial intelligence research community in coordination with medical practitioners are developing such frameworks to automate the task of detection. With the surge in research activities coupled with availability of large datasets and enhanced computational powers, it expected that AI framework results will help even more clinicians in making correct predictions. In this article, a novel framework for classification of breast cancer using mammograms is proposed. The proposed framework combines robust features extracted from novel Convolutional Neural Network (CNN) features with handcrafted features including HOG (Histogram of Oriented Gradients) and LBP (Local Binary Pattern). The obtained results on CBIS-DDSM dataset exceed state of the art.

Submitted 16 January, 2023; v1 submitted 26 June, 2022; originally announced June 2022.

Journal ref: Biomedical Signal Processing and Control 2023

arXiv:2206.08464 [pdf, other]

PRANC: Pseudo RAndom Networks for Compacting deep models

Authors: Parsa Nooralinejad, Ali Abbasi, Soroush Abbasi Koohpayegani, Kossar Pourahmadi Meibodi, Rana Muhammad Shahroz Khan, Soheil Kolouri, Hamed Pirsiavash

Abstract: We demonstrate that a deep model can be reparametrized as a linear combination of several randomly initialized and frozen deep models in the weight space. During training, we seek local minima that reside within the subspace spanned by these random models (i.e., 'basis' networks). Our framework, PRANC, enables significant compaction of a deep model. The model can be reconstructed using a single scalar 'seed,' employed to generate the pseudo-random 'basis' networks, together with the learned linear mixture coefficients. In practical applications, PRANC addresses the challenge of efficiently storing and communicating deep models, a common bottleneck in several scenarios, including multi-agent learning, continual learners, federated systems, and edge devices, among others. In this study, we employ PRANC to condense image classification models and compress images by compacting their associated implicit neural networks. PRANC outperforms baselines with a large margin on image classification when compressing a deep model almost $100$ times. Moreover, we show that PRANC enables memory-efficient inference by generating layer-wise weights on the fly. The source code of PRANC is here: https://github.com/UCDvision/PRANC

Submitted 28 August, 2023; v1 submitted 16 June, 2022; originally announced June 2022.

arXiv:2206.06925 [pdf]

Towards a secured smart IoT using light weight blockchain: An aim to secure Pharmacy Products

Authors: Md. Faruk Abdullah Al Sohan, Samiur Rahman Khan, Nusrat Jahan Anannya, Md Taimur Ahad

Abstract: Blockchain has proven a very developed and secured technology. It ensures data integrity with authentic connected nodes. Now-a-days, blockchain with IoT is a great combination for secured and smart end to end product delivery. This observation has motivated the research to develop a conceptual model to provide a secure pharmaceutical product delivery by developing a IoT integrated with lightweight blockchain. The undeveloped and most of the developing countries are facing problems such as drug counterfeits, shortages, opiates and tracking them became difficult because of less transparency. Also, nature sensitive medicines need to be stored under controlled temperature known as cold-chain shipping. The storage of these information in the recent software is done in the centralized databases that is prone to data manipulations and hacks. Due to less production drugs needed to be imported with maintaining drug supply chain regulations by law. This paper proposes a lightweight blockchain model for pharmaceutical industries by using IoT. This model ensures traceability of drugs within a very simple way which is less complex compared to the existing ones.

Submitted 7 June, 2022; originally announced June 2022.

Comments: 9 pages, 3 figures

arXiv:2205.07237 [pdf, other]

Discovering Latent Concepts Learned in BERT

Authors: Fahim Dalvi, Abdul Rafae Khan, Firoj Alam, Nadir Durrani, Jia Xu, Hassan Sajjad

Abstract: A large number of studies that analyze deep neural network models and their ability to encode various linguistic and non-linguistic concepts provide an interpretation of the inner mechanics of these models. The scope of the analyses is limited to pre-defined concepts that reinforce the traditional linguistic knowledge and do not reflect on how novel concepts are learned by the model. We address this limitation by discovering and analyzing latent concepts learned in neural network models in an unsupervised fashion and provide interpretations from the model's perspective. In this work, we study: i) what latent concepts exist in the pre-trained BERT model, ii) how the discovered latent concepts align or diverge from classical linguistic hierarchy and iii) how the latent concepts evolve across layers. Our findings show: i) a model learns novel concepts (e.g. animal categories and demographic groups), which do not strictly adhere to any pre-defined categorization (e.g. POS, semantic tags), ii) several latent concepts are based on multiple properties which may include semantics, syntax, and morphology, iii) the lower layers in the model dominate in learning shallow lexical concepts while the higher layers learn semantic relations and iv) the discovered latent concepts highlight potential biases learned in the model. We also release a novel BERT ConceptNet dataset (BCN) consisting of 174 concept labels and 1M annotated instances.

Submitted 15 May, 2022; originally announced May 2022.

Comments: ICLR 2022

arXiv:2204.01205 [pdf, other]

doi 10.1016/j.cageo.2023.105402

Model-Parallel Fourier Neural Operators as Learned Surrogates for Large-Scale Parametric PDEs

Authors: Thomas J. Grady II, Rishi Khan, Mathias Louboutin, Ziyi Yin, Philipp A. Witte, Ranveer Chandra, Russell J. Hewett, Felix J. Herrmann

Abstract: Fourier neural operators (FNOs) are a recently introduced neural network architecture for learning solution operators of partial differential equations (PDEs), which have been shown to perform significantly better than comparable deep learning approaches. Once trained, FNOs can achieve speed-ups of multiple orders of magnitude over conventional numerical PDE solvers. However, due to the high dimensionality of their input data and network weights, FNOs have so far only been applied to two-dimensional or small three-dimensional problems. To remove this limited problem-size barrier, we propose a model-parallel version of FNOs based on domain-decomposition of both the input data and network weights. We demonstrate that our model-parallel FNO is able to predict time-varying PDE solutions of over 2.6 billion variables on Perlmutter using up to 512 A100 GPUs and show an example of training a distributed FNO on the Azure cloud for simulating multiphase CO$_2$ dynamics in the Earth's subsurface.

Submitted 1 February, 2023; v1 submitted 3 April, 2022; originally announced April 2022.

arXiv:2203.06721 [pdf]

Food Recipe Recommendation Based on Ingredients Detection Using Deep Learning

Authors: Md. Shafaat Jamil Rokon, Md Kishor Morol, Ishra Binte Hasan, A. M. Saif, Rafid Hussain Khan

Abstract: Food is essential for human survival, and people always try to taste different types of delicious recipes. Frequently, people choose food ingredients without even knowing their names or pick up some food ingredients that are not obvious to them from a grocery store. Knowing which ingredients can be mixed to make a delicious food recipe is essential. Selecting the right recipe by choosing a list of ingredients is very difficult for a beginner cook. However, it can be a problem even for experts. One such example is recognising objects through image processing. Because this process is complex due to the variety of food ingredients, traditional approaches lead to inaccurate results. These problems can be solved by machine learning and deep learning approaches. In this paper, we implemented a model for food ingredient recognition and designed an algorithm for recommending recipes based on the recognised ingredients. We made a custom dataset consisting of 9856 images belonging to 32 different food ingredient classes. A Convolutional Neural Network (CNN) model was used to identify food ingredients, and for recipe recommendations, we used machine learning. We achieved an accuracy of 94 percent, which is quite impressive.

Submitted 13 March, 2022; originally announced March 2022.

Comments: Accepted at ICCA 2022

arXiv:2203.03022 [pdf, ps, other]

HEAR: Holistic Evaluation of Audio Representations

Authors: Joseph Turian, Jordie Shier, Humair Raj Khan, Bhiksha Raj, Björn W. Schuller, Christian J. Steinmetz, Colin Malloy, George Tzanetakis, Gissel Velarde, Kirk McNally, Max Henry, Nicolas Pinto, Camille Noufi, Christian Clough, Dorien Herremans, Eduardo Fonseca, Jesse Engel, Justin Salamon, Philippe Esling, Pranay Manocha, Shinji Watanabe, Zeyu Jin, Yonatan Bisk

Abstract: What audio embedding approach generalizes best to a wide range of downstream tasks across a variety of everyday domains without fine-tuning? The aim of the HEAR benchmark is to develop a general-purpose audio representation that provides a strong basis for learning in a wide variety of tasks and scenarios. HEAR evaluates audio representations using a benchmark suite across a variety of domains, including speech, environmental sound, and music. HEAR was launched as a NeurIPS 2021 shared challenge. In the spirit of shared exchange, each participant submitted an audio embedding model following a common API that is general-purpose, open-source, and freely available to use. Twenty-nine models by thirteen external teams were evaluated on nineteen diverse downstream tasks derived from sixteen datasets. Open evaluation code, submitted models and datasets are key contributions, enabling comprehensive and reproducible evaluation, as well as previously impossible longitudinal studies. It still remains an open question whether one single general-purpose audio representation can perform as holistically as the human ear.

Submitted 29 May, 2022; v1 submitted 6 March, 2022; originally announced March 2022.

Comments: to appear in Proceedings of Machine Learning Research (PMLR): NeurIPS 2021 Competition Track

arXiv:2203.02791 [pdf, ps, other]

Deep Q-Learning Based Resource Allocation in Interference Systems With Outage Constraint

Authors: Saniul Alam, Sadia Islam, Muhammad R. A. Khandaker, Risala T. Khan, Faisal Tariq, Apriana Toding

Abstract: This correspondence considers the resource allocation problem in a wireless interference channel (IC) under link outage constraints. Since the optimization problem is non-convex in nature, existing approaches to find the optimal power allocation are computationally intensive and thus practically infeasible. Recently, deep reinforcement learning has shown promising results in solving non-convex optimization problems with reduced complexity. In this correspondence, we utilize a deep Q-learning (DQL) approach which interacts with the wireless environment and learns the optimal power allocation of a wireless IC while maximizing the overall sum-rate of the system and maintaining the reliability requirement of each link. We have used two separate deep Q-networks to remove the inherent instability in the learning process. Simulation results demonstrate that the proposed DQL approach outperforms the existing geometric programming based solution.

Submitted 5 March, 2022; originally announced March 2022.

Comments: Submitted to IEEE TVT

arXiv:2203.01594 [pdf]

A Deep Neural Framework for Image Caption Generation Using GRU-Based Attention Mechanism

Authors: Rashid Khan, M Shujah Islam, Khadija Kanwal, Mansoor Iqbal, Md. Imran Hossain, Zhongfu Ye

Abstract: Image captioning is a fast-growing research field of computer vision and natural language processing that involves creating text explanations for images. This study aims to develop a system that uses a pre-trained convolutional neural network (CNN) to extract features from an image, integrates the features with an attention mechanism, and creates captions using a recurrent neural network (RNN). To encode an image into a feature vector as graphical attributes, we employed multiple pre-trained convolutional neural networks. Following that, a language model known as GRU is chosen as the decoder to construct the descriptive sentence. In order to increase performance, we merge the Bahdanau attention model with GRU to allow learning to be focused on a specific portion of the image. On the MSCOCO dataset, the experimental results achieve competitive performance against state-of-the-art approaches.

Submitted 3 March, 2022; originally announced March 2022.

Comments: 16 PAGES, 8 figures, 1 TABLE

Journal ref: Information Technology and Control 2022

arXiv:2112.13170 [pdf, other]

On the Feasibility of 4.9 GHz Public Safety Band as Spectrum Option for Internet of Vehicles

Authors: Muhammad Faizan Rizwan Khan, Seungmo Kim

Abstract: There is an unprecedented impetus on the advancement of the internet of vehicles (IoV). Vehicle-to-everything (V2X) communication is well acknowledged as the key technology in the constitution of the IoV. Nevertheless, the spectrum for V2X communication is undergoing a massive change in the United States: a majority of the bandwidth has been reallocated to Wi-Fi, leaving less than half of the bandwidth for V2X. This motivates investigation of other candidate spectrum bands for operation of V2X communication as an urgent effort to guarantee efficient operations of IoV. To this end, this paper studies the feasibility of sharing the 4.9 GHz public safety band between the incumbent systems and V2X users.

Submitted 24 December, 2021; originally announced December 2021.

arXiv:2112.08590 [pdf, other]

doi 10.1109/ACCESS.2022.3162851

Federated 3GPP Mobile Edge Computing Systems: A Transparent Proxy for Third Party Authentication with Application Mobility Support

Authors: Asad Ali, Samin Rahman Khan, Sadman Sakib, Md. Shohrab Hossain, Ying-Dar Lin

Abstract: Multi-Access or Mobile Edge Computing (MEC) is being deployed by 4G/5G operators to provide computational services at lower latencies. Federating MECs across operators expands capability, capacity, and coverage but gives rise to two issues - third-party authentication and application mobility - for continuous service during roaming without re-authentication. In this work, we propose a Federated State transfer and 3rd-party Authentication (FS3A) mechanism that uses a transparent proxy to transfer the information of both authentication and application state across operators to resolve these issues. The FS3A proxy is kept transparent, with virtual counterparts, to avoid any changes to the existing MEC and cellular architectures. FS3A provides users with a token, when authenticated by an MEC, which can be reused across operators for faster authentication. Prefetching of subscription and state is also proposed to further reduce the authentication and application mobility latencies. We evaluated FS3A on an OpenAirInterface (OAI)-based testbed and the results show that token reuse and subscription prefetching reduce the authentication latency by 53-65%, compared to complete re-authentication, while state prefetching reduces application mobility latency by 51-91%, compared to no prefetching. Overall, FS3A reduces the service interruption time by 33%, compared to no token reuse and prefetching.

Submitted 15 December, 2021; originally announced December 2021.

Comments: 14 pages. 8 figures. Submitted to IEEE Access

arXiv:2110.15742 [pdf, other]

Barlow Graph Auto-Encoder for Unsupervised Network Embedding

Authors: Rayyan Ahmad Khan, Martin Kleinsteuber

Abstract: Network embedding has emerged as a promising research field for network analysis. Recently, an approach named Barlow Twins has been proposed for self-supervised learning in computer vision by applying the redundancy-reduction principle to the embedding vectors corresponding to two distorted versions of the image samples. Motivated by this, we propose Barlow Graph Auto-Encoder, a simple yet effective architecture for learning network embedding. It aims to maximize the similarity between the embedding vectors of immediate and larger neighborhoods of a node, while minimizing the redundancy between the components of these projections. In addition, we present the variational counterpart, named Barlow Variational Graph Auto-Encoder. Our approach yields promising results for inductive link prediction and is also on par with state of the art for clustering and downstream node classification, as demonstrated by extensive comparisons with several well-known techniques on three benchmark citation datasets.

Submitted 13 December, 2022; v1 submitted 29 October, 2021; originally announced October 2021.

arXiv:2110.00942 [pdf, other]

doi 10.1016/j.compbiomed.2022.105221

Artificial Intelligence For Breast Cancer Detection: Trends & Directions

Authors: Shahid Munir Shah, Rizwan Ahmed Khan, Sheeraz Arif, Unaiza Sajid

Abstract: In the last decade, researchers working in the domain of computer vision and Artificial Intelligence (AI) have stepped up their efforts to develop automated frameworks that not only detect breast cancer but also identify its stage. This surge in research activity is mainly due to the advent of robust AI algorithms (deep learning), the availability of hardware that can train those robust and complex AI algorithms, and the accessibility of datasets large enough for training them. The imaging modalities that researchers have exploited to automate breast cancer detection are mammograms, ultrasound, magnetic resonance imaging, histopathological images, or any combination of them. This article analyzes these imaging modalities, presents their strengths and limitations, and lists resources from which their datasets can be accessed for research purposes. The article then summarizes state-of-the-art AI and computer vision based methods proposed in the last decade to detect breast cancer using various imaging modalities. In general, we focus on reviewing frameworks that report results using mammograms, as mammography is the most widely used breast imaging modality and serves as the first test that medical practitioners usually prescribe for the detection of breast cancer. The second reason for focusing on mammograms is the availability of labeled datasets. Dataset availability is one of the most important aspects of developing AI based frameworks, as such algorithms are data hungry and the quality of the dataset generally affects their performance. In a nutshell, this article will act as a primary resource for the research community working in the field of automated breast imaging analysis.

Submitted 3 October, 2021; originally announced October 2021.

Journal ref: Computers in Biology and Medicine 2022

arXiv:2109.14197 [pdf]

Context based Roman-Urdu to Urdu Script Transliteration System

Authors: H Muhammad Shakeel, Rashid Khan, Muhammad Waheed

Abstract: Nowadays computers are a necessity, and they are very useful in many fields such as search engines, text processing, short messaging services, voice chatting, and text recognition. Over the last several years, many tools and techniques have been developed to support the writing of language scripts. Most Asian languages, such as Arabic, Urdu, Persian, Chinese, and Korean, are written in Roman alphabets. Roman alphabets are the most commonly used for transliteration of languages that have non-Latin scripts. Many layouts already exist for entering Urdu characters as input. Most Urdu speakers prefer to use Roman-Urdu for different applications, because most users are not familiar with the Urdu language keyboard. The objective of this work is to improve the context-based transliteration of Roman-Urdu to Urdu script. In this paper, we propose an algorithm that effectively solves the transliteration issues. The algorithm converts the encoded Roman words into words in the standard Urdu script and matches them against the lexicon. If a match is found, the word is displayed in the text editor. If more than one match is found in the lexicon, the highest-frequency word is displayed. If no match is found, the first encoded and converted instance is displayed and set as the default, and the ambiguous word is then adjusted to its desired location according to its context. The outcome of this algorithm demonstrates its efficiency and significance compared to other models and algorithms for context-based transliteration of Roman-Urdu to Urdu.

Submitted 29 September, 2021; originally announced September 2021.

arXiv:2109.04653 [pdf, other]

Towards Developing a Multilingual and Code-Mixed Visual Question Answering System by Knowledge Distillation

Authors: Humair Raj Khan, Deepak Gupta, Asif Ekbal

Abstract: Pre-trained language-vision models have shown remarkable performance on the visual question answering (VQA) task. However, most pre-trained models are trained by only considering monolingual learning, especially in resource-rich languages like English. Training such models for multilingual setups demands high computing resources and multilingual language-vision datasets, which hinders their application in practice. To alleviate these challenges, we propose a knowledge distillation approach to extend an English language-vision model (teacher) into an equally effective multilingual and code-mixed model (student). Unlike the existing knowledge distillation methods, which only use the output from the last layer of the teacher network for distillation, our student model learns and imitates the teacher from multiple intermediate layers (language and vision encoders) with appropriately designed distillation objectives for incremental knowledge extraction. We also create a large-scale multilingual and code-mixed VQA dataset in eleven different language setups covering multiple Indian and European languages. Experimental results and in-depth analysis show the effectiveness of the proposed VQA model over the pre-trained language-vision models on eleven diverse language setups.

Submitted 9 September, 2021; originally announced September 2021.

Comments: Accepted in EMNLP-Findings (2021)

arXiv:2108.03953 [pdf, other]

A Framework for Joint Unsupervised Learning of Cluster-Aware Embedding for Heterogeneous Networks

Authors: Rayyan Ahmad Khan, Martin Kleinsteuber

Abstract: Heterogeneous Information Network (HIN) embedding refers to the low-dimensional projections of the HIN nodes that preserve the HIN structure and semantics. HIN embedding has emerged as a promising research field for network analysis as it enables downstream tasks such as clustering and node classification. In this work, we propose \ours for joint learning of cluster embeddings as well as cluster-aware HIN embedding. We assume that the connected nodes are highly likely to fall in the same cluster, and adopt a variational approach to preserve the information in the pairwise relations in a cluster-aware manner. In addition, we deploy contrastive modules to simultaneously utilize the information in multiple meta-paths, thereby alleviating the meta-path selection problem - a challenge faced by many of the famous HIN embedding approaches. The HIN embedding, thus learned, not only improves the clustering performance but also preserves pairwise proximity as well as the high-order HIN structure. We show the effectiveness of our approach by comparing it with many competitive baselines on three real-world datasets on clustering and downstream node classification.

Submitted 9 August, 2021; originally announced August 2021.

arXiv:2108.02899 [pdf, other]

Lights, Camera, Action! A Framework to Improve NLP Accuracy over OCR documents

Authors: Amit Gupte, Alexey Romanov, Sahitya Mantravadi, Dalitso Banda, Jianjie Liu, Raza Khan, Lakshmanan Ramu Meenal, Benjamin Han, Soundar Srinivasan

Abstract: Document digitization is essential for the digital transformation of our societies, yet a crucial step in the process, Optical Character Recognition (OCR), is still not perfect. Even commercial OCR systems can produce questionable output depending on the fidelity of the scanned documents. In this paper, we demonstrate an effective framework for mitigating OCR errors for any downstream NLP task, using Named Entity Recognition (NER) as an example. We first address the data scarcity problem for model training by constructing a document synthesis pipeline, generating realistic but degraded data with NER labels. We measure the NER accuracy drop at various degradation levels and show that a text restoration model, trained on the degraded data, significantly closes the NER accuracy gaps caused by OCR errors, including on an out-of-domain dataset. For the benefit of the community, we have made the document synthesis pipeline available as an open-source project.

Submitted 5 August, 2021; originally announced August 2021.

Comments: Accepted to the Document Intelligence Workshop at KDD 2021. The source code of Genalog is available at https://github.com/microsoft/genalog

arXiv:2106.13456 [pdf, other]

Interpreting Criminal Charge Prediction and Its Algorithmic Bias via Quantum-Inspired Complex Valued Networks

Authors: Abdul Rafae Khan, Jia Xu, Peter Varsanyi, Rachit Pabreja

Abstract: While predictive policing has become increasingly common in assisting with decisions in the criminal justice system, the use of these results is still controversial. Some software based on deep learning lacks accuracy (e.g., in F-1), and importantly many decision processes are not transparent, causing doubt about decision bias, such as perceived racial and age disparities. This paper addresses bias issues with post-hoc explanations to provide a trustable prediction of whether a person will receive future criminal charges given one's previous criminal records by learning temporal behavior patterns over twenty years. Bi-LSTM relieves the vanishing gradient problem, attentional mechanisms allow learning and interpretation of feature importance, and complex-valued networks inspired by quantum physics facilitate a certain level of transparency in modeling the decision process. Our approach shows consistent and reliable prediction precision and recall on a real-life dataset. Our analysis of the importance of each input feature shows the critical causal impact on decision-making, suggesting that criminal histories are statistically significant factors, while identifiers, such as race and age, are not. Finally, our algorithm indicates that a suspect tends to increase crime severity level gradually over time rather than suddenly.

Submitted 13 July, 2021; v1 submitted 25 June, 2021; originally announced June 2021.

Comments: First two authors alphabetically ordered

arXiv:2105.01316 [pdf]

Technology Review of Blockchain Data Privacy Solutions

Authors: Jack Tanner, Roshaan Khan

Abstract: The objective of this report is to review existing enterprise blockchain technologies - EOSIO powered systems, Hyperledger Fabric and Besu, ConsenSys Quorum, R3 Corda and Ernst and Young's Nightfall - that provide data privacy while leveraging the data integrity benefits of blockchain. By reviewing and comparing how and how well these technologies achieve data privacy, a snapshot is captured of the industry's current best practices and data privacy models. Major enterprise technologies are contrasted in parallel to EOSIO to better understand how EOSIO can evolve to meet the trends seen in enterprise blockchain privacy. The following strategies and trends were generally observed in these technologies: Cryptography: the hashing algorithm was found to be the most used cryptographic primitive in enterprise or changeover privacy solutions. Coordination via on-chain contracts: a common strategy was to use a shared public ledger to coordinate data privacy groups and, more generally, to manage identities and access control. Transaction and contract code sharing: there were varying levels of privacy around the visibility of the business logic (smart contract code). Some solutions only allowed authorised peers to view code, while others made it accessible to every member of the shared ledger. Data migrations for data privacy applications: significant challenges exist in running system upgrades when data is cryptographically stored. Multiple blockchain ledgers for data privacy: solutions attempted to create a new private blockchain for every private data relationship, an approach that was eventually abandoned in favour of one shared ledger with private data collections/transactions anchored to the ledger with a hash in order to improve scaling.

Submitted 4 May, 2021; originally announced May 2021.

ACM Class: C.2.2; E.3

arXiv:2103.01322 [pdf, ps, other]

Thinking Out of the Blocks: Holochain for Distributed Security in IoT Healthcare

Authors: Shakila Zaman, Muhammad R. A. Khandaker, Risala T. Khan, Faisal Tariq, Kai-Kit Wong

Abstract: The Internet-of-Things (IoT) is an emerging and cognitive technology which connects a massive number of smart physical devices with virtual objects operating in diverse platforms through the internet. IoT is increasingly being implemented in distributed settings, making footprints in almost every sector of our life. Unfortunately, for healthcare systems, the entities connected to the IoT networks are exposed to an unprecedented level of security threats. Relying on a huge volume of sensitive and personal data, IoT healthcare systems are facing unique challenges in protecting data security and privacy. Although blockchain has been posed as the solution in this scenario thanks to its inherent distributed ledger technology (DLT), it suffers from the major setback of storage and computation requirements that increase with the network size. This paper proposes a holochain-based security and privacy-preserving framework for IoT healthcare systems that overcomes these challenges and is particularly suited for resource constrained IoT scenarios. The performance and thorough security analyses demonstrate that a holochain-based IoT healthcare system is significantly better compared to blockchain and other existing systems.

Submitted 1 March, 2021; originally announced March 2021.

Comments: Submitted to IEEE

arXiv:2101.03885 [pdf, other]

Variational Embeddings for Community Detection and Node Representation

Authors: Rayyan Ahmad Khan, Muhammad Umer Anwaar, Omran Kaddah, Martin Kleinsteuber

Abstract: In this paper, we study how to simultaneously learn two highly correlated tasks of graph analysis, i.e., community detection and node representation learning. We propose an efficient generative model called VECoDeR for jointly learning Variational Embeddings for Community Detection and node Representation. VECoDeR assumes that every node can be a member of one or more communities. The node embeddings are learned in such a way that connected nodes are not only "closer" to each other but also share similar community assignments. A joint learning framework leverages community-aware node embeddings for better community detection. We demonstrate on several graph datasets that VECoDeR effectively out-performs many competitive baselines on all three tasks i.e. node classification, overlapping community detection and non-overlapping community detection. We also show that VECoDeR is computationally efficient and has quite robust performance with varying hyperparameters.

Submitted 11 January, 2021; originally announced January 2021.

Showing 1–50 of 129 results for author: Khan, R