Hybrid Approach to Parallel Stochastic Gradient Descent

Authors: Aakash Sudhirbhai Vora, Dhrumil Chetankumar Joshi, Aksh Kantibhai Patel

Abstract: Stochastic Gradient Descent is used for large datasets to train models to reduce the training time. On top of that, data parallelism is widely used as a method to efficiently train neural networks using multiple worker nodes in parallel. Synchronous and asynchronous approaches to data parallelism are used by most systems to train the model in parallel. However, both of them have their drawbacks. We propose a third approach to data parallelism which is a hybrid between synchronous and asynchronous approaches, using both approaches to train the neural network. When the threshold function is selected appropriately to gradually shift all parameter aggregation from asynchronous to synchronous, we show that in a given time period our hybrid approach outperforms both asynchronous and synchronous approaches.

Submitted 27 June, 2024; originally announced July 2024.

arXiv:2406.00182 [pdf, other]

Chiplets on Wheels: Review Paper on Holistic Chiplet Solutions for Autonomous Vehicles

Authors: Swathi Narashiman, Venkat A, Divyaratna Joshi, Deepak Sridhar, Harish Rajesh, Sanjay Sattva, Aniruddha S, Jayanth B, Varun Manjunath, Ragavendiran N

Abstract: On the advent of the slow death of Moore's law, the silicon industry is moving towards a new era of chiplets. The automotive industry is experiencing a profound transformation towards software-defined vehicles, fueled by the surging demand for automotive compute chips, expected to reach 20-22 billion by 2030. High-performance compute (HPC) chips become instrumental in meeting the soaring demand for computational power. Various strategies, including centralized electrical and electronic architecture and the innovative Chiplet Systems, are under exploration. The latter, breaking down System-on-Chips (SoCs) into functional units, offers unparalleled customization and integration possibilities. The research accentuates the crucial open Chiplet ecosystem, fostering collaboration and enhancing supply chain resilience. In this paper, we address the unique challenges that arise when attempting to leverage chiplet-based architecture to design a holistic silicon solution for the automotive industry. We propose a throughput-oriented micro-architecture for ADAS and infotainment systems alongside a novel methodology to evaluate chiplet architectures. Further, we develop in-house simulation tools leveraging the gem5 framework to simulate latency and throughput. Finally, we perform an extensive design of thermally-aware chiplet placement and develop a micro-fluids-based cooling design.

Submitted 31 May, 2024; originally announced June 2024.

arXiv:2403.12173 [pdf, other]

TnT-LLM: Text Mining at Scale with Large Language Models

Authors: Mengting Wan, Tara Safavi, Sujay Kumar Jauhar, Yujin Kim, Scott Counts, Jennifer Neville, Siddharth Suri, Chirag Shah, Ryen W White, Longqi Yang, Reid Andersen, Georg Buscher, Dhruv Joshi, Nagu Rangan

Abstract: Transforming unstructured text into structured and meaningful forms, organized by useful category labels, is a fundamental step in text mining for downstream analysis and application. However, most existing methods for producing label taxonomies and building text-based label classifiers still rely heavily on domain expertise and manual curation, making the process expensive and time-consuming. This is particularly challenging when the label space is under-specified and large-scale data annotations are unavailable. In this paper, we address these challenges with Large Language Models (LLMs), whose prompt-based interface facilitates the induction and use of large-scale pseudo labels. We propose TnT-LLM, a two-phase framework that employs LLMs to automate the process of end-to-end label generation and assignment with minimal human effort for any given use-case. In the first phase, we introduce a zero-shot, multi-stage reasoning approach which enables LLMs to produce and refine a label taxonomy iteratively. In the second phase, LLMs are used as data labelers that yield training samples so that lightweight supervised classifiers can be reliably built, deployed, and served at scale. We apply TnT-LLM to the analysis of user intent and conversational domain for Bing Copilot (formerly Bing Chat), an open-domain chat-based search engine.
Extensive experiments using both human and automatic evaluation metrics demonstrate that TnT-LLM generates more accurate and relevant label taxonomies when compared against state-of-the-art baselines, and achieves a favorable balance between accuracy and efficiency for classification at scale. We also share our practical experiences and insights on the challenges and opportunities of using LLMs for large-scale text mining in real-world applications.

Submitted 18 March, 2024; originally announced March 2024.

Comments: 9 pages main content, 8 pages references and appendix

arXiv:2403.05316 [pdf]

Direction of slip modulates the perception of slip distance and slip speed

Authors: Ayesha Tooba Khan, Deepak Joshi, Biswarup Mukherjee

Abstract: Purpose: The purpose of this study was to investigate the psychophysical understanding of the slip stimulus. We emphasized that the perception of slip and its characteristics, such as slip distance and slip speed, depend on the interaction between slip direction, slip distance, and slip speed. Methods: We developed a novel slip induction device to simulate the artificial sense of slip. We conducted a psychophysical experiment on eight healthy subjects. The experiment was designed to evaluate the effect of slip direction on slip perception as well as on the perception of slip distance and slip speed. A series of psychophysical questions were asked at the end of the slip stimulation to record the subjective responses of the participants. The average success rate (%) was used to quantify the subject responses. Results: We demonstrated that the perception of slip is independent of slip direction; however, the perception of slip distance and slip speed is significantly modulated by slip direction. We also observed that a significant interaction exists between slip distance and slip speed in the upward slip direction. It was also observed that the average success rate was significantly different for various combinations of slip distance and slip speed in the upward slip direction. Conclusions: Our study clearly establishes a significant interaction between the slip direction, slip distance, and slip speed for psychophysical understanding of the perception of slip distance and slip speed.

Submitted 8 March, 2024; originally announced March 2024.

arXiv:2402.03311 [pdf, other]

HASSOD: Hierarchical Adaptive Self-Supervised Object Detection

Authors: Shengcao Cao, Dhiraj Joshi, Liang-Yan Gui, Yu-Xiong Wang

Abstract: The human visual perception system demonstrates exceptional capabilities in learning without explicit supervision and understanding the part-to-whole composition of objects. Drawing inspiration from these two abilities, we propose Hierarchical Adaptive Self-Supervised Object Detection (HASSOD), a novel approach that learns to detect objects and understand their compositions without human supervision. HASSOD employs a hierarchical adaptive clustering strategy to group regions into object masks based on self-supervised visual representations, adaptively determining the number of objects per image. Furthermore, HASSOD identifies the hierarchical levels of objects in terms of composition, by analyzing coverage relations between masks and constructing tree structures. This additional self-supervised learning task leads to improved detection performance and enhanced interpretability. Lastly, we abandon the inefficient multi-round self-training process utilized in prior methods and instead adapt the Mean Teacher framework from semi-supervised learning, which leads to a smoother and more efficient training process. Through extensive experiments on prevalent image datasets, we demonstrate the superiority of HASSOD over existing methods, thereby advancing the state of the art in self-supervised object detection. Notably, we improve Mask AR from 20.2 to 22.5 on LVIS, and from 17.0 to 26.0 on SA-1B. Project page: https://HASSOD-NeurIPS23.github.io.

Submitted 5 February, 2024; originally announced February 2024.

Comments: NeurIPS 2023

arXiv:2312.14346 [pdf, other]

Don't Believe Everything You Read: Enhancing Summarization Interpretability through Automatic Identification of Hallucinations in Large Language Models

Authors: Priyesh Vakharia, Devavrat Joshi, Meenal Chavan, Dhananjay Sonawane, Bhrigu Garg, Parsa Mazaheri

Abstract: Large Language Models (LLMs) are adept at text manipulation -- tasks such as machine translation and text summarization. However, these models can also be prone to hallucination, which can be detrimental to the faithfulness of any answers that the model provides. Recent works in combating hallucinations in LLMs deal with identifying hallucinated sentences and categorizing the different ways in which models hallucinate. This paper takes a deep dive into LLM behavior with respect to hallucinations, defines a token-level approach to identifying different kinds of hallucinations, and further utilizes this token-level tagging to improve the interpretability and faithfulness of LLMs in dialogue summarization tasks. Through this, the paper presents a new, enhanced dataset and a new training paradigm.

Submitted 2 April, 2024; v1 submitted 21 December, 2023; originally announced December 2023.

Comments: All authors contributed equally to this work

arXiv:2312.13427 [pdf, other]

doi 10.1145/3626762

R2D2: Reducing Redundancy and Duplication in Data Lakes

Authors: Raunak Shah, Koyel Mukherjee, Atharv Tyagi, Sai Keerthana Karnam, Dhruv Joshi, Shivam Bhosale, Subrata Mitra

Abstract: Enterprise data lakes often suffer from substantial amounts of duplicate and redundant data, with data volumes ranging from terabytes to petabytes. This leads to both increased storage costs and unnecessarily high maintenance costs for these datasets. In this work, we focus on identifying and reducing redundancy in enterprise data lakes by addressing the problem of 'dataset containment'. To the best of our knowledge, this is one of the first works that addresses table-level containment at a large scale. We propose R2D2: a three-step hierarchical pipeline that efficiently identifies almost all instances of containment by progressively reducing the search space in the data lake. It first builds (i) a schema containment graph, followed by (ii) statistical min-max pruning, and finally, (iii) content level pruning. We further propose minimizing the total storage and access costs by optimally identifying redundant datasets that can be deleted (and reconstructed on demand) while respecting latency constraints. We implement our system on Azure Databricks clusters using Apache Spark for enterprise data stored in ADLS Gen2, and on AWS clusters for open-source data. In contrast to existing modified baselines that are inaccurate or take several days to run, our pipeline can process an enterprise customer data lake at the TB scale in approximately 5 hours with high accuracy.
We present theoretical results as well as extensive empirical validation on both enterprise (scale of TBs) and open-source datasets (scale of MBs - GBs), which showcase the effectiveness of our pipeline.

Submitted 20 December, 2023; originally announced December 2023.

Comments: The first two authors contributed equally. 25 pages, accepted to the International Conference on Management of Data (SIGMOD) 2024. ©Raunak Shah | ACM 2023. This is the author's version of the work. Not for redistribution. The definitive Version of Record was published in Proceedings of the ACM on Management of Data (PACMMOD), http://dx.doi.org/10.1145/3626762

Journal ref: Proc. ACM Manag. Data 1, 4, Article 268 (December 2023), 25 pages
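
The min-max pruning step named in the R2D2 abstract rests on a simple necessary condition: if every row of one table also appears in another, then for each shared column the smaller table's value range must lie inside the larger table's range. The following is a minimal sketch of that check only, not the authors' pipeline; the function name, the table names, and the statistics are hypothetical.

```python
from typing import Dict, Tuple

# Per-column (min, max) statistics for one table. Column names are hypothetical.
ColumnStats = Dict[str, Tuple[float, float]]


def may_be_contained(child: ColumnStats, parent: ColumnStats) -> bool:
    """Return False when column statistics rule out `child` being contained in `parent`.

    A True result only means the pair survives pruning; a content-level
    comparison is still needed to confirm containment.
    """
    for column, (child_min, child_max) in child.items():
        if column not in parent:
            return False  # the child's schema is not a subset of the parent's
        parent_min, parent_max = parent[column]
        if child_min < parent_min or child_max > parent_max:
            return False  # the child has a value outside the parent's range
    return True


# Hypothetical statistics for three tables.
orders_2023 = {"order_id": (1000, 1999), "amount": (5.0, 480.0)}
orders_all = {"order_id": (1, 5000), "amount": (0.5, 990.0)}
refunds = {"order_id": (900, 6000), "amount": (1.0, 300.0)}

print(may_be_contained(orders_2023, orders_all))  # candidate pair is kept
print(may_be_contained(refunds, orders_all))      # pair is pruned
```

Because the check reads only precomputed statistics, it can discard many candidate pairs before any row-level comparison is attempted.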

arXiv:2310.16388 [pdf, other]

Deepfake Detection: Leveraging the Power of 2D and 3D CNN Ensembles

Authors: Aagam Bakliwal, Amit D. Joshi

Abstract: In the dynamic realm of deepfake detection, this work presents an innovative approach to validate video content. The methodology blends advanced 2-dimensional and 3-dimensional Convolutional Neural Networks. The 3D model is uniquely tailored to capture spatiotemporal features via sliding filters, extending through both spatial and temporal dimensions. This configuration enables nuanced pattern recognition in pixel arrangement and temporal evolution across frames. Simultaneously, the 2D model leverages EfficientNet architecture, harnessing auto-scaling in Convolutional Neural Networks. Notably, this ensemble integrates Voting Ensembles and Adaptive Weighted Ensembling. Strategic prioritization of the 3-dimensional model's output capitalizes on its exceptional spatio-temporal feature extraction. Experimental validation underscores the effectiveness of this strategy, showcasing its potential in countering deepfake generation's deceptive practices.

Submitted 25 October, 2023; originally announced October 2023.

Comments: 6 pages, 2 figures
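
The abstract above describes combining a 2D and a 3D model while prioritizing the 3D model's output. A weighted average of the two predicted probabilities is one common way to express such a priority. The sketch below uses a fixed weight rather than an adaptive one, and the weight value, function name, and probabilities are assumptions for illustration, not figures from the paper.

```python
def weighted_ensemble(p_2d: float, p_3d: float, w_3d: float = 0.7) -> float:
    """Blend two fake-probabilities, giving weight `w_3d` to the 3D model.

    `w_3d` is a hypothetical fixed weight; an adaptive scheme would
    instead derive it from data, for example from validation accuracy.
    """
    if not 0.0 <= w_3d <= 1.0:
        raise ValueError("w_3d must lie in [0, 1]")
    return w_3d * p_3d + (1.0 - w_3d) * p_2d


# Hypothetical per-video scores from the two models.
score = weighted_ensemble(p_2d=0.4, p_3d=0.9)
label = "fake" if score >= 0.5 else "real"
print(round(score, 2), label)
```

Setting `w_3d` above 0.5 lets the 3D model outvote the 2D model whenever the two disagree by similar margins.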

arXiv:2308.02778 [pdf, other]

Unveiling Emotions from EEG: A GRU-Based Approach

Authors: Sarthak Johari, Gowri Namratha Meedinti, Radhakrishnan Delhibabu, Deepak Joshi

Abstract: One of the most important study areas in affective computing is emotion identification using EEG data. In this study, the Gated Recurrent Unit (GRU) algorithm, which is a type of Recurrent Neural Networks (RNNs), is tested to see if it can use EEG signals to predict emotional states. Our publicly accessible dataset consists of resting neutral data as well as EEG recordings from people who were exposed to stimuli evoking happy, neutral, and negative emotions. For the best feature extraction, we pre-process the EEG data using artifact removal, bandpass filters, and normalization methods. With 100% accuracy on the validation set, our model produced outstanding results by utilizing the GRU's capacity to capture temporal dependencies. When compared to other machine learning techniques, our GRU model's Extreme Gradient Boosting Classifier had the highest accuracy. Our investigation of the confusion matrix revealed insightful information about the performance of the model, enabling precise emotion classification. This study emphasizes the potential of deep learning models like GRUs for emotion recognition and advances in affective computing. Our findings open up new possibilities for interacting with computers and comprehending how emotions are expressed through brainwave activity.

Submitted 20 July, 2023; originally announced August 2023.

Comments: arXiv admin note: substantial text overlap with arXiv:2308.02437

arXiv:2308.02437 [pdf, other]

Noise removal methods on ambulatory EEG: A Survey

Authors: Sarthak Johari, Gowri Namratha Meedinti, Radhakrishnan Delhibabu, Deepak Joshi

Abstract: For many decades, research has addressed the removal of noise in ambulatory EEG. An enormous number of research papers have been published on noise identification and removal, and it is difficult to present a detailed review of all this literature. Therefore, in this paper, an attempt has been made to review the detection and removal of noise. More than 100 research papers have been discussed to discern the techniques for detecting and removing noise in ambulatory EEG. Further, the literature survey shows that the pattern recognition required to detect ambulatory method, eye open and close, varies with different conditions of EEG datasets. This is mainly due to the fact that EEG detected under different conditions has different characteristics. This, in turn, necessitates the identification of a pattern recognition technique to effectively distinguish EEG noise data from the various conditions of EEG data.

Submitted 16 July, 2023; originally announced August 2023.

arXiv:2306.06084 [pdf]

doi 10.1109/M2VIP55626.2022.10041089

Machine Vision Using Cellphone Camera: A Comparison of deep networks for classifying three challenging denominations of Indian Coins

Authors: Keyur D. Joshi, Dhruv Shah, Varshil Shah, Nilay Gandhi, Sanket J. Shah, Sanket B. Shah

Abstract: Indian currency coins come in a variety of denominations. Of all the varieties, Rs.1, Rs.2, and Rs.5 have similar diameters. The majority of the coin styles in market circulation for denominations of Rs.1 and Rs.2 coins are nearly the same except for numerals on their reverse side. If a coin is resting on its obverse side, the correct denomination is not distinguishable by humans. Therefore, it was hypothesized that a digital image of a coin resting on either side could be classified into its correct denomination by training a deep neural network model. The digital images were generated by using cheap cell phone cameras. To find the most suitable deep neural network architecture, four were selected based on the preliminary analysis carried out for comparison. The results confirm that two of the four deep neural network models can classify the correct denomination from either side of a coin with an accuracy of 97%.

Submitted 12 May, 2023; originally announced June 2023.

Comments: 6 Pages, 4 Figures, 6 Tables, Conference paper

arXiv:2305.05420 [pdf]

doi 10.1007/978-981-19-5224-1_63

Estimating related words computationally using language model from the Mahabharata -- an Indian epic

Authors: Vrunda Gadesha, Keyur D Joshi, Shefali Naik

Abstract: 'Mahabharata' is the most popular among many Indian pieces of literature referred to in many domains for completely different purposes. The text itself has various dimensions and aspects that are useful to human beings in their personal and professional lives. This Indian epic was originally written in the Sanskrit language. Now, in the era of Natural Language Processing, Artificial Intelligence, Machine Learning, and Human-Computer Interaction, this text can be processed according to the domain requirement. It is interesting to process this text and get useful insights from the Mahabharata. A limitation of humans analyzing the Mahabharata is that they always bring a sentimental attitude towards the story narrated by the author. Apart from that, a human cannot memorize statistical or computational details, such as: Which two words frequently occur together in one sentence? What is the average length of the sentences across the whole text? Which word is the most frequent across the text? What are the lemmas of the words used across the sentences? Thus, in this paper, we propose an NLP pipeline to get some statistical and computational insights along with the most relevant word searching method from the largest epic 'Mahabharata'. We stacked the different text-processing approaches to articulate the best results, which can be further used in the various domains where the Mahabharata needs to be referenced.

Submitted 9 May, 2023; originally announced May 2023.

Comments: ICT Analysis and Applications: Proceedings of ICT4SD 2022 pp 627-638

arXiv:2305.03034 [pdf, other]

Contrastive Mean Teacher for Domain Adaptive Object Detectors

Authors: Shengcao Cao, Dhiraj Joshi, Liang-Yan Gui, Yu-Xiong Wang

Abstract: Object detectors often suffer from the domain gap between training (source domain) and real-world applications (target domain). Mean-teacher self-training is a powerful paradigm in unsupervised domain adaptation for object detection, but it struggles with low-quality pseudo-labels. In this work, we identify the intriguing alignment and synergy between mean-teacher self-training and contrastive learning. Motivated by this, we propose Contrastive Mean Teacher (CMT) -- a unified, general-purpose framework with the two paradigms naturally integrated to maximize beneficial learning signals. Instead of using pseudo-labels solely for final predictions, our strategy extracts object-level features using pseudo-labels and optimizes them via contrastive learning, without requiring labels in the target domain. When combined with recent mean-teacher self-training methods, CMT leads to new state-of-the-art target-domain performance: 51.9% mAP on Foggy Cityscapes, outperforming the previously best by 2.1% mAP. Notably, CMT can stabilize performance and provide more significant gains as pseudo-label noise increases.

Submitted 4 May, 2023; originally announced May 2023.

Comments: CVPR 2023
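
In mean-teacher self-training, the teacher model that produces pseudo-labels is typically not trained by gradient descent; its weights are an exponential moving average (EMA) of the student's weights. The sketch below shows only that generic update on plain lists of numbers. The decay value and the weights are hypothetical, and none of the contrastive components described in the abstract are represented.

```python
from typing import List


def ema_update(teacher: List[float], student: List[float], decay: float = 0.99) -> List[float]:
    """Move each teacher weight a fraction (1 - decay) of the way toward the student."""
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]


# Hypothetical two-parameter models. A small decay is used so that the
# drift toward the student is visible after a few steps.
teacher = [0.0, 0.0]
student = [1.0, -2.0]
for _ in range(3):
    teacher = ema_update(teacher, student, decay=0.5)

print(teacher)  # [0.875, -1.75]
```

In practice the decay is usually set close to 1, so the teacher changes slowly and its pseudo-labels are more stable than the student's own predictions.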

arXiv:2304.04824 [pdf, other]

Gradient-based Uncertainty Attribution for Explainable Bayesian Deep Learning

Authors: Hanjing Wang, Dhiraj Joshi, Shiqiang Wang, Qiang Ji

Abstract: Predictions made by deep learning models are prone to data perturbations, adversarial attacks, and out-of-distribution inputs. To build a trusted AI system, it is therefore critical to accurately quantify the prediction uncertainties. While current efforts focus on improving uncertainty quantification accuracy and efficiency, there is a need to identify uncertainty sources and take actions to mitigate their effects on predictions. Therefore, we propose to develop explainable and actionable Bayesian deep learning methods to not only perform accurate uncertainty quantification but also explain the uncertainties, identify their sources, and propose strategies to mitigate the uncertainty impacts. Specifically, we introduce a gradient-based uncertainty attribution method to identify the most problematic regions of the input that contribute to the prediction uncertainty. Compared to existing methods, the proposed UA-Backprop has competitive accuracy, relaxed assumptions, and high efficiency. Moreover, we propose an uncertainty mitigation strategy that leverages the attribution results as attention to further improve the model performance. Both qualitative and quantitative evaluations are conducted to demonstrate the effectiveness of our proposed methods.

Submitted 10 April, 2023; originally announced April 2023.

Comments: Accepted to CVPR 2023

arXiv:2302.02137 [pdf, other]

FedSpectral+: Spectral Clustering using Federated Learning

Authors: Janvi Thakkar, Devvrat Joshi

Abstract: Clustering in graphs has been a well-known research problem, particularly because most Internet and social network data is in the form of graphs. Organizations widely use spectral clustering algorithms to find clustering in graph datasets. However, applying spectral clustering to a large dataset is challenging due to computational overhead. While distributed spectral clustering algorithms exist, they face the problem of data privacy and increased communication costs between the clients. Thus, in this paper, we propose a spectral clustering algorithm using federated learning (FL) to overcome these issues. FL is a privacy-protecting algorithm that accumulates model parameters from each local learner rather than collecting users' raw data, thus providing both scalability and data privacy. We developed two approaches: FedSpectral and FedSpectral+. FedSpectral is a baseline approach that uses local spectral clustering labels to aggregate the global spectral clustering by creating a similarity graph. FedSpectral+, a state-of-the-art approach, uses the power iteration method to learn the global spectral embedding by incorporating the entire graph data without access to the raw information distributed among the clients. We further designed our own similarity metric to compare the clustering quality of the distributed approach to that of the original/non-FL clustering. The proposed approach FedSpectral+ obtained a similarity of 98.85% and 99.8%, comparable to that of global clustering on the ego-Facebook and email-Eu-core datasets.

Submitted 4 February, 2023; originally announced February 2023.

Comments: Accepted at GCLR Workshop, AAAI 2023
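
Power iteration lends itself to a distributed setting because a matrix-vector product is additive: if the global matrix is the sum of per-client matrices, each client can multiply its own part by the current vector and only the resulting vectors need to be summed. The sketch below illustrates that general idea on a toy graph. It is not the FedSpectral+ algorithm itself; the client split, function names, and step count are assumptions.

```python
import math
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(matrix: Matrix, vector: Vector) -> Vector:
    """Multiply a dense matrix by a vector."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def federated_power_iteration(client_matrices: List[Matrix], steps: int = 100) -> Vector:
    """Approximate the dominant eigenvector of the sum of the client matrices.

    Only the products A_i @ v leave each client, never A_i itself. The start
    vector is the first basis vector, which assumes it is not orthogonal to
    the dominant eigenvector.
    """
    n = len(client_matrices[0])
    v = [1.0] + [0.0] * (n - 1)
    for _ in range(steps):
        partials = [matvec(m, v) for m in client_matrices]   # computed locally
        summed = [sum(values) for values in zip(*partials)]  # aggregated centrally
        norm = math.sqrt(sum(x * x for x in summed))
        v = [x / norm for x in summed]
    return v


# A triangle graph whose edges are split across two hypothetical clients.
client_a = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]  # edge 0-1
client_b = [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0], [1.0, 1.0, 0.0]]  # edges 0-2, 1-2

embedding = federated_power_iteration([client_a, client_b])
print([round(x, 4) for x in embedding])
```

For the triangle graph the dominant eigenvector is uniform, so all three entries converge to the same value, 1/sqrt(3).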

arXiv:2301.03834 [pdf, other]

HQAlign: Aligning nanopore reads for SV detection using current-level modeling

Authors: Dhaivat Joshi, Suhas Diggavi, Mark J. P. Chaisson, Sreeram Kannan

Abstract: Motivation: Detection of structural variants (SV) from the alignment of sample DNA reads to the reference genome is an important problem in understanding human diseases. Long reads that can span repeat regions, along with an accurate alignment of these long reads play an important role in identifying novel SVs. Long read sequencers such as nanopore sequencing can address this problem by providing very long reads but with high error rates, making accurate alignment challenging. Many errors induced by nanopore sequencing have a bias because of the physics of the sequencing process and proper utilization of these error characteristics can play an important role in designing a robust aligner for SV detection problems. In this paper, we design and evaluate HQAlign, an aligner for SV detection using nanopore sequenced reads. The key ideas of HQAlign include (i) using basecalled nanopore reads along with the nanopore physics to improve alignments for SVs (ii) incorporating SV specific changes to the alignment pipeline (iii) adapting these into existing state-of-the-art long read aligner pipeline, minimap2 (v2.24), for efficient alignments. Results: We show that HQAlign captures about 4%-6% complementary SVs across different datasets which are missed by minimap2 alignments while having a standalone performance at par with minimap2 for real nanopore reads data. For the common SV calls between HQAlign and minimap2, HQAlign improves the start and the end breakpoint accuracy for about 10%-50% of SVs across different datasets.
Moreover, HQAlign improves the alignment rate to 89.35% from minimap2 85.64% for nanopore reads alignment to recent telomere-to-telomere CHM13 assembly, and it improves to 86.65% from 83.48% for nanopore reads alignment to GRCh37 human genome.

Submitted 10 January, 2023; originally announced January 2023.

arXiv:2301.02896 [pdf, other]

k-Means SubClustering: A Differentially Private Algorithm with Improved Clustering Quality

Authors: Devvrat Joshi, Janvi Thakkar

Abstract: In today's data-driven world, the sensitivity of information has been a significant concern. With this data and additional information on the person's background, one can easily infer an individual's private data. Many differentially private iterative algorithms have been proposed in interactive settings to protect an individual's privacy from these inference attacks. The existing approaches compute differentially private (DP) centroids by running the iterative Lloyd's algorithm and perturbing the centroids with various DP mechanisms. These DP mechanisms do not guarantee convergence of differentially private iterative algorithms and degrade the quality of the clusters. Thus, in this work, we further extend the previous work on 'Differentially Private k-Means Clustering With Convergence Guarantee' by taking it as our baseline. The novelty of our approach is to sub-cluster the clusters and then select the centroid which has a higher probability of moving in the direction of the future centroid. At every Lloyd's step, noise is injected into the centroids using the exponential DP mechanism. The results of the experiments indicate that our approach outperforms the current state-of-the-art method, i.e., the baseline algorithm, in terms of clustering quality while maintaining the same differential privacy requirements. The clustering quality improved by 4.13 and 2.83 times over the baseline for the Wine and Breast_Cancer datasets, respectively.

Submitted 7 January, 2023; originally announced January 2023.

Comments: Accepted at PAS Workshop at CIKM 2022

arXiv:2211.13194 [pdf]

Indian Commercial Truck License Plate Detection and Recognition for Weighbridge Automation

Authors: Siddharth Agrawal, Keyur D. Joshi

Abstract: Detection and recognition of a license plate is important when automating weighbridge services. While many large databases are available for Latin and Chinese alphanumeric license plates, data for Indian license plates is inadequate. In particular, databases of Indian commercial truck license plates are inadequate, despite the fact that commercial vehicle license plate recognition plays a profound role in logistics management and weighbridge automation. Moreover, models to recognise license plates are not able to generalise effectively to such data due to its challenging nature and the high frequency of handwritten license plates, which leads to diverse font styles. Thus, a database and effective models to recognise and detect such license plates are crucial. This paper provides a database of commercial truck license plates. Using state-of-the-art models in real-time object detection (You Only Look Once Version 7) and scene text recognition (Permuted Autoregressive Sequence Models), our method outperforms the other cited references, where the maximum accuracy obtained was less than 90%, while we have achieved 95.82% accuracy in our algorithm implementation on the presented challenging license plate dataset. Index Terms: Automatic License Plate Recognition, character recognition, license plate detection, vision transformer.

Submitted 22 December, 2022; v1 submitted 23 November, 2022; originally announced November 2022.

arXiv:2209.02609 [pdf, other]

Merged-GHCIDR: Geometrical Approach to Reduce Image Data

Authors: Devvrat Joshi, Janvi Thakkar, Siddharth Soni, Shril Mody, Rohan Patil, Nipun Batra

Abstract: The computational resources required to train a model have been increasing since the inception of deep networks. Training neural networks on massive datasets has become a challenging and time-consuming task. So, there arises a need to reduce the dataset without compromising the accuracy. In this paper, we present novel variations of an earlier approach called reduction through homogeneous clustering for reducing dataset size. The proposed methods are based on the idea of partitioning the dataset into homogeneous clusters and selecting images that contribute significantly to the accuracy. We propose two variations, Geometrical Homogeneous Clustering for Image Data Reduction (GHCIDR) and Merged-GHCIDR, upon the baseline algorithm, Reduction through Homogeneous Clustering (RHC), to achieve better accuracy and training time. The intuition behind GHCIDR involves selecting data points by cluster weights and geometrical distribution of the training set. Merged-GHCIDR involves merging clusters having the same labels using complete linkage clustering. We used three deep learning models: Fully Connected Networks (FCN), VGG1, and VGG16. We experimented with the two variants on four datasets: MNIST, CIFAR10, Fashion-MNIST, and Tiny-Imagenet. Merged-GHCIDR with the same percentage reduction as RHC showed an increase of 2.8%, 8.9%, 7.6% and 3.5% accuracy on MNIST, Fashion-MNIST, CIFAR10, and Tiny-Imagenet, respectively.

Submitted 6 September, 2022; originally announced September 2022.

arXiv:2208.13079 [pdf, other]

Geometrical Homogeneous Clustering for Image Data Reduction

Authors: Shril Mody, Janvi Thakkar, Devvrat Joshi, Siddharth Soni, Rohan Patil, Nipun Batra

Abstract: In this paper, we present novel variations of an earlier approach called the homogeneous clustering algorithm for reducing dataset size. The intuition behind the approaches proposed in this paper is to partition the dataset into homogeneous clusters and select some images which contribute significantly to the accuracy. Selected images are a proper subset of the training data and thus are human-readable. We propose four variations upon the baseline algorithm, RHC. The intuition behind the first approach, RHCKON, is that the boundary points contribute significantly towards the representation of clusters. It involves selecting the k farthest neighbours and the one nearest neighbour of the centroid of each cluster. In the following two approaches (KONCW and CWKC), we introduce the concept of cluster weights. They are based on the fact that larger clusters contribute more than smaller clusters. The final variation is GHCIDR, which selects points based on the geometrical aspect of the data distribution. We performed the experiments on two deep learning models: Fully Connected Networks (FCN) and VGG1. We experimented with the four variants on three datasets: MNIST, CIFAR10, and Fashion-MNIST. We found that GHCIDR gave the best accuracy of 99.35%, 81.10%, and 91.66% and a training data reduction of 87.27%, 32.34%, and 76.80% on MNIST, CIFAR10, and Fashion-MNIST, respectively.

Submitted 27 August, 2022; originally announced August 2022.

Comments: Accepted at Subset ML Workshop @ ICML 2021 as a poster

arXiv:2207.13021 [pdf]

Topological Optimized Convolutional Visual Recurrent Network for Brain Tumor Segmentation and Classification

Authors: Dhananjay Joshi, Bhupesh Kumar Singh, Kapil Kumar Nagwanshi, Nitin S. Choubey

Abstract: In today's world of health care, brain tumor detection has become common. However, the manual brain tumor classification approach is time-consuming. So the Deep Convolutional Neural Network (DCNN) is used by many researchers in the medical field for making accurate diagnoses and aiding in the patient's treatment. The traditional techniques have problems such as overfitting and the inability to extract necessary features. To overcome these problems, we developed the Topological Data Analysis based Improved Persistent Homology (TDA-IPH) and Convolutional Transfer learning and Visual Recurrent learning with Elephant Herding Optimization hyper-parameter tuning (CTVR-EHO) models for brain tumor segmentation and classification. Initially, the Topological Data Analysis based Improved Persistent Homology is designed to segment the brain tumor image. Then, from the segmented image, features are extracted using TL via the AlexNet model and Bidirectional Visual Long Short-Term Memory (Bi-VLSTM). Next, Elephant Herding Optimization (EHO) is used to tune the hyperparameters of both networks to get an optimal result. Finally, extracted features are concatenated and classified using the softmax activation layer. The simulation result of the proposed CTVR-EHO and TDA-IPH method is analyzed based on precision, accuracy, recall, loss, and F score metrics. When compared to other existing brain tumor segmentation and classification models, the proposed CTVR-EHO and TDA-IPH approaches show high accuracy (99.8%), high recall (99.23%), high precision (99.67%), and high F score (99.59%).

Submitted 14 July, 2024; v1 submitted 6 June, 2022; originally announced July 2022.

MSC Class: 68U10; ACM Class: I.4

arXiv:2111.12495 [pdf, other]

Altering Backward Pass Gradients improves Convergence

Authors: Bishshoy Das, Milton Mondal, Brejesh Lall, Shiv Dutt Joshi, Sumantra Dutta Roy

Abstract: In standard neural network training, the gradients in the backward pass are determined by the forward pass. As a result, the two stages are coupled. This is how most neural networks are trained currently. However, gradient modification in the backward pass has seldom been studied in the literature. In this paper we explore decoupled training, where we alter the gradients in the backward pass. We propose a simple yet powerful method called PowerGrad Transform, which alters the gradients before the weight update in the backward pass and significantly enhances the predictive performance of the neural network. PowerGrad Transform trains the network to arrive at a better optimum at convergence. It is computationally extremely efficient, adding virtually no additional cost to either memory or compute, but results in improved final accuracies on both the training and test sets. PowerGrad Transform is easy to integrate into existing training routines, requiring just a few lines of code. PowerGrad Transform accelerates training and makes it possible for the network to better fit the training data. With decoupled training, PowerGrad Transform improves baseline accuracies for ResNet-50 by 0.73%, for SE-ResNet-50 by 0.66% and by more than 1.0% for the non-normalized ResNet-18 network on the ImageNet classification task.

Submitted 20 September, 2022; v1 submitted 24 November, 2021; originally announced November 2021.

arXiv:2109.04194 [pdf, other]

Novel Time Domain Based Upper-Limb Prosthesis Control using Incremental Learning Approach

Authors: Sidharth Pancholi, Amit M. Joshi, Deepak Joshi, Bradly S. Duerstock

Abstract: The upper limb is vital for many kinds of human activity. The complete or partial loss of the upper limb has a significant impact on the daily activities of amputees. EMG carries important information about human physique, which helps to decode the various functionalities of the human arm. EMG signal based bionics and prostheses have gained huge research attention over the past decade. Conventional EMG-PR based prostheses struggle to give accurate performance due to the off-line training used and their inability to compensate for electrode position shift and change in arm position. This work proposes an online training and incremental learning based system for upper limb prosthetic applications. The system consists of an ADS1298 as AFE (analog front end) and a 32-bit ARM Cortex-M4 processor for DSP (digital signal processing). The system has been tested on both intact and amputated subjects. Time derivative moment based features have been implemented and utilized for effective pattern classification. Initially, the system was trained for four classes using the on-line training process; later, the number of classes was incremented on user demand up to eleven, and system performance was evaluated. The system yielded a completion rate of 100% for healthy and amputated subjects when four motions were considered. Further, completion rates of 94.33% and 92% were achieved when the number of classes increased to eleven for healthy subjects and amputees, respectively. The motion efficacy test was also evaluated for all the subjects. The highest efficacy rates of 91.23% and 88.64% were observed for intact and amputated subjects, respectively.

Submitted 13 January, 2024; v1 submitted 25 August, 2021; originally announced September 2021.

Comments: 15 Pages, 8 Figures, This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2108.11860 [pdf, ps, other]

Auxiliary Heuristics for Frontier Based Planners

Authors: Arsh Tangri, Dhruv Joshi, Ashalatha Nayak

Abstract: Autonomous exploration of unknown environments is a vital function for robots and has applications in a wide variety of scenarios. Our focus primarily lies in its application for the task of efficient coverage of unknown environments. Various methods have been proposed for this task and frontier based methods are an efficient category in this class of methods. Efficiency is of utmost importance in exploration and heuristics play a critical role in guiding our search. In this work we demonstrate the ability of heuristics that are learnt by imitating clairvoyant oracles. These learnt heuristics can be used to predict the expected future return from selected states without building search trees, which are inefficient and limited by on-board compute. We also propose an additional filter-based heuristic which results in an enhancement in the performance of the frontier-based planner with respect to certain tasks such as coverage planning.

Submitted 26 August, 2021; originally announced August 2021.

arXiv:2103.11905 [pdf, ps, other]

doi 10.1109/TSP.2022.3152607

The Generalized Fourier Transform: A Unified Framework for the Fourier, Laplace, Mellin and $Z$ Transforms

Authors: Pushpendra Singh, Anubha Gupta, Shiv Dutt Joshi

Abstract: This paper introduces the Generalized Fourier transform (GFT), an extension or generalization of the Fourier transform (FT). The unilateral Laplace transform (LT) is observed to be a special case of GFT. GFT, as proposed in this work, contributes significantly to the scholarly literature. There are many salient contributions of this work. Firstly, GFT is applicable to a much larger class of signals, some of which cannot be analyzed with FT and LT. For example, we have shown the applicability of GFT to polynomially decaying functions and super exponentials. Secondly, we demonstrate the efficacy of GFT in solving initial value problems (IVPs). Thirdly, the generalization presented for FT is extended to other integral transforms, with examples shown for the wavelet transform and cosine transform. Likewise, a generalized Gamma function is also presented. One interesting application of GFT is the computation of generalized moments, for the otherwise non-finite moments, of any random variable such as the Cauchy random variable. Fourthly, we introduce the Fourier scale transform (FST), which utilizes GFT with the topological isomorphism of an exponential map. Lastly, we propose the Generalized Discrete-Time Fourier transform (GDTFT). The DTFT and unilateral $z$-transform are shown to be special cases of the proposed GDTFT. The properties of GFT and GDTFT have also been discussed.

Submitted 12 February, 2021; originally announced March 2021.

Comments: 18 pages

arXiv:2101.04427 [pdf, other]

Quantum Internet- Applications, Functionalities, Enabling Technologies, Challenges, and Research Directions

Authors: Amoldeep Singh, Kapal Dev, Harun Siljak, Hem Dutt Joshi, Maurizio Magarini

Abstract: The advanced notebooks, mobile phones, and internet applications that we use in today's world are all entrenched in classical communication bits of zeros and ones. The classical internet has its foundation in the amalgamation of mathematics and Claude Shannon's theory of information. But today's internet technology is a playground for eavesdroppers. This poses a serious challenge to various applications that rely on classical internet technology. This has motivated researchers to switch to new technologies that are fundamentally more secure. Exploring quantum effects, researchers paved the way into quantum networks that provide security, privacy and a range of capabilities such as quantum computation, communication and metrology. The realization of the quantum internet requires quantum communication between various remote nodes through quantum channels guarded by quantum cryptographic protocols. Such networks rely upon quantum bits (qubits) that can simultaneously take the value of zeros and ones. Extraordinary properties of qubits such as entanglement, teleportation and superposition give quantum networks an edge over traditional networks in many ways. But at the same time, transmitting qubits over long distances is a formidable task, and extensive research is going on into quantum teleportation over such distances, which will become a breakthrough in physically realizing the quantum internet in the near future. In this paper, quantum internet functionalities, technologies, applications and open challenges have been extensively surveyed to help readers gain a basic understanding of the infrastructure required for the development of a global quantum internet.

Submitted 1 June, 2021; v1 submitted 12 January, 2021; originally announced January 2021.

Comments: This survey paper is submitted in IEEE Communications Surveys and Tutorials and revised on 27th May 2021. It includes 31 pages, 14 figures, and 5 tables

arXiv:2011.09068 [pdf, other]

An analytical diabolo model for robotic learning and control

Authors: Felix von Drigalski, Devwrat Joshi, Takayuki Murooka, Kazutoshi Tanaka, Masashi Hamaya, Yoshihisa Ijiri

Abstract: In this paper, we present a diabolo model that can be used for training agents in simulation to play diabolo, as well as running it on a real dual robot arm system. We first derive an analytical model of the diabolo-string system and compare its accuracy using data recorded via motion capture, which we release as a public dataset of skilled play with diabolos of different dynamics. We show that our model outperforms a deep-learning-based predictor, both in terms of precision and physically consistent behavior. Next, we describe a method based on optimal control to generate robot trajectories that produce the desired diabolo trajectory, as well as a system to transform higher-level actions into robot motions. Finally, we test our method on a real robot system by playing the diabolo, and throwing it to and catching it from a human player.

Submitted 17 November, 2020; originally announced November 2020.

Comments: Video: https://youtu.be/oS-9mCfKIeY

arXiv:2006.09199 [pdf, other]

AVLnet: Learning Audio-Visual Language Representations from Instructional Videos

Authors: Andrew Rouditchenko, Angie Boggust, David Harwath, Brian Chen, Dhiraj Joshi, Samuel Thomas, Kartik Audhkhasi, Hilde Kuehne, Rameswar Panda, Rogerio Feris, Brian Kingsbury, Michael Picheny, Antonio Torralba, James Glass

Abstract: Current methods for learning visually grounded language from videos often rely on text annotation, such as human generated captions or machine generated automatic speech recognition (ASR) transcripts. In this work, we introduce the Audio-Video Language Network (AVLnet), a self-supervised network that learns a shared audio-visual embedding space directly from raw video inputs. To circumvent the need for text annotation, we learn audio-visual representations from randomly segmented video clips and their raw audio waveforms. We train AVLnet on HowTo100M, a large corpus of publicly available instructional videos, and evaluate on image retrieval and video retrieval tasks, achieving state-of-the-art performance. We perform analysis of AVLnet's learned representations, showing our model utilizes speech and natural sounds to learn audio-visual concepts. Further, we propose a tri-modal model that jointly processes raw audio, video, and text captions from videos to learn a multi-modal semantic embedding space useful for text-video retrieval. Our code, data, and trained models will be released at avlnet.csail.mit.edu

Submitted 29 June, 2021; v1 submitted 16 June, 2020; originally announced June 2020.

Comments: A version of this work has been accepted to Interspeech 2021

arXiv:2005.11417 [pdf]

Approaching Bio Cellular Classification for Malaria Infected Cells Using Machine Learning and then Deep Learning to compare & analyze K-Nearest Neighbours and Deep CNNs

Authors: Rishabh Malhotra, Dhron Joshi, Ku Young Shin

Abstract: Malaria is a deadly disease which claims the lives of hundreds of thousands of people every year. Computational methods have been proven to be useful in the medical industry by providing effective means of classification of diagnostic imaging and disease identification. This paper examines different machine learning methods in the context of classifying the presence of malaria in cell images. Numerous machine learning methods can be applied to the same problem; the question of whether one machine learning method is better suited to a problem relies heavily on the problem itself and the implementation of a model. In particular, convolutional neural networks and k nearest neighbours are both analyzed and contrasted with regard to their application to classifying the presence of malaria and each model's empirical performance. Here, we implement two models of classification: a convolutional neural network and the k nearest neighbours algorithm. These two algorithms are compared based on validation accuracy. For our implementation, CNN (95%) performed 25% better than kNN (75%).

Submitted 22 May, 2020; originally announced May 2020.

Comments: 7 Pages

arXiv:2005.00447 [pdf, other]

Image fusion using symmetric skip autoencoder via an Adversarial Regulariser

Authors: Snigdha Bhagat, S. D. Joshi, Brejesh Lall

Abstract: It is a challenging task to extract the best of both worlds by combining the spatial characteristics of a visible image and the spectral content of an infrared image. In this work, we propose a spatially constrained adversarial autoencoder that extracts deep features from the infrared and visible images to obtain a more exhaustive and global representation. In this paper, we propose a residual autoencoder architecture, regularised by a residual adversarial network, to generate a more realistic fused image. The residual module serves as the primary building block for the encoder, decoder and adversarial network. In addition, the symmetric skip connections embed the spatial characteristics directly from the initial layers of the encoder structure into the decoder part of the network. The spectral information in the infrared image is incorporated by adding the feature maps over several layers in the encoder part of the fusion structure, which makes inference on both the visual and infrared images separately. In order to efficiently optimize the parameters of the network, we propose an adversarial regulariser network which would perform supervised learning on the fused image and the original visual image.

Submitted 4 June, 2020; v1 submitted 1 May, 2020; originally announced May 2020.

arXiv:1911.05609 [pdf, other]

doi 10.1145/3363560

Affective Computing for Large-Scale Heterogeneous Multimedia Data: A Survey

Authors: Sicheng Zhao, Shangfei Wang, Mohammad Soleymani, Dhiraj Joshi, Qiang Ji

Abstract: The wide popularity of digital photography and social networks has generated a rapidly growing volume of multimedia data (i.e., image, music, and video), resulting in a great demand for managing, retrieving, and understanding these data. Affective computing (AC) of these data can help to understand human behaviors and enable wide applications. In this article, we survey the state-of-the-art AC technologies comprehensively for large-scale heterogeneous multimedia data. We begin this survey by introducing the typical emotion representation models from psychology that are widely employed in AC. We briefly describe the available datasets for evaluating AC algorithms. We then summarize and compare the representative methods on AC of different multimedia types, i.e., images, music, videos, and multimodal data, with the focus on both handcrafted features-based methods and deep learning methods. Finally, we discuss some challenges and future directions for multimedia affective computing.

Submitted 3 October, 2019; originally announced November 2019.

Comments: Accepted by ACM TOMM

arXiv:1904.09651 [pdf]

An improved sex specific and age dependent classification model for Parkinson's diagnosis using handwriting measurement

Authors: Ujjwal Gupta, Hritik Bansal, Deepak Joshi

Abstract: Accurate diagnosis is crucial for preventing the progression of Parkinson's, as well as improving the quality of life of individuals with Parkinson's disease. In this paper, we develop a sex-specific and age-dependent classification method to diagnose Parkinson's disease using online handwriting recorded from individuals with Parkinson's (n=37; m/f: 19/18; age: 69.3±10.9 years) and healthy controls (n=38; m/f: 20/18; age: 62.4±11.3 years). The sex-specific and age-dependent classifier was observed to significantly outperform the generalized classifier. An improved accuracy of 83.75% (SD=1.63) with the female-specific classifier and 79.55% (SD=1.58) with the old-age-dependent classifier was observed, in comparison to 75.76% (SD=1.17) accuracy with the generalized classifier. Finally, combining the age and sex information proved to be encouraging in classification. We performed a rigorous analysis to observe the dominance of sex-specific and age-dependent features for Parkinson's detection and ranked them using the support vector machine (SVM) ranking method. Distinct sets of features were observed to dominate for higher classification accuracy in different categories of classification.

Submitted 30 December, 2019; v1 submitted 21 April, 2019; originally announced April 2019.

Comments: Journal of Computer Methods and Programs in Biomedicine(Accepted on 27 December 2019)

arXiv:1904.07303 [pdf, other]

CryptoNN: Training Neural Networks over Encrypted Data

Authors: Runhua Xu, James B. D. Joshi, Chao Li

Abstract: Emerging neural network-based machine learning techniques such as deep learning and its variants have shown tremendous potential in many application domains. However, they raise serious privacy concerns due to the risk of leakage of highly privacy-sensitive data when data collected from users is used to train neural network models to support predictive tasks. To tackle such serious privacy concerns, several privacy-preserving approaches have been proposed in the literature that use either secure multi-party computation (SMC) or homomorphic encryption (HE) as the underlying mechanisms. However, neither of these cryptographic approaches provides an efficient solution towards constructing a privacy-preserving machine learning model, as well as supporting both the training and inference phases. To tackle the above issue, we propose a CryptoNN framework that supports training a neural network model over encrypted data by using the emerging functional encryption scheme instead of SMC or HE. We also construct a functional encryption scheme for basic arithmetic computation to support the requirement of the proposed CryptoNN framework. We present performance evaluation and security analysis of the underlying crypto scheme and show through our experiments that CryptoNN achieves accuracy that is similar to that of the baseline neural network models on the MNIST dataset.

Submitted 26 April, 2019; v1 submitted 15 April, 2019; originally announced April 2019.

Comments: ePrint

arXiv:1811.08815 [pdf, other]

Learning Motion in Feature Space: Locally-Consistent Deformable Convolution Networks for Fine-Grained Action Detection

Authors: Khoi-Nguyen C. Mac, Dhiraj Joshi, Raymond A. Yeh, Jinjun Xiong, Rogerio S. Feris, Minh N. Do

Abstract: Fine-grained action detection is an important task with numerous applications in robotics and human-computer interaction. Existing methods typically utilize a two-stage approach including extraction of local spatio-temporal features followed by temporal modeling to capture long-term dependencies. While most recent papers have focused on the latter (long-temporal modeling), here, we focus on producing features capable of modeling fine-grained motion more efficiently. We propose a novel locally-consistent deformable convolution, which utilizes the change in receptive fields and enforces a local coherency constraint to capture motion information effectively. Our model jointly learns spatio-temporal features (instead of using independent spatial and temporal streams). The temporal component is learned from the feature space instead of pixel space, e.g. optical flow. The produced features can be flexibly used in conjunction with other long-temporal modeling networks, e.g. ST-CNN, DilatedTCN, and ED-TCN. Overall, our proposed approach robustly outperforms the original long-temporal models on two fine-grained action datasets: 50 Salads and GTEA, achieving F1 scores of 80.22% and 75.39% respectively.

Submitted 6 November, 2019; v1 submitted 21 November, 2018; originally announced November 2018.

Comments: Accepted at ICCV 2019 as oral

arXiv:1710.10227 [pdf, other]

Unified Functorial Signal Representation III: Foundations, Redundancy, $L^0$ and $L^2$ functors

Authors: Salil Samant, Shiv Dutt Joshi

Abstract: In this paper we propose and lay the foundations of a functorial framework for representing signals. By incorporating an additional category-theoretic relative and generative perspective alongside the classic set-theoretic measure theory, the fundamental concepts of redundancy and compression are formulated in a novel, authentic arrow-theoretic way. The existing classic framework representing a signal as a vector of an appropriate linear space is shown to be a special case of the proposed framework. Next, in the context of signal spaces as categories, we study the various covariant and contravariant forms of $L^0$ and $L^2$ functors using categories of measurable or measure spaces and their opposites involving Boolean and measure algebras along with partial extension. Finally, we contribute a novel definition of intra-signal redundancy using the general concept of an isomorphism arrow in a category, covering the translation case and others as special cases. Through category theory we provide a simple yet precise explanation for the well-known heuristic of lossless differential encoding standards yielding better compression in image types such as line drawings, iconic images, text, etc., as compared to classic representation techniques such as JPEG, which choose bases or frames in a global Hilbert space.

Submitted 27 October, 2017; originally announced October 2017.

Comments: First draft version

arXiv:1707.07075 [pdf, other]

Automatic Curation of Golf Highlights using Multimodal Excitement Features

Authors: Michele Merler, Dhiraj Joshi, Quoc-Bao Nguyen, Stephen Hammer, John Kent, John R. Smith, Rogerio S. Feris

Abstract: The production of sports highlight packages summarizing a game's most exciting moments is an essential task for broadcast media. Yet, it requires labor-intensive video editing. We propose a novel approach for auto-curating sports highlights, and use it to create a real-world system for the editorial aid of golf highlight reels. Our method fuses information from the players' reactions (action recognition such as high-fives and fist pumps), spectators (crowd cheering), and commentator (tone of the voice and word analysis) to determine the most interesting moments of a game. We accurately identify the start and end frames of key shot highlights with additional metadata, such as the player's name and the hole number, allowing personalized content summarization and retrieval. In addition, we introduce new techniques for learning our classifiers with reduced manual training data annotation by exploiting the correlation of different modalities. Our work has been demonstrated at a major golf tournament, successfully extracting highlights from live video streams over four consecutive days.

Submitted 21 July, 2017; originally announced July 2017.

arXiv:1707.06283 [pdf, other]

Orthogonal Ramanujan Sums, its properties and Applications in Multiresolution Analysis

Authors: Devendra Kumar Yadav, Gajraj Kuldeep, S. D. Joshi

Abstract: The signal processing community has recently shown interest in Ramanujan sums, which were defined by S. Ramanujan in 1918. In this paper we propose Orthogonal Ramanujan Sums (ORS) based on Ramanujan sums and present two novel applications of ORS. First, a new representation of a finite-length signal is given using ORS, which is defined as the Orthogonal Ramanujan Periodic Transform. Second, ORS is applied to multiresolution analysis, and it is shown that the Haar transform is a special case.

Submitted 24 May, 2017; originally announced July 2017.
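
For readers unfamiliar with the classical Ramanujan sums that the entry above builds on, the following is a minimal sketch of their standard definition, $c_q(n) = \sum_{1 \le k \le q,\ \gcd(k,q)=1} \cos(2\pi k n / q)$. It does not implement the paper's Orthogonal Ramanujan Sums or its transform; the function name `ramanujan_sum` is chosen here purely for illustration.

```python
from math import cos, gcd, pi


def ramanujan_sum(q: int, n: int) -> int:
    """Classical Ramanujan sum c_q(n), computed directly from its definition.

    The sum runs over all k in 1..q that are coprime to q. The result is
    always an integer, so the floating-point total is rounded at the end.
    """
    if q < 1:
        raise ValueError("q must be a positive integer")
    total = sum(cos(2 * pi * k * n / q) for k in range(1, q + 1) if gcd(k, q) == 1)
    return round(total)


# c_3(n) is periodic in n with period 3.
print([ramanujan_sum(3, n) for n in range(6)])  # [2, -1, -1, 2, -1, -1]
```

The direct summation costs O(q) per value, which is adequate for illustration; it is not meant as an efficient implementation.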

arXiv:1507.08117 [pdf, other]

doi 10.1007/s00034-019-01133-x

Some studies on multidimensional Fourier theory for Hilbert transform, analytic signal and space-time series analysis

Authors: Pushpendra Singh, Shiv Dutt Joshi

Abstract: In this paper, we propose the Fourier frequency vector (FFV), inherently associated with the multidimensional Fourier transform. With the help of the FFV, we are able to provide a physical meaning of the so-called negative frequencies in the multidimensional Fourier transform (MDFT), which in turn provides multidimensional spatial and space-time series analysis. The complex exponential representation of a sinusoidal function always yields two frequencies, a negative frequency corresponding to a positive frequency and vice versa, in the multidimensional Fourier spectrum. Thus, using the MDFT, we propose the multidimensional Hilbert transform (MDHT) and the associated multidimensional analytic signal (MDAS) with the following properties: (a) the extra and redundant positive, negative, or both frequencies, introduced due to the complex exponential representation of the multidimensional Fourier spectrum, are suppressed, (b) the real part of the MDAS is the original signal, (c) the real and imaginary parts of the MDAS are orthogonal, and (d) the magnitude envelope of an original signal is obtained as the magnitude of its associated MDAS, which is the instantaneous amplitude of the MDAS. The proposed MDHT and associated MDAS are generalizations of the 1D HT and AS, respectively. We also provide the decomposition of an image into the AM-FM image model by the Fourier method and obtain an explicit expression for the analytic image computation by 2DDFT.

Submitted 29 July, 2015; originally announced July 2015.

Comments: 13 pages, 10 figures

Journal ref: Circuits, Systems, and Signal Processing, May 2019
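
As background for the entry above, the sketch below illustrates the standard 1D analytic signal that the abstract says the MDHT and MDAS generalize. It is not the paper's multidimensional method: it uses the textbook DFT construction (keep the DC and Nyquist bins, double the positive-frequency bins, zero the negative-frequency bins), and the helper names `dft` and `analytic_signal` are assumptions made for this illustration.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform (or its inverse)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[m] * cmath.exp(sign * 2j * math.pi * k * m / n) for m in range(n))
        for k in range(n)
    ]
    return [v / n for v in out] if inverse else out


def analytic_signal(x):
    """1D analytic signal of a real sequence via one-sided spectrum weighting."""
    n = len(x)
    weights = [0.0] * n
    weights[0] = 1.0                      # keep the DC bin unchanged
    if n % 2 == 0:
        weights[n // 2] = 1.0             # keep the Nyquist bin unchanged
        upper = n // 2
    else:
        upper = (n + 1) // 2
    for k in range(1, upper):
        weights[k] = 2.0                  # double the positive frequencies
    spectrum = dft(x)
    return dft([s * w for s, w in zip(spectrum, weights)], inverse=True)


# A pure cosine has a constant envelope: |z[n]| is 1 and Re(z[n]) is x[n].
N, f = 16, 3
x = [math.cos(2 * math.pi * f * t / N) for t in range(N)]
z = analytic_signal(x)
```

For this cosine input the 1D analogues of properties (b) and (d) can be checked numerically: the real part of `z` reproduces `x`, and `abs(z[t])` is the instantaneous amplitude.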

arXiv:1505.06878 [pdf, ps, other]

Computationally efficient MIMO system identification using Signal Matched Synthesis Filter Bank

Authors: Binish Fatimah, Shiv Dutt Joshi

Abstract: We propose a multi-input multi-output (MIMO) system identification framework by interpreting the MIMO system in terms of a multirate synthesis filter bank. The proposed methodology is discussed in two steps: in the first step the MIMO system is interpreted as a synthesis filter bank, and the second step is to convert the MIMO system into a SISO system "without any loss of information", which restructures the system identification problem into a SISO form. The system identification problem, in its new form, is identical to the problem of obtaining the signal matched synthesis filter bank (SMSFB) as proposed in Part II. Since we have developed fast algorithms to obtain the filter bank coefficients in Part II, for "the given data case" as well as "the given statistics case", we can use these algorithms for MIMO system identification as well. This framework can have an adaptive as well as a block processing implementation. The algorithms used here involve only scalar computations, unlike the conventional MIMO system identification algorithms, which require matrix computations. These order-recursive algorithms can also be used to obtain approximate smaller-order models for large-order systems without using any model order reduction algorithm. The proposed identification framework can also be used for SISO LPTV system identification and for a SIMO or MISO system. The efficacy of the proposed scheme is validated and its performance in the presence of measurement noise is illustrated using simulation results.

Submitted 26 May, 2015; originally announced May 2015.

arXiv:1504.04104 [pdf, other]

The Hilbert spectrum and the Energy Preserving Empirical Mode Decomposition

Authors: Pushpendra Singh, Shiv Dutt Joshi, Rakesh Kumar Patney, Kaushik Saha

Abstract: In this paper, we propose algorithms which preserve energy in empirical mode decomposition (EMD), generating a finite number $n$ of band-limited Intrinsic Mode Functions (IMFs). In the first energy preserving EMD (EPEMD) algorithm, a signal is decomposed into linearly independent (LI), non-orthogonal yet energy preserving (LINOEP) IMFs and residue (EPIMFs). It is shown that a vector in an inner product space can be represented as a sum of LI and non-orthogonal vectors in such a way that a Parseval-type property is satisfied. From the set of $n$ IMFs, through the Gram-Schmidt orthogonalization method (GSOM), $n!$ sets of orthogonal functions can be obtained. In the second algorithm, we show that if the orthogonalization process proceeds from the lowest frequency IMF to the highest frequency IMF, then the GSOM yields functions which preserve the properties of IMFs and the energy of a signal. With the Hilbert transform, these IMFs yield instantaneous frequencies and amplitudes as functions of time that reveal the embedded structures of a signal. The instantaneous frequencies and squares of amplitudes as functions of time produce a time-frequency-energy distribution, referred to as the Hilbert spectrum, of a signal. Simulations have been carried out for the analysis of various time series and real-life signals to show a comparison among IMFs produced by EMD, EPEMD, ensemble EMD and multivariate EMD algorithms. Simulation results demonstrate the power of the proposed method.

Submitted 16 April, 2015; originally announced April 2015.

Comments: 23 pages, 25 figures

arXiv:1503.06675 [pdf, ps, other]

doi 10.1098/rspa.2016.0871

The Fourier Decomposition Method for nonlinear and nonstationary time series analysis

Authors: Pushpendra Singh, Shiv Dutt Joshi, Rakesh Kumar Patney, Kaushik Saha

Abstract: For many decades, there has been a general perception in the literature that Fourier methods are not suitable for the analysis of nonlinear and nonstationary data. In this paper, we propose a Fourier Decomposition Method (FDM) and demonstrate its efficacy for the analysis of nonlinear (i.e. data generated by nonlinear systems) and nonstationary time series. The proposed FDM decomposes any data into a small number of 'Fourier intrinsic band functions' (FIBFs). The FDM presents a generalized Fourier expansion with variable amplitudes and frequencies of a time series by the Fourier method itself. From the FDM, we propose a zero-phase filter bank based multivariate FDM (MFDM) algorithm for the analysis of multivariate nonlinear and nonstationary time series. We also present an algorithm to obtain cutoff frequencies for MFDM. The MFDM algorithm generates a finite number of band-limited multivariate FIBFs (MFIBFs). The MFDM preserves some intrinsic physical properties of the multivariate data, such as scale alignment, trend and instantaneous frequency. The proposed methods produce the results in a time-frequency-energy distribution that reveals the intrinsic structures of the data. Simulations have been carried out and a comparison is made with the Empirical Mode Decomposition (EMD) methods in the analysis of various simulated as well as real-life time series, and the results show that the proposed methods are powerful tools for analyzing and obtaining the time-frequency-energy representation of any data.

Submitted 31 August, 2015; v1 submitted 26 February, 2015; originally announced March 2015.

Comments: 14 Pages, 18 Figures

Journal ref: Proceedings of the Royal Society of London A; March 2017, Volume 473, issue 2199

arXiv:1502.08003 [pdf]

doi 10.1109/TNSRE.2014.2360533

Illusory Sense of Human Touch from a Warm and Soft Artificial Hand

Authors: John-John Cabibihan, Deepak Joshi, Yeshwin Mysore Srinivasa, Mark Aaron Chan, Arrchana Muruganantham

Abstract: To touch and be touched are vital to human development, well-being, and relationships. However, to those who have lost their arms and hands due to accident or war, touching becomes a serious concern that often leads to psychosocial issues and social stigma. In this paper, we demonstrate that the touch from a warm and soft rubber hand can be perceived by another person as if the touch were coming from a human hand. We describe a three-step process toward this goal. First, we made participants select artificial skin samples according to their preferred warmth and softness characteristics. At room temperature, the preferred warmth was found to be 28.4 deg C at the skin surface of a soft silicone rubber material that has a Shore durometer value of 30 on the OO scale. Second, we developed a process to create a rubber hand replica of a human hand. To compare the skin softness of a human hand and artificial hands, a robotic indenter was employed to produce a softness map by recording the displacement data when a constant indentation force of 1 N was applied to 780 data points on the palmar side of the hand. Results showed that an artificial hand with skeletal structure is as soft as a human hand. Lastly, the participants' arms were touched with human and artificial hands, but they were prevented from seeing the hand that touched them. Receiver operating characteristic curve analysis suggests that a warm and soft artificial hand can create an illusion that the touch is from a human hand. These findings open the possibilities for prosthetic and robotic hands that are lifelike and are more socially acceptable.

Submitted 27 February, 2015; originally announced February 2015.

Comments: 23 pages, 12 figures, supplementary video at: http://youtu.be/lATSgG7CuQU; contact info at: http://www.johncabibihan.com, IEEE Trans on Neural Systems and Rehabilitation Engineering, 2015

arXiv:1409.5099 [pdf, ps, other]

Exact Least Squares Algorithm for Signal Matched Synthesis Filter Bank: Part II

Authors: Binish Fatimah, S. D. Joshi

Abstract: In the companion paper, we proposed a concept of signal matched whitening filter bank and developed a time and order recursive, fast least squares algorithm for the same. The objective of Part II of the paper is twofold. The first is to define a concept of signal matched synthesis filter bank; hence, combining the definitions of Part I and Part II, we obtain a filter bank matched to a given signal. We also develop a fast time and order recursive least squares algorithm for obtaining the same. The synthesis filters obtained here reconstruct the given signal only and not every signal from the finite energy signal space (i.e. belonging to L^2(R)), as is usually done. The recursions so obtained result in a lattice-like structure. Since the filter parameters are not directly available, we also present an order recursive algorithm for the computation of signal matched synthesis filter bank coefficients from the lattice parameters. The second objective is to explore the possibility of using the synthesis side for modeling of a given stochastic process. Simulation results have also been presented to validate the theory.

Submitted 16 September, 2014; originally announced September 2014.

arXiv:1409.5015 [pdf, ps, other]

Exact Least Squares Algorithm for Signal Matched Multirate Whitening Filter Bank: Part I

Authors: Binish Fatimah, S. D. Joshi

Abstract: In this paper, we define a concept of signal matched multirate whitening filter bank which provides an optimum coding gain. This is achieved by whitening the outputs of the analysis filter bank, within as well as across the channels, by solving a constrained projection problem. We also present a fast time and order recursive least squares algorithm to obtain the vector output of the proposed analysis filter bank. The recursive algorithm developed here gives rise to a lattice-like structure. Since the proposed signal matched analysis filter bank coefficients are not available directly, an order recursive algorithm is also presented for estimating these from the lattice parameters. Simulation results are presented to validate the theory. It is also observed that the proposed algorithm can be used to whiten Gaussian/non-Gaussian processes with minimum as well as non-minimum phase.

Submitted 17 September, 2014; originally announced September 2014.

Showing 1–44 of 44 results for author: Joshi, D