Towards a More Inclusive AI: Progress and Perspectives in Large Language Model Training for the Sámi Language

Authors: Ronny Paul, Himanshu Buckchash, Shantipriya Parida, Dilip K. Prasad

Abstract: Sámi, an indigenous language group comprising multiple languages, faces digital marginalization due to the limited availability of data and sophisticated language models designed for its linguistic intricacies. This work focuses on increasing technological participation for the Sámi language. We draw the attention of the ML community towards the language modeling problem of Ultra Low Resource (ULR) languages. ULR languages are those for which the amount of available textual resources is very low, and the speaker count for them is also very low. ULRLs are also not supported by mainstream Large Language Models (LLMs) like ChatGPT, due to which gathering artificial training data for them becomes even more challenging. Mainstream AI foundational model development has given less attention to this category of languages. Generally, these languages have very few speakers, making it hard to find them. However, it is important to develop foundational models for these ULR languages to promote inclusion and the tangible abilities and impact of LLMs. To this end, we have compiled the available Sámi language resources from the web to create a clean dataset for training language models. In order to study the behavior of modern LLMs with ULR languages (Sámi), we have experimented with different kinds of LLMs, mainly on the order of approximately seven billion parameters. We have also explored the effect of multilingual LLM training for ULRLs. We found that the decoder-only models under a sequential multilingual training scenario perform better than joint multilingual training, whereas multilingual training with high semantic overlap, in general, performs better than training from scratch. This is the first study on the Sámi language for adapting non-statistical language models that use the latest developments in the field of natural language processing (NLP).

Submitted 9 May, 2024; originally announced May 2024.

arXiv:2404.13865 [pdf, other]

doi 10.1007/978-3-031-49601-1_6

Context-Enhanced Language Models for Generating Multi-Paper Citations

Authors: Avinash Anand, Kritarth Prasad, Ujjwal Goel, Mohit Gupta, Naman Lal, Astha Verma, Rajiv Ratn Shah

Abstract: Citation text plays a pivotal role in elucidating the connection between scientific documents, demanding an in-depth comprehension of the cited paper. Constructing citations is often time-consuming, requiring researchers to delve into extensive literature and grapple with articulating relevant content. To address this challenge, the field of citation text generation (CTG) has emerged. However, while earlier methods have primarily centered on creating single-sentence citations, practical scenarios frequently necessitate citing multiple papers within a single paragraph. To bridge this gap, we propose a method that leverages Large Language Models (LLMs) to generate multi-citation sentences. Our approach involves a single source paper and a collection of target papers, culminating in a coherent paragraph containing multi-sentence citation text. Furthermore, we introduce a curated dataset named MCG-S2ORC, composed of English-language academic research papers in Computer Science, showcasing multiple citation instances. In our experiments, we evaluate three LLMs (LLaMA, Alpaca, and Vicuna) to ascertain the most effective model for this endeavor. Additionally, we exhibit enhanced performance by integrating knowledge graphs from target papers into the prompts for generating citation text. This research underscores the potential of harnessing LLMs for citation generation, opening a compelling avenue for exploring the intricate connections between scientific documents.

Submitted 22 April, 2024; originally announced April 2024.

Comments: 14 pages, 7 figures, 11th International Conference, BDA 2023, Delhi, India

Journal ref: Big Data and Artificial Intelligence 2023, Delhi, India, December 7, 80-94

arXiv:2404.13770 [pdf, other]

EncodeNet: A Framework for Boosting DNN Accuracy with Entropy-driven Generalized Converting Autoencoder

Authors: Hasanul Mahmud, Kevin Desai, Palden Lama, Sushil K. Prasad

Abstract: Image classification is a fundamental task in computer vision, and the quest to enhance DNN accuracy without inflating model size or latency remains a pressing concern. We make a couple of advances in this regard, leading to a novel EncodeNet design and training framework. The first advancement involves Converting Autoencoders, a novel approach that transforms images into an easy-to-classify image of its class. Our prior work that applied the Converting Autoencoder and a simple classifier in tandem achieved moderate accuracy over simple datasets, such as MNIST and FMNIST. However, on more complex datasets like CIFAR-10, the Converting Autoencoder has a large reconstruction loss, making it unsuitable for enhancing DNN accuracy. To address these limitations, we generalize the design of Converting Autoencoders by leveraging a larger class of DNNs, those with architectures comprising feature extraction layers followed by classification layers. We incorporate a generalized algorithmic design of the Converting Autoencoder and intraclass clustering to identify representative images, leading to optimized image feature learning. Next, we demonstrate the effectiveness of our EncodeNet design and training framework, improving the accuracy of well-trained baseline DNNs while maintaining the overall model size. EncodeNet's building blocks comprise the trained encoder from our generalized Converting Autoencoders transferring knowledge to a lightweight classifier network - also extracted from the baseline DNN. Our experimental results demonstrate that EncodeNet improves the accuracy of VGG16 from 92.64% to 94.05% on CIFAR-10 and ResNet20 from 74.56% to 76.04% on CIFAR-100. It outperforms state-of-the-art techniques that rely on knowledge distillation and attention mechanisms, delivering higher accuracy for models of comparable size.

Submitted 21 April, 2024; originally announced April 2024.

Comments: 15 pages

arXiv:2404.13099 [pdf, other]

Mathify: Evaluating Large Language Models on Mathematical Problem Solving Tasks

Authors: Avinash Anand, Mohit Gupta, Kritarth Prasad, Navya Singla, Sanjana Sanjeev, Jatin Kumar, Adarsh Raj Shivam, Rajiv Ratn Shah

Abstract: The rapid progress in the field of natural language processing (NLP) systems and the expansion of large language models (LLMs) have opened up numerous opportunities in the field of education and instructional methods. These advancements offer the potential for tailored learning experiences and immediate feedback, all delivered through accessible and cost-effective services. One notable application area for this technological advancement is in the realm of solving mathematical problems. Mathematical problem-solving not only requires the ability to decipher complex problem statements but also the skill to perform precise arithmetic calculations at each step of the problem-solving process. However, the evaluation of the arithmetic capabilities of large language models remains an area that has received relatively little attention. In response, we introduce an extensive mathematics dataset called "MathQuest" sourced from the 11th and 12th standard Mathematics NCERT textbooks. This dataset encompasses mathematical challenges of varying complexity and covers a wide range of mathematical concepts. Utilizing this dataset, we conduct fine-tuning experiments with three prominent LLMs: LLaMA-2, WizardMath, and MAmmoTH. These fine-tuned models serve as benchmarks for evaluating their performance on our dataset. Our experiments reveal that among the three models, MAmmoTH-13B emerges as the most proficient, achieving the highest level of competence in solving the presented mathematical problems. Consequently, MAmmoTH-13B establishes itself as a robust and dependable benchmark for addressing NCERT mathematics problems.

Submitted 19 April, 2024; originally announced April 2024.

Comments: 10 pages, 3 figures, NeurIPS 2023 Workshop on Generative AI for Education (GAIED)

Journal ref: NeurIPS 2023 Workshop on Generative AI for Education (GAIED)

arXiv:2404.09763 [pdf, other]

KG-CTG: Citation Generation through Knowledge Graph-guided Large Language Models

Authors: Avinash Anand, Mohit Gupta, Kritarth Prasad, Ujjwal Goel, Naman Lal, Astha Verma, Rajiv Ratn Shah

Abstract: Citation Text Generation (CTG) is a task in natural language processing (NLP) that aims to produce text that accurately cites or references a cited document within a source document. In CTG, the generated text draws upon contextual cues from both the source document and the cited paper, ensuring accurate and relevant citation information is provided. Previous work in the field of citation generation is mainly based on the text summarization of documents. Following this, this paper presents a framework and a comparative study to demonstrate the use of Large Language Models (LLMs) for the task of citation generation. Also, we have shown the improvement in the results of citation generation by incorporating the knowledge graph relations of the papers in the prompt for the LLM to better learn the relationship between the papers. To assess how well our model is performing, we have used a subset of the standard S2ORC dataset, which only consists of computer science academic research papers in the English language. Vicuna performs best for this task with 14.15 Meteor, 12.88 Rouge-1, 1.52 Rouge-2, and 10.94 Rouge-L. Also, Alpaca performs best, and improves the performance by 36.98% in Rouge-1, and 33.14% in Meteor by including knowledge graphs.

Submitted 15 April, 2024; originally announced April 2024.

arXiv:2404.04903 [pdf, other]

Online Learning under Haphazard Input Conditions: A Comprehensive Review and Analysis

Authors: Rohit Agarwal, Arijit Das, Alexander Horsch, Krishna Agarwal, Dilip K. Prasad

Abstract: The domain of online learning has experienced multifaceted expansion owing to its prevalence in real-life applications. Nonetheless, this progression operates under the assumption that the input feature space of the streaming data remains constant. In this survey paper, we address the topic of online learning in the context of haphazard inputs, explicitly forgoing such an assumption. We discuss, classify, evaluate, and compare the methodologies that are adept at modeling haphazard inputs, additionally providing the corresponding code implementations and their carbon footprint. Moreover, we classify the datasets related to the field of haphazard inputs and introduce evaluation metrics specifically designed for datasets exhibiting imbalance. The code of each methodology can be found at https://github.com/Rohit102497/HaphazardInputsReview

Submitted 7 April, 2024; originally announced April 2024.

arXiv:2311.02538 [pdf, other]

Dense Video Captioning: A Survey of Techniques, Datasets and Evaluation Protocols

Authors: Iqra Qasim, Alexander Horsch, Dilip K. Prasad

Abstract: Untrimmed videos have interrelated events, dependencies, context, overlapping events, object-object interactions, domain specificity, and other semantics that are worth highlighting while describing a video in natural language. Owing to such a vast diversity, a single sentence can only correctly describe a portion of the video. Dense Video Captioning (DVC) aims at detecting and describing different events in a given video. The term DVC originated in the 2017 ActivityNet challenge, after which considerable effort has been made to address the challenge. Dense Video Captioning is divided into three sub-tasks: (1) Video Feature Extraction (VFE), (2) Temporal Event Localization (TEL), and (3) Dense Caption Generation (DCG). This review aims to discuss all the studies that claim to perform DVC along with its sub-tasks and summarize their results. We also discuss all the datasets that have been used for DVC. Lastly, we highlight some emerging challenges and future trends in the field.

Submitted 4 November, 2023; originally announced November 2023.

Comments: 35 pages, 10 figures

arXiv:2311.01460 [pdf, ps, other]

Implicit Chain of Thought Reasoning via Knowledge Distillation

Authors: Yuntian Deng, Kiran Prasad, Roland Fernandez, Paul Smolensky, Vishrav Chaudhary, Stuart Shieber

Abstract: To augment language models with the ability to reason, researchers usually prompt or finetune them to produce chain of thought reasoning steps before producing the final answer. However, although people use natural language to reason effectively, it may be that LMs could reason more effectively with some intermediate computation that is not in natural language. In this work, we explore an alternative reasoning approach: instead of explicitly producing the chain of thought reasoning steps, we use the language model's internal hidden states to perform implicit reasoning. The implicit reasoning steps are distilled from a teacher model trained on explicit chain-of-thought reasoning, and instead of doing reasoning "horizontally" by producing intermediate words one-by-one, we distill it such that the reasoning happens "vertically" among the hidden states in different layers. We conduct experiments on a multi-digit multiplication task and a grade school math problem dataset and find that this approach enables solving tasks previously not solvable without explicit chain-of-thought, at a speed comparable to no chain-of-thought.

Submitted 2 November, 2023; originally announced November 2023.

arXiv:2309.08698 [pdf, other]

Modelling Irregularly Sampled Time Series Without Imputation

Authors: Rohit Agarwal, Aman Sinha, Dilip K. Prasad, Marianne Clausel, Alexander Horsch, Mathieu Constant, Xavier Coubez

Abstract: Modelling irregularly-sampled time series (ISTS) is challenging because of missing values. Most existing methods focus on handling ISTS by converting irregularly sampled data into regularly sampled data via imputation. These models assume an underlying missing mechanism leading to unwanted bias and sub-optimal performance. We present SLAN (Switch LSTM Aggregate Network), which utilizes a pack of LSTMs to model ISTS without imputation, eliminating the assumption of any underlying process. It dynamically adapts its architecture on the fly based on the measured sensors. SLAN exploits the irregularity information to capture each sensor's local summary explicitly and maintains a global summary state throughout the observational period. We demonstrate the efficacy of SLAN on publicly available datasets, namely, MIMIC-III, Physionet 2012 and Physionet 2019. The code is available at https://github.com/Rohit102497/SLAN.

Submitted 15 September, 2023; originally announced September 2023.

arXiv:2308.14357 [pdf, other]

doi 10.1109/LRA.2024.3418311

Geometrically Modulable Gait Design for Quadrupeds

Authors: Hari Krishna Hari Prasad, Ross L. Hatton, Kaushik Jayaram

Abstract: Miniature-legged robots are constrained by their onboard computation and control, thus motivating the need for simple, first-principles-based geometric models that connect periodic actuation or gaits (a universal robot control paradigm) to the induced average locomotion. In this paper, we develop a modulable two-beat gait design framework for sprawled planar quadrupedal systems under the no-slip using tools from geometric mechanics. We reduce standard two-beat gaits into unique subgaits in mutually exclusive shape subspaces. Subgaits are characterized by a locomotive stance phase when limbs are in ground contact and a non-locomotive, instantaneous swing phase where the limbs are reset without contact. During the stance phase, the contacting limbs form a four-bar mechanism. To analyze the ensuing locomotion, we develop the following tools: (a) a vector field to generate nonslip actuation, (b) the kinematics of a four-bar mechanism as a local connection, and (c) stratified panels that combine the kinematics and constrained actuation to encode the net change in the system's position generated by a stance-swing subgait cycle. Decoupled subgaits are then designed independently using flows on the shape-change basis and are combined with appropriate phasing to produce a two-beat gait. Further, we introduce "scaling" and "sliding" control inputs to continuously modulate the global trajectories of the quadrupedal system in gait time through which we demonstrate cycle-average speed, direction, and steering control using the control inputs. Thus, this framework has the potential to create uncomplicated open-loop gait plans or gain schedules for robots with limited resources, bringing them closer to achieving autonomous control.

Submitted 2 July, 2024; v1 submitted 28 August, 2023; originally announced August 2023.

Comments: 8 pages, 6 figures

Journal ref: IEEE Robotics and Automation Letters Vol. 9, No. 8, August 2024

arXiv:2308.06983 [pdf, other]

pNNCLR: Stochastic Pseudo Neighborhoods for Contrastive Learning based Unsupervised Representation Learning Problems

Authors: Momojit Biswas, Himanshu Buckchash, Dilip K. Prasad

Abstract: Nearest neighbor (NN) sampling provides more semantic variations than pre-defined transformations for self-supervised learning (SSL) based image recognition problems. However, its performance is restricted by the quality of the support set, which holds positive samples for the contrastive loss. In this work, we show that the quality of the support set plays a crucial role in any nearest neighbor based method for SSL. We then provide a refined baseline (pNNCLR) to the nearest neighbor based SSL approach (NNCLR). To this end, we introduce pseudo nearest neighbors (pNN) to control the quality of the support set, wherein, rather than sampling the nearest neighbors, we sample in the vicinity of hard nearest neighbors by varying the magnitude of the resultant vector and employing a stochastic sampling strategy to improve the performance. Additionally, to stabilize the effects of uncertainty in NN-based learning, we employ a smooth-weight-update approach for training the proposed network. Evaluation of the proposed method on multiple public image recognition and medical image recognition datasets shows that it performs up to 8 percent better than the baseline nearest neighbor method, and is comparable to other previously proposed SSL methods.

Submitted 14 August, 2023; originally announced August 2023.

Comments: 15 pages, 5 figures

arXiv:2307.08073 [pdf, ps, other]

The sequence of higher order Mersenne numbers and associated binomial transforms

Authors: Kalika Prasad, Munesh Kumari, Rabiranjan Mohanta, Hrishikesh Mahato

Abstract: In this article, we introduce and study a new integer sequence referred to as the higher order Mersenne sequence. The proposed sequence is analogous to the higher order Fibonacci numbers and closely associated with the Mersenne numbers. Here, we discuss various algebraic properties such as Binet's formula, Catalan's identity, d'Ocagne's identity, generating functions, finite and binomial sums, etc. of this new sequence, and some inter-relations with Mersenne and Jacobsthal numbers. Moreover, we study the sequence generated from the binomial transforms of the higher order Mersenne numbers and present the recurrence relation and algebraic properties of them. Lastly, we give matrix generators and tridiagonal matrix representation for higher order Mersenne numbers.

Submitted 16 July, 2023; originally announced July 2023.

Comments: 16 pages

MSC Class: 11B37; 11B39; 11B83

arXiv:2307.04149 [pdf, other]

Latent Graph Attention for Enhanced Spatial Context

Authors: Ayush Singh, Yash Bhambhu, Himanshu Buckchash, Deepak K. Gupta, Dilip K. Prasad

Abstract: Global contexts in images are quite valuable in image-to-image translation problems. Conventional attention-based and graph-based models capture the global context to a large extent; however, these are computationally expensive. Moreover, the existing approaches are limited to only learning the pairwise semantic relation between any two points on the image. In this paper, we present Latent Graph Attention (LGA), a computationally inexpensive (linear in the number of nodes) and stable, modular framework for incorporating the global context in the existing architectures, especially empowering small-scale architectures to give performance closer to large size architectures, thus making the light-weight architectures more useful for edge devices with lower compute power and lower energy needs. LGA propagates information spatially using a network of locally connected graphs, thereby facilitating the construction of a semantically coherent relation between any two spatially distant points that also takes into account the influence of the intermediate pixels. Moreover, the depth of the graph network can be used to adapt the extent of contextual spread to the target dataset, thereby being able to explicitly control the added computational cost. To enhance the learning mechanism of LGA, we also introduce a novel contrastive loss term that helps our LGA module to couple well with the original architecture at the expense of minimal additional computational load. We show that incorporating LGA improves the performance on three challenging applications, namely transparent object segmentation, image restoration for dehazing and optical flow estimation.

Submitted 12 July, 2023; v1 submitted 9 July, 2023; originally announced July 2023.

Comments: 20 pages, 7 figures

arXiv:2306.10276 [pdf, other]

Geometric Mechanics of Contact-Switching Systems

Authors: Hari Krishna Hari Prasad, Ross L. Hatton, Kaushik Jayaram

Abstract: Discrete and periodic contact switching is a key characteristic of steady-state legged locomotion. This paper introduces a framework for modeling and analyzing this contact-switching behavior through the framework of geometric mechanics on a toy robot model that can make continuous limb swings and discrete contact switches. The kinematics of this model form a hybrid shape-space and by extending the generalized Stokes' theorem to compute discrete curvature functions called stratified panels, we determine average locomotion generated by gaits spanning multiple contact modes. Using this tool, we also demonstrate the ability to optimize gaits based on the system's locomotion constraints and perform gait reduction on a complex gait spanning multiple contact modes to highlight the method's scalability to multilegged systems.

Submitted 20 October, 2023; v1 submitted 17 June, 2023; originally announced June 2023.

Comments: 7 pages, 6 figures, and link to associated video: "https://drive.google.com/file/d/12Sgl0R1oDLDWRrqlwwAt3JR2Gc3rEB4T/view?usp=sharing". Link to code: "https://github.com/Animal-Inspired-Motion-And-Robotics-Lab/Paper-Geometric-Mechanics-of-Contact-Switching-Systems". Accepted to RA-L on Monday, October 16th, 2023

arXiv:2303.09103 [pdf]

doi 10.1007/s11042-022-13516-5

Machine learning based biomedical image processing for echocardiographic images

Authors: Ayesha Heena, Nagashettappa Biradar, Najmuddin M. Maroof, Surbhi Bhatia, Rashmi Agarwal, Kanta Prasad

Abstract: The popularity of artificial intelligence and machine learning has prompted researchers to use them in recent research. The proposed method uses the K-Nearest Neighbor (KNN) algorithm for segmentation of medical images, extracting image features for analysis by classifying the data based on neural networks. Classification of images in medical imaging is very important; KNN is one suitable algorithm that is conceptually and computationally simple and provides very good accuracy. The KNN algorithm is a unique, user-friendly approach with a wide range of applications in machine learning, and it is widely used in image processing applications including classification, segmentation, and regression. The proposed system uses gray level co-occurrence matrix features. The trained neural network has been tested successfully on a group of echocardiographic images, and errors were compared using a regression plot. The results of the algorithm are evaluated using various quantitative and qualitative metrics and are shown to exhibit better performance on both than current state-of-the-art methods in the related area. The regression analysis performed to assess the trained neural network showed a good correlation.

Submitted 16 March, 2023; originally announced March 2023.

Comments: 10 figures 4 tables

MSC Class: Computers

arXiv:2303.05155 [pdf, other]

Aux-Drop: Handling Haphazard Inputs in Online Learning Using Auxiliary Dropouts

Authors: Rohit Agarwal, Deepak Gupta, Alexander Horsch, Dilip K. Prasad

Abstract: Many real-world applications based on online learning produce streaming data that is haphazard in nature, i.e., contains missing features, features becoming obsolete in time, the appearance of new features at later points in time and a lack of clarity on the total number of input features. These challenges make it hard to build a learnable system for such applications, and almost no work exists in deep learning that addresses this issue. In this paper, we present Aux-Drop, an auxiliary dropout regularization strategy for online learning that handles the haphazard input features in an effective manner. Aux-Drop adapts the conventional dropout regularization scheme for the haphazard input feature space ensuring that the final output is minimally impacted by the chaotic appearance of such features. It helps to prevent the co-adaptation of especially the auxiliary and base features, as well as reduces the strong dependence of the output on any of the auxiliary inputs of the model. This helps in better learning for scenarios where certain features disappear in time or when new features are to be modelled. The efficacy of Aux-Drop has been demonstrated through extensive numerical experiments on SOTA benchmarking datasets that include Italy Power Demand, HIGGS, SUSY and multiple UCI datasets. The code is available at https://github.com/Rohit102497/Aux-Drop.

Submitted 31 May, 2023; v1 submitted 9 March, 2023; originally announced March 2023.

Comments: Accepted at Transactions on Machine Learning Research (TMLR). Link: https://openreview.net/pdf?id=R9CgBkeZ6Z

Journal ref: Transactions on Machine Learning Research, 2023

arXiv:2303.03050 [pdf, other]

MABNet: Master Assistant Buddy Network with Hybrid Learning for Image Retrieval

Authors: Rohit Agarwal, Gyanendra Das, Saksham Aggarwal, Alexander Horsch, Dilip K. Prasad

Abstract: Image retrieval has garnered growing interest in recent times. The current approaches are either supervised or self-supervised. These methods do not exploit the benefits of hybrid learning using both supervision and self-supervision. We present a novel Master Assistant Buddy Network (MABNet) for image retrieval which incorporates both learning mechanisms. MABNet consists of master and assistant blocks, both learning independently through supervision and collectively via self-supervision. The master guides the assistant by providing its knowledge base as a reference for self-supervision and the assistant reports its knowledge back to the master by weight transfer. We perform extensive experiments on public datasets with and without post-processing.

Submitted 6 March, 2023; originally announced March 2023.

Comments: Accepted at International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2023

arXiv:2303.02095 [pdf, other]

Data-Efficient Training of CNNs and Transformers with Coresets: A Stability Perspective

Authors: Animesh Gupta, Irtiza Hasan, Dilip K. Prasad, Deepak K. Gupta

Abstract: Coreset selection is among the most effective ways to reduce the training time of CNNs, however, only limited is known on how the resultant models will behave under variations of the coreset size, and choice of datasets and models. Moreover, given the recent paradigm shift towards transformer-based models, it is still an open question how coreset selection would impact their performance. There are several similar intriguing questions that need to be answered for a wide acceptance of coreset selection methods, and this paper attempts to answer some of these. We present a systematic benchmarking setup and perform a rigorous comparison of different coreset selection methods on CNNs and transformers. Our investigation reveals that under certain circumstances, random selection of subsets is more robust and stable when compared with the SOTA selection methods. We demonstrate that the conventional concept of uniform subset sampling across the various classes of the data is not the appropriate choice. Rather samples should be adaptively chosen based on the complexity of the data distribution for each class. Transformers are generally pretrained on large datasets, and we show that for certain target datasets, it helps to keep their performance stable at even very small coreset sizes. We further show that when no pretraining is done or when the pretrained transformer models are used with non-natural images (e.g. medical data), CNNs tend to generalize better than transformers at even very small coreset sizes. Lastly, we demonstrate that in the absence of the right pretraining, CNNs are better at learning the semantic coherence between spatially distant objects within an image, and these tend to outperform transformers at almost all choices of the coreset size.

Submitted 10 March, 2023; v1 submitted 3 March, 2023; originally announced March 2023.

arXiv:2303.01546 [pdf, other]

MiShape: 3D Shape Modelling of Mitochondria in Microscopy

Authors: Abhinanda R. Punnakkal, Suyog S Jadhav, Alexander Horsch, Krishna Agarwal, Dilip K. Prasad

Abstract: Fluorescence microscopy is a quintessential tool for observing cells and understanding the underlying mechanisms of life-sustaining processes of all living organisms. The problem of extracting 3D shape of mitochondria from fluorescence microscopy images remains unsolved due to the complex and varied shapes expressed by mitochondria and the poor resolving capacity of these microscopes. We propose an approach to bridge this gap by learning a shape prior for mitochondria termed as MiShape, by leveraging high-resolution electron microscopy data. MiShape is a generative model learned using implicit representations of mitochondrial shapes. It provides a shape distribution that can be used to generate infinite realistic mitochondrial shapes. We demonstrate the representation power of MiShape and its utility for 3D shape reconstruction given a single 2D fluorescence image or a small 3D stack of 2D slices. We also showcase applications of our method by deriving simulated fluorescence microscope datasets that have realistic 3D ground truths for the problem of 2D segmentation and microscope-to-microscope transformation.

Submitted 2 March, 2023; originally announced March 2023.

arXiv:2302.00095 [pdf, ps, other]

XCRYPT: Accelerating Lattice Based Cryptography with Memristor Crossbar Arrays

Authors: Sarabjeet Singh, Xiong Fan, Ananth Krishna Prasad, Lin Jia, Anirban Nag, Rajeev Balasubramonian, Mahdi Nazm Bojnordi, Elaine Shi

Abstract: This paper makes a case for accelerating lattice-based post quantum cryptography (PQC) with memristor based crossbars, and shows that these inherently error-tolerant algorithms are a good fit for noisy analog MAC operations in crossbars. We compare different NIST round-3 lattice-based candidates for PQC, and identify that SABER is not only a front-runner when executing on traditional systems, but it is also amenable to acceleration with crossbars. SABER is a module-LWR based approach, which performs modular polynomial multiplications with rounding. We map the polynomial multiplications in SABER on crossbars and show that analog dot-products can yield a $1.7-32.5\times$ performance and energy efficiency improvement, compared to recent hardware proposals. This initial design combines the innovations in multiple state-of-the-art works -- the algorithm in SABER and the memristive acceleration principles proposed in ISAAC (for deep neural network acceleration). We then identify the bottlenecks in this initial design and introduce several additional techniques to improve its efficiency. These techniques are synergistic and especially benefit from SABER's power-of-two modulo operation. First, we show that some of the software techniques used in SABER, that are effective on CPU platforms, are unhelpful in crossbar-based accelerators. Relying on simpler algorithms further improves our efficiencies by $1.3-3.6\times$. Second, we exploit the nature of SABER's computations to stagger the operations in crossbars and share a few variable precision ADCs, resulting in up to $1.8\times$ higher efficiency. Third, to further reduce ADC pressure, we propose a simple analog Shift-and-Add technique, which results in a $1.3-6.3\times$ increase in the efficiency. Overall, our designs achieve $3-15\times$ higher efficiency over initial design, and $3-51\times$ higher than prior work.

Submitted 31 January, 2023; originally announced February 2023.

arXiv:2301.13817 [pdf, other]

Patch Gradient Descent: Training Neural Networks on Very Large Images

Authors: Deepak K. Gupta, Gowreesh Mago, Arnav Chavan, Dilip K. Prasad

Abstract: Traditional CNN models are trained and tested on relatively low resolution images (<300 px), and cannot be directly operated on large-scale images due to compute and memory constraints. We propose Patch Gradient Descent (PatchGD), an effective learning strategy that allows to train the existing CNN architectures on large-scale images in an end-to-end manner. PatchGD is based on the hypothesis that instead of performing gradient-based updates on an entire image at once, it should be possible to achieve a good solution by performing model updates on only small parts of the image at a time, ensuring that the majority of it is covered over the course of iterations. PatchGD thus extensively enjoys better memory and compute efficiency when training models on large scale images. PatchGD is thoroughly evaluated on two datasets - PANDA and UltraMNIST with ResNet50 and MobileNetV2 models under different memory constraints. Our evaluation clearly shows that PatchGD is much more stable and efficient than the standard gradient-descent method in handling large images, and especially when the compute memory is limited.

Submitted 31 January, 2023; originally announced January 2023.

arXiv:2301.04502 [pdf, other]

Pruning Compact ConvNets for Efficient Inference

Authors: Sayan Ghosh, Karthik Prasad, Xiaoliang Dai, Peizhao Zhang, Bichen Wu, Graham Cormode, Peter Vajda

Abstract: Neural network pruning is frequently used to compress over-parameterized networks by large amounts, while incurring only marginal drops in generalization performance. However, the impact of pruning on networks that have been highly optimized for efficient inference has not received the same level of attention. In this paper, we analyze the effect of pruning for computer vision, and study state-of-the-art ConvNets, such as the FBNetV3 family of models. We show that model pruning approaches can be used to further optimize networks trained through NAS (Neural Architecture Search). The resulting family of pruned models can consistently obtain better performance than existing FBNetV3 models at the same level of computation, and thus provide state-of-the-art results when trading off between computational complexity and generalization performance on the ImageNet benchmark. In addition to better generalization performance, we also demonstrate that when limited computation resources are available, pruning FBNetV3 models incur only a fraction of GPU-hours involved in running a full-scale NAS.

Submitted 11 January, 2023; originally announced January 2023.

arXiv:2211.13769 [pdf, other]

On Designing Light-Weight Object Trackers through Network Pruning: Use CNNs or Transformers?

Authors: Saksham Aggarwal, Taneesh Gupta, Pawan Kumar Sahu, Arnav Chavan, Rishabh Tiwari, Dilip K. Prasad, Deepak K. Gupta

Abstract: Object trackers deployed on low-power devices need to be light-weight, however, most of the current state-of-the-art (SOTA) methods rely on using compute-heavy backbones built using CNNs or transformers. Large sizes of such models do not allow their deployment in low-power conditions and designing compressed variants of large tracking models is of great importance. This paper demonstrates how highly compressed light-weight object trackers can be designed using neural architectural pruning of large CNN and transformer based trackers. Further, a comparative study on architectural choices best suited to design light-weight trackers is provided. A comparison between SOTA trackers using CNNs, transformers as well as the combination of the two is presented to study their stability at various compression ratios. Finally results for extreme pruning scenarios going as low as 1% in some cases are shown to study the limits of network pruning in object tracking. This work provides deeper insights into designing highly efficient trackers from existing SOTA methods.

Submitted 26 March, 2023; v1 submitted 24 November, 2022; originally announced November 2022.

Comments: Accepted at IEEE ICASSP 2023

arXiv:2211.06739 [pdf, other]

Partial Binarization of Neural Networks for Budget-Aware Efficient Learning

Authors: Udbhav Bamba, Neeraj Anand, Saksham Aggarwal, Dilip K. Prasad, Deepak K. Gupta

Abstract: Binarization is a powerful compression technique for neural networks, significantly reducing FLOPs, but often results in a significant drop in model performance. To address this issue, partial binarization techniques have been developed, but a systematic approach to mixing binary and full-precision parameters in a single network is still lacking. In this paper, we propose a controlled approach to partial binarization, creating a budgeted binary neural network (B2NN) with our MixBin strategy. This method optimizes the mixing of binary and full-precision components, allowing for explicit selection of the fraction of the network to remain binary. Our experiments show that B2NNs created using MixBin outperform those from random or iterative searches and state-of-the-art layer selection methods by up to 3% on the ImageNet-1K dataset. We also show that B2NNs outperform the structured pruning baseline by approximately 23% at the extreme FLOP budget of 15%, and perform well in object tracking, with up to a 12.4% relative improvement over other baselines. Additionally, we demonstrate that B2NNs developed by MixBin can be transferred across datasets, with some cases showing improved performance over directly applying MixBin on the downstream data.

Submitted 8 November, 2023; v1 submitted 12 November, 2022; originally announced November 2022.

Comments: Accepted at WACV 2023 Conference

arXiv:2207.12779 [pdf, other]

Reconciling Security and Communication Efficiency in Federated Learning

Authors: Karthik Prasad, Sayan Ghosh, Graham Cormode, Ilya Mironov, Ashkan Yousefpour, Pierre Stock

Abstract: Cross-device Federated Learning is an increasingly popular machine learning setting to train a model by leveraging a large population of client devices with high privacy and security guarantees. However, communication efficiency remains a major bottleneck when scaling federated learning to production environments, particularly due to bandwidth constraints during uplink communication. In this paper, we formalize and address the problem of compressing client-to-server model updates under the Secure Aggregation primitive, a core component of Federated Learning pipelines that allows the server to aggregate the client updates without accessing them individually. In particular, we adapt standard scalar quantization and pruning methods to Secure Aggregation and propose Secure Indexing, a variant of Secure Aggregation that supports quantization for extreme compression. We establish state-of-the-art results on LEAF benchmarks in a secure Federated Learning setup with up to 40$\times$ compression in uplink communication with no meaningful loss in utility compared to uncompressed baselines.

Submitted 26 July, 2022; originally announced July 2022.

arXiv:2207.09451 [pdf, ps, other]

Balancing polynomials, Fibonacci numbers and some new series for $π$

Authors: Robert Frontczak, Kalika Prasad

Abstract: We evaluate some types of infinite series with balancing and Lucas-balancing polynomials in closed form. These evaluations will lead to some new curious series for $π$ involving Fibonacci and Lucas numbers. Our findings complement those of Castellanos from 1986 and 1989.

Submitted 13 July, 2022; originally announced July 2022.

Comments: 16 pages, 5 tables

MSC Class: Primary 11B37; 11B39; Secondary 15A15

arXiv:2206.12681 [pdf, other]

UltraMNIST Classification: A Benchmark to Train CNNs for Very Large Images

Authors: Deepak K. Gupta, Udbhav Bamba, Abhishek Thakur, Akash Gupta, Suraj Sharan, Ertugrul Demir, Dilip K. Prasad

Abstract: Convolutional neural network (CNN) approaches available in the current literature are designed to work primarily with low-resolution images. When applied on very large images, challenges related to GPU memory, smaller receptive field than needed for semantic correspondence and the need to incorporate multi-scale features arise. The resolution of input images can be reduced, however, with significant loss of critical information. Based on the outlined issues, we introduce a novel research problem of training CNN models for very large images, and present 'UltraMNIST dataset', a simple yet representative benchmark dataset for this task. UltraMNIST has been designed using the popular MNIST digits with additional levels of complexity added to replicate well the challenges of real-world problems. We present two variants of the problem: 'UltraMNIST classification' and 'Budget-aware UltraMNIST classification'. The standard UltraMNIST classification benchmark is intended to facilitate the development of novel CNN training methods that make the effective use of the best available GPU resources. The budget-aware variant is intended to promote development of methods that work under constrained GPU memory. For the development of competitive solutions, we present several baseline models for the standard benchmark and its budget-aware variant. We study the effect of reducing resolution on the performance and present results for baseline models involving pretrained backbones from among the popular state-of-the-art models. Finally, with the presented benchmark dataset and the baselines, we hope to pave the ground for a new generation of CNN methods suitable for handling large images in an efficient and resource-light manner.

Submitted 25 June, 2022; originally announced June 2022.

arXiv:2202.08156 [pdf, other]

doi 10.22049/CCO.2024.28022.1419

A novel public key cryptography based on generalized Lucas matrices

Authors: Kalika Prasad, Hrishikesh Mahato, Munesh Kumari

Abstract: In this article, we have proposed a generalized Lucas matrix (recursive matrix of higher order) having relation with generalized Fibonacci sequences and established many special properties in addition to that usual matrix algebra. Further, we have proposed a modified public key cryptography using these matrices as keys in Affine cipher and key agreement for encryption-decryption with the combination of terms of generalized Lucas sequences under residue operations. In this scheme, instead of exchanging the whole key matrix, only a pair of numbers(parameters) need to be exchanged, which reduces the time complexity as well as space complexity of the key transmission and has a large key-space.

Submitted 16 February, 2022; originally announced February 2022.

Comments: 14 pages

MSC Class: 11T71; 11B39; 94A60; 14G50; 68P30

ACM Class: E.3; G.2.1

Journal ref: Communications in Combinatorics and Optimization 2024

arXiv:2112.08933 [pdf, other]

Responsive parallelized architecture for deploying deep learning models in production environments

Authors: Nikhil Verma, Krishna Prasad

Abstract: Recruiters can easily shortlist candidates for jobs via viewing their curriculum vitae (CV) document. Unstructured document CV beholds candidate's portfolio and named entities listing details. The main aim of this study is to design and propose a web oriented, highly responsive, computational pipeline that systematically predicts CV entities using hierarchically-refined label attention networks. Deep learning models specialized for named entity recognition were trained on large dataset to predict relevant fields. The article suggests an optimal strategy to use a number of deep learning models in parallel and predict in real time. We demonstrate selection of light weight micro web framework using Analytical Hierarchy Processing algorithm and focus on an approach useful to deploy large deep learning model-based pipelines in production ready environments using microservices. Deployed models and architecture proposed helped in parsing normal CV in less than 700 milliseconds for sequential flow of requests.

Submitted 10 July, 2023; v1 submitted 14 December, 2021; originally announced December 2021.

Comments: 20 Pages

arXiv:2111.13364 [pdf]

Optimal Technical Indicator-based Trading Strategies Using NSGA-II

Authors: P. Shanmukh Kali Prasad, Vadlamani Madhav, Ramanuj Lal, Vadlamani Ravi

Abstract: This paper proposes non-dominated sorting genetic algorithm-II (NSGA-II ) in the context of technical indicator-based stock trading, by finding optimal combinations of technical indicators to generate buy and sell strategies such that the objectives, namely, Sharpe ratio and Maximum Drawdown are maximized and minimized respectively. NSGA-II is chosen because it is a very popular and powerful bi-objective evolutionary algorithm. The training and testing used a rolling-based approach (two years training and a year for testing) and thus the results of the approach seem to be considerably better in stable periods without major economic fluctuations. Further, another important contribution of this study is to incorporate the transaction cost and domain expertise in the whole modeling approach.

Submitted 25 January, 2022; v1 submitted 26 November, 2021; originally announced November 2021.

Comments: 24 pages; 1 figure; 11 Tables

MSC Class: 68W50

ACM Class: I.2

arXiv:2111.09109 [pdf, other]

Physics-guided Loss Functions Improve Deep Learning Performance in Inverse Scattering

Authors: Zicheng Liu, Mayank Roy, Dilip K. Prasad, Krishna Agarwal

Abstract: Solving electromagnetic inverse scattering problems (ISPs) is challenging due to the intrinsic nonlinearity, ill-posedness, and expensive computational cost. Recently, deep neural network (DNN) techniques have been successfully applied on ISPs and shown potential of superior imaging over conventional methods. In this paper, we analyse the analogy between DNN solvers and traditional iterative algorithms and discuss how important physical phenomena cannot be effectively incorporated in the training process. We show the importance of including near-field priors in the learning process of DNNs. To this end, we propose new designs of loss functions which incorporate multiple-scattering based near-field quantities (such as scattered fields or induced currents within domain of interest). Effects of physics-guided loss functions are studied using a variety of numerical experiments. Pros and cons of the investigated ISP solvers with different loss functions are summarized.

Submitted 13 November, 2021; originally announced November 2021.

arXiv:2109.12298 [pdf, other]

Opacus: User-Friendly Differential Privacy Library in PyTorch

Authors: Ashkan Yousefpour, Igor Shilov, Alexandre Sablayrolles, Davide Testuggine, Karthik Prasad, Mani Malek, John Nguyen, Sayan Ghosh, Akash Bharadwaj, Jessica Zhao, Graham Cormode, Ilya Mironov

Abstract: We introduce Opacus, a free, open-source PyTorch library for training deep learning models with differential privacy (hosted at opacus.ai). Opacus is designed for simplicity, flexibility, and speed. It provides a simple and user-friendly API, and enables machine learning practitioners to make a training pipeline private by adding as little as two lines to their code. It supports a wide variety of layers, including multi-head attention, convolution, LSTM, GRU (and generic RNN), and embedding, right out of the box and provides the means for supporting other user-defined layers. Opacus computes batched per-sample gradients, providing higher efficiency compared to the traditional "micro batch" approach. In this paper we present Opacus, detail the principles that drove its implementation and unique features, and benchmark it against other frameworks for training models with differential privacy as well as standard PyTorch.

Submitted 22 August, 2022; v1 submitted 25 September, 2021; originally announced September 2021.

Comments: Privacy in Machine Learning (PriML) workshop, NeurIPS 2021

arXiv:2108.08497 [pdf, other]

Monarch: A Durable Polymorphic Memory For Data Intensive Applications

Authors: Ananth Krishna Prasad, Mahdi Nazm Bojnordi

Abstract: 3D die stacking has often been proposed to build large-scale DRAM-based caches. Unfortunately, the power and performance overheads of DRAM limit the efficiency of high-bandwidth memories. Also, DRAM is facing serious scalability challenges that make alternative technologies more appealing. This paper examines Monarch, a resistive 3D stacked memory based on a novel reconfigurable crosspoint array called XAM. The XAM array is capable of switching between random access and content-addressable modes, which enables Monarch (i) to better utilize the in-package bandwidth and (ii) to satisfy both the random access memory and associative search requirements of various applications. Moreover, the Monarch controller ensures a given target lifetime for the resistive stack. Our simulation results on a set of parallel memory-intensive applications indicate that Monarch outperforms an ideal DRAM caching by 1.21x on average. For in-memory hash table and string matching workloads, Monarch improves performance up to 12x over the conventional high bandwidth memories.

Submitted 19 August, 2021; originally announced August 2021.

Comments: Submitted to IEEE TC

ACM Class: B.3; E.2

arXiv:2106.03408 [pdf, other]

Antipodes of Label Differential Privacy: PATE and ALIBI

Authors: Mani Malek, Ilya Mironov, Karthik Prasad, Igor Shilov, Florian Tramèr

Abstract: We consider the privacy-preserving machine learning (ML) setting where the trained model must satisfy differential privacy (DP) with respect to the labels of the training examples. We propose two novel approaches based on, respectively, the Laplace mechanism and the PATE framework, and demonstrate their effectiveness on standard benchmarks. While recent work by Ghazi et al. proposed Label DP schemes based on a randomized response mechanism, we argue that additive Laplace noise coupled with Bayesian inference (ALIBI) is a better fit for typical ML tasks. Moreover, we show how to achieve very strong privacy levels in some regimes, with our adaptation of the PATE framework that builds on recent advances in semi-supervised learning. We complement theoretical analysis of our algorithms' privacy guarantees with empirical evaluation of their memorization properties. Our evaluation suggests that comparing different algorithms according to their provable DP guarantees can be misleading and favor a less private algorithm with a tighter analysis. Code for implementation of algorithms and memorization attacks is available from https://github.com/facebookresearch/label_dp_antipodes.

Submitted 29 October, 2021; v1 submitted 7 June, 2021; originally announced June 2021.

Comments: 2021 Conference on Neural Information Processing Systems (NeurIPS)

arXiv:2010.08776 [pdf, other]

The NVIDIA PilotNet Experiments

Authors: Mariusz Bojarski, Chenyi Chen, Joyjit Daw, Alperen Değirmenci, Joya Deri, Bernhard Firner, Beat Flepp, Sachin Gogri, Jesse Hong, Lawrence Jackel, Zhenhua Jia, BJ Lee, Bo Liu, Fei Liu, Urs Muller, Samuel Payne, Nischal Kota Nagendra Prasad, Artem Provodin, John Roach, Timur Rvachov, Neha Tadimeti, Jesper van Engelen, Haiguang Wen, Eric Yang, Zongyi Yang

Abstract: Four years ago, an experimental system known as PilotNet became the first NVIDIA system to steer an autonomous car along a roadway. This system represents a departure from the classical approach for self-driving in which the process is manually decomposed into a series of modules, each performing a different task. In PilotNet, on the other hand, a single deep neural network (DNN) takes pixels as input and produces a desired vehicle trajectory as output; there are no distinct internal modules connected by human-designed interfaces. We believe that handcrafted interfaces ultimately limit performance by restricting information flow through the system and that a learned approach, in combination with other artificial intelligence systems that add redundancy, will lead to better overall performing systems. We continue to conduct research toward that goal. This document describes the PilotNet lane-keeping effort, carried out over the past five years by our NVIDIA PilotNet group in Holmdel, New Jersey. Here we present a snapshot of system status in mid-2020 and highlight some of the work done by the PilotNet group.

Submitted 17 October, 2020; originally announced October 2020.

arXiv:2008.12617 [pdf, other]

Simulation-supervised deep learning for analysing organelles states and behaviour in living cells

Authors: Arif Ahmed Sekh, Ida S. Opstad, Rohit Agarwal, Asa Birna Birgisdottir, Truls Myrmel, Balpreet Singh Ahluwalia, Krishna Agarwal, Dilip K. Prasad

Abstract: In many real-world scientific problems, generating ground truth (GT) for supervised learning is almost impossible. The causes include limitations imposed by the scientific instrument, the physical phenomenon itself, or the complexity of modeling. Performing artificial intelligence (AI) tasks such as segmentation, tracking, and analytics of small sub-cellular structures such as mitochondria in microscopy videos of living cells is a prime example. The 3D blurring function of the microscope, digital resolution from pixel size, optical resolution due to the character of light, noise characteristics, and complex 3D deformable shapes of mitochondria all contribute to making this problem GT hard. Manual segmentation of 100s of mitochondria across 1000s of frames and then across many such videos is not only herculean but also physically inaccurate because of the limitations imposed by the instrument and the phenomena. Unsupervised learning produces less than optimal results, and accuracy is important if inferences relevant to therapy are to be derived. In order to solve this insurmountable problem, we bring modeling and deep learning to a nexus. We show that accurate physics-based modeling of microscopy data, including all its limitations, can be the solution for generating simulated training datasets for supervised learning. We show here that our simulation-supervised segmentation approach is a great enabler for studying mitochondrial states and behaviour in heart muscle cells, where mitochondria have a significant role to play in the health of the cells. We report an unprecedented mean IoU score of 91% for binary segmentation (19% better than the best performing unsupervised approach) of mitochondria in actual microscopy videos of living cells. We further demonstrate the possibility of performing multi-class classification, tracking, and morphology-associated analytics at the scale of an individual mitochondrion.

Submitted 26 August, 2020; originally announced August 2020.

Comments: under review at NIPS 2020

arXiv:2008.11828 [pdf, other]

Auxiliary Network: Scalable and agile online learning for dynamic system with inconsistently available inputs

Authors: Rohit Agarwal, Arif Ahmed Sekh, Krishna Agarwal, Dilip K. Prasad

Abstract: Streaming classification methods assume the number of input features is fixed and always received. But in many real-world scenarios, some input features are reliable while others are unreliable or inconsistent. In this paper, we propose a novel deep learning-based model called Auxiliary Network (Aux-Net), which is scalable and agile. It employs a weighted ensemble of classifiers to give a final outcome. The Aux-Net model is based on the hedging algorithm and online gradient descent. It employs a model of varying depth in an online setting using single pass learning. Aux-Net is a foundational work towards a scalable neural network model for a dynamic complex environment requiring ad hoc or inconsistent input data. The efficacy of Aux-Net is shown on a public dataset.

Submitted 26 August, 2020; originally announced August 2020.

Comments: under review at NIPS 2020

arXiv:2008.06713 [pdf, other]

Single image dehazing for a variety of haze scenarios using back projected pyramid network

Authors: Ayush Singh, Ajay Bhave, Dilip K. Prasad

Abstract: Learning to dehaze single hazy images, especially using a small training dataset, is quite challenging. We propose a novel generative adversarial network architecture for this problem, namely back projected pyramid network (BPPNet), that gives good performance for a variety of challenging haze conditions, including dense haze and inhomogeneous haze. Our architecture incorporates learning of multiple levels of complexities while retaining spatial context through iterative blocks of UNets and structural information of multiple scales through a novel pyramidal convolution block. These blocks together form the generator and are amenable to learning through back projection. We have shown that our network can be trained without over-fitting using as few as 20 image pairs of hazy and non-hazy images. We report state-of-the-art performance on NTIRE 2018 homogeneous haze datasets for indoor and outdoor images, NTIRE 2019 denseHaze dataset, and NTIRE 2020 non-homogeneous haze dataset.

Submitted 15 August, 2020; originally announced August 2020.

Comments: 16 pages, 8 figures, to be published in Computer Vision ECCV 2020 Workshops

arXiv:2007.16147 [pdf]

doi 10.5772/intechopen.78497

Computer and Network Security

Authors: Jaydip Sen, Sidra Mehtab, Michael Ekonde Sone, Veeramreddy Jyothsna, Koneti Munivara Prasad, Rajeev Singh, Teek Parval Sharma, Anton Noskov, Ignacio Velasquez, Angelica Caro, Alfonco Rodriguez, Tamer S. A. Fatayer, Altaf O. Mulani, Pradeep B. Mane, Roshan Chitrakar, Roshan Bhusal, Prajwol Maharjan

Abstract: In the era of Internet of Things and with the explosive worldwide growth of electronic data volume, and associated need of processing, analysis and storage of such humongous volume of data, several new challenges are faced in protecting privacy of sensitive data and securing systems by designing novel schemes for secure authentication, integrity protection, encryption and non-repudiation. Lightweight symmetric key cryptography and adaptive network security algorithms are in demand for mitigating these challenges. This book presents some of the state-of-the-art research work in the field of cryptography and security in computing and communications. It is a valuable source of knowledge for researchers, engineers, practitioners, graduate and doctoral students who are working in the field of cryptography, network security and security and privacy issues in the Internet of Things (IoT), and machine learning application in security. It will also be useful for faculty members of graduate schools and universities.

Submitted 31 July, 2020; originally announced July 2020.

Comments: 175 pages, 87 figures and 44 Tables

arXiv:2007.05954 [pdf, other]

Changing Clusters of Indian States with respect to number of Cases of COVID-19 using incrementalKMN Method

Authors: Rabinder Kumar Prasad, Rosy Sarmah, Subrata Chakraborty

Abstract: The novel Coronavirus (COVID-19) incidence in India is currently experiencing an exponential rise but with apparent spatial variation in growth rate and doubling time rate. We classify the states into five clusters, from low- to high-risk categories, and study how the different states moved from one cluster to another from the onset of the first case on $30^{th}$ January 2020 until the end of unlock 1, that is, $30^{th}$ June 2020. We have implemented a new clustering technique called the incrementalKMN (Prasad, R. K., Sarmah, R., Chakraborty, S. (2019)).

Submitted 12 July, 2020; originally announced July 2020.

arXiv:2004.09982 [pdf, other]

A review on mathematical strength and analysis of Enigma

Authors: Kalika Prasad, Munesh Kumari

Abstract: In this review article, we discuss the mathematics and mechanics behind the Enigma machine with an analysis of security strength. The German army used the Enigma machine during the Second World War to encrypt communications. Due to its complexity, the encryption done by the Enigma machine was assumed to be almost unbreakable. However, the Polish believed that people with a good background and deep knowledge of science and mathematics would have a better chance to break the encryption done by Enigma. They appointed twenty mathematicians from Poznan University to work on this problem at the Polish Cipher Bureau. Three of those, Marian Rejewski, Jerzy Rozycki and Henryk Zygalski, were able to exploit certain flaws in the encryption, and by using permutation group theory finally managed to decipher the Enigma messages. The mathematics discovered by them is presented here.

Submitted 17 April, 2020; originally announced April 2020.

Comments: Enigma Machine, Mathematical Strength

arXiv:2004.00959 [pdf, other]

doi 10.3390/app10186448

Neural network based country wise risk prediction of COVID-19

Authors: Ratnabali Pal, Arif Ahmed Sekh, Samarjit Kar, Dilip K. Prasad

Abstract: The recent worldwide outbreak of the novel coronavirus (COVID-19) has opened up new challenges to the research community. Artificial intelligence (AI) driven methods can be useful to predict the parameters, risks, and effects of such an epidemic. Such predictions can be helpful to control and prevent the spread of such diseases. The main challenges in applying AI are the small volume of data and its uncertain nature. Here, we propose a shallow long short-term memory (LSTM) based neural network to predict the risk category of a country. We have used a Bayesian optimization framework to optimize and automatically design country-specific networks. The results show that the proposed pipeline outperforms state-of-the-art methods for data of 180 countries and can be a useful tool for such risk categorization. We have also experimented with the trend data and weather data combined for the prediction. The outcome shows that the weather does not have a significant role. The tool can be used to predict a long-duration outbreak of such an epidemic so that we can take preventive steps earlier.

Submitted 16 September, 2020; v1 submitted 31 March, 2020; originally announced April 2020.

Journal ref: Applied Sciences, 2020

arXiv:2003.11936 [pdf, ps, other]

doi 10.1080/09720529.2020.1838744

Cryptography using generalized Fibonacci matrices with Affine-Hill cipher

Authors: Kalika Prasad, Hrishikesh Mahato

Abstract: In this article, we propose a public key cryptography scheme using the Affine-Hill cipher with a generalized Fibonacci matrix (called a multinacci matrix). We also propose a key establishment scheme (exchange of the key matrix $K=Q_λ^{k}$ of order $λ\timesλ$ for encryption-decryption) with the help of multinacci sequences under prime modulo. In this scheme, instead of exchanging the key matrix, we need to exchange only the pair of numbers $(λ, k)$, which reduces the time complexity as well as space complexity and comes with a large key-space.

Submitted 25 March, 2020; originally announced March 2020.

Comments: Construction, development and efficiency

MSC Class: 11T71; 11B39; 14G50; 68P30; 68R01; 94A60

arXiv:1907.00058 [pdf, other]

Explainable Anatomical Shape Analysis through Deep Hierarchical Generative Models

Authors: Carlo Biffi, Juan J. Cerrolaza, Giacomo Tarroni, Wenjia Bai, Antonio de Marvao, Ozan Oktay, Christian Ledig, Loic Le Folgoc, Konstantinos Kamnitsas, Georgia Doumou, Jinming Duan, Sanjay K. Prasad, Stuart A. Cook, Declan P. O'Regan, Daniel Rueckert

Abstract: Quantification of anatomical shape changes currently relies on scalar global indexes which are largely insensitive to regional or asymmetric modifications. Accurate assessment of pathology-driven anatomical remodeling is a crucial step for the diagnosis and treatment of many conditions. Deep learning approaches have recently achieved wide success in the analysis of medical images, but they lack interpretability in the feature extraction and decision processes. In this work, we propose a new interpretable deep learning model for shape analysis. In particular, we exploit deep generative networks to model a population of anatomical segmentations through a hierarchy of conditional latent variables. At the highest level of this hierarchy, a two-dimensional latent space is simultaneously optimised to discriminate distinct clinical conditions, enabling the direct visualisation of the classification space. Moreover, the anatomical variability encoded by this discriminative latent space can be visualised in the segmentation space thanks to the generative properties of the model, making the classification task transparent. This approach yielded high accuracy in the categorisation of healthy and remodelled left ventricles when tested on unseen segmentations from our own multi-centre dataset as well as in an external validation set, and on hippocampi from healthy controls and patients with Alzheimer's disease when tested on ADNI data. More importantly, it enabled the visualisation in three-dimensions of both global and regional anatomical features which better discriminate between the conditions under exam. The proposed approach scales effectively to large populations, facilitating high-throughput analysis of normal anatomy and pathology in large-scale studies of volumetric imaging.

Submitted 4 January, 2020; v1 submitted 28 June, 2019; originally announced July 2019.

Comments: Accepted for publication in IEEE Transactions on Medical Imaging (TMI)

arXiv:1906.10210 [pdf, other]

doi 10.1371/journal.pone.0238267

A laser-microfabricated electrohydrodynamic thruster for centimeter-scale aerial robots

Authors: Hari Krishna Hari Prasad, Ravi Sankar Vaddi, Yogesh M Chukewad, Elma Dedic, Igor Novosselov, Sawyer B Fuller

Abstract: To date, insect-scale robots capable of controlled flight have used flapping wings for generating lift, but this requires a complex and failure-prone mechanism. A simpler alternative is electrohydrodynamic (EHD) thrust, which requires no moving mechanical parts. In EHD, corona discharge generates a flow of ions in an electric field between two electrodes; the high-velocity ions transfer their kinetic energy to neutral air molecules through collisions, accelerating the gas and creating thrust. We introduce a fabrication process for an EHD thruster based on 355 nm laser micromachining, and our approach allows for greater flexibility in materials selection. Our four-thruster device measures 1.8 x 2.5 cm and is composed of steel emitters and a lightweight carbon fiber mesh. The current and thrust characteristics of each individual thruster of the quad thruster are determined and agree with the Townsend relation. The mass of the quad thruster is 37 mg and the measured thrust is greater than its weight (362.6 uN). The robot is able to lift off at a voltage of 4.6 kV with a thrust to weight ratio of 1.38.

Submitted 14 January, 2020; v1 submitted 24 June, 2019; originally announced June 2019.

Comments: Co-primary authors: Hari Krishna Hari Prasad, Ravi Sankar Vaddi, and Yogesh M Chukewad. Submitted to PLOS ONE

Report number: PLoS ONE 15(4): e0231362

Journal ref: 2020

arXiv:1902.05657 [pdf, other]

TMAV: Temporal Motionless Analysis of Video using CNN in MPSoC

Authors: Somdip Dey, Amit K. Singh, Dilip K. Prasad, Klaus D. McDonald-Maier

Abstract: Analyzing video for traffic categorization is an important pillar of Intelligent Transport Systems. However, it is difficult to analyze and predict traffic based on image frames because the representation of each frame may vary significantly within a short time period. This also would inaccurately represent the traffic over a longer period of time such as the case of video. We propose a novel bio-inspired methodology that integrates analysis of the previous image frames of the video to represent the analysis of the current image frame, the same way a human being analyzes the current situation based on past experience. In our proposed methodology, called IRON-MAN (Integrated Rational prediction and Motionless ANalysis), we utilize Bayesian update on top of the individual image frame analysis in the videos, and this has resulted in highly accurate prediction of Temporal Motionless Analysis of the Videos (TMAV) for most of the chosen test cases. The proposed approach could be used for TMAV using Convolutional Neural Network (CNN) for applications where the number of objects in an image is the deciding factor for prediction, and results also show that our proposed approach outperforms the state-of-the-art for the chosen test case. We also introduce a new metric named Energy Consumption per Training Image (ECTI). Since different CNN-based models have different training capability and computing resource utilization, some of the models are more suitable for embedded device implementation than others, and the ECTI metric is useful to assess the suitability of using a CNN model in multi-processor systems-on-chips (MPSoCs) with a focus on energy consumption and reliability in terms of lifespan of the embedded device using these MPSoCs.

Submitted 18 February, 2019; v1 submitted 14 February, 2019; originally announced February 2019.

Comments: 11 pages, 5 figures, 2 tables

ACM Class: I.4; I.2.1; C.1.4

arXiv:1902.04955 [pdf, other]

Can We Automate Diagrammatic Reasoning?

Authors: Sk. Arif Ahmed, Debi Prosad Dogra, Samarjit Kar, Partha Pratim Roy, Dilip K. Prasad

Abstract: Learning to solve diagrammatic reasoning (DR) can be a challenging but interesting problem to the computer vision research community. It is believed that next generation pattern recognition applications should be able to simulate the human brain to understand and analyze reasoning of images. However, due to the lack of benchmarks of diagrammatic reasoning, the present research primarily focuses on visual reasoning that can be applied to real-world objects. In this paper, we present a diagrammatic reasoning dataset that provides a large variety of DR problems. In addition, we also propose a Knowledge-based Long Short Term Memory (KLSTM) to solve diagrammatic reasoning problems. Our proposed analysis is arguably the first work in this research area. Several state-of-the-art learning frameworks have been used to compare with the proposed KLSTM framework in the present context. Preliminary results indicate that the domain is highly related to computer vision and pattern recognition research with several challenging avenues.

Submitted 13 February, 2019; originally announced February 2019.

arXiv:1901.05107 [pdf, other]

Actions Speak Louder Than (Pass)words: Passive Authentication of Smartphone Users via Deep Temporal Features

Authors: Debayan Deb, Arun Ross, Anil K. Jain, Kwaku Prakah-Asante, K. Venkatesh Prasad

Abstract: Prevailing user authentication schemes on smartphones rely on explicit user interaction, where a user types in a passcode or presents a biometric cue such as face, fingerprint, or iris. In addition to being cumbersome and obtrusive to the users, such authentication mechanisms pose security and privacy concerns. Passive authentication systems can tackle these challenges by frequently and unobtrusively monitoring the user's interaction with the device. In this paper, we propose a Siamese Long Short-Term Memory network architecture for passive authentication, where users can be verified without requiring any explicit authentication step. We acquired a dataset comprising measurements from 30 smartphone sensor modalities for 37 users. We evaluate our approach on 8 dominant modalities, namely, keystroke dynamics, GPS location, accelerometer, gyroscope, magnetometer, linear accelerometer, gravity, and rotation sensors. Experimental results find that, within 3 seconds, a genuine user can be correctly verified 97.15% of the time at a false accept rate of 0.1%.

Submitted 15 January, 2019; originally announced January 2019.

arXiv:1812.09271 [pdf]

Polygonal approximation of digital planar curve using novel significant measure

Authors: Mangayarkarasi Ramaiah, Dilip K. Prasad

Abstract: This paper presents an iterative smoothing technique for polygonal approximation of a digital image boundary. The technique starts with the finest initial segmentation points of a curve. The contribution of the initially segmented points towards preserving the original shape of the image boundary is determined by computing the significant measure of every initial segmentation point. This measure is sensitive to sharp turns, which may be missed easily when conventional significant measures are used for detecting dominant points. The proposed method differentiates between the situations when a point on the curve between two points on a curve projects directly upon the line segment or beyond this line segment. It not only identifies these situations, but also computes its significant contribution for these situations differently. This situation-specific treatment allows preservation of points with high curvature even as a revised set of dominant points is derived. The experimental results show that the proposed technique competes well with state-of-the-art techniques.

Submitted 21 December, 2018; originally announced December 2018.

Comments: 17 pages,15 figures

arXiv:1810.08317 [pdf]

Enabling Grasp Action: Generalized Evaluation of Grasp Stability via Contact Stiffness from Contact Mechanics Insight

Authors: Huixu Dong, Chen Qiu, Dilip K. Prasad, Ye Pan, Jiansheng Dai, I-Ming Chen

Abstract: Performing a grasp is a pivotal capability for a robotic gripper. We propose a new evaluation approach of grasping stability via constructing a model of grasping stiffness based on the theory of contact mechanics. First, the mathematical models are built to explore soft contact and the general grasp stiffness between a finger and an object. Next, the grasping stiffness matrix is constructed to reflect the normal, tangential and torsion stiffness coefficients. Finally, we design two grasping cases to verify the proposed measurement criterion of grasping stability by comparing different grasping configurations. Specifically, a standard grasping index is used and compared with the minimum eigenvalue index of the grasping stiffness matrix we constructed. The comparison result reveals a similar tendency between them for measuring the grasping stability and thus validates the proposed approach.

Submitted 18 October, 2018; originally announced October 2018.

Comments: 12 pages, 14 figures

Showing 1–50 of 96 results for author: Prasad, K