Invited: Human-Inspired Distributed Wearable AI

Abstract: The explosive surge in Human-AI interactions, fused with a soaring fascination in wearable technology, has ignited a frenzy of innovation and the emergence of a myriad of Wearable AI devices, each wielding diverse form factors, tackling tasks from health surveillance to turbocharging productivity. This paper delves into the vision for wearable AI technology, addressing the technical bottlenecks that stand in the way of its promised advancements. Embracing a paradigm shift, we introduce a Human-Inspired Distributed Network for Wearable AI, enabled by high-speed ultra-low-power secure connectivity via the emerging 'Body as a Wire' (Wi-R) technology. This breakthrough acts as the missing link: the artificial nervous system, seamlessly interconnecting all wearables and implantables, ushering in a new era of interconnected intelligence, where featherweight, perpetually operating wearable AI nodes redefine the boundaries of possibility.

Submitted 12 July, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

Comments: 5 pages, 3 figures, DAC 2024

arXiv:2406.08220 [pdf, other]

Efficient Communication and Powering for Smart Contact Lens with Resonant Magneto-Quasistatic Coupling

Authors: Sukriti Shaw, Mayukh Nath, Arunashish Datta, Shreyas Sen

Abstract: A two-coil wearable system is proposed for wireless communication and powering between a transmitter coil in a necklace and a receiver coil in a smart contact lens, where the necklace is invisible in contrast to coils embedded in wearables like spectacles or headbands. Magneto-quasistatic (MQS) field coupling facilitates communication between the transmitter in the necklace and the contact lens receiver, enabling AR/VR and health monitoring. As long as the receiver coil remains within the magnetic field generated by the transmitter, continuous communication is sustained through MQS field coupling despite the misalignments present. Resonant frequency tuning enhances system efficiency. The system's performance was tested for coil misalignments, showing a maximum path loss variation within 10 dB across scenarios, indicating robustness. Finite Element Method (FEM) analysis has been used to study the system for efficient wireless data transfer and powering. The communication channel capacity is 4.5 Mbps over a 1 MHz bandwidth. Simulations show negligible path loss differences with or without human tissues, as magnetic coupling remains unaffected at MQS frequencies below 30 MHz due to similar magnetic permeability of tissues and air. Therefore, the possibility of efficient communication and powering of smart contact lenses through a necklace is shown for the first time using resonant MQS coupling at an axial distance of 15 cm and lateral distance of over 9 cm to enable AR/VR and health monitoring on the contact lens.

Submitted 12 June, 2024; originally announced June 2024.

Comments: 9 pages, 12 figures
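
The capacity figure in the smart contact lens abstract above (4.5 Mbps over 1 MHz) can be related to a signal-to-noise ratio, assuming "channel capacity" there means the Shannon-Hartley capacity C = B log2(1 + SNR). The listing does not state how the authors computed it, so the sketch below is only an illustration; the function names are hypothetical.

```python
import math

def shannon_capacity_bps(bandwidth_hz, snr_db):
    """Shannon-Hartley capacity in bit/s for a given bandwidth and SNR (dB)."""
    snr_linear = 10 ** (snr_db / 10)
    return bandwidth_hz * math.log2(1 + snr_linear)

def required_snr_db(capacity_bps, bandwidth_hz):
    """SNR (dB) needed to reach a target capacity, inverting the formula above."""
    snr_linear = 2 ** (capacity_bps / bandwidth_hz) - 1
    return 10 * math.log10(snr_linear)

# Assumption: the abstract's 4.5 Mbps / 1 MHz is a Shannon capacity.
snr_db = required_snr_db(4.5e6, 1e6)
print(f"Implied SNR: {snr_db:.2f} dB")
```

Under that assumption, a spectral efficiency of 4.5 bit/s/Hz corresponds to an SNR of roughly 13 dB at the receiver.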

arXiv:2402.02691 [pdf]

ALIVE: A Low-Cost Interactive Vaccine Storage Environment Module ensuring easy portability and remote tracking of operational logistics to the last mile

Authors: Arkadeep Datta, Arani Mukhopadhyay, Amitava Datta, Ranjan Ganguly

Abstract: The COVID-19 pandemic has profoundly reshaped our lives, prompting a search for solutions to its far-reaching effects. Vaccines emerged as a beacon of hope, yet reaching remote areas faces last-mile hurdles and cost issues owing to loss of vaccine potency caused by poor temperature regulation of the storage units and unanticipated vaccine wastage en route, a common occurrence in conventional vaccine transportation methods. We introduce ALIVE, a low-cost Interactive Vaccine Storage Environment module. ALIVE provides an off-grid, self-sufficient solution for vaccine storage and transport, enabled by active cooling technology. ALIVE's innovation lies in its integration with the Internet of Things (IoT), allowing real-time monitoring and control. This IoT-enabled Application Programming Interface (API) features a data acquisition and environment parameter control system, managing oversight and decision-making. ALIVE's compact, lightweight design makes it adaptable to various logistical scenarios, while its versatility enables it to maintain both time-invariant and time-dependent thermophysical and spatial parameters. Operationalized through a PID algorithm, ALIVE ensures precise temperature control within the vaccine chamber. Its dynamic features, such as remote actuation and data sharing, demonstrate its adaptability and potential applications. Despite the frugal nature of development, the system promises significant benefits, including reduced vaccine loss and remote monitoring advantages. Collaborations with healthcare partners seek to further enhance ALIVE's readiness and expand its impact. ALIVE revolutionizes vaccine logistics, offering scalable, cost-effective solutions for bridging accessibility gaps in challenging distribution scenarios. Its adaptability positions it for widespread application, from last-mile vaccine delivery to environment-controlled supply chains and beyond.

Submitted 4 February, 2024; originally announced February 2024.

Comments: Presented at the International Conference on Robotics, Control, Automation, and Artificial Intelligence (RCAAI 2023). Corresponding: arkadeepdatta@gmail.com
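
The ALIVE abstract above mentions temperature control "through a PID algorithm" without giving details. As a rough illustration of what a discrete PID loop looks like, here is a minimal textbook sketch. The class name, gains, and setpoint are hypothetical and are not taken from the paper.

```python
class PID:
    """Textbook discrete PID controller (illustrative; not the ALIVE implementation)."""

    def __init__(self, kp, ki, kd, setpoint):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.setpoint = setpoint
        self.integral = 0.0
        self.prev_error = None  # no derivative term on the first sample

    def update(self, measurement, dt):
        """Return the control output for one sample taken dt seconds after the last."""
        error = self.setpoint - measurement
        self.integral += error * dt
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Hypothetical usage: hold a chamber at 5.0 degrees C, sampling once per second.
controller = PID(kp=1.0, ki=0.5, kd=0.0, setpoint=5.0)
for reading in (3.0, 4.0):
    print(controller.update(reading, dt=1.0))
```

A real cooling loop would also need output clamping and integral anti-windup, which are omitted here for brevity.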

arXiv:2401.10419 [pdf]

M3BUNet: Mobile Mean Max UNet for Pancreas Segmentation on CT-Scans

Authors: Juwita juwita, Ghulam Mubashar Hassan, Naveed Akhtar, Amitava Datta

Abstract: Segmenting organs in CT scan images is a necessary process for multiple downstream medical image analysis tasks. Currently, manual CT scan segmentation by radiologists is prevalent, especially for organs like the pancreas, which requires a high level of domain expertise for reliable segmentation due to factors like small organ size, occlusion, and varying shapes. When resorting to automated pancreas segmentation, these factors translate to limited reliable labeled data to train effective segmentation models. Consequently, the performance of contemporary pancreas segmentation models is still not within acceptable ranges. To improve that, we propose M3BUNet, a fusion of MobileNet and U-Net neural networks, equipped with a novel Mean-Max (MM) attention that operates in two stages to gradually segment pancreas CT images from coarse to fine with mask guidance for object detection. This approach empowers the network to surpass segmentation performance achieved by similar network architectures and achieve results that are on par with complex state-of-the-art methods, all while maintaining a low parameter count. Additionally, we introduce external contour segmentation as a preprocessing step for the coarse stage to assist in the segmentation process through image standardization. For the fine segmentation stage, we found that applying a wavelet decomposition filter to create multi-input images enhances pancreas segmentation performance. We extensively evaluate our approach on the widely known NIH pancreas dataset and MSD pancreas dataset. Our approach demonstrates a considerable performance improvement, achieving an average Dice Similarity Coefficient (DSC) value of up to 89.53% and an Intersection Over Union (IOU) score of up to 81.16% for the NIH pancreas dataset, and 88.60% DSC and 79.90% IOU for the MSD Pancreas dataset.

Submitted 18 January, 2024; originally announced January 2024.
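
The two metrics reported in the M3BUNet abstract above, DSC and IOU, are closely related: for a single pair of masks, IOU = DSC / (2 - DSC). The sketch below computes both from flat binary masks and applies that identity to the reported NIH figure. Function names are hypothetical, and because the paper reports dataset averages, the identity is only expected to hold approximately for those numbers.

```python
def dice(pred, truth):
    """Dice Similarity Coefficient for two equal-length binary (0/1) masks."""
    intersection = sum(p and t for p, t in zip(pred, truth))
    return 2 * intersection / (sum(pred) + sum(truth))

def iou(pred, truth):
    """Intersection over Union for two equal-length binary (0/1) masks."""
    intersection = sum(p and t for p, t in zip(pred, truth))
    union = sum(p or t for p, t in zip(pred, truth))
    return intersection / union

def iou_from_dice(dsc):
    """Per-mask identity: IOU = DSC / (2 - DSC)."""
    return dsc / (2 - dsc)

pred = [1, 1, 0, 0]
truth = [1, 0, 1, 0]
print(dice(pred, truth), iou(pred, truth))

# Reported average DSC on the NIH dataset, converted with the per-mask identity.
print(f"{iou_from_dice(0.8953):.4f}")
```

The converted value comes out near 0.81, close to the reported 81.16% IOU; the small gap is consistent with averaging each metric over cases separately.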

arXiv:2309.04505 [pdf, other]

doi 10.1109/TrustCom60117.2023.00377

COVID-19 Detection System: A Comparative Analysis of System Performance Based on Acoustic Features of Cough Audio Signals

Authors: Asmaa Shati, Ghulam Mubashar Hassan, Amitava Datta

Abstract: A wide range of respiratory diseases, such as cold and flu, asthma, and COVID-19, affect people's daily lives worldwide. In medical practice, respiratory sounds are widely used in medical services to diagnose various respiratory illnesses and lung disorders. The traditional diagnosis of such sounds requires specialized knowledge, which can be costly and reliant on human expertise. Despite this, recent advancements, such as cough audio recordings, have emerged as a means to automate the detection of respiratory conditions. Therefore, this research aims to explore various acoustic features that enhance the performance of machine learning (ML) models in detecting COVID-19 from cough signals. It investigates the efficacy of three feature extraction techniques, including Mel Frequency Cepstral Coefficients (MFCC), Chroma, and Spectral Contrast features, when applied to two machine learning algorithms, Support Vector Machine (SVM) and Multilayer Perceptron (MLP), and therefore proposes an efficient CovCepNet detection system. The proposed system provides a practical solution and demonstrates state-of-the-art classification performance, with an AUC of 0.843 on the COUGHVID dataset and 0.953 on the Virufy dataset for COVID-19 detection from cough audio signals.

Submitted 18 June, 2024; v1 submitted 8 September, 2023; originally announced September 2023.

Comments: 8 pages, 3 figures

Journal ref: 2023 IEEE 22nd International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), Exeter, United Kingdom, 2023, pp. 2706-2713

arXiv:2307.11213 [pdf, other]

Invited: Can Wi-R enable perpetual IoB nodes?

Authors: Arunashish Datta, Shreyas Sen

Abstract: While the number of wearables is steadily growing, the person wearing them faces a limitation due to the need to charge all of them every day. To unlock the true power of IoB, we need to make these IoB nodes perpetual. However, that is not possible with today's technology. In this paper, we debate whether, with the advent of the Wi-R protocol, which uses the body to communicate at 100X lower energy than BTLE/Wi-Fi, it will be possible to realize the long-standing desire of perpetual sensing/actuation nodes for the Internet of Bodies.

Submitted 8 August, 2023; v1 submitted 20 July, 2023; originally announced July 2023.

Comments: 5 pages, 5 figures

arXiv:2307.05373 [pdf, other]

Classification of sleep stages from EEG, EOG and EMG signals by SSNet

Authors: Haifa Almutairi, Ghulam Mubashar Hassan, Amitava Datta

Abstract: Classification of sleep stages plays an essential role in diagnosing sleep-related diseases including Sleep Disorder Breathing (SDB) disease. In this study, we propose an end-to-end deep learning architecture, named SSNet, which comprises two deep learning networks based on Convolutional Neural Networks (CNN) and Long Short Term Memory (LSTM). Both deep learning networks extract features from the combination of Electrooculogram (EOG), Electroencephalogram (EEG), and Electromyogram (EMG) signals, as each signal has distinct features that help in the classification of sleep stages. The features produced by the two deep learning networks are concatenated and passed to the fully connected layer for classification. The performance of our proposed model is evaluated using two public datasets, the Sleep-EDF Expanded dataset and the ISRUC-Sleep dataset. The accuracy and Kappa coefficient are 96.36% and 93.40%, respectively, for classifying three classes of sleep stages using the Sleep-EDF Expanded dataset, whereas the accuracy and Kappa coefficient are 96.57% and 83.05%, respectively, for five classes of sleep stages using the Sleep-EDF Expanded dataset. Our model achieves the best performance in classifying sleep stages when compared with the state-of-the-art techniques.

Submitted 2 July, 2023; originally announced July 2023.
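
The SSNet abstract above reports a Kappa coefficient alongside accuracy. Assuming this is Cohen's kappa (the usual choice for sleep staging, though the listing does not say so explicitly), it measures agreement corrected for chance: kappa = (p_o - p_e) / (1 - p_e). A minimal sketch from a confusion matrix follows; the example matrix is made up for illustration and is not data from the paper.

```python
def cohens_kappa(confusion):
    """Cohen's kappa from a square confusion matrix (rows: true, columns: predicted)."""
    total = sum(sum(row) for row in confusion)
    n = len(confusion)
    # Observed agreement: fraction of samples on the diagonal.
    p_observed = sum(confusion[i][i] for i in range(n)) / total
    # Chance agreement: product of row and column marginals, summed over classes.
    row_sums = [sum(row) for row in confusion]
    col_sums = [sum(confusion[i][j] for i in range(n)) for j in range(n)]
    p_expected = sum(r * c for r, c in zip(row_sums, col_sums)) / (total * total)
    return (p_observed - p_expected) / (1 - p_expected)

# Hypothetical two-class confusion matrix: accuracy 0.70, chance agreement 0.50.
example = [[20, 5],
           [10, 15]]
print(cohens_kappa(example))
```

Because kappa discounts chance agreement, it is typically lower than accuracy, which matches the pattern of the figures quoted in the abstract.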

arXiv:2304.13965 [pdf]

doi 10.1109/EMBC40787.2023.10341194

Patient Independent Interictal Epileptiform Discharge Detection

Authors: Matthew McDougall, Hezam Albaqami, Ghulam Mubashar Hassan, Amitava Datta

Abstract: Epilepsy is a highly prevalent brain condition with many serious complications arising from it. The majority of patients who present to a clinic and undergo electroencephalogram (EEG) monitoring would be unlikely to experience seizures during the examination period, thus the presence of interictal epileptiform discharges (IEDs) becomes an effective marker for the diagnosis of epilepsy. Furthermore, IED shapes and patterns are highly variable across individuals, yet trained experts are still able to identify them through EEG recordings - meaning that commonalities exist across IEDs that an algorithm can be trained on to detect and generalise to the larger population. This research proposes an IED detection system for the binary classification of epilepsy using scalp EEG recordings. The proposed system features an ensemble based deep learning method to boost the performance of a residual convolutional neural network, and a bidirectional long short-term memory network. This is implemented using raw EEG data, sourced from Temple University Hospital's EEG Epilepsy Corpus, and is found to outperform the current state of the art model for IED detection across the same dataset. The achieved accuracy and Area Under Curve (AUC) of 94.92% and 97.45% demonstrate the effectiveness of an ensemble method, and that IED detection can be achieved with high performance using raw scalp EEG data, thus showing promise for the proposed approach in clinical settings.

Submitted 1 May, 2023; v1 submitted 27 April, 2023; originally announced April 2023.

Comments: Accepted for publication at EMBC 2023

arXiv:2211.04628 [pdf]

doi 10.1016/j.bspc.2023.104780

MP-SeizNet: A Multi-Path CNN Bi-LSTM Network for Seizure-Type Classification Using EEG

Authors: Hezam Albaqami, Ghulam Mubashar Hassan, Amitava Datta

Abstract: Seizure type identification is essential for the treatment and management of epileptic patients. However, it is a difficult process known to be time consuming and labor intensive. Automated diagnosis systems, with the advancement of machine learning algorithms, have the potential to accelerate the classification process, alert patients, and support physicians in making quick and accurate decisions. In this paper, we present a novel multi-path seizure-type classification deep learning network (MP-SeizNet), consisting of a convolutional neural network (CNN) and a bidirectional long short-term memory neural network (Bi-LSTM) with an attention mechanism. The objective of this study was to classify specific types of seizures, including complex partial, simple partial, absence, tonic, and tonic-clonic seizures, using only electroencephalogram (EEG) data. The EEG data is fed to our proposed model in two different representations. The CNN was fed with wavelet-based features extracted from the EEG signals, while the Bi-LSTM was fed with raw EEG signals to let our MP-SeizNet jointly learn from different representations of seizure data for more accurate information learning. The proposed MP-SeizNet was evaluated using the largest available EEG epilepsy database, the Temple University Hospital EEG Seizure Corpus, TUSZ v1.5.2. We evaluated our proposed model across different patient data using three-fold cross-validation and across seizure data using five-fold cross-validation, achieving F1 scores of 87.6% and 98.1%, respectively.

Submitted 1 March, 2023; v1 submitted 8 November, 2022; originally announced November 2022.

Journal ref: Biomed. Signal Process. Control. 84 (2023) 104780

arXiv:2204.13184 [pdf, other]

doi 10.1109/IMS19712.2021.9575020

Channel Modeling for Physically Secure Electro-Quasistatic In-Body to Out-of-Body Communication with Galvanic Tx and Multimodal Rx

Authors: Arunashish Datta, Mayukh Nath, Baibhab Chatterjee, Nirmoy Modak, Shreyas Sen

Abstract: The increasing number of devices being used in and around the human body has resulted in the exploration of the human body as a communication medium. In this paper, we design a channel model for implantable devices communicating outside the body using physically secure Electro-Quasistatic Human Body Communication. A galvanic receiver shows 5 dB lower path loss than a capacitive receiver when placed close to the transmitter, whereas a capacitive receiver has around 15 dB lower path loss for larger separation between the transmitter and receiver. Finite Element Method (FEM) based simulations are used to analyze the communication channel for different receiver topologies, and experimental data is used to validate the simulation results.

Submitted 27 April, 2022; originally announced April 2022.

arXiv:2204.13181 [pdf, other]

doi 10.1109/LMWC.2022.3163077

A Quantitative Analysis of Physical Security and Path Loss With Frequency for IBOB Channel

Authors: Arunashish Datta, Mayukh Nath, Baibhab Chatterjee, Shovan Maity, Shreyas Sen

Abstract: Security vulnerabilities demonstrated in implantable medical devices have opened the door for research into physically secure and low power communication methodologies. In this study, we perform a comparative analysis of commonly used ISM frequency bands and human body communication (HBC) for data transfer from in-body to out-of-body (IBOB). We develop a figure of merit (FoM) that comprises the critical parameters to quantitatively compare the communication methodologies. We perform finite-element method (FEM)-based simulations and experiments to validate the FoM developed.

Submitted 27 April, 2022; originally announced April 2022.

arXiv:2203.00511 [pdf, other]

doi 10.3390/app12115702

Wavelet-Based Multi-Class Seizure Type Classification System

Authors: Hezam Albaqami, Ghulam Mubashar Hassan, Amitava Datta

Abstract: Epilepsy is one of the most common brain diseases, affecting more than 1% of the world's population. It is characterized by recurrent seizures, which come in different types and are treated differently. Electroencephalography (EEG) is commonly used in medical services to diagnose seizures and their types. The accurate identification of seizures helps to provide optimal treatment and accurate information to the patient. However, the manual diagnostic procedures of epileptic seizures are laborious and highly specialized. Moreover, EEG manual evaluation is a process known to have a low inter-rater agreement among experts. This paper presents a novel automatic technique that involves extraction of specific features from EEG signals using Dual-tree Complex Wavelet Transform (DTCWT) and classifying them. We evaluated the proposed technique on TUH EEG Seizure Corpus (TUSZ) ver.1.5.2 dataset and compared the performance with existing state-of-the-art techniques using overall F1-score due to class imbalance among seizure types. Our proposed technique achieved the best results of weighted F1-score of 99.1% and 74.7% for seizure-wise and patient-wise classification respectively, thereby setting new benchmark results for this dataset.

Submitted 19 February, 2022; originally announced March 2022.

arXiv:2102.11428 [pdf, other]

BMART-Enabled Field-Map Combination of Projection-Reconstruction Phase-Cycled SSFP Cardiac Cine for Banding and Flow-Artifact Reduction

Authors: Anjali Datta, Dwight G Nishimura, Corey A Baron

Abstract: Purpose: To develop a method for banding-free bSSFP cardiac cine with substantially reduced flow artifacts. Methods: A projection-reconstruction (PR) trajectory is proposed for a frequency-modulated cine sequence, facilitating reconstruction of three phase cycles and a field-map time series from a short, breath-held scan. Data is also acquired during the gradient rewinders to enable generation of field maps using BMART, B$_0$ mapping using rewinding trajectories, where the rewind data forms the second TE image for calculating the field map. A field-map-based combination method is developed which weights the phase-cycle component images to include only passband signal in the final cine images, and exclude stopband and near-band flow artifacts. Results: The weights derived from the BMART-generated field maps mask out banding and near-band flow artifacts in and around the heart. Therefore, the field-map-based phase-cycle combination, which is facilitated by the PR acquisition with BMART, results in more homogeneous blood pools and reduced hyperintense regions than root-sum-of-squares. Conclusion: With the proposed techniques, using a non-Cartesian trajectory for a frequency-modulated cine sequence enables flow-artifact-reduced banding-free cardiac imaging within a short breath-hold.

Submitted 22 February, 2021; originally announced February 2021.

Comments: Submitted to Magnetic Resonance in Medicine

arXiv:2012.10034 [pdf, other]

doi 10.1016/j.bspc.2021.102957

Automatic detection of abnormal EEG signals using wavelet feature extraction and gradient boosting decision tree

Authors: Hezam Albaqami, Ghulam Mubashar Hassan, Abdulhamit Subasi, Amitava Datta

Abstract: Electroencephalography is frequently used for diagnostic evaluation of various brain-related disorders due to its excellent resolution, non-invasive nature and low cost. However, manual analysis of EEG signals can be a strenuous and time-consuming process for experts. It requires long training time for physicians to develop expertise in it and, additionally, experts have low inter-rater agreement (IRA) among themselves. Therefore, many Computer Aided Diagnostic (CAD) based studies have considered the automation of interpreting EEG signals to alleviate the workload and support the final diagnosis. In this paper, we present an automatic binary classification framework for brain signals in multichannel EEG recordings. We propose to use Wavelet Packet Decomposition (WPD) techniques to decompose the EEG signals into frequency sub-bands and extract a set of statistical features from each of the selected coefficients. Moreover, we propose a novel method to reduce the dimension of the feature space without compromising the quality of the extracted features. The extracted features are classified using different Gradient Boosting Decision Tree (GBDT) based classification frameworks, which are CatBoost, XGBoost and LightGBM. We used Temple University Hospital EEG Abnormal Corpus V2.0.0 to test our proposed technique. We found that the CatBoost classifier achieves a binary classification accuracy of 87.68%, and outperforms state-of-the-art techniques on the same dataset by more than 1% in accuracy and more than 3% in sensitivity. The obtained results in this research provide important insights into the usefulness of WPD feature extraction and GBDT classifiers for EEG classification.

Submitted 17 December, 2020; originally announced December 2020.

Journal ref: Biomedical Signal Processing and Control, Volume 70, September 2021, 102957

arXiv:2011.03216 [pdf, other]

Task-relevant Representation Learning for Networked Robotic Perception

Authors: Manabu Nakanoya, Sandeep Chinchali, Alexandros Anemogiannis, Akul Datta, Sachin Katti, Marco Pavone

Abstract: Today, even the most compute-and-power constrained robots can measure complex, high data-rate video and LIDAR sensory streams. Often, such robots, ranging from low-power drones to space and subterranean rovers, need to transmit high-bitrate sensory data to a remote compute server if they are uncertain or cannot scalably run complex perception or mapping tasks locally. However, today's representations for sensory data are mostly designed for human, not robotic, perception and thus often waste precious compute or wireless network resources to transmit unimportant parts of a scene that are unnecessary for a high-level robotic task. This paper presents an algorithm to learn task-relevant representations of sensory data that are co-designed with a pre-trained robotic perception model's ultimate objective. Our algorithm aggressively compresses robotic sensory data by up to 11x more than competing methods. Further, it achieves high accuracy and robust generalization on diverse tasks including Mars terrain classification with low-power deep learning accelerators, neural motion planning, and environmental timeseries classification.

Submitted 6 November, 2020; originally announced November 2020.

arXiv:2010.15339 [pdf, other]

doi 10.1109/TBME.2021.3074138

Advanced Biophysical Model to Capture Channel Variability for EQS Capacitive HBC

Authors: Arunashish Datta, Mayukh Nath, David Yang, Shreyas Sen

Abstract: Human Body Communication (HBC) has come up as a promising alternative to traditional radio frequency (RF) Wireless Body Area Network (WBAN) technologies. This is essentially due to HBC providing a broadband communication channel with enhanced signal security in the physical layer due to lower radiation from the human body as compared to its RF counterparts. An in-depth understanding of the mechanism for the channel loss variability and associated biophysical model needs to be developed before EQS-HBC can be used more frequently in WBAN consumer and medical applications. Biophysical models characterizing the human body as a communication channel didn't exist in literature for a long time. Recent developments have shown models that capture the channel response for fixed transmitter and receiver positions on the human body. These biophysical models do not capture the variability in the HBC channel for varying positions of the devices with respect to the human body. In this study, we provide a detailed analysis of the change in path loss in a capacitive-HBC channel in the electroquasistatic (EQS) domain. Causes of channel loss variability, namely inter-device coupling and effects of fringe fields due to the body's shadowing effects, are investigated. FEM based simulation results are used to analyze the channel response of the human body for different positions and sizes of the device, which are further verified using measurement results to validate the developed biophysical model. Using the biophysical model, we develop a closed form equation for the path loss in a capacitive HBC channel, which is then analyzed as a function of the geometric properties of the device and the position with respect to the human body, which will help pave the path towards future EQS-HBC WBAN design.

Submitted 28 October, 2020; originally announced October 2020.

Comments: 12 pages, 14 figures

arXiv:2008.06121 [pdf, other]

LSTM Acoustic Models Learn to Align and Pronounce with Graphemes

Authors: Arindrima Datta, Guanlong Zhao, Bhuvana Ramabhadran, Eugene Weinstein

Abstract: Automated speech recognition coverage of the world's languages continues to expand. However, standard phoneme based systems require handcrafted lexicons that are difficult and expensive to obtain. To address this problem, we propose a training methodology for a grapheme-based speech recognizer that can be trained in a purely data-driven fashion. Built with LSTM networks and trained with the cross-entropy loss, the grapheme-output acoustic models we study are also extremely practical for real-world applications as they can be decoded with conventional ASR stack components such as language models and FST decoders, and produce good quality audio-to-grapheme alignments that are useful in many speech applications. We show that the grapheme models are competitive in WER with their phoneme-output counterparts when trained on large datasets, with the advantage that grapheme models do not require explicit linguistic knowledge as an input. We further compare the alignments generated by the phoneme and grapheme models to demonstrate the quality of the pronunciations learnt by them using four Indian languages that vary linguistically in spoken and written forms.

Submitted 13 August, 2020; originally announced August 2020.

Comments: 5 pages, 4 figures. This work was done between summer 2018 and spring 2019

arXiv:2004.09571 [pdf, other]

Language-agnostic Multilingual Modeling

Authors: Arindrima Datta, Bhuvana Ramabhadran, Jesse Emond, Anjuli Kannan, Brian Roark

Abstract: Multilingual Automated Speech Recognition (ASR) systems allow for the joint training of data-rich and data-scarce languages in a single model. This enables data and parameter sharing across languages, which is especially beneficial for the data-scarce languages. However, most state-of-the-art multilingual models require the encoding of language information and therefore are not as flexible or scalable when expanding to newer languages. Language-independent multilingual models help to address this issue, and are also better suited for multicultural societies where several languages are frequently used together (but often rendered with different writing systems). In this paper, we propose a new approach to building a language-agnostic multilingual ASR system which transforms all languages to one writing system through a many-to-one transliteration transducer. Thus, similar sounding acoustics are mapped to a single, canonical target sequence of graphemes, effectively separating the modeling and rendering problems. We show with four Indic languages, namely, Hindi, Bengali, Tamil and Kannada, that the language-agnostic multilingual model achieves up to 10% relative reduction in Word Error Rate (WER) over a language-dependent multilingual model.
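
To make the "relative reduction in WER" figure concrete, the following is a minimal sketch of how such a percentage is conventionally computed. The function name and the two WER values are hypothetical and chosen only for illustration; they are not results from the paper.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the relative WER reduction of new_wer over baseline_wer.

    Both arguments are word error rates in the same unit (e.g. percent).
    The result is a fraction: 0.10 means a 10% relative reduction.
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical numbers, not taken from the paper:
# a language-dependent baseline at 20.0% WER and a
# language-agnostic model at 18.0% WER.
reduction = relative_wer_reduction(20.0, 18.0)
print(f"relative WER reduction: {reduction:.1%}")
```

Note that a relative reduction differs from an absolute one: in this hypothetical case the absolute WER drops by 2 points, while the relative reduction is 10%.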

Submitted 20 April, 2020; originally announced April 2020.

arXiv:1909.05330 [pdf, other]

Large-Scale Multilingual Speech Recognition with a Streaming End-to-End Model

Authors: Anjuli Kannan, Arindrima Datta, Tara N. Sainath, Eugene Weinstein, Bhuvana Ramabhadran, Yonghui Wu, Ankur Bapna, Zhifeng Chen, Seungji Lee

Abstract: Multilingual end-to-end (E2E) models have shown great promise in expansion of automatic speech recognition (ASR) coverage of the world's languages. They have shown improvement over monolingual systems, and have simplified training and serving by eliminating language-specific acoustic, pronunciation, and language models. This work presents an E2E multilingual system which is equipped to operate in low-latency interactive applications, as well as handle a key challenge of real world data: the imbalance in training data across languages. Using nine Indic languages, we compare a variety of techniques, and find that a combination of conditioning on a language vector and training language-specific adapter layers produces the best model. The resulting E2E multilingual model achieves a lower word error rate (WER) than both monolingual E2E models (eight of nine languages) and monolingual conventional systems (all nine languages).

Submitted 11 September, 2019; originally announced September 2019.

Comments: Accepted in Interspeech 2019

arXiv:1603.05710 [pdf, ps, other]

Information Flow for Security in Control Systems

Authors: Sean Weerakkody, Bruno Sinopoli, Soummya Kar, Anupam Datta

Abstract: This paper considers the development of information flow analyses to support resilient design and active detection of adversaries in cyber-physical systems (CPS). The area of CPS security, though well studied, suffers from fragmentation. In this paper, we consider control systems as an abstraction of CPS. Here, we extend the notion of information flow analysis, a well-established set of methods developed in software security, to obtain a unified framework that captures and extends system theoretic results in control system security. In particular, we propose the Kullback-Leibler (KL) divergence as a causal measure of information flow, which quantifies the effect of adversarial inputs on sensor outputs. We show that the proposed measure characterizes the resilience of control systems to specific attack strategies by relating the KL divergence to optimal detection techniques. We then relate information flows to stealthy attack scenarios where an adversary can bypass detection. Finally, this article examines active detection mechanisms where a defender intelligently manipulates control inputs or the system itself in order to elicit information flows from an attacker's malicious behavior. In all previous cases, we demonstrate an ability to investigate and extend existing results by utilizing the proposed information flow analyses.
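
As a rough illustration of how a KL divergence can quantify the effect of an input on an output distribution, the sketch below uses the standard closed form for two univariate Gaussians. It is an assumption-laden toy, not the paper's formulation: the scalar Gaussian sensor model, the function name, and the numeric values are all hypothetical.

```python
import math


def kl_gaussian(mu_p: float, sigma_p: float, mu_q: float, sigma_q: float) -> float:
    """KL divergence D(P || Q) in nats for P = N(mu_p, sigma_p^2), Q = N(mu_q, sigma_q^2).

    Standard closed form:
        ln(sigma_q / sigma_p) + (sigma_p^2 + (mu_p - mu_q)^2) / (2 sigma_q^2) - 1/2
    """
    if sigma_p <= 0 or sigma_q <= 0:
        raise ValueError("standard deviations must be positive")
    return (
        math.log(sigma_q / sigma_p)
        + (sigma_p ** 2 + (mu_p - mu_q) ** 2) / (2.0 * sigma_q ** 2)
        - 0.5
    )


# Hypothetical scalar sensor: nominal output ~ N(0, 1); an adversarial
# input shifts the output mean to 1 while leaving the noise unchanged.
divergence = kl_gaussian(mu_p=1.0, sigma_p=1.0, mu_q=0.0, sigma_q=1.0)
print(f"KL(attacked || nominal) = {divergence:.3f} nats")
```

In this toy setting, a divergence near zero would correspond to an input whose effect on the sensor output is statistically hard to detect, while a larger value indicates an effect that a detector can distinguish more easily.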

Submitted 17 March, 2016; originally announced March 2016.

arXiv:1210.0660 [pdf, other]

Stream on the Sky: Outsourcing Access Control Enforcement for Stream Data to the Cloud

Authors: Tien Tuan Anh Dinh, Anwitaman Datta

Abstract: There is an increasing trend for businesses to migrate their systems towards the cloud. Security concerns that arise when outsourcing data and computation to the cloud include data confidentiality and privacy. Given that a tremendous amount of data is being generated every day from a plethora of devices equipped with sensing capabilities, we focus on the problem of access controls over live streams of data based on triggers or sliding windows, which is a distinct and more challenging problem than access control over archival data. Specifically, we investigate secure mechanisms for outsourcing access control enforcement for stream data to the cloud. We devise a system that allows data owners to specify fine-grained policies associated with their data streams, then to encrypt the streams and relay them to the cloud for live processing and storage for future use. The access control policies are enforced by the cloud, without the latter learning about the data, while ensuring that unauthorized access is not feasible. To realize these ends, we employ a novel cryptographic primitive, namely proxy-based attribute-based encryption, which not only provides security but also allows the cloud to perform expensive computations on behalf of the users. Our approach is holistic, in that these controls are integrated with an XML-based framework (XACML) for high-level management of policies. Experiments with our prototype demonstrate the feasibility of such mechanisms, and early evaluations suggest graceful scalability with increasing numbers of policies, data streams and users.

Submitted 2 October, 2012; originally announced October 2012.

Showing 1–21 of 21 results for author: Datta, A