subscribe to arXiv mailings

FACTS About Building Retrieval Augmented Generation-based Chatbots

Authors: Rama Akkiraju, Anbang Xu, Deepak Bora, Tan Yu, Lu An, Vishal Seth, Aaditya Shukla, Pritam Gundecha, Hridhay Mehta, Ashwin Jha, Prithvi Raj, Abhinav Balasubramanian, Murali Maram, Guru Muthusamy, Shivakesh Reddy Annepally, Sidney Knowles, Min Du, Nick Burnett, Sean Javiya, Ashok Marannan, Mamta Kumari, Surbhi Jha, Ethan Dereszenski, Anupam Chakraborty, Subhash Ranjan , et al. (13 additional authors not shown)

Abstract: Enterprise chatbots, powered by generative AI, are emerging as key applications to enhance employee productivity. Retrieval Augmented Generation (RAG), Large Language Models (LLMs), and orchestration frameworks like Langchain and Llamaindex are crucial for building these chatbots. However, creating effective enterprise chatbots is challenging and requires meticulous RAG pipeline engineering. This… ▽ More Enterprise chatbots, powered by generative AI, are emerging as key applications to enhance employee productivity. Retrieval Augmented Generation (RAG), Large Language Models (LLMs), and orchestration frameworks like Langchain and Llamaindex are crucial for building these chatbots. However, creating effective enterprise chatbots is challenging and requires meticulous RAG pipeline engineering. This includes fine-tuning embeddings and LLMs, extracting documents from vector databases, rephrasing queries, reranking results, designing prompts, honoring document access controls, providing concise responses, including references, safeguarding personal information, and building orchestration agents. We present a framework for building RAG-based chatbots based on our experience with three NVIDIA chatbots: for IT/HR benefits, financial earnings, and general content. Our contributions are three-fold: introducing the FACTS framework (Freshness, Architectures, Cost, Testing, Security), presenting fifteen RAG pipeline control points, and providing empirical results on accuracy-latency tradeoffs between large and small LLMs. To the best of our knowledge, this is the first paper of its kind that provides a holistic view of the factors as well as solutions for building secure enterprise-grade chatbots." △ Less

Submitted 10 July, 2024; originally announced July 2024.

Comments: 8 pages, 6 figures, 2 tables, Preprint submission to ACM CIKM 2024

arXiv:2407.05315 [pdf, other]

doi 10.1016/j.engappai.2023.107719

Topological Persistence Guided Knowledge Distillation for Wearable Sensor Data

Authors: Eun Som Jeon, Hongjun Choi, Ankita Shukla, Yuan Wang, Hyunglae Lee, Matthew P. Buman, Pavan Turaga

Abstract: Deep learning methods have achieved a lot of success in various applications involving converting wearable sensor data to actionable health insights. A common application areas is activity recognition, where deep-learning methods still suffer from limitations such as sensitivity to signal quality, sensor characteristic variations, and variability between subjects. To mitigate these issues, robust… ▽ More Deep learning methods have achieved a lot of success in various applications involving converting wearable sensor data to actionable health insights. A common application areas is activity recognition, where deep-learning methods still suffer from limitations such as sensitivity to signal quality, sensor characteristic variations, and variability between subjects. To mitigate these issues, robust features obtained by topological data analysis (TDA) have been suggested as a potential solution. However, there are two significant obstacles to using topological features in deep learning: (1) large computational load to extract topological features using TDA, and (2) different signal representations obtained from deep learning and TDA which makes fusion difficult. In this paper, to enable integration of the strengths of topological methods in deep-learning for time-series data, we propose to use two teacher networks, one trained on the raw time-series data, and another trained on persistence images generated by TDA methods. The distilled student model utilizes only the raw time-series data at test-time. This approach addresses both issues. The use of KD with multiple teachers utilizes complementary information, and results in a compact model with strong supervisory features and an integrated richer representation. To assimilate desirable information from different modalities, we design new constraints, including orthogonality imposed on feature correlation maps for improving feature expressiveness and allowing the student to easily learn from the teacher. Also, we apply an annealing strategy in KD for fast saturation and better accommodation from different features, while the knowledge gap between the teachers and student is reduced. Finally, a robust student model is distilled, which uses only the time-series data as an input, while implicitly preserving topological features. △ Less

Submitted 7 July, 2024; originally announced July 2024.

Comments: Engineering Applications of Artificial Intelligence 130, 107719

Journal ref: Engineering Applications of Artificial Intelligence, 130, 107719 (2024)

arXiv:2405.03379 [pdf, other]

Reverse Forward Curriculum Learning for Extreme Sample and Demonstration Efficiency in Reinforcement Learning

Authors: Stone Tao, Arth Shukla, Tse-kai Chan, Hao Su

Abstract: Reinforcement learning (RL) presents a promising framework to learn policies through environment interaction, but often requires an infeasible amount of interaction data to solve complex tasks from sparse rewards. One direction includes augmenting RL with offline data demonstrating desired tasks, but past work often require a lot of high-quality demonstration data that is difficult to obtain, espe… ▽ More Reinforcement learning (RL) presents a promising framework to learn policies through environment interaction, but often requires an infeasible amount of interaction data to solve complex tasks from sparse rewards. One direction includes augmenting RL with offline data demonstrating desired tasks, but past work often require a lot of high-quality demonstration data that is difficult to obtain, especially for domains such as robotics. Our approach consists of a reverse curriculum followed by a forward curriculum. Unique to our approach compared to past work is the ability to efficiently leverage more than one demonstration via a per-demonstration reverse curriculum generated via state resets. The result of our reverse curriculum is an initial policy that performs well on a narrow initial state distribution and helps overcome difficult exploration problems. A forward curriculum is then used to accelerate the training of the initial policy to perform well on the full initial state distribution of the task and improve demonstration and sample efficiency. We show how the combination of a reverse curriculum and forward curriculum in our method, RFCL, enables significant improvements in demonstration and sample efficiency compared against various state-of-the-art learning-from-demonstration baselines, even solving previously unsolvable tasks that require high precision and control. △ Less

Submitted 6 May, 2024; originally announced May 2024.

Comments: Accepted at The Twelfth International Conference on Learning Representations (ICLR 2024). Website: https://reverseforward-cl.github.io/

arXiv:2405.01592 [pdf]

Text and Audio Simplification: Human vs. ChatGPT

Authors: Gondy Leroy, David Kauchak, Philip Harber, Ankit Pal, Akash Shukla

Abstract: Text and audio simplification to increase information comprehension are important in healthcare. With the introduction of ChatGPT, an evaluation of its simplification performance is needed. We provide a systematic comparison of human and ChatGPT simplified texts using fourteen metrics indicative of text difficulty. We briefly introduce our online editor where these simplification tools, including… ▽ More Text and audio simplification to increase information comprehension are important in healthcare. With the introduction of ChatGPT, an evaluation of its simplification performance is needed. We provide a systematic comparison of human and ChatGPT simplified texts using fourteen metrics indicative of text difficulty. We briefly introduce our online editor where these simplification tools, including ChatGPT, are available. We scored twelve corpora using our metrics: six text, one audio, and five ChatGPT simplified corpora. We then compare these corpora with texts simplified and verified in a prior user study. Finally, a medical domain expert evaluated these texts and five, new ChatGPT simplified versions. We found that simple corpora show higher similarity with the human simplified texts. ChatGPT simplification moves metrics in the right direction. The medical domain expert evaluation showed a preference for the ChatGPT style, but the text itself was rated lower for content retention. △ Less

Submitted 29 April, 2024; originally announced May 2024.

Comments: AMIA Summit, Boston, 2024

ACM Class: H.4

arXiv:2404.10212 [pdf, other]

LWIRPOSE: A novel LWIR Thermal Image Dataset and Benchmark

Authors: Avinash Upadhyay, Bhipanshu Dhupar, Manoj Sharma, Ankit Shukla, Ajith Abraham

Abstract: Human pose estimation faces hurdles in real-world applications due to factors like lighting changes, occlusions, and cluttered environments. We introduce a unique RGB-Thermal Nearly Paired and Annotated 2D Pose Dataset, comprising over 2,400 high-quality LWIR (thermal) images. Each image is meticulously annotated with 2D human poses, offering a valuable resource for researchers and practitioners.… ▽ More Human pose estimation faces hurdles in real-world applications due to factors like lighting changes, occlusions, and cluttered environments. We introduce a unique RGB-Thermal Nearly Paired and Annotated 2D Pose Dataset, comprising over 2,400 high-quality LWIR (thermal) images. Each image is meticulously annotated with 2D human poses, offering a valuable resource for researchers and practitioners. This dataset, captured from seven actors performing diverse everyday activities like sitting, eating, and walking, facilitates pose estimation on occlusion and other challenging scenarios. We benchmark state-of-the-art pose estimation methods on the dataset to showcase its potential, establishing a strong baseline for future research. Our results demonstrate the dataset's effectiveness in promoting advancements in pose estimation for various applications, including surveillance, healthcare, and sports analytics. The dataset and code are available at https://github.com/avinres/LWIRPOSE △ Less

Submitted 15 April, 2024; originally announced April 2024.

Comments: Submitted in ICIP2024

arXiv:2404.00613 [pdf, ps, other]

On $(θ, Θ)$-cyclic codes and their applications in constructing QECCs

Authors: Awadhesh Kumar Shukla, Sachin Pathak, Om Prakash Pandey, Vipul Mishra, Ashish Kumar Upadhyay

Abstract: Let $\mathbb F_q$ be a finite field, where $q$ is an odd prime power. Let $R=\mathbb{F}_q+u\mathbb{F}_q+v\mathbb{F}_q+uv\mathbb F_q$ with $u^2=u,v^2=v,uv=vu$. In this paper, we study the algebraic structure of $(θ, Θ)$-cyclic codes of block length $(r,s )$ over $\mathbb{F}_qR.$ Specifically, we analyze the structure of these codes as left $R[x:Θ]$-submodules of… ▽ More Let $\mathbb F_q$ be a finite field, where $q$ is an odd prime power. Let $R=\mathbb{F}_q+u\mathbb{F}_q+v\mathbb{F}_q+uv\mathbb F_q$ with $u^2=u,v^2=v,uv=vu$. In this paper, we study the algebraic structure of $(θ, Θ)$-cyclic codes of block length $(r,s )$ over $\mathbb{F}_qR.$ Specifically, we analyze the structure of these codes as left $R[x:Θ]$-submodules of $\mathfrak{R}_{r,s} = \frac{\mathbb{F}_q[x:θ]}{\langle x^r-1\rangle} \times \frac{R[x:Θ]}{\langle x^s-1\rangle}$. Our investigation involves determining generator polynomials and minimal generating sets for this family of codes. Further, we discuss the algebraic structure of separable codes. A relationship between the generator polynomials of $(θ, Θ)$-cyclic codes over $\mathbb F_qR$ and their duals is established. Moreover, we calculate the generator polynomials of dual of $(θ, Θ)$-cyclic codes. As an application of our study, we provide a construction of quantum error-correcting codes (QECCs) from $(θ, Θ)$-cyclic codes of block length $(r,s)$ over $\mathbb{F}_qR$. We support our theoretical results with illustrative examples. △ Less

Submitted 31 March, 2024; originally announced April 2024.

Comments: 30 pages, 4 tables

arXiv:2402.03737 [pdf, ps, other]

Differentially Private High Dimensional Bandits

Authors: Apurv Shukla

Abstract: We consider a high-dimensional stochastic contextual linear bandit problem when the parameter vector is $s_{0}$-sparse and the decision maker is subject to privacy constraints under both central and local models of differential privacy. We present PrivateLASSO, a differentially private LASSO bandit algorithm. PrivateLASSO is based on two sub-routines: (i) a sparse hard-thresholding-based privacy m… ▽ More We consider a high-dimensional stochastic contextual linear bandit problem when the parameter vector is $s_{0}$-sparse and the decision maker is subject to privacy constraints under both central and local models of differential privacy. We present PrivateLASSO, a differentially private LASSO bandit algorithm. PrivateLASSO is based on two sub-routines: (i) a sparse hard-thresholding-based privacy mechanism and (ii) an episodic thresholding rule for identifying the support of the parameter $θ$. We prove minimax private lower bounds and establish privacy and utility guarantees for PrivateLASSO for the central model under standard assumptions. △ Less

Submitted 6 February, 2024; originally announced February 2024.

arXiv:2401.09185 [pdf, other]

Behavior Trees with Dataflow: Coordinating Reactive Tasks in Lingua Franca

Authors: Alexander Schulz-Rosengarten, Akash Ahmad, Malte Clement, Reinhard von Hanxleden, Benjamin Asch, Marten Lohstroh, Edward A. Lee, Gustavo Quiros Araya, Ankit Shukla

Abstract: Behavior Trees (BTs) provide a lean set of control flow elements that are easily composable in a modular tree structure. They are well established for modeling the high-level behavior of non-player characters in computer games and recently gained popularity in other areas such as industrial automation. While BTs nicely express control, data handling aspects so far must be provided separately, e. g… ▽ More Behavior Trees (BTs) provide a lean set of control flow elements that are easily composable in a modular tree structure. They are well established for modeling the high-level behavior of non-player characters in computer games and recently gained popularity in other areas such as industrial automation. While BTs nicely express control, data handling aspects so far must be provided separately, e. g. in the form of blackboards. This may hamper reusability and can be a source of nondeterminism. We here present a dataflow extension to BTs that explicitly models data relations and communication. We provide a combined textual/graphical approach in line with modern, productivity-enhancing pragmatics-aware modeling techniques. We realized and validated that approach in the recently introduced polyglot coordination language Lingua Franca (LF). △ Less

Submitted 17 January, 2024; originally announced January 2024.

arXiv:2401.02158 [pdf, other]

Shayona@SMM4H23: COVID-19 Self diagnosis classification using BERT and LightGBM models

Authors: Rushi Chavda, Darshan Makwana, Vraj Patel, Anupam Shukla

Abstract: This paper describes approaches and results for shared Task 1 and 4 of SMMH4-23 by Team Shayona. Shared Task-1 was binary classification of english tweets self-reporting a COVID-19 diagnosis, and Shared Task-4 was Binary classification of English Reddit posts self-reporting a social anxiety disorder diagnosis. Our team has achieved the highest f1-score 0.94 in Task-1 among all participants. We hav… ▽ More This paper describes approaches and results for shared Task 1 and 4 of SMMH4-23 by Team Shayona. Shared Task-1 was binary classification of english tweets self-reporting a COVID-19 diagnosis, and Shared Task-4 was Binary classification of English Reddit posts self-reporting a social anxiety disorder diagnosis. Our team has achieved the highest f1-score 0.94 in Task-1 among all participants. We have leveraged the Transformer model (BERT) in combination with the LightGBM model for both tasks. △ Less

Submitted 4 January, 2024; originally announced January 2024.

arXiv:2311.16338 [pdf, other]

Releasing the CRaQAn (Coreference Resolution in Question-Answering): An open-source dataset and dataset creation methodology using instruction-following models

Authors: Rob Grzywinski, Joshua D'Arcy, Rob Naidoff, Ashish Shukla, Alex Browne, Ren Gibbons, Brinnae Bent

Abstract: Instruction-following language models demand robust methodologies for information retrieval to augment instructions for question-answering applications. A primary challenge is the resolution of coreferences in the context of chunking strategies for long documents. The critical barrier to experimentation of handling coreferences is a lack of open source datasets, specifically in question-answering… ▽ More Instruction-following language models demand robust methodologies for information retrieval to augment instructions for question-answering applications. A primary challenge is the resolution of coreferences in the context of chunking strategies for long documents. The critical barrier to experimentation of handling coreferences is a lack of open source datasets, specifically in question-answering tasks that require coreference resolution. In this work we present our Coreference Resolution in Question-Answering (CRaQAn) dataset, an open-source dataset that caters to the nuanced information retrieval requirements of coreference resolution in question-answering tasks by providing over 250 question-answer pairs containing coreferences. To develop this dataset, we developed a novel approach for creating high-quality datasets using an instruction-following model (GPT-4) and a Recursive Criticism and Improvement Loop. △ Less

Submitted 27 November, 2023; originally announced November 2023.

Comments: NeurIPS 2023 Workshop on Instruction Tuning and Instruction Following

arXiv:2311.05079 [pdf, other]

Social Media Bot Detection using Dropout-GAN

Authors: Anant Shukla, Martin Jurecek, Mark Stamp

Abstract: Bot activity on social media platforms is a pervasive problem, undermining the credibility of online discourse and potentially leading to cybercrime. We propose an approach to bot detection using Generative Adversarial Networks (GAN). We discuss how we overcome the issue of mode collapse by utilizing multiple discriminators to train against one generator, while decoupling the discriminator to perf… ▽ More Bot activity on social media platforms is a pervasive problem, undermining the credibility of online discourse and potentially leading to cybercrime. We propose an approach to bot detection using Generative Adversarial Networks (GAN). We discuss how we overcome the issue of mode collapse by utilizing multiple discriminators to train against one generator, while decoupling the discriminator to perform social media bot detection and utilizing the generator for data augmentation. In terms of classification accuracy, our approach outperforms the state-of-the-art techniques in this field. We also show how the generator in the GAN can be used to evade such a classification technique. △ Less

Submitted 8 November, 2023; originally announced November 2023.

arXiv:2310.00887 [pdf, other]

GRID: A Platform for General Robot Intelligence Development

Authors: Sai Vemprala, Shuhang Chen, Abhinav Shukla, Dinesh Narayanan, Ashish Kapoor

Abstract: Developing machine intelligence abilities in robots and autonomous systems is an expensive and time consuming process. Existing solutions are tailored to specific applications and are harder to generalize. Furthermore, scarcity of training data adds a layer of complexity in deploying deep machine learning models. We present a new platform for General Robot Intelligence Development (GRID) to addres… ▽ More Developing machine intelligence abilities in robots and autonomous systems is an expensive and time consuming process. Existing solutions are tailored to specific applications and are harder to generalize. Furthermore, scarcity of training data adds a layer of complexity in deploying deep machine learning models. We present a new platform for General Robot Intelligence Development (GRID) to address both of these issues. The platform enables robots to learn, compose and adapt skills to their physical capabilities, environmental constraints and goals. The platform addresses AI problems in robotics via foundation models that know the physical world. GRID is designed from the ground up to be extensible to accommodate new types of robots, vehicles, hardware platforms and software protocols. In addition, the modular design enables various deep ML components and existing foundation models to be easily usable in a wider variety of robot-centric problems. We demonstrate the platform in various aerial robotics scenarios and demonstrate how the platform dramatically accelerates development of machine intelligent robots. △ Less

Submitted 7 October, 2023; v1 submitted 2 October, 2023; originally announced October 2023.

arXiv:2310.00207 [pdf, ps, other]

Detecting Unseen Multiword Expressions in American Sign Language

Authors: Lee Kezar, Aryan Shukla

Abstract: Multiword expressions present unique challenges in many translation tasks. In an attempt to ultimately apply a multiword expression detection system to the translation of American Sign Language, we built and tested two systems that apply word embeddings from GloVe to determine whether or not the word embeddings of lexemes can be used to predict whether or not those lexemes compose a multiword expr… ▽ More Multiword expressions present unique challenges in many translation tasks. In an attempt to ultimately apply a multiword expression detection system to the translation of American Sign Language, we built and tested two systems that apply word embeddings from GloVe to determine whether or not the word embeddings of lexemes can be used to predict whether or not those lexemes compose a multiword expression. It became apparent that word embeddings carry data that can detect non-compositionality with decent accuracy. △ Less

Submitted 29 September, 2023; originally announced October 2023.

Comments: Technical report, unpublished

arXiv:2308.12129 [pdf, other]

Resiliency Analysis of LLM generated models for Industrial Automation

Authors: Oluwatosin Ogundare, Gustavo Quiros Araya, Ioannis Akrotirianakis, Ankit Shukla

Abstract: This paper proposes a study of the resilience and efficiency of automatically generated industrial automation and control systems using Large Language Models (LLMs). The approach involves modeling the system using percolation theory to estimate its resilience and formulating the design problem as an optimization problem subject to constraints. Techniques from stochastic optimization and regret ana… ▽ More This paper proposes a study of the resilience and efficiency of automatically generated industrial automation and control systems using Large Language Models (LLMs). The approach involves modeling the system using percolation theory to estimate its resilience and formulating the design problem as an optimization problem subject to constraints. Techniques from stochastic optimization and regret analysis are used to find a near-optimal solution with provable regret bounds. The study aims to provide insights into the effectiveness and reliability of automatically generated systems in industrial automation and control, and to identify potential areas for improvement in their design and implementation. △ Less

Submitted 23 August, 2023; originally announced August 2023.

Comments: 8 Pages, Conference Manuscript

arXiv:2306.15768 [pdf]

An Efficient Deep Convolutional Neural Network Model For Yoga Pose Recognition Using Single Images

Authors: Santosh Kumar Yadav, Apurv Shukla, Kamlesh Tiwari, Hari Mohan Pandey, Shaik Ali Akbar

Abstract: Pose recognition deals with designing algorithms to locate human body joints in a 2D/3D space and run inference on the estimated joint locations for predicting the poses. Yoga poses consist of some very complex postures. It imposes various challenges on the computer vision algorithms like occlusion, inter-class similarity, intra-class variability, viewpoint complexity, etc. This paper presents YPo… ▽ More Pose recognition deals with designing algorithms to locate human body joints in a 2D/3D space and run inference on the estimated joint locations for predicting the poses. Yoga poses consist of some very complex postures. It imposes various challenges on the computer vision algorithms like occlusion, inter-class similarity, intra-class variability, viewpoint complexity, etc. This paper presents YPose, an efficient deep convolutional neural network (CNN) model to recognize yoga asanas from RGB images. The proposed model consists of four steps as follows: (a) first, the region of interest (ROI) is segmented using segmentation based approaches to extract the ROI from the original images; (b) second, these refined images are passed to a CNN architecture based on the backbone of EfficientNets for feature extraction; (c) third, dense refinement blocks, adapted from the architecture of densely connected networks are added to learn more diversified features; and (d) fourth, global average pooling and fully connected layers are applied for the classification of the multi-level hierarchy of the yoga poses. The proposed model has been tested on the Yoga-82 dataset. It is a publicly available benchmark dataset for yoga pose recognition. Experimental results show that the proposed model achieves the state-of-the-art on this dataset. The proposed model obtained an accuracy of 93.28%, which is an improvement over the earlier state-of-the-art (79.35%) with a margin of approximately 13.9%. The code will be made publicly available. △ Less

Submitted 27 June, 2023; originally announced June 2023.

arXiv:2305.06414 [pdf, other]

Self-contained relaxation-based dynamical Ising machines

Authors: Mikhail Erementchouk, Aditya Shukla, Pinaki Mazumder

Abstract: Dynamical Ising machines are continuous dynamical systems evolving from a generic initial state to a state strongly related to the ground state of the classical Ising model on a graph. Reaching the ground state is equivalent to finding the maximum (weighted) cut of the graph, which presents the Ising machines as an alternative way to solving and investigating NP-complete problems. Among the dynami… ▽ More Dynamical Ising machines are continuous dynamical systems evolving from a generic initial state to a state strongly related to the ground state of the classical Ising model on a graph. Reaching the ground state is equivalent to finding the maximum (weighted) cut of the graph, which presents the Ising machines as an alternative way to solving and investigating NP-complete problems. Among the dynamical models driving the Ising machines, relaxation-based models are especially interesting because of their relations with guarantees of performance achieved in time scaling polynomially with the problem size. However, the terminal states of such machines are essentially non-binary, which necessitates special post-processing relying on disparate computing. We show that an Ising machine implementing a special dynamical system (called \mdII{}) solves the rounding problem dynamically. We prove that the \mdII-machine starting from an arbitrary non-binary state terminates in a state, which trivially rounds to a binary state with the cut at least as big as obtained after the optimal rounding of the initial state. Besides showing that relaxation-based dynamical Ising machines can be made self-contained, our findings demonstrate that dynamical systems can directly perform complex information processing tasks. △ Less

Submitted 10 May, 2023; originally announced May 2023.

Comments: 22 pages, 5 figures

MSC Class: 90C27; 37B15; 82C20

arXiv:2305.03235 [pdf]

Hardware in Loop Learning with Spin Stochastic Neurons

Authors: A N M Nafiul Islam, Kezhou Yang, Amit K. Shukla, Pravin Khanal, Bowei Zhou, Wei-Gang Wang, Abhronil Sengupta

Abstract: Despite the promise of superior efficiency and scalability, real-world deployment of emerging nanoelectronic platforms for brain-inspired computing have been limited thus far, primarily because of inter-device variations and intrinsic non-idealities. In this work, we demonstrate mitigating these issues by performing learning directly on practical devices through a hardware-in-loop approach, utiliz… ▽ More Despite the promise of superior efficiency and scalability, real-world deployment of emerging nanoelectronic platforms for brain-inspired computing have been limited thus far, primarily because of inter-device variations and intrinsic non-idealities. In this work, we demonstrate mitigating these issues by performing learning directly on practical devices through a hardware-in-loop approach, utilizing stochastic neurons based on heavy metal/ferromagnetic spin-orbit torque heterostructures. We characterize the probabilistic switching and device-to-device variability of our fabricated devices of various sizes to showcase the effect of device dimension on the neuronal dynamics and its consequent impact on network-level performance. The efficacy of the hardware-in-loop scheme is illustrated in a deep learning scenario achieving equivalent software performance. This work paves the way for future large-scale implementations of neuromorphic hardware and realization of truly autonomous edge-intelligent devices. △ Less

Submitted 21 March, 2024; v1 submitted 4 May, 2023; originally announced May 2023.

arXiv:2304.09948 [pdf]

Catch Me If You Can: Identifying Fraudulent Physician Reviews with Large Language Models Using Generative Pre-Trained Transformers

Authors: Aishwarya Deep Shukla, Laksh Agarwal, Jie Mein, Goh, Guodong, Gao, Ritu Agarwal

Abstract: The proliferation of fake reviews of doctors has potentially detrimental consequences for patient well-being and has prompted concern among consumer protection groups and regulatory bodies. Yet despite significant advancements in the fields of machine learning and natural language processing, there remains limited comprehension of the characteristics differentiating fraudulent from authentic revie… ▽ More The proliferation of fake reviews of doctors has potentially detrimental consequences for patient well-being and has prompted concern among consumer protection groups and regulatory bodies. Yet despite significant advancements in the fields of machine learning and natural language processing, there remains limited comprehension of the characteristics differentiating fraudulent from authentic reviews. This study utilizes a novel pre-labeled dataset of 38048 physician reviews to establish the effectiveness of large language models in classifying reviews. Specifically, we compare the performance of traditional ML models, such as logistic regression and support vector machines, to generative pre-trained transformer models. Furthermore, we use GPT4, the newest model in the GPT family, to uncover the key dimensions along which fake and genuine physician reviews differ. Our findings reveal significantly superior performance of GPT-3 over traditional ML models in this context. Additionally, our analysis suggests that GPT3 requires a smaller training sample than traditional models, suggesting its appropriateness for tasks with scarce training data. Moreover, the superiority of GPT3 performance increases in the cold start context i.e., when there are no prior reviews of a doctor. Finally, we employ GPT4 to reveal the crucial dimensions that distinguish fake physician reviews. In sharp contrast to previous findings in the literature that were obtained using simulated data, our findings from a real-world dataset show that fake reviews are generally more clinically detailed, more reserved in sentiment, and have better structure and grammar than authentic ones. △ Less

Submitted 19 April, 2023; originally announced April 2023.

arXiv:2303.16024 [pdf, other]

Egocentric Auditory Attention Localization in Conversations

Authors: Fiona Ryan, Hao Jiang, Abhinav Shukla, James M. Rehg, Vamsi Krishna Ithapu

Abstract: In a noisy conversation environment such as a dinner party, people often exhibit selective auditory attention, or the ability to focus on a particular speaker while tuning out others. Recognizing who somebody is listening to in a conversation is essential for developing technologies that can understand social behavior and devices that can augment human hearing by amplifying particular sound source… ▽ More In a noisy conversation environment such as a dinner party, people often exhibit selective auditory attention, or the ability to focus on a particular speaker while tuning out others. Recognizing who somebody is listening to in a conversation is essential for developing technologies that can understand social behavior and devices that can augment human hearing by amplifying particular sound sources. The computer vision and audio research communities have made great strides towards recognizing sound sources and speakers in scenes. In this work, we take a step further by focusing on the problem of localizing auditory attention targets in egocentric video, or detecting who in a camera wearer's field of view they are listening to. To tackle the new and challenging Selective Auditory Attention Localization problem, we propose an end-to-end deep learning approach that uses egocentric video and multichannel audio to predict the heatmap of the camera wearer's auditory attention. Our approach leverages spatiotemporal audiovisual features and holistic reasoning about the scene to make predictions, and outperforms a set of baselines on a challenging multi-speaker conversation dataset. Project page: https://fkryan.github.io/saal △ Less

Submitted 28 March, 2023; originally announced March 2023.

arXiv:2303.11873 [pdf, other]

A Tale of Two Circuits: Grokking as Competition of Sparse and Dense Subnetworks

Authors: William Merrill, Nikolaos Tsilivis, Aman Shukla

Abstract: Grokking is a phenomenon where a model trained on an algorithmic task first overfits but, then, after a large amount of additional training, undergoes a phase transition to generalize perfectly. We empirically study the internal structure of networks undergoing grokking on the sparse parity task, and find that the grokking phase transition corresponds to the emergence of a sparse subnetwork that d… ▽ More Grokking is a phenomenon where a model trained on an algorithmic task first overfits but, then, after a large amount of additional training, undergoes a phase transition to generalize perfectly. We empirically study the internal structure of networks undergoing grokking on the sparse parity task, and find that the grokking phase transition corresponds to the emergence of a sparse subnetwork that dominates model predictions. On an optimization level, we find that this subnetwork arises when a small subset of neurons undergoes rapid norm growth, whereas the other neurons in the network decay slowly in norm. Thus, we suggest that the grokking phase transition can be understood to emerge from competition of two largely distinct subnetworks: a dense one that dominates before the transition and generalizes poorly, and a sparse one that dominates afterwards. △ Less

Submitted 21 March, 2023; originally announced March 2023.

Comments: Published at the Workshop on Understanding Foundation Models at ICLR 2023

arXiv:2303.11424 [pdf, other]

Polynomial Implicit Neural Representations For Large Diverse Datasets

Authors: Rajhans Singh, Ankita Shukla, Pavan Turaga

Abstract: Implicit neural representations (INR) have gained significant popularity for signal and image representation for many end-tasks, such as superresolution, 3D modeling, and more. Most INR architectures rely on sinusoidal positional encoding, which accounts for high-frequency information in data. However, the finite encoding size restricts the model's representational power. Higher representational p… ▽ More Implicit neural representations (INR) have gained significant popularity for signal and image representation for many end-tasks, such as superresolution, 3D modeling, and more. Most INR architectures rely on sinusoidal positional encoding, which accounts for high-frequency information in data. However, the finite encoding size restricts the model's representational power. Higher representational power is needed to go from representing a single given image to representing large and diverse datasets. Our approach addresses this gap by representing an image with a polynomial function and eliminates the need for positional encodings. Therefore, to achieve a progressively higher degree of polynomial representation, we use element-wise multiplications between features and affine-transformed coordinate locations after every ReLU layer. The proposed method is evaluated qualitatively and quantitatively on large datasets like ImageNet. The proposed Poly-INR model performs comparably to state-of-the-art generative models without any convolution, normalization, or self-attention layers, and with far fewer trainable parameters. With much fewer training parameters and higher representative power, our approach paves the way for broader adoption of INR models for generative modeling tasks in complex domains. The code is available at \url{https://github.com/Rajhans0/Poly_INR} △ Less

Submitted 20 March, 2023; originally announced March 2023.

Comments: Accepted at CVPR 2023

arXiv:2303.07451 [pdf]

DRISHTI: Visual Navigation Assistant for Visually Impaired

Authors: Malay Joshi, Aditi Shukla, Jayesh Srivastava, Manya Rastogi

Abstract: In today's society, where independent living is becoming increasingly important, it can be extremely constricting for those who are blind. Blind and visually impaired (BVI) people face challenges because they need manual support to prompt information about their environment. In this work, we took our first step towards developing an affordable and high-performing eye wearable assistive device, DRI… ▽ More In today's society, where independent living is becoming increasingly important, it can be extremely constricting for those who are blind. Blind and visually impaired (BVI) people face challenges because they need manual support to prompt information about their environment. In this work, we took our first step towards developing an affordable and high-performing eye wearable assistive device, DRISHTI, to provide visual navigation assistance for BVI people. This system comprises a camera module, ESP32 processor, Bluetooth module, smartphone and speakers. Using artificial intelligence, this system is proposed to detect and understand the nature of the users' path and obstacles ahead of the user in that path and then inform BVI users about it via audio output to enable them to acquire directions by themselves on their journey. This first step discussed in this paper involves establishing a proof-of-concept of achieving the right balance of affordability and performance by testing an initial software integration of a currency detection algorithm on a low-cost embedded arrangement. This work will lay the foundation for our upcoming works toward achieving the goal of assisting the maximum of BVI people around the globe in moving independently. △ Less

Submitted 13 March, 2023; originally announced March 2023.

Comments: Paper presented at International Conference on Advancements and Key Challenges in Green Energy and Computing (AKGEC 2023) is accepted to be published in the proceedings of the Journal of Physics

arXiv:2303.07201 [pdf, other]

An evaluation of Google Translate for Sanskrit to English translation via sentiment and semantic analysis

Authors: Akshat Shukla, Chaarvi Bansal, Sushrut Badhe, Mukul Ranjan, Rohitash Chandra

Abstract: Google Translate has been prominent for language translation; however, limited work has been done in evaluating the quality of translation when compared to human experts. Sanskrit one of the oldest written languages in the world. In 2022, the Sanskrit language was added to the Google Translate engine. Sanskrit is known as the mother of languages such as Hindi and an ancient source of the Indo-Euro… ▽ More Google Translate has been prominent for language translation; however, limited work has been done in evaluating the quality of translation when compared to human experts. Sanskrit one of the oldest written languages in the world. In 2022, the Sanskrit language was added to the Google Translate engine. Sanskrit is known as the mother of languages such as Hindi and an ancient source of the Indo-European group of languages. Sanskrit is the original language for sacred Hindu texts such as the Bhagavad Gita. In this study, we present a framework that evaluates the Google Translate for Sanskrit using the Bhagavad Gita. We first publish a translation of the Bhagavad Gita in Sanskrit using Google Translate. Our framework then compares Google Translate version of Bhagavad Gita with expert translations using sentiment and semantic analysis via BERT-based language models. Our results indicate that in terms of sentiment and semantic analysis, there is low level of similarity in selected verses of Google Translate when compared to expert translations. In the qualitative evaluation, we find that Google translate is unsuitable for translation of certain Sanskrit words and phrases due to its poetic nature, contextual significance, metaphor and imagery. The mistranslations are not surprising since the Bhagavad Gita is known as a difficult text not only to translate, but also to interpret since it relies on contextual, philosophical and historical information. Our framework lays the foundation for automatic evaluation of other languages by Google Translate △ Less

Submitted 27 February, 2023; originally announced March 2023.

arXiv:2302.14130 [pdf, other]

doi 10.1016/j.neucom.2022.11.029

Leveraging Angular Distributions for Improved Knowledge Distillation

Authors: Eun Som Jeon, Hongjun Choi, Ankita Shukla, Pavan Turaga

Abstract: Knowledge distillation as a broad class of methods has led to the development of lightweight and memory efficient models, using a pre-trained model with a large capacity (teacher network) to train a smaller model (student network). Recently, additional variations for knowledge distillation, utilizing activation maps of intermediate layers as the source of knowledge, have been studied. Generally, i… ▽ More Knowledge distillation as a broad class of methods has led to the development of lightweight and memory efficient models, using a pre-trained model with a large capacity (teacher network) to train a smaller model (student network). Recently, additional variations for knowledge distillation, utilizing activation maps of intermediate layers as the source of knowledge, have been studied. Generally, in computer vision applications, it is seen that the feature activation learned by a higher capacity model contains richer knowledge, highlighting complete objects while focusing less on the background. Based on this observation, we leverage the dual ability of the teacher to accurately distinguish between positive (relevant to the target object) and negative (irrelevant) areas. We propose a new loss function for distillation, called angular margin-based distillation (AMD) loss. AMD loss uses the angular distance between positive and negative features by projecting them onto a hypersphere, motivated by the near angular distributions seen in many feature extractors. Then, we create a more attentive feature that is angularly distributed on the hypersphere by introducing an angular margin to the positive feature. Transferring such knowledge from the teacher network enables the student model to harness the higher discrimination of positive and negative features for the teacher, thus distilling superior student models. The proposed method is evaluated for various student-teacher network pairs on four public datasets. Furthermore, we show that the proposed method has advantages in compatibility with other learning techniques, such as using fine-grained features, augmentation, and other distillation methods. △ Less

Submitted 27 February, 2023; originally announced February 2023.

Comments: Neurocomputing, Volume 518, 21 January 2023, Pages 466-481

Journal ref: Neurocomputing, Volume 518, 2023, Pages 466-481

arXiv:2212.12861 [pdf, other]

An efficient quantum-classical hybrid algorithm for distorted alphanumeric character identification

Authors: Ankur Pal, Abhishek Shukla, Anirban Pathak

Abstract: An algorithm for image processing is proposed. The proposed algorithm, which can be viewed as a quantum-classical hybrid algorithm, can transform a low-resolution bitonal image of a character from the set of alphanumeric characters (A-Z, 0-9) into a high-resolution image. The quantum part of the proposed algorithm fruitfully utilizes a variant of Grover's search algorithm, known as the fixed point… ▽ More An algorithm for image processing is proposed. The proposed algorithm, which can be viewed as a quantum-classical hybrid algorithm, can transform a low-resolution bitonal image of a character from the set of alphanumeric characters (A-Z, 0-9) into a high-resolution image. The quantum part of the proposed algorithm fruitfully utilizes a variant of Grover's search algorithm, known as the fixed point search algorithm. Further, the quantum part of the algorithm is simulated using CQASM and the advantage of the algorithm is established through the complexity analysis. Additional analysis has also revealed that this scheme for optical character recognition (OCR) leads to high confidence value and generally works in a more efficient manner compared to the existing classical, quantum, and hybrid algorithms for a similar task. △ Less

Submitted 25 December, 2022; originally announced December 2022.

Comments: A quantum-assisted algorithm for optical character recognition (OCR) is proposed using fixed point Grover's algorithm

arXiv:2212.01745 [pdf, other]

Design of an All-Purpose Terrace Farming Robot

Authors: Vibhakar Mohta, Adarsh Patnaik, Shivam Kumar Panda, Siva Vignesh Krishnan, Abhinav Gupta, Abhay Shukla, Gauri Wadhwa, Shrey Verma, Aditya Bandopadhyay

Abstract: Automation in farming processes is a growing field of research in both academia and industries. A considerable amount of work has been put into this field to develop systems robust enough for farming. Terrace farming, in particular, provides a varying set of challenges, including robust stair climbing methods and stable navigation in unstructured terrains. We propose the design of a novel autonomo… ▽ More Automation in farming processes is a growing field of research in both academia and industries. A considerable amount of work has been put into this field to develop systems robust enough for farming. Terrace farming, in particular, provides a varying set of challenges, including robust stair climbing methods and stable navigation in unstructured terrains. We propose the design of a novel autonomous terrace farming robot, Aarohi, that can effectively climb steep terraces of considerable heights and execute several farming operations. The design optimisation strategy for the overall mechanical structure is elucidated. Further, the embedded and software architecture along with fail-safe strategies are presented for a working prototype. Algorithms for autonomous traversal over the terrace steps using the scissor lift mechanism and performing various farming operations have also been discussed. The adaptability of the design to specific operational requirements and modular farm tools allow Aarohi to be customised for a wide variety of use cases. △ Less

Submitted 4 December, 2022; originally announced December 2022.

arXiv:2211.16172 [pdf, other]

Learnings from Technological Interventions in a Low Resource Language: Enhancing Information Access in Gondi

Authors: Devansh Mehta, Harshita Diddee, Ananya Saxena, Anurag Shukla, Sebastin Santy, Ramaravind Kommiya Mothilal, Brij Mohan Lal Srivastava, Alok Sharma, Vishnu Prasad, Venkanna U, Kalika Bali

Abstract: The primary obstacle to developing technologies for low-resource languages is the lack of representative, usable data. In this paper, we report the deployment of technology-driven data collection methods for creating a corpus of more than 60,000 translations from Hindi to Gondi, a low-resource vulnerable language spoken by around 2.3 million tribal people in south and central India. During this pr… ▽ More The primary obstacle to developing technologies for low-resource languages is the lack of representative, usable data. In this paper, we report the deployment of technology-driven data collection methods for creating a corpus of more than 60,000 translations from Hindi to Gondi, a low-resource vulnerable language spoken by around 2.3 million tribal people in south and central India. During this process, we help expand information access in Gondi across 2 different dimensions (a) The creation of linguistic resources that can be used by the community, such as a dictionary, children's stories, Gondi translations from multiple sources and an Interactive Voice Response (IVR) based mass awareness platform; (b) Enabling its use in the digital domain by developing a Hindi-Gondi machine translation model, which is compressed by nearly 4 times to enable it's edge deployment on low-resource edge devices and in areas of little to no internet connectivity. We also present preliminary evaluations of utilizing the developed machine translation model to provide assistance to volunteers who are involved in collecting more data for the target language. Through these interventions, we not only created a refined and evaluated corpus of 26,240 Hindi-Gondi translations that was used for building the translation model but also engaged nearly 850 community members who can help take Gondi onto the internet. △ Less

Submitted 29 November, 2022; originally announced November 2022.

Comments: In Submission (Revised) to Language Resources and Evaluation Journal. arXiv admin note: text overlap with arXiv:2004.10270

arXiv:2211.05100 [pdf, other]

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

Authors: BigScience Workshop, :, Teven Le Scao, Angela Fan, Christopher Akiki, Ellie Pavlick, Suzana Ilić, Daniel Hesslow, Roman Castagné, Alexandra Sasha Luccioni, François Yvon, Matthias Gallé, Jonathan Tow, Alexander M. Rush, Stella Biderman, Albert Webson, Pawan Sasanka Ammanamanchi, Thomas Wang, Benoît Sagot, Niklas Muennighoff, Albert Villanova del Moral, Olatunji Ruwase, Rachel Bawden, Stas Bekman, Angelina McMillan-Major , et al. (369 additional authors not shown)

Abstract: Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access… ▽ More Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License. △ Less

Submitted 27 June, 2023; v1 submitted 9 November, 2022; originally announced November 2022.

arXiv:2211.03946 [pdf, other]

Understanding the Role of Mixup in Knowledge Distillation: An Empirical Study

Authors: Hongjun Choi, Eun Som Jeon, Ankita Shukla, Pavan Turaga

Abstract: Mixup is a popular data augmentation technique based on creating new samples by linear interpolation between two given data samples, to improve both the generalization and robustness of the trained model. Knowledge distillation (KD), on the other hand, is widely used for model compression and transfer learning, which involves using a larger network's implicit knowledge to guide the learning of a s… ▽ More Mixup is a popular data augmentation technique based on creating new samples by linear interpolation between two given data samples, to improve both the generalization and robustness of the trained model. Knowledge distillation (KD), on the other hand, is widely used for model compression and transfer learning, which involves using a larger network's implicit knowledge to guide the learning of a smaller network. At first glance, these two techniques seem very different, however, we found that "smoothness" is the connecting link between the two and is also a crucial attribute in understanding KD's interplay with mixup. Although many mixup variants and distillation methods have been proposed, much remains to be understood regarding the role of a mixup in knowledge distillation. In this paper, we present a detailed empirical study on various important dimensions of compatibility between mixup and knowledge distillation. We also scrutinize the behavior of the networks trained with a mixup in the light of knowledge distillation through extensive analysis, visualizations, and comprehensive experiments on image classification. Finally, based on our findings, we suggest improved strategies to guide the student network to enhance its effectiveness. Additionally, the findings of this study provide insightful suggestions to researchers and practitioners that commonly use techniques from KD. Our code is available at https://github.com/hchoi71/MIX-KD. △ Less

Submitted 8 November, 2022; v1 submitted 7 November, 2022; originally announced November 2022.

Comments: To be presented at WACV 2023

arXiv:2210.07544 [pdf, other]

Legal Case Document Summarization: Extractive and Abstractive Methods and their Evaluation

Authors: Abhay Shukla, Paheli Bhattacharya, Soham Poddar, Rajdeep Mukherjee, Kripabandhu Ghosh, Pawan Goyal, Saptarshi Ghosh

Abstract: Summarization of legal case judgement documents is a challenging problem in Legal NLP. However, not much analyses exist on how different families of summarization models (e.g., extractive vs. abstractive) perform when applied to legal case documents. This question is particularly important since many recent transformer-based abstractive summarization models have restrictions on the number of input… ▽ More Summarization of legal case judgement documents is a challenging problem in Legal NLP. However, not much analyses exist on how different families of summarization models (e.g., extractive vs. abstractive) perform when applied to legal case documents. This question is particularly important since many recent transformer-based abstractive summarization models have restrictions on the number of input tokens, and legal documents are known to be very long. Also, it is an open question on how best to evaluate legal case document summarization systems. In this paper, we carry out extensive experiments with several extractive and abstractive summarization methods (both supervised and unsupervised) over three legal summarization datasets that we have developed. Our analyses, that includes evaluation by law practitioners, lead to several interesting insights on legal summarization in specific and long document summarization in general. △ Less

Submitted 14 October, 2022; originally announced October 2022.

Comments: Accepted at The 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (AACL-IJCNLP), 2022

arXiv:2209.11632 [pdf, other]

Facilitating Change Implementation for Continuous ML-Safety Assurance

Authors: Chih-Hong Cheng, Nguyen Anh Vu Doan, Balahari Balu, Franziska Schwaiger, Emmanouil Seferis, Simon Burton, Yassine Qamsane, Ankit Shukla, Yinchong Yang, Zhiliang Wu, Andreas Hapfelmeier, Ingo Thon

Abstract: We propose a method for deploying a safety-critical machine-learning component into continuously evolving environments where an increased degree of automation in the engineering process is desired. We associate semantic tags with the safety case argumentation and turn each piece of evidence into a quantitative metric or a logic formula. With proper tool support, the impact can be characterized by… ▽ More We propose a method for deploying a safety-critical machine-learning component into continuously evolving environments where an increased degree of automation in the engineering process is desired. We associate semantic tags with the safety case argumentation and turn each piece of evidence into a quantitative metric or a logic formula. With proper tool support, the impact can be characterized by a query over the safety argumentation tree to highlight evidence turning invalid. The concept is exemplified using a vision-based emergency braking system of an autonomous guided vehicle for factory automation. △ Less

Submitted 23 September, 2022; originally announced September 2022.

arXiv:2205.14760 [pdf, other]

Scalable almost-linear dynamical Ising machines

Authors: Aditya Shukla, Mikhail Erementchouk, Pinaki Mazumder

Abstract: The past decade has seen the emergence of Ising machines targeting hard combinatorial optimization problems by minimizing the Ising Hamiltonian with spins represented by continuous dynamical variables. However, capabilities of these machines at larger scales are yet to be fully explored. We investigate an Ising machine based on a network of almost-linearly coupled analog spins. We show that such n… ▽ More The past decade has seen the emergence of Ising machines targeting hard combinatorial optimization problems by minimizing the Ising Hamiltonian with spins represented by continuous dynamical variables. However, capabilities of these machines at larger scales are yet to be fully explored. We investigate an Ising machine based on a network of almost-linearly coupled analog spins. We show that such networks leverage the computational resource similar to that of the semidefinite positive relaxation of the Ising model. We estimate the expected performance of the almost-linear machine and benchmark it on a set of {0,1}-weighted graphs. We show that the running time of the investigated machine scales polynomially (linearly with the number of edges in the connectivity graph). As an example of the physical realization of the machine, we present a CMOS-compatible implementation comprising an array of vertices efficiently storing the continuous spins on charged capacitors and communicating externally via analog current. △ Less

Submitted 29 May, 2022; originally announced May 2022.

arXiv:2205.11722 [pdf, other]

Improving Shape Awareness and Interpretability in Deep Networks Using Geometric Moments

Authors: Rajhans Singh, Ankita Shukla, Pavan Turaga

Abstract: Deep networks for image classification often rely more on texture information than object shape. While efforts have been made to make deep-models shape-aware, it is often difficult to make such models simple, interpretable, or rooted in known mathematical definitions of shape. This paper presents a deep-learning model inspired by geometric moments, a classically well understood approach to measure… ▽ More Deep networks for image classification often rely more on texture information than object shape. While efforts have been made to make deep-models shape-aware, it is often difficult to make such models simple, interpretable, or rooted in known mathematical definitions of shape. This paper presents a deep-learning model inspired by geometric moments, a classically well understood approach to measure shape-related properties. The proposed method consists of a trainable network for generating coordinate bases and affine parameters for making the features geometrically invariant yet in a task-specific manner. The proposed model improves the final feature's interpretation. We demonstrate the effectiveness of our method on standard image classification datasets. The proposed model achieves higher classification performance compared to the baseline and standard ResNet models while substantially improving interpretability. △ Less

Submitted 22 May, 2023; v1 submitted 23 May, 2022; originally announced May 2022.

Comments: Accepted at CVPR 2023 Workshop: Deep Learning for Geometric Computing

arXiv:2204.02591 [pdf]

Contextual Attention Mechanism, SRGAN Based Inpainting System for Eliminating Interruptions from Images

Authors: Narayana Darapaneni, Vaibhav Kherde, Kameswara Rao, Deepali Nikam, Swanand Katdare, Anima Shukla, Anagha Lomate, Anwesh Reddy Paduri

Abstract: The new alternative is to use deep learning to inpaint any image by utilizing image classification and computer vision techniques. In general, image inpainting is a task of recreating or reconstructing any broken image which could be a photograph or oil/acrylic painting. With the advancement in the field of Artificial Intelligence, this topic has become popular among AI enthusiasts. With our appro… ▽ More The new alternative is to use deep learning to inpaint any image by utilizing image classification and computer vision techniques. In general, image inpainting is a task of recreating or reconstructing any broken image which could be a photograph or oil/acrylic painting. With the advancement in the field of Artificial Intelligence, this topic has become popular among AI enthusiasts. With our approach, we propose an initial end-to-end pipeline for inpainting images using a complete Machine Learning approach instead of a conventional application-based approach. We first use the YOLO model to automatically identify and localize the object we wish to remove from the image. Using the result obtained from the model we can generate a mask for the same. After this, we provide the masked image and original image to the GAN model which uses the Contextual Attention method to fill in the region. It consists of two generator networks and two discriminator networks and is also called a coarse-to-fine network structure. The two generators use fully convolutional networks while the global discriminator gets hold of the entire image as input while the local discriminator gets the grip of the filled region as input. The contextual Attention mechanism is proposed to effectively borrow the neighbor information from distant spatial locations for reconstructing the missing pixels. The third part of our implementation uses SRGAN to resolve the inpainted image back to its original size. Our work is inspired by the paper Free-Form Image Inpainting with Gated Convolution and Generative Image Inpainting with Contextual Attention. △ Less

Submitted 6 April, 2022; originally announced April 2022.

arXiv:2201.00111 [pdf, other]

doi 10.1109/JIOT.2021.3139038

Role of Data Augmentation Strategies in Knowledge Distillation for Wearable Sensor Data

Authors: Eun Som Jeon, Anirudh Som, Ankita Shukla, Kristina Hasanaj, Matthew P. Buman, Pavan Turaga

Abstract: Deep neural networks are parametrized by several thousands or millions of parameters, and have shown tremendous success in many classification problems. However, the large number of parameters makes it difficult to integrate these models into edge devices such as smartphones and wearable devices. To address this problem, knowledge distillation (KD) has been widely employed, that uses a pre-trained… ▽ More Deep neural networks are parametrized by several thousands or millions of parameters, and have shown tremendous success in many classification problems. However, the large number of parameters makes it difficult to integrate these models into edge devices such as smartphones and wearable devices. To address this problem, knowledge distillation (KD) has been widely employed, that uses a pre-trained high capacity network to train a much smaller network, suitable for edge devices. In this paper, for the first time, we study the applicability and challenges of using KD for time-series data for wearable devices. Successful application of KD requires specific choices of data augmentation methods during training. However, it is not yet known if there exists a coherent strategy for choosing an augmentation approach during KD. In this paper, we report the results of a detailed study that compares and contrasts various common choices and some hybrid data augmentation strategies in KD based human activity analysis. Research in this area is often limited as there are not many comprehensive databases available in the public domain from wearable devices. Our study considers databases from small scale publicly available to one derived from a large scale interventional study into human activity and sedentary behavior. We find that the choice of data augmentation techniques during KD have a variable level of impact on end performance, and find that the optimal network choice as well as data augmentation strategies are specific to a dataset at hand. However, we also conclude with a general set of recommendations that can provide a strong baseline performance across databases. △ Less

Submitted 31 December, 2021; originally announced January 2022.

arXiv:2112.11044 [pdf, ps, other]

Extending Merge Resolution to a Family of Proof Systems

Authors: Sravanthi Chede, Anil Shukla

Abstract: Merge Resolution (MRes [Beyersdorff et al. J. Autom. Reason.'2021]) is a recently introduced proof system for false QBFs. It stores the countermodels as merge maps. Merge maps are deterministic branching programs in which isomorphism checking is efficient making MRes a polynomial time verifiable proof system. In this paper, we introduce a family of proof systems MRes-R in which, the countermodel… ▽ More Merge Resolution (MRes [Beyersdorff et al. J. Autom. Reason.'2021]) is a recently introduced proof system for false QBFs. It stores the countermodels as merge maps. Merge maps are deterministic branching programs in which isomorphism checking is efficient making MRes a polynomial time verifiable proof system. In this paper, we introduce a family of proof systems MRes-R in which, the countermodels are stored in any pre-fixed complete representation R, instead of merge maps. Hence corresponding to each such R, we have a sound and refutationally complete QBF-proof system in MRes-R. To handle arbitrary representations for the strategies, we introduce consistency checking rules in MRes-R instead of isomorphism checking. As a result these proof systems are not polynomial time verifiable. Consequently, the paper shows that using merge maps is too restrictive and can be replaced with arbitrary representations leading to several interesting proof systems. Exploring proof theoretic properties of MRes-R, we show that eFrege+$\forall$red simulates all valid refutations from proof systems in MRes-R. In order to simulate arbitrary representations in MRes-R, we first represent the steps used by the proof systems as a new complete structure. Consequently, the corresponding proof system belonging to MRes-R is able to simulate all proof systems in MRes-R. Finally, we simulate this proof system via eFrege+$\forall$red using the ideas from [Chew et al. ECCC.'2021]. On the lower bound side, we show that the completion principle formulas from [Jonata et al. Theor. Comput. Sci.'2015] which are shown to be hard for regular MRes in [Beyersdorff et al. FSTTCS.'2020], are also hard for any regular proof system in MRes-R. Thereby, the paper lifts the lower bound of regular MRes to an entire class of proof systems, which use some complete representation, including those undiscovered, instead of merge maps. △ Less

Submitted 21 December, 2021; originally announced December 2021.

Comments: 27 pages, 4 figures

MSC Class: 03F20 ACM Class: F.2.2

arXiv:2111.14579 [pdf, other]

doi 10.1145/3493425.3502751

Shortcutting Fast Failover Routes in the Data Plane

Authors: Apoorv Shukla, Klaus-Tycho Foerster

Abstract: In networks, availability is of paramount importance. As link failures are disruptive, modern networks in turn provide Fast ReRoute (FRR) mechanisms to rapidly restore connectivity. However, existing FRR approaches heavily impact performance until the slower convergence protocols kick in. The fast failover routes commonly involve unnecessary loops and detours, disturbing other traffic while causin… ▽ More In networks, availability is of paramount importance. As link failures are disruptive, modern networks in turn provide Fast ReRoute (FRR) mechanisms to rapidly restore connectivity. However, existing FRR approaches heavily impact performance until the slower convergence protocols kick in. The fast failover routes commonly involve unnecessary loops and detours, disturbing other traffic while causing costly packet loss. In this paper, we make a case for augmenting FRR mechanisms to avoid such inefficiencies. We introduce ShortCut that routes the packets in a loop free fashion, avoiding costly detours and decreasing link load. ShortCut achieves this by leveraging data plane programmability: when a loop is locally observed, it can be removed by short-cutting the respective route parts. As such, ShortCut is topology-independent and agnostic to the type of FRR currently deployed. Our first experimental simulations show that ShortCut can outperform control plane convergence mechanisms; moreover avoiding loops and keeping packet loss minimal opposed to existing FRR mechanisms. △ Less

Submitted 29 November, 2021; originally announced November 2021.

Comments: To appear at the ACM/IEEE Symposium on Architectures for Networking and Communications Systems 2021 (ANCS'21)

arXiv:2111.14053 [pdf, other]

Towards Conditional Generation of Minimal Action Potential Pathways for Molecular Dynamics

Authors: John Kevin Cava, John Vant, Nicholas Ho, Ankita Shukla, Pavan Turaga, Ross Maciejewski, Abhishek Singharoy

Abstract: In this paper, we utilized generative models, and reformulate it for problems in molecular dynamics (MD) simulation, by introducing an MD potential energy component to our generative model. By incorporating potential energy as calculated from TorchMD into a conditional generative framework, we attempt to construct a low-potential energy route of transformation between the helix~$\rightarrow$~coil… ▽ More In this paper, we utilized generative models, and reformulate it for problems in molecular dynamics (MD) simulation, by introducing an MD potential energy component to our generative model. By incorporating potential energy as calculated from TorchMD into a conditional generative framework, we attempt to construct a low-potential energy route of transformation between the helix~$\rightarrow$~coil structures of a protein. We show how to add an additional loss function to conditional generative models, motivated by potential energy of molecular configurations, and also present an optimization technique for such an augmented loss function. Our results show the benefit of this additional loss term on synthesizing realistic molecular trajectories. △ Less

Submitted 5 January, 2022; v1 submitted 28 November, 2021; originally announced November 2021.

Comments: Accepted to ELLIS ML4Molecules Workshop 2021

arXiv:2111.12798 [pdf, other]

Geometric Priors for Scientific Generative Models in Inertial Confinement Fusion

Authors: Ankita Shukla, Rushil Anirudh, Eugene Kur, Jayaraman J. Thiagarajan, Peer-Timo Bremer, Brian K. Spears, Tammy Ma, Pavan Turaga

Abstract: In this paper, we develop a Wasserstein autoencoder (WAE) with a hyperspherical prior for multimodal data in the application of inertial confinement fusion. Unlike a typical hyperspherical generative model that requires computationally inefficient sampling from distributions like the von Mis Fisher, we sample from a normal distribution followed by a projection layer before the generator. Finally,… ▽ More In this paper, we develop a Wasserstein autoencoder (WAE) with a hyperspherical prior for multimodal data in the application of inertial confinement fusion. Unlike a typical hyperspherical generative model that requires computationally inefficient sampling from distributions like the von Mis Fisher, we sample from a normal distribution followed by a projection layer before the generator. Finally, to determine the validity of the generated samples, we exploit a known relationship between the modalities in the dataset as a scientific constraint, and study different properties of the proposed model. △ Less

Submitted 24 November, 2021; originally announced November 2021.

Comments: 5 pages, 4 figures, Fourth Workshop on Machine Learning and the Physical Sciences, NeurIPS 2021

arXiv:2110.08429 [pdf, other]

TorchEsegeta: Framework for Interpretability and Explainability of Image-based Deep Learning Models

Authors: Soumick Chatterjee, Arnab Das, Chirag Mandal, Budhaditya Mukhopadhyay, Manish Vipinraj, Aniruddh Shukla, Rajatha Nagaraja Rao, Chompunuch Sarasaen, Oliver Speck, Andreas Nürnberger

Abstract: Clinicians are often very sceptical about applying automatic image processing approaches, especially deep learning based methods, in practice. One main reason for this is the black-box nature of these approaches and the inherent problem of missing insights of the automatically derived decisions. In order to increase trust in these methods, this paper presents approaches that help to interpret and… ▽ More Clinicians are often very sceptical about applying automatic image processing approaches, especially deep learning based methods, in practice. One main reason for this is the black-box nature of these approaches and the inherent problem of missing insights of the automatically derived decisions. In order to increase trust in these methods, this paper presents approaches that help to interpret and explain the results of deep learning algorithms by depicting the anatomical areas which influence the decision of the algorithm most. Moreover, this research presents a unified framework, TorchEsegeta, for applying various interpretability and explainability techniques for deep learning models and generate visual interpretations and explanations for clinicians to corroborate their clinical findings. In addition, this will aid in gaining confidence in such methods. The framework builds on existing interpretability and explainability techniques that are currently focusing on classification models, extending them to segmentation tasks. In addition, these methods have been adapted to 3D models for volumetric analysis. The proposed framework provides methods to quantitatively compare visual explanations using infidelity and sensitivity metrics. This framework can be used by data scientists to perform post-hoc interpretations and explanations of their models, develop more explainable tools and present the findings to clinicians to increase their faith in such models. The proposed framework was evaluated based on a use case scenario of vessel segmentation models trained on Time-of-fight (TOF) Magnetic Resonance Angiogram (MRA) images of the human brain. Quantitative and qualitative results of a comparative study of different models and interpretability methods are presented. Furthermore, this paper provides an extensive overview of several existing interpretability and explainability methods. △ Less

Submitted 7 February, 2022; v1 submitted 15 October, 2021; originally announced October 2021.

arXiv:2110.01904 [pdf, other]

doi 10.1016/j.cosrev.2022.100496

System Security Assurance: A Systematic Literature Review

Authors: Ankur Shukla, Basel Katt, Livinus Obiora Nweke, Prosper Kandabongee Yeng, Goitom Kahsay Weldehawaryat

Abstract: System security assurance provides the confidence that security features, practices, procedures, and architecture of software systems mediate and enforce the security policy and are resilient against security failure and attacks. Alongside the significant benefits of security assurance, the evolution of new information and communication technology (ICT) introduces new challenges regarding informat… ▽ More System security assurance provides the confidence that security features, practices, procedures, and architecture of software systems mediate and enforce the security policy and are resilient against security failure and attacks. Alongside the significant benefits of security assurance, the evolution of new information and communication technology (ICT) introduces new challenges regarding information protection. Security assurance methods based on the traditional tools, techniques, and procedures may fail to account new challenges due to poor requirement specifications, static nature, and poor development processes. The common criteria (CC) commonly used for security evaluation and certification process also comes with many limitations and challenges. In this paper, extensive efforts have been made to study the state-of-the-art, limitations and future research directions for security assurance of the ICT and cyber-physical systems (CPS) in a wide range of domains. We conducted a systematic review of requirements, processes, and activities involved in system security assurance including security requirements, security metrics, system and environments and assurance methods. We highlighted the challenges and gaps that have been identified by the existing literature related to system security assurance and corresponding solutions. Finally, we discussed the limitations of the present methods and future research directions. △ Less

Submitted 29 July, 2022; v1 submitted 5 October, 2021; originally announced October 2021.

Report number: Volume 45

Journal ref: Computer Science Review, Volume 45, 2022, 100496

arXiv:2108.03760 [pdf]

Symptom based Hierarchical Classification of Diabetes and Thyroid disorders using Fuzzy Cognitive Maps

Authors: Anand M. Shukla, Pooja D. Pandit, Vasudev M. Purandare, Anuradha Srinivasaraghavan

Abstract: Fuzzy Cognitive Maps (FCMs) are soft computing technique that follows an approach similar to human reasoning and human decision-making process, making them a valuable modeling and simulation methodology. Medical Decision Systems are complex systems consisting of many factors that may be complementary, contradictory, and competitive; these factors influence each other and determine the overall diag… ▽ More Fuzzy Cognitive Maps (FCMs) are soft computing technique that follows an approach similar to human reasoning and human decision-making process, making them a valuable modeling and simulation methodology. Medical Decision Systems are complex systems consisting of many factors that may be complementary, contradictory, and competitive; these factors influence each other and determine the overall diagnosis with a different degree. Thus, FCMs are suitable to model Medical Decision Support Systems. The proposed work therefore uses FCMs arranged in hierarchical structure to classify between Diabetes, Thyroid disorders and their subtypes. Subtypes include type 1 and type 2 for diabetes and hyperthyroidism and hypothyroidism for thyroid. △ Less

Submitted 8 August, 2021; originally announced August 2021.

arXiv:2107.09320 [pdf, other]

QRAT Polynomially Simulates Merge Resolution

Authors: Sravanthi Chede, Anil Shukla

Abstract: Merge Resolution (MRes [Beyersdorff et al. J. Autom. Reason.'2021] ) is a refutational proof system for quantified Boolean formulas (QBF). Each line of MRes consists of clauses with only existential literals, together with information of countermodels stored as merge maps. As a result, MRes has strategy extraction by design. The QRAT [Heule et al. J. Autom. Reason.'2017] proof system was designed… ▽ More Merge Resolution (MRes [Beyersdorff et al. J. Autom. Reason.'2021] ) is a refutational proof system for quantified Boolean formulas (QBF). Each line of MRes consists of clauses with only existential literals, together with information of countermodels stored as merge maps. As a result, MRes has strategy extraction by design. The QRAT [Heule et al. J. Autom. Reason.'2017] proof system was designed to capture QBF preprocessing. QRAT can simulate both the expansion-based proof system $\forall$Exp+Res and CDCL-based QBF proof system LD-Q-Res. A family of false QBFs called SquaredEquality formulas were introduced in [Beyersdorff et al. J. Autom. Reason.'2021] and shown to be easy for MRes but need exponential size proofs in Q-Res, QU-Res, CP+$\forall$red, $\forall$Exp+Res, IR-calc and reductionless LD-Q-Res. As a result none of these systems can simulate MRes. In this paper, we show a short QRAT refutation of the SquaredEquality formulas. We further show that QRAT strictly p-simulates MRes. Besides highlighting the power of QRAT system, this work also presents the first simulation result for MRes. △ Less

Submitted 20 July, 2021; originally announced July 2021.

Comments: 12 pages, 1 figure

MSC Class: 03F20 ACM Class: F.2.2

arXiv:2107.04547 [pdf, ps, other]

Does QRAT simulate IR-calc? QRAT simulation algorithm for $\forall$Exp+Res cannot be lifted to IR-calc

Authors: Sravanthi Chede, Anil Shukla

Abstract: We show that the QRAT simulation algorithm of $\forall$Exp+Res from [B. Kiesl and M. Seidl, 2019] cannot be lifted to IR-calc. We show that the QRAT simulation algorithm of $\forall$Exp+Res from [B. Kiesl and M. Seidl, 2019] cannot be lifted to IR-calc. △ Less

Submitted 9 July, 2021; originally announced July 2021.

Comments: 9 pages, 2 figures

MSC Class: 03F20 ACM Class: F.2.2

arXiv:2103.01046 [pdf, ps, other]

Extending Prolog for Quantified Boolean Horn Formulas

Authors: Anish Mallick, Anil Shukla

Abstract: Prolog is a well known declarative programming language based on propositional Horn formulas. It is useful in various areas, including artificial intelligence, automated theorem proving, mathematical logic and so on. An active research area for many years is to extend Prolog to larger classes of logic. Some important extensions of it includes the constraint logic programming, and the object orient… ▽ More Prolog is a well known declarative programming language based on propositional Horn formulas. It is useful in various areas, including artificial intelligence, automated theorem proving, mathematical logic and so on. An active research area for many years is to extend Prolog to larger classes of logic. Some important extensions of it includes the constraint logic programming, and the object oriented logic programming. However, it cannot solve problems having arbitrary quantified Horn formulas. To be precise, the facts, rules and queries in Prolog are not allowed to have arbitrary quantified variables. The paper overcomes this major limitations of Prolog by extending it for the quantified Boolean Horn formulas. We achieved this by extending the SLD-resolution proof system for quantified Boolean Horn formulas, followed by proposing an efficient model for implementation. The paper shows that the proposed implementation also supports the first-order predicate Horn logic with arbitrary quantified variables. The paper also introduces for the first time, a declarative programming for the quantified Boolean Horn formulas. △ Less

Submitted 1 March, 2021; originally announced March 2021.

arXiv:2102.08360 [pdf, other]

Interpretable COVID-19 Chest X-Ray Classification via Orthogonality Constraint

Authors: Ella Y. Wang, Anirudh Som, Ankita Shukla, Hongjun Choi, Pavan Turaga

Abstract: Deep neural networks have increasingly been used as an auxiliary tool in healthcare applications, due to their ability to improve performance of several diagnosis tasks. However, these methods are not widely adopted in clinical settings due to the practical limitations in the reliability, generalizability, and interpretability of deep learning based systems. As a result, methods have been develope… ▽ More Deep neural networks have increasingly been used as an auxiliary tool in healthcare applications, due to their ability to improve performance of several diagnosis tasks. However, these methods are not widely adopted in clinical settings due to the practical limitations in the reliability, generalizability, and interpretability of deep learning based systems. As a result, methods have been developed that impose additional constraints during network training to gain more control as well as improve interpretabilty, facilitating their acceptance in healthcare community. In this work, we investigate the benefit of using Orthogonal Spheres (OS) constraint for classification of COVID-19 cases from chest X-ray images. The OS constraint can be written as a simple orthonormality term which is used in conjunction with the standard cross-entropy loss during classification network training. Previous studies have demonstrated significant benefits in applying such constraints to deep learning models. Our findings corroborate these observations, indicating that the orthonormality loss function effectively produces improved semantic localization via GradCAM visualizations, enhanced classification performance, and reduced model calibration error. Our approach achieves an improvement in accuracy of 1.6% and 4.8% for two- and three-class classification, respectively; similar results are found for models with data augmentation applied. In addition to these findings, our work also presents a new application of the OS regularizer in healthcare, increasing the post-hoc interpretability and performance of deep learning models for COVID-19 classification to facilitate adoption of these methods in clinical settings. We also identify the limitations of our strategy that can be explored for further research in future. △ Less

Submitted 21 December, 2021; v1 submitted 2 February, 2021; originally announced February 2021.

Comments: Accepted in the 2021 ACM CHIL Workshop track. An extended version of this work is under consideration at Pattern Recognition Letters

arXiv:2010.02429 [pdf, ps, other]

Modeling Preconditions in Text with a Crowd-sourced Dataset

Authors: Heeyoung Kwon, Mahnaz Koupaee, Pratyush Singh, Gargi Sawhney, Anmol Shukla, Keerthi Kumar Kallur, Nathanael Chambers, Niranjan Balasubramanian

Abstract: Preconditions provide a form of logical connection between events that explains why some events occur together and information that is complementary to the more widely studied relations such as causation, temporal ordering, entailment, and discourse relations. Modeling preconditions in text has been hampered in part due to the lack of large scale labeled data grounded in text. This paper introduce… ▽ More Preconditions provide a form of logical connection between events that explains why some events occur together and information that is complementary to the more widely studied relations such as causation, temporal ordering, entailment, and discourse relations. Modeling preconditions in text has been hampered in part due to the lack of large scale labeled data grounded in text. This paper introduces PeKo, a crowd-sourced annotation of preconditions between event pairs in newswire, an order of magnitude larger than prior text annotations. To complement this new corpus, we also introduce two challenge tasks aimed at modeling preconditions: (i) Precondition Identification -- a standard classification task defined over pairs of event mentions, and (ii) Precondition Generation -- a generative task aimed at testing a more general ability to reason about a given event. Evaluation on both tasks shows that modeling preconditions is challenging even for today's large language models (LM). This suggests that precondition knowledge is not easily accessible in LM-derived representations alone. Our generation results show that fine-tuning an LM on PeKo yields better conditional relations than when trained on raw text or temporally-ordered corpora. △ Less

Submitted 14 October, 2020; v1 submitted 5 October, 2020; originally announced October 2020.

arXiv:2008.10277 [pdf, other]

Sample-Rank: Weak Multi-Objective Recommendations Using Rejection Sampling

Authors: Abhay Shukla, Jairaj Sathyanarayana, Dipyaman Banerjee

Abstract: Online food ordering marketplaces are multi-stakeholder systems where recommendations impact the experience and growth of each participant in the system. A recommender system in this setting has to encapsulate the objectives and constraints of different stakeholders in order to find utility of an item for recommendation. Constrained-optimization based approaches to this problem typically involve c… ▽ More Online food ordering marketplaces are multi-stakeholder systems where recommendations impact the experience and growth of each participant in the system. A recommender system in this setting has to encapsulate the objectives and constraints of different stakeholders in order to find utility of an item for recommendation. Constrained-optimization based approaches to this problem typically involve complex formulations and have high computational complexity in production settings involving millions of entities. Simplifications and relaxation techniques (for example, scalarization) help but introduce sub-optimality and can be time-consuming due to the amount of tuning needed. In this paper, we introduce a method involving multi-goal sampling followed by ranking for user-relevance (Sample-Rank), to nudge recommendations towards multi-objective (MO) goals of the marketplace. The proposed method's novelty is that it reduces the MO recommendation problem to sampling from a desired multi-goal distribution then using it to build a production-friendly learning-to-rank (LTR) model. In offline experiments we show that we are able to bias recommendations towards MO criteria with acceptable trade-offs in metrics like AUC and NDCG. We also show results from a large-scale online A/B experiment where this approach gave a statistically significant lift of 2.64% in average revenue per order (RPO) (objective #1) with no drop in conversion rate (CR) (objective #2) while holding the average last-mile traversed flat (objective #3), vs. the baseline ranking method. This method also significantly reduces time to model development and deployment in MO settings and allows for trivial extensions to more objectives and other types of LTR models. △ Less

Submitted 24 August, 2020; originally announced August 2020.

arXiv:2007.13895 [pdf, ps, other]

Linear Delay-cell Design for Low-energy Delay Multiplication and Accumulation

Authors: Aditya Shukla

Abstract: A practical deep neural network's (DNN) evaluation involves thousands of multiply-and-accumulate (MAC) operations. To extend DNN's superior inference capabilities to energy constrained devices, architectures and circuits that minimize energy-per-MAC must be developed. In this respect, analog delay-based MAC is advantageous due to reasons both extrinsic and intrinsic to the MAC implementation - (1)… ▽ More A practical deep neural network's (DNN) evaluation involves thousands of multiply-and-accumulate (MAC) operations. To extend DNN's superior inference capabilities to energy constrained devices, architectures and circuits that minimize energy-per-MAC must be developed. In this respect, analog delay-based MAC is advantageous due to reasons both extrinsic and intrinsic to the MAC implementation - (1) lower fixed-point precision requirement for a DNN's evaluation, (2) better dynamic range than charge-based accumulation, for smaller technology nodes, and (3) simpler analog-digital interfacing. Implementing DNNs using delay-based MAC requires mixed-signal delay multipliers that accept digitally stored weights and analog voltages as arguments. To this end, a novel, linearly tune-able delay-cell is proposed, wherein, the delay is realized using an inverted MOS capacitor's (C*) steady discharge from a linearly input-voltage dependent initial charge. The cell is analytically modeled, constraints for its functional validity are determined, and jitter-models are developed. Multiple cells with scaled delays, corresponding to each bit of the digital argument, must be cascaded to form the multiplier. To realize such bit-wise delay-scaling of the cells, a biasing circuit is proposed that generates sub-threshold gate-voltages to scale C*'s discharging rate, and thus area-expensive transistor width-scaling is avoided. For 130nm CMOS technology, the theoretical constraints and limits on jitter are used to find the optimal design-point and quantify the jitter versus bits-per-multiplier trade-off. Schematic-level simulations show a worst-case energy-consumption close to the state-of-art, and thus, feasibility of the cell. △ Less

Submitted 3 August, 2020; v1 submitted 27 July, 2020; originally announced July 2020.

Comments: Keywords: Analog-computing, delay-cell, mixed-signal delay multiplier, multiply-and-accumulate

arXiv:2007.04134 [pdf, other]

Learning Speech Representations from Raw Audio by Joint Audiovisual Self-Supervision

Authors: Abhinav Shukla, Stavros Petridis, Maja Pantic

Abstract: The intuitive interaction between the audio and visual modalities is valuable for cross-modal self-supervised learning. This concept has been demonstrated for generic audiovisual tasks like video action recognition and acoustic scene classification. However, self-supervision remains under-explored for audiovisual speech. We propose a method to learn self-supervised speech representations from the… ▽ More The intuitive interaction between the audio and visual modalities is valuable for cross-modal self-supervised learning. This concept has been demonstrated for generic audiovisual tasks like video action recognition and acoustic scene classification. However, self-supervision remains under-explored for audiovisual speech. We propose a method to learn self-supervised speech representations from the raw audio waveform. We train a raw audio encoder by combining audio-only self-supervision (by predicting informative audio attributes) with visual self-supervision (by generating talking faces from audio). The visual pretext task drives the audio representations to capture information related to lip movements. This enriches the audio encoder with visual information and the encoder can be used for evaluation without the visual modality. Our method attains competitive performance with respect to existing self-supervised audio features on established isolated word classification benchmarks, and significantly outperforms other methods at learning from fewer labels. Notably, our method also outperforms fully supervised training, thus providing a strong initialization for speech related tasks. Our results demonstrate the potential of multimodal self-supervision in audiovisual speech for learning good audio representations. △ Less

Submitted 8 July, 2020; originally announced July 2020.

Comments: Accepted at the Workshop on Self-supervision in Audio and Speech at ICML 2020

Showing 1–50 of 106 results for author: Shukla, A