Showing 1–50 of 87 results for author: Bhatnagar, S

  1. arXiv:2405.18560  [pdf, other]

    cs.CV cs.AI cs.IR cs.LG eess.IV

    Potential Field Based Deep Metric Learning

    Authors: Shubhang Bhatnagar, Narendra Ahuja

    Abstract: Deep metric learning (DML) involves training a network to learn a semantically meaningful representation space. Many current approaches mine n-tuples of examples and model interactions within each tuplet. We present a novel, compositional DML model, inspired by electrostatic fields in physics that, instead of in tuples, represents the influence of each example (embedding) by a continuous potentia…

    Submitted 28 May, 2024; originally announced May 2024.

  2. arXiv:2405.12167  [pdf, other]

    cs.CY

    Open-Source Assessments of AI Capabilities: The Proliferation of AI Analysis Tools, Replicating Competitor Models, and the Zhousidun Dataset

    Authors: Ritwik Gupta, Leah Walker, Eli Glickman, Raine Koizumi, Sarthak Bhatnagar, Andrew W. Reddie

    Abstract: The integration of artificial intelligence (AI) into military capabilities has become a norm for major military powers across the globe. Understanding how these AI models operate is essential for maintaining strategic advantages and ensuring security. This paper demonstrates an open-source methodology for analyzing military AI models through a detailed examination of the Zhousidun dataset, a Chines…

    Submitted 24 May, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

  3. arXiv:2405.06621  [pdf, other]

    cs.IT

    On Streaming Codes for Simultaneously Correcting Burst and Random Erasures

    Authors: Shobhit Bhatnagar, Biswadip Chakraborty, P. Vijay Kumar

    Abstract: Streaming codes are packet-level codes that recover dropped packets within a strict decoding-delay constraint. We study streaming codes over a sliding-window (SW) channel model which admits only those erasure patterns which allow either a single burst erasure of $\le b$ packets along with $\le e$ random packet erasures, or else, $\le a$ random packet erasures, in any sliding-window of $w$ time slo…

    Submitted 10 May, 2024; originally announced May 2024.

  4. arXiv:2405.06606  [pdf, other]

    cs.IT

    On Streaming Codes for Burst and Random Errors

    Authors: Shobhit Bhatnagar, P. Vijay Kumar

    Abstract: Streaming codes (SCs) are packet-level codes that recover erased packets within a strict decoding-delay deadline. Streaming codes for various packet erasure channel models such as sliding-window (SW) channel models that admit random or burst erasures in any SW of a fixed length have been studied in the literature, and the optimal rate as well as rate-optimal code constructions of SCs over such cha…

    Submitted 10 May, 2024; originally announced May 2024.

  5. arXiv:2404.16193  [pdf, other]

    cs.CV cs.AI cs.LG cs.MM eess.IV

    Improving Multi-label Recognition using Class Co-Occurrence Probabilities

    Authors: Samyak Rawlekar, Shubhang Bhatnagar, Vishnuvardhan Pogunulu Srinivasulu, Narendra Ahuja

    Abstract: Multi-label Recognition (MLR) involves the identification of multiple objects within an image. To address the additional complexity of this problem, recent works have leveraged information from vision-language models (VLMs) trained on large text-image datasets for the task. These methods learn an independent classifier for each object (class), overlooking correlations in their occurrences. Such c…

    Submitted 24 April, 2024; originally announced April 2024.

  6. arXiv:2403.14977  [pdf, other]

    cs.CV cs.AI cs.LG eess.IV

    Piecewise-Linear Manifolds for Deep Metric Learning

    Authors: Shubhang Bhatnagar, Narendra Ahuja

    Abstract: Unsupervised deep metric learning (UDML) focuses on learning a semantic representation space using only unlabeled data. This challenging problem requires accurately estimating the similarity between data points, which is used to supervise a deep network. For this purpose, we propose to model the high-dimensional data manifold using a piecewise-linear approximation, with each low-dimensional linear…

    Submitted 22 March, 2024; originally announced March 2024.

    Comments: Accepted at CPAL 2024 (Oral)

  7. arXiv:2402.01371  [pdf, other]

    cs.LG

    Two-Timescale Critic-Actor for Average Reward MDPs with Function Approximation

    Authors: Prashansa Panda, Shalabh Bhatnagar

    Abstract: In recent years, there has been a lot of research activity focused on carrying out non-asymptotic convergence analyses for actor-critic algorithms. Recently, a two-timescale critic-actor algorithm has been presented for the discounted cost setting in the look-up table case, where the timescales of the actor and the critic are reversed and only asymptotic convergence is shown. In our work, we present th…

    Submitted 24 May, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

  8. Investigating the Surrogate Modeling Capabilities of Continuous Time Echo State Networks

    Authors: Saakaar Bhatnagar

    Abstract: Continuous Time Echo State Networks (CTESNs) are a promising yet under-explored surrogate modeling technique for dynamical systems, particularly those governed by stiff Ordinary Differential Equations (ODEs). A key determinant of the generalization accuracy of a CTESN surrogate is the method of projecting the reservoir state to the output. This paper shows that of the two common projection methods…

    Submitted 5 January, 2024; v1 submitted 2 December, 2023; originally announced December 2023.

  9. arXiv:2311.11789  [pdf, other]

    cs.LG cs.MA math.OC

    Approximate Linear Programming for Decentralized Policy Iteration in Cooperative Multi-agent Markov Decision Processes

    Authors: Lakshmi Mandal, Chandrashekar Lakshminarayanan, Shalabh Bhatnagar

    Abstract: In this work, we consider a cooperative multi-agent Markov decision process (MDP) involving m agents. At each decision epoch, all the m agents independently select actions in order to maximize a common long-term objective. In the policy iteration process of the multi-agent setup, the number of actions grows exponentially with the number of agents, incurring huge computational costs. Thus, recent works…

    Submitted 29 April, 2024; v1 submitted 20 November, 2023; originally announced November 2023.

  10. arXiv:2310.16363  [pdf, other]

    cs.LG

    Finite-Time Analysis of Three-Timescale Constrained Actor-Critic and Constrained Natural Actor-Critic Algorithms

    Authors: Prashansa Panda, Shalabh Bhatnagar

    Abstract: Actor Critic methods have found immense applications on a wide range of Reinforcement Learning tasks, especially when the state-action space is large. In this paper, we consider actor critic and natural actor critic algorithms with function approximation for constrained Markov decision processes (C-MDP) involving inequality constraints and carry out a non-asymptotic analysis for both of these algor…

    Submitted 29 May, 2024; v1 submitted 25 October, 2023; originally announced October 2023.

  11. arXiv:2310.05000  [pdf, ps, other]

    cs.LG cs.AI eess.SY math.OC

    The Reinforce Policy Gradient Algorithm Revisited

    Authors: Shalabh Bhatnagar

    Abstract: We revisit the Reinforce policy gradient algorithm from the literature. Note that this algorithm typically works with cost returns obtained over random length episodes obtained from either termination upon reaching a goal state (as with episodic tasks) or from instants of visit to a prescribed recurrent state (in the case of continuing tasks). We propose a major enhancement to the basic algorithm.…

    Submitted 8 October, 2023; originally announced October 2023.

  12. Physics Informed Neural Networks for Modeling of 3D Flow-Thermal Problems with Sparse Domain Data

    Authors: Saakaar Bhatnagar, Andrew Comerford, Araz Banaeizadeh

    Abstract: Successfully training Physics Informed Neural Networks (PINNs) for highly nonlinear PDEs on complex 3D domains remains a challenging task. In this paper, PINNs are employed to solve the 3D incompressible Navier-Stokes (NS) equations at moderate to high Reynolds numbers for complex geometries. The presented method utilizes very sparsely distributed solution data in the domain. A detailed investigat…

    Submitted 3 November, 2023; v1 submitted 6 September, 2023; originally announced September 2023.

  13. arXiv:2308.04643  [pdf, other]

    cs.CV cs.HC cs.RO eess.IV

    Long-Distance Gesture Recognition using Dynamic Neural Networks

    Authors: Shubhang Bhatnagar, Sharath Gopal, Narendra Ahuja, Liu Ren

    Abstract: Gestures form an important medium of communication between humans and machines. An overwhelming majority of existing gesture recognition methods are tailored to a scenario where humans and machines are located very close to each other. This short-distance assumption does not hold true for several types of interactions, for example gesture-based interactions with a floor cleaning robot or with a dr…

    Submitted 8 August, 2023; originally announced August 2023.

    Comments: Accepted to IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2023)

    Journal ref: 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Detroit, MI, USA, 2023, pp. 1307-1312

  14. arXiv:2305.12239  [pdf, other]

    cs.LG cs.AI

    Off-Policy Average Reward Actor-Critic with Deterministic Policy Search

    Authors: Naman Saxena, Subhojyoti Khastigir, Shishir Kolathaya, Shalabh Bhatnagar

    Abstract: The average reward criterion is relatively less studied as most existing works in the Reinforcement Learning literature consider the discounted reward criterion. There are few recent works that present on-policy average reward actor-critic algorithms, but average reward off-policy actor-critic is relatively less explored. In this work, we present both on-policy and off-policy deterministic policy…

    Submitted 19 July, 2023; v1 submitted 20 May, 2023; originally announced May 2023.

    Comments: Accepted at ICML 2023

  15. arXiv:2305.12125  [pdf, other]

    cs.LG cs.AI

    A Framework for Provably Stable and Consistent Training of Deep Feedforward Networks

    Authors: Arunselvan Ramaswamy, Shalabh Bhatnagar, Naman Saxena

    Abstract: We present a novel algorithm for training deep neural networks in supervised (classification and regression) and unsupervised (reinforcement learning) scenarios. This algorithm combines the standard stochastic gradient descent and the gradient clipping method. The output layer is updated using clipped gradients, while the rest of the neural network is updated using standard gradients. Updating the outpu…

    Submitted 20 May, 2023; originally announced May 2023.

    Comments: 30 pages, 12 figures

    MSC Class: 90B05; 90C40; 90C90

  16. arXiv:2304.10951  [pdf, ps, other]

    cs.LG math.OC stat.ML

    A Cubic-regularized Policy Newton Algorithm for Reinforcement Learning

    Authors: Mizhaan Prajit Maniyar, Akash Mondal, Prashanth L. A., Shalabh Bhatnagar

    Abstract: We consider the problem of control in the setting of reinforcement learning (RL), where model information is not available. Policy gradient algorithms are a popular solution approach for this problem and are usually shown to converge to a stationary point of the value function. In this paper, we propose two policy Newton algorithms that incorporate cubic regularization. Both algorithms employ the…

    Submitted 21 April, 2023; originally announced April 2023.

  17. arXiv:2303.07068  [pdf, other]

    cs.LG

    n-Step Temporal Difference Learning with Optimal n

    Authors: Lakshmi Mandal, Shalabh Bhatnagar

    Abstract: We consider the problem of finding the optimal value of n in the n-step temporal difference (TD) learning algorithm. We find the optimal n by resorting to a model-free optimization technique involving a one-simulation simultaneous perturbation stochastic approximation (SPSA) based procedure that we adapt to the discrete optimization setting by using a random projection approach. We prove the conve…

    Submitted 14 April, 2023; v1 submitted 13 March, 2023; originally announced March 2023.

  18. Case-Base Neural Networks: survival analysis with time-varying, higher-order interactions

    Authors: Jesse Islam, Maxime Turgeon, Robert Sladek, Sahir Bhatnagar

    Abstract: In the context of survival analysis, data-driven neural network-based methods have been developed to model complex covariate effects. While these methods may provide better predictive performance than regression-based approaches, not all can model time-varying interactions and complex baseline hazards. To address this, we propose Case-Base Neural Networks (CBNNs) as a new approach that combines th…

    Submitted 9 January, 2024; v1 submitted 16 January, 2023; originally announced January 2023.

  19. arXiv:2212.10477  [pdf, ps, other]

    cs.LG math.ST stat.ML

    Generalized Simultaneous Perturbation-based Gradient Search with Reduced Estimator Bias

    Authors: Soumen Pachal, Shalabh Bhatnagar, L. A. Prashanth

    Abstract: We present in this paper a family of generalized simultaneous perturbation-based gradient search (GSPGS) estimators that use noisy function measurements. The number of function measurements required by each estimator is guided by the desired level of accuracy. We first present in detail unbalanced generalized simultaneous perturbation stochastic approximation (GSPSA) estimators and later present t…

    Submitted 12 November, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

    Comments: The material in this paper was presented in part at the Conference on Information Sciences and Systems (CISS) in March 2023

  20. arXiv:2211.09174  [pdf, other]

    cs.LG cs.AI

    CASPR: Customer Activity Sequence-based Prediction and Representation

    Authors: Pin-Jung Chen, Sahil Bhatnagar, Sagar Goyal, Damian Konrad Kowalczyk, Mayank Shrivastava

    Abstract: Tasks critical to enterprise profitability, such as customer churn prediction, fraudulent account detection or customer lifetime value estimation, are often tackled by models trained on features engineered from customer data in tabular format. Application-specific feature engineering adds development, operationalization and maintenance costs over time. Recent advances in representation learning pr…

    Submitted 28 November, 2022; v1 submitted 16 November, 2022; originally announced November 2022.

    Comments: Presented at the Table Representation Learning Workshop, NeurIPS 2022, New Orleans. Authors listed in random order

  21. arXiv:2210.07573  [pdf, other]

    cs.LG

    Model-based Safe Deep Reinforcement Learning via a Constrained Proximal Policy Optimization Algorithm

    Authors: Ashish Kumar Jayant, Shalabh Bhatnagar

    Abstract: During initial iterations of training in most Reinforcement Learning (RL) algorithms, agents perform a significant number of random exploratory steps. In the real world, this can limit the practicality of these algorithms as it can lead to potentially dangerous behavior. Hence safe exploration is a critical issue in applying RL algorithms in the real world. This problem has been recently well stud…

    Submitted 14 October, 2022; originally announced October 2022.

    Comments: Proceedings of NeurIPS 2022

  22. A policy gradient approach for Finite Horizon Constrained Markov Decision Processes

    Authors: Soumyajit Guin, Shalabh Bhatnagar

    Abstract: The infinite horizon setting is widely adopted for problems of reinforcement learning (RL). These invariably result in stationary policies that are optimal. In many situations, finite horizon control problems are of interest and for such problems, the optimal policies are time-varying in general. Another setting that has become popular in recent times is that of Constrained Reinforcement Learning, wher…

    Submitted 21 June, 2024; v1 submitted 10 October, 2022; originally announced October 2022.

  23. Actor-Critic or Critic-Actor? A Tale of Two Time Scales

    Authors: Shalabh Bhatnagar, Vivek S. Borkar, Soumyajit Guin

    Abstract: We revisit the standard formulation of the tabular actor-critic algorithm as a two time-scale stochastic approximation with value function computed on a faster time-scale and policy computed on a slower time-scale. This emulates policy iteration. We observe that reversal of the time scales will in fact emulate value iteration and is a legitimate algorithm. We provide a proof of convergence and compare…

    Submitted 13 June, 2024; v1 submitted 10 October, 2022; originally announced October 2022.

  24. An Agent-Based Fleet Management Model for First- and Last-Mile Services

    Authors: Saumya Bhatnagar, Tarun Rambha, Gitakrishnan Ramadurai

    Abstract: With the growth of cars and car-sharing applications, commuters in many cities, particularly developing countries, are shifting away from public transport. These shifts have affected two key stakeholders: transit operators and first- and last-mile (FLM) services. Although most cities continue to invest heavily in bus and metro projects to make public transit attractive, ridership in these systems…

    Submitted 4 December, 2022; v1 submitted 9 August, 2022; originally announced August 2022.

  25. arXiv:2208.00290  [pdf, ps, other]

    math.OC cs.LG

    A Gradient Smoothed Functional Algorithm with Truncated Cauchy Random Perturbations for Stochastic Optimization

    Authors: Akash Mondal, Prashanth L. A., Shalabh Bhatnagar

    Abstract: In this paper, we present a stochastic gradient algorithm for minimizing a smooth objective function that is an expectation over noisy cost samples, and only the latter are observed for any given parameter. Our algorithm employs a gradient estimation scheme with random perturbations, which are formed using the truncated Cauchy distribution from the delta sphere. We analyze the bias and variance of…

    Submitted 30 June, 2023; v1 submitted 30 July, 2022; originally announced August 2022.

  26. arXiv:2201.00286  [pdf, ps, other]

    cs.LG cs.AI eess.SY

    Reinforcement Learning for Task Specifications with Action-Constraints

    Authors: Arun Raman, Keerthan Shagrithaya, Shalabh Bhatnagar

    Abstract: In this paper, we use concepts from supervisory control theory of discrete event systems to propose a method to learn optimal control policies for a finite-state Markov Decision Process (MDP) in which (only) certain sequences of actions are deemed unsafe (respectively safe). We assume that the set of action sequences that are deemed unsafe and/or safe are given in terms of a finite-state automaton…

    Submitted 1 January, 2022; originally announced January 2022.

  27. arXiv:2112.02999  [pdf, other]

    cs.RO

    Dynamic Mirror Descent based Model Predictive Control for Accelerating Robot Learning

    Authors: Utkarsh A. Mishra, Soumya R. Samineni, Prakhar Goel, Chandravaran Kunjeti, Himanshu Lodha, Aman Singh, Aditya Sagi, Shalabh Bhatnagar, Shishir Kolathaya

    Abstract: Recent works in Reinforcement Learning (RL) combine model-free (Mf)-RL algorithms with model-based (Mb)-RL approaches to get the best from both: asymptotic performance of Mf-RL and high sample-efficiency of Mb-RL. Inspired by these works, we propose a hierarchical framework that integrates online learning for the Mb-trajectory optimization with off-policy methods for the Mf-RL. In particular, two…

    Submitted 4 November, 2021; originally announced December 2021.

    Comments: 8 pages, 4 figures. arXiv admin note: substantial text overlap with arXiv:2110.12239

  28. arXiv:2111.11768  [pdf, other]

    cs.LG

    Schedule Based Temporal Difference Algorithms

    Authors: Rohan Deb, Meet Gandhi, Shalabh Bhatnagar

    Abstract: Learning the value function of a given policy from data samples is an important problem in Reinforcement Learning. TD($λ$) is a popular class of algorithms to solve this problem. However, the weights assigned to different $n$-step returns in TD($λ$), controlled by the parameter $λ$, decrease exponentially with increasing $n$. In this paper, we present a $λ$-schedule procedure that generalizes the…

    Submitted 23 November, 2021; originally announced November 2021.

  29. arXiv:2111.11004  [pdf, other]

    cs.LG

    Gradient Temporal Difference with Momentum: Stability and Convergence

    Authors: Rohan Deb, Shalabh Bhatnagar

    Abstract: Gradient temporal difference (Gradient TD) algorithms are a popular class of stochastic approximation (SA) algorithms used for policy evaluation in reinforcement learning. Here, we consider Gradient TD algorithms with an additional heavy ball momentum term and provide a choice of step size and momentum parameter that ensures almost sure convergence of these algorithms asymptotically. In doing so, we…

    Submitted 22 November, 2021; originally announced November 2021.

  30. arXiv:2110.15093  [pdf, other]

    cs.LG cs.AI

    Finite Horizon Q-learning: Stability, Convergence, Simulations and an application on Smart Grids

    Authors: Vivek VP, Dr. Shalabh Bhatnagar

    Abstract: Q-learning is a popular reinforcement learning algorithm. This algorithm has, however, been studied and analysed mainly in the infinite horizon setting. There are several important applications which can be modeled in the framework of finite horizon Markov decision processes. We develop a version of the Q-learning algorithm for finite horizon Markov decision processes (MDP) and provide a full proof of i…

    Submitted 6 August, 2022; v1 submitted 27 October, 2021; originally announced October 2021.

  31. arXiv:2110.10969  [pdf, other]

    cs.LG cs.CV cs.NE

    Memory Efficient Adaptive Attention For Multiple Domain Learning

    Authors: Himanshu Pradeep Aswani, Abhiraj Sunil Kanse, Shubhang Bhatnagar, Amit Sethi

    Abstract: Training CNNs from scratch on new domains typically demands large numbers of labeled images and computations, which is not suitable for low-power hardware. One way to reduce these requirements is to modularize the CNN architecture and freeze the weights of the heavier modules, that is, the lower layers after pre-training. Recent studies have proposed alternative modular architectures and schemes t…

    Submitted 21 October, 2021; originally announced October 2021.

    Comments: 13 pages, 3 figures, 4 graphs, 3 tables

  32. arXiv:2110.10017  [pdf, other]

    cs.LG cs.AI

    Neural Network Compatible Off-Policy Natural Actor-Critic Algorithm

    Authors: Raghuram Bharadwaj Diddigi, Prateek Jain, Prabuchandran K. J., Shalabh Bhatnagar

    Abstract: Learning optimal behavior from existing data is one of the most important problems in Reinforcement Learning (RL). This is known as "off-policy control" in RL where an agent's objective is to compute an optimal policy based on the data obtained from the given policy (known as the behavior policy). As the optimal policy can be very different from the behavior policy, learning optimal behavior is ve…

    Submitted 15 June, 2022; v1 submitted 19 October, 2021; originally announced October 2021.

    Comments: This paper has been accepted for presentation at the IJCNN at IEEE WCCI 2022 and for publication in the conference proceedings published by IEEE

  33. arXiv:2102.10165  [pdf, other]

    cs.IT

    Analyzing Cross Validation In Compressed Sensing With Mixed Gaussian And Impulse Measurement Noise With L1 Errors

    Authors: Chinmay Gurjarpadhye, Shubhang Bhatnagar, Ajit Rajwade

    Abstract: Compressed sensing (CS) involves sampling signals at rates less than their Nyquist rates and attempting to reconstruct them after sample acquisition. Most such algorithms have parameters, for example the regularization parameter in LASSO, which need to be chosen carefully for optimal performance. These parameters can be chosen based on assumptions on the noise level or signal sparsity, but this kn…

    Submitted 19 February, 2021; originally announced February 2021.

  34. arXiv:2101.02349  [pdf, other]

    cs.AI cs.MA

    Attention Actor-Critic algorithm for Multi-Agent Constrained Co-operative Reinforcement Learning

    Authors: P. Parnika, Raghuram Bharadwaj Diddigi, Sai Koti Reddy Danda, Shalabh Bhatnagar

    Abstract: In this work, we consider the problem of computing optimal actions for Reinforcement Learning (RL) agents in a co-operative setting, where the objective is to optimize a common goal. However, in many real-life applications, in addition to optimizing the goal, the agents are required to satisfy certain constraints specified on their actions. Under this setting, the objective of the agents is to not…

    Submitted 6 January, 2021; originally announced January 2021.

  35. arXiv:2010.16342  [pdf, other]

    cs.RO cs.AI cs.LG eess.SY

    Robust Quadrupedal Locomotion on Sloped Terrains: A Linear Policy Approach

    Authors: Kartik Paigwar, Lokesh Krishna, Sashank Tirumala, Naman Khetan, Aditya Sagi, Ashish Joglekar, Shalabh Bhatnagar, Ashitava Ghosal, Bharadwaj Amrutur, Shishir Kolathaya

    Abstract: In this paper, with a view toward fast deployment of locomotion gaits in low-cost hardware, we use a linear policy for realizing end-foot trajectories in the quadruped robot, Stoch $2$. In particular, the parameters of the end-foot trajectories are shaped via a linear feedback policy that takes the torso orientation and the terrain slope as inputs. The corresponding desired joint angles are obtain…

    Submitted 10 November, 2020; v1 submitted 30 October, 2020; originally announced October 2020.

    Comments: Accepted in 4th Conference on Robot Learning 2020, MIT, USA

  36. arXiv:2010.15947  [pdf, other]

    cs.CV cs.LG

    PAL : Pretext-based Active Learning

    Authors: Shubhang Bhatnagar, Sachin Goyal, Darshan Tank, Amit Sethi

    Abstract: The goal of pool-based active learning is to judiciously select a fixed-sized subset of unlabeled samples from a pool to query an oracle for their labels, in order to maximize the accuracy of a supervised learner. However, the unsaid requirement that the oracle should always assign correct labels is unreasonable for most situations. We propose an active learning technique for deep neural networks…

    Submitted 28 March, 2021; v1 submitted 29 October, 2020; originally announced October 2020.

  37. arXiv:2010.06142  [pdf, other]

    cs.LG

    Hindsight Experience Replay with Kronecker Product Approximate Curvature

    Authors: Dhuruva Priyan G M, Abhik Singla, Shalabh Bhatnagar

    Abstract: Hindsight Experience Replay (HER) is one of the efficient algorithms for solving Reinforcement Learning tasks in sparse-reward environments. But due to its reduced sample efficiency and slower convergence, HER fails to perform effectively. Natural gradients solve these challenges by converging the model parameters better. They avoid taking bad actions that collapse the training performance. Ho…

    Submitted 9 October, 2020; originally announced October 2020.

    Comments: arXiv admin note: text overlap with arXiv:1708.05144 by other authors

  38. arXiv:2009.00821  [pdf, other]

    eess.SY cs.AI

    A reinforcement learning approach to hybrid control design

    Authors: Meet Gandhi, Atreyee Kundu, Shalabh Bhatnagar

    Abstract: In this paper we design hybrid control policies for hybrid systems whose mathematical models are unknown. Our contributions are threefold. First, we propose a framework for modelling the hybrid control design problem as a single Markov Decision Process (MDP). This result facilitates the application of off-the-shelf algorithms from Reinforcement Learning (RL) literature towards designing optimal co…

    Submitted 2 September, 2020; originally announced September 2020.

    Comments: 9 pages

  39. arXiv:2008.13066  [pdf, other]

    stat.ML cs.LG stat.ME

    Computer Model Calibration with Time Series Data using Deep Learning and Quantile Regression

    Authors: Saumya Bhatnagar, Won Chang, Seonjin Kim, Jiali Wang

    Abstract: Computer models play a key role in many scientific and engineering problems. One major source of uncertainty in computer model experiments is input parameter uncertainty. Computer model calibration is a formal statistical procedure to infer input parameters by combining information from model runs and observational data. The existing standard calibration framework suffers from inferential issues wh…

    Submitted 8 September, 2020; v1 submitted 29 August, 2020; originally announced August 2020.

  40. arXiv:2007.14290  [pdf, other]

    cs.RO cs.AI cs.LG eess.SY

    Learning Stable Manoeuvres in Quadruped Robots from Expert Demonstrations

    Authors: Sashank Tirumala, Sagar Gubbi, Kartik Paigwar, Aditya Sagi, Ashish Joglekar, Shalabh Bhatnagar, Ashitava Ghosal, Bharadwaj Amrutur, Shishir Kolathaya

    Abstract: With the research into development of quadruped robots picking up pace, learning-based techniques are being explored for developing locomotion controllers for such robots. A key problem is to generate leg trajectories for continuously varying target linear and angular velocities, in a stable manner. In this paper, we propose a two-pronged approach to address this problem. First, multiple simpler p…

    Submitted 28 July, 2020; originally announced July 2020.

    Comments: 6 pages, Robot and Human Interaction Conference Italy 2020

  41. A Stochastic Game Framework for Efficient Energy Management in Microgrid Networks

    Authors: Shravan Nayak, Chanakya Ajit Ekbote, Annanya Pratap Singh Chauhan, Raghuram Bharadwaj Diddigi, Prishita Ray, Abhinava Sikdar, Sai Koti Reddy Danda, Shalabh Bhatnagar

    Abstract: We consider the problem of energy management in microgrid networks. A microgrid is capable of generating a limited amount of energy from a renewable resource and is responsible for handling the demands of its dedicated customers. Owing to the variable nature of renewable generation and the demands of the customers, it becomes imperative that each microgrid optimally manages its energy. This involv…

    Submitted 15 November, 2020; v1 submitted 5 February, 2020; originally announced February 2020.

  42. arXiv:1912.12907  [pdf, other]

    cs.RO

    Gait Library Synthesis for Quadruped Robots via Augmented Random Search

    Authors: Sashank Tirumala, Aditya Sagi, Kartik Paigwar, Ashish Joglekar, Shalabh Bhatnagar, Ashitava Ghosal, Bharadwaj Amrutur, Shishir Kolathaya

    Abstract: In this paper, with a view toward fast deployment of learned locomotion gaits in low-cost hardware, we generate a library of walking trajectories, namely, forward trot, backward trot, side-step, and turn in our custom-built quadruped robot, Stoch 2, using reinforcement learning. There are existing approaches that determine optimal policies for each time step, whereas we determine an optimal policy…

    Submitted 30 December, 2019; originally announced December 2019.

    Comments: 7 pages, 11 figures, 1 table

  43. arXiv:1911.08826  [pdf, other]

    cs.LG cs.AI

    Hierarchical Average Reward Policy Gradient Algorithms

    Authors: Akshay Dharmavaram, Matthew Riemer, Shalabh Bhatnagar

    Abstract: Option-critic learning is a general-purpose reinforcement learning (RL) framework that aims to address the issue of long term credit assignment by leveraging temporal abstractions. However, when dealing with extended timescales, discounting future rewards can lead to incorrect credit assignments. In this work, we address this issue by extending the hierarchical option-critic policy gradient theore…

    Submitted 20 November, 2019; originally announced November 2019.

    Comments: 6 pages, 3 figures, to be published in Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence

  44. arXiv:1911.05697  [pdf, other]

    cs.LG stat.ML

    A Convergent Off-Policy Temporal Difference Algorithm

    Authors: Raghuram Bharadwaj Diddigi, Chandramouli Kamanchi, Shalabh Bhatnagar

    Abstract: Learning the value function of a given policy (target policy) from the data samples obtained from a different policy (behavior policy) is an important problem in Reinforcement Learning (RL). This problem is studied under the setting of off-policy prediction. Temporal Difference (TD) learning algorithms are a popular class of algorithms for solving the prediction problem. TD algorithms with linear…

    Submitted 13 November, 2019; originally announced November 2019.

  45. Generalized Speedy Q-learning

    Authors: Indu John, Chandramouli Kamanchi, Shalabh Bhatnagar

    Abstract: In this paper, we derive a generalization of the Speedy Q-learning (SQL) algorithm that was proposed in the Reinforcement Learning (RL) literature to handle slow convergence of Watkins' Q-learning. In most RL algorithms such as Q-learning, the Bellman equation and the Bellman operator play an important role. It is possible to generalize the Bellman operator using the technique of successive relaxa…

    Submitted 12 February, 2020; v1 submitted 1 November, 2019; originally announced November 2019.

    Journal ref: in IEEE Control Systems Letters, vol. 4, no. 3, pp. 524-529, July 2020

  46. arXiv:1906.06659  [pdf, ps, other]

    cs.LG cs.GT stat.ML

    A Generalized Minimax Q-learning Algorithm for Two-Player Zero-Sum Stochastic Games

    Authors: Raghuram Bharadwaj Diddigi, Chandramouli Kamanchi, Shalabh Bhatnagar

    Abstract: We consider the problem of two-player zero-sum games. This problem is formulated as a min-max Markov game in the literature. The solution of this game, which is the min-max payoff, starting from a given state is called the min-max value of the state. In this work, we compute the solution of the two-player zero-sum game utilizing the technique of successive relaxation that has been successfully app…

    Submitted 18 March, 2022; v1 submitted 16 June, 2019; originally announced June 2019.

  47. arXiv:1905.13166  [pdf, other]

    physics.flu-dyn cs.CE

    Prediction of Aerodynamic Flow Fields Using Convolutional Neural Networks

    Authors: Yaser Afshar, Saakaar Bhatnagar, Shaowu Pan, Karthik Duraisamy, Shailendra Kaushik

    Abstract: An approximation model based on convolutional neural networks (CNNs) is proposed for flow field predictions. The CNN is used to predict the velocity and pressure field in unseen flow conditions and geometries given the pixelated shape of the object. In particular, we consider Reynolds Averaged Navier-Stokes (RANS) flow solutions over airfoil shapes. The CNN can automatically detect essential featu…

    Submitted 30 May, 2019; originally announced May 2019.

    Report number: CM-19-0035

  48. arXiv:1905.06077  [pdf, other]

    cs.RO cs.LG

    Learning Active Spine Behaviors for Dynamic and Efficient Locomotion in Quadruped Robots

    Authors: Shounak Bhattacharya, Abhik Singla, Abhimanyu, Dhaivat Dholakiya, Shalabh Bhatnagar, Bharadwaj Amrutur, Ashitava Ghosal, Shishir Kolathaya

    Abstract: In this work, we provide a simulation framework to perform systematic studies on the effects of spinal joint compliance and actuation on bounding performance of a 16-DOF quadruped spined robot Stoch 2. Fast quadrupedal locomotion with active spine is an extremely hard problem, and involves a complex coordination between the various degrees of freedom. Therefore, past attempts at addressing this pr…

    Submitted 15 May, 2019; v1 submitted 15 May, 2019; originally announced May 2019.

    Comments: Submitted to IEEE RO-MAN 2019. Supplementary video: https://youtu.be/INp4aa-8z2E

  49. arXiv:1905.03970  [pdf, other]

    cs.LG cs.AI stat.ML

    Reinforcement Learning in Non-Stationary Environments

    Authors: Sindhu Padakandla, Prabuchandran K. J, Shalabh Bhatnagar

    Abstract: Reinforcement learning (RL) methods learn optimal decisions in the presence of a stationary environment. However, the stationary assumption on the environment is very restrictive. In many real world problems like traffic signal control, robotic applications, one often encounters situations with non-stationary environments and in these scenarios, RL methods yield sub-optimal decisions. In this pape…

    Submitted 19 May, 2020; v1 submitted 10 May, 2019; originally announced May 2019.

    Journal ref: Applied Intelligence 2020

  50. Generalized Second Order Value Iteration in Markov Decision Processes

    Authors: Chandramouli Kamanchi, Raghuram Bharadwaj Diddigi, Shalabh Bhatnagar

    Abstract: Value iteration is a fixed point iteration technique utilized to obtain the optimal value function and policy in a discounted reward Markov Decision Process (MDP). Here, a contraction operator is constructed and applied repeatedly to arrive at the optimal solution. Value iteration is a first order method and therefore it may take a large number of iterations to converge to the optimal solution. Su…

    Submitted 17 September, 2021; v1 submitted 10 May, 2019; originally announced May 2019.

    Comments: Accepted for publication at IEEE Transactions on Automatic Control