Showing 1–50 of 107 results for author: Shah, P

  1. arXiv:2407.11599  [pdf, other]

    cs.CR cs.AI

    Enhancing TinyML Security: Study of Adversarial Attack Transferability

    Authors: Parin Shah, Yuvaraj Govindarajulu, Pavan Kulkarni, Manojkumar Parmar

    Abstract: The recent strides in artificial intelligence (AI) and machine learning (ML) have propelled the rise of TinyML, a paradigm enabling AI computations at the edge without dependence on cloud connections. While TinyML offers real-time data analysis and swift responses critical for diverse applications, its devices' intrinsic resource limitations expose them to security risks. This research delves into…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: Accepted and presented at tinyML Foundation EMEA Innovation Forum 2024

  2. arXiv:2406.06559  [pdf, other]

    cs.CL cs.AI cs.LG

    Harnessing Business and Media Insights with Large Language Models

    Authors: Yujia Bao, Ankit Parag Shah, Neeru Narang, Jonathan Rivers, Rajeev Maksey, Lan Guan, Louise N. Barrere, Shelley Evenson, Rahul Basole, Connie Miao, Ankit Mehta, Fabien Boulay, Su Min Park, Natalie E. Pearson, Eldhose Joy, Tiger He, Sumiran Thakur, Koustav Ghosal, Josh On, Phoebe Morrison, Tim Major, Eva Siqi Wang, Gina Escobar, Jiaheng Wei, Tharindu Cyril Weerasooriya, et al. (8 additional authors not shown)

    Abstract: This paper introduces Fortune Analytics Language Model (FALM). FALM empowers users with direct access to comprehensive business analysis, including market trends, company performance metrics, and expert insights. Unlike generic LLMs, FALM leverages a curated knowledge base built from professional journalism, enabling it to deliver precise and in-depth answers to intricate business questions. Users…

    Submitted 2 June, 2024; originally announced June 2024.

  3. arXiv:2405.16487  [pdf, other]

    cs.RO

    Dynamics Models in the Aggressive Off-Road Driving Regime

    Authors: Tyler Han, Sidharth Talia, Rohan Panicker, Preet Shah, Neel Jawale, Byron Boots

    Abstract: Current developments in autonomous off-road driving are steadily increasing performance through higher speeds and more challenging, unstructured environments. However, this operating regime subjects the vehicle to larger inertial effects, where consideration of higher-order states is necessary to avoid failures such as rollovers or excessive impact forces. Aggressive driving through Model Predicti…

    Submitted 26 May, 2024; originally announced May 2024.

    Comments: Accepted to ICRA 2024 Workshop on Resilient Off-road Autonomy

  4. arXiv:2405.13351  [pdf, other]

    quant-ph cs.DS

    Quantum (Inspired) $D^2$-sampling with Applications

    Authors: Ragesh Jaiswal, Poojan Shah

    Abstract: $D^2$-sampling is a fundamental component of sampling-based clustering algorithms such as $k$-means++. Given a dataset $V \subset \mathbb{R}^d$ with $N$ points and a center set $C \subset \mathbb{R}^d$, $D^2$-sampling refers to picking a point from $V$ where the sampling probability of a point is proportional to its squared distance from the nearest center in $C$. Starting with empty $C…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2308.08167

  5. arXiv:2405.05439  [pdf, other]

    cs.RO cs.AI cs.LG stat.AP

    How Generalizable Is My Behavior Cloning Policy? A Statistical Approach to Trustworthy Performance Evaluation

    Authors: Joseph A. Vincent, Haruki Nishimura, Masha Itkina, Paarth Shah, Mac Schwager, Thomas Kollar

    Abstract: With the rise of stochastic generative models in robot policy learning, end-to-end visuomotor policies are increasingly successful at solving complex tasks by learning from human demonstrations. Nevertheless, since real-world evaluation costs afford users only a small number of policy rollouts, it remains a challenge to accurately gauge the performance of such policies. This is exacerbated by dist…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  6. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  7. arXiv:2402.14983  [pdf, other]

    cs.LG cs.CR q-fin.RM

    Privacy-Enhancing Collaborative Information Sharing through Federated Learning -- A Case of the Insurance Industry

    Authors: Panyi Dong, Zhiyu Quan, Brandon Edwards, Shih-han Wang, Runhuan Feng, Tianyang Wang, Patrick Foley, Prashant Shah

    Abstract: The report demonstrates the benefits (in terms of improved claims loss modeling) of harnessing the value of Federated Learning (FL) to learn a single model across multiple insurance industry datasets without requiring the datasets themselves to be shared from one company to another. The application of FL addresses two of the most pressing concerns: limited data volume and data variety, which are c…

    Submitted 22 February, 2024; originally announced February 2024.

  8. arXiv:2402.12867  [pdf]

    cs.SE cs.LG

    Towards MLOps: A DevOps Tools Recommender System for Machine Learning System

    Authors: Pir Sami Ullah Shah, Naveed Ahmad, Mirza Omer Beg

    Abstract: Applying DevOps practices to machine learning systems is termed MLOps; unlike traditional systems, which evolve on requirements, machine learning systems evolve on new data. The objective of MLOps is to establish a connection between different open-source tools to construct a pipeline that can automatically perform steps to construct a dataset, train the machine learning model and deploy the model to the pr…

    Submitted 20 February, 2024; originally announced February 2024.

  9. arXiv:2402.11728  [pdf, other]

    cs.CL cs.LG q-fin.CP

    Numerical Claim Detection in Finance: A New Financial Dataset, Weak-Supervision Model, and Market Analysis

    Authors: Agam Shah, Arnav Hiray, Pratvi Shah, Arkaprabha Banerjee, Anushka Singh, Dheeraj Eidnani, Bhaskar Chaudhury, Sudheer Chava

    Abstract: In this paper, we investigate the influence of claims in analyst reports and earnings calls on financial market returns, considering them as significant quarterly events for publicly traded companies. To facilitate a comprehensive analysis, we construct a new financial dataset for the claim detection task in the financial domain. We benchmark various language models on this dataset and propose a n…

    Submitted 18 February, 2024; originally announced February 2024.

  10. arXiv:2312.17270  [pdf, other]

    cs.CR cs.LG

    Anticipated Network Surveillance -- An extrapolated study to predict cyber-attacks using Machine Learning and Data Analytics

    Authors: Aviral Srivastava, Dhyan Thakkar, Dr. Sharda Valiveti, Dr. Pooja Shah, Dr. Gaurang Raval

    Abstract: Machine learning and data mining techniques are utilized for enhancement of the security of any network. Researchers used machine learning for pattern detection, anomaly detection, dynamic policy setting, etc. The methods allow the program to learn from data and make decisions without human intervention, consuming a huge training period and computation power. This paper discusses a novel technique…

    Submitted 26 December, 2023; originally announced December 2023.

  11. arXiv:2312.13333  [pdf]

    eess.IV cs.CY

    Responsible Deep Learning for Software as a Medical Device

    Authors: Pratik Shah, Jenna Lester, Jana G Deflino, Vinay Pai

    Abstract: Tools, models and statistical methods for signal processing and medical image analysis and training deep learning models to create research prototypes for eventual clinical applications are of special interest to the biomedical imaging community. But material and optical properties of biological tissues are complex and not easily captured by imaging devices. Added complexity can be introduced by d…

    Submitted 20 December, 2023; originally announced December 2023.

    ACM Class: I.2; K.4.1; J.3; I.4

  12. arXiv:2312.11805  [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  13. arXiv:2312.09369  [pdf, other]

    cs.SD cs.AI eess.AS

    Audio-visual fine-tuning of audio-only ASR models

    Authors: Avner May, Dmitriy Serdyuk, Ankit Parag Shah, Otavio Braga, Olivier Siohan

    Abstract: Audio-visual automatic speech recognition (AV-ASR) models are very effective at reducing word error rates on noisy speech, but require large amounts of transcribed AV training data. Recently, audio-visual self-supervised learning (SSL) approaches have been developed to reduce this dependence on transcribed AV data, but these methods are quite complex and computationally expensive. In this work, we…

    Submitted 14 December, 2023; originally announced December 2023.

  14. NeuroSMPC: A Neural Network guided Sampling Based MPC for On-Road Autonomous Driving

    Authors: Kaustab Pal, Aditya Sharma, Mohd Omama, Parth N. Shah, K. Madhava Krishna

    Abstract: In this paper we show an effective means of integrating data driven frameworks to sampling based optimal control to vastly reduce the compute time for easy adoption and adaptation to real time applications such as on-road autonomous driving in the presence of dynamic actors. Presented with training examples, a spatio-temporal CNN learns to predict the optimal mean control over a finite horizon tha…

    Submitted 19 October, 2023; originally announced October 2023.

    Comments: Published in 2023 IEEE 19th International Conference on Automation Science and Engineering (CASE)

  15. arXiv:2309.14003  [pdf, other]

    cs.LG cs.RO

    Hierarchical Imitation Learning for Stochastic Environments

    Authors: Maximilian Igl, Punit Shah, Paul Mougin, Sirish Srinivasan, Tarun Gupta, Brandyn White, Kyriacos Shiarlis, Shimon Whiteson

    Abstract: Many applications of imitation learning require the agent to generate the full distribution of behaviour observed in the training data. For example, to evaluate the safety of autonomous vehicles in simulation, accurate and diverse behaviour models of other road users are paramount. Existing methods that improve this distributional realism typically rely on hierarchical policies. These condition th…

    Submitted 25 September, 2023; originally announced September 2023.

    Comments: Published at IROS'23

  16. arXiv:2308.02053  [pdf, other]

    cs.CL cs.AI cs.CY

    The Unequal Opportunities of Large Language Models: Revealing Demographic Bias through Job Recommendations

    Authors: Abel Salinas, Parth Vipul Shah, Yuzhong Huang, Robert McCormack, Fred Morstatter

    Abstract: Large Language Models (LLMs) have seen widespread deployment in various real-world applications. Understanding these biases is crucial to comprehend the potential downstream consequences when using LLMs to make decisions, particularly for historically disadvantaged groups. In this work, we propose a simple method for analyzing and comparing demographic bias in LLMs, through the lens of job recomme…

    Submitted 9 January, 2024; v1 submitted 3 August, 2023; originally announced August 2023.

    Comments: Accepted to EAAMO 2023

  17. arXiv:2308.00318  [pdf, other]

    cs.LG cs.AI cs.RO

    Pixel to policy: DQN Encoders for within & cross-game reinforcement learning

    Authors: Ashrya Agrawal, Priyanshi Shah, Sourabh Prakash

    Abstract: Reinforcement Learning can be applied to various tasks and environments. Many of these environments have a similar shared structure, which can be exploited to improve RL performance on other tasks. Transfer learning can be used to take advantage of this shared structure, by learning policies that are transferable across different tasks and environments and can lead to more efficient learning as w…

    Submitted 1 August, 2023; originally announced August 2023.

  18. arXiv:2307.03839  [pdf, other]

    cs.RO

    Proximity and Visuotactile Point Cloud Fusion for Contact Patches in Extreme Deformation

    Authors: Jessica Yin, Paarth Shah, Naveen Kuppuswamy, Andrew Beaulieu, Avinash Uttamchandani, Alejandro Castro, James Pikul, Russ Tedrake

    Abstract: Equipping robots with the sense of touch is critical to emulating the capabilities of humans in real world manipulation tasks. Visuotactile sensors are a popular tactile sensing strategy due to data output compatible with computer vision algorithms and accurate, high resolution estimates of local object geometry. However, these sensors struggle to accommodate high deformations of the sensing surfa…

    Submitted 7 July, 2023; originally announced July 2023.

  19. arXiv:2305.20077  [pdf, other]

    cs.LG cs.DC cs.SE

    Managed Geo-Distributed Feature Store: Architecture and System Design

    Authors: Anya Li, Bhala Ranganathan, Feng Pan, Mickey Zhang, Qianjun Xu, Runhan Li, Sethu Raman, Shail Paragbhai Shah, Vivienne Tang

    Abstract: Companies are using machine learning to solve real-world problems and are developing hundreds to thousands of features in the process. They are building feature engineering pipelines as part of MLOps life cycle to transform data from various data sources and materialize the same for future consumption. Without feature stores, different teams across various business groups would maintain the above…

    Submitted 31 May, 2023; originally announced May 2023.

    Comments: All the authors are from the AzureML Feature Store product group and are listed in alphabetical order. Bhala Ranganathan: System architect and tech lead of AzureML Feature Store. Feng Pan, Qianjun Xu: Engineering managers. Sethu Raman: Product Manager of AzureML Feature Store who structured and organized the product vision and specifications

  20. arXiv:2304.13216  [pdf, other]

    cs.CV cs.AI

    Exploiting CNNs for Semantic Segmentation with Pascal VOC

    Authors: Sourabh Prakash, Priyanshi Shah, Ashrya Agrawal

    Abstract: In this paper, we present a comprehensive study on semantic segmentation with the Pascal VOC dataset. Here, we have to label each pixel with a class, which in turn segments the entire image based on the objects/entities present. To tackle this, we first use a Fully Convolutional Network (FCN) baseline, which gave 71.31% pixel accuracy and 0.0527 mean IoU. We analyze its performance and working and s…

    Submitted 5 May, 2023; v1 submitted 25 April, 2023; originally announced April 2023.

  21. arXiv:2303.08954  [pdf, other]

    cs.CL

    PRESTO: A Multilingual Dataset for Parsing Realistic Task-Oriented Dialogs

    Authors: Rahul Goel, Waleed Ammar, Aditya Gupta, Siddharth Vashishtha, Motoki Sano, Faiz Surani, Max Chang, HyunJeong Choe, David Greene, Kyle He, Rattima Nitisaroj, Anna Trukhina, Shachi Paul, Pararth Shah, Rushin Shah, Zhou Yu

    Abstract: Research interest in task-oriented dialogs has increased as systems such as Google Assistant, Alexa and Siri have become ubiquitous in everyday life. However, the impact of academic research in this area has been limited by the lack of datasets that realistically capture the wide array of user pain points. To enable research on some of the more challenging aspects of parsing realistic conversation…

    Submitted 16 March, 2023; v1 submitted 15 March, 2023; originally announced March 2023.

    Comments: PRESTO v1 Release

  22. arXiv:2211.06516  [pdf, other]

    cs.LG

    Bandits for Online Calibration: An Application to Content Moderation on Social Media Platforms

    Authors: Vashist Avadhanula, Omar Abdul Baki, Hamsa Bastani, Osbert Bastani, Caner Gocmen, Daniel Haimovich, Darren Hwang, Dima Karamshuk, Thomas Leeper, Jiayuan Ma, Gregory Macnamara, Jake Mullett, Christopher Palow, Sung Park, Varun S Rajagopal, Kevin Schaeffer, Parikshit Shah, Deeksha Sinha, Nicolas Stier-Moses, Peng Xu

    Abstract: We describe the current content moderation strategy employed by Meta to remove policy-violating content from its platforms. Meta relies on both handcrafted and learned risk models to flag potentially violating content for human review. Our approach aggregates these risk models into a single ranking score, calibrating them to prioritize more reliable risk models. A key challenge is that violation t…

    Submitted 11 November, 2022; originally announced November 2022.

  23. arXiv:2211.01595  [pdf, other]

    eess.SY cs.LG

    Reinforcement Learning in Non-Markovian Environments

    Authors: Siddharth Chandak, Pratik Shah, Vivek S Borkar, Parth Dodhia

    Abstract: Motivated by the novel paradigm developed by Van Roy and coauthors for reinforcement learning in arbitrary non-Markovian environments, we propose a related formulation and explicitly pin down the error caused by non-Markovianity of observations when the Q-learning algorithm is applied on this formulation. Based on this observation, we propose that the criterion for agent design should be to seek g…

    Submitted 13 February, 2024; v1 submitted 3 November, 2022; originally announced November 2022.

    Comments: 19 pages, accepted for publication at Systems and Control Letters

  24. arXiv:2210.10526  [pdf, other]

    cs.LG cs.SD eess.AS

    Propagating Variational Model Uncertainty for Bioacoustic Call Label Smoothing

    Authors: Georgios Rizos, Jenna Lawson, Simon Mitchell, Pranay Shah, Xin Wen, Cristina Banks-Leite, Robert Ewers, Bjoern W. Schuller

    Abstract: We focus on using the predictive uncertainty signal calculated by Bayesian neural networks to guide learning in the self-same task the model is being trained on. Not opting for costly Monte Carlo sampling of weights, we propagate the approximate hidden variance in an end-to-end manner, throughout a variational Bayesian adaptation of a ResNet with attention and squeeze-and-excitation blocks, in ord…

    Submitted 19 October, 2022; originally announced October 2022.

  25. arXiv:2210.09539  [pdf, other]

    cs.RO cs.AI cs.LG

    Hierarchical Model-Based Imitation Learning for Planning in Autonomous Driving

    Authors: Eli Bronstein, Mark Palatucci, Dominik Notz, Brandyn White, Alex Kuefler, Yiren Lu, Supratik Paul, Payam Nikdel, Paul Mougin, Hongge Chen, Justin Fu, Austin Abrams, Punit Shah, Evan Racah, Benjamin Frenkel, Shimon Whiteson, Dragomir Anguelov

    Abstract: We demonstrate the first large-scale application of model-based generative adversarial imitation learning (MGAIL) to the task of dense urban self-driving. We augment standard MGAIL using a hierarchical model to enable generalization to arbitrary goal routes, and measure performance using a closed-loop evaluation framework with simulated interactive agents. We train policies from expert trajectorie…

    Submitted 17 October, 2022; originally announced October 2022.

    Comments: IROS 2022

    Journal ref: IEEE/RSJ international conference on intelligent robots and systems (IROS) 2022, pages 8652-8659

  26. arXiv:2210.06583  [pdf, other]

    cs.CV cs.LG eess.IV

    S4ND: Modeling Images and Videos as Multidimensional Signals Using State Spaces

    Authors: Eric Nguyen, Karan Goel, Albert Gu, Gordon W. Downs, Preey Shah, Tri Dao, Stephen A. Baccus, Christopher Ré

    Abstract: Visual data such as images and videos are typically modeled as discretizations of inherently continuous, multidimensional signals. Existing continuous-signal models attempt to exploit this fact by modeling the underlying signals of visual (e.g., image) data directly. However, these models have not yet been able to achieve competitive performance on practical vision tasks such as large-scale image…

    Submitted 13 October, 2022; v1 submitted 12 October, 2022; originally announced October 2022.

    Comments: NeurIPS 2022

  27. arXiv:2210.02127  [pdf, other]

    cs.RO

    Visual-Inertial and Leg Odometry Fusion for Dynamic Locomotion

    Authors: Victor Dhédin, Haolong Li, Shahram Khorshidi, Lukas Mack, Adithya Kumar Chinnakkonda Ravi, Avadesh Meduri, Paarth Shah, Felix Grimminger, Ludovic Righetti, Majid Khadiv, Joerg Stueckler

    Abstract: Implementing dynamic locomotion behaviors on legged robots requires a high-quality state estimation module. Especially when the motion includes flight phases, state-of-the-art approaches fail to produce reliable estimation of the robot posture, in particular base height. In this paper, we propose a novel approach for combining visual-inertial odometry (VIO) with leg odometry in an extended Kalman…

    Submitted 10 October, 2022; v1 submitted 5 October, 2022; originally announced October 2022.

    Comments: Submitted to IEEE International Conference on Robotics and Automation (ICRA), 2023

  28. arXiv:2209.06927  [pdf, other]

    cs.NE

    Optimization of Rocker-Bogie Mechanism using Heuristic Approaches

    Authors: Harsh Senjaliya, Pranshav Gajjar, Brijan Vaghasiya, Pooja Shah, Paresh Gujarati

    Abstract: Optimal locomotion and efficient traversal of extraterrestrial rovers in dynamic terrains and environments is an important problem statement in the field of planetary science and geophysical systems. Designing a superlative and efficient architecture for the suspension mechanism of planetary rovers is a crucial step towards robust rovers. This paper focuses on the Rocker Bogie mechanism, a standar…

    Submitted 25 September, 2022; v1 submitted 14 September, 2022; originally announced September 2022.

    Comments: 17 Pages, 18 Figures

  29. arXiv:2208.01637  [pdf, other]

    eess.IV cs.CV

    Comparative Analysis of State-of-the-Art Deep Learning Models for Detecting COVID-19 Lung Infection from Chest X-Ray Images

    Authors: Zeba Ghaffar, Pir Masoom Shah, Hikmat Khan, Syed Farhan Alam Zaidi, Abdullah Gani, Izaz Ahmad Khan, Munam Ali Shah, Saif ul Islam

    Abstract: The ongoing COVID-19 pandemic has already taken millions of lives and damaged economies across the globe. Most COVID-19 deaths and economic losses are reported from densely crowded cities. It is comprehensible that the effective control and prevention of epidemic/pandemic infectious diseases is vital. According to WHO, testing and diagnosis is the best strategy to control pandemics. Scientists wor…

    Submitted 30 June, 2022; originally announced August 2022.

  30. arXiv:2207.05225  [pdf, other]

    cs.LG cs.CV

    Susceptibility of Continual Learning Against Adversarial Attacks

    Authors: Hikmat Khan, Pir Masoom Shah, Syed Farhan Alam Zaidi, Saif ul Islam, Qasim Zia

    Abstract: Recent continual learning approaches have primarily focused on mitigating catastrophic forgetting. Nevertheless, two critical areas have remained relatively unexplored: 1) evaluating the robustness of proposed methods and 2) ensuring the security of learned tasks. This paper investigates the susceptibility of continually learned tasks, including current and previously acquired tasks, to adversaria…

    Submitted 8 October, 2023; v1 submitted 11 July, 2022; originally announced July 2022.

    Comments: 18 pages, 13 figures

  31. arXiv:2207.01723  [pdf, other]

    cs.CV

    Adaptive Fine-Grained Sketch-Based Image Retrieval

    Authors: Ayan Kumar Bhunia, Aneeshan Sain, Parth Shah, Animesh Gupta, Pinaki Nath Chowdhury, Tao Xiang, Yi-Zhe Song

    Abstract: The recent focus on Fine-Grained Sketch-Based Image Retrieval (FG-SBIR) has shifted towards generalising a model to new categories without any training data from them. In real-world applications, however, a trained FG-SBIR model is often applied to both new categories and different human sketchers, i.e., different drawing styles. Although this complicates the generalisation problem, fortunately, a…

    Submitted 19 August, 2022; v1 submitted 4 July, 2022; originally announced July 2022.

    Comments: Accepted in ECCV 2022. Minor typos and Eq.4 corrected

  32. arXiv:2205.05009  [pdf, other]

    eess.IV cs.CV

    Using Deep Learning-based Features Extracted from CT scans to Predict Outcomes in COVID-19 Patients

    Authors: Sai Vidyaranya Nuthalapati, Marcela Vizcaychipi, Pallav Shah, Piotr Chudzik, Chee Hau Leow, Paria Yousefi, Ahmed Selim, Keiran Tait, Ben Irving

    Abstract: The COVID-19 pandemic has had a considerable impact on day-to-day life. Tackling the disease by providing the necessary resources to the affected is of paramount importance. However, estimation of the required resources is not a trivial task given the number of factors which determine the requirement. This issue can be addressed by predicting the probability that an infected patient requires Inten…

    Submitted 10 May, 2022; originally announced May 2022.

  33. arXiv:2205.03195  [pdf, other]

    cs.LG cs.RO

    Symphony: Learning Realistic and Diverse Agents for Autonomous Driving Simulation

    Authors: Maximilian Igl, Daewoo Kim, Alex Kuefler, Paul Mougin, Punit Shah, Kyriacos Shiarlis, Dragomir Anguelov, Mark Palatucci, Brandyn White, Shimon Whiteson

    Abstract: Simulation is a crucial tool for accelerating the development of autonomous vehicles. Making simulation realistic requires models of the human road users who interact with such cars. Such models can be obtained by applying learning from demonstration (LfD) to trajectories observed by cars already on the road. However, existing LfD methods are typically insufficient, yielding policies that frequent…

    Submitted 6 May, 2022; originally announced May 2022.

    Comments: Accepted to ICRA-2022

  34. arXiv:2205.02475  [pdf, other]

    cs.SD cs.CL eess.AS

    Speaker Recognition in the Wild

    Authors: Neeraj Chhimwal, Anirudh Gupta, Rishabh Gaur, Harveen Singh Chadha, Priyanshi Shah, Ankur Dhuriya, Vivek Raghavan

    Abstract: In this paper, we propose a pipeline to find the number of speakers, as well as audios belonging to each of these now identified speakers in a source of audio data where number of speakers or speaker labels are not known a priori. We used this approach as a part of our Data Preparation pipeline for Speech Recognition in Indic Languages (https://github.com/Open-Speech-EkStep/vakyansh-wav2vec2-exper…

    Submitted 5 May, 2022; originally announced May 2022.

    Comments: This paper was submitted to Interspeech 2022

  35. Federated Learning Enables Big Data for Rare Cancer Boundary Detection

    Authors: Sarthak Pati, Ujjwal Baid, Brandon Edwards, Micah Sheller, Shih-Han Wang, G Anthony Reina, Patrick Foley, Alexey Gruzdev, Deepthi Karkada, Christos Davatzikos, Chiharu Sako, Satyam Ghodasara, Michel Bilello, Suyash Mohan, Philipp Vollmuth, Gianluca Brugnara, Chandrakanth J Preetha, Felix Sahm, Klaus Maier-Hein, Maximilian Zenk, Martin Bendszus, Wolfgang Wick, Evan Calabrese, Jeffrey Rudie, Javier Villanueva-Meyer, et al. (254 additional authors not shown)

    Abstract: Although machine learning (ML) has shown promise in numerous domains, there are concerns about generalizability to out-of-sample data. This is currently addressed by centrally sharing ample, and importantly diverse, data from multiple sites. However, such centralization is challenging to scale (or even not feasible) due to various limitations. Federated ML (FL) provides an alternative to train acc…

    Submitted 25 April, 2022; v1 submitted 22 April, 2022; originally announced April 2022.

    Comments: federated learning, deep learning, convolutional neural network, segmentation, brain tumor, glioma, glioblastoma, FeTS, BraTS

  36. arXiv:2204.06748  [pdf, other]

    cs.CL cs.AI cs.LG

    Improving Top-K Decoding for Non-Autoregressive Semantic Parsing via Intent Conditioning

    Authors: Geunseob Oh, Rahul Goel, Chris Hidey, Shachi Paul, Aditya Gupta, Pararth Shah, Rushin Shah

    Abstract: Semantic parsing (SP) is a core component of modern virtual assistants like Google Assistant and Amazon Alexa. While sequence-to-sequence-based auto-regressive (AR) approaches are common for conversational semantic parsing, recent studies employ non-autoregressive (NAR) decoders and reduce inference latency while maintaining competitive parsing quality. However, a major drawback of NAR decoders is…

    Submitted 14 April, 2022; originally announced April 2022.

  37. arXiv:2203.16825  [pdf, other]

    cs.CL

    indic-punct: An automatic punctuation restoration and inverse text normalization framework for Indic languages

    Authors: Anirudh Gupta, Neeraj Chhimwal, Ankur Dhuriya, Rishabh Gaur, Priyanshi Shah, Harveen Singh Chadha, Vivek Raghavan

    Abstract: Automatic Speech Recognition (ASR) generates text which is most of the time devoid of any punctuation. Absence of punctuation in text can affect readability. Also, downstream NLP tasks such as sentiment analysis and machine translation greatly benefit from having punctuation and sentence boundary information. We present an approach for automatic punctuation of text using a pretrained IndicBERT model…

    Submitted 31 March, 2022; originally announced March 2022.

    Comments: Submitted to InterSpeech 2022. arXiv admin note: text overlap with arXiv:2104.05055 by other authors

  38. arXiv:2203.16823  [pdf, other]

    cs.CL cs.SD eess.AS

    Effectiveness of text to speech pseudo labels for forced alignment and cross lingual pretrained models for low resource speech recognition

    Authors: Anirudh Gupta, Rishabh Gaur, Ankur Dhuriya, Harveen Singh Chadha, Neeraj Chhimwal, Priyanshi Shah, Vivek Raghavan

    Abstract: In recent years, end-to-end (E2E) automatic speech recognition (ASR) systems have achieved promising results given sufficient resources. Even for languages where not a lot of labelled data is available, state-of-the-art E2E ASR systems can be developed by pretraining on huge amounts of data from high-resource languages and fine-tuning on low-resource languages. For a lot of low resource languages the curren…

    Submitted 31 March, 2022; originally announced March 2022.

    Comments: Submitted to InterSpeech 2022

  39. arXiv:2203.16601   

    cs.CL eess.AS

    Is Word Error Rate a good evaluation metric for Speech Recognition in Indic Languages?

    Authors: Priyanshi Shah, Harveen Singh Chadha, Anirudh Gupta, Ankur Dhuriya, Neeraj Chhimwal, Rishabh Gaur, Vivek Raghavan

    Abstract: We propose a new method for the calculation of error rates in Automatic Speech Recognition (ASR). This new metric is for languages that contain half characters and where the same character can be written in different forms. We implement our methodology in Hindi, which is one of the main languages from the Indic context, and we think this approach is scalable to other similar languages containing a large…

    Submitted 15 June, 2022; v1 submitted 30 March, 2022; originally announced March 2022.

    Comments: Need to upgrade the content completely

  40. arXiv:2203.16595   

    cs.CL eess.AS

    Improving Speech Recognition for Indic Languages using Language Model

    Authors: Ankur Dhuriya, Harveen Singh Chadha, Anirudh Gupta, Priyanshi Shah, Neeraj Chhimwal, Rishabh Gaur, Vivek Raghavan

    Abstract: We study the effect of applying a language model (LM) on the output of Automatic Speech Recognition (ASR) systems for Indic languages. We fine-tune wav2vec 2.0 models for 18 Indic languages and adjust the results with language models trained on text derived from a variety of sources. Our findings demonstrate that the average Character Error Rate (CER) decreases by over 28% and the average…

    Submitted 15 June, 2022; v1 submitted 30 March, 2022; originally announced March 2022.

    Comments: Need to upgrade the content completely

  41. arXiv:2203.16578  [pdf, ps, other]

    cs.CL eess.AS

    Code Switched and Code Mixed Speech Recognition for Indic languages

    Authors: Harveen Singh Chadha, Priyanshi Shah, Ankur Dhuriya, Neeraj Chhimwal, Anirudh Gupta, Vivek Raghavan

    Abstract: Training multilingual automatic speech recognition (ASR) systems is challenging because acoustic and lexical information is typically language specific. Training a multilingual system for Indic languages is even tougher due to the lack of open source datasets and of results on different approaches. We compare the performance of end to end multilingual speech recognition system to the performance of mo…

    Submitted 13 June, 2022; v1 submitted 30 March, 2022; originally announced March 2022.

    Comments: This paper was submitted to Interspeech 2022

  42. arXiv:2203.16512  [pdf, other]

    cs.CL eess.AS

    Vakyansh: ASR Toolkit for Low Resource Indic languages

    Authors: Harveen Singh Chadha, Anirudh Gupta, Priyanshi Shah, Neeraj Chhimwal, Ankur Dhuriya, Rishabh Gaur, Vivek Raghavan

    Abstract: We present Vakyansh, an end to end toolkit for Speech Recognition in Indic languages. India is home to almost 121 languages and around 125 crore speakers. Yet most of the languages are low resource in terms of data and pretrained models. Through Vakyansh, we introduce automatic data pipelines for data creation, model training, model evaluation and deployment. We create 14,000 hours of speech data…

    Submitted 15 June, 2022; v1 submitted 30 March, 2022; originally announced March 2022.

  43. arXiv:2202.11217  [pdf, other]

    cs.RO cs.LG

    Differentiable and Learnable Robot Models

    Authors: Franziska Meier, Austin Wang, Giovanni Sutanto, Yixin Lin, Paarth Shah

    Abstract: Building differentiable simulations of physical processes has recently received an increasing amount of attention. Specifically, some efforts develop differentiable robotic physics engines motivated by the computational benefits of merging rigid body simulations with modern differentiable machine learning libraries. Here, we present a library that focuses on the ability to combine data driven meth…

    Submitted 22 February, 2022; originally announced February 2022.

  44. arXiv:2202.05194  [pdf, other]

    cs.GT

    Robust and fair work allocation

    Authors: Amine Allouah, Christian Kroer, Xuan Zhang, Vashist Avadhanula, Anil Dania, Caner Gocmen, Sergey Pupyrev, Parikshit Shah, Nicolas Stier

    Abstract: In today's digital world, interaction with online platforms is ubiquitous, and thus content moderation is important for protecting users from content that does not comply with pre-established community guidelines. Having a robust content moderation system throughout every stage of planning is particularly important. We study the short-term planning problem of allocating human content reviewers to di…

    Submitted 14 February, 2022; v1 submitted 10 February, 2022; originally announced February 2022.

  45. arXiv:2201.07601  [pdf, other]

    cs.RO

    BiConMP: A Nonlinear Model Predictive Control Framework for Whole Body Motion Planning

    Authors: Avadesh Meduri, Paarth Shah, Julian Viereck, Majid Khadiv, Ioannis Havoutis, Ludovic Righetti

    Abstract: Online planning of whole-body motions for legged robots is challenging due to the inherent nonlinearity in the robot dynamics. In this work, we propose a nonlinear MPC framework, the BiConMP, which can generate whole body trajectories online by efficiently exploiting the structure of the robot dynamics. BiConMP is used to generate various cyclic gaits on a real quadruped robot and its performance i…

    Submitted 15 September, 2022; v1 submitted 19 January, 2022; originally announced January 2022.

  46. arXiv:2112.12142  [pdf]

    cs.DC

    Survey the storage systems used in HPC and BDA ecosystems

    Authors: Priyam Shah, Jie Ye, Xian-He Sun

    Abstract: The advancement of the HPC and BDA ecosystems demands a better understanding of storage systems in order to plan effective solutions. To make applications access data more efficiently for computation, HPC and BDA ecosystems adopt different storage systems. Each storage system has its pros and cons. Therefore, it is worthwhile and interesting to explore the storage systems used in HPC and BDA respectively. A…

    Submitted 23 December, 2021; v1 submitted 22 December, 2021; originally announced December 2021.

    Comments: 13 pages, 10 figures, 7 tables

  47. arXiv:2110.06894  [pdf, other]

    cs.CL

    Audio-Visual Scene-Aware Dialog and Reasoning using Audio-Visual Transformers with Joint Student-Teacher Learning

    Authors: Ankit P. Shah, Shijie Geng, Peng Gao, Anoop Cherian, Takaaki Hori, Tim K. Marks, Jonathan Le Roux, Chiori Hori

    Abstract: In previous work, we have proposed the Audio-Visual Scene-Aware Dialog (AVSD) task, collected an AVSD dataset, developed AVSD technologies, and hosted an AVSD challenge track at both the 7th and 8th Dialog System Technology Challenges (DSTC7, DSTC8). In these challenges, the best-performing systems relied heavily on human-generated descriptions of the video content, which were available in the dat…

    Submitted 13 October, 2021; originally announced October 2021.

    Comments: https://dstc10.dstc.community/home and https://github.com/dialogtekgeek/AVSD-DSTC10_Official/

  48. arXiv:2109.00115  [pdf, other]

    eess.IV cs.CV cs.LG

    Uncertainty Quantified Deep Learning for Predicting Dice Coefficient of Digital Histopathology Image Segmentation

    Authors: Sambuddha Ghosal, Audrey Xie, Pratik Shah

    Abstract: Deep learning models (DLMs) can achieve state of the art performance in medical image segmentation and classification tasks. However, DLMs that do not provide feedback for their predictions such as Dice coefficients (Dice) have limited deployment potential in real world clinical settings. Uncertainty estimates can increase the trust of these automated systems by identifying predictions that need f…

    Submitted 31 August, 2021; originally announced September 2021.

    Comments: Submitted to the 2022 IEEE International Symposium on Biomedical Imaging (ISBI) scientific conference

    MSC Class: 68T07; 54H30 ACM Class: I.2.1; G.3

  49. arXiv:2108.01797  [pdf, other]

    cs.RO

    Rapid Convex Optimization of Centroidal Dynamics using Block Coordinate Descent

    Authors: Paarth Shah, Avadesh Meduri, Wolfgang Merkt, Majid Khadiv, Ioannis Havoutis, Ludovic Righetti

    Abstract: In this paper we explore the use of block coordinate descent (BCD) to optimize the centroidal momentum dynamics for dynamically consistent multi-contact behaviors. The centroidal dynamics have recently received a large amount of attention in order to create physically realizable motions for robots with hands and feet while being computationally more tractable than full rigid body dynamics models.…

    Submitted 3 August, 2021; originally announced August 2021.

  50. arXiv:2107.07402  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    CLSRIL-23: Cross Lingual Speech Representations for Indic Languages

    Authors: Anirudh Gupta, Harveen Singh Chadha, Priyanshi Shah, Neeraj Chhimwal, Ankur Dhuriya, Rishabh Gaur, Vivek Raghavan

    Abstract: We present CLSRIL-23, a self supervised learning based audio pre-trained model which learns cross lingual speech representations from raw audio across 23 Indic languages. It is built on top of wav2vec 2.0 which is solved by training a contrastive task over masked latent speech representations and jointly learns the quantization of latents shared across all languages. We compare the language wise…

    Submitted 13 January, 2022; v1 submitted 15 July, 2021; originally announced July 2021.

    Comments: 7 pages, 2 figures