
Showing 1–50 of 89 results for author: Jha, A

  1. arXiv:2407.07858  [pdf, other]

    cs.LG cs.CL

    FACTS About Building Retrieval Augmented Generation-based Chatbots

    Authors: Rama Akkiraju, Anbang Xu, Deepak Bora, Tan Yu, Lu An, Vishal Seth, Aaditya Shukla, Pritam Gundecha, Hridhay Mehta, Ashwin Jha, Prithvi Raj, Abhinav Balasubramanian, Murali Maram, Guru Muthusamy, Shivakesh Reddy Annepally, Sidney Knowles, Min Du, Nick Burnett, Sean Javiya, Ashok Marannan, Mamta Kumari, Surbhi Jha, Ethan Dereszenski, Anupam Chakraborty, Subhash Ranjan, et al. (13 additional authors not shown)

    Abstract: Enterprise chatbots, powered by generative AI, are emerging as key applications to enhance employee productivity. Retrieval Augmented Generation (RAG), Large Language Models (LLMs), and orchestration frameworks like Langchain and Llamaindex are crucial for building these chatbots. However, creating effective enterprise chatbots is challenging and requires meticulous RAG pipeline engineering. This…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: 8 pages, 6 figures, 2 tables, Preprint submission to ACM CIKM 2024

  2. arXiv:2407.04207  [pdf, other]

    cs.CV

    Elevating All Zero-Shot Sketch-Based Image Retrieval Through Multimodal Prompt Learning

    Authors: Mainak Singha, Ankit Jha, Divyam Gupta, Pranav Singla, Biplab Banerjee

    Abstract: We address the challenges inherent in sketch-based image retrieval (SBIR) across various settings, including zero-shot SBIR, generalized zero-shot SBIR, and fine-grained zero-shot SBIR, by leveraging the vision-language foundation model, CLIP. While recent endeavors have employed CLIP to enhance SBIR, these approaches predominantly follow uni-modal prompt processing and overlook to fully exploit C…

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: Accepted in ECCV 2024

  3. arXiv:2407.00534  [pdf]

    cs.CR

    Blockchain based Decentralized Petition System

    Authors: Jagdeep Kaur, Kevin Antony, Nikhil Pujar, Ankit Jha

    Abstract: A decentralized online petition system enables individuals or groups to create, sign, and share petitions without a central authority. Using blockchain technology, these systems ensure the integrity and transparency of the petition process by recording every signature or action on the blockchain, making alterations or deletions impossible. This provides a permanent, tamper-proof record of the peti…

    Submitted 29 June, 2024; originally announced July 2024.

  4. arXiv:2406.01044  [pdf]

    physics.med-ph cs.AI

    Nuclear Medicine Artificial Intelligence in Action: The Bethesda Report (AI Summit 2024)

    Authors: Arman Rahmim, Tyler J. Bradshaw, Guido Davidzon, Joyita Dutta, Georges El Fakhri, Munir Ghesani, Nicolas A. Karakatsanis, Quanzheng Li, Chi Liu, Emilie Roncali, Babak Saboury, Tahir Yusufaly, Abhinav K. Jha

    Abstract: The 2nd SNMMI Artificial Intelligence (AI) Summit, organized by the SNMMI AI Task Force, took place in Bethesda, MD, on February 29 - March 1, 2024. Bringing together various community members and stakeholders, and following up on a prior successful 2022 AI Summit, the summit theme was: AI in Action. Six key topics included (i) an overview of prior and ongoing efforts by the AI task force, (ii) em…

    Submitted 3 June, 2024; originally announced June 2024.

  5. arXiv:2405.15341  [pdf, other]

    cs.AI cs.CV

    V-Zen: Efficient GUI Understanding and Precise Grounding With A Novel Multimodal LLM

    Authors: Abdur Rahman, Rajat Chawla, Muskaan Kumar, Arkajit Datta, Adarsh Jha, Mukunda NS, Ishaan Bhola

    Abstract: In the rapidly evolving landscape of AI research and application, Multimodal Large Language Models (MLLMs) have emerged as a transformative force, adept at interpreting and integrating information from diverse modalities such as text, images, and Graphical User Interfaces (GUIs). Despite these advancements, the nuanced interaction and understanding of GUIs pose a significant challenge, limiting th…

    Submitted 24 May, 2024; originally announced May 2024.

  6. arXiv:2404.16048  [pdf, other]

    cs.HC cs.AI

    GUIDE: Graphical User Interface Data for Execution

    Authors: Rajat Chawla, Adarsh Jha, Muskaan Kumar, Mukunda NS, Ishaan Bhola

    Abstract: In this paper, we introduce GUIDE, a novel dataset tailored for the advancement of Multimodal Large Language Model (MLLM) applications, particularly focusing on Robotic Process Automation (RPA) use cases. Our dataset encompasses diverse data from various websites including Apollo (62.67%), Gmail (3.43%), Calendar (10.98%) and Canva (22.92%). Each data entry includes an image, a task description, t…

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: 11 pages, 8 figures, 3 Tables and 1 Algorithm

  7. arXiv:2404.13693  [pdf, other]

    eess.IV cs.CV

    PV-S3: Advancing Automatic Photovoltaic Defect Detection using Semi-Supervised Semantic Segmentation of Electroluminescence Images

    Authors: Abhishek Jha, Yogesh Rawat, Shruti Vyas

    Abstract: Photovoltaic (PV) systems allow us to tap into all abundant solar energy, however they require regular maintenance for high efficiency and to prevent degradation. Traditional manual health check, using Electroluminescence (EL) imaging, is expensive and logistically challenging making automated defect detection essential. Current automation approaches require extensive manual expert labeling, which…

    Submitted 21 April, 2024; originally announced April 2024.

  8. arXiv:2404.05366  [pdf, other]

    cs.CV

    CDAD-Net: Bridging Domain Gaps in Generalized Category Discovery

    Authors: Sai Bhargav Rongali, Sarthak Mehrotra, Ankit Jha, Mohamad Hassan N C, Shirsha Bose, Tanisha Gupta, Mainak Singha, Biplab Banerjee

    Abstract: In Generalized Category Discovery (GCD), we cluster unlabeled samples of known and novel classes, leveraging a training dataset of known classes. A salient challenge arises due to domain shifts between these datasets. To address this, we present a novel setting: Across Domain Generalized Category Discovery (AD-GCD) and bring forth CDAD-NET (Class Discoverer Across Domains) as a remedy. CDAD-NET is…

    Submitted 8 April, 2024; originally announced April 2024.

    Comments: Accepted in L3D-IVU, CVPR Workshop, 2024

  9. arXiv:2404.00710  [pdf, other]

    cs.CV

    Unknown Prompt, the only Lacuna: Unveiling CLIP's Potential for Open Domain Generalization

    Authors: Mainak Singha, Ankit Jha, Shirsha Bose, Ashwin Nair, Moloud Abdar, Biplab Banerjee

    Abstract: We delve into Open Domain Generalization (ODG), marked by domain and category shifts between training's labeled source and testing's unlabeled target domains. Existing solutions to ODG face limitations due to constrained generalizations of traditional CNN backbones and errors in detecting target open samples in the absence of prior knowledge. Addressing these pitfalls, we introduce ODG-CLIP, harne…

    Submitted 31 March, 2024; originally announced April 2024.

    Comments: Accepted in CVPR 2024

  10. arXiv:2403.08773  [pdf, other]

    cs.CV cs.AI cs.CL cs.MM

    Veagle: Advancements in Multimodal Representation Learning

    Authors: Rajat Chawla, Arkajit Datta, Tushar Verma, Adarsh Jha, Anmol Gautam, Ayush Vatsal, Sukrit Chaterjee, Mukunda NS, Ishaan Bhola

    Abstract: Lately, researchers in artificial intelligence have been really interested in how language and vision come together, giving rise to the development of multimodal models that aim to seamlessly integrate textual and visual information. Multimodal models, an extension of Large Language Models (LLMs), have exhibited remarkable capabilities in addressing a diverse array of tasks, ranging from image cap…

    Submitted 18 January, 2024; originally announced March 2024.

  11. arXiv:2403.00788  [pdf]

    cs.CL cs.AI cs.HC cs.LG

    PRECISE Framework: GPT-based Text For Improved Readability, Reliability, and Understandability of Radiology Reports For Patient-Centered Care

    Authors: Satvik Tripathi, Liam Mutter, Meghana Muppuri, Suhani Dheer, Emiliano Garza-Frias, Komal Awan, Aakash Jha, Michael Dezube, Azadeh Tabari, Christopher P. Bridge, Dania Daye

    Abstract: This study introduces and evaluates the PRECISE framework, utilizing OpenAI's GPT-4 to enhance patient engagement by providing clearer and more accessible chest X-ray reports at a sixth-grade reading level. The framework was tested on 500 reports, demonstrating significant improvements in readability, reliability, and understandability. Statistical analyses confirmed the effectiveness of the PRECI…

    Submitted 19 February, 2024; originally announced March 2024.

  12. arXiv:2402.14957  [pdf, other]

    cs.CV cs.LG

    The Common Stability Mechanism behind most Self-Supervised Learning Approaches

    Authors: Abhishek Jha, Matthew B. Blaschko, Yuki M. Asano, Tinne Tuytelaars

    Abstract: Last couple of years have witnessed a tremendous progress in self-supervised learning (SSL), the success of which can be attributed to the introduction of useful inductive biases in the learning process to learn meaningful visual representations while avoiding collapse. These inductive biases and constraints manifest themselves in the form of different optimization formulations in the SSL techniqu…

    Submitted 22 February, 2024; originally announced February 2024.

    Comments: Additional visualizations (.gif): https://github.com/abskjha/CenterVectorSSL

  13. arXiv:2402.08697  [pdf, other]

    eess.IV cs.CV

    Weakly Supervised Detection of Pheochromocytomas and Paragangliomas in CT

    Authors: David C. Oluigboa, Bikash Santra, Tejas Sudharshan Mathai, Pritam Mukherjee, Jianfei Liu, Abhishek Jha, Mayank Patel, Karel Pacak, Ronald M. Summers

    Abstract: Pheochromocytomas and Paragangliomas (PPGLs) are rare adrenal and extra-adrenal tumors which have the potential to metastasize. For the management of patients with PPGLs, CT is the preferred modality of choice for precise localization and estimation of their progression. However, due to the myriad variations in size, morphology, and appearance of the tumors in different anatomical regions, radiolo…

    Submitted 12 February, 2024; originally announced February 2024.

    Comments: Accepted at SPIE 2024. arXiv admin note: text overlap with arXiv:2402.00175

  14. arXiv:2402.00838  [pdf, other]

    cs.CL

    OLMo: Accelerating the Science of Language Models

    Authors: Dirk Groeneveld, Iz Beltagy, Pete Walsh, Akshita Bhagia, Rodney Kinney, Oyvind Tafjord, Ananya Harsh Jha, Hamish Ivison, Ian Magnusson, Yizhong Wang, Shane Arora, David Atkinson, Russell Authur, Khyathi Raghavi Chandu, Arman Cohan, Jennifer Dumas, Yanai Elazar, Yuling Gu, Jack Hessel, Tushar Khot, William Merrill, Jacob Morrison, Niklas Muennighoff, Aakanksha Naik, Crystal Nam, et al. (18 additional authors not shown)

    Abstract: Language models (LMs) have become ubiquitous in both NLP research and in commercial product offerings. As their commercial importance has surged, the most powerful models have become closed off, gated behind proprietary interfaces, with important details of their training data, architectures, and development undisclosed. Given the importance of these details in scientifically studying these models…

    Submitted 7 June, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

  15. arXiv:2402.00159  [pdf, other]

    cs.CL

    Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research

    Authors: Luca Soldaini, Rodney Kinney, Akshita Bhagia, Dustin Schwenk, David Atkinson, Russell Authur, Ben Bogin, Khyathi Chandu, Jennifer Dumas, Yanai Elazar, Valentin Hofmann, Ananya Harsh Jha, Sachin Kumar, Li Lucy, Xinxi Lyu, Nathan Lambert, Ian Magnusson, Jacob Morrison, Niklas Muennighoff, Aakanksha Naik, Crystal Nam, Matthew E. Peters, Abhilasha Ravichander, Kyle Richardson, Zejiang Shen, et al. (11 additional authors not shown)

    Abstract: Information about pretraining corpora used to train the current best-performing language models is seldom discussed: commercial models rarely detail their data, and even open models are often released without accompanying training data or recipes to reproduce them. As a result, it is challenging to conduct and advance scientific research on language modeling, such as understanding how training dat…

    Submitted 6 June, 2024; v1 submitted 31 January, 2024; originally announced February 2024.

    Comments: Accepted at ACL 2024; Dataset: https://hf.co/datasets/allenai/dolma; Code: https://github.com/allenai/dolma

  16. arXiv:2401.13156  [pdf, other]

    quant-ph cs.SC

    Local Hamiltonian decomposition and classical simulation of parametrized quantum circuits

    Authors: Bibhas Adhikari, Aryan Jha

    Abstract: In this paper we develop a classical algorithm of complexity $O(K \, 2^n)$ to simulate parametrized quantum circuits (PQCs) of $n$ qubits, where $K$ is the total number of one-qubit and two-qubit control gates. The algorithm is developed by finding $2$-sparse unitary matrices of order $2^n$ explicitly corresponding to any single-qubit and two-qubit control gates in an $n$-qubit system. Finally, we…

    Submitted 31 January, 2024; v1 submitted 23 January, 2024; originally announced January 2024.

  17. arXiv:2401.06310  [pdf, other]

    cs.CV cs.CL cs.CY

    ViSAGe: A Global-Scale Analysis of Visual Stereotypes in Text-to-Image Generation

    Authors: Akshita Jha, Vinodkumar Prabhakaran, Remi Denton, Sarah Laszlo, Shachi Dave, Rida Qadri, Chandan K. Reddy, Sunipa Dev

    Abstract: Recent studies have shown that Text-to-Image (T2I) model generations can reflect social stereotypes present in the real world. However, existing approaches for evaluating stereotypes have a noticeable lack of coverage of global identity groups and their associated stereotypes. To address this gap, we introduce the ViSAGe (Visual Stereotypes Around the Globe) dataset to enable the evaluation of kno…

    Submitted 14 July, 2024; v1 submitted 11 January, 2024; originally announced January 2024.

    Comments: Association for Computational Linguistics (ACL) 2024

  18. arXiv:2312.10523  [pdf, other]

    cs.CL cs.AI cs.LG

    Paloma: A Benchmark for Evaluating Language Model Fit

    Authors: Ian Magnusson, Akshita Bhagia, Valentin Hofmann, Luca Soldaini, Ananya Harsh Jha, Oyvind Tafjord, Dustin Schwenk, Evan Pete Walsh, Yanai Elazar, Kyle Lo, Dirk Groeneveld, Iz Beltagy, Hannaneh Hajishirzi, Noah A. Smith, Kyle Richardson, Jesse Dodge

    Abstract: Language models (LMs) commonly report perplexity on monolithic data held out from training. Implicitly or explicitly, this data is composed of domains–varying distributions of language. Rather than assuming perplexity on one distribution extrapolates to others, Perplexity Analysis for Language Model Assessment (Paloma), measures LM fit to 585 text domains, ranging from nytimes.com…

    Submitted 16 December, 2023; originally announced December 2023.

    Comments: Project Page: https://paloma.allen.ai/

  19. arXiv:2311.15812  [pdf, other]

    cs.CV

    C-SAW: Self-Supervised Prompt Learning for Image Generalization in Remote Sensing

    Authors: Avigyan Bhattacharya, Mainak Singha, Ankit Jha, Biplab Banerjee

    Abstract: We focus on domain and class generalization problems in analyzing optical remote sensing images, using the large-scale pre-trained vision-language model (VLM), CLIP. While contrastively trained VLMs show impressive zero-shot generalization performance, their effectiveness is limited when dealing with diverse domains during training and testing. Existing prompt learning techniques overlook the impo…

    Submitted 27 November, 2023; originally announced November 2023.

    Comments: Accepted in ACM ICVGIP 2023

  20. arXiv:2311.13133  [pdf, other]

    cs.LG cs.AI cs.CL

    LIMIT: Less Is More for Instruction Tuning Across Evaluation Paradigms

    Authors: Aditi Jha, Sam Havens, Jeremy Dohmann, Alex Trott, Jacob Portes

    Abstract: Large Language Models are traditionally finetuned on large instruction datasets. However recent studies suggest that small, high-quality datasets can suffice for general purpose instruction following. This lack of consensus surrounding finetuning best practices is in part due to rapidly diverging approaches to LLM evaluation. In this study, we ask whether a small amount of diverse finetuning sampl…

    Submitted 21 November, 2023; originally announced November 2023.

    Comments: 36 pages, 12 figures, NeurIPS 2023 Workshop on Instruction Tuning and Instruction Following

  21. arXiv:2311.02599  [pdf, other]

    cs.CV

    Learning Class and Domain Augmentations for Single-Source Open-Domain Generalization

    Authors: Prathmesh Bele, Valay Bundele, Avigyan Bhattacharya, Ankit Jha, Gemma Roig, Biplab Banerjee

    Abstract: Single-source open-domain generalization (SS-ODG) addresses the challenge of labeled source domains with supervision during training and unlabeled novel target domains during testing. The target domain includes both known classes from the source domain and samples from previously unseen classes. Existing techniques for SS-ODG primarily focus on calibrating source-domain classifiers to identify ope…

    Submitted 5 November, 2023; originally announced November 2023.

    Comments: 11 pages, WACV 2024

  22. arXiv:2309.13470  [pdf, other]

    cs.CV

    HAVE-Net: Hallucinated Audio-Visual Embeddings for Few-Shot Classification with Unimodal Cues

    Authors: Ankit Jha, Debabrata Pal, Mainak Singha, Naman Agarwal, Biplab Banerjee

    Abstract: Recognition of remote sensing (RS) or aerial images is currently of great interest, and advancements in deep learning algorithms added flavor to it in recent years. Occlusion, intra-class variance, lighting, etc., might arise while training neural networks using unimodal RS visual input. Even though joint training of audio-visual modalities improves classification performance in a low-data regime,…

    Submitted 23 September, 2023; originally announced September 2023.

    Comments: 8 Page, 2 Figures, 2 Tables, Accepted in Adapting to Change: Reliable Multimodal Learning Across Domains Workshop, ECML PKDD 2023

  23. arXiv:2309.05127  [pdf, other]

    cs.IR

    Learning Personalized User Preference from Cold Start in Multi-turn Conversations

    Authors: Deguang Kong, Abhay Jha, Lei Yun

    Abstract: This paper presents a novel teachable conversation interaction system that is capable of learning users preferences from cold start by gradually adapting to personal preferences. In particular, the TAI system is able to automatically identify and label user preference in live interactions, manage dialogue flows for interactive teaching sessions, and reuse learned preference for preference elicitat…

    Submitted 10 September, 2023; originally announced September 2023.

    Comments: preference, personalization, cold-start, dialogue, LLM, embedding

  24. arXiv:2308.11605  [pdf, other]

    cs.CV

    GOPro: Generate and Optimize Prompts in CLIP using Self-Supervised Learning

    Authors: Mainak Singha, Ankit Jha, Biplab Banerjee

    Abstract: Large-scale foundation models, such as CLIP, have demonstrated remarkable success in visual recognition tasks by embedding images in a semantically rich space. Self-supervised learning (SSL) has also shown promise in improving visual recognition by learning invariant features. However, the combination of CLIP with SSL is found to face challenges due to the multi-task framework that blends CLIP's c…

    Submitted 22 August, 2023; originally announced August 2023.

    Comments: Accepted at BMVC 2023

  25. arXiv:2308.05659  [pdf, other]

    cs.CV

    AD-CLIP: Adapting Domains in Prompt Space Using CLIP

    Authors: Mainak Singha, Harsh Pal, Ankit Jha, Biplab Banerjee

    Abstract: Although deep learning models have shown impressive performance on supervised learning tasks, they often struggle to generalize well when the training (source) and test (target) domains differ. Unsupervised domain adaptation (DA) has emerged as a popular solution to this problem. However, current DA techniques rely on visual backbones, which may lack semantic richness. Despite the potential of lar…

    Submitted 10 August, 2023; originally announced August 2023.

    Comments: 10 pages, 8 figures, 4 tables. Accepted at OOD-CV, ICCV Workshop, 2023

  26. arXiv:2307.10506  [pdf, other]

    eess.IV cs.CV cs.CY

    Is Grad-CAM Explainable in Medical Images?

    Authors: Subhashis Suara, Aayush Jha, Pratik Sinha, Arif Ahmed Sekh

    Abstract: Explainable Deep Learning has gained significant attention in the field of artificial intelligence (AI), particularly in domains such as medical imaging, where accurate and interpretable machine learning models are crucial for effective diagnosis and treatment planning. Grad-CAM is a baseline that highlights the most critical regions of an image used in a deep learning model's decision-making proc…

    Submitted 19 July, 2023; originally announced July 2023.

  27. arXiv:2306.09822  [pdf, other]

    cs.CV math.NA

    Lightweight Attribute Localizing Models for Pedestrian Attribute Recognition

    Authors: Ashish Jha, Dimitrii Ermilov, Konstantin Sobolev, Anh Huy Phan, Salman Ahmadi-Asl, Naveed Ahmed, Imran Junejo, Zaher AL Aghbari, Thar Baker, Ahmed Mohamed Khedr, Andrzej Cichocki

    Abstract: Pedestrian Attribute Recognition (PAR) deals with the problem of identifying features in a pedestrian image. It has found interesting applications in person retrieval, suspect re-identification and soft biometrics. In the past few years, several Deep Neural Networks (DNNs) have been designed to solve the task; however, the developed DNNs predominantly suffer from over-parameterization and high com…

    Submitted 16 June, 2023; originally announced June 2023.

  28. arXiv:2306.04249  [pdf, other]

    physics.med-ph cs.CV eess.IV

    DEMIST: A deep-learning-based task-specific denoising approach for myocardial perfusion SPECT

    Authors: Md Ashequr Rahman, Zitong Yu, Richard Laforest, Craig K. Abbey, Barry A. Siegel, Abhinav K. Jha

    Abstract: There is an important need for methods to process myocardial perfusion imaging (MPI) SPECT images acquired at lower radiation dose and/or acquisition time such that the processed images improve observer performance on the clinical task of detecting perfusion defects. To address this need, we build upon concepts from model-observer theory and our understanding of the human visual system to propose…

    Submitted 25 October, 2023; v1 submitted 7 June, 2023; originally announced June 2023.

  29. arXiv:2306.03577  [pdf, other]

    cs.CV

    An Open Patch Generator based Fingerprint Presentation Attack Detection using Generative Adversarial Network

    Authors: Anuj Rai, Ashutosh Anshul, Ashwini Jha, Prayag Jain, Ramprakash Sharma, Somnath Dey

    Abstract: The low-cost, user-friendly, and convenient nature of Automatic Fingerprint Recognition Systems (AFRS) makes them suitable for a wide range of applications. This spreading use of AFRS also makes them vulnerable to various security threats. Presentation Attack (PA) or spoofing is one of the threats which is caused by presenting a spoof of a genuine fingerprint to the sensor of AFRS. Fingerprint Pre…

    Submitted 6 June, 2023; originally announced June 2023.

  30. arXiv:2305.14864  [pdf, other]

    cs.CL

    Just CHOP: Embarrassingly Simple LLM Compression

    Authors: Ananya Harsh Jha, Tom Sherborne, Evan Pete Walsh, Dirk Groeneveld, Emma Strubell, Iz Beltagy

    Abstract: Large language models (LLMs) enable unparalleled few- and zero-shot reasoning capabilities but at a high computational footprint. A growing assortment of methods for compression promises to reduce the computational burden of LLMs in deployment, but so far, only quantization approaches have been demonstrated to be effective for LLM compression while maintaining zero-shot performance. A critical ste…

    Submitted 9 July, 2024; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: 13 pages, 6 figures, 6 tables

  31. arXiv:2305.11840  [pdf, other]

    cs.CL cs.CY

    SeeGULL: A Stereotype Benchmark with Broad Geo-Cultural Coverage Leveraging Generative Models

    Authors: Akshita Jha, Aida Davani, Chandan K. Reddy, Shachi Dave, Vinodkumar Prabhakaran, Sunipa Dev

    Abstract: Stereotype benchmark datasets are crucial to detect and mitigate social stereotypes about groups of people in NLP models. However, existing datasets are limited in size and coverage, and are largely restricted to stereotypes prevalent in the Western society. This is especially problematic as language technologies gain hold across the globe. To address this gap, we present SeeGULL, a broad-coverage…

    Submitted 19 May, 2023; originally announced May 2023.

  32. arXiv:2305.11715  [pdf]

    eess.IV cs.CV physics.med-ph

    A quality assurance framework for real-time monitoring of deep learning segmentation models in radiotherapy

    Authors: Xiyao Jin, Yao Hao, Jessica Hilliard, Zhehao Zhang, Maria A. Thomas, Hua Li, Abhinav K. Jha, Geoffrey D. Hugo

    Abstract: To safely deploy deep learning models in the clinic, a quality assurance framework is needed for routine or continuous monitoring of input-domain shift and the models' performance without ground truth contours. In this work, cardiac substructure segmentation was used as an example task to establish a QA framework. A benchmark dataset consisting of Computed Tomography (CT) images along with manual…

    Submitted 19 May, 2023; originally announced May 2023.

  33. arXiv:2304.13506  [pdf, other]

    cs.DC cs.CV

    Cloud-Based Deep Learning: End-To-End Full-Stack Handwritten Digit Recognition

    Authors: Ruida Zeng, Aadarsh Jha, Ashwin Kumar, Terry Luo

    Abstract: Herein, we present Stratus, an end-to-end full-stack deep learning application deployed on the cloud. The rise of productionized deep learning necessitates infrastructure in the cloud that can provide such service (IaaS). In this paper, we explore the use of modern cloud infrastructure and micro-services to deliver accurate and high-speed predictions to an end-user, using a Deep Neural Network (DN…

    Submitted 1 February, 2023; originally announced April 2023.

  34. arXiv:2304.05995  [pdf, other]

    cs.CV

    APPLeNet: Visual Attention Parameterized Prompt Learning for Few-Shot Remote Sensing Image Generalization using CLIP

    Authors: Mainak Singha, Ankit Jha, Bhupendra Solanki, Shirsha Bose, Biplab Banerjee

    Abstract: In recent years, the success of large-scale vision-language models (VLMs) such as CLIP has led to their increased usage in various computer vision tasks. These models enable zero-shot inference through carefully crafted instructional text prompts without task-specific supervision. However, the potential of VLMs for generalization tasks in remote sensing (RS) has not been fully realized. To address…

    Submitted 12 April, 2023; originally announced April 2023.

    Comments: 11 Pages, 6 figures, 8 tables, Accepted in Earth Vision (CVPR 2023)

  35. arXiv:2303.02110  [pdf]

    eess.IV cs.CV cs.LG physics.med-ph

    Need for Objective Task-based Evaluation of Deep Learning-Based Denoising Methods: A Study in the Context of Myocardial Perfusion SPECT

    Authors: Zitong Yu, Md Ashequr Rahman, Richard Laforest, Thomas H. Schindler, Robert J. Gropler, Richard L. Wahl, Barry A. Siegel, Abhinav K. Jha

    Abstract: Artificial intelligence-based methods have generated substantial interest in nuclear medicine. An area of significant interest has been using deep-learning (DL)-based approaches for denoising images acquired with lower doses, shorter acquisition times, or both. Objective evaluation of these approaches is essential for clinical application. DL-based approaches for denoising nuclear-medicine images…

    Submitted 1 April, 2023; v1 submitted 3 March, 2023; originally announced March 2023.

  36. arXiv:2303.00212  [pdf, other]

    eess.IV cs.CV physics.med-ph

    A task-specific deep-learning-based denoising approach for myocardial perfusion SPECT

    Authors: Md Ashequr Rahman, Zitong Yu, Barry A. Siegel, Abhinav K. Jha

    Abstract: Deep-learning (DL)-based methods have shown significant promise in denoising myocardial perfusion SPECT images acquired at low dose. For clinical application of these methods, evaluation on clinical tasks is crucial. Typically, these methods are designed to minimize some fidelity-based criterion between the predicted denoised image and some reference normal-dose image. However, while promising, st…

    Submitted 28 February, 2023; originally announced March 2023.

  37. arXiv:2302.09251  [pdf, other]

    cs.CV

    StyLIP: Multi-Scale Style-Conditioned Prompt Learning for CLIP-based Domain Generalization

    Authors: Shirsha Bose, Ankit Jha, Enrico Fini, Mainak Singha, Elisa Ricci, Biplab Banerjee

    Abstract: Large-scale foundation models, such as CLIP, have demonstrated impressive zero-shot generalization performance on downstream tasks, leveraging well-designed language prompts. However, these prompt learning techniques often struggle with domain shift, limiting their generalization capabilities. In our study, we tackle this issue by proposing StyLIP, a novel approach for Domain Generalization (DG) t…

    Submitted 28 November, 2023; v1 submitted 18 February, 2023; originally announced February 2023.

    Comments: 23 pages, 5 figures, 7 tables, Accepted in WACV 2024

  38. arXiv:2302.03765  [pdf, other]

    cs.CL

    Transformer-based Models for Long-Form Document Matching: Challenges and Empirical Analysis

    Authors: Akshita Jha, Adithya Samavedhi, Vineeth Rakesh, Jaideep Chandrashekar, Chandan K. Reddy

    Abstract: Recent advances in the area of long document matching have primarily focused on using transformer-based models for long document encoding and matching. There are two primary challenges associated with these models. Firstly, the performance gain provided by transformer-based models comes at a steep cost - both in terms of the required training time and the resource (memory and energy) consumption.…

    Submitted 7 February, 2023; originally announced February 2023.

  39. arXiv:2211.03783  [pdf]

    physics.med-ph cs.AI

    Issues and Challenges in Applications of Artificial Intelligence to Nuclear Medicine -- The Bethesda Report (AI Summit 2022)

    Authors: Arman Rahmim, Tyler J. Bradshaw, Irène Buvat, Joyita Dutta, Abhinav K. Jha, Paul E. Kinahan, Quanzheng Li, Chi Liu, Melissa D. McCradden, Babak Saboury, Eliot Siegel, John J. Sunderland, Richard L. Wahl

    Abstract: The SNMMI Artificial Intelligence (SNMMI-AI) Summit, organized by the SNMMI AI Task Force, took place in Bethesda, MD on March 21-22, 2022. It brought together various community members and stakeholders from academia, healthcare, industry, patient representatives, and government (NIH, FDA), and considered various key themes to envision and facilitate a bright future for routine, trustworthy use of…

    Submitted 7 November, 2022; originally announced November 2022.

  40. arXiv:2209.13418  [pdf, other]

    cs.CV cs.RO

    UAV-based Visual Remote Sensing for Automated Building Inspection

    Authors: Kushagra Srivastava, Dhruv Patel, Aditya Kumar Jha, Mohhit Kumar Jha, Jaskirat Singh, Ravi Kiran Sarvadevabhatla, Pradeep Kumar Ramancharla, Harikumar Kandath, K. Madhava Krishna

    Abstract: Unmanned Aerial Vehicle (UAV) based remote sensing system incorporated with computer vision has demonstrated potential for assisting building construction and in disaster management like damage assessment during earthquakes. The vulnerability of a building to earthquake can be assessed through inspection that takes into account the expected damage progression of the associated component and the co…

    Submitted 27 September, 2022; originally announced September 2022.

    Comments: Paper accepted at CVCIE Workshop at ECCV, 2022 and the project page is https://uvrsabi.github.io/

  41. arXiv:2208.14357  [pdf, other

    cs.CV

    Compound Figure Separation of Biomedical Images: Mining Large Datasets for Self-supervised Learning

    Authors: Tianyuan Yao, Chang Qu, Jun Long, Quan Liu, Ruining Deng, Yuanhan Tian, Jiachen Xu, Aadarsh Jha, Zuhayr Asad, Shunxing Bao, Mengyang Zhao, Agnes B. Fogo, Bennett A. Landman, Haichun Yang, Catie Chang, Yuankai Huo

    Abstract: With the rapid development of self-supervised learning (e.g., contrastive learning), the importance of having large-scale images (even without annotations) for training a more generalizable AI model has been widely recognized in medical image analysis. However, collecting large-scale task-specific unannotated data at scale can be challenging for individual labs. Existing online resources, such as…

    Submitted 30 August, 2022; originally announced August 2022.

    Comments: Accepted for publication at the Journal of Machine Learning for Biomedical Imaging (MELBA) https://www.melba-journal.org/papers/2022:025.html. arXiv admin note: substantial text overlap with arXiv:2107.08650

    Journal ref: Machine.Learning.for.Biomedical.Imaging. 1 (2022)

  42. [Re] Differentiable Spatial Planning using Transformers

    Authors: Rohit Ranjan, Himadri Bhakta, Animesh Jha, Parv Maheshwari, Debashish Chakravarty

    Abstract: This report covers our reproduction effort of the paper 'Differentiable Spatial Planning using Transformers' by Chaplot et al. In this paper, the problem of spatial path planning in a differentiable way is considered. They show that their proposed method of using Spatial Planning Transformers outperforms prior data-driven models and leverages differentiable structures to learn mapping without a…

    Submitted 19 August, 2022; originally announced August 2022.

    Journal ref: ReScience C 8.2 (#34) 2022

  43. Reproducibility Report: Contrastive Learning of Socially-aware Motion Representations

    Authors: Roopsa Sen, Sidharth Sinha, Parv Maheshwari, Animesh Jha, Debashish Chakravarty

    Abstract: The following paper is a reproducibility report for "Social NCE: Contrastive Learning of Socially-aware Motion Representations" published in ICCV 2021 as part of the ML Reproducibility Challenge 2021. The original code was made available by the author (https://github.com/vita-epfl/social-nce). We attempted to verify the…

    Submitted 18 August, 2022; originally announced August 2022.

    Journal ref: ReScience C 2022

  44. arXiv:2208.04153  [pdf, other

    cs.AI

    [Reproducibility Report] Path Planning using Neural A* Search

    Authors: Shreya Bhatt, Aayush Jain, Parv Maheshwari, Animesh Jha, Debashish Chakravarty

    Abstract: The following paper is a reproducibility report for "Path Planning using Neural A* Search" published in ICML 2021 as part of the ML Reproducibility Challenge 2021. The original paper proposes the Neural A* planner, and claims it achieves an optimal balance between the reduction of node expansions and path accuracy. We verify this claim by reimplementing the model in a different framework and repr…

    Submitted 16 July, 2022; originally announced August 2022.

  45. arXiv:2207.11033  [pdf

    cs.HC cs.CV

    GesSure- A Robust Face-Authentication enabled Dynamic Gesture Recognition GUI Application

    Authors: Ankit Jha, Ishita, Pratham G. Shenwai, Ayush Batra, Siddharth Kotian, Piyush Modi

    Abstract: Using physical interactive devices like mouse and keyboards hinders naturalistic human-machine interaction and increases the probability of surface contact during a pandemic. Existing gesture-recognition systems do not possess user authentication, making them unreliable. Static gestures in current gesture-recognition technology introduce long adaptation periods and reduce user compatibility. Our t…

    Submitted 7 September, 2022; v1 submitted 22 July, 2022; originally announced July 2022.

    Comments: Accepted at International Conference on Artificial Intelligence Advances (AIAD 2022)

    Journal ref: IJCI Conference Proceedings, International Conference on Artificial Intelligence Advances (AIAD 2022)

  46. Characterizing Python Library Migrations

    Authors: Mohayeminul Islam, Ajay Kumar Jha, Ildar Akhmetov, Sarah Nadi

    Abstract: Developers heavily rely on Application Programming Interfaces (APIs) from libraries to build their software. As software evolves, developers may need to replace the used libraries with alternate libraries, a process known as library migration. Doing this manually can be tedious, time-consuming, and prone to errors. Automated migration techniques can help alleviate some of this burden. However, des…

    Submitted 29 January, 2024; v1 submitted 3 July, 2022; originally announced July 2022.

  47. arXiv:2206.00123  [pdf, other

    cs.CV

    Glo-In-One: Holistic Glomerular Detection, Segmentation, and Lesion Characterization with Large-scale Web Image Mining

    Authors: Tianyuan Yao, Yuzhe Lu, Jun Long, Aadarsh Jha, Zheyu Zhu, Zuhayr Asad, Haichun Yang, Agnes B. Fogo, Yuankai Huo

    Abstract: The quantitative detection, segmentation, and characterization of glomeruli from high-resolution whole slide imaging (WSI) play essential roles in the computer-assisted diagnosis and scientific research in digital renal pathology. Historically, such comprehensive quantification requires extensive programming skills in order to be able to handle heterogeneous and customized computational tools. To…

    Submitted 31 May, 2022; originally announced June 2022.

  48. arXiv:2206.00052  [pdf, other

    cs.CL cs.CR

    CodeAttack: Code-Based Adversarial Attacks for Pre-trained Programming Language Models

    Authors: Akshita Jha, Chandan K. Reddy

    Abstract: Pre-trained programming language (PL) models (such as CodeT5, CodeBERT, GraphCodeBERT, etc.,) have the potential to automate software engineering tasks involving code understanding and code generation. However, these models operate in the natural channel of code, i.e., they are primarily concerned with the human understanding of the code. They are not robust to changes in the input and thus, are p…

    Submitted 18 April, 2023; v1 submitted 31 May, 2022; originally announced June 2022.

    Comments: AAAI Conference on Artificial Intelligence (AAAI) 2023

  49. arXiv:2203.03727  [pdf, other

    cs.CV

    Barlow constrained optimization for Visual Question Answering

    Authors: Abhishek Jha, Badri N. Patro, Luc Van Gool, Tinne Tuytelaars

    Abstract: Visual question answering is a vision-and-language multimodal task, that aims at predicting answers given samples from the question and image modalities. Most recent methods focus on learning a good joint embedding space of images and questions, either by improving the interaction between these two modalities, or by making it a more discriminant space. However, how informative this joint space is,…

    Submitted 7 March, 2022; originally announced March 2022.

  50. arXiv:2203.01918  [pdf, other

    physics.med-ph cs.CV eess.IV

    Investigating the limited performance of a deep-learning-based SPECT denoising approach: An observer-study-based characterization

    Authors: Zitong Yu, Md Ashequr Rahman, Abhinav K. Jha

    Abstract: Multiple objective assessment of image-quality-based studies have reported that several deep-learning-based denoising methods show limited performance on signal-detection tasks. Our goal was to investigate the reasons for this limited performance. To achieve this goal, we conducted a task-based characterization of a DL-based denoising approach for individual signal properties. We conducted this st…

    Submitted 3 March, 2022; originally announced March 2022.