Showing 1–50 of 201 results for author: Basu, S

  1. arXiv:2407.09710  [pdf, other]

    quant-ph cs.PL

    DisQ: A Markov Decision Process Based Language for Quantum Distributed Systems

    Authors: Le Chang, Saitej Yavvari, Rance Cleaveland, Samik Basu, Liyi Li

    Abstract: The development of quantum computers has reached a great milestone, in spite of restrictions on important quantum resources: the number of qubits being entangled at a single-location quantum computer. Recently, there has been some work to combine single-location quantum computing and quantum networking techniques to develop distributed quantum systems such that large entangled qubit groups can be…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: Version 1

  2. arXiv:2406.14657  [pdf, other]

    cs.CL cs.AI cs.LG

    OpenDebateEvidence: A Massive-Scale Argument Mining and Summarization Dataset

    Authors: Allen Roush, Yusuf Shabazz, Arvind Balaji, Peter Zhang, Stefano Mezza, Markus Zhang, Sanjay Basu, Sriram Vishwanath, Mehdi Fatemi, Ravid Shwartz-Ziv

    Abstract: We introduce OpenDebateEvidence, a comprehensive dataset for argument mining and summarization sourced from the American Competitive Debate community. This dataset includes over 3.5 million documents with rich metadata, making it one of the most extensive collections of debate evidence. OpenDebateEvidence captures the complexity of arguments in high school and college debates, providing valuable r…

    Submitted 5 July, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

    Comments: Accepted for Publication to ARGMIN 2024 at ACL2024

  3. arXiv:2406.13683  [pdf, other]

    cs.CV cs.AI

    IntCoOp: Interpretability-Aware Vision-Language Prompt Tuning

    Authors: Soumya Suvra Ghosal, Samyadeep Basu, Soheil Feizi, Dinesh Manocha

    Abstract: Image-text contrastive models such as CLIP learn transferable and robust representations for zero-shot transfer to a variety of downstream tasks. However, to obtain strong downstream performances, prompts need to be carefully curated, which can be a tedious engineering task. To address the issue of manual prompt engineering, prompt-tuning is used where a set of contextual vectors are learned by le…

    Submitted 19 June, 2024; originally announced June 2024.

  4. arXiv:2406.12824  [pdf, other]

    cs.CL cs.AI

    From RAGs to rich parameters: Probing how language models utilize external knowledge over parametric information for factual queries

    Authors: Hitesh Wadhwa, Rahul Seetharaman, Somyaa Aggarwal, Reshmi Ghosh, Samyadeep Basu, Soundararajan Srinivasan, Wenlong Zhao, Shreyas Chaudhari, Ehsan Aghazadeh

    Abstract: Retrieval Augmented Generation (RAG) enriches the ability of language models to reason using external context to augment responses for a given user prompt. This approach has risen in popularity due to practical applications in various applications of language models in search, question/answering, and chat-bots. However, the exact nature of how this approach works isn't clearly understood. In this…

    Submitted 18 June, 2024; originally announced June 2024.

  5. arXiv:2406.08624  [pdf, other]

    cs.DS cs.DM cs.SI

    A Sublinear Algorithm for Approximate Shortest Paths in Large Networks

    Authors: Sabyasachi Basu, Nadia Kōshima, Talya Eden, Omri Ben-Eliezer, C. Seshadhri

    Abstract: Computing distances and finding shortest paths in massive real-world networks is a fundamental algorithmic task in network analysis. There are two main approaches to solving this task. On one hand are traversal-based algorithms like bidirectional breadth-first search (BiBFS) with no preprocessing step and slow individual distance inquiries. On the other hand are indexing-based approaches, which ma…

    Submitted 12 June, 2024; originally announced June 2024.

  6. arXiv:2406.07844  [pdf, other]

    cs.CV

    Understanding and Mitigating Compositional Issues in Text-to-Image Generative Models

    Authors: Arman Zarei, Keivan Rezaei, Samyadeep Basu, Mehrdad Saberi, Mazda Moayeri, Priyatham Kattakinda, Soheil Feizi

    Abstract: Recent text-to-image diffusion-based generative models have the stunning ability to generate highly detailed and photo-realistic images and achieve state-of-the-art low FID scores on challenging image generation benchmarks. However, one of the primary failure modes of these text-to-image generative models is in composing attributes, objects, and their associated relationships accurately into an im…

    Submitted 11 June, 2024; originally announced June 2024.

  7. arXiv:2406.04236  [pdf, other]

    cs.CV

    Understanding Information Storage and Transfer in Multi-modal Large Language Models

    Authors: Samyadeep Basu, Martin Grayson, Cecily Morrison, Besmira Nushi, Soheil Feizi, Daniela Massiceti

    Abstract: Understanding the mechanisms of information storage and transfer in Transformer-based models is important for driving model understanding progress. Recent work has studied these mechanisms for Large Language Models (LLMs), revealing insights on how information is stored in a model's parameters and how information flows to and from these parameters in response to specific prompts. However, these st…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: 20 pages

  8. arXiv:2406.01583  [pdf, other]

    cs.CV cs.LG

    Decomposing and Interpreting Image Representations via Text in ViTs Beyond CLIP

    Authors: Sriram Balasubramanian, Samyadeep Basu, Soheil Feizi

    Abstract: Recent works have explored how individual components of the CLIP-ViT model contribute to the final representation by leveraging the shared image-text representation space of CLIP. These components, such as attention heads and MLPs, have been shown to capture distinct image features like shape, color or texture. However, understanding the role of these components in arbitrary vision transformers (V…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: 22 pages, 15 figures

    ACM Class: I.5.1

  9. arXiv:2405.18750  [pdf, other]

    cs.CV

    T2V-Turbo: Breaking the Quality Bottleneck of Video Consistency Model with Mixed Reward Feedback

    Authors: Jiachen Li, Weixi Feng, Tsu-Jui Fu, Xinyi Wang, Sugato Basu, Wenhu Chen, William Yang Wang

    Abstract: Diffusion-based text-to-video (T2V) models have achieved significant success but continue to be hampered by the slow sampling speed of their iterative sampling processes. To address the challenge, consistency models have been proposed to facilitate fast inference, albeit at the cost of sample quality. In this work, we aim to break the quality bottleneck of a video consistency model (VCM) to achiev…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: Project page: https://t2v-turbo.github.io/

  10. arXiv:2405.16538  [pdf, other]

    cs.CV cs.AI

    Gamified AI Approch for Early Detection of Dementia

    Authors: Paramita Kundu Maji, Soubhik Acharya, Priti Paul, Sanjay Chakraborty, Saikat Basu

    Abstract: This paper aims to develop a new deep learning-inspired gaming approach for early detection of dementia. This research integrates a robust convolutional neural network (CNN)-based model for early dementia detection using health metrics data as well as facial image data through a cognitive assessment-based gaming application. We have collected 1000 data samples of health metrics dataset from Apollo…

    Submitted 26 May, 2024; originally announced May 2024.

    Comments: 50 Pages, 29 Figures

  11. arXiv:2405.05942  [pdf, other]

    cs.DS

    Improved Evolutionary Algorithms for Submodular Maximization with Cost Constraints

    Authors: Yanhui Zhu, Samik Basu, A Pavan

    Abstract: We present an evolutionary algorithm evo-SMC for the problem of Submodular Maximization under Cost constraints (SMC). Our algorithm achieves $1/2$-approximation with a high probability $1-1/n$ within $\mathcal{O}(n^2K_β)$ iterations, where $K_β$ denotes the maximum size of a feasible solution set with cost constraint $β$. To the best of our knowledge, this is the best approximation guarantee offer…

    Submitted 9 May, 2024; originally announced May 2024.

    Comments: IJCAI 2024

  12. arXiv:2405.01008  [pdf, other]

    cs.CV

    On Mechanistic Knowledge Localization in Text-to-Image Generative Models

    Authors: Samyadeep Basu, Keivan Rezaei, Priyatham Kattakinda, Ryan Rossi, Cherry Zhao, Vlad Morariu, Varun Manjunatha, Soheil Feizi

    Abstract: Identifying layers within text-to-image models which control visual attributes can facilitate efficient model editing through closed-form updates. Recent work, leveraging causal tracing show that early Stable-Diffusion variants confine knowledge primarily to the first layer of the CLIP text-encoder, while it diffuses throughout the UNet. Extending this framework, we observe that for recent models (…

    Submitted 7 May, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

    Comments: Appearing in ICML 2024

  13. arXiv:2404.08030  [pdf, other]

    cs.CV cs.AI

    Rethinking Artistic Copyright Infringements in the Era of Text-to-Image Generative Models

    Authors: Mazda Moayeri, Samyadeep Basu, Sriram Balasubramanian, Priyatham Kattakinda, Atoosa Chengini, Robert Brauneis, Soheil Feizi

    Abstract: Recent text-to-image generative models such as Stable Diffusion are extremely adept at mimicking and generating copyrighted content, raising concerns amongst artists that their unique styles may be improperly copied. Understanding how generative models copy "artistic style" is more complex than duplicating a single image, as style is comprised by a set of elements (or signature) that frequently co…

    Submitted 11 April, 2024; originally announced April 2024.

  14. arXiv:2403.08848  [pdf, other]

    eess.IV cs.CV

    FocusMAE: Gallbladder Cancer Detection from Ultrasound Videos with Focused Masked Autoencoders

    Authors: Soumen Basu, Mayuna Gupta, Chetan Madan, Pankaj Gupta, Chetan Arora

    Abstract: In recent years, automated Gallbladder Cancer (GBC) detection has gained the attention of researchers. Current state-of-the-art (SOTA) methodologies relying on ultrasound sonography (US) images exhibit limited generalization, emphasizing the need for transformative approaches. We observe that individual US frames may lack sufficient information to capture disease manifestation. This study advocate…

    Submitted 29 March, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

    Comments: To Appear at CVPR 2024

  15. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  16. arXiv:2402.15413  [pdf, other]

    cs.LG

    G-RepsNet: A Fast and General Construction of Equivariant Networks for Arbitrary Matrix Groups

    Authors: Sourya Basu, Suhas Lohit, Matthew Brand

    Abstract: Group equivariance is a strong inductive bias useful in a wide range of deep learning tasks. However, constructing efficient equivariant networks for general groups and domains is difficult. Recent work by Finzi et al. (2021) directly solves the equivariance constraint for arbitrary matrix groups to obtain equivariant MLPs (EMLPs). But this method does not scale well and scaling is crucial in deep…

    Submitted 23 February, 2024; originally announced February 2024.

  17. arXiv:2402.06187  [pdf, other]

    cs.LG cs.AI cs.RO

    Premier-TACO is a Few-Shot Policy Learner: Pretraining Multitask Representation via Temporal Action-Driven Contrastive Loss

    Authors: Ruijie Zheng, Yongyuan Liang, Xiyao Wang, Shuang Ma, Hal Daumé III, Huazhe Xu, John Langford, Praveen Palanisamy, Kalyan Shankar Basu, Furong Huang

    Abstract: We present Premier-TACO, a multitask feature representation learning approach designed to improve few-shot policy learning efficiency in sequential decision-making tasks. Premier-TACO leverages a subset of multitask offline datasets for pretraining a general feature representation, which captures critical environmental dynamics and is fine-tuned using minimal expert demonstrations. It advances the…

    Submitted 23 May, 2024; v1 submitted 9 February, 2024; originally announced February 2024.

    Comments: Accepted at Forty-first International Conference on Machine Learning (ICML 2024)

  18. arXiv:2311.15384  [pdf, other]

    stat.ML cs.LG stat.ME

    Robust and Automatic Data Clustering: Dirichlet Process meets Median-of-Means

    Authors: Supratik Basu, Jyotishka Ray Choudhury, Debolina Paul, Swagatam Das

    Abstract: Clustering stands as one of the most prominent challenges within the realm of unsupervised machine learning. Among the array of centroid-based clustering algorithms, the classic $k$-means algorithm, rooted in Lloyd's heuristic, takes center stage as one of the extensively employed techniques in the literature. Nonetheless, both $k$-means and its variants grapple with noteworthy limitations. These…

    Submitted 26 November, 2023; originally announced November 2023.

  19. arXiv:2311.07911  [pdf, other]

    cs.CL cs.AI cs.LG

    Instruction-Following Evaluation for Large Language Models

    Authors: Jeffrey Zhou, Tianjian Lu, Swaroop Mishra, Siddhartha Brahma, Sujoy Basu, Yi Luan, Denny Zhou, Le Hou

    Abstract: One core capability of Large Language Models (LLMs) is to follow natural language instructions. However, the evaluation of such abilities is not standardized: Human evaluations are expensive, slow, and not objectively reproducible, while LLM-based auto-evaluation is potentially biased or limited by the ability of the evaluator LLM. To overcome these issues, we introduce Instruction-Following Eval…

    Submitted 14 November, 2023; originally announced November 2023.

    MSC Class: 68T50 (Primary) 68T99 (Secondary) ACM Class: I.2.7

  20. arXiv:2310.17120  [pdf, other]

    cs.CL cs.AI cs.LG

    Topic Segmentation of Semi-Structured and Unstructured Conversational Datasets using Language Models

    Authors: Reshmi Ghosh, Harjeet Singh Kajal, Sharanya Kamath, Dhuri Shrivastava, Samyadeep Basu, Hansi Zeng, Soundararajan Srinivasan

    Abstract: Breaking down a document or a conversation into multiple contiguous segments based on its semantic structure is an important and challenging problem in NLP, which can assist many downstream tasks. However, current works on topic segmentation often focus on segmentation of structured texts. In this paper, we comprehensively analyze the generalization capabilities of state-of-the-art topic segmentat…

    Submitted 25 October, 2023; originally announced October 2023.

    Comments: Accepted to IntelliSys 2023. arXiv admin note: substantial text overlap with arXiv:2211.14954

  21. arXiv:2310.17041  [pdf, other]

    cs.CL cs.AI cs.IR

    On Surgical Fine-tuning for Language Encoders

    Authors: Abhilasha Lodha, Gayatri Belapurkar, Saloni Chalkapurkar, Yuanming Tao, Reshmi Ghosh, Samyadeep Basu, Dmitrii Petrov, Soundararajan Srinivasan

    Abstract: Fine-tuning all the layers of a pre-trained neural language encoder (either using all the parameters or using parameter-efficient methods) is often the de-facto way of adapting it to a new task. We show evidence that for different downstream language tasks, fine-tuning only a subset of layers is sufficient to obtain performance that is close to and often better than fine-tuning all the layers in t…

    Submitted 25 October, 2023; originally announced October 2023.

    Comments: Accepted to EMNLP 2023

  22. arXiv:2310.13730  [pdf, other]

    cs.CV

    Localizing and Editing Knowledge in Text-to-Image Generative Models

    Authors: Samyadeep Basu, Nanxuan Zhao, Vlad Morariu, Soheil Feizi, Varun Manjunatha

    Abstract: Text-to-Image Diffusion Models such as Stable-Diffusion and Imagen have achieved unprecedented quality of photorealism with state-of-the-art FID scores on MS-COCO and other generation benchmarks. Given a caption, image generation requires fine-grained knowledge about attributes such as object structure, style, and viewpoint amongst others. Where does this information reside in text-to-image genera…

    Submitted 20 October, 2023; originally announced October 2023.

    Comments: 61 pages

  23. arXiv:2310.09675  [pdf, other]

    cs.LG cs.AI cs.CL

    Efficient Model-Agnostic Multi-Group Equivariant Networks

    Authors: Razan Baltaji, Sourya Basu, Lav R. Varshney

    Abstract: Constructing model-agnostic group equivariant networks, such as equitune (Basu et al., 2023b) and its generalizations (Kim et al., 2023), can be computationally expensive for large product groups. We address this by providing efficient model-agnostic equivariant designs for two related problems: one where the network has multiple inputs each with potentially different groups acting on them, and an…

    Submitted 14 October, 2023; originally announced October 2023.

  24. arXiv:2310.02426  [pdf, other]

    cs.CV

    EditVal: Benchmarking Diffusion Based Text-Guided Image Editing Methods

    Authors: Samyadeep Basu, Mehrdad Saberi, Shweta Bhardwaj, Atoosa Malemir Chegini, Daniela Massiceti, Maziar Sanjabi, Shell Xu Hu, Soheil Feizi

    Abstract: A plethora of text-guided image editing methods have recently been developed by leveraging the impressive capabilities of large-scale diffusion-based generative models such as Imagen and Stable Diffusion. A standardized evaluation protocol, however, does not exist to compare methods across different types of fine-grained edits. To address this gap, we introduce EditVal, a standardized benchmark fo…

    Submitted 3 October, 2023; originally announced October 2023.

  25. FragQC: An Efficient Quantum Error Reduction Technique using Quantum Circuit Fragmentation

    Authors: Saikat Basu, Arnav Das, Amit Saha, Amlan Chakrabarti, Susmita Sur-Kolay

    Abstract: Quantum computers must meet extremely stringent qualitative and quantitative requirements on their qubits in order to solve real-life problems. Quantum circuit fragmentation techniques divide a large quantum circuit into a number of sub-circuits that can be executed on the smaller noisy quantum hardware available. However, the process of quantum circuit fragmentation involves finding an ideal cut…

    Submitted 30 September, 2023; originally announced October 2023.

    Comments: 30 pages, 9 figures

    Journal ref: Journal of Systems and Software 2024

  26. arXiv:2309.05261  [pdf, other]

    cs.CV

    Gall Bladder Cancer Detection from US Images with Only Image Level Labels

    Authors: Soumen Basu, Ashish Papanai, Mayank Gupta, Pankaj Gupta, Chetan Arora

    Abstract: Automated detection of Gallbladder Cancer (GBC) from Ultrasound (US) images is an important problem, which has drawn increased interest from researchers. However, most of these works use difficult-to-acquire information such as bounding box annotations or additional US videos. In this paper, we focus on GBC detection using only image-level labels. Such annotation is usually available based on the…

    Submitted 11 September, 2023; originally announced September 2023.

    Comments: Accepted at MICCAI 2023

  27. arXiv:2308.13091  [pdf, other]

    quant-ph cs.CC

    Quantum Analog of Shannon's Lower Bound Theorem

    Authors: Saugata Basu, Laxmi Parida

    Abstract: Shannon proved that almost all Boolean functions require a circuit of size $Θ(2^n/n)$. We prove a quantum analog of this classical result. Unlike in the classical case the number of quantum circuits of any fixed size that we allow is uncountably infinite. Our main tool is a classical result in real algebraic geometry bounding the number of realizable sign conditions of any finite set of real polyn…

    Submitted 24 August, 2023; originally announced August 2023.

    Comments: Comments welcome

    MSC Class: Primary 68Q12; Secondary 14P10; 81P68

  28. arXiv:2308.08010  [pdf, other]

    cs.LG astro-ph.GA astro-ph.IM astro-ph.SR cs.AI

    GRINN: A Physics-Informed Neural Network for solving hydrodynamic systems in the presence of self-gravity

    Authors: Sayantan Auddy, Ramit Dey, Neal J. Turner, Shantanu Basu

    Abstract: Modeling self-gravitating gas flows is essential to answering many fundamental questions in astrophysics. This spans many topics including planet-forming disks, star-forming clouds, galaxy formation, and the development of large-scale structures in the Universe. However, the nonlinear interaction between gravity and fluid dynamics offers a formidable challenge to solving the resulting time-depende…

    Submitted 15 August, 2023; originally announced August 2023.

  29. ManiVault: A Flexible and Extensible Visual Analytics Framework for High-Dimensional Data

    Authors: Alexander Vieth, Thomas Kroes, Julian Thijssen, Baldur van Lew, Jeroen Eggermont, Soumyadeep Basu, Elmar Eisemann, Anna Vilanova, Thomas Höllt, Boudewijn Lelieveldt

    Abstract: Exploration and analysis of high-dimensional data are important tasks in many fields that produce large and complex data, like the financial sector, systems biology, or cultural heritage. Tailor-made visual analytics software is developed for each specific application, limiting their applicability in other fields. However, as diverse as these fields are, their characteristics and requirements for…

    Submitted 7 November, 2023; v1 submitted 3 August, 2023; originally announced August 2023.

    Comments: 11 pages paper (incl. 2 pages references and acknowledgements), 2 pages supplement

    Journal ref: IEEE Transactions on Visualization and Computer Graphics (Proceedings of IEEE VIS 2023), 30(2), 2024

  30. arXiv:2307.16307  [pdf, other]

    cs.AI cs.DB cs.LO

    Representing and Reasoning with Multi-Stakeholder Qualitative Preference Queries

    Authors: Samik Basu, Vasant Honavar, Ganesh Ram Santhanam, Jia Tao

    Abstract: Many decision-making scenarios, e.g., public policy, healthcare, business, and disaster response, require accommodating the preferences of multiple stakeholders. We offer the first formal treatment of reasoning with multi-stakeholder qualitative preferences in a setting where stakeholders express their preferences in a qualitative preference language, e.g., CP-net, CI-net, TCP-net, CP-Theory. We i…

    Submitted 30 July, 2023; originally announced July 2023.

    Comments: A shorter version is published in the proceeding of 26th European Conference on Artificial Intelligence ECAI 2023

  31. arXiv:2307.09233  [pdf, other]

    cs.CV

    Distilling Knowledge from Text-to-Image Generative Models Improves Visio-Linguistic Reasoning in CLIP

    Authors: Samyadeep Basu, Shell Xu Hu, Maziar Sanjabi, Daniela Massiceti, Soheil Feizi

    Abstract: Image-text contrastive models like CLIP have wide applications in zero-shot classification, image-text retrieval, and transfer learning. However, they often struggle on compositional visio-linguistic tasks (e.g., attribute-binding or object-relationships) where their performance is no better than random chance. To address this, we introduce SDS-CLIP, a lightweight and sample-efficient distillation…

    Submitted 1 July, 2024; v1 submitted 18 July, 2023; originally announced July 2023.

    Comments: Short paper

  32. arXiv:2307.08680  [pdf, ps, other]

    cs.IT

    Optimal storage codes on graphs with fixed locality

    Authors: Sabyasachi Basu, Manuj Mukherjee

    Abstract: Storage codes on graphs are an instance of codes with locality, which are used in distributed storage schemes to provide local repairability. Specifically, the nodes of the graph correspond to storage servers, and the neighbourhood of each server constitute the set of servers it can query to repair its stored data in the event of a failure. A storage code on a graph with $n$-vertices is a s…

    Submitted 17 July, 2023; originally announced July 2023.

  33. arXiv:2307.07843  [pdf, other]

    cs.LG cs.CL

    Transformers are Universal Predictors

    Authors: Sourya Basu, Moulik Choraria, Lav R. Varshney

    Abstract: We find limits to the Transformer architecture for language modeling and show it has a universal prediction property in an information-theoretic sense. We further analyze performance in non-asymptotic data regimes to understand the role of various components of the Transformer architecture, especially in the context of data-efficient training. We validate our theoretical analysis with experiments…

    Submitted 15 July, 2023; originally announced July 2023.

    Comments: Neural Compression Workshop (ICML 2023)

  34. arXiv:2306.15926  [pdf]

    cs.CL cs.AI cs.LG

    Most Language Models can be Poets too: An AI Writing Assistant and Constrained Text Generation Studio

    Authors: Allen Roush, Sanjay Basu, Akshay Moorthy, Dmitry Dubovoy

    Abstract: Despite rapid advancement in the field of Constrained Natural Language Generation, little time has been spent on exploring the potential of language models which have had their vocabularies lexically, semantically, and/or phonetically constrained. We find that most language models generate compelling text even under significant constraints. We present a simple and universally applicable technique…

    Submitted 28 June, 2023; originally announced June 2023.

    Comments: Published in the proceedings of the 2nd Workshop on When Creative AI Meets Conversational AI (CAI2), COLING 2022, 6 pages, System Demonstration Paper

  35. arXiv:2306.02514  [pdf, other]

    cs.CL

    Jambu: A historical linguistic database for South Asian languages

    Authors: Aryaman Arora, Adam Farris, Samopriya Basu, Suresh Kolichala

    Abstract: We introduce Jambu, a cognate database of South Asian languages which unifies dozens of previous sources in a structured and accessible format. The database includes 287k lemmata from 602 lects, grouped together in 23k sets of cognates. We outline the data wrangling necessary to compile the dataset and train neural models for reflex prediction on the Indo-Aryan subset of the data. We hope that Jam…

    Submitted 4 June, 2023; originally announced June 2023.

    Comments: 5 pages main text, 10 pages total. To appear at SIGMORPHON

  36. arXiv:2305.18373  [pdf, other]

    cs.CV cs.CL

    KAFA: Rethinking Image Ad Understanding with Knowledge-Augmented Feature Adaptation of Vision-Language Models

    Authors: Zhiwei Jia, Pradyumna Narayana, Arjun R. Akula, Garima Pruthi, Hao Su, Sugato Basu, Varun Jampani

    Abstract: Image ad understanding is a crucial task with wide real-world applications. Although highly challenging with the involvement of diverse atypical scenes, real-world entities, and reasoning over scene-texts, how to interpret image ads is relatively under-explored, especially in the era of foundational vision-language models (VLMs) featuring impressive generalizability and adaptability. In this paper…

    Submitted 28 May, 2023; originally announced May 2023.

    Comments: ACL 2023

  37. arXiv:2305.15393  [pdf, other]

    cs.CV cs.AI

    LayoutGPT: Compositional Visual Planning and Generation with Large Language Models

    Authors: Weixi Feng, Wanrong Zhu, Tsu-jui Fu, Varun Jampani, Arjun Akula, Xuehai He, Sugato Basu, Xin Eric Wang, William Yang Wang

    Abstract: Attaining a high degree of user controllability in visual generation often requires intricate, fine-grained inputs like layouts. However, such inputs impose a substantial burden on users when compared to simple text inputs. To address the issue, we study how Large Language Models (LLMs) can serve as visual planners by generating layouts from text conditions, and thus collaborate with visual genera…

    Submitted 28 October, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: NeurIPS 2023

  38. arXiv:2305.10722  [pdf, other]

    cs.CV

    Discffusion: Discriminative Diffusion Models as Few-shot Vision and Language Learners

    Authors: Xuehai He, Weixi Feng, Tsu-Jui Fu, Varun Jampani, Arjun Akula, Pradyumna Narayana, Sugato Basu, William Yang Wang, Xin Eric Wang

    Abstract: Diffusion models, such as Stable Diffusion, have shown incredible performance on text-to-image generation. Since text-to-image generation often requires models to generate visual concepts with fine-grained details and attributes specified in text prompts, can we leverage the powerful representations learned by pre-trained diffusion models for discriminative tasks such as image-text matching? To an…

    Submitted 24 April, 2024; v1 submitted 18 May, 2023; originally announced May 2023.

  39. arXiv:2305.09900  [pdf, other]

    cs.LG cs.AI cs.CL cs.CV

    Efficient Equivariant Transfer Learning from Pretrained Models

    Authors: Sourya Basu, Pulkit Katdare, Prasanna Sattigeri, Vijil Chenthamarakshan, Katherine Driggs-Campbell, Payel Das, Lav R. Varshney

    Abstract: Efficient transfer learning algorithms are key to the success of foundation models on diverse downstream tasks even with limited data. Recent works of Basu et al. (2023) and Kaba et al. (2022) propose group averaging (equitune) and optimization-based methods, respectively, over features from group-transformed inputs to obtain equivariant outputs from non-equivariant neural networks. While Kaba et…

    Submitted 10 October, 2023; v1 submitted 16 May, 2023; originally announced May 2023.

    Journal ref: NeurIPS 2023

  40. arXiv:2304.12325  [pdf]

    q-bio.QM cs.AI cs.LG physics.bio-ph

    Dependence of Physiochemical Features on Marine Chlorophyll Analysis with Learning Techniques

    Authors: Subhrangshu Adhikary, Sudhir Kumar Chaturvedi, Saikat Banerjee, Sourav Basu

    Abstract: Marine chlorophyll which is present within phytoplankton are the basis of photosynthesis and they have a high significance in sustaining ecological balance as they highly contribute toward global primary productivity and comes under the food chain of many marine organisms. Imbalance in the concentrations of phytoplankton can disrupt the ecological balance. The growth of phytoplankton depends upon…

    Submitted 23 April, 2023; originally announced April 2023.

    Comments: Advances in Environment Engineering and Management. Year 2021. Springer Proceedings in Earth and Environmental Sciences. Springer, Cham. https://doi.org/10.1007/978-3-030-79065-3_29

    Journal ref: Advances in Environment Engineering and Management. Year 2021. Springer Proceedings in Earth and Environmental Sciences. Springer, Cham

  41. arXiv:2304.12177  [pdf, other]

    physics.ao-ph cs.LG

    Π-ML: A dimensional analysis-based machine learning parameterization of optical turbulence in the atmospheric surface layer

    Authors: Maximilian Pierzyna, Rudolf Saathof, Sukanta Basu

    Abstract: Turbulent fluctuations of the atmospheric refraction index, so-called optical turbulence, can significantly distort propagating laser beams. Therefore, modeling the strength of these fluctuations ($C_n^2$) is highly relevant for the successful development and deployment of future free-space optical communication links. In this letter, we propose a physics-informed machine learning (ML) methodology…

    Submitted 10 August, 2023; v1 submitted 24 April, 2023; originally announced April 2023.

  42. Automatized marine vessel monitoring from sentinel-1 data using convolution neural network

    Authors: Surya Prakash Tiwari, Sudhir Kumar Chaturvedi, Subhrangshu Adhikary, Saikat Banerjee, Sourav Basu

    Abstract: The advancement of multi-channel synthetic aperture radar (SAR) system is considered as an upgraded technology for surveillance activities. SAR sensors onboard provide data for coastal ocean surveillance and a view of the oceanic surface features. Vessel monitoring has earlier been performed using Constant False Alarm Rate (CFAR) algorithm which is not a smart technique as it lacks decision-making…

    Submitted 23 April, 2023; originally announced April 2023.

    Comments: 2021 IEEE International Geoscience and Remote Sensing Symposium IGARSS

    Journal ref: 2021 IEEE International Geoscience and Remote Sensing Symposium IGARSS, Brussels, Belgium, 2021, pp. 1311-1314

  43. arXiv:2304.01917  [pdf, other]

    cs.CV

    Strong Baselines for Parameter Efficient Few-Shot Fine-tuning

    Authors: Samyadeep Basu, Daniela Massiceti, Shell Xu Hu, Soheil Feizi

    Abstract: Few-shot classification (FSC) entails learning novel classes given only a few examples per class after a pre-training (or meta-training) phase on a set of base classes. Recent works have shown that simply fine-tuning a pre-trained Vision Transformer (ViT) on new test classes is a strong approach for FSC. Fine-tuning ViTs, however, is expensive in time, compute and storage. This has motivated the d…

    Submitted 4 April, 2023; originally announced April 2023.

  44. arXiv:2301.00512  [pdf, other]

    cs.LG

    On the Challenges of using Reinforcement Learning in Precision Drug Dosing: Delay and Prolongedness of Action Effects

    Authors: Sumana Basu, Marc-André Legault, Adriana Romero-Soriano, Doina Precup

    Abstract: Drug dosing is an important application of AI, which can be formulated as a Reinforcement Learning (RL) problem. In this paper, we identify two major challenges of using RL for drug dosing: delayed and prolonged effects of administering medications, which break the Markov assumption of the RL framework. We focus on prolongedness and define PAE-POMDP (Prolonged Action Effect-Partially Observable Ma…

    Submitted 1 January, 2023; originally announced January 2023.

    Comments: Accepted to AAAI 2023

  45. arXiv:2212.09898  [pdf, other]

    cs.CV

    MetaCLUE: Towards Comprehensive Visual Metaphors Research

    Authors: Arjun R. Akula, Brendan Driscoll, Pradyumna Narayana, Soravit Changpinyo, Zhiwei Jia, Suyash Damle, Garima Pruthi, Sugato Basu, Leonidas Guibas, William T. Freeman, Yuanzhen Li, Varun Jampani

    Abstract: Creativity is an indispensable part of human cognition and also an inherent part of how we make sense of the world. Metaphorical abstraction is fundamental in communicating creative ideas through nuanced relationships between abstract concepts such as feelings. While computer vision benchmarks and approaches predominantly focus on understanding and generating literal interpretations of images, met…

    Submitted 2 June, 2023; v1 submitted 19 December, 2022; originally announced December 2022.

    Comments: Accepted in CVPR 2023. Project page: https://metaclue.github.io/ , Video summary: https://youtu.be/V3TmeNETL-o

  46. arXiv:2212.05032  [pdf, other]

    cs.CV cs.CL

    Training-Free Structured Diffusion Guidance for Compositional Text-to-Image Synthesis

    Authors: Weixi Feng, Xuehai He, Tsu-Jui Fu, Varun Jampani, Arjun Akula, Pradyumna Narayana, Sugato Basu, Xin Eric Wang, William Yang Wang

    Abstract: Large-scale diffusion models have achieved state-of-the-art results on text-to-image synthesis (T2I) tasks. Despite their ability to generate high-quality yet creative images, we observe that attribution-binding and compositional capabilities are still considered major challenging issues, especially when involving multiple objects. In this work, we improve the compositional skills of T2I models, s…

    Submitted 28 February, 2023; v1 submitted 9 December, 2022; originally announced December 2022.

    Comments: ICLR 2023 Camera Ready version

  47. arXiv:2211.14954  [pdf, other]

    cs.CL cs.AI

    Topic Segmentation in the Wild: Towards Segmentation of Semi-structured & Unstructured Chats

    Authors: Reshmi Ghosh, Harjeet Singh Kajal, Sharanya Kamath, Dhuri Shrivastava, Samyadeep Basu, Soundararajan Srinivasan

    Abstract: Breaking down a document or a conversation into multiple contiguous segments based on its semantic structure is an important and challenging problem in NLP, which can assist many downstream tasks. However, current works on topic segmentation often focus on segmentation of structured texts. In this paper, we comprehensively analyze the generalization capabilities of state-of-the-art topic segmentat…

    Submitted 27 November, 2022; originally announced November 2022.

    Comments: NeurIPS 2022: ENLSP

  48. arXiv:2211.06352  [pdf, other]

    cs.SI cs.DM cs.DS

    Spectral Triadic Decompositions of Real-World Networks

    Authors: Sabyasachi Basu, Suman Kalyan Bera, C. Seshadhri

    Abstract: A fundamental problem in mathematics and network analysis is to find conditions under which a graph can be partitioned into smaller pieces. The most important tool for this partitioning is the Fiedler vector or discrete Cheeger inequality. These results relate the graph spectrum (eigenvalues of the normalized adjacency matrix) to the ability to break a graph into two pieces, with few edge deletion…

    Submitted 8 May, 2024; v1 submitted 11 November, 2022; originally announced November 2022.

  49. arXiv:2211.04793  [pdf, other]

    cs.CV

    RadFormer: Transformers with Global-Local Attention for Interpretable and Accurate Gallbladder Cancer Detection

    Authors: Soumen Basu, Mayank Gupta, Pratyaksha Rana, Pankaj Gupta, Chetan Arora

    Abstract: We propose a novel deep neural network architecture to learn interpretable representation for medical image analysis. Our architecture generates a global attention for region of interest, and then learns bag of words style deep feature embeddings with local attention. The global, and local feature maps are combined using a contemporary transformer architecture for highly accurate Gallbladder Cance…

    Submitted 9 November, 2022; originally announced November 2022.

    Comments: To Appear in Elsevier Medical Image Analysis

  50. arXiv:2210.10362  [pdf, other]

    cs.CV cs.AI cs.CL

    CPL: Counterfactual Prompt Learning for Vision and Language Models

    Authors: Xuehai He, Diji Yang, Weixi Feng, Tsu-Jui Fu, Arjun Akula, Varun Jampani, Pradyumna Narayana, Sugato Basu, William Yang Wang, Xin Eric Wang

    Abstract: Prompt tuning is a new few-shot transfer learning technique that only tunes the learnable prompt for pre-trained vision and language models such as CLIP. However, existing prompt tuning methods tend to learn spurious or entangled representations, which leads to poor generalization to unseen concepts. Towards non-spurious and efficient prompt learning from limited examples, this paper presents a no…

    Submitted 4 November, 2022; v1 submitted 19 October, 2022; originally announced October 2022.