Showing 1–50 of 191 results for author: Biswas, S

  1. arXiv:2407.08041  [pdf, other]

    cs.CV

    TACLE: Task and Class-aware Exemplar-free Semi-supervised Class Incremental Learning

    Authors: Jayateja Kalla, Rohit Kumar, Soma Biswas

    Abstract: We propose a novel TACLE (TAsk and CLass-awarE) framework to address the relatively unexplored and challenging problem of exemplar-free semi-supervised class incremental learning. In this scenario, at each new task, the model has to learn new classes from both (few) labeled and unlabeled data without access to exemplars from previous classes. In addition to leveraging the capabilities of pre-train…

    Submitted 10 July, 2024; originally announced July 2024.

  2. arXiv:2407.06416  [pdf, other]

    quant-ph cs.AI cs.CV

    Hybrid Classical-Quantum architecture for vectorised image classification of hand-written sketches

    Authors: Y. Cordero, S. Biswas, F. Vilariño, M. Bilkis

    Abstract: Quantum machine learning (QML) investigates how quantum phenomena can be exploited in order to learn data in an alternative way, e.g. by means of a quantum computer. While recent results evidence that QML models can potentially surpass their classical counterparts' performance in specific tasks, quantum technology hardware is still unready to reach quantum advantage in tasks of significan…

    Submitted 8 July, 2024; originally announced July 2024.

  3. arXiv:2406.14498  [pdf, other]

    cs.CL

    LLaSA: Large Multimodal Agent for Human Activity Analysis Through Wearable Sensors

    Authors: Sheikh Asif Imran, Mohammad Nur Hossain Khan, Subrata Biswas, Bashima Islam

    Abstract: Integrating inertial measurement units (IMUs) with large language models (LLMs) advances multimodal AI by enhancing human activity understanding. We introduce SensorCaps, a dataset of 26,288 IMU-derived activity narrations, and OpenSQA, an instruction-following dataset with 257,562 question-answer pairs. Combining LIMU-BERT and Llama, we develop LLaSA, a Large Multimodal Agent capable of interpret…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Under review at ARR (for EMNLP 2024)

  4. arXiv:2406.13569  [pdf, other]

    cs.LG cs.AI cs.CR cs.IT

    Bayes' capacity as a measure for reconstruction attacks in federated learning

    Authors: Sayan Biswas, Mark Dras, Pedro Faustini, Natasha Fernandes, Annabelle McIver, Catuscia Palamidessi, Parastoo Sadeghi

    Abstract: Within the machine learning community, reconstruction attacks are a principal attack of concern and have been identified even in federated learning, which was designed with privacy preservation in mind. In federated learning, it has been shown that an adversary with knowledge of the machine learning architecture is able to infer the exact value of a training element given an observation of the wei…

    Submitted 19 June, 2024; originally announced June 2024.

  5.

    Optimal Kernel Orchestration for Tensor Programs with Korch

    Authors: Muyan Hu, Ashwin Venkatram, Shreyashri Biswas, Balamurugan Marimuthu, Bohan Hou, Gabriele Oliaro, Haojie Wang, Liyan Zheng, Xupeng Miao, Jidong Zhai

    Abstract: Kernel orchestration is the task of mapping the computation defined in different operators of a deep neural network (DNN) to the execution of GPU kernels on modern hardware platforms. Prior approaches optimize kernel orchestration by greedily applying operator fusion, which fuses the computation of multiple operators into a single kernel, and miss a variety of optimization opportunities in kernel…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Fix some typos in the ASPLOS version

    Journal ref: Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems 3 (2024) 755-769

  6. arXiv:2406.08610  [pdf, other]

    cs.CV cs.AI cs.LG

    LayeredDoc: Domain Adaptive Document Restoration with a Layer Separation Approach

    Authors: Maria Pilligua, Nil Biescas, Javier Vazquez-Corral, Josep Lladós, Ernest Valveny, Sanket Biswas

    Abstract: The rapid evolution of intelligent document processing systems demands robust solutions that adapt to diverse domains without extensive retraining. Traditional methods often falter with variable document types, leading to poor performance. To overcome these limitations, this paper introduces a text-graphic layer separation approach that enhances domain adaptability in document image restoration (D…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted to ICDAR 2024 (Athens, Greece) Workshop on Automatically Domain-Adapted and Personalized Document Analysis (ADAPDA)

  7. arXiv:2406.08354  [pdf, other]

    cs.CV cs.AI cs.LG

    DocSynthv2: A Practical Autoregressive Modeling for Document Generation

    Authors: Sanket Biswas, Rajiv Jain, Vlad I. Morariu, Jiuxiang Gu, Puneet Mathur, Curtis Wigington, Tong Sun, Josep Lladós

    Abstract: While the generation of document layouts has been extensively explored, comprehensive document generation encompassing both layout and content presents a more complex challenge. This paper delves into this advanced domain, proposing a novel approach called DocSynthv2 through the development of a simple yet effective autoregressive structured model. Our model, distinct in its integration of both la…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Spotlight (Oral) Acceptance to CVPR 2024 Workshop for Graphic Design Understanding and Generation (GDUG)

  8. arXiv:2406.08226  [pdf, other]

    cs.CV cs.AI cs.LG

    DistilDoc: Knowledge Distillation for Visually-Rich Document Applications

    Authors: Jordy Van Landeghem, Subhajit Maity, Ayan Banerjee, Matthew Blaschko, Marie-Francine Moens, Josep Lladós, Sanket Biswas

    Abstract: This work explores knowledge distillation (KD) for visually-rich document (VRD) applications such as document layout analysis (DLA) and document image classification (DIC). While VRD research is dependent on increasingly sophisticated and cumbersome models, the field has neglected to study efficiency via model compression. Here, we design a KD experimentation methodology for more lean, performant…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted to ICDAR 2024 (Athens, Greece)

  9. arXiv:2406.06964  [pdf, other]

    cs.CL cs.MM cs.SD eess.AS

    Missingness-resilient Video-enhanced Multimodal Disfluency Detection

    Authors: Payal Mohapatra, Shamika Likhite, Subrata Biswas, Bashima Islam, Qi Zhu

    Abstract: Most existing speech disfluency detection techniques only rely upon acoustic data. In this work, we present a practical multimodal disfluency detection approach that leverages available video data together with audio. We curate an audiovisual dataset and propose a novel fusion technique with unified weight-sharing modality-agnostic encoders to learn the temporal and semantic context. Our resilient…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Accepted to Interspeech 2024

  10. arXiv:2406.02706  [pdf, other]

    cs.CV cs.LG

    Window to Wall Ratio Detection using SegFormer

    Authors: Zoe De Simone, Sayandeep Biswas, Oscar Wu

    Abstract: Window to Wall Ratios (WWR) are key to assessing the energy, daylight and ventilation performance of buildings. Studies have shown that window area has a large impact on building performance and simulation. However, data to set up these environmental models and simulations is typically not available. Instead, a standard 40% WWR is typically assumed for all buildings. This paper leverages existing…

    Submitted 4 June, 2024; originally announced June 2024.

  11. arXiv:2406.00481  [pdf, other]

    cs.CV

    Effectiveness of Vision Language Models for Open-world Single Image Test Time Adaptation

    Authors: Manogna Sreenivas, Soma Biswas

    Abstract: We propose a novel framework to address the real-world challenging task of Single Image Test Time Adaptation in an open and dynamic environment. We leverage large scale Vision Language Models like CLIP to enable real time adaptation on a per-image basis without access to source data or ground truth labels. Since the deployed model can also encounter unseen classes in an open world, we first employ…

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: Preprint

  12. arXiv:2405.21016  [pdf, other]

    cs.CV

    MpoxSLDNet: A Novel CNN Model for Detecting Monkeypox Lesions and Performance Comparison with Pre-trained Models

    Authors: Fatema Jannat Dihan, Saydul Akbar Murad, Abu Jafar Md Muzahid, K. M. Aslam Uddin, Mohammed J. F. Alenazi, Anupam Kumar Bairagi, Sujit Biswas

    Abstract: Monkeypox virus (MPXV) is a zoonotic virus that poses a significant threat to public health, particularly in remote parts of Central and West Africa. Early detection of monkeypox lesions is crucial for effective treatment. However, due to its similarity with other skin diseases, monkeypox lesion detection is a challenging task. To detect monkeypox, many researchers used various deep-learning model…

    Submitted 31 May, 2024; originally announced May 2024.

  13. arXiv:2405.10426  [pdf, other]

    cs.LG cs.AI cs.CL

    Memory-efficient Energy-adaptive Inference of Pre-Trained Models on Batteryless Embedded Systems

    Authors: Pietro Farina, Subrata Biswas, Eren Yıldız, Khakim Akhunov, Saad Ahmed, Bashima Islam, Kasım Sinan Yıldırım

    Abstract: Batteryless systems frequently face power failures, requiring extra runtime buffers to maintain inference progress and leaving only a memory space for storing ultra-tiny deep neural networks (DNNs). Besides, making these models responsive to stochastic energy harvesting dynamics during inference requires a balance between inference accuracy, latency, and energy overhead. Recent works on compressio…

    Submitted 16 May, 2024; originally announced May 2024.

    Comments: This paper has been selected for publication at the 21st International Conference on Embedded Wireless Systems and Networks (EWSN'24)

  14. arXiv:2405.07708  [pdf, other]

    cs.LG

    Secure Aggregation Meets Sparsification in Decentralized Learning

    Authors: Sayan Biswas, Anne-Marie Kermarrec, Rafael Pires, Rishi Sharma, Milos Vujasinovic

    Abstract: Decentralized learning (DL) faces increased vulnerability to privacy breaches due to sophisticated attacks on machine learning (ML) models. Secure aggregation is a computationally efficient cryptographic technique that enables multiple parties to compute an aggregate of their private data while keeping their individual inputs concealed from each other and from any central aggregator. To enhance co…

    Submitted 14 May, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

  15. arXiv:2405.03104  [pdf, other]

    cs.CV

    GeoContrastNet: Contrastive Key-Value Edge Learning for Language-Agnostic Document Understanding

    Authors: Nil Biescas, Carlos Boned, Josep Lladós, Sanket Biswas

    Abstract: This paper presents GeoContrastNet, a language-agnostic framework for structured document understanding (DU) by integrating a contrastive learning objective with graph attention networks (GATs), emphasizing the significant role of geometric features. We propose a novel methodology that combines geometric edge features with visual features within an overall two-stage GAT-based framework, demonstrat…

    Submitted 5 May, 2024; originally announced May 2024.

    Comments: Accepted in ICDAR 2024 (Athens, Greece)

  16. arXiv:2405.03099  [pdf, other]

    cs.CV

    SketchGPT: Autoregressive Modeling for Sketch Generation and Recognition

    Authors: Adarsh Tiwari, Sanket Biswas, Josep Lladós

    Abstract: We present SketchGPT, a flexible framework that employs a sequence-to-sequence autoregressive model for sketch generation and completion, and an interpretation case study for sketch recognition. By mapping complex sketches into simplified sequences of abstract primitives, our approach significantly streamlines the input for autoregressive modeling. SketchGPT leverages the next token prediction ob…

    Submitted 5 May, 2024; originally announced May 2024.

    Comments: Accepted in ICDAR 2024

  17. arXiv:2404.10845  [pdf]

    cs.LG cs.NI

    Top-k Multi-Armed Bandit Learning for Content Dissemination in Swarms of Micro-UAVs

    Authors: Amit Kumar Bhuyan, Hrishikesh Dutta, Subir Biswas

    Abstract: In communication-deprived disaster scenarios, this paper introduces a Micro-Unmanned Aerial Vehicle (UAV)-enhanced content management system. In the absence of cellular infrastructure, this system deploys a hybrid network of stationary and mobile UAVs to offer vital content access to isolated communities. Static anchor UAVs equipped with both vertical and lateral links cater to local users, while…

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: 12 pages, 7 figures, 2 algorithms, 1 table. arXiv admin note: substantial text overlap with arXiv:2312.14967

  18. arXiv:2404.10842  [pdf]

    cs.SD cs.LG eess.AS

    Unsupervised Speaker Diarization in Distributed IoT Networks Using Federated Learning

    Authors: Amit Kumar Bhuyan, Hrishikesh Dutta, Subir Biswas

    Abstract: This paper presents a computationally efficient and distributed speaker diarization framework for networked IoT-style audio devices. The work proposes a Federated Learning model which can identify the participants in a conversation without the requirement of a large audio database for training. An unsupervised online update mechanism is proposed for the Federated Learning model which depends on co…

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: 11 pages, 7 figures, 1 table

  19. arXiv:2404.09536  [pdf, other]

    cs.DC cs.AI cs.CR cs.LG

    Beyond Noise: Privacy-Preserving Decentralized Learning with Virtual Nodes

    Authors: Sayan Biswas, Mathieu Even, Anne-Marie Kermarrec, Laurent Massoulie, Rafael Pires, Rishi Sharma, Martijn de Vos

    Abstract: Decentralized learning (DL) enables collaborative learning without a server and without training data leaving the users' devices. However, the models shared in DL can still be used to infer training data. Conventional privacy defenses such as differential privacy and secure aggregation fall short in effectively safeguarding user privacy in DL. We introduce Shatter, a novel DL approach in which nod…

    Submitted 15 April, 2024; originally announced April 2024.

  20. arXiv:2404.09432  [pdf, other]

    cs.CV cs.AI cs.LG

    The 8th AI City Challenge

    Authors: Shuo Wang, David C. Anastasiu, Zheng Tang, Ming-Ching Chang, Yue Yao, Liang Zheng, Mohammed Shaiqur Rahman, Meenakshi S. Arya, Anuj Sharma, Pranamesh Chakraborty, Sanjita Prajapati, Quan Kong, Norimasa Kobori, Munkhjargal Gochoo, Munkh-Erdene Otgonbold, Fady Alnajjar, Ganzorig Batnasan, Ping-Yang Chen, Jun-Wei Hsieh, Xunlei Wu, Sameer Satish Pusegaonkar, Yizhou Wang, Sujit Biswas, Rama Chellappa

    Abstract: The eighth AI City Challenge highlighted the convergence of computer vision and artificial intelligence in areas like retail, warehouse settings, and Intelligent Traffic Systems (ITS), presenting significant research opportunities. The 2024 edition featured five tracks, attracting unprecedented interest from 726 teams in 47 countries and regions. Track 1 dealt with multi-target multi-camera (MTMC)…

    Submitted 14 April, 2024; originally announced April 2024.

    Comments: Summary of the 8th AI City Challenge Workshop in conjunction with CVPR 2024

  21. arXiv:2404.05985  [pdf]

    cs.CR cs.LG

    Boosting Digital Safeguards: Blending Cryptography and Steganography

    Authors: Anamitra Maiti, Subham Laha, Rishav Upadhaya, Soumyajit Biswas, Vikas Chaudhary, Biplab Kar, Nikhil Kumar, Jaydip Sen

    Abstract: In today's digital age, the internet is essential for communication and the sharing of information, creating a critical need for sophisticated data security measures to prevent unauthorized access and exploitation. Cryptography encrypts messages into a cipher text that is incomprehensible to unauthorized readers, thus safeguarding data during its transmission. Steganography, on the other hand, ori…

    Submitted 11 April, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

    Comments: This report pertains to the Capstone Project done by Group 3 of the Fall batch of 2023 students at Praxis Tech School, Kolkata, India. The report consists of 36 pages and includes 11 figures and 5 tables

  22. arXiv:2404.01486  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG

    QuAD: Query-based Interpretable Neural Motion Planning for Autonomous Driving

    Authors: Sourav Biswas, Sergio Casas, Quinlan Sykora, Ben Agro, Abbas Sadat, Raquel Urtasun

    Abstract: A self-driving vehicle must understand its environment to determine the appropriate action. Traditional autonomy systems rely on object detection to find the agents in the scene. However, object detection assumes a discrete set of objects and loses information about uncertainty, so any errors compound when predicting the future behavior of those agents. Alternatively, dense occupancy grid maps hav…

    Submitted 1 April, 2024; originally announced April 2024.

  23. arXiv:2403.11795  [pdf, other]

    cs.LG cs.DC

    Low-Cost Privacy-Aware Decentralized Learning

    Authors: Sayan Biswas, Davide Frey, Romaric Gaudel, Anne-Marie Kermarrec, Dimitri Lerévérend, Rafael Pires, Rishi Sharma, François Taïani

    Abstract: This paper introduces ZIP-DL, a novel privacy-aware decentralized learning (DL) algorithm that exploits correlated noise to provide strong privacy protection against a local adversary while yielding efficient convergence guarantees for a low communication cost. The progressive neutralization of the added noise during the distributed aggregation process results in ZIP-DL fostering a high model accu…

    Submitted 25 June, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

  24. arXiv:2403.00129  [pdf, ps, other]

    cs.DS

    Average-Case Local Computation Algorithms

    Authors: Amartya Shankha Biswas, Ruidi Cao, Edward Pyne, Ronitt Rubinfeld

    Abstract: We initiate the study of Local Computation Algorithms on average case inputs. In the Local Computation Algorithm (LCA) model, we are given probe access to a huge graph, and asked to answer membership queries about some combinatorial structure on the graph, answering each query with sublinear work. For instance, an LCA for the $k$-spanner problem gives access to a sparse subgraph $H\subseteq G$ t…

    Submitted 29 February, 2024; originally announced March 2024.

    Comments: 27 pages

  25. arXiv:2402.11401  [pdf, other]

    cs.CV cs.LG

    GraphKD: Exploring Knowledge Distillation Towards Document Object Detection with Structured Graph Creation

    Authors: Ayan Banerjee, Sanket Biswas, Josep Lladós, Umapada Pal

    Abstract: Object detection in documents is a key step to automate the structural elements identification process in a digital or scanned document through understanding the hierarchical structure and relationships between different elements. Large and complex models, while achieving high accuracy, can be computationally expensive and memory-intensive, making them impractical for deployment on resource constr…

    Submitted 20 February, 2024; v1 submitted 17 February, 2024; originally announced February 2024.

  26. arXiv:2401.01858  [pdf, other]

    cs.CV

    Synthetic dataset of ID and Travel Document

    Authors: Carlos Boned, Maxime Talarmain, Nabil Ghanmi, Guillaume Chiron, Sanket Biswas, Ahmad Montaser Awal, Oriol Ramos Terrades

    Abstract: This paper presents a new synthetic dataset of ID and travel documents, called SIDTD. The SIDTD dataset is created to help train and evaluate forged ID document detection systems. Such a dataset has become a necessity as ID documents contain personal information and a public dataset of real documents cannot be released. Moreover, forged documents are scarce compared to legitimate ones, and the…

    Submitted 3 January, 2024; originally announced January 2024.

  27. arXiv:2312.14967  [pdf]

    cs.NI cs.LG eess.SP

    Multi-Armed Bandit Learning for Content Provisioning in Network of UAVs

    Authors: Amit Kumar Bhuyan, Hrishikesh Dutta, Subir Biswas

    Abstract: This paper proposes an unmanned aerial vehicle (UAV) aided content management system in communication-challenged disaster scenarios. Without cellular infrastructure in such scenarios, a community of stranded users can be provided access to situation-critical content using a hybrid network of static and traveling UAVs. A set of relatively static anchor UAVs can download content from central servers…

    Submitted 18 December, 2023; originally announced December 2023.

    Comments: 7 pages, 5 figures, 1 table, and 1 algorithm

  28. arXiv:2312.12908  [pdf, other]

    cs.CV

    The Common Optical Music Recognition Evaluation Framework

    Authors: Pau Torras, Sanket Biswas, Alicia Fornés

    Abstract: The quality of Optical Music Recognition (OMR) systems is a rather difficult magnitude to measure. There is no lingua franca shared among OMR datasets that allows systems' performance to be compared on equal grounds, since most of them are specialised in certain approaches. As a result, most state-of-the-art works currently report metrics that cannot be compared directly. In this paper we identify the…

    Submitted 20 December, 2023; originally announced December 2023.

    Comments: 18 pages, 4 figures, 3 tables, submitted (under review) to the International Journal on Document Analysis and Recognition

    ACM Class: I.4.9; J.5

  29. arXiv:2311.16496  [pdf, other]

    cs.LG

    DPOD: Domain-Specific Prompt Tuning for Multimodal Fake News Detection

    Authors: Debarshi Brahma, Amartya Bhattacharya, Suraj Nagaje Mahadev, Anmol Asati, Vikas Verma, Soma Biswas

    Abstract: The spread of fake news using out-of-context images has become widespread and is a relevant problem in this era of information overload. Such out-of-context fake news may arise across different domains like politics, sports, entertainment, etc. In practical scenarios, an inherent problem of imbalance exists among news articles from such widely varying domains, resulting in a few domains with abund…

    Submitted 12 March, 2024; v1 submitted 27 November, 2023; originally announced November 2023.

  30. arXiv:2311.15290  [pdf]

    cs.CR

    Challenges in Blockchain as a Solution for IoT Ecosystem Threats and Access Control: A Survey

    Authors: Suranjeet Chowdhury Avik, Sujit Biswas, Md Atiqur Rahaman Ahad, Zohaib Latif, Abdullah Alghamdi, Hamad Abosaq, Anupam Kumar Bairagi

    Abstract: The Internet of Things (IoT) is increasingly influencing and transforming various aspects of our daily lives. Contrary to popular belief, it raises security and privacy issues as it is used to collect data from consumers or automated systems. Numerous articles are published that discuss issues like centralised control systems and potential alternatives like integration with blockchain. Although a…

    Submitted 26 November, 2023; originally announced November 2023.

  31. arXiv:2311.03486  [pdf, other]

    cs.HC

    Fostering Human Learning in Sequential Decision-Making: Understanding the Role of Evaluative Feedback

    Authors: Piyush Gupta, Subir Biswas, Vaibhav Srivastava

    Abstract: Cognitive rehabilitation, STEM (science, technology, engineering, and math) skill acquisition, and coaching games such as chess often require tutoring decision-making strategies. The advancement of AI-driven tutoring systems for facilitating human learning requires an understanding of the impact of evaluative feedback on human decision-making and skill development. To this end, we conduct human ex…

    Submitted 4 May, 2024; v1 submitted 6 November, 2023; originally announced November 2023.

  32. arXiv:2311.01227  [pdf, other]

    cs.CV

    Robust Feature Learning and Global Variance-Driven Classifier Alignment for Long-Tail Class Incremental Learning

    Authors: Jayateja Kalla, Soma Biswas

    Abstract: This paper introduces a two-stage framework designed to enhance long-tail class incremental learning, enabling the model to progressively learn new classes, while mitigating catastrophic forgetting in the context of long-tailed data distributions. Addressing the challenge posed by the under-representation of tail classes in long-tail class incremental learning, our approach achieves classifier ali…

    Submitted 2 November, 2023; originally announced November 2023.

    Comments: Accepted in WACV 2024

  33. arXiv:2310.00917  [pdf, other]

    cs.CV

    Harnessing the Power of Multi-Lingual Datasets for Pre-training: Towards Enhancing Text Spotting Performance

    Authors: Alloy Das, Sanket Biswas, Ayan Banerjee, Josep Lladós, Umapada Pal, Saumik Bhattacharya

    Abstract: The adaptation capability to a wide range of domains is crucial for scene text spotting models when deployed to real-world conditions. However, existing state-of-the-art (SOTA) approaches usually incorporate scene text detection and recognition simply by pretraining on natural scene text datasets, which do not directly exploit the intermediate feature representations between multiple domains. Here…

    Submitted 1 November, 2023; v1 submitted 2 October, 2023; originally announced October 2023.

    Comments: Accepted to the 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV 2024)

  34. arXiv:2310.00558  [pdf, other]

    cs.CV

    Diving into the Depths of Spotting Text in Multi-Domain Noisy Scenes

    Authors: Alloy Das, Sanket Biswas, Umapada Pal, Josep Lladós

    Abstract: When used in a real-world noisy environment, the capacity to generalize to multiple domains is essential for any autonomous scene text spotting system. However, existing state-of-the-art methods employ pretraining and fine-tuning strategies on natural scene datasets, which do not exploit the feature interaction across other complex domains. In this work, we explore and investigate the problem of d…

    Submitted 17 February, 2024; v1 submitted 30 September, 2023; originally announced October 2023.

    Comments: Accepted to ICRA 2024

  35. arXiv:2309.11594  [pdf, other]

    cs.RO

    Development of a Feeding Assistive Robot Using a Six Degree of Freedom Robotic Arm

    Authors: Md Esharuzzaman Emu, Samarjith Biswas, Rajendra Shrestha

    Abstract: This project introduces a Feeding Assistive Robot tailored to individuals with physical disabilities, including those with limited arm function or hand control. The core component is a precise 6-degree-of-freedom robotic arm, operated seamlessly through voice commands. Integration of an Arduino-based Braccio Arm, a distance sensor, and a Bluetooth module enables voice-controlled movements. The primary…

    Submitted 20 September, 2023; originally announced September 2023.

    Comments: 5 pages, 6 figures

    MSC Class: 15A22

  36. arXiv:2309.05756  [pdf, other]

    cs.CV

    TransferDoc: A Self-Supervised Transferable Document Representation Learning Model Unifying Vision and Language

    Authors: Souhail Bakkali, Sanket Biswas, Zuheng Ming, Mickael Coustaty, Marçal Rusiñol, Oriol Ramos Terrades, Josep Lladós

    Abstract: The field of visual document understanding has witnessed a rapid growth in emerging challenges and powerful multi-modal strategies. However, they rely on an extensive amount of document data to learn their pretext objectives in a "pre-train-then-fine-tune" paradigm and thus, suffer a significant performance drop in real-world online industrial settings. One major reason is the over-reliance on O…

    Submitted 11 September, 2023; originally announced September 2023.

    Comments: Preprint to Pattern Recognition

  37. arXiv:2309.00846  [pdf, other]

    cs.CV cs.LG

    pSTarC: Pseudo Source Guided Target Clustering for Fully Test-Time Adaptation

    Authors: Manogna Sreenivas, Goirik Chakrabarty, Soma Biswas

    Abstract: Test Time Adaptation (TTA) is a pivotal concept in machine learning, enabling models to perform well in real-world scenarios, where test data distribution differs from training. In this work, we propose a novel approach called pseudo Source guided Target Clustering (pSTarC) addressing the relatively unexplored area of TTA under real-world domain shifts. This method draws inspiration from target cl…

    Submitted 22 November, 2023; v1 submitted 2 September, 2023; originally announced September 2023.

    Comments: Accepted in WACV 2024

  38. arXiv:2309.00416  [pdf, other]

    cs.LG cs.CR cs.CY stat.ML

    Advancing Personalized Federated Learning: Group Privacy, Fairness, and Beyond

    Authors: Filippo Galli, Kangsoo Jung, Sayan Biswas, Catuscia Palamidessi, Tommaso Cucinotta

    Abstract: Federated learning (FL) is a framework for training machine learning models in a distributed and collaborative manner. During training, a set of participating clients process their data stored locally, sharing only the model updates obtained by minimizing a cost function over their local inputs. FL was proposed as a stepping-stone towards privacy-preserving machine learning, but it has been shown…

    Submitted 1 September, 2023; originally announced September 2023.

  39. arXiv:2308.12896  [pdf, other]

    cs.CV cs.CL cs.LG

    Beyond Document Page Classification: Design, Datasets, and Challenges

    Authors: Jordy Van Landeghem, Sanket Biswas, Matthew B. Blaschko, Marie-Francine Moens

    Abstract: This paper highlights the need to bring document classification benchmarking closer to real-world applications, both in the nature of data tested ($X$: multi-channel, multi-paged, multi-industry; $Y$: class distributions and label set variety) and in classification tasks considered ($f$: multi-page document, page stream, and document bundle classification, ...). We identify the lack of public mult…

    Submitted 31 October, 2023; v1 submitted 24 August, 2023; originally announced August 2023.

    Comments: 12 pages, accepted at WACV 2024; camera-ready (paper id 1123)

  40. arXiv:2308.09708  [pdf, other]

    cs.CV

    Training with Product Digital Twins for AutoRetail Checkout

    Authors: Yue Yao, Xinyu Tian, Zheng Tang, Sujit Biswas, Huan Lei, Tom Gedeon, Liang Zheng

    Abstract: Automating the checkout process is important in smart retail, where users effortlessly pass products by hand through a camera, triggering automatic product detection, tracking, and counting. In this emerging area, due to the lack of annotated training data, we introduce a dataset comprised of product 3D models, which allows for fast, flexible, and large-scale training data generation through graph…

    Submitted 18 August, 2023; originally announced August 2023.

  41. arXiv:2308.08812  [pdf, other]

    cs.CV cs.LG

    A Fusion of Variational Distribution Priors and Saliency Map Replay for Continual 3D Reconstruction

    Authors: Sanchar Palit, Sandika Biswas

    Abstract: Single-image 3D reconstruction is a research challenge focused on predicting 3D object shapes from single-view images. This task requires significant data acquisition to predict both visible and occluded portions of the shape. Furthermore, learning-based methods face the difficulty of creating a comprehensive training dataset for all possible classes. To this end, we propose a continual learning-b…

    Submitted 17 August, 2023; originally announced August 2023.

    Comments: 15 pages

  42. arXiv:2307.14735  [pdf, other]

    cs.CV eess.IV

    Test Time Adaptation for Blind Image Quality Assessment

    Authors: Subhadeep Roy, Shankhanil Mitra, Soma Biswas, Rajiv Soundararajan

    Abstract: While the design of blind image quality assessment (IQA) algorithms has improved significantly, the distribution shift between the training and testing scenarios often leads to a poor performance of these methods at inference time. This motivates the study of test time adaptation (TTA) techniques to improve their performance at inference time. Existing auxiliary tasks and loss functions used for T…

    Submitted 26 September, 2023; v1 submitted 27 July, 2023; originally announced July 2023.

    Comments: Accepted to ICCV 2023

  43. arXiv:2307.14570  [pdf, other]

    cs.CV cs.RO

    Physically Plausible 3D Human-Scene Reconstruction from Monocular RGB Image using an Adversarial Learning Approach

    Authors: Sandika Biswas, Kejie Li, Biplab Banerjee, Subhasis Chaudhuri, Hamid Rezatofighi

    Abstract: Holistic 3D human-scene reconstruction is a crucial and emerging research area in robot perception. A key challenge in holistic 3D human-scene reconstruction is to generate a physically plausible 3D scene from a single monocular RGB image. The existing research mainly proposes optimization-based approaches for reconstructing the scene from a sequence of RGB frames with explicitly defined physical…

    Submitted 26 July, 2023; originally announced July 2023.

    Comments: Accepted in RAL 2023

  44. arXiv:2307.06109  [pdf, other]

    cs.MS

    SpComp: A Sparsity Structure-Specific Compilation of Matrix Operations

    Authors: Barnali Basak, Uday P. Khedker, Supratim Biswas

    Abstract: Sparse matrix operations involve a large number of zero operands which makes most of the operations redundant. The amount of redundancy magnifies when a matrix operation repeatedly executes on sparse data. Optimizing matrix operations for sparsity involves either reorganization of data or reorganization of computations, performed either at compile-time or run-time. Although compile-time techniques…

    Submitted 4 July, 2023; originally announced July 2023.

  45. arXiv:2307.02246  [pdf, other]

    cs.CV

    S3C: Self-Supervised Stochastic Classifiers for Few-Shot Class-Incremental Learning

    Authors: Jayateja Kalla, Soma Biswas

    Abstract: Few-shot class-incremental learning (FSCIL) aims to learn progressively about new classes with very few labeled samples, without forgetting the knowledge of already learnt classes. FSCIL suffers from two major challenges: (i) over-fitting on the new classes due to limited amount of data, (ii) catastrophically forgetting about the old classes due to unavailability of data from these classes in the…

    Submitted 5 July, 2023; originally announced July 2023.

    Comments: Accepted in ECCV 2022

  46. arXiv:2306.14875  [pdf, other]

    eess.IV cs.CV

    A Fully Unsupervised Instance Segmentation Technique for White Blood Cell Images

    Authors: Shrijeet Biswas, Amartya Bhattacharya

    Abstract: White blood cells, also known as leukocytes, are a group of heterogeneously nucleated cells which act as salient immune system cells. They originate in the bone marrow and are found in blood, plasma, and lymph tissues. Leukocytes kill the bacteria, viruses, and other kinds of pathogens which invade the human body through phagocytosis, which in turn results in immunity. Detection of a white blood cell count…

    Submitted 30 November, 2023; v1 submitted 26 June, 2023; originally announced June 2023.

  47. Fix Fairness, Don't Ruin Accuracy: Performance Aware Fairness Repair using AutoML

    Authors: Giang Nguyen, Sumon Biswas, Hridesh Rajan

    Abstract: Machine learning (ML) is increasingly being used in critical decision-making software, but incidents have raised questions about the fairness of ML predictions. To address this issue, new tools and methods are needed to mitigate bias in ML-based software. Previous studies have proposed bias mitigation algorithms that only work in specific situations and often result in a loss of accuracy. Our prop…

    Submitted 28 August, 2023; v1 submitted 15 June, 2023; originally announced June 2023.

    Comments: In Proceedings of The 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2023)

  48. arXiv:2305.15901  [pdf, other]

    cs.LG

    Consistent Optimal Transport with Empirical Conditional Measures

    Authors: Piyushi Manupriya, Rachit Keerti Das, Sayantan Biswas, Saketha Nath Jagarlapudi

    Abstract: Given samples from two joint distributions, we consider the problem of Optimal Transportation (OT) between them when conditioned on a common variable. We focus on the general setting where the conditioned variable may be continuous, and the marginals of this variable in the two joint distributions may not be the same. In such settings, standard OT variants cannot be employed, and novel estimation…

    Submitted 10 June, 2024; v1 submitted 25 May, 2023; originally announced May 2023.

  49. SwinDocSegmenter: An End-to-End Unified Domain Adaptive Transformer for Document Instance Segmentation

    Authors: Ayan Banerjee, Sanket Biswas, Josep Lladós, Umapada Pal

    Abstract: Instance-level segmentation of documents consists in assigning a class-aware and instance-aware label to each pixel of the image. It is a key step in document parsing for their understanding. In this paper, we present a unified transformer encoder-decoder architecture for end-to-end instance segmentation of complex layouts in document images. The method adapts a contrastive training with a mixed qu…

    Submitted 8 May, 2023; originally announced May 2023.

    Comments: Accepted to ICDAR 2023 (San Jose, California)

  50. SelfDocSeg: A Self-Supervised vision-based Approach towards Document Segmentation

    Authors: Subhajit Maity, Sanket Biswas, Siladittya Manna, Ayan Banerjee, Josep Lladós, Saumik Bhattacharya, Umapada Pal

    Abstract: Document layout analysis is a known problem to the documents research community and has been vastly explored yielding a multitude of solutions ranging from text mining, and recognition to graph-based representation, visual feature extraction, etc. However, most of the existing works have ignored the crucial fact regarding the scarcity of labeled data. With growing internet connectivity to personal…

    Submitted 20 August, 2023; v1 submitted 1 May, 2023; originally announced May 2023.

    Comments: Accepted at The 17th International Conference on Document Analysis and Recognition (ICDAR 2023)

    Journal ref: ICDAR 2023 (International Conference on Document Analysis and Recognition) Lecture Notes in Computer Science, vol 14187, pp. 342-360. Springer Nature