subscribe to arXiv mailings

Map It Anywhere (MIA): Empowering Bird's Eye View Mapping using Large-scale Public Data

Authors: Cherie Ho, Jiaye Zou, Omar Alama, Sai Mitheran Jagadesh Kumar, Benjamin Chiang, Taneesh Gupta, Chen Wang, Nikhil Keetha, Katia Sycara, Sebastian Scherer

Abstract: Top-down Bird's Eye View (BEV) maps are a popular representation for ground robot navigation due to their richness and flexibility for downstream tasks. While recent methods have shown promise for predicting BEV maps from First-Person View (FPV) images, their generalizability is limited to small regions captured by current autonomous vehicle-based datasets. In this context, we show that a more sca… ▽ More Top-down Bird's Eye View (BEV) maps are a popular representation for ground robot navigation due to their richness and flexibility for downstream tasks. While recent methods have shown promise for predicting BEV maps from First-Person View (FPV) images, their generalizability is limited to small regions captured by current autonomous vehicle-based datasets. In this context, we show that a more scalable approach towards generalizable map prediction can be enabled by using two large-scale crowd-sourced mapping platforms, Mapillary for FPV images and OpenStreetMap for BEV semantic maps. We introduce Map It Anywhere (MIA), a data engine that enables seamless curation and modeling of labeled map prediction data from existing open-source map platforms. Using our MIA data engine, we display the ease of automatically collecting a dataset of 1.2 million pairs of FPV images & BEV maps encompassing diverse geographies, landscapes, environmental factors, camera models & capture scenarios. We further train a simple camera model-agnostic model on this data for BEV map prediction. Extensive evaluations using established benchmarks and our dataset show that the data curated by MIA enables effective pretraining for generalizable BEV map prediction, with zero-shot performance far exceeding baselines trained on existing datasets by 35%. Our analysis highlights the promise of using large-scale public maps for developing & testing generalizable BEV perception, paving the way for more robust autonomous navigation. △ Less

Submitted 11 July, 2024; originally announced July 2024.

arXiv:2407.07726 [pdf, other]

PaliGemma: A versatile 3B VLM for transfer

Authors: Lucas Beyer, Andreas Steiner, André Susano Pinto, Alexander Kolesnikov, Xiao Wang, Daniel Salz, Maxim Neumann, Ibrahim Alabdulmohsin, Michael Tschannen, Emanuele Bugliarello, Thomas Unterthiner, Daniel Keysers, Skanda Koppula, Fangyu Liu, Adam Grycner, Alexey Gritsenko, Neil Houlsby, Manoj Kumar, Keran Rong, Julian Eisenschlos, Rishabh Kabra, Matthias Bauer, Matko Bošnjak, Xi Chen, Matthias Minderer , et al. (10 additional authors not shown)

Abstract: PaliGemma is an open Vision-Language Model (VLM) that is based on the SigLIP-So400m vision encoder and the Gemma-2B language model. It is trained to be a versatile and broadly knowledgeable base model that is effective to transfer. It achieves strong performance on a wide variety of open-world tasks. We evaluate PaliGemma on almost 40 diverse tasks including standard VLM benchmarks, but also more… ▽ More PaliGemma is an open Vision-Language Model (VLM) that is based on the SigLIP-So400m vision encoder and the Gemma-2B language model. It is trained to be a versatile and broadly knowledgeable base model that is effective to transfer. It achieves strong performance on a wide variety of open-world tasks. We evaluate PaliGemma on almost 40 diverse tasks including standard VLM benchmarks, but also more specialized tasks such as remote-sensing and segmentation. △ Less

Submitted 10 July, 2024; originally announced July 2024.

arXiv:2407.07128 [pdf, other]

Modularity aided consistent attributed graph clustering via coarsening

Authors: Samarth Bhatia, Yukti Makhija, Manoj Kumar, Sandeep Kumar

Abstract: Graph clustering is an important unsupervised learning technique for partitioning graphs with attributes and detecting communities. However, current methods struggle to accurately capture true community structures and intra-cluster relations, be computationally efficient, and identify smaller communities. We address these challenges by integrating coarsening and modularity maximization, effectivel… ▽ More Graph clustering is an important unsupervised learning technique for partitioning graphs with attributes and detecting communities. However, current methods struggle to accurately capture true community structures and intra-cluster relations, be computationally efficient, and identify smaller communities. We address these challenges by integrating coarsening and modularity maximization, effectively leveraging both adjacency and node features to enhance clustering accuracy. We propose a loss function incorporating log-determinant, smoothness, and modularity components using a block majorization-minimization technique, resulting in superior clustering outcomes. The method is theoretically consistent under the Degree-Corrected Stochastic Block Model (DC-SBM), ensuring asymptotic error-free performance and complete label recovery. Our provably convergent and time-efficient algorithm seamlessly integrates with graph neural networks (GNNs) and variational graph autoencoders (VGAEs) to learn enhanced node features and deliver exceptional clustering performance. Extensive experiments on benchmark datasets demonstrate its superiority over existing state-of-the-art methods for both attributed and non-attributed graphs. △ Less

Submitted 9 July, 2024; originally announced July 2024.

Comments: The first two authors contributed equally to this work

arXiv:2407.04664 [pdf, other]

The Degree of Fairness in Efficient House Allocation

Authors: Hadi Hosseini, Medha Kumar, Sanjukta Roy

Abstract: The classic house allocation problem is primarily concerned with finding a matching between a set of agents and a set of houses that guarantees some notion of economic efficiency (e.g. utilitarian welfare). While recent works have shifted focus on achieving fairness (e.g. minimizing the number of envious agents), they often come with notable costs on efficiency notions such as utilitarian or egali… ▽ More The classic house allocation problem is primarily concerned with finding a matching between a set of agents and a set of houses that guarantees some notion of economic efficiency (e.g. utilitarian welfare). While recent works have shifted focus on achieving fairness (e.g. minimizing the number of envious agents), they often come with notable costs on efficiency notions such as utilitarian or egalitarian welfare. We investigate the trade-offs between these welfare measures and several natural fairness measures that rely on the number of envious agents, the total (aggregate) envy of all agents, and maximum total envy of an agent. In particular, by focusing on envy-free allocations, we first show that, should one exist, finding an envy-free allocation with maximum utilitarian or egalitarian welfare is computationally tractable. We highlight a rather stark contrast between utilitarian and egalitarian welfare by showing that finding utilitarian welfare maximizing allocations that minimize the aforementioned fairness measures can be done in polynomial time while their egalitarian counterparts remain intractable (for the most part) even under binary valuations. We complement our theoretical findings by giving insights into the relationship between the different fairness measures and conducting empirical analysis. △ Less

Submitted 5 July, 2024; originally announced July 2024.

ACM Class: F.2.0

arXiv:2407.04651 [pdf, other]

SAM Fewshot Finetuning for Anatomical Segmentation in Medical Images

Authors: Weiyi Xie, Nathalie Willems, Shubham Patil, Yang Li, Mayank Kumar

Abstract: We propose a straightforward yet highly effective few-shot fine-tuning strategy for adapting the Segment Anything (SAM) to anatomical segmentation tasks in medical images. Our novel approach revolves around reformulating the mask decoder within SAM, leveraging few-shot embeddings derived from a limited set of labeled images (few-shot collection) as prompts for querying anatomical objects captured… ▽ More We propose a straightforward yet highly effective few-shot fine-tuning strategy for adapting the Segment Anything (SAM) to anatomical segmentation tasks in medical images. Our novel approach revolves around reformulating the mask decoder within SAM, leveraging few-shot embeddings derived from a limited set of labeled images (few-shot collection) as prompts for querying anatomical objects captured in image embeddings. This innovative reformulation greatly reduces the need for time-consuming online user interactions for labeling volumetric images, such as exhaustively marking points and bounding boxes to provide prompts slice by slice. With our method, users can manually segment a few 2D slices offline, and the embeddings of these annotated image regions serve as effective prompts for online segmentation tasks. Our method prioritizes the efficiency of the fine-tuning process by exclusively training the mask decoder through caching mechanisms while keeping the image encoder frozen. Importantly, this approach is not limited to volumetric medical images, but can generically be applied to any 2D/3D segmentation task. To thoroughly evaluate our method, we conducted extensive validation on four datasets, covering six anatomical segmentation tasks across two modalities. Furthermore, we conducted a comparative analysis of different prompting options within SAM and the fully-supervised nnU-Net. The results demonstrate the superior performance of our method compared to SAM employing only point prompts (approximately 50% improvement in IoU) and performs on-par with fully supervised methods whilst reducing the requirement of labeled data by at least an order of magnitude. △ Less

Submitted 5 July, 2024; originally announced July 2024.

Comments: 9 pages, Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. 2024

ACM Class: I.4.6; I.5.4; I.5.1

arXiv:2407.04335 [pdf, ps, other]

Geometrically Inspired Kernel Machines for Collaborative Learning Beyond Gradient Descent

Authors: Mohit Kumar, Alexander Valentinitsch, Magdalena Fuchs, Mathias Brucker, Juliana Bowles, Adnan Husakovic, Ali Abbas, Bernhard A. Moser

Abstract: This paper develops a novel mathematical framework for collaborative learning by means of geometrically inspired kernel machines which includes statements on the bounds of generalisation and approximation errors, and sample complexity. For classification problems, this approach allows us to learn bounded geometric structures around given data points and hence solve the global model learning proble… ▽ More This paper develops a novel mathematical framework for collaborative learning by means of geometrically inspired kernel machines which includes statements on the bounds of generalisation and approximation errors, and sample complexity. For classification problems, this approach allows us to learn bounded geometric structures around given data points and hence solve the global model learning problem in an efficient way by exploiting convexity properties of the related optimisation problem in a Reproducing Kernel Hilbert Space (RKHS). In this way, we can reduce classification problems to determining the closest bounded geometric structure from a given data point. Further advantages that come with our solution is that our approach does not require clients to perform multiple epochs of local optimisation using stochastic gradient descent, nor require rounds of communication between client/server for optimising the global model. We highlight that numerous experiments have shown that the proposed method is a competitive alternative to the state-of-the-art. △ Less

Submitted 5 July, 2024; originally announced July 2024.

arXiv:2407.04053 [pdf, other]

Edge AI: A Taxonomy, Systematic Review and Future Directions

Authors: Sukhpal Singh Gill, Muhammed Golec, Jianmin Hu, Minxian Xu, Junhui Du, Huaming Wu, Guneet Kaur Walia, Subramaniam Subramanian Murugesan, Babar Ali, Mohit Kumar, Kejiang Ye, Prabal Verma, Surendra Kumar, Felix Cuadrado, Steve Uhlig

Abstract: Edge Artificial Intelligence (AI) incorporates a network of interconnected systems and devices that receive, cache, process, and analyse data in close communication with the location where the data is captured with AI technology. Recent advancements in AI efficiency, the widespread use of Internet of Things (IoT) devices, and the emergence of edge computing have unlocked the enormous scope of Edge… ▽ More Edge Artificial Intelligence (AI) incorporates a network of interconnected systems and devices that receive, cache, process, and analyse data in close communication with the location where the data is captured with AI technology. Recent advancements in AI efficiency, the widespread use of Internet of Things (IoT) devices, and the emergence of edge computing have unlocked the enormous scope of Edge AI. The goal of Edge AI is to optimize data processing efficiency and velocity while ensuring data confidentiality and integrity. Despite being a relatively new field of research, spanning from 2014 to the present, it has shown significant and rapid development over the last five years. In this article, we present a systematic literature review for Edge AI to discuss the existing research, recent advancements, and future research directions. We created a collaborative edge AI learning system for cloud and edge computing analysis, including an in-depth study of the architectures that facilitate this mechanism. The taxonomy for Edge AI facilitates the classification and configuration of Edge AI systems while also examining its potential influence across many fields through compassing infrastructure, cloud computing, fog computing, services, use cases, ML and deep learning, and resource management. This study highlights the significance of Edge AI in processing real-time data at the edge of the network. Additionally, it emphasizes the research challenges encountered by Edge AI systems, including constraints on resources, vulnerabilities to security threats, and problems with scalability. Finally, this study highlights the potential future research directions that aim to address the current limitations of Edge AI by providing innovative solutions. △ Less

Submitted 4 July, 2024; originally announced July 2024.

Comments: Preprint Version, 18 Figures

arXiv:2406.14908 [pdf, other]

Can we say a cat is a cat? Understanding the challenges in annotating physiological signal-based emotion data

Authors: Pragya Singh, Mohan Kumar, Pushpendra Singh

Abstract: Artificial Intelligence (AI) algorithms, trained on emotion data extracted from physiological signals, provide a promising approach to monitoring emotions, affect, and mental well-being. However, the field encounters challenges because there is a lack of effective methods for collecting high-quality data in everyday settings that genuinely reflect changes in emotion or affect. This paper presents… ▽ More Artificial Intelligence (AI) algorithms, trained on emotion data extracted from physiological signals, provide a promising approach to monitoring emotions, affect, and mental well-being. However, the field encounters challenges because there is a lack of effective methods for collecting high-quality data in everyday settings that genuinely reflect changes in emotion or affect. This paper presents a position discussion on the current technique of annotating physiological signal-based emotion data. Our discourse underscores the importance of adopting a nuanced understanding of annotation processes, paving the way for a more insightful exploration of the intricate relationship between physiological signals and human emotions. △ Less

Submitted 21 June, 2024; originally announced June 2024.

Comments: 7 pages, To be published at PhysioCHI: Towards Best Practices for Integrating Physiological Signals in HCI, May 11, 2024, Honolulu, HI, USA

arXiv:2406.10288 [pdf, other]

Mimicking User Data: On Mitigating Fine-Tuning Risks in Closed Large Language Models

Authors: Francisco Eiras, Aleksandar Petrov, Phillip H. S. Torr, M. Pawan Kumar, Adel Bibi

Abstract: Fine-tuning large language models on small, high-quality datasets can enhance their performance on specific downstream tasks. Recent research shows that fine-tuning on benign, instruction-following data can inadvertently undo the safety alignment process and increase a model's propensity to comply with harmful queries. Although critical, understanding and mitigating safety risks in well-defined ta… ▽ More Fine-tuning large language models on small, high-quality datasets can enhance their performance on specific downstream tasks. Recent research shows that fine-tuning on benign, instruction-following data can inadvertently undo the safety alignment process and increase a model's propensity to comply with harmful queries. Although critical, understanding and mitigating safety risks in well-defined tasks remains distinct from the instruction-following context due to structural differences in the data. Our work addresses the gap in our understanding of these risks across diverse types of data in closed models - where providers control how user data is utilized in the fine-tuning process. We demonstrate how malicious actors can subtly manipulate the structure of almost any task-specific dataset to foster significantly more dangerous model behaviors, while maintaining an appearance of innocuity and reasonable downstream task performance. To address this issue, we propose a novel mitigation strategy that mixes in safety data which mimics the task format and prompting style of the user data, showing this is more effective than existing baselines at re-establishing safety alignment while maintaining similar task performance. △ Less

Submitted 1 July, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

arXiv:2406.01698 [pdf, other]

Demystifying Platform Requirements for Diverse LLM Inference Use Cases

Authors: Abhimanyu Bambhaniya, Ritik Raj, Geonhwa Jeong, Souvik Kundu, Sudarshan Srinivasan, Midhilesh Elavazhagan, Madhu Kumar, Tushar Krishna

Abstract: Large language models (LLMs) have shown remarkable performance across a wide range of applications, often outperforming human experts. However, deploying these parameter-heavy models efficiently for diverse inference use cases requires carefully designed hardware platforms with ample computing, memory, and network resources. With LLM deployment scenarios and models evolving at breakneck speed, the… ▽ More Large language models (LLMs) have shown remarkable performance across a wide range of applications, often outperforming human experts. However, deploying these parameter-heavy models efficiently for diverse inference use cases requires carefully designed hardware platforms with ample computing, memory, and network resources. With LLM deployment scenarios and models evolving at breakneck speed, the hardware requirements to meet SLOs remains an open research question. In this work, we present an analytical tool, GenZ, to study the relationship between LLM inference performance and various platform design parameters. Our analysis provides insights into configuring platforms for different LLM workloads and use cases. We quantify the platform requirements to support SOTA LLMs models like LLaMA and GPT-4 under diverse serving settings. Furthermore, we project the hardware capabilities needed to enable future LLMs potentially exceeding hundreds of trillions of parameters. The trends and insights derived from GenZ can guide AI engineers deploying LLMs as well as computer architects designing next-generation hardware accelerators and platforms. Ultimately, this work sheds light on the platform design considerations for unlocking the full potential of large language models across a spectrum of applications. The source code is available at https://github.com/abhibambhaniya/GenZ-LLM-Analyzer . △ Less

Submitted 3 June, 2024; originally announced June 2024.

Comments: 12 Pages, https://github.com/abhibambhaniya/GenZ-LLM-Analyzer

arXiv:2405.18780 [pdf, other]

Quantitative Certification of Bias in Large Language Models

Authors: Isha Chaudhary, Qian Hu, Manoj Kumar, Morteza Ziyadi, Rahul Gupta, Gagandeep Singh

Abstract: Large Language Models (LLMs) can produce responses that exhibit social biases and support stereotypes. However, conventional benchmarking is insufficient to thoroughly evaluate LLM bias, as it can not scale to large sets of prompts and provides no guarantees. Therefore, we propose a novel certification framework QuaCer-B (Quantitative Certification of Bias) that provides formal guarantees on obtai… ▽ More Large Language Models (LLMs) can produce responses that exhibit social biases and support stereotypes. However, conventional benchmarking is insufficient to thoroughly evaluate LLM bias, as it can not scale to large sets of prompts and provides no guarantees. Therefore, we propose a novel certification framework QuaCer-B (Quantitative Certification of Bias) that provides formal guarantees on obtaining unbiased responses from target LLMs under large sets of prompts. A certificate consists of high-confidence bounds on the probability of obtaining biased responses from the LLM for any set of prompts containing sensitive attributes, sampled from a distribution. We illustrate the bias certification in LLMs for prompts with various prefixes drawn from given distributions. We consider distributions of random token sequences, mixtures of manual jailbreaks, and jailbreaks in the LLM's embedding space to certify its bias. We certify popular LLMs with QuaCer-B and present novel insights into their biases. △ Less

Submitted 29 May, 2024; originally announced May 2024.

arXiv:2405.15341 [pdf, other]

V-Zen: Efficient GUI Understanding and Precise Grounding With A Novel Multimodal LLM

Authors: Abdur Rahman, Rajat Chawla, Muskaan Kumar, Arkajit Datta, Adarsh Jha, Mukunda NS, Ishaan Bhola

Abstract: In the rapidly evolving landscape of AI research and application, Multimodal Large Language Models (MLLMs) have emerged as a transformative force, adept at interpreting and integrating information from diverse modalities such as text, images, and Graphical User Interfaces (GUIs). Despite these advancements, the nuanced interaction and understanding of GUIs pose a significant challenge, limiting th… ▽ More In the rapidly evolving landscape of AI research and application, Multimodal Large Language Models (MLLMs) have emerged as a transformative force, adept at interpreting and integrating information from diverse modalities such as text, images, and Graphical User Interfaces (GUIs). Despite these advancements, the nuanced interaction and understanding of GUIs pose a significant challenge, limiting the potential of existing models to enhance automation levels. To bridge this gap, this paper presents V-Zen, an innovative Multimodal Large Language Model (MLLM) meticulously crafted to revolutionise the domain of GUI understanding and grounding. Equipped with dual-resolution image encoders, V-Zen establishes new benchmarks in efficient grounding and next-action prediction, thereby laying the groundwork for self-operating computer systems. Complementing V-Zen is the GUIDE dataset, an extensive collection of real-world GUI elements and task-based sequences, serving as a catalyst for specialised fine-tuning. The successful integration of V-Zen and GUIDE marks the dawn of a new era in multimodal AI research, opening the door to intelligent, autonomous computing experiences. This paper extends an invitation to the research community to join this exciting journey, shaping the future of GUI automation. In the spirit of open science, our code, data, and model will be made publicly available, paving the way for multimodal dialogue scenarios with intricate and precise interactions. △ Less

Submitted 24 May, 2024; originally announced May 2024.

arXiv:2405.14857 [pdf, other]

Semantica: An Adaptable Image-Conditioned Diffusion Model

Authors: Manoj Kumar, Neil Houlsby, Emiel Hoogeboom

Abstract: We investigate the task of adapting image generative models to different datasets without finetuneing. To this end, we introduce Semantica, an image-conditioned diffusion model capable of generating images based on the semantics of a conditioning image. Semantica is trained exclusively on web-scale image pairs, that is it receives a random image from a webpage as conditional input and models anoth… ▽ More We investigate the task of adapting image generative models to different datasets without finetuneing. To this end, we introduce Semantica, an image-conditioned diffusion model capable of generating images based on the semantics of a conditioning image. Semantica is trained exclusively on web-scale image pairs, that is it receives a random image from a webpage as conditional input and models another random image from the same webpage. Our experiments highlight the expressivity of pretrained image encoders and necessity of semantic-based data filtering in achieving high-quality image generation. Once trained, it can adaptively generate new images from a dataset by simply using images from that dataset as input. We study the transfer properties of Semantica on ImageNet, LSUN Churches, LSUN Bedroom and SUN397. △ Less

Submitted 10 June, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

arXiv:2405.04260 [pdf, other]

Verified Neural Compressed Sensing

Authors: Rudy Bunel, Krishnamurthy Dvijotham, M. Pawan Kumar, Alessandro De Palma, Robert Stanforth

Abstract: We develop the first (to the best of our knowledge) provably correct neural networks for a precise computational task, with the proof of correctness generated by an automated verification algorithm without any human input. Prior work on neural network verification has focused on partial specifications that, even when satisfied, are not sufficient to ensure that a neural network never makes errors.… ▽ More We develop the first (to the best of our knowledge) provably correct neural networks for a precise computational task, with the proof of correctness generated by an automated verification algorithm without any human input. Prior work on neural network verification has focused on partial specifications that, even when satisfied, are not sufficient to ensure that a neural network never makes errors. We focus on applying neural network verification to computational tasks with a precise notion of correctness, where a verifiably correct neural network provably solves the task at hand with no caveats. In particular, we develop an approach to train and verify the first provably correct neural networks for compressed sensing, i.e., recovering sparse vectors from a number of measurements smaller than the dimension of the vector. We show that for modest problem dimensions (up to 50), we can train neural networks that provably recover a sparse vector from linear and binarized linear measurements. Furthermore, we show that the complexity of the network (number of neurons/layers) can be adapted to the problem difficulty and solve problems where traditional compressed sensing methods are not known to provably work. △ Less

Submitted 8 May, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

arXiv:2405.02002 [pdf, ps, other]

Optimizing Robot Dispersion on Grids: with and without Fault Tolerance

Authors: Rik Banerjee, Manish Kumar, Anisur Rahaman Molla

Abstract: The introduction and study of dispersing mobile robots across the nodes of an anonymous graph have recently gained traction and have been explored within various graph classes and settings. While optimal dispersion solution was established for {\em oriented} grids [Kshemkalyani et al., WALCOM 2020], a significant unresolved question pertains to whether achieving optimal dispersion is feasible on a… ▽ More The introduction and study of dispersing mobile robots across the nodes of an anonymous graph have recently gained traction and have been explored within various graph classes and settings. While optimal dispersion solution was established for {\em oriented} grids [Kshemkalyani et al., WALCOM 2020], a significant unresolved question pertains to whether achieving optimal dispersion is feasible on an {\em unoriented} grid. This paper investigates the dispersion problem on unoriented grids, considering both non-faulty and faulty robots. The challenge posed by unoriented grids lies in the absence of a clear sense of direction for a single robot moving between nodes, as opposed to the straightforward navigation of oriented grids. We present three deterministic algorithms tailored to our robot model. The first and second algorithms deal with the dispersion of faulty and non-faulty robots, ensuring both time and memory optimization in oriented and unoriented grids, respectively. Faulty robots that are prone to crashing at any time, causing permanent failure. In both settings, we achieve dispersion in $O(\sqrt{n})$ rounds while requiring $O(\log n)$ bits of memory per robot. The third algorithm tackles faulty robots prone to crash faults in an unoriented grid. In this scenario, our algorithm operates within $O(\sqrt{n} \log n)$ time and uses $O(\sqrt{n} \log n)$ bits of memory per robot. The robots need to know the value of $n$ for termination. △ Less

Submitted 3 May, 2024; originally announced May 2024.

arXiv:2405.01813 [pdf, other]

doi 10.1145/3555041.3589674

Towards Building Autonomous Data Services on Azure

Authors: Yiwen Zhu, Yuanyuan Tian, Joyce Cahoon, Subru Krishnan, Ankita Agarwal, Rana Alotaibi, Jesús Camacho-Rodríguez, Bibin Chundatt, Andrew Chung, Niharika Dutta, Andrew Fogarty, Anja Gruenheid, Brandon Haynes, Matteo Interlandi, Minu Iyer, Nick Jurgens, Sumeet Khushalani, Brian Kroth, Manoj Kumar, Jyoti Leeka, Sergiy Matusevych, Minni Mittal, Andreas Mueller, Kartheek Muthyala, Harsha Nagulapalli , et al. (13 additional authors not shown)

Abstract: Modern cloud has turned data services into easily accessible commodities. With just a few clicks, users are now able to access a catalog of data processing systems for a wide range of tasks. However, the cloud brings in both complexity and opportunity. While cloud users can quickly start an application by using various data services, it can be difficult to configure and optimize these services to… ▽ More Modern cloud has turned data services into easily accessible commodities. With just a few clicks, users are now able to access a catalog of data processing systems for a wide range of tasks. However, the cloud brings in both complexity and opportunity. While cloud users can quickly start an application by using various data services, it can be difficult to configure and optimize these services to gain the most value from them. For cloud providers, managing every aspect of an ever-increasing set of data services, while meeting customer SLAs and minimizing operational cost is becoming more challenging. Cloud technology enables the collection of significant amounts of workload traces and system telemetry. With the progress in data science (DS) and machine learning (ML), it is feasible and desirable to utilize a data-driven, ML-based approach to automate various aspects of data services, resulting in the creation of autonomous data services. This paper presents our perspectives and insights on creating autonomous data services on Azure. It also covers the future endeavors we plan to undertake and unresolved issues that still need attention. △ Less

Submitted 2 May, 2024; originally announced May 2024.

Comments: SIGMOD Companion of the 2023 International Conference on Management of Data. 2023

arXiv:2404.16060 [pdf]

Pocket Schlieren: a background oriented schlieren imaging platform on a smartphone

Authors: Diganta Rabha, Vimod Kumar, Akshay Kumar, Dinesh Saini, Manish Kumar

Abstract: Background-oriented schlieren (BOS) is a powerful technique for flow visualization. Nevertheless, the widespread dissemination of BOS is impeded by its dependence on scientific cameras, computing hardware, and dedicated analysis software. In this work, we aim to democratize BOS by providing a smartphone based scientific tool called "Pocket Schlieren". Pocket Schlieren enables users to directly cap… ▽ More Background-oriented schlieren (BOS) is a powerful technique for flow visualization. Nevertheless, the widespread dissemination of BOS is impeded by its dependence on scientific cameras, computing hardware, and dedicated analysis software. In this work, we aim to democratize BOS by providing a smartphone based scientific tool called "Pocket Schlieren". Pocket Schlieren enables users to directly capture, process, and visualize flow phenomena on their smartphones. The underlying algorithm incorporates consecutive frame subtraction (CFS) and optical flow (OF) techniques to compute the density gradients inside a flow. It performs on both engineered and natural background patterns. Using Pocket Schlieren, we successfully visualized the flow produced from a burning candle flame, butane lighter, hot soldering iron, room heater, water immersion heating rod, and a large outdoor butane flame. Pocket Schlieren promises to serve as a frugal yet potent instrument for scientific and educational purposes. We have made it publicly available at doi: 10.5281/zenodo.10949271. △ Less

Submitted 15 April, 2024; originally announced April 2024.

Comments: 24 pages, 6 figures, 4 Supplementary figures

arXiv:2404.16048 [pdf, other]

GUIDE: Graphical User Interface Data for Execution

Authors: Rajat Chawla, Adarsh Jha, Muskaan Kumar, Mukunda NS, Ishaan Bhola

Abstract: In this paper, we introduce GUIDE, a novel dataset tailored for the advancement of Multimodal Large Language Model (MLLM) applications, particularly focusing on Robotic Process Automation (RPA) use cases. Our dataset encompasses diverse data from various websites including Apollo(62.67\%), Gmail(3.43\%), Calendar(10.98\%) and Canva(22.92\%). Each data entry includes an image, a task description, t… ▽ More In this paper, we introduce GUIDE, a novel dataset tailored for the advancement of Multimodal Large Language Model (MLLM) applications, particularly focusing on Robotic Process Automation (RPA) use cases. Our dataset encompasses diverse data from various websites including Apollo(62.67\%), Gmail(3.43\%), Calendar(10.98\%) and Canva(22.92\%). Each data entry includes an image, a task description, the last action taken, CoT and the next action to be performed along with grounding information of where the action needs to be executed. The data is collected using our in-house advanced annotation tool NEXTAG (Next Action Grounding and Annotation Tool). The data is adapted for multiple OS, browsers and display types. It is collected by multiple annotators to capture the variation of design and the way person uses a website. Through this dataset, we aim to facilitate research and development in the realm of LLMs for graphical user interfaces, particularly in tasks related to RPA. The dataset's multi-platform nature and coverage of diverse websites enable the exploration of cross-interface capabilities in automation tasks. We believe that our dataset will serve as a valuable resource for advancing the capabilities of multi-platform LLMs in practical applications, fostering innovation in the field of automation and natural language understanding. Using GUIDE, we build V-Zen, the first RPA model to automate multiple websites using our in-House Automation tool AUTONODE △ Less

Submitted 9 April, 2024; originally announced April 2024.

Comments: 11 pages, 8 figures, 3 Tables and 1 Algorithm

arXiv:2404.12944 [pdf, other]

doi 10.1145/3656650.3656667

Visualizing Intelligent Tutor Interactions for Responsive Pedagogy

Authors: Grace Guo, Aishwarya Mudgal Sunil Kumar, Adit Gupta, Adam Coscia, Chris MacLellan, Alex Endert

Abstract: Intelligent tutoring systems leverage AI models of expert learning and student knowledge to deliver personalized tutoring to students. While these intelligent tutors have demonstrated improved student learning outcomes, it is still unclear how teachers might integrate them into curriculum and course planning to support responsive pedagogy. In this paper, we conducted a design study with five teach… ▽ More Intelligent tutoring systems leverage AI models of expert learning and student knowledge to deliver personalized tutoring to students. While these intelligent tutors have demonstrated improved student learning outcomes, it is still unclear how teachers might integrate them into curriculum and course planning to support responsive pedagogy. In this paper, we conducted a design study with five teachers who have deployed Apprentice Tutors, an intelligent tutoring platform, in their classes. We characterized their challenges around analyzing student interaction data from intelligent tutoring systems and built VisTA (Visualizations for Tutor Analytics), a visual analytics system that shows detailed provenance data across multiple coordinated views. We evaluated VisTA with the same five teachers, and found that the visualizations helped them better interpret intelligent tutor data, gain insights into student problem-solving provenance, and decide on necessary follow-up actions - such as providing students with further support or reviewing skills in the classroom. Finally, we discuss potential extensions of VisTA into sequence query and detection, as well as the potential for the visualizations to be useful for encouraging self-directed learning in students. △ Less

Submitted 19 April, 2024; originally announced April 2024.

Comments: 9 pages, 5 figures, ACM AVI 2024

arXiv:2404.07533 [pdf, other]

IITP-VDLand: A Comprehensive Dataset on Decentraland Parcels

Authors: Ankit K. Bhagat, Dipika Jha, Raju Halder, Rajendra N. Paramanik, Chandra M. Kumar

Abstract: This paper presents IITP-VDLand, a comprehensive dataset of Decentraland parcels sourced from diverse platforms. Unlike existing datasets which have limited attributes and records, IITP-VDLand offers a rich array of attributes, encompassing parcel characteristics, trading history, past activities, transactions, and social media interactions. Alongside, we introduce a key attribute in the dataset,… ▽ More This paper presents IITP-VDLand, a comprehensive dataset of Decentraland parcels sourced from diverse platforms. Unlike existing datasets which have limited attributes and records, IITP-VDLand offers a rich array of attributes, encompassing parcel characteristics, trading history, past activities, transactions, and social media interactions. Alongside, we introduce a key attribute in the dataset, namely Rarity score, which measures the uniqueness of each parcel within the virtual world. Addressing the significant challenge posed by the dispersed nature of this data across various sources, we employ a systematic approach, utilizing both available APIs and custom scripts, to gather it. Subsequently, we meticulously curate and organize the information into four distinct segments: (1) Characteristics Data-Fragment, (2) OpenSea Trading History Data-Fragment, (3) Ethereum Activity Transactions Data-Fragment, and (4) Social Media Data-Fragment. We envisage that this dataset would serve as a robust resource for training machine- and deep-learning models specifically designed to address real-world challenges within the domain of Decentraland parcels. The performance benchmarking of more than 20 state-of-the-art price prediction models on our dataset yields promising results, achieving a maximum R2 score of 0.8251 and an accuracy of 74.23% in case of Extra Trees Regressor and Classifier. The key findings reveal that the ensemble models performs better than both deep learning and linear models for our dataset. We observe a significant impact of coordinates, geographical proximity, rarity score, and few other economic indicators on the prediction of parcel prices. △ Less

Submitted 11 April, 2024; originally announced April 2024.

arXiv:2403.17849 [pdf, other]

Multi Agent Pathfinding for Noise Restricted Hybrid Fuel Unmanned Aerial Vehicles

Authors: Drew Scott, Satyanarayana G. Manyam, David W. Casbeer, Manish Kumar, Isaac E. Weintraub

Abstract: Multi Agent Path Finding (MAPF) seeks the optimal set of paths for multiple agents from respective start to goal locations such that no paths conflict. We address the MAPF problem for a fleet of hybrid-fuel unmanned aerial vehicles which are subject to location-dependent noise restrictions. We solve this problem by searching a constraint tree for which the subproblem at each node is a set of short… ▽ More Multi Agent Path Finding (MAPF) seeks the optimal set of paths for multiple agents from respective start to goal locations such that no paths conflict. We address the MAPF problem for a fleet of hybrid-fuel unmanned aerial vehicles which are subject to location-dependent noise restrictions. We solve this problem by searching a constraint tree for which the subproblem at each node is a set of shortest path problems subject to the noise and fuel constraints and conflict zone avoidance. A labeling algorithm is presented to solve this subproblem, including the conflict zones which are treated as dynamic obstacles. We present the experimental results of the algorithms for various graph sizes and number of agents. △ Less

Submitted 26 March, 2024; originally announced March 2024.

Comments: 6 pages, 7 figures

arXiv:2403.13716 [pdf, ps, other]

Agent-based Leader Election, MST, and Beyond

Authors: Ajay D. Kshemkalyani, Manish Kumar, Anisur Rahaman Molla, Gokarna Sharma

Abstract: Leader election is one of the fundamental and well-studied problems in distributed computing. In this paper, we initiate the study of leader election using mobile agents. Suppose $n$ agents are positioned initially arbitrarily on the nodes of an arbitrary, anonymous, $n$-node, $m$-edge graph $G$. The agents relocate themselves autonomously on the nodes of $G$ and elect an agent as a leader such th… ▽ More Leader election is one of the fundamental and well-studied problems in distributed computing. In this paper, we initiate the study of leader election using mobile agents. Suppose $n$ agents are positioned initially arbitrarily on the nodes of an arbitrary, anonymous, $n$-node, $m$-edge graph $G$. The agents relocate themselves autonomously on the nodes of $G$ and elect an agent as a leader such that the leader agent knows it is a leader and the other agents know they are not leaders. The objective is to minimize time and memory requirements. Following the literature, we consider the synchronous setting in which each agent performs its operations synchronously with others and hence the time complexity can be measured in rounds. The quest in this paper is to provide solutions without agents knowing any graph parameter, such as $n$, a priori. We first establish that, without agents knowing any graph parameter a priori, there exists a deterministic algorithm to elect an agent as a leader in $O(m)$ rounds with $O(n\log n)$ bits at each agent. Using this leader election result, we develop a deterministic algorithm for agents to construct a minimum spanning tree of $G$ in $O(m+n\log n)$ rounds using $O(n \log n)$ bits memory at each agent, without agents knowing any graph parameter a priori. Finally, using the same leader election result, we provide improved time/memory results for other fundamental distributed graph problems, namely, gathering, maximal independent set, and minimal dominating sets, removing the assumptions on agents knowing graph parameters a priori. △ Less

Submitted 22 May, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

Comments: 25 pages

arXiv:2403.12335 [pdf, other]

Temporally-Consistent Koopman Autoencoders for Forecasting Dynamical Systems

Authors: Indranil Nayak, Debdipta Goswami, Mrinal Kumar, Fernando Teixeira

Abstract: Absence of sufficiently high-quality data often poses a key challenge in data-driven modeling of high-dimensional spatio-temporal dynamical systems. Koopman Autoencoders (KAEs) harness the expressivity of deep neural networks (DNNs), the dimension reduction capabilities of autoencoders, and the spectral properties of the Koopman operator to learn a reduced-order feature space with simpler, linear… ▽ More Absence of sufficiently high-quality data often poses a key challenge in data-driven modeling of high-dimensional spatio-temporal dynamical systems. Koopman Autoencoders (KAEs) harness the expressivity of deep neural networks (DNNs), the dimension reduction capabilities of autoencoders, and the spectral properties of the Koopman operator to learn a reduced-order feature space with simpler, linear dynamics. However, the effectiveness of KAEs is hindered by limited and noisy training datasets, leading to poor generalizability. To address this, we introduce the Temporally-Consistent Koopman Autoencoder (tcKAE), designed to generate accurate long-term predictions even with constrained and noisy training data. This is achieved through a consistency regularization term that enforces prediction coherence across different time steps, thus enhancing the robustness and generalizability of tcKAE over existing models. We provide analytical justification for this approach based on Koopman spectral theory and empirically demonstrate tcKAE's superior performance over state-of-the-art KAE models across a variety of test cases, including simple pendulum oscillations, kinetic plasmas, fluid flows, and sea surface temperature data. △ Less

Submitted 18 March, 2024; originally announced March 2024.

arXiv:2403.10519 [pdf, other]

Frozen Feature Augmentation for Few-Shot Image Classification

Authors: Andreas Bär, Neil Houlsby, Mostafa Dehghani, Manoj Kumar

Abstract: Training a linear classifier or lightweight model on top of pretrained vision model outputs, so-called 'frozen features', leads to impressive performance on a number of downstream few-shot tasks. Currently, frozen features are not modified during training. On the other hand, when networks are trained directly on images, data augmentation is a standard recipe that improves performance with no subst… ▽ More Training a linear classifier or lightweight model on top of pretrained vision model outputs, so-called 'frozen features', leads to impressive performance on a number of downstream few-shot tasks. Currently, frozen features are not modified during training. On the other hand, when networks are trained directly on images, data augmentation is a standard recipe that improves performance with no substantial overhead. In this paper, we conduct an extensive pilot study on few-shot image classification that explores applying data augmentations in the frozen feature space, dubbed 'frozen feature augmentation (FroFA)', covering twenty augmentations in total. Our study demonstrates that adopting a deceptively simple pointwise FroFA, such as brightness, can improve few-shot performance consistently across three network architectures, three large pretraining datasets, and eight transfer datasets. △ Less

Submitted 15 March, 2024; originally announced March 2024.

Comments: CVPR 2024 (18 pages, main paper + supplementary material)

arXiv:2403.01965 [pdf, ps, other]

Towards Deterministic Algorithms for Constant-Depth Factors of Constant-Depth Circuits

Authors: Mrinal Kumar, Varun Ramanathan, Ramprasad Saptharishi, Ben Lee Volk

Abstract: We design a deterministic subexponential time algorithm that takes as input a multivariate polynomial $f$ computed by a constant-depth circuit over rational numbers, and outputs a list $L$ of circuits (of unbounded depth and possibly with division gates) that contains all irreducible factors of $f$ computable by constant-depth circuits. This list $L$ might also include circuits that are spurious:… ▽ More We design a deterministic subexponential time algorithm that takes as input a multivariate polynomial $f$ computed by a constant-depth circuit over rational numbers, and outputs a list $L$ of circuits (of unbounded depth and possibly with division gates) that contains all irreducible factors of $f$ computable by constant-depth circuits. This list $L$ might also include circuits that are spurious: they either do not correspond to factors of $f$ or are not even well-defined, e.g. the input to a division gate is a sub-circuit that computes the identically zero polynomial. The key technical ingredient of our algorithm is a notion of the pseudo-resultant of $f$ and a factor $g$, which serves as a proxy for the resultant of $g$ and $f/g$, with the advantage that the circuit complexity of the pseudo-resultant is comparable to that of the circuit complexity of $f$ and $g$. This notion, which might be of independent interest, together with the recent results of Limaye, Srinivasan and Tavenas, helps us derandomize one key step of multivariate polynomial factorization algorithms - that of deterministically finding a good starting point for Newton Iteration for the case when the input polynomial as well as the irreducible factor of interest have small constant-depth circuits. △ Less

Submitted 4 March, 2024; originally announced March 2024.

arXiv:2402.14817 [pdf, other]

Cameras as Rays: Pose Estimation via Ray Diffusion

Authors: Jason Y. Zhang, Amy Lin, Moneish Kumar, Tzu-Hsuan Yang, Deva Ramanan, Shubham Tulsiani

Abstract: Estimating camera poses is a fundamental task for 3D reconstruction and remains challenging given sparsely sampled views (<10). In contrast to existing approaches that pursue top-down prediction of global parametrizations of camera extrinsics, we propose a distributed representation of camera pose that treats a camera as a bundle of rays. This representation allows for a tight coupling with spatia… ▽ More Estimating camera poses is a fundamental task for 3D reconstruction and remains challenging given sparsely sampled views (<10). In contrast to existing approaches that pursue top-down prediction of global parametrizations of camera extrinsics, we propose a distributed representation of camera pose that treats a camera as a bundle of rays. This representation allows for a tight coupling with spatial image features improving pose precision. We observe that this representation is naturally suited for set-level transformers and develop a regression-based approach that maps image patches to corresponding rays. To capture the inherent uncertainties in sparse-view pose inference, we adapt this approach to learn a denoising diffusion model which allows us to sample plausible modes while improving performance. Our proposed methods, both regression- and diffusion-based, demonstrate state-of-the-art performance on camera pose estimation on CO3D while generalizing to unseen object categories and in-the-wild captures. △ Less

Submitted 4 April, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

Comments: In ICLR 2024 (oral). v2-3: updated references. Project webpage: https://jasonyzhang.com/RayDiffusion

arXiv:2311.17841 [pdf, ps, other]

Fast list-decoding of univariate multiplicity and folded Reed-Solomon codes

Authors: Rohan Goyal, Prahladh Harsha, Mrinal Kumar, Ashutosh Shankar

Abstract: We show that the known list-decoding algorithms for univariate multiplicity and folded Reed-Solomon codes can be made to run in $\tilde{O}(n)$ time. Univariate multiplicity codes and FRS codes are natural variants of Reed-Solomon codes that were discovered and studied for their applications to list decoding. It is known that for every $ε>0$, and rate $r \in (0,1)$, there exist explicit families of… ▽ More We show that the known list-decoding algorithms for univariate multiplicity and folded Reed-Solomon codes can be made to run in $\tilde{O}(n)$ time. Univariate multiplicity codes and FRS codes are natural variants of Reed-Solomon codes that were discovered and studied for their applications to list decoding. It is known that for every $ε>0$, and rate $r \in (0,1)$, there exist explicit families of these codes that have rate $r$ and can be list decoded from a $(1-r-ε)$ fraction of errors with constant list size in polynomial time (Guruswami & Wang (IEEE Trans. Inform. Theory 2013) and Kopparty, Ron-Zewi, Saraf & Wootters (SIAM J. Comput. 2023)). In this work, we present randomized algorithms that perform the above list-decoding tasks in $\tilde{O}(n)$, where $n$ is the block-length of the code. Our algorithms have two main components. The first component builds upon the lattice-based approach of Alekhnovich (IEEE Trans. Inf. Theory 2005), who designed a $\tilde{O}(n)$ time list-decoding algorithm for Reed-Solomon codes approaching the Johnson radius. As part of the second component, we design $\tilde{O}(n)$ time algorithms for two natural algebraic problems: given a $(m+2)$-variate polynomial $Q(x,y_0,\dots,y_m) = \tilde{Q}(x) + \sum_{i=0}^m Q_i(x)\cdot y_i$ the first algorithm solves order-$m$ linear differential equations of the form $Q\left(x, f(x), \frac{df}{dx}, \dots,\frac{d^m f}{dx^m}\right) \equiv 0$ while the second solves functional equations of the form $Q\left(x, f(x), f(γx), \dots,f(γ^m x)\right) \equiv 0$, where $m$ is an arbitrary constant and $γ$ is a field element of sufficiently high order. These algorithms can be viewed as generalizations of classical $\tilde{O}(n)$ time algorithms of Sieveking (Computing 1972) and Kung (Numer. Math. 1974) for computing the modular inverse of a power series, and might be of independent interest. △ Less

Submitted 12 March, 2024; v1 submitted 29 November, 2023; originally announced November 2023.

Comments: Modified abstract and included references for HRW and KRRSS. At the time of Version [v1], we were unaware of the nearly-linear-time list-decoders of [HRW] and [KRRSW]. This has been addressed in the subsequent versions

arXiv:2311.14335 [pdf, other]

Comparative Analysis of Transformers for Modeling Tabular Data: A Casestudy using Industry Scale Dataset

Authors: Usneek Singh, Piyush Arora, Shamika Ganesan, Mohit Kumar, Siddhant Kulkarni, Salil R. Joshi

Abstract: We perform a comparative analysis of transformer-based models designed for modeling tabular data, specifically on an industry-scale dataset. While earlier studies demonstrated promising outcomes on smaller public or synthetic datasets, the effectiveness did not extend to larger industry-scale datasets. The challenges identified include handling high-dimensional data, the necessity for efficient pr… ▽ More We perform a comparative analysis of transformer-based models designed for modeling tabular data, specifically on an industry-scale dataset. While earlier studies demonstrated promising outcomes on smaller public or synthetic datasets, the effectiveness did not extend to larger industry-scale datasets. The challenges identified include handling high-dimensional data, the necessity for efficient pre-processing of categorical and numerical features, and addressing substantial computational requirements. To overcome the identified challenges, the study conducts an extensive examination of various transformer-based models using both synthetic datasets and the default prediction Kaggle dataset (2022) from American Express. The paper presents crucial insights into optimal data pre-processing, compares pre-training and direct supervised learning methods, discusses strategies for managing categorical and numerical features, and highlights trade-offs between computational resources and performance. Focusing on temporal financial data modeling, the research aims to facilitate the systematic development and deployment of transformer-based models in real-world scenarios, emphasizing scalability. △ Less

Submitted 24 November, 2023; originally announced November 2023.

Comments: Accepted at 7th Joint International Conference on Data Science & Management of Data (11th ACMIKDD CODS and 29th COMAD)

arXiv:2311.12752 [pdf, ps, other]

An Improved Line-Point Low-Degree Test

Authors: Prahladh Harsha, Mrinal Kumar, Ramprasad Saptharishi, Madhu Sudan

Abstract: We prove that the most natural low-degree test for polynomials over finite fields is ``robust'' in the high-error regime for linear-sized fields. Specifically we consider the ``local'' agreement of a function $f: \mathbb{F}_q^m \to \mathbb{F}_q$ from the space of degree-$d$ polynomials, i.e., the expected agreement of the function from univariate degree-$d$ polynomials over a randomly chosen line… ▽ More We prove that the most natural low-degree test for polynomials over finite fields is ``robust'' in the high-error regime for linear-sized fields. Specifically we consider the ``local'' agreement of a function $f: \mathbb{F}_q^m \to \mathbb{F}_q$ from the space of degree-$d$ polynomials, i.e., the expected agreement of the function from univariate degree-$d$ polynomials over a randomly chosen line in $\mathbb{F}_q^m$, and prove that if this local agreement is $ε\geq Ω((\frac{d}{q})^τ))$ for some fixed $τ> 0$, then there is a global degree-$d$ polynomial $Q: \mathbb{F}_q^m \to \mathbb{F}_q$ with agreement nearly $ε$ with $f$. This settles a long-standing open question in the area of low-degree testing, yielding an $O(d)$-query robust test in the ``high-error'' regime (i.e., when $ε< \frac{1}{2}$). The previous results in this space either required $ε> \frac{1}{2}$ (Polishchuk \& Spielman, STOC 1994), or $q = Ω(d^4)$ (Arora \& Sudan, Combinatorica 2003), or needed to measure local distance on $2$-dimensional ``planes'' rather than one-dimensional lines leading to $Ω(d^2)$-query complexity (Raz \& Safra, STOC 1997). Our analysis follows the spirit of most previous analyses in first analyzing the low-variable case ($m = O(1)$) and then ``bootstrapping'' to general multivariate settings. Our main technical novelty is a new analysis in the bivariate setting that exploits a previously known connection between multivariate factorization and finding (or testing) low-degree polynomials, in a non ``black-box'' manner. A second contribution is a bootstrapping analysis which manages to lift analyses for $m=2$ directly to analyses for general $m$, where previous works needed to work with $m = 3$ or $m = 4$ -- arguably this bootstrapping is significantly simpler than those in prior works. △ Less

Submitted 21 November, 2023; originally announced November 2023.

MSC Class: 68Q25; 11T06 ACM Class: I.1.2; E.4

arXiv:2311.07888 [pdf, other]

RoboSense At Edge: Detecting Slip, Crumple and Shape of the Object in Robotic Hand for Teleoprations

Authors: Sudev Kumar Padhi, Mohit Kumar, Debanka Giri, Subidh Ali

Abstract: Slip and crumple detection is essential for performing robust manipulation tasks with a robotic hand (RH) like remote surgery. It has been one of the challenging problems in the robotics manipulation community. In this work, we propose a technique based on machine learning (ML) based techniques to detect the slip, and crumple as well as the shape of an object that is currently held in the robotic… ▽ More Slip and crumple detection is essential for performing robust manipulation tasks with a robotic hand (RH) like remote surgery. It has been one of the challenging problems in the robotics manipulation community. In this work, we propose a technique based on machine learning (ML) based techniques to detect the slip, and crumple as well as the shape of an object that is currently held in the robotic hand. We proposed ML model will detect the slip, crumple, and shape using the force/torque exerted and the angular positions of the actuators present in the RH. The proposed model would be integrated into the loop of a robotic hand(RH) and haptic glove(HG). This would help us to reduce the latency in case of teleoperation △ Less

Submitted 13 November, 2023; originally announced November 2023.

arXiv:2310.08437 [pdf, other]

Cold Start Latency in Serverless Computing: A Systematic Review, Taxonomy, and Future Directions

Authors: Muhammed Golec, Guneet Kaur Walia, Mohit Kumar, Felix Cuadrado, Sukhpal Singh Gill, Steve Uhlig

Abstract: Recently, academics and the corporate sector have paid attention to serverless computing, which enables dynamic scalability and an economic model. In serverless computing, users pay only for the time they actually spend using the resources. Although zero scaling optimises cost and resource utilisation, it is the fundamental reason for the serverless cold start problem. Various academic and corpora… ▽ More Recently, academics and the corporate sector have paid attention to serverless computing, which enables dynamic scalability and an economic model. In serverless computing, users pay only for the time they actually spend using the resources. Although zero scaling optimises cost and resource utilisation, it is the fundamental reason for the serverless cold start problem. Various academic and corporate sector studies are being conducted to tackle the cold start problem, which has large research challenges. To study the "cold start" problem in serverless computing, this article provides a comprehensive literature overview of recent research. In addition, we present a detailed taxonomy of several approaches to addressing the issue of cold start latency in serverless computing. Several academic and industrial organisations have proposed methods for cutting down the cold start time and cold start frequency, and this taxonomy is being used to explore these methods. There are several categories in which a current study on cold start latency is organised: caching and application-level optimization-based solutions, as well as AI/ML-based solutions. We have analysed the current methods and grouped them into categories based on their commonalities and features. Finally, we conclude with a review of current challenges and possible future research directions. △ Less

Submitted 12 October, 2023; originally announced October 2023.

Comments: 34 Pages, 16 Figures

arXiv:2310.08004 [pdf, other]

On the Rational Degree of Boolean Functions and Applications

Authors: Vishnu Iyer, Siddhartha Jain, Matt Kovacs-Deak, Vinayak M. Kumar, Luke Schaeffer, Daochen Wang, Michael Whitmeyer

Abstract: We study a natural complexity measure of Boolean functions known as the (exact) rational degree. For total functions $f$, it is conjectured that $\mathrm{rdeg}(f)$ is polynomially related to $\mathrm{deg}(f)$, where $\mathrm{deg}(f)$ is the Fourier degree. Towards this conjecture, we show that symmetric functions have rational degree at least $\mathrm{deg}(f)/2$ and monotone functions have rationa… ▽ More We study a natural complexity measure of Boolean functions known as the (exact) rational degree. For total functions $f$, it is conjectured that $\mathrm{rdeg}(f)$ is polynomially related to $\mathrm{deg}(f)$, where $\mathrm{deg}(f)$ is the Fourier degree. Towards this conjecture, we show that symmetric functions have rational degree at least $\mathrm{deg}(f)/2$ and monotone functions have rational degree at least $\sqrt{\mathrm{deg}(f)}$. We observe that both of these lower bounds are tight. In addition, we show that all read-once depth-$d$ Boolean formulae have rational degree at least $Ω(\mathrm{deg}(f)^{1/d})$. Furthermore, we show that almost every Boolean function on $n$ variables has rational degree at least $n/2 - O(\sqrt{n})$. In contrast to total functions, we exhibit partial functions that witness unbounded separations between rational and approximate degree, in both directions. As a consequence, we show that for quantum computers, post-selection and bounded-error are incomparable resources in the black-box model. △ Less

Submitted 11 October, 2023; originally announced October 2023.

Comments: 17 pages, 3 figures

arXiv:2310.03884 [pdf, other]

Information Geometry for the Working Information Theorist

Authors: Kumar Vijay Mishra, M. Ashok Kumar, Ting-Kam Leonard Wong

Abstract: Information geometry is a study of statistical manifolds, that is, spaces of probability distributions from a geometric perspective. Its classical information-theoretic applications relate to statistical concepts such as Fisher information, sufficient statistics, and efficient estimators. Today, information geometry has emerged as an interdisciplinary field that finds applications in diverse areas… ▽ More Information geometry is a study of statistical manifolds, that is, spaces of probability distributions from a geometric perspective. Its classical information-theoretic applications relate to statistical concepts such as Fisher information, sufficient statistics, and efficient estimators. Today, information geometry has emerged as an interdisciplinary field that finds applications in diverse areas such as radar sensing, array signal processing, quantum physics, deep learning, and optimal transport. This article presents an overview of essential information geometry to initiate an information theorist, who may be unfamiliar with this exciting area of research. We explain the concepts of divergences on statistical manifolds, generalized notions of distances, orthogonality, and geodesics, thereby paving the way for concrete applications and novel theoretical investigations. We also highlight some recent information-geometric developments, which are of interest to the broader information theory community. △ Less

Submitted 5 October, 2023; originally announced October 2023.

Comments: 12 pages, 3 figures, 1 table

arXiv:2309.09701 [pdf, ps, other]

Deterministic Algorithms for Low Degree Factors of Constant Depth Circuits

Authors: Mrinal Kumar, Varun Ramanathan, Ramprasad Saptharishi

Abstract: For every constant $d$, we design a subexponential time deterministic algorithm that takes as input a multivariate polynomial $f$ given as a constant depth algebraic circuit over the field of rational numbers, and outputs all irreducible factors of $f$ of degree at most $d$ together with their respective multiplicities. Moreover, if $f$ is a sparse polynomial, then the algorithm runs in quasipolyn… ▽ More For every constant $d$, we design a subexponential time deterministic algorithm that takes as input a multivariate polynomial $f$ given as a constant depth algebraic circuit over the field of rational numbers, and outputs all irreducible factors of $f$ of degree at most $d$ together with their respective multiplicities. Moreover, if $f$ is a sparse polynomial, then the algorithm runs in quasipolynomial time. Our results are based on a more fine-grained connection between polynomial identity testing (PIT) and polynomial factorization in the context of constant degree factors and rely on a clean connection between divisibility testing of polynomials and PIT due to Forbes and on subexponential time deterministic PIT algorithms for constant depth algebraic circuits from the recent work of Limaye, Srinivasan and Tavenas. △ Less

Submitted 18 September, 2023; originally announced September 2023.

arXiv:2309.04504 [pdf, other]

Compositional Learning of Visually-Grounded Concepts Using Reinforcement

Authors: Zijun Lin, Haidi Azaman, M Ganesh Kumar, Cheston Tan

Abstract: Children can rapidly generalize compositionally-constructed rules to unseen test sets. On the other hand, deep reinforcement learning (RL) agents need to be trained over millions of episodes, and their ability to generalize to unseen combinations remains unclear. Hence, we investigate the compositional abilities of RL agents, using the task of navigating to specified color-shape targets in synthet… ▽ More Children can rapidly generalize compositionally-constructed rules to unseen test sets. On the other hand, deep reinforcement learning (RL) agents need to be trained over millions of episodes, and their ability to generalize to unseen combinations remains unclear. Hence, we investigate the compositional abilities of RL agents, using the task of navigating to specified color-shape targets in synthetic 3D environments. First, we show that when RL agents are naively trained to navigate to target color-shape combinations, they implicitly learn to decompose the combinations, allowing them to (re-)compose these and succeed at held-out test combinations ("compositional learning"). Second, when agents are pretrained to learn invariant shape and color concepts ("concept learning"), the number of episodes subsequently needed for compositional learning decreased by 20 times. Furthermore, only agents trained on both concept and compositional learning could solve a more complex, out-of-distribution environment in zero-shot fashion. Finally, we verified that only text encoders pretrained on image-text datasets (e.g. CLIP) reduced the number of training episodes needed for our agents to demonstrate compositional learning, and also generalized to 5 unseen colors in zero-shot fashion. Overall, our results are the first to demonstrate that RL agents can be trained to implicitly learn concepts and compositionality, to solve more complex environments in zero-shot fashion. △ Less

Submitted 3 May, 2024; v1 submitted 8 September, 2023; originally announced September 2023.

arXiv:2309.03483 [pdf, other]

DetermiNet: A Large-Scale Diagnostic Dataset for Complex Visually-Grounded Referencing using Determiners

Authors: Clarence Lee, M Ganesh Kumar, Cheston Tan

Abstract: State-of-the-art visual grounding models can achieve high detection accuracy, but they are not designed to distinguish between all objects versus only certain objects of interest. In natural language, in order to specify a particular object or set of objects of interest, humans use determiners such as "my", "either" and "those". Determiners, as an important word class, are a type of schema in natu… ▽ More State-of-the-art visual grounding models can achieve high detection accuracy, but they are not designed to distinguish between all objects versus only certain objects of interest. In natural language, in order to specify a particular object or set of objects of interest, humans use determiners such as "my", "either" and "those". Determiners, as an important word class, are a type of schema in natural language about the reference or quantity of the noun. Existing grounded referencing datasets place much less emphasis on determiners, compared to other word classes such as nouns, verbs and adjectives. This makes it difficult to develop models that understand the full variety and complexity of object referencing. Thus, we have developed and released the DetermiNet dataset , which comprises 250,000 synthetically generated images and captions based on 25 determiners. The task is to predict bounding boxes to identify objects of interest, constrained by the semantics of the given determiner. We find that current state-of-the-art visual grounding models do not perform well on the dataset, highlighting the limitations of existing models on reference and quantification tasks. △ Less

Submitted 7 September, 2023; originally announced September 2023.

Comments: 10 pages, 6 figures

arXiv:2309.01252 [pdf, other]

S2RF: Semantically Stylized Radiance Fields

Authors: Dishani Lahiri, Neeraj Panse, Moneish Kumar

Abstract: We present our method for transferring style from any arbitrary image(s) to object(s) within a 3D scene. Our primary objective is to offer more control in 3D scene stylization, facilitating the creation of customizable and stylized scene images from arbitrary viewpoints. To achieve this, we propose a novel approach that incorporates nearest neighborhood-based loss, allowing for flexible 3D scene r… ▽ More We present our method for transferring style from any arbitrary image(s) to object(s) within a 3D scene. Our primary objective is to offer more control in 3D scene stylization, facilitating the creation of customizable and stylized scene images from arbitrary viewpoints. To achieve this, we propose a novel approach that incorporates nearest neighborhood-based loss, allowing for flexible 3D scene reconstruction while effectively capturing intricate style details and ensuring multi-view consistency. △ Less

Submitted 3 September, 2023; originally announced September 2023.

Comments: AI for 3D Content Creation at International Conference on Computer Vision 2023

arXiv:2308.15960 [pdf, ps, other]

Fusing Pseudo Labels with Weak Supervision for Dynamic Traffic Scenarios

Authors: Harshith Mohan Kumar, Sean Lawrence

Abstract: Advanced Driver Assistance Systems (ADAS) have made significant strides, capitalizing on computer vision to enhance perception and decision-making capabilities. Nonetheless, the adaptation of these systems to diverse traffic scenarios poses challenges due to shifts in data distribution stemming from factors such as location, weather, and road infrastructure. To tackle this, we introduce a weakly-s… ▽ More Advanced Driver Assistance Systems (ADAS) have made significant strides, capitalizing on computer vision to enhance perception and decision-making capabilities. Nonetheless, the adaptation of these systems to diverse traffic scenarios poses challenges due to shifts in data distribution stemming from factors such as location, weather, and road infrastructure. To tackle this, we introduce a weakly-supervised label unification pipeline that amalgamates pseudo labels from a multitude of object detection models trained on heterogeneous datasets. Our pipeline engenders a unified label space through the amalgamation of labels from disparate datasets, rectifying bias and enhancing generalization. We fine-tune multiple object detection models on individual datasets, subsequently crafting a unified dataset featuring pseudo labels, meticulously validated for precision. Following this, we retrain a solitary object detection model using the merged label space, culminating in a resilient model proficient in dynamic traffic scenarios. We put forth a comprehensive evaluation of our approach, employing diverse datasets originating from varied Asian countries, effectively demonstrating its efficacy in challenging road conditions. Notably, our method yields substantial enhancements in object detection performance, culminating in a model with heightened resistance against domain shifts. △ Less

Submitted 30 August, 2023; originally announced August 2023.

Comments: This work was accepted as an extended abstract at the International Conference on Computer Vision (ICCV) 2023 BRAVO Workshop, Paris, France

arXiv:2308.06082 [pdf, other]

Security of XCB and HCTR

Authors: Manish Kumar

Abstract: Tweakable Enciphering Scheme (TES) is a length preserving scheme which provides confidentiality and admissible integrity. XCB (Extended Code Book) is a TES which was introduced in 2004. In 2007, it was modified and security bound was provided. Later, these two versions were referred to as XCBv1 and XCBv2 respectively. XCBv2 was proposed as the IEEE-std 1619.2 2010 for encryption of sector oriented… ▽ More Tweakable Enciphering Scheme (TES) is a length preserving scheme which provides confidentiality and admissible integrity. XCB (Extended Code Book) is a TES which was introduced in 2004. In 2007, it was modified and security bound was provided. Later, these two versions were referred to as XCBv1 and XCBv2 respectively. XCBv2 was proposed as the IEEE-std 1619.2 2010 for encryption of sector oriented storage media. In 2013, first time Security bound of XCBv1 was given and XCBv2's security bound was enhanced. A constant of $2^{22}$ appears in the security bounds of the XCBv1 and XCBv2. We showed that this constant of $2^{22}$ can be reduced to $2^{5}$. Further, we modified the XCB (MXCB) scheme such that it gives better security bound compared to the present XCB scheme. We also analyzed some weak keys attack on XCB and a type of TES known as HCTR (proposed in 2005). We performed distinguishing attack and the hash key recovery attack on HCTR. Next, we analyzed the dependency of the two different keys in HCTR. △ Less

Submitted 11 August, 2023; originally announced August 2023.

Comments: M.Tech Dissertation. Indian Statistical Institute, Kolkata, July 2018

arXiv:2308.04599 [pdf, ps, other]

Determinants vs. Algebraic Branching Programs

Authors: Abhranil Chatterjee, Mrinal Kumar, Ben Lee Volk

Abstract: We show that for every homogeneous polynomial of degree $d$, if it has determinantal complexity at most $s$, then it can be computed by a homogeneous algebraic branching program (ABP) of size at most $O(d^5s)$. Moreover, we show that for $\textit{most}$ homogeneous polynomials, the width of the resulting homogeneous ABP is just $s-1$ and the size is at most $O(ds)$. Thus, for constant degree hom… ▽ More We show that for every homogeneous polynomial of degree $d$, if it has determinantal complexity at most $s$, then it can be computed by a homogeneous algebraic branching program (ABP) of size at most $O(d^5s)$. Moreover, we show that for $\textit{most}$ homogeneous polynomials, the width of the resulting homogeneous ABP is just $s-1$ and the size is at most $O(ds)$. Thus, for constant degree homogeneous polynomials, their determinantal complexity and ABP complexity are within a constant factor of each other and hence, a super-linear lower bound for ABPs for any constant degree polynomial implies a super-linear lower bound on determinantal complexity; this relates two open problems of great interest in algebraic complexity. As of now, super-linear lower bounds for ABPs are known only for polynomials of growing degree, and for determinantal complexity the best lower bounds are larger than the number of variables only by a constant factor. While determinantal complexity and ABP complexity are classically known to be polynomially equivalent, the standard transformation from the former to the latter incurs a polynomial blow up in size in the process, and thus, it was unclear if a super-linear lower bound for ABPs implies a super-linear lower bound on determinantal complexity. In particular, a size preserving transformation from determinantal complexity to ABPs does not appear to have been known prior to this work, even for constant degree polynomials. △ Less

Submitted 8 August, 2023; originally announced August 2023.

arXiv:2307.13603 [pdf]

doi 10.1016/j.iotcps.2023.05.006

Blockchain inspired secure and reliable data exchange architecture for cyber-physical healthcare system 4.0

Authors: Mohit Kumar, Hritu Raj, Nisha Chaurasia, Sukhpal Singh Gill

Abstract: A cyber-physical system is considered to be a collection of strongly coupled communication systems and devices that poses numerous security trials in various industrial applications including healthcare. The security and privacy of patient data is still a big concern because healthcare data is sensitive and valuable, and it is most targeted over the internet. Moreover, from the industrial perspect… ▽ More A cyber-physical system is considered to be a collection of strongly coupled communication systems and devices that poses numerous security trials in various industrial applications including healthcare. The security and privacy of patient data is still a big concern because healthcare data is sensitive and valuable, and it is most targeted over the internet. Moreover, from the industrial perspective, the cyber-physical system plays a crucial role in the exchange of data remotely using sensor nodes in distributed environments. In the healthcare industry, Blockchain technology offers a promising solution to resolve most securities-related issues due to its decentralized, immutability, and transparency properties. In this paper, a blockchain-inspired secure and reliable data exchange architecture is proposed in the cyber-physical healthcare industry 4.0. The proposed system uses the BigchainDB, Tendermint, Inter-Planetary-File-System (IPFS), MongoDB, and AES encryption algorithms to improve Healthcare 4.0. Furthermore, blockchain-enabled secure healthcare architecture for accessing and managing the records between Doctors and Patients is introduced. The development of a blockchain-based Electronic Healthcare Record (EHR) exchange system is purely patient-centric, which means the entire control of data is in the owner's hand which is backed by blockchain for security and privacy. Our experimental results reveal that the proposed architecture is robust to handle more security attacks and can recover the data if 2/3 of nodes are failed. The proposed model is patient-centric, and control of data is in the patient's hand to enhance security and privacy, even system administrators can't access data without user permission. △ Less

Submitted 28 June, 2023; originally announced July 2023.

Journal ref: Internet of Things and Cyber-Physical Systems, Volume 3, 2023, Pages 309-322

arXiv:2307.13602 [pdf]

doi 10.1016/j.iotcps.2023.05.001

Fortaleza: The emergence of a network hub

Authors: Eric Bragion, Habiba Akter, Mohit Kumar, Minxian Xu, Ahmed M. Abdelmoniem, Sukhpal Singh Gill

Abstract: Digitalisation, accelerated by the pandemic, has brought the opportunity for companies to expand their businesses beyond their geographic location and has considerably affected networks around the world. Cloud services have a better acceptance nowadays, and it is foreseen that this industry will grow exponentially in the following years. With more distributed networks that need to support customer… ▽ More Digitalisation, accelerated by the pandemic, has brought the opportunity for companies to expand their businesses beyond their geographic location and has considerably affected networks around the world. Cloud services have a better acceptance nowadays, and it is foreseen that this industry will grow exponentially in the following years. With more distributed networks that need to support customers in different locations, the model of one-single server in big financial centres has become outdated and companies tend to look for alternatives that will meet their needs, and this seems to be the case with Fortaleza, in Brazil. With several submarine cables connections available, the city has stood out as a possible hub to different regions, and this is what this paper explores. Making use of real traffic data through looking glasses, we established a latency classification that ranges from exceptionally low to high and analysed 800 latencies from Roubaix, Fortaleza and Sao Paulo to Miami, Mexico City, Frankfurt, Paris, Milan, Prague, Sao Paulo, Santiago, Buenos Aires and Luanda. We found that non-developed countries have a big dependence on the United States to route Internet traffic. Despite this, Fortaleza proves to be an alternative for serving different regions with relatively low latencies. △ Less

Submitted 28 June, 2023; originally announced July 2023.

Journal ref: Published in Internet of Things and Cyber-Physical Systems, Volume 3, 2023, Pages 272-279

arXiv:2307.10934 [pdf, other]

OCTraN: 3D Occupancy Convolutional Transformer Network in Unstructured Traffic Scenarios

Authors: Aditya Nalgunda Ganesh, Dhruval Pobbathi Badrinath, Harshith Mohan Kumar, Priya SS, Surabhi Narayan

Abstract: Modern approaches for vision-centric environment perception for autonomous navigation make extensive use of self-supervised monocular depth estimation algorithms that output disparity maps. However, when this disparity map is projected onto 3D space, the errors in disparity are magnified, resulting in a depth estimation error that increases quadratically as the distance from the camera increases.… ▽ More Modern approaches for vision-centric environment perception for autonomous navigation make extensive use of self-supervised monocular depth estimation algorithms that output disparity maps. However, when this disparity map is projected onto 3D space, the errors in disparity are magnified, resulting in a depth estimation error that increases quadratically as the distance from the camera increases. Though Light Detection and Ranging (LiDAR) can solve this issue, it is expensive and not feasible for many applications. To address the challenge of accurate ranging with low-cost sensors, we propose, OCTraN, a transformer architecture that uses iterative-attention to convert 2D image features into 3D occupancy features and makes use of convolution and transpose convolution to efficiently operate on spatial information. We also develop a self-supervised training pipeline to generalize the model to any scene by eliminating the need for LiDAR ground truth by substituting it with pseudo-ground truth labels obtained from boosted monocular depth estimation. △ Less

Submitted 20 July, 2023; originally announced July 2023.

Comments: This work was accepted as a spotlight presentation at the Transformers for Vision Workshop @CVPR 2023

arXiv:2307.09610 [pdf, other]

Design and Analysis of Pairing-Friendly Elliptic Curves for Cryptographic Primitives

Authors: Mahender Kumar

Abstract: Elliptic curve cryptography (ECC) is a remarkable mathematical tool that offers the same level of security as traditional public-key cryptography (PKC) with a significantly smaller key size and lower computational requirements. The use of pairing on elliptic curves has emerged as a vibrant field of research that provides enhanced security measures for the next generation of cryptographic systems.… ▽ More Elliptic curve cryptography (ECC) is a remarkable mathematical tool that offers the same level of security as traditional public-key cryptography (PKC) with a significantly smaller key size and lower computational requirements. The use of pairing on elliptic curves has emerged as a vibrant field of research that provides enhanced security measures for the next generation of cryptographic systems. This thesis explores using ECC and Pairing-Based Cryptosystems (PBC) as effective mathematical tools for achieving high-security levels with minimal key size and computation cost. Specifically, the research aims to analyze Pairing-Friendly Elliptic Curves (PF-EC) and their practicality in resource-constrained environments. It proposes solutions to some of the limitations of existing applications of pairing-based cryptography. The thesis begins by presenting a comprehensive framework for constructing PF-EC and evaluating the practical security of several families of pairing-friendly curves. The study then explores the limitations of Identity-Based Encryption (IBE), a recognized application of pairing-based cryptography. It proposes mechanisms to address issues such as key escrow, secure key issuing, user slandering, and key abusing problems. The proposed solutions include an Escrow-Free Identity-Based Encryption (EF-IBE) scheme secured against confidentiality and an Escrow-Free Identity-Based Signature (EF-IBS) scheme that is forgeable and secure. △ Less

Submitted 13 July, 2023; originally announced July 2023.

arXiv:2307.05922 [pdf, ps, other]

Sublinear Message Bounds of Authenticated Implicit Byzantine Agreement

Authors: Manish Kumar, Anisur Rahaman Molla

Abstract: This paper studies the message complexity of authenticated Byzantine agreement (BA) in synchronous, fully-connected distributed networks under an honest majority. We focus on the so-called {\em implicit} Byzantine agreement problem where each node starts with an input value and at the end a non-empty subset of the honest nodes should agree on a common input value by satisfying the BA properties (i… ▽ More This paper studies the message complexity of authenticated Byzantine agreement (BA) in synchronous, fully-connected distributed networks under an honest majority. We focus on the so-called {\em implicit} Byzantine agreement problem where each node starts with an input value and at the end a non-empty subset of the honest nodes should agree on a common input value by satisfying the BA properties (i.e., there can be undecided nodes). We show that a sublinear (in $n$, number of nodes) message complexity BA protocol under honest majority is possible in the standard PKI model when the nodes have access to an unbiased global coin and hash function. In particular, we present a randomized Byzantine agreement algorithm which, with high probability achieves implicit agreement, uses $\tilde{O}(\sqrt{n})$ messages, and runs in $\tilde{O}(1)$ rounds while tolerating $(1/2 - ε)n$ Byzantine nodes for any fixed $ε> 0$, the notation $\Tilde{O}$ hides a $O(\polylog{n})$ factor. The algorithm requires standard cryptographic setup PKI and hash function with a static Byzantine adversary. The algorithm works in the CONGEST model and each node does not need to know the identity of its neighbors, i.e., works in the $KT_0$ model. The message complexity (and also the time complexity) of our algorithm is optimal up to a $\polylog n$ factor, as we show a $Ω(\sqrt{n})$ lower bound on the message complexity. △ Less

Submitted 12 July, 2023; originally announced July 2023.

arXiv:2307.02411 [pdf]

An Approach to Remove Key Escrow Problem in ID-Based Encryption From Pairing

Authors: Mahender Kumar

Abstract: Key escrow refers to storing a copy of a cryptographic key with a trusted third party, typically a government agency or some other organization. Key escrow aims to ensure that law enforcement agencies can access encrypted data when necessary, for example, in criminal investigations or national security matters. However, key escrow also raises concerns about privacy and security. If the trusted thi… ▽ More Key escrow refers to storing a copy of a cryptographic key with a trusted third party, typically a government agency or some other organization. Key escrow aims to ensure that law enforcement agencies can access encrypted data when necessary, for example, in criminal investigations or national security matters. However, key escrow also raises concerns about privacy and security. If the trusted third party is compromised, the stored keys could be exposed, and unauthorized parties could access sensitive information. This could result in a significant breach of privacy and potentially harm national security. In identity-based cryptography, the key escrow problem arises because a trusted third party, the Private Key Generator (PKG), generates the private keys for all users. This means that the PKG has complete control over the private keys, which raises concerns about the security and privacy of the users. To balance security and privacy concerns, some approaches have been proposed to address the key escrow problem. We propose an efficient democratic identity-based encryption model that balances the government's and users' rights while ensuring security and privacy. The key objective of the proposed scheme is to provide the government with authority to monitor unlawful messages while ensuring the user's privacy for their lawful messages. To achieve this, the scheme involves two entities: PKG and PKPO. The user's partial key is escrowed at PKG, while the partial key is stored at PKPO. The latter provides a privacy service to the user by confusing their signature, which the user with their personal information can only unlock. △ Less

Submitted 6 May, 2023; originally announced July 2023.

Comments: M.Tech Dissertation

arXiv:2306.17035 [pdf]

doi 10.1145/3618260.3649611

Relaxed Local Correctability from Local Testing

Authors: Vinayak M. Kumar, Geoffrey Mon

Abstract: We construct the first asymptotically good relaxed locally correctable codes with polylogarithmic query complexity, bringing the upper bound polynomially close to the lower bound of Gur and Lachish (SICOMP 2021). Our result follows from showing that a high-rate locally testable code can boost the block length of a smaller relaxed locally correctable code, while preserving the correcting radius and… ▽ More We construct the first asymptotically good relaxed locally correctable codes with polylogarithmic query complexity, bringing the upper bound polynomially close to the lower bound of Gur and Lachish (SICOMP 2021). Our result follows from showing that a high-rate locally testable code can boost the block length of a smaller relaxed locally correctable code, while preserving the correcting radius and incurring only a modest additive cost in rate and query complexity. We use the locally testable code's tester to check if the amount of corruption in the input is low; if so, we can "zoom-in" to a suitable substring of the input and recurse on the smaller code's local corrector. Hence, iterating this operation with a suitable family of locally testable codes due to Dinur, Evra, Livne, Lubotzky, and Mozes (STOC 2022) yields asymptotically good codes with relaxed local correctability, arbitrarily large block length, and polylogarithmic query complexity. Our codes asymptotically inherit the rate and distance of any locally testable code used in the final invocation of the operation. Therefore, our framework also yields nonexplicit relaxed locally correctable codes with polylogarithmic query complexity that have rate and distance approaching the Gilbert-Varshamov bound. △ Less

Submitted 1 April, 2024; v1 submitted 29 June, 2023; originally announced June 2023.

Comments: 15 pages. Improved exposition, changed notation; to appear in STOC 2024

arXiv:2306.07915 [pdf, other]

Image Captioners Are Scalable Vision Learners Too

Authors: Michael Tschannen, Manoj Kumar, Andreas Steiner, Xiaohua Zhai, Neil Houlsby, Lucas Beyer

Abstract: Contrastive pretraining on image-text pairs from the web is one of the most popular large-scale pretraining strategies for vision backbones, especially in the context of large multimodal models. At the same time, image captioning on this type of data is commonly considered an inferior pretraining strategy. In this paper, we perform a fair comparison of these two pretraining strategies, carefully m… ▽ More Contrastive pretraining on image-text pairs from the web is one of the most popular large-scale pretraining strategies for vision backbones, especially in the context of large multimodal models. At the same time, image captioning on this type of data is commonly considered an inferior pretraining strategy. In this paper, we perform a fair comparison of these two pretraining strategies, carefully matching training data, compute, and model capacity. Using a standard encoder-decoder transformer, we find that captioning alone is surprisingly effective: on classification tasks, captioning produces vision encoders competitive with contrastively pretrained encoders, while surpassing them on vision & language tasks. We further analyze the effect of the model architecture and scale, as well as the pretraining data on the representation quality, and find that captioning exhibits the same or better scaling behavior along these axes. Overall our results show that plain image captioning is a more powerful pretraining strategy than was previously believed. △ Less

Submitted 21 December, 2023; v1 submitted 13 June, 2023; originally announced June 2023.

Comments: Accepted at NeurIPS 2023. v2 adds SugarCrepe results and more ablations, v3 has minor fixes. v4 adds a code link ( https://github.com/google-research/big_vision ). v5 has minor fixes

arXiv:2306.04431 [pdf, other]

Faithful Knowledge Distillation

Authors: Tom A. Lamb, Rudy Brunel, Krishnamurthy DJ Dvijotham, M. Pawan Kumar, Philip H. S. Torr, Francisco Eiras

Abstract: Knowledge distillation (KD) has received much attention due to its success in compressing networks to allow for their deployment in resource-constrained systems. While the problem of adversarial robustness has been studied before in the KD setting, previous works overlook what we term the relative calibration of the student network with respect to its teacher in terms of soft confidences. In parti… ▽ More Knowledge distillation (KD) has received much attention due to its success in compressing networks to allow for their deployment in resource-constrained systems. While the problem of adversarial robustness has been studied before in the KD setting, previous works overlook what we term the relative calibration of the student network with respect to its teacher in terms of soft confidences. In particular, we focus on two crucial questions with regard to a teacher-student pair: (i) do the teacher and student disagree at points close to correctly classified dataset examples, and (ii) is the distilled student as confident as the teacher around dataset examples? These are critical questions when considering the deployment of a smaller student network trained from a robust teacher within a safety-critical setting. To address these questions, we introduce a faithful imitation framework to discuss the relative calibration of confidences and provide empirical and certified methods to evaluate the relative calibration of a student w.r.t. its teacher. Further, to verifiably align the relative calibration incentives of the student to those of its teacher, we introduce faithful distillation. Our experiments on the MNIST, Fashion-MNIST and CIFAR-10 datasets demonstrate the need for such an analysis and the advantages of the increased verifiability of faithful distillation over alternative adversarial distillation methods. △ Less

Submitted 11 August, 2023; v1 submitted 7 June, 2023; originally announced June 2023.

Comments: 7pgs (main content), 4 figures

arXiv:2305.16583 [pdf, other]

Detecting Errors in a Numerical Response via any Regression Model

Authors: Hang Zhou, Jonas Mueller, Mayank Kumar, Jane-Ling Wang, Jing Lei

Abstract: Noise plagues many numerical datasets, where the recorded values in the data may fail to match the true underlying values due to reasons including: erroneous sensors, data entry/processing mistakes, or imperfect human estimates. We consider general regression settings with covariates and a potentially corrupted response whose observed values may contain errors. By accounting for various uncertaint… ▽ More Noise plagues many numerical datasets, where the recorded values in the data may fail to match the true underlying values due to reasons including: erroneous sensors, data entry/processing mistakes, or imperfect human estimates. We consider general regression settings with covariates and a potentially corrupted response whose observed values may contain errors. By accounting for various uncertainties, we introduced veracity scores that distinguish between genuine errors and natural data fluctuations, conditioned on the available covariate information in the dataset. We propose a simple yet efficient filtering procedure for eliminating potential errors, and establish theoretical guarantees for our method. We also contribute a new error detection benchmark involving 5 regression datasets with real-world numerical errors (for which the true values are also known). In this benchmark and additional simulation studies, our method identifies incorrect values with better precision/recall than other approaches. △ Less

Submitted 12 March, 2024; v1 submitted 25 May, 2023; originally announced May 2023.

Showing 1–50 of 327 results for author: Kumar, M