arXiv:2310.20704 [pdf, other]

Limited Data, Unlimited Potential: A Study on ViTs Augmented by Masked Autoencoders

Authors: Srijan Das, Tanmay Jain, Dominick Reilly, Pranav Balaji, Soumyajit Karmakar, Shyam Marjit, Xiang Li, Abhijit Das, Michael S. Ryoo

Abstract: Vision Transformers (ViTs) have become ubiquitous in computer vision. Despite their success, ViTs lack inductive biases, which can make it difficult to train them with limited data. To address this challenge, prior studies suggest training ViTs with self-supervised learning (SSL) and fine-tuning sequentially. However, we observe that jointly optimizing ViTs for the primary task and a Self-Supervised Auxiliary Task (SSAT) is surprisingly beneficial when the amount of training data is limited. We explore the appropriate SSL tasks that can be optimized alongside the primary task, the training schemes for these tasks, and the data scale at which they can be most effective. Our findings reveal that SSAT is a powerful technique that enables ViTs to leverage the unique characteristics of both the self-supervised and primary tasks, achieving better performance than typical ViTs pre-training with SSL and fine-tuning sequentially. Our experiments, conducted on 10 datasets, demonstrate that SSAT significantly improves ViT performance while reducing carbon footprint. We also confirm the effectiveness of SSAT in the video domain for deepfake detection, showcasing its generalizability. Our code is available at https://github.com/dominickrei/Limited-data-vits.

Submitted 27 December, 2023; v1 submitted 31 October, 2023; originally announced October 2023.

Comments: Accepted to WACV 2024

arXiv:2310.01995 [pdf, other]

Development of Machine Vision Approach for Mechanical Component Identification based on its Dimension and Pitch

Authors: Toshit Jain, Faisel Mushtaq, K Ramesh, Sandip Deshmukh, Tathagata Ray, Chandu Parimi, Praveen Tandon, Pramod Kumar Jha

Abstract: In this work, a highly customizable and scalable vision-based system for automation of mechanical assembly lines is described. The proposed system calculates the features that are required to classify and identify the different kinds of bolts that are used in the assembly line. The system describes a novel method of calculating the pitch of the bolt in addition to bolt identification and calculating the dimensions of the bolts. This identification and classification system is extremely lightweight and can be run on bare minimum hardware. The system is very fast, running on the order of milliseconds, so it can be used successfully even if the components are steadily moving on a conveyor. The results show that our system can correctly identify the parts in our dataset with 98% accuracy using the calculated features.

Submitted 3 October, 2023; originally announced October 2023.

Comments: 8 pages

ACM Class: I.4.7

arXiv:2309.14996 [pdf, other]

Implementation-Oblivious Transparent Checkpoint-Restart for MPI

Authors: Yao Xu, Leonid Belyaev, Twinkle Jain, Derek Schafer, Anthony Skjellum, Gene Cooperman

Abstract: This work presents experience with traditional use cases of checkpointing on a novel platform. A single codebase (MANA) transparently checkpoints production workloads for major available MPI implementations: "develop once, run everywhere". The new platform enables application developers to compile their application against any of the available standards-compliant MPI implementations, and test each MPI implementation according to performance or other features.

Submitted 26 September, 2023; originally announced September 2023.

Comments: 17 pages, 4 figures

arXiv:2309.14328 [pdf, other]

doi 10.2312/envirvis.20231100

pyParaOcean: A System for Visual Analysis of Ocean Data

Authors: Toshit Jain, Varun Singh, Vijay Kumar Boda, Upkar Singh, Ingrid Hotz, P. N. Vinayachandran, Vijay Natarajan

Abstract: Visual analysis is well adopted within the field of oceanography for the analysis of model simulations, detection of different phenomena and events, and tracking of dynamic processes. With increasing data sizes and the availability of multivariate dynamic data, there is a growing need for scalable and extensible tools for visualization and interactive exploration. We describe pyParaOcean, a visualization system that supports several tasks routinely used in the visual analysis of ocean data. The system is available as a plugin to Paraview and is hence able to leverage its distributed computing capabilities and its rich set of generic analysis and visualization functionalities. pyParaOcean provides modules to support different visual analysis tasks specific to ocean data, such as eddy identification and salinity movement tracking. These modules are available as Paraview filters and this seamless integration results in a system that is easy to install and use. A case study on the Bay of Bengal illustrates the utility of the system for the study of ocean phenomena and processes.

Submitted 25 September, 2023; originally announced September 2023.

Comments: 8 pages, EnvirVis2023

ACM Class: F.7; I.3.6

Journal ref: envirvis2023

arXiv:2304.08665 [pdf, other]

Insta(nt) Pet Therapy: GAN-generated Images for Therapeutic Social Media Content

Authors: Tanish Jain

Abstract: The positive therapeutic effect of viewing pet images online has been well-studied. However, it is difficult to obtain large-scale production of such content since it relies on pet owners to capture photographs and upload them. I use a Generative Adversarial Network-based framework for the creation of fake pet images at scale. These images are uploaded on an Instagram account where they drive user engagement at levels comparable to those seen with images from accounts with traditional pet photographs, underlining the applicability of the framework to be used for pet-therapy social media content.

Submitted 17 April, 2023; originally announced April 2023.

Comments: 7 pages, 7 figures

arXiv:2210.15030 [pdf]

A Hierarchical Approach to Conditional Random Fields for System Anomaly Detection

Authors: Srishti Mishra, Tvarita Jain, Dinkar Sitaram

Abstract: Anomaly detection to recognize unusual events in large-scale systems in a time-sensitive manner is critical in many industries, e.g., bank fraud, enterprise systems, and medical alerts. Large-scale systems often grow in size and complexity over time, and anomaly detection algorithms need to adapt to changing structures. A hierarchical approach takes advantage of the implicit relationships in complex systems and localized context. The features in complex systems may vary drastically in data distribution, capturing different aspects from multiple data sources, and when put together provide a more complete view of the system. In this paper, two datasets are considered, the first comprising system metrics from machines running on a cloud service, and the second comprising application metrics from a large-scale distributed software system with inherent hierarchies and interconnections amongst its system nodes. A comparison of the changepoint-based PELT algorithm, cognitive learning-based Hierarchical Temporal Memory algorithms, Support Vector Machines and Conditional Random Fields provides a basis for proposing a Hierarchical Global-Local Conditional Random Field approach to accurately capture anomalies in complex systems across various features. Hierarchical algorithms can learn the intricacies of specific features and utilize these in a global abstracted representation to detect anomalous patterns robustly across multi-source feature data and distributed systems. A graphical network analysis on complex systems can further fine-tune datasets to mine relationships based on available features, which can benefit hierarchical models. Furthermore, hierarchical solutions can adapt well to changes at a localized level, learning on new data and changing environments when parts of a system are overhauled, and translate these learnings to a global view of the system over time.

Submitted 28 October, 2022; v1 submitted 26 October, 2022; originally announced October 2022.

Comments: 8 pages, Preprint, This paper was originally written in 2019

arXiv:2208.09484 [pdf]

In Silico Prediction of Blood-Brain Barrier Permeability of Chemical Compounds through Molecular Feature Modeling

Authors: Tanish Jain, Praveen Kumar Pandian Shanmuganathan

Abstract: The introduction of computational techniques to analyze chemical data has given rise to the analytical study of biological systems, known as "bioinformatics". One facet of bioinformatics is using machine learning (ML) technology to detect multivariable trends in various cases. Amongst the most pressing cases is predicting blood-brain barrier (BBB) permeability. The development of new drugs to treat central nervous system disorders presents unique challenges due to poor penetration efficacy across the blood-brain barrier. In this research, we aim to mitigate this problem through an ML model that analyzes chemical features. To do so: (i) First, an overview of the relevant biological systems and processes as well as the use case is given. (ii) Second, an in-depth literature review of existing computational techniques for detecting BBB permeability is undertaken. From there, an aspect unexplored across current techniques is identified and a solution is proposed. (iii) Lastly, a two-part in silico model to quantify likelihood of permeability of drugs with defined features across the BBB through passive diffusion is developed, tested, and reflected on. Testing and validation with the dataset determined the predictive logBB model's mean squared error to be around 0.112 units and the neuroinflammation model's mean squared error to be approximately 0.3 units, outperforming all relevant studies found.

Submitted 18 August, 2022; originally announced August 2022.

Comments: Editor Praveen Kumar Pandian Shanmuganathan, 17 pages, 5 figures

arXiv:2207.02207 [pdf, other]

None Shall Pass: A blockchain-based federated identity management system

Authors: Shlok Gilda, Tanvi Jain, Aashish Dhalla

Abstract: Authentication and authorization of a user's identity are generally done by the service providers or identity providers. However, these centralized systems limit the user's control of their own identity and are prone to massive data leaks due to their centralized nature. We propose a blockchain-based identity management system that authenticates and authorizes users using attribute-based access control policies and privacy-preserving algorithms, ultimately returning control of a user's identity to the user. Our proposed system would use a private blockchain, which would store the re-certification events and data access and authorization requests for users' identities in a secure, verifiable manner, thus ensuring the integrity of the data. This paper suggests a mechanism to digitize documents such as passports, driving licenses, electricity bills, etc., issued by any government authority or other authority in an immutable and secure manner. The data owners are responsible for authenticating and propagating the users' identities as and when needed using the OpenID Connect protocol to enable single sign-on. We use advanced cryptographic algorithms to provide pseudonyms to the users, thus ensuring their privacy. These algorithms also ensure the auditability of transactions as and when required. Our proposed system helps in mitigating some of the issues in the recent privacy debates. The project finds its applications in citizen transfers, inter-country service provision, banks, ownership transfer, etc. The generic framework can also be extended to a consortium of banks, hospitals, etc.

Submitted 5 July, 2022; originally announced July 2022.

Comments: Accepted for publication in "Springer - Lecture Notes in Networks and Systems"

arXiv:2204.02368 [pdf, other]

Too Big to Fail? Active Few-Shot Learning Guided Logic Synthesis

Authors: Animesh Basak Chowdhury, Benjamin Tan, Ryan Carey, Tushit Jain, Ramesh Karri, Siddharth Garg

Abstract: Generating sub-optimal synthesis transformation sequences ("synthesis recipe") is an important problem in logic synthesis. Manually crafted synthesis recipes have poor quality. State-of-the-art machine learning (ML) works to generate synthesis recipes do not scale to large netlists as the models need to be trained from scratch, for which training data is collected using time-consuming synthesis runs. We propose a new approach, Bulls-Eye, that fine-tunes a pre-trained model on past synthesis data to accurately predict the quality of a synthesis recipe for an unseen netlist. This approach achieves 2x-10x run-time improvement and better quality-of-result (QoR) than state-of-the-art machine learning approaches.

Submitted 5 April, 2022; originally announced April 2022.

Comments: 10 pages, 6 Tables, 7 figures

arXiv:2203.11556 [pdf, other]

VQ-Flows: Vector Quantized Local Normalizing Flows

Authors: Sahil Sidheekh, Chris B. Dock, Tushar Jain, Radu Balan, Maneesh K. Singh

Abstract: Normalizing flows provide an elegant approach to generative modeling that allows for efficient sampling and exact density evaluation of unknown data distributions. However, current techniques have significant limitations in their expressivity when the data distribution is supported on a low-dimensional manifold or has a non-trivial topology. We introduce a novel statistical framework for learning a mixture of local normalizing flows as "chart maps" over the data manifold. Our framework augments the expressivity of recent approaches while preserving the signature property of normalizing flows, that they admit exact density evaluation. We learn a suitable atlas of charts for the data manifold via a vector quantized auto-encoder (VQ-AE) and the distributions over them using a conditional flow. We validate experimentally that our probabilistic framework enables existing approaches to better model data distributions over complex manifolds.

Submitted 18 June, 2022; v1 submitted 22 March, 2022; originally announced March 2022.

Comments: Accepted to The 38th Conference on Uncertainty in Artificial Intelligence (UAI) 2022

arXiv:2112.10364 [pdf, other]

NavP: Enabling Navigational Programming for Science Data Processing via Application-Initiated Checkpointing

Authors: Lei Pan, Twinkle Jain

Abstract: Science Data Systems (SDS) handle science data from acquisition through processing to distribution. They are deployed in the Cloud today, and the efficiency of Cloud instance utilization is critical to success. Conventional SDS are unable to take advantage of a cost-effective Amazon EC2 spot market, especially for long-running tasks. Some of the difficulties found in current practice at NASA/JPL are: the lack of a mechanism for app programmers to save valuable partial results for future processing continuation; the heavy weight from using container-based (Singularity) sandboxes with more than 200,000 OS-level files; and the gap between scientists developing algorithms/programs on a laptop and the SDS experts deploying software in Cloud computing or supercomputing. We present a first proof-of-principle of this using NavP (Navigational Programming) and fault-tolerant computing (FTC) in SDS, by employing program state migration facilitated by Checkpoint-Restart (C/R). NavP provides a new navigational view of computations in a distributed world for the application programmers. The DHP (DMTCP Hop and Publish) tool that we developed enables application programmers to navigate the computation among instances or nodes by inserting hop(destination) statements in their app code, and to choose when to publish partial results at stages of their algorithms that they think worthwhile for future continuation. The result of using DHP is that a parallel distributed SDS becomes easier to program and deploy, and this enables more efficient leveraging of the Amazon EC2 Spot market. This technical report describes a high-level design and an initial implementation.

Submitted 20 December, 2021; originally announced December 2021.

ACM Class: D.4.5

arXiv:2110.14455 [pdf, other]

CBIR using Pre-Trained Neural Networks

Authors: Agnel Lazar Alappat, Prajwal Nakhate, Sagar Suman, Ambarish Chandurkar, Varad Pimpalkhute, Tapan Jain

Abstract: Much of the recent research work in image retrieval has focused on using Neural Networks as the core component. Many papers in other domains have shown that training multiple models and then combining their outcomes provides good results. This is because a single Neural Network model may not extract sufficient information from the input. In this paper, we aim to follow a different approach. Instead of using a single model, we use a pretrained Inception V3 model and extract the activation of its last fully connected layer, which forms a low-dimensional representation of the image. This feature matrix is then divided into branches, and separate feature extraction is done for each branch to obtain multiple features flattened into a vector. Such individual vectors are then combined to get a single combined feature. We make use of the CUB200-2011 dataset, which comprises 200 bird classes, to train the model. We achieved a training accuracy of 99.46% and a validation accuracy of 84.56% for the same. On further use of 3-branched global descriptors, we improve the validation accuracy to 88.89%. For this, we made use of the MS-RMAC feature extraction method.

Submitted 27 October, 2021; originally announced October 2021.

arXiv:2108.03272 [pdf, other]

iGibson 2.0: Object-Centric Simulation for Robot Learning of Everyday Household Tasks

Authors: Chengshu Li, Fei Xia, Roberto Martín-Martín, Michael Lingelbach, Sanjana Srivastava, Bokui Shen, Kent Vainio, Cem Gokmen, Gokul Dharan, Tanish Jain, Andrey Kurenkov, C. Karen Liu, Hyowon Gweon, Jiajun Wu, Li Fei-Fei, Silvio Savarese

Abstract: Recent research in embodied AI has been boosted by the use of simulation environments to develop and train robot learning approaches. However, the use of simulation has skewed the attention to tasks that only require what robotics simulators can simulate: motion and physical contact. We present iGibson 2.0, an open-source simulation environment that supports the simulation of a more diverse set of household tasks through three key innovations. First, iGibson 2.0 supports object states, including temperature, wetness level, cleanliness level, and toggled and sliced states, necessary to cover a wider range of tasks. Second, iGibson 2.0 implements a set of predicate logic functions that map the simulator states to logic states like Cooked or Soaked. Additionally, given a logic state, iGibson 2.0 can sample valid physical states that satisfy it. This functionality can generate potentially infinite instances of tasks with minimal effort from the users. The sampling mechanism allows our scenes to be more densely populated with small objects in semantically meaningful locations. Third, iGibson 2.0 includes a virtual reality (VR) interface to immerse humans in its scenes to collect demonstrations. As a result, we can collect demonstrations from humans on these new types of tasks, and use them for imitation learning. We evaluate the new capabilities of iGibson 2.0 to enable robot learning of novel tasks, in the hope of demonstrating the potential of this new simulator to support new research in embodied AI. iGibson 2.0 and its new dataset are publicly available at http://svl.stanford.edu/igibson/.

Submitted 3 November, 2021; v1 submitted 6 August, 2021; originally announced August 2021.

Comments: Accepted at Conference on Robot Learning (CoRL) 2021. Project website: http://svl.stanford.edu/igibson/

arXiv:2107.09123 [pdf, ps, other]

doi 10.1109/COMSNETS53615.2022.9668356

Latency-Memory Optimized Splitting of Convolution Neural Networks for Resource Constrained Edge Devices

Authors: Tanmay Jain, Avaneesh, Rohit Verma, Rajeev Shorey

Abstract: With the increasing reliance of users on smart devices, bringing essential computation to the edge has become a crucial requirement for any type of business. Many such computations utilize Convolution Neural Networks (CNNs) to perform AI tasks, with high resource and computation requirements that are infeasible for edge devices. Splitting the CNN architecture to perform part of the computation on the edge and the remainder on the cloud is an area of research that has seen increasing interest in the field. In this paper, we assert that running CNNs between an edge device and the cloud is synonymous with solving a resource-constrained optimization problem that minimizes the latency and maximizes resource utilization at the edge. We formulate a multi-objective optimization problem and propose the LMOS algorithm to achieve a Pareto efficient solution. Experiments done on real-world edge devices show that LMOS ensures feasible execution of different CNN models at the edge and also improves upon existing state-of-the-art approaches.

Submitted 19 July, 2021; originally announced July 2021.

arXiv:2103.08546 [pdf, other]

Improving scalability and reliability of MPI-agnostic transparent checkpointing for production workloads at NERSC

Authors: Prashant Singh Chouhan, Harsh Khetawat, Neil Resnik, Twinkle Jain, Rohan Garg, Gene Cooperman, Rebecca Hartman-Baker, Zhengji Zhao

Abstract: Checkpoint/restart (C/R) provides fault-tolerant computing capability, enables long-running applications, and provides scheduling flexibility for computing centers to support diverse workloads with different priorities. It is therefore vital to get transparent C/R capability working at NERSC. MANA, by Garg et al., is a transparent checkpointing tool that has been selected due to its MPI-agnostic and network-agnostic approach. However, originally written as a proof-of-concept code, MANA was not ready to use with NERSC's diverse production workloads, which are dominated by MPI and hybrid MPI+OpenMP applications. In this talk, we present ongoing work at NERSC to enable MANA for NERSC's production workloads, including fixing bugs that were exposed by the top applications at NERSC, adding new features to address system changes, evaluating C/R overhead at scale, etc. The lessons learned from making MANA production-ready for HPC applications will be useful for C/R tool developers, supercomputing centers and HPC end-users alike.

Submitted 16 March, 2021; v1 submitted 15 March, 2021; originally announced March 2021.

arXiv:2103.04916 [pdf, other]

Transparent Checkpointing for OpenGL Applications on GPUs

Authors: David Hou, Jun Gan, Yue Li, Younes El Idrissi Yazami, Twinkle Jain

Abstract: This work presents transparent checkpointing of OpenGL applications, refining the split-process technique[1] for application in GPU-based 3D graphics. The split-process technique was earlier applied to checkpointing MPI and CUDA programs, enabling reinitialization of driver libraries. The presented design targets practical, checkpoint-package agnostic checkpointing of OpenGL applications. An early prototype is demonstrated on Autodesk Maya. Maya is a complex proprietary media-creation software suite used with large-scale rendering hardware for CGI (Computer-Generated Animation). Transparent checkpointing of Maya provides critically-needed fault tolerance, since Maya is prone to crash when artists use some of its bleeding-edge components. Artists then lose hours of work in re-creating their complex environment.

Submitted 1 August, 2021; v1 submitted 8 March, 2021; originally announced March 2021.

ACM Class: D.4.5

arXiv:2103.03311 [pdf, ps, other]

Checkpointing SPAdes for Metagenome Assembly: Transparency versus Performance in Production

Authors: Twinkle Jain, Jie Wang

Abstract: The SPAdes assembler for metagenome assembly is a long-running application commonly used at the NERSC supercomputing site. However, NERSC, like many other sites, has a 48-hour limit on resource allocations. The solution is to chain together multiple resource allocations in a single run, using checkpoint-restart. This case study provides insights into the "pain points" in applying a well-known checkpointing package (DMTCP: Distributed MultiThreaded CheckPointing) to long-running production workloads of SPAdes. This work has exposed several bugs and limitations of DMTCP, which were fixed to support the large memory and fragmented intermediate files of SPAdes. But perhaps more interesting for other applications, this work reveals a tension between the transparency goals of DMTCP and performance concerns due to an I/O bottleneck during the checkpointing process when supporting large memory and many files. Suggestions are made for overcoming this I/O bottleneck, which provides important "lessons learned" for similar applications.

Submitted 4 March, 2021; originally announced March 2021.

ACM Class: D.4.5

arXiv:2101.08201 [pdf, other]

Can Taxonomy Help? Improving Semantic Question Matching using Question Taxonomy

Authors: Deepak Gupta, Rajkumar Pujari, Asif Ekbal, Pushpak Bhattacharyya, Anutosh Maitra, Tom Jain, Shubhashis Sengupta

Abstract: In this paper, we propose a hybrid technique for semantic question matching. It uses our proposed two-layered taxonomy for English questions by augmenting state-of-the-art deep learning models with question classes obtained from a deep-learning-based question classifier. Experiments performed on three open-domain datasets demonstrate the effectiveness of our proposed approach. We achieve state-of-the-art results on the partial ordering question ranking (POQR) benchmark dataset. Our empirical analysis shows that coupling standard distributional features (provided by the question encoder) with knowledge from taxonomy is more effective than either deep learning (DL) or taxonomy-based knowledge alone.

Submitted 20 January, 2021; originally announced January 2021.

Comments: Paper was accepted at COLING 2018, presented as a poster

arXiv:2008.10596 [pdf, ps, other]

CRAC: Checkpoint-Restart Architecture for CUDA with Streams and UVM

Authors: Twinkle Jain, Gene Cooperman

Abstract: The share of the top 500 supercomputers with NVIDIA GPUs is now over 25% and continues to grow. While fault tolerance is a critical issue for supercomputing, there does not currently exist an efficient, scalable solution for CUDA applications on NVIDIA GPUs. CRAC (Checkpoint-Restart Architecture for CUDA) is a new checkpoint-restart solution for fault tolerance that supports the full range of CUDA applications. CRAC combines: low runtime overhead (approximately 1% or less); fast checkpoint-restart; support for scalable CUDA streams (for efficient usage of all of the thousands of GPU cores); and support for the full features of Unified Virtual Memory (eliminating the programmer's burden of migrating memory between device and host). CRAC achieves its flexible architecture by segregating application code (checkpointed) and its external GPU communication via non-reentrant CUDA libraries (not checkpointed) within a single process's memory. This eliminates the high overhead of inter-process communication in earlier approaches, and has fewer limitations.

Submitted 24 August, 2020; originally announced August 2020.

Comments: 24 pages, 6 figures, 3 tables; to appear in SC'20: The International Conference for High Performance Computing, Networking, Storage, and Analysis

ACM Class: D.4.5

arXiv:2005.06011 [pdf]

Data Comets: Designing a Visualization Tool for Analyzing Autonomous Aerial Vehicle Logs with Grounded Evaluation

Authors: David Saffo, Aristotelis Leventidis, Twinkle Jain, Michelle A. Borkin, Cody Dunne

Abstract: Autonomous unmanned aerial vehicles are complex systems of hardware, software, and human input. Understanding this complexity is key to their development and operation. Information visualizations already exist for exploring flight logs but comprehensive analyses currently require several disparate and custom tools. This design study helps address the pain points faced by autonomous unmanned aerial vehicle developers and operators. We contribute: a spiral development process model for grounded evaluation visualization development focused on progressively broadening target user involvement and refining user goals; a demonstration of the model as part of developing a deployed and adopted visualization system; a data and task abstraction for developers and operators performing post-flight analysis of autonomous unmanned aerial vehicle logs; the design and implementation of DATA COMETS, an open-source and web-based interactive visualization tool for post-flight log analysis incorporating temporal, geospatial, and multivariate data; and the results of a summative evaluation of the visualization system and our abstractions based on in-the-wild usage. A free copy of this paper and source code are available at osf.io/h4p7g

Submitted 12 May, 2020; originally announced May 2020.

Comments: EuroVis 2020 Full Paper

arXiv:1812.09755 [pdf, other]

Learning when to Communicate at Scale in Multiagent Cooperative and Competitive Tasks

Authors: Amanpreet Singh, Tushar Jain, Sainbayar Sukhbaatar

Abstract: Learning when to communicate and doing that effectively is essential in multi-agent tasks. Recent works show that continuous communication allows efficient training with back-propagation in multi-agent scenarios, but have been restricted to fully-cooperative tasks. In this paper, we present the Individualized Controlled Continuous Communication Model (IC3Net), which has better training efficiency than a simple continuous communication model, and can be applied to semi-cooperative and competitive settings along with the cooperative settings. IC3Net controls continuous communication with a gating mechanism and uses individualized rewards for each agent to gain better performance and scalability while fixing credit assignment issues. Using a variety of tasks including StarCraft BroodWars explore and combat scenarios, we show that our network yields better performance and convergence rates than the baselines as the scale increases. Our results convey that IC3Net agents learn when to communicate based on the scenario and profitability.

Submitted 23 December, 2018; originally announced December 2018.

Comments: Accepted to ICLR 2019

Showing 1–21 of 21 results for author: Jain, T