subscribe to arXiv mailings

CDQuant: Accurate Post-training Weight Quantization of Large Pre-trained Models using Greedy Coordinate Descent

Authors: Pranav Ajit Nair, Arun Sai Suggala

Abstract: Large language models (LLMs) have recently demonstrated remarkable performance across diverse language tasks. But their deployment is often constrained by their substantial computational and storage requirements. Quantization has emerged as a key technique for addressing this challenge, enabling the compression of large models with minimal impact on performance. The recent GPTQ algorithm, a post-t… ▽ More Large language models (LLMs) have recently demonstrated remarkable performance across diverse language tasks. But their deployment is often constrained by their substantial computational and storage requirements. Quantization has emerged as a key technique for addressing this challenge, enabling the compression of large models with minimal impact on performance. The recent GPTQ algorithm, a post-training quantization (PTQ) method, has proven highly effective for compressing LLMs, sparking a wave of research that leverages GPTQ as a core component. Recognizing the pivotal role of GPTQ in the PTQ landscape, we introduce CDQuant, a simple and scalable alternative to GPTQ with improved performance. CDQuant uses coordinate descent to minimize the layer-wise reconstruction loss to achieve high-quality quantized weights. Our algorithm is easy to implement and scales efficiently to models with hundreds of billions of parameters. Through extensive evaluation on the PaLM2 model family, we demonstrate that CDQuant consistently outperforms GPTQ across diverse model sizes and quantization levels. In particular, for INT2 quantization of PaLM2-Otter, CDQuant achieves a 10% reduction in perplexity compared to GPTQ. △ Less

Submitted 26 June, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

arXiv:2405.06702 [pdf, other]

Malayalam Sign Language Identification using Finetuned YOLOv8 and Computer Vision Techniques

Authors: Abhinand K., Abhiram B. Nair, Dhananjay C., Hanan Hamza, Mohammed Fawaz J., Rahma Fahim K., Anoop V. S

Abstract: Technological advancements and innovations are advancing our daily life in all the ways possible but there is a larger section of society who are deprived of accessing the benefits due to their physical inabilities. To reap the real benefits and make it accessible to society, these talented and gifted people should also use such innovations without any hurdles. Many applications developed these da… ▽ More Technological advancements and innovations are advancing our daily life in all the ways possible but there is a larger section of society who are deprived of accessing the benefits due to their physical inabilities. To reap the real benefits and make it accessible to society, these talented and gifted people should also use such innovations without any hurdles. Many applications developed these days address these challenges, but localized communities and other constrained linguistic groups may find it difficult to use them. Malayalam, a Dravidian language spoken in the Indian state of Kerala is one of the twenty-two scheduled languages in India. Recent years have witnessed a surge in the development of systems and tools in Malayalam, addressing the needs of Kerala, but many of them are not empathetically designed to cater to the needs of hearing-impaired people. One of the major challenges is the limited or no availability of sign language data for the Malayalam language and sufficient efforts are not made in this direction. In this connection, this paper proposes an approach for sign language identification for the Malayalam language using advanced deep learning and computer vision techniques. We start by developing a labeled dataset for Malayalam letters and for the identification we use advanced deep learning techniques such as YOLOv8 and computer vision. Experimental results show that the identification accuracy is comparable to other sign language identification systems and other researchers in sign language identification can use the model as a baseline to develop advanced models. △ Less

Submitted 8 May, 2024; originally announced May 2024.

arXiv:2404.13057 [pdf, other]

"Hey..! This medicine made me sick": Sentiment Analysis of User-Generated Drug Reviews using Machine Learning Techniques

Authors: Abhiram B. Nair, Abhinand K., Anamika U., Denil Tom Jaison, Ajitha V., V. S. Anoop

Abstract: Sentiment analysis has become increasingly important in healthcare, especially in the biomedical and pharmaceutical fields. The data generated by the general public on the effectiveness, side effects, and adverse drug reactions are goldmines for different agencies and medicine producers to understand the concerns and reactions of people. Despite the challenge of obtaining datasets on drug-related… ▽ More Sentiment analysis has become increasingly important in healthcare, especially in the biomedical and pharmaceutical fields. The data generated by the general public on the effectiveness, side effects, and adverse drug reactions are goldmines for different agencies and medicine producers to understand the concerns and reactions of people. Despite the challenge of obtaining datasets on drug-related problems, sentiment analysis on this topic would be a significant boon to the field. This project proposes a drug review classification system that classifies user reviews on a particular drug into different classes, such as positive, negative, and neutral. This approach uses a dataset that is collected from publicly available sources containing drug reviews, such as drugs.com. The collected data is manually labeled and verified manually to ensure that the labels are correct. Three pre-trained language models, such as BERT, SciBERT, and BioBERT, are used to obtain embeddings, which were later used as features to different machine learning classifiers such as decision trees, support vector machines, random forests, and also deep learning algorithms such as recurrent neural networks. The performance of these classifiers is quantified using precision, recall, and f1-score, and the results show that the proposed approaches are useful in analyzing the sentiments of people on different drugs. △ Less

Submitted 9 April, 2024; originally announced April 2024.

arXiv:2404.09765 [pdf, other]

Hilti SLAM Challenge 2023: Benchmarking Single + Multi-session SLAM across Sensor Constellations in Construction

Authors: Ashish Devadas Nair, Julien Kindle, Plamen Levchev, Davide Scaramuzza

Abstract: Simultaneous Localization and Mapping systems are a key enabler for positioning in both handheld and robotic applications. The Hilti SLAM Challenges organized over the past years have been successful at benchmarking some of the world's best SLAM Systems with high accuracy. However, more capabilities of these systems are yet to be explored, such as platform agnosticism across varying sensor suites… ▽ More Simultaneous Localization and Mapping systems are a key enabler for positioning in both handheld and robotic applications. The Hilti SLAM Challenges organized over the past years have been successful at benchmarking some of the world's best SLAM Systems with high accuracy. However, more capabilities of these systems are yet to be explored, such as platform agnosticism across varying sensor suites and multi-session SLAM. These factors indirectly serve as an indicator of robustness and ease of deployment in real-world applications. There exists no dataset plus benchmark combination publicly available, which considers these factors combined. The Hilti SLAM Challenge 2023 Dataset and Benchmark addresses this issue. Additionally, we propose a novel fiducial marker design for a pre-surveyed point on the ground to be observable from an off-the-shelf LiDAR mounted on a robot, and an algorithm to estimate its position at mm-level accuracy. Results from the challenge show an increase in overall participation, single-session SLAM systems getting increasingly accurate, successfully operating across varying sensor suites, but relatively few participants performing multi-session SLAM. △ Less

Submitted 15 April, 2024; originally announced April 2024.

arXiv:2404.00710 [pdf, other]

Unknown Prompt, the only Lacuna: Unveiling CLIP's Potential for Open Domain Generalization

Authors: Mainak Singha, Ankit Jha, Shirsha Bose, Ashwin Nair, Moloud Abdar, Biplab Banerjee

Abstract: We delve into Open Domain Generalization (ODG), marked by domain and category shifts between training's labeled source and testing's unlabeled target domains. Existing solutions to ODG face limitations due to constrained generalizations of traditional CNN backbones and errors in detecting target open samples in the absence of prior knowledge. Addressing these pitfalls, we introduce ODG-CLIP, harne… ▽ More We delve into Open Domain Generalization (ODG), marked by domain and category shifts between training's labeled source and testing's unlabeled target domains. Existing solutions to ODG face limitations due to constrained generalizations of traditional CNN backbones and errors in detecting target open samples in the absence of prior knowledge. Addressing these pitfalls, we introduce ODG-CLIP, harnessing the semantic prowess of the vision-language model, CLIP. Our framework brings forth three primary innovations: Firstly, distinct from prevailing paradigms, we conceptualize ODG as a multi-class classification challenge encompassing both known and novel categories. Central to our approach is modeling a unique prompt tailored for detecting unknown class samples, and to train this, we employ a readily accessible stable diffusion model, elegantly generating proxy images for the open class. Secondly, aiming for domain-tailored classification (prompt) weights while ensuring a balance of precision and simplicity, we devise a novel visual stylecentric prompt learning mechanism. Finally, we infuse images with class-discriminative knowledge derived from the prompt space to augment the fidelity of CLIP's visual embeddings. We introduce a novel objective to safeguard the continuity of this infused semantic intel across domains, especially for the shared classes. Through rigorous testing on diverse datasets, covering closed and open-set DG contexts, ODG-CLIP demonstrates clear supremacy, consistently outpacing peers with performance boosts between 8%-16%. Code will be available at https://github.com/mainaksingha01/ODG-CLIP. △ Less

Submitted 31 March, 2024; originally announced April 2024.

Comments: Accepted in CVPR 2024

arXiv:2402.14268 [pdf, other]

Can Large Language Models Detect Misinformation in Scientific News Reporting?

Authors: Yupeng Cao, Aishwarya Muralidharan Nair, Elyon Eyimife, Nastaran Jamalipour Soofi, K. P. Subbalakshmi, John R. Wullert II, Chumki Basu, David Shallcross

Abstract: Scientific facts are often spun in the popular press with the intent to influence public opinion and action, as was evidenced during the COVID-19 pandemic. Automatic detection of misinformation in the scientific domain is challenging because of the distinct styles of writing in these two media types and is still in its nascence. Most research on the validity of scientific reporting treats this pro… ▽ More Scientific facts are often spun in the popular press with the intent to influence public opinion and action, as was evidenced during the COVID-19 pandemic. Automatic detection of misinformation in the scientific domain is challenging because of the distinct styles of writing in these two media types and is still in its nascence. Most research on the validity of scientific reporting treats this problem as a claim verification challenge. In doing so, significant expert human effort is required to generate appropriate claims. Our solution bypasses this step and addresses a more real-world scenario where such explicit, labeled claims may not be available. The central research question of this paper is whether it is possible to use large language models (LLMs) to detect misinformation in scientific reporting. To this end, we first present a new labeled dataset SciNews, containing 2.4k scientific news stories drawn from trusted and untrustworthy sources, paired with related abstracts from the CORD-19 database. Our dataset includes both human-written and LLM-generated news articles, making it more comprehensive in terms of capturing the growing trend of using LLMs to generate popular press articles. Then, we identify dimensions of scientific validity in science news articles and explore how this can be integrated into the automated detection of scientific misinformation. We propose several baseline architectures using LLMs to automatically detect false representations of scientific findings in the popular press. For each of these architectures, we use several prompt engineering strategies including zero-shot, few-shot, and chain-of-thought prompting. We also test these architectures and prompting strategies on GPT-3.5, GPT-4, and Llama2-7B, Llama2-13B. △ Less

Submitted 21 February, 2024; originally announced February 2024.

arXiv:2402.08644 [pdf, other]

Tandem Transformers for Inference Efficient LLMs

Authors: Aishwarya P S, Pranav Ajit Nair, Yashas Samaga, Toby Boyd, Sanjiv Kumar, Prateek Jain, Praneeth Netrapalli

Abstract: The autoregressive nature of conventional large language models (LLMs) inherently limits inference speed, as tokens are generated sequentially. While speculative and parallel decoding techniques attempt to mitigate this, they face limitations: either relying on less accurate smaller models for generation or failing to fully leverage the base LLM's representations. We introduce a novel architectu… ▽ More The autoregressive nature of conventional large language models (LLMs) inherently limits inference speed, as tokens are generated sequentially. While speculative and parallel decoding techniques attempt to mitigate this, they face limitations: either relying on less accurate smaller models for generation or failing to fully leverage the base LLM's representations. We introduce a novel architecture, Tandem transformers, to address these issues. This architecture uniquely combines (1) a small autoregressive model and (2) a large model operating in block mode (processing multiple tokens simultaneously). The small model's predictive accuracy is substantially enhanced by granting it attention to the large model's richer representations. On the PaLM2 pretraining dataset, a tandem of PaLM2-Bison and PaLM2-Gecko demonstrates a 3.3% improvement in next-token prediction accuracy over a standalone PaLM2-Gecko, offering a 1.16x speedup compared to a PaLM2-Otter model with comparable downstream performance. We further incorporate the tandem model within the speculative decoding (SPEED) framework where the large model validates tokens from the small model. This ensures that the Tandem of PaLM2-Bison and PaLM2-Gecko achieves substantial speedup (around 1.14x faster than using vanilla PaLM2-Gecko in SPEED) while maintaining identical downstream task accuracy. △ Less

Submitted 26 March, 2024; v1 submitted 13 February, 2024; originally announced February 2024.

arXiv:2311.10275 [pdf, other]

Telescope: Telemetry at Terabyte Scale

Authors: Alan Nair, Sandeep Kumar, Aravinda Prasad, Andy Rudoff, Sreenivas Subramoney

Abstract: Data-hungry applications that require terabytes of memory have become widespread in recent years. To meet the memory needs of these applications, data centers are embracing tiered memory architectures with near and far memory tiers. Precise, efficient, and timely identification of hot and cold data and their placement in appropriate tiers is critical for performance in such systems. Unfortunately,… ▽ More Data-hungry applications that require terabytes of memory have become widespread in recent years. To meet the memory needs of these applications, data centers are embracing tiered memory architectures with near and far memory tiers. Precise, efficient, and timely identification of hot and cold data and their placement in appropriate tiers is critical for performance in such systems. Unfortunately, the existing state-of-the-art telemetry techniques for hot and cold data detection are ineffective at the terabyte scale. We propose Telescope, a novel technique that profiles different levels of the application's page table tree for fast and efficient identification of hot and cold data. Telescope is based on the observation that, for a memory- and TLB-intensive workload, higher levels of a page table tree are also frequently accessed during a hardware page table walk. Hence, the hotness of the higher levels of the page table tree essentially captures the hotness of its subtrees or address space sub-regions at a coarser granularity. We exploit this insight to quickly converge on even a few megabytes of hot data and efficiently identify several gigabytes of cold data in terabyte-scale applications. Importantly, such a technique can seamlessly scale to petabyte-scale applications. Telescope's telemetry achieves 90%+ precision and recall at just 0.009% single CPU utilization for microbenchmarks with a 5 TB memory footprint. Memory tiering based on Telescope results in 5.6% to 34% throughput improvement for real-world benchmarks with a 1-2 TB memory footprint compared to other state-of-the-art telemetry techniques. △ Less

Submitted 29 November, 2023; v1 submitted 16 November, 2023; originally announced November 2023.

arXiv:2310.11577 [pdf, other]

Studying the Effects of Sex-related Differences on Brain Age Prediction using brain MR Imaging

Authors: Mahsa Dibaji, Neha Gianchandani, Akhil Nair, Mansi Singhal, Roberto Souza, Mariana Bento

Abstract: While utilizing machine learning models, one of the most crucial aspects is how bias and fairness affect model outcomes for diverse demographics. This becomes especially relevant in the context of machine learning for medical imaging applications as these models are increasingly being used for diagnosis and treatment planning. In this paper, we study biases related to sex when developing a machine… ▽ More While utilizing machine learning models, one of the most crucial aspects is how bias and fairness affect model outcomes for diverse demographics. This becomes especially relevant in the context of machine learning for medical imaging applications as these models are increasingly being used for diagnosis and treatment planning. In this paper, we study biases related to sex when developing a machine learning model based on brain magnetic resonance images (MRI). We investigate the effects of sex by performing brain age prediction considering different experimental designs: model trained using only female subjects, only male subjects and a balanced dataset. We also perform evaluation on multiple MRI datasets (Calgary-Campinas(CC359) and CamCAN) to assess the generalization capability of the proposed models. We found disparities in the performance of brain age prediction models when trained on distinct sex subgroups and datasets, in both final predictions and decision making (assessed using interpretability models). Our results demonstrated variations in model generalizability across sex-specific subgroups, suggesting potential biases in models trained on unbalanced datasets. This underlines the critical role of careful experimental design in generating fair and reliable outcomes. △ Less

Submitted 17 October, 2023; originally announced October 2023.

arXiv:2308.00009 [pdf]

A 3D deep learning classifier and its explainability when assessing coronary artery disease

Authors: Wing Keung Cheung, Jeremy Kalindjian, Robert Bell, Arjun Nair, Leon J. Menezes, Riyaz Patel, Simon Wan, Kacy Chou, Jiahang Chen, Ryo Torii, Rhodri H. Davies, James C. Moon, Daniel C. Alexander, Joseph Jacob

Abstract: Early detection and diagnosis of coronary artery disease (CAD) could save lives and reduce healthcare costs. In this study, we propose a 3D Resnet-50 deep learning model to directly classify normal subjects and CAD patients on computed tomography coronary angiography images. Our proposed method outperforms a 2D Resnet-50 model by 23.65%. Explainability is also provided by using a Grad-GAM. Further… ▽ More Early detection and diagnosis of coronary artery disease (CAD) could save lives and reduce healthcare costs. In this study, we propose a 3D Resnet-50 deep learning model to directly classify normal subjects and CAD patients on computed tomography coronary angiography images. Our proposed method outperforms a 2D Resnet-50 model by 23.65%. Explainability is also provided by using a Grad-GAM. Furthermore, we link the 3D CAD classification to a 2D two-class semantic segmentation for improved explainability and accurate abnormality localisation. △ Less

Submitted 29 July, 2023; originally announced August 2023.

arXiv:2307.05435 [pdf, other]

One-Versus-Others Attention: Scalable Multimodal Integration for Clinical Data

Authors: Michal Golovanevsky, Eva Schiller, Akira Nair, Ritambhara Singh, Carsten Eickhoff

Abstract: Multimodal learning models have become increasingly important as they surpass single-modality approaches on diverse tasks ranging from question-answering to autonomous driving. Despite the importance of multimodal learning, existing efforts focus on NLP applications, where the number of modalities is typically less than four (audio, video, text, images). However, data inputs in other domains, such… ▽ More Multimodal learning models have become increasingly important as they surpass single-modality approaches on diverse tasks ranging from question-answering to autonomous driving. Despite the importance of multimodal learning, existing efforts focus on NLP applications, where the number of modalities is typically less than four (audio, video, text, images). However, data inputs in other domains, such as the medical field, may include X-rays, PET scans, MRIs, genetic screening, clinical notes, and more, creating a need for both efficient and accurate information fusion. Many state-of-the-art models rely on pairwise cross-modal attention, which does not scale well for applications with more than three modalities. For $n$ modalities, computing attention will result in $n \choose 2$ operations, potentially requiring considerable amounts of computational resources. To address this, we propose a new domain-neutral attention mechanism, One-Versus-Others (OvO) attention, that scales linearly with the number of modalities and requires only $n$ attention operations, thus offering a significant reduction in computational complexity compared to existing cross-modal attention algorithms. Using three diverse real-world datasets as well as an additional simulation experiment, we show that our method improves performance compared to popular fusion techniques while decreasing computation costs. △ Less

Submitted 4 March, 2024; v1 submitted 11 July, 2023; originally announced July 2023.

arXiv:2305.16820 [pdf, other]

Domain Aligned Prefix Averaging for Domain Generalization in Abstractive Summarization

Authors: Pranav Ajit Nair, Sukomal Pal, Pradeepika Verma

Abstract: Domain generalization is hitherto an underexplored area applied in abstractive summarization. Moreover, most existing works on domain generalization have sophisticated training algorithms. In this paper, we propose a lightweight, weight averaging based, Domain Aligned Prefix Averaging approach to domain generalization for abstractive summarization. Given a number of source domains, our method firs… ▽ More Domain generalization is hitherto an underexplored area applied in abstractive summarization. Moreover, most existing works on domain generalization have sophisticated training algorithms. In this paper, we propose a lightweight, weight averaging based, Domain Aligned Prefix Averaging approach to domain generalization for abstractive summarization. Given a number of source domains, our method first trains a prefix for each one of them. These source prefixes generate summaries for a small number of target domain documents. The similarity of the generated summaries to their corresponding documents is used for calculating weights required to average source prefixes. In DAPA, prefix tuning allows for lightweight finetuning, and weight averaging allows for the computationally efficient addition of new source domains. When evaluated on four diverse summarization domains, DAPA shows comparable or better performance against the baselines, demonstrating the effectiveness of its prefix averaging scheme. △ Less

Submitted 29 May, 2023; v1 submitted 26 May, 2023; originally announced May 2023.

Comments: 13 pages, Accepted to ACL 2023 Findings

arXiv:2305.15108 [pdf, other]

The Role of Output Vocabulary in T2T LMs for SPARQL Semantic Parsing

Authors: Debayan Banerjee, Pranav Ajit Nair, Ricardo Usbeck, Chris Biemann

Abstract: In this work, we analyse the role of output vocabulary for text-to-text (T2T) models on the task of SPARQL semantic parsing. We perform experiments within the the context of knowledge graph question answering (KGQA), where the task is to convert questions in natural language to the SPARQL query language. We observe that the query vocabulary is distinct from human vocabulary. Language Models (LMs)… ▽ More In this work, we analyse the role of output vocabulary for text-to-text (T2T) models on the task of SPARQL semantic parsing. We perform experiments within the the context of knowledge graph question answering (KGQA), where the task is to convert questions in natural language to the SPARQL query language. We observe that the query vocabulary is distinct from human vocabulary. Language Models (LMs) are pre-dominantly trained for human language tasks, and hence, if the query vocabulary is replaced with a vocabulary more attuned to the LM tokenizer, the performance of models may improve. We carry out carefully selected vocabulary substitutions on the queries and find absolute gains in the range of 17% on the GrailQA dataset. △ Less

Submitted 24 May, 2023; originally announced May 2023.

Comments: Accepted as a short paper to ACL 2023 findings

arXiv:2304.11816 [pdf, other]

Multiplierless In-filter Computing for tinyML Platforms

Authors: Abhishek Ramdas Nair, Pallab Kumar Nath, Shantanu Chakrabartty, Chetan Singh Thakur

Abstract: Wildlife conservation using continuous monitoring of environmental factors and biomedical classification, which generate a vast amount of sensor data, is a challenge due to limited bandwidth in the case of remote monitoring. It becomes critical to have classification where data is generated, and only classified data is used for monitoring. We present a novel multiplierless framework for in-filter… ▽ More Wildlife conservation using continuous monitoring of environmental factors and biomedical classification, which generate a vast amount of sensor data, is a challenge due to limited bandwidth in the case of remote monitoring. It becomes critical to have classification where data is generated, and only classified data is used for monitoring. We present a novel multiplierless framework for in-filter acoustic classification using Margin Propagation (MP) approximation used in low-power edge devices deployable in remote areas with limited connectivity. The entire design of this classification framework is based on template-based kernel machine, which include feature extraction and inference, and uses basic primitives like addition/subtraction, shift, and comparator operations, for hardware implementation. Unlike full precision training methods for traditional classification, we use MP-based approximation for training, including backpropagation mitigating approximation errors. The proposed framework is general enough for acoustic classification. However, we demonstrate the hardware friendliness of this framework by implementing a parallel Finite Impulse Response (FIR) filter bank in a kernel machine classifier optimized for a Field Programmable Gate Array (FPGA). The FIR filter acts as the feature extractor and non-linear kernel for the kernel machine implemented using MP approximation and a downsampling method to reduce the order of the filters. The FPGA implementation on Spartan 7 shows that the MP-approximated in-filter kernel machine is more efficient than traditional classification frameworks with just less than 1K slices. △ Less

Submitted 24 April, 2023; originally announced April 2023.

arXiv:2303.16865 [pdf, other]

Legged Robots for Object Manipulation: A Review

Authors: Yifeng Gong, Ge Sun, Aditya Nair, Aditya Bidwai, Raghuram CS, John Grezmak, Guillaume Sartoretti, Kathryn A. Daltorio

Abstract: Legged robots can have a unique role in manipulating objects in dynamic, human-centric, or otherwise inaccessible environments. Although most legged robotics research to date typically focuses on traversing these challenging environments, many legged platform demonstrations have also included "moving an object" as a way of doing tangible work. Legged robots can be designed to manipulate a particul… ▽ More Legged robots can have a unique role in manipulating objects in dynamic, human-centric, or otherwise inaccessible environments. Although most legged robotics research to date typically focuses on traversing these challenging environments, many legged platform demonstrations have also included "moving an object" as a way of doing tangible work. Legged robots can be designed to manipulate a particular type of object (e.g., a cardboard box, a soccer ball, or a larger piece of furniture), by themselves or collaboratively. The objective of this review is to collect and learn from these examples, to both organize the work done so far in the community and highlight interesting open avenues for future work. This review categorizes existing works into four main manipulation methods: object interactions without grasping, manipulation with walking legs, dedicated non-locomotive arms, and legged teams. Each method has different design and autonomy features, which are illustrated by available examples in the literature. Based on a few simplifying assumptions, we further provide quantitative comparisons for the range of possible relative sizes of the manipulated object with respect to the robot. Taken together, these examples suggest new directions for research in legged robot manipulation, such as multifunctional limbs, terrain modeling, or learning-based control, to support a number of new deployments in challenging indoor/outdoor scenarios in warehouses/construction sites, preserved natural areas, and especially for home robotics. △ Less

Submitted 29 March, 2023; originally announced March 2023.

Comments: Preprint of the paper submitted to Frontiers in Mechanical Engineering

arXiv:2303.13284 [pdf, other]

GETT-QA: Graph Embedding based T2T Transformer for Knowledge Graph Question Answering

Authors: Debayan Banerjee, Pranav Ajit Nair, Ricardo Usbeck, Chris Biemann

Abstract: In this work, we present an end-to-end Knowledge Graph Question Answering (KGQA) system named GETT-QA. GETT-QA uses T5, a popular text-to-text pre-trained language model. The model takes a question in natural language as input and produces a simpler form of the intended SPARQL query. In the simpler form, the model does not directly produce entity and relation IDs. Instead, it produces correspondin… ▽ More In this work, we present an end-to-end Knowledge Graph Question Answering (KGQA) system named GETT-QA. GETT-QA uses T5, a popular text-to-text pre-trained language model. The model takes a question in natural language as input and produces a simpler form of the intended SPARQL query. In the simpler form, the model does not directly produce entity and relation IDs. Instead, it produces corresponding entity and relation labels. The labels are grounded to KG entity and relation IDs in a subsequent step. To further improve the results, we instruct the model to produce a truncated version of the KG embedding for each entity. The truncated KG embedding enables a finer search for disambiguation purposes. We find that T5 is able to learn the truncated KG embeddings without any change of loss function, improving KGQA performance. As a result, we report strong results for LC-QuAD 2.0 and SimpleQuestions-Wikidata datasets on end-to-end KGQA over Wikidata. △ Less

Submitted 28 March, 2023; v1 submitted 23 March, 2023; originally announced March 2023.

Comments: 16 pages single column format accepted at ESWC 2023 research track

arXiv:2303.08774 [pdf, other]

GPT-4 Technical Report

Authors: OpenAI, Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, Red Avila, Igor Babuschkin, Suchir Balaji, Valerie Balcom, Paul Baltescu, Haiming Bao, Mohammad Bavarian, Jeff Belgum, Irwan Bello, Jake Berdine, Gabriel Bernadett-Shapiro, Christopher Berner, Lenny Bogdonoff, Oleg Boiko , et al. (256 additional authors not shown)

Abstract: We report the development of GPT-4, a large-scale, multimodal model which can accept image and text inputs and produce text outputs. While less capable than humans in many real-world scenarios, GPT-4 exhibits human-level performance on various professional and academic benchmarks, including passing a simulated bar exam with a score around the top 10% of test takers. GPT-4 is a Transformer-based mo… ▽ More We report the development of GPT-4, a large-scale, multimodal model which can accept image and text inputs and produce text outputs. While less capable than humans in many real-world scenarios, GPT-4 exhibits human-level performance on various professional and academic benchmarks, including passing a simulated bar exam with a score around the top 10% of test takers. GPT-4 is a Transformer-based model pre-trained to predict the next token in a document. The post-training alignment process results in improved performance on measures of factuality and adherence to desired behavior. A core component of this project was developing infrastructure and optimization methods that behave predictably across a wide range of scales. This allowed us to accurately predict some aspects of GPT-4's performance based on models trained with no more than 1/1,000th the compute of GPT-4. △ Less

Submitted 4 March, 2024; v1 submitted 15 March, 2023; originally announced March 2023.

Comments: 100 pages; updated authors list; fixed author names and added citation

arXiv:2302.05813 [pdf, other]

doi 10.1145/3597066.3597090

Lazard-style CAD and Equational Constraints

Authors: James H. Davenport, Akshar S. Nair, Gregory K. Sankaran, Ali K. Uncu

Abstract: McCallum-style Cylindrical Algebra Decomposition (CAD) is a major improvement on the original Collins version, and has had many subsequent advances, notably for total or partial equational constraints. But it suffers from a problem with nullification. The recently-justified Lazard-style CAD does not have this problem. However, transporting the equational constraints work to Lazard-style does reint… ▽ More McCallum-style Cylindrical Algebra Decomposition (CAD) is a major improvement on the original Collins version, and has had many subsequent advances, notably for total or partial equational constraints. But it suffers from a problem with nullification. The recently-justified Lazard-style CAD does not have this problem. However, transporting the equational constraints work to Lazard-style does reintroduce nullification issues. This paper explains the problem, and the solutions to it, based on the second author's Ph.D. thesis and the Brown--McCallum improvement to Lazard. With a single equational constraint, we can gain the same improvements in Lazard-style as in McCallum-style CAD . Moreover, our approach does not fail where McCallum would due to nullification. Unsurprisingly, it does not achieve the same level of improvement as it does in the non-nullified cases. We also consider the case of multiple equational constraints. △ Less

Submitted 7 December, 2023; v1 submitted 11 February, 2023; originally announced February 2023.

Comments: 9 pages

MSC Class: 68W30 ACM Class: I.1.2

Journal ref: Proceedings of ISSAC'23, 2023

arXiv:2302.01483 [pdf, other]

SPADE: Self-supervised Pretraining for Acoustic DisEntanglement

Authors: John Harvill, Jarred Barber, Arun Nair, Ramin Pishehvar

Abstract: Self-supervised representation learning approaches have grown in popularity due to the ability to train models on large amounts of unlabeled data and have demonstrated success in diverse fields such as natural language processing, computer vision, and speech. Previous self-supervised work in the speech domain has disentangled multiple attributes of speech such as linguistic content, speaker identi… ▽ More Self-supervised representation learning approaches have grown in popularity due to the ability to train models on large amounts of unlabeled data and have demonstrated success in diverse fields such as natural language processing, computer vision, and speech. Previous self-supervised work in the speech domain has disentangled multiple attributes of speech such as linguistic content, speaker identity, and rhythm. In this work, we introduce a self-supervised approach to disentangle room acoustics from speech and use the acoustic representation on the downstream task of device arbitration. Our results demonstrate that our proposed approach significantly improves performance over a baseline when labeled training data is scarce, indicating that our pretraining scheme learns to encode room acoustic information while remaining invariant to other attributes of the speech signal. △ Less

Submitted 2 February, 2023; originally announced February 2023.

arXiv:2212.12652 [pdf, other]

STRUDEL: Structured Dialogue Summarization for Dialogue Comprehension

Authors: Borui Wang, Chengcheng Feng, Arjun Nair, Madelyn Mao, Jai Desai, Asli Celikyilmaz, Haoran Li, Yashar Mehdad, Dragomir Radev

Abstract: Abstractive dialogue summarization has long been viewed as an important standalone task in natural language processing, but no previous work has explored the possibility of whether abstractive dialogue summarization can also be used as a means to boost an NLP system's performance on other important dialogue comprehension tasks. In this paper, we propose a novel type of dialogue summarization task… ▽ More Abstractive dialogue summarization has long been viewed as an important standalone task in natural language processing, but no previous work has explored the possibility of whether abstractive dialogue summarization can also be used as a means to boost an NLP system's performance on other important dialogue comprehension tasks. In this paper, we propose a novel type of dialogue summarization task - STRUctured DiaLoguE Summarization - that can help pre-trained language models to better understand dialogues and improve their performance on important dialogue comprehension tasks. We further collect human annotations of STRUDEL summaries over 400 dialogues and introduce a new STRUDEL dialogue comprehension modeling framework that integrates STRUDEL into a graph-neural-network-based dialogue reasoning module over transformer encoder language models to improve their dialogue comprehension abilities. In our empirical experiments on two important downstream dialogue comprehension tasks - dialogue question answering and dialogue response prediction - we show that our STRUDEL dialogue comprehension model can significantly improve the dialogue comprehension performance of transformer encoder language models. △ Less

Submitted 23 December, 2022; originally announced December 2022.

Comments: EMNLP 2022

arXiv:2210.15206 [pdf, other]

Learning on the Job: Self-Rewarding Offline-to-Online Finetuning for Industrial Insertion of Novel Connectors from Vision

Authors: Ashvin Nair, Brian Zhu, Gokul Narayanan, Eugen Solowjow, Sergey Levine

Abstract: Learning-based methods in robotics hold the promise of generalization, but what can be done if a learned policy does not generalize to a new situation? In principle, if an agent can at least evaluate its own success (i.e., with a reward classifier that generalizes well even when the policy does not), it could actively practice the task and finetune the policy in this situation. We study this probl… ▽ More Learning-based methods in robotics hold the promise of generalization, but what can be done if a learned policy does not generalize to a new situation? In principle, if an agent can at least evaluate its own success (i.e., with a reward classifier that generalizes well even when the policy does not), it could actively practice the task and finetune the policy in this situation. We study this problem in the setting of industrial insertion tasks, such as inserting connectors in sockets and setting screws. Existing algorithms rely on precise localization of the connector or socket and carefully managed physical setups, such as assembly lines, to succeed at the task. But in unstructured environments such as homes or even some industrial settings, robots cannot rely on precise localization and may be tasked with previously unseen connectors. Offline reinforcement learning on a variety of connector insertion tasks is a potential solution, but what if the robot is tasked with inserting previously unseen connector? In such a scenario, we will still need methods that can robustly solve such tasks with online practice. One of the main observations we make in this work is that, with a suitable representation learning and domain generalization approach, it can be significantly easier for the reward function to generalize to a new but structurally similar task (e.g., inserting a new type of connector) than for the policy. This means that a learned reward function can be used to facilitate the finetuning of the robot's policy in situations where the policy fails to generalize in zero shot, but the reward function generalizes successfully. We show that such an approach can be instantiated in the real world, pretrained on 50 different connectors, and successfully finetuned to new connectors via the learned reward function. Videos can be viewed at https://sites.google.com/view/learningonthejob △ Less

Submitted 27 February, 2023; v1 submitted 27 October, 2022; originally announced October 2022.

Comments: 10 pages. To be presented at ICRA 2023

arXiv:2210.09366 [pdf]

Bridging the Gap between Artificial Intelligence and Artificial General Intelligence: A Ten Commandment Framework for Human-Like Intelligence

Authors: Ananta Nair, Farnoush Banaei-Kashani

Abstract: The field of artificial intelligence has seen explosive growth and exponential success. The last phase of development showcased deep learnings ability to solve a variety of difficult problems across a multitude of domains. Many of these networks met and exceeded human benchmarks by becoming experts in the domains in which they are trained. Though the successes of artificial intelligence have begun… ▽ More The field of artificial intelligence has seen explosive growth and exponential success. The last phase of development showcased deep learnings ability to solve a variety of difficult problems across a multitude of domains. Many of these networks met and exceeded human benchmarks by becoming experts in the domains in which they are trained. Though the successes of artificial intelligence have begun to overshadow its failures, there is still much that separates current artificial intelligence tools from becoming the exceptional general learners that humans are. In this paper, we identify the ten commandments upon which human intelligence is systematically and hierarchically built. We believe these commandments work collectively to serve as the essential ingredients that lead to the emergence of higher-order cognition and intelligence. This paper discusses a computational framework that could house these ten commandments and suggests new architectural modifications that could lead to the development of smarter, more explainable, and generalizable artificial systems inspired by a neuromorphic approach. △ Less

Submitted 17 October, 2022; originally announced October 2022.

arXiv:2210.06601 [pdf, other]

Generalization with Lossy Affordances: Leveraging Broad Offline Data for Learning Visuomotor Tasks

Authors: Kuan Fang, Patrick Yin, Ashvin Nair, Homer Walke, Gengchen Yan, Sergey Levine

Abstract: The utilization of broad datasets has proven to be crucial for generalization for a wide range of fields. However, how to effectively make use of diverse multi-task data for novel downstream tasks still remains a grand challenge in robotics. To tackle this challenge, we introduce a framework that acquires goal-conditioned policies for unseen temporally extended tasks via offline reinforcement lear… ▽ More The utilization of broad datasets has proven to be crucial for generalization for a wide range of fields. However, how to effectively make use of diverse multi-task data for novel downstream tasks still remains a grand challenge in robotics. To tackle this challenge, we introduce a framework that acquires goal-conditioned policies for unseen temporally extended tasks via offline reinforcement learning on broad data, in combination with online fine-tuning guided by subgoals in learned lossy representation space. When faced with a novel task goal, the framework uses an affordance model to plan a sequence of lossy representations as subgoals that decomposes the original task into easier problems. Learned from the broad data, the lossy representation emphasizes task-relevant information about states and goals while abstracting away redundant contexts that hinder generalization. It thus enables subgoal planning for unseen tasks, provides a compact input to the policy, and facilitates reward shaping during fine-tuning. We show that our framework can be pre-trained on large-scale datasets of robot experiences from prior work and efficiently fine-tuned for novel tasks, entirely from visual inputs without any manual reward engineering. △ Less

Submitted 18 April, 2023; v1 submitted 12 October, 2022; originally announced October 2022.

Comments: CoRL 2022

arXiv:2210.04852 [pdf, other]

Learning Real-world Autonomous Navigation by Self-Supervised Environment Synthesis

Authors: Zifan Xu, Anirudh Nair, Xuesu Xiao, Peter Stone

Abstract: Machine learning approaches have recently enabled autonomous navigation for mobile robots in a data-driven manner. Since most existing learning-based navigation systems are trained with data generated in artificially created training environments, during real-world deployment at scale, it is inevitable that robots will encounter unseen scenarios, which are out of the training distribution and ther… ▽ More Machine learning approaches have recently enabled autonomous navigation for mobile robots in a data-driven manner. Since most existing learning-based navigation systems are trained with data generated in artificially created training environments, during real-world deployment at scale, it is inevitable that robots will encounter unseen scenarios, which are out of the training distribution and therefore lead to poor real-world performance. On the other hand, directly training in the real world is generally unsafe and inefficient. To address this issue, we introduce Self-supervised Environment Synthesis (SES), in which, after real-world deployment with safety and efficiency requirements, autonomous mobile robots can utilize experience from the real-world deployment, reconstruct navigation scenarios, and synthesize representative training environments in simulation. Training in these synthesized environments leads to improved future performance in the real world. The effectiveness of SES at synthesizing representative simulation environments and improving real-world navigation performance is evaluated via a large-scale deployment in a high-fidelity, realistic simulator and a small-scale deployment on a physical robot. △ Less

Submitted 10 October, 2022; originally announced October 2022.

arXiv:2210.04839 [pdf, other]

Benchmarking Reinforcement Learning Techniques for Autonomous Navigation

Authors: Zifan Xu, Bo Liu, Xuesu Xiao, Anirudh Nair, Peter Stone

Abstract: Deep reinforcement learning (RL) has brought many successes for autonomous robot navigation. However, there still exists important limitations that prevent real-world use of RL-based navigation systems. For example, most learning approaches lack safety guarantees; and learned navigation systems may not generalize well to unseen environments. Despite a variety of recent learning techniques to tackl… ▽ More Deep reinforcement learning (RL) has brought many successes for autonomous robot navigation. However, there still exists important limitations that prevent real-world use of RL-based navigation systems. For example, most learning approaches lack safety guarantees; and learned navigation systems may not generalize well to unseen environments. Despite a variety of recent learning techniques to tackle these challenges in general, a lack of an open-source benchmark and reproducible learning methods specifically for autonomous navigation makes it difficult for roboticists to choose what learning methods to use for their mobile robots and for learning researchers to identify current shortcomings of general learning methods for autonomous navigation. In this paper, we identify four major desiderata of applying deep RL approaches for autonomous navigation: (D1) reasoning under uncertainty, (D2) safety, (D3) learning from limited trial-and-error data, and (D4) generalization to diverse and novel environments. Then, we explore four major classes of learning techniques with the purpose of achieving one or more of the four desiderata: memory-based neural network architectures (D1), safe RL (D2), model-based RL (D2, D3), and domain randomization (D4). By deploying these learning techniques in a new open-source large-scale navigation benchmark and real-world environments, we perform a comprehensive study aimed at establishing to what extent can these techniques achieve these desiderata for RL-based navigation systems. △ Less

Submitted 27 June, 2023; v1 submitted 10 October, 2022; originally announced October 2022.

arXiv:2209.05201 [pdf, other]

Proof-Stitch: Proof Combination for Divide and Conquer SAT Solvers

Authors: Abhishek Nair, Saranyu Chattopadhyay, Haoze Wu, Alex Ozdemir, Clark Barrett

Abstract: With the increasing availability of parallel computing power, there is a growing focus on parallelizing algorithms for important automated reasoning problems such as Boolean satisfiability (SAT). Divide-and-Conquer (D&C) is a popular parallel SAT solving paradigm that partitions SAT instances into independent sub-problems which are then solved in parallel. For unsatisfiable instances, state-of-the… ▽ More With the increasing availability of parallel computing power, there is a growing focus on parallelizing algorithms for important automated reasoning problems such as Boolean satisfiability (SAT). Divide-and-Conquer (D&C) is a popular parallel SAT solving paradigm that partitions SAT instances into independent sub-problems which are then solved in parallel. For unsatisfiable instances, state-of-the-art D&C solvers generate DRAT refutations for each sub-problem. However, they do not generate a single refutation for the original instance. To close this gap, we present Proof-Stitch, a procedure for combining refutations of different sub-problems into a single refutation for the original instance. We prove the correctness of the procedure and propose optimizations to reduce the size and checking time of the combined refutations by invoking existing trimming tools in the proof-combination process. We also provide an extensible implementation of the proposed technique. Experiments on instances from last year's SAT competition show that the optimized refutations are checkable up to seven times faster than unoptimized refutations. △ Less

Submitted 4 September, 2022; originally announced September 2022.

Comments: 6 pages

arXiv:2206.15432 [pdf, ps, other]

Challenges and Opportunities in Multi-device Speech Processing

Authors: Gregory Ciccarelli, Jarred Barber, Arun Nair, Israel Cohen, Tao Zhang

Abstract: We review current solutions and technical challenges for automatic speech recognition, keyword spotting, device arbitration, speech enhancement, and source localization in multidevice home environments to provide context for the INTERSPEECH 2022 special session, "Challenges and opportunities for signal processing and machine learning for multiple smart devices". We also identify the datasets neede… ▽ More We review current solutions and technical challenges for automatic speech recognition, keyword spotting, device arbitration, speech enhancement, and source localization in multidevice home environments to provide context for the INTERSPEECH 2022 special session, "Challenges and opportunities for signal processing and machine learning for multiple smart devices". We also identify the datasets needed to support these research areas. Based on the review and our research experience in the multi-device domain, we conclude with an outlook on the future evolution △ Less

Submitted 27 June, 2022; originally announced June 2022.

Comments: Accepted for INTERSPEECH 2022

arXiv:2206.14947 [pdf]

Decision Forest Based EMG Signal Classification with Low Volume Dataset Augmented with Random Variance Gaussian Noise

Authors: Tekin Gunasar, Alexandra Rekesh, Atul Nair, Penelope King, Anastasiya Markova, Jiaqi Zhang, Isabel Tate

Abstract: Electromyography signals can be used as training data by machine learning models to classify various gestures. We seek to produce a model that can classify six different hand gestures with a limited number of samples that generalizes well to a wider audience while comparing the effect of our feature extraction results on model accuracy to other more conventional methods such as the use of AR param… ▽ More Electromyography signals can be used as training data by machine learning models to classify various gestures. We seek to produce a model that can classify six different hand gestures with a limited number of samples that generalizes well to a wider audience while comparing the effect of our feature extraction results on model accuracy to other more conventional methods such as the use of AR parameters on a sliding window across the channels of a signal. We appeal to a set of more elementary methods such as the use of random bounds on a signal, but desire to show the power these methods can carry in an online setting where EMG classification is being conducted, as opposed to more complicated methods such as the use of the Fourier Transform. To augment our limited training data, we used a standard technique, known as jitter, where random noise is added to each observation in a channel wise manner. Once all datasets were produced using the above methods, we performed a grid search with Random Forest and XGBoost to ultimately create a high accuracy model. For human computer interface purposes, high accuracy classification of EMG signals is of particular importance to their functioning and given the difficulty and cost of amassing any sort of biomedical data in a high volume, it is valuable to have techniques that can work with a low amount of high-quality samples with less expensive feature extraction methods that can reliably be carried out in an online application. △ Less

Submitted 29 June, 2022; originally announced June 2022.

arXiv:2205.08129 [pdf, other]

Planning to Practice: Efficient Online Fine-Tuning by Composing Goals in Latent Space

Authors: Kuan Fang, Patrick Yin, Ashvin Nair, Sergey Levine

Abstract: General-purpose robots require diverse repertoires of behaviors to complete challenging tasks in real-world unstructured environments. To address this issue, goal-conditioned reinforcement learning aims to acquire policies that can reach configurable goals for a wide range of tasks on command. However, such goal-conditioned policies are notoriously difficult and time-consuming to train from scratc… ▽ More General-purpose robots require diverse repertoires of behaviors to complete challenging tasks in real-world unstructured environments. To address this issue, goal-conditioned reinforcement learning aims to acquire policies that can reach configurable goals for a wide range of tasks on command. However, such goal-conditioned policies are notoriously difficult and time-consuming to train from scratch. In this paper, we propose Planning to Practice (PTP), a method that makes it practical to train goal-conditioned policies for long-horizon tasks that require multiple distinct types of interactions to solve. Our approach is based on two key ideas. First, we decompose the goal-reaching problem hierarchically, with a high-level planner that sets intermediate subgoals using conditional subgoal generators in the latent space for a low-level model-free policy. Second, we propose a hybrid approach which first pre-trains both the conditional subgoal generator and the policy on previously collected data through offline reinforcement learning, and then fine-tunes the policy via online exploration. This fine-tuning process is itself facilitated by the planned subgoals, which breaks down the original target task into short-horizon goal-reaching tasks that are significantly easier to learn. We conduct experiments in both the simulation and real world, in which the policy is pre-trained on demonstrations of short primitive behaviors and fine-tuned for temporally extended tasks that are unseen in the offline data. Our experimental results show that PTP can generate feasible sequences of subgoals that enable the policy to efficiently solve the target tasks. △ Less

Submitted 18 April, 2023; v1 submitted 17 May, 2022; originally announced May 2022.

arXiv:2205.03530 [pdf, other]

doi 10.24963/ijcai.2022/711

Gigs with Guarantees: Achieving Fair Wage for Food Delivery Workers

Authors: Ashish Nair, Rahul Yadav, Anjali Gupta, Abhijnan Chakraborty, Sayan Ranu, Amitabha Bagchi

Abstract: With the increasing popularity of food delivery platforms, it has become pertinent to look into the working conditions of the 'gig' workers in these platforms, especially providing them fair wages, reasonable working hours, and transparency on work availability. However, any solution to these problems must not degrade customer experience and be cost-effective to ensure that platforms are willing t… ▽ More With the increasing popularity of food delivery platforms, it has become pertinent to look into the working conditions of the 'gig' workers in these platforms, especially providing them fair wages, reasonable working hours, and transparency on work availability. However, any solution to these problems must not degrade customer experience and be cost-effective to ensure that platforms are willing to adopt them. We propose WORK4FOOD, which provides income guarantees to delivery agents, while minimizing platform costs and ensuring customer satisfaction. WORK4FOOD ensures that the income guarantees are met in such a way that it does not lead to increased working hours or degrade environmental impact. To incorporate these objectives, WORK4FOOD balances supply and demand by controlling the number of agents in the system and providing dynamic payment guarantees to agents based on factors such as agent location, ratings, etc. We evaluate WORK4FOOD on a real-world dataset from a leading food delivery platform and establish its advantages over the state of the art in terms of the multi-dimensional objectives at hand. △ Less

Submitted 27 June, 2022; v1 submitted 6 May, 2022; originally announced May 2022.

Comments: Appeared in International Joint Conference on Artificial Intelligence (IJCAI) 2022

arXiv:2204.13060 [pdf, other]

Bisimulation Makes Analogies in Goal-Conditioned Reinforcement Learning

Authors: Philippe Hansen-Estruch, Amy Zhang, Ashvin Nair, Patrick Yin, Sergey Levine

Abstract: Building generalizable goal-conditioned agents from rich observations is a key to reinforcement learning (RL) solving real world problems. Traditionally in goal-conditioned RL, an agent is provided with the exact goal they intend to reach. However, it is often not realistic to know the configuration of the goal before performing a task. A more scalable framework would allow us to provide the agent… ▽ More Building generalizable goal-conditioned agents from rich observations is a key to reinforcement learning (RL) solving real world problems. Traditionally in goal-conditioned RL, an agent is provided with the exact goal they intend to reach. However, it is often not realistic to know the configuration of the goal before performing a task. A more scalable framework would allow us to provide the agent with an example of an analogous task, and have the agent then infer what the goal should be for its current state. We propose a new form of state abstraction called goal-conditioned bisimulation that captures functional equivariance, allowing for the reuse of skills to achieve new goals. We learn this representation using a metric form of this abstraction, and show its ability to generalize to new goals in simulation manipulation tasks. Further, we prove that this learned representation is sufficient not only for goal conditioned tasks, but is amenable to any downstream task described by a state-only reward function. Videos can be found at https://sites.google.com/view/gc-bisimulation. △ Less

Submitted 16 May, 2022; v1 submitted 27 April, 2022; originally announced April 2022.

Comments: ICML 2022. 20 Pages, 15 Figures, 4 Tables. Website at https://sites.google.com/view/gc-bisimulation

MSC Class: 68T07 ACM Class: I.2.8

arXiv:2204.12793 [pdf, other]

doi 10.1145/3477495.3531841

Modern Baselines for SPARQL Semantic Parsing

Authors: Debayan Banerjee, Pranav Ajit Nair, Jivat Neet Kaur, Ricardo Usbeck, Chris Biemann

Abstract: In this work, we focus on the task of generating SPARQL queries from natural language questions, which can then be executed on Knowledge Graphs (KGs). We assume that gold entity and relations have been provided, and the remaining task is to arrange them in the right order along with SPARQL vocabulary, and input tokens to produce the correct SPARQL query. Pre-trained Language Models (PLMs) have not… ▽ More In this work, we focus on the task of generating SPARQL queries from natural language questions, which can then be executed on Knowledge Graphs (KGs). We assume that gold entity and relations have been provided, and the remaining task is to arrange them in the right order along with SPARQL vocabulary, and input tokens to produce the correct SPARQL query. Pre-trained Language Models (PLMs) have not been explored in depth on this task so far, so we experiment with BART, T5 and PGNs (Pointer Generator Networks) with BERT embeddings, looking for new baselines in the PLM era for this task, on DBpedia and Wikidata KGs. We show that T5 requires special input tokenisation, but produces state of the art performance on LC-QuAD 1.0 and LC-QuAD 2.0 datasets, and outperforms task-specific models from previous works. Moreover, the methods enable semantic parsing for questions where a part of the input needs to be copied to the output query, thus enabling a new paradigm in KG semantic parsing. △ Less

Submitted 14 September, 2023; v1 submitted 27 April, 2022; originally announced April 2022.

Comments: 5 pages, short paper, SIGIR 2022

arXiv:2204.08580 [pdf, other]

Automatic Hardware Trojan Insertion using Machine Learning

Authors: Jonathan Cruz, Pravin Gaikwad, Abhishek Nair, Prabuddha Chakraborty, Swarup Bhunia

Abstract: Due to the current horizontal business model that promotes increasing reliance on untrusted third-party Intellectual Properties (IPs), CAD tools, and design facilities, hardware Trojan attacks have become a serious threat to the semiconductor industry. Development of effective countermeasures against hardware Trojan attacks requires: (1) fast and reliable exploration of the viable Trojan attack sp… ▽ More Due to the current horizontal business model that promotes increasing reliance on untrusted third-party Intellectual Properties (IPs), CAD tools, and design facilities, hardware Trojan attacks have become a serious threat to the semiconductor industry. Development of effective countermeasures against hardware Trojan attacks requires: (1) fast and reliable exploration of the viable Trojan attack space for a given design and (2) a suite of high-quality Trojan-inserted benchmarks that meet specific standards. The latter has become essential for the development and evaluation of design/verification solutions to achieve quantifiable assurance against Trojan attacks. While existing static benchmarks provide a baseline for comparing different countermeasures, they only enumerate a limited number of handcrafted Trojans from the complete Trojan design space. To accomplish these dual objectives, in this paper, we present MIMIC, a novel AI-guided framework for automatic Trojan insertion, which can create a large population of valid Trojans for a given design by mimicking the properties of a small set of known Trojans. While there exist tools to automatically insert Trojan instances using fixed Trojan templates, they cannot analyze known Trojan attacks for creating new instances that accurately capture the threat model. MIMIC works in two major steps: (1) it analyzes structural and functional features of existing Trojan populations in a multi-dimensional space to train machine learning models and generate a large number of "virtual Trojans" of the given design, (2) next, it binds them into the design by matching their functional/structural properties with suitable nets of the internal logic structure. We have developed a complete tool flow for MIMIC, extensively evaluated the framework by exploring several use-cases, and quantified its effectiveness to demonstrate highly promising results. △ Less

Submitted 18 April, 2022; originally announced April 2022.

arXiv:2203.16606 [pdf, other]

Enhancing Cancer Prediction in Challenging Screen-Detected Incident Lung Nodules Using Time-Series Deep Learning

Authors: Shahab Aslani, Pavan Alluri, Eyjolfur Gudmundsson, Edward Chandy, John McCabe, Anand Devaraj, Carolyn Horst, Sam M Janes, Rahul Chakkara, Arjun Nair, Daniel C Alexander, SUMMIT consortium, Joseph Jacob

Abstract: Lung cancer is the leading cause of cancer-related mortality worldwide. Lung cancer screening (LCS) using annual low-dose computed tomography (CT) scanning has been proven to significantly reduce lung cancer mortality by detecting cancerous lung nodules at an earlier stage. Improving risk stratification of malignancy risk in lung nodules can be enhanced using machine/deep learning algorithms. Howe… ▽ More Lung cancer is the leading cause of cancer-related mortality worldwide. Lung cancer screening (LCS) using annual low-dose computed tomography (CT) scanning has been proven to significantly reduce lung cancer mortality by detecting cancerous lung nodules at an earlier stage. Improving risk stratification of malignancy risk in lung nodules can be enhanced using machine/deep learning algorithms. However most existing algorithms: a) have primarily assessed single time-point CT data alone thereby failing to utilize the inherent advantages contained within longitudinal imaging datasets; b) have not integrated into computer models pertinent clinical data that might inform risk prediction; c) have not assessed algorithm performance on the spectrum of nodules that are most challenging for radiologists to interpret and where assistance from analytic tools would be most beneficial. Here we show the performance of our time-series deep learning model (DeepCAD-NLM-L) which integrates multi-model information across three longitudinal data domains: nodule-specific, lung-specific, and clinical demographic data. We compared our time-series deep learning model to a) radiologist performance on CTs from the National Lung Screening Trial enriched with the most challenging nodules for diagnosis; b) a nodule management algorithm from a North London LCS study (SUMMIT). Our model demonstrated comparable and complementary performance to radiologists when interpreting challenging lung nodules and showed improved performance (AUC=88\%) against models utilizing single time-point data only. The results emphasise the importance of time-series, multi-modal analysis when interpreting malignancy risk in LCS. △ Less

Submitted 30 March, 2022; originally announced March 2022.

arXiv:2203.15041 [pdf, other]

Socially Compliant Navigation Dataset (SCAND): A Large-Scale Dataset of Demonstrations for Social Navigation

Authors: Haresh Karnan, Anirudh Nair, Xuesu Xiao, Garrett Warnell, Soeren Pirk, Alexander Toshev, Justin Hart, Joydeep Biswas, Peter Stone

Abstract: Social navigation is the capability of an autonomous agent, such as a robot, to navigate in a 'socially compliant' manner in the presence of other intelligent agents such as humans. With the emergence of autonomously navigating mobile robots in human populated environments (e.g., domestic service robots in homes and restaurants and food delivery robots on public sidewalks), incorporating socially… ▽ More Social navigation is the capability of an autonomous agent, such as a robot, to navigate in a 'socially compliant' manner in the presence of other intelligent agents such as humans. With the emergence of autonomously navigating mobile robots in human populated environments (e.g., domestic service robots in homes and restaurants and food delivery robots on public sidewalks), incorporating socially compliant navigation behaviors on these robots becomes critical to ensuring safe and comfortable human robot coexistence. To address this challenge, imitation learning is a promising framework, since it is easier for humans to demonstrate the task of social navigation rather than to formulate reward functions that accurately capture the complex multi objective setting of social navigation. The use of imitation learning and inverse reinforcement learning to social navigation for mobile robots, however, is currently hindered by a lack of large scale datasets that capture socially compliant robot navigation demonstrations in the wild. To fill this gap, we introduce Socially CompliAnt Navigation Dataset (SCAND) a large scale, first person view dataset of socially compliant navigation demonstrations. Our dataset contains 8.7 hours, 138 trajectories, 25 miles of socially compliant, human teleoperated driving demonstrations that comprises multi modal data streams including 3D lidar, joystick commands, odometry, visual and inertial information, collected on two morphologically different mobile robots a Boston Dynamics Spot and a Clearpath Jackal by four different human demonstrators in both indoor and outdoor environments. We additionally perform preliminary analysis and validation through real world robot experiments and show that navigation policies learned by imitation learning on SCAND generate socially compliant behaviors △ Less

Submitted 8 June, 2022; v1 submitted 28 March, 2022; originally announced March 2022.

Journal ref: Robotics and Automation Letters (RA-L) 2022

arXiv:2203.08849 [pdf, other]

doi 10.1609/aaai.v36i11.21447

FairFoody: Bringing in Fairness in Food Delivery

Authors: Anjali Gupta, Rahul Yadav, Ashish Nair, Abhijnan Chakraborty, Sayan Ranu, Amitabha Bagchi

Abstract: Along with the rapid growth and rise to prominence of food delivery platforms, concerns have also risen about the terms of employment of the gig workers underpinning this growth. Our analysis on data derived from a real-world food delivery platform across three large cities from India show that there is significant inequality in the money delivery agents earn. In this paper, we formulate the probl… ▽ More Along with the rapid growth and rise to prominence of food delivery platforms, concerns have also risen about the terms of employment of the gig workers underpinning this growth. Our analysis on data derived from a real-world food delivery platform across three large cities from India show that there is significant inequality in the money delivery agents earn. In this paper, we formulate the problem of fair income distribution among agents while also ensuring timely food delivery. We establish that the problem is not only NP-hard but also inapproximable in polynomial time. We overcome this computational bottleneck through a novel matching algorithm called FairFoody. Extensive experiments over real-world food delivery datasets show FairFoody imparts up to 10 times improvement in equitable income distribution when compared to baseline strategies, while also ensuring minimal impact on customer experience. △ Less

Submitted 25 April, 2022; v1 submitted 16 March, 2022; originally announced March 2022.

Comments: Appeared in Thirty-Sixth AAAI Conference on Artificial Intelligence (AAAI) 2022

arXiv:2112.08713 [pdf, other]

doi 10.18653/v1/2022.naacl-main.415

CONFIT: Toward Faithful Dialogue Summarization with Linguistically-Informed Contrastive Fine-tuning

Authors: Xiangru Tang, Arjun Nair, Borui Wang, Bingyao Wang, Jai Desai, Aaron Wade, Haoran Li, Asli Celikyilmaz, Yashar Mehdad, Dragomir Radev

Abstract: Factual inconsistencies in generated summaries severely limit the practical applications of abstractive dialogue summarization. Although significant progress has been achieved by using pre-trained models, substantial amounts of hallucinated content are found during the human evaluation. Pre-trained models are most commonly fine-tuned with cross-entropy loss for text summarization, which may not be… ▽ More Factual inconsistencies in generated summaries severely limit the practical applications of abstractive dialogue summarization. Although significant progress has been achieved by using pre-trained models, substantial amounts of hallucinated content are found during the human evaluation. Pre-trained models are most commonly fine-tuned with cross-entropy loss for text summarization, which may not be an optimal strategy. In this work, we provide a typology of factual errors with annotation data to highlight the types of errors and move away from a binary understanding of factuality. We further propose a training strategy that improves the factual consistency and overall quality of summaries via a novel contrastive fine-tuning, called ConFiT. Based on our linguistically-informed typology of errors, we design different modular objectives that each target a specific type. Specifically, we utilize hard negative samples with errors to reduce the generation of factual inconsistency. In order to capture the key information between speakers, we also design a dialogue-specific loss. Using human evaluation and automatic faithfulness metrics, we show that our model significantly reduces all kinds of factual errors on the dialogue summarization, SAMSum corpus. Moreover, our model could be generalized to the meeting summarization, AMI corpus, and it produces significantly higher scores than most of the baselines on both datasets regarding word-overlap metrics. △ Less

Submitted 9 July, 2022; v1 submitted 16 December, 2021; originally announced December 2021.

Journal ref: NAACL 2022

arXiv:2111.12548 [pdf, other]

AutoDC: Automated data-centric processing

Authors: Zac Yung-Chun Liu, Shoumik Roychowdhury, Scott Tarlow, Akash Nair, Shweta Badhe, Tejas Shah

Abstract: AutoML (automated machine learning) has been extensively developed in the past few years for the model-centric approach. As for the data-centric approach, the processes to improve the dataset, such as fixing incorrect labels, adding examples that represent edge cases, and applying data augmentation, are still very artisanal and expensive. Here we develop an automated data-centric tool (AutoDC), si… ▽ More AutoML (automated machine learning) has been extensively developed in the past few years for the model-centric approach. As for the data-centric approach, the processes to improve the dataset, such as fixing incorrect labels, adding examples that represent edge cases, and applying data augmentation, are still very artisanal and expensive. Here we develop an automated data-centric tool (AutoDC), similar to the purpose of AutoML, aims to speed up the dataset improvement processes. In our preliminary tests on 3 open source image classification datasets, AutoDC is estimated to reduce roughly 80% of the manual time for data improvement tasks, at the same time, improve the model accuracy by 10-15% with the fixed ML code. △ Less

Submitted 22 November, 2021; originally announced November 2021.

Comments: NeurIPS 2021- Data-Centric AI (DCAI) workshop

arXiv:2111.11998 [pdf, other]

Appliance Level Short-term Load Forecasting via Recurrent Neural Network

Authors: Yuqi Zhou, Arun Sukumaran Nair, David Ganger, Abhinandan Tripathi, Chaitanya Baone, Hao Zhu

Abstract: Accurate load forecasting is critical for electricity market operations and other real-time decision-making tasks in power systems. This paper considers the short-term load forecasting (STLF) problem for residential customers within a community. Existing STLF work mainly focuses on forecasting the aggregated load for either a feeder system or a single customer, but few efforts have been made on fo… ▽ More Accurate load forecasting is critical for electricity market operations and other real-time decision-making tasks in power systems. This paper considers the short-term load forecasting (STLF) problem for residential customers within a community. Existing STLF work mainly focuses on forecasting the aggregated load for either a feeder system or a single customer, but few efforts have been made on forecasting the load at individual appliance level. In this work, we present an STLF algorithm for efficiently predicting the power consumption of individual electrical appliances. The proposed method builds upon a powerful recurrent neural network (RNN) architecture in deep learning, termed as long short-term memory (LSTM). As each appliance has uniquely repetitive consumption patterns, the patterns of prediction error will be tracked such that past prediction errors can be used for improving the final prediction performance. Numerical tests on real-world load datasets demonstrate the improvement of the proposed method over existing LSTM-based method and other benchmark approaches. △ Less

Submitted 23 November, 2021; originally announced November 2021.

arXiv:2110.06169 [pdf, other]

Offline Reinforcement Learning with Implicit Q-Learning

Authors: Ilya Kostrikov, Ashvin Nair, Sergey Levine

Abstract: Offline reinforcement learning requires reconciling two conflicting aims: learning a policy that improves over the behavior policy that collected the dataset, while at the same time minimizing the deviation from the behavior policy so as to avoid errors due to distributional shift. This trade-off is critical, because most current offline reinforcement learning methods need to query the value of un… ▽ More Offline reinforcement learning requires reconciling two conflicting aims: learning a policy that improves over the behavior policy that collected the dataset, while at the same time minimizing the deviation from the behavior policy so as to avoid errors due to distributional shift. This trade-off is critical, because most current offline reinforcement learning methods need to query the value of unseen actions during training to improve the policy, and therefore need to either constrain these actions to be in-distribution, or else regularize their values. We propose an offline RL method that never needs to evaluate actions outside of the dataset, but still enables the learned policy to improve substantially over the best behavior in the data through generalization. The main insight in our work is that, instead of evaluating unseen actions from the latest policy, we can approximate the policy improvement step implicitly by treating the state value function as a random variable, with randomness determined by the action (while still integrating over the dynamics to avoid excessive optimism), and then taking a state conditional upper expectile of this random variable to estimate the value of the best actions in that state. This leverages the generalization capacity of the function approximator to estimate the value of the best available action at a given state without ever directly querying a Q-function with this unseen action. Our algorithm alternates between fitting this upper expectile value function and backing it up into a Q-function. Then, we extract the policy via advantage-weighted behavioral cloning. We dub our method implicit Q-learning (IQL). IQL demonstrates the state-of-the-art performance on D4RL, a standard benchmark for offline reinforcement learning. We also demonstrate that IQL achieves strong performance fine-tuning using online interaction after offline initialization. △ Less

Submitted 12 October, 2021; originally announced October 2021.

arXiv:2110.05688 [pdf]

Inclusive Design: Accessibility Settings for People with Cognitive Disabilities

Authors: Trae Waggoner, Julia Ann Jose, Ashwin Nair, Sudarsan Manikandan

Abstract: The advancement of technology has progressed faster than any other field in the world and with the development of these new technologies, it is important to make sure that these tools can be used by everyone, including people with disabilities. Accessibility options in computing devices help ensure that everyone has the same access to advanced technologies. Unfortunately, for those who require mor… ▽ More The advancement of technology has progressed faster than any other field in the world and with the development of these new technologies, it is important to make sure that these tools can be used by everyone, including people with disabilities. Accessibility options in computing devices help ensure that everyone has the same access to advanced technologies. Unfortunately, for those who require more unique and sometimes challenging accommodations, such as people with Amyotrophic lateral sclerosis ( ALS), the most commonly used accessibility features are simply not enough. While assistive technology for those with ALS does exist, it requires multiple peripheral devices that can become quite expensive collectively. The purpose of this paper is to suggest a more affordable and readily available option for ALS assistive technology that can be implemented on a smartphone or tablet. △ Less

Submitted 11 October, 2021; originally announced October 2021.

arXiv:2110.04286 [pdf, other]

Is MC Dropout Bayesian?

Authors: Loic Le Folgoc, Vasileios Baltatzis, Sujal Desai, Anand Devaraj, Sam Ellis, Octavio E. Martinez Manzanera, Arjun Nair, Huaqi Qiu, Julia Schnabel, Ben Glocker

Abstract: MC Dropout is a mainstream "free lunch" method in medical imaging for approximate Bayesian computations (ABC). Its appeal is to solve out-of-the-box the daunting task of ABC and uncertainty quantification in Neural Networks (NNs); to fall within the variational inference (VI) framework; and to propose a highly multimodal, faithful predictive posterior. We question the properties of MC Dropout for… ▽ More MC Dropout is a mainstream "free lunch" method in medical imaging for approximate Bayesian computations (ABC). Its appeal is to solve out-of-the-box the daunting task of ABC and uncertainty quantification in Neural Networks (NNs); to fall within the variational inference (VI) framework; and to propose a highly multimodal, faithful predictive posterior. We question the properties of MC Dropout for approximate inference, as in fact MC Dropout changes the Bayesian model; its predictive posterior assigns $0$ probability to the true model on closed-form benchmarks; the multimodality of its predictive posterior is not a property of the true predictive posterior but a design artefact. To address the need for VI on arbitrary models, we share a generic VI engine within the pytorch framework. The code includes a carefully designed implementation of structured (diagonal plus low-rank) multivariate normal variational families, and mixtures thereof. It is intended as a go-to no-free-lunch approach, addressing shortcomings of mean-field VI with an adjustable trade-off between expressivity and computational complexity. △ Less

Submitted 8 October, 2021; originally announced October 2021.

arXiv:2110.02571 [pdf, other]

Simulation of Derivatives Post-Trade Services using an Authoritative Data Store and the ISDA Common Domain Model

Authors: Vikram A. Bakshi, Aishwarya Nair, Lee Braine

Abstract: In this paper, we present a summary of the design and implementation of a simulation of post-trade services for interest rate swaps, from execution to maturity. We use an authoritative data store (ADS) and the International Swaps and Derivatives Association (ISDA) Common Domain Model (CDM) to simulate a potential future architecture. We start by providing a brief overview of the CDM and the lifecy… ▽ More In this paper, we present a summary of the design and implementation of a simulation of post-trade services for interest rate swaps, from execution to maturity. We use an authoritative data store (ADS) and the International Swaps and Derivatives Association (ISDA) Common Domain Model (CDM) to simulate a potential future architecture. We start by providing a brief overview of the CDM and the lifecycle of an interest rate swap. We then compare our simulated future state architecture with a typical current state architecture. Next, we present the key requirements of the simulated system, several suitable design patterns, and a summary of the implementation. The simulation uses the CDM to address the industry problems of inconsistent processes and inconsistent data, and an authoritative data store to address the industry problem of duplicated data. △ Less

Submitted 6 October, 2021; originally announced October 2021.

Comments: 16 pages, 4 figures

arXiv:2109.06171 [pdf, other]

doi 10.1109/JIOT.2021.3109739

In-filter Computing For Designing Ultra-light Acoustic Pattern Recognizers

Authors: Abhishek Ramdas Nair, Shantanu Chakrabartty, Chetan Singh Thakur

Abstract: We present a novel in-filter computing framework that can be used for designing ultra-light acoustic classifiers for use in smart internet-of-things (IoTs). Unlike a conventional acoustic pattern recognizer, where the feature extraction and classification are designed independently, the proposed architecture integrates the convolution and nonlinear filtering operations directly into the kernels of… ▽ More We present a novel in-filter computing framework that can be used for designing ultra-light acoustic classifiers for use in smart internet-of-things (IoTs). Unlike a conventional acoustic pattern recognizer, where the feature extraction and classification are designed independently, the proposed architecture integrates the convolution and nonlinear filtering operations directly into the kernels of a Support Vector Machine (SVM). The result of this integration is a template-based SVM whose memory and computational footprint (training and inference) is light enough to be implemented on an FPGA-based IoT platform. While the proposed in-filter computing framework is general enough, in this paper, we demonstrate this concept using a Cascade of Asymmetric Resonator with Inner Hair Cells (CAR-IHC) based acoustic feature extraction algorithm. The complete system has been optimized using time-multiplexing and parallel-pipeline techniques for a Xilinx Spartan 7 series Field Programmable Gate Array (FPGA). We show that the system can achieve robust classification performance on benchmark sound recognition tasks using only ~ 1.5k Look-Up Tables (LUTs) and ~ 2.8k Flip-Flops (FFs), a significant improvement over other approaches. △ Less

Submitted 11 September, 2021; originally announced September 2021.

Comments: in IEEE Internet of Things Journal

arXiv:2108.05494 [pdf]

A Mathematical Approach to Constraining Neural Abstraction and the Mechanisms Needed to Scale to Higher-Order Cognition

Authors: Ananta Nair

Abstract: Artificial intelligence has made great strides in the last decade but still falls short of the human brain, the best-known example of intelligence. Not much is known of the neural processes that allow the brain to make the leap to achieve so much from so little beyond its ability to create knowledge structures that can be flexibly and dynamically combined, recombined, and applied in new and novel… ▽ More Artificial intelligence has made great strides in the last decade but still falls short of the human brain, the best-known example of intelligence. Not much is known of the neural processes that allow the brain to make the leap to achieve so much from so little beyond its ability to create knowledge structures that can be flexibly and dynamically combined, recombined, and applied in new and novel ways. This paper proposes a mathematical approach using graph theory and spectral graph theory, to hypothesize how to constrain these neural clusters of information based on eigen-relationships. This same hypothesis is hierarchically applied to scale up from the smallest to the largest clusters of knowledge that eventually lead to model building and reasoning. △ Less

Submitted 11 August, 2021; originally announced August 2021.

arXiv:2108.05386 [pdf, other]

The Pitfalls of Sample Selection: A Case Study on Lung Nodule Classification

Authors: Vasileios Baltatzis, Kyriaki-Margarita Bintsi, Loic Le Folgoc, Octavio E. Martinez Manzanera, Sam Ellis, Arjun Nair, Sujal Desai, Ben Glocker, Julia A. Schnabel

Abstract: Using publicly available data to determine the performance of methodological contributions is important as it facilitates reproducibility and allows scrutiny of the published results. In lung nodule classification, for example, many works report results on the publicly available LIDC dataset. In theory, this should allow a direct comparison of the performance of proposed methods and assess the imp… ▽ More Using publicly available data to determine the performance of methodological contributions is important as it facilitates reproducibility and allows scrutiny of the published results. In lung nodule classification, for example, many works report results on the publicly available LIDC dataset. In theory, this should allow a direct comparison of the performance of proposed methods and assess the impact of individual contributions. When analyzing seven recent works, however, we find that each employs a different data selection process, leading to largely varying total number of samples and ratios between benign and malignant cases. As each subset will have different characteristics with varying difficulty for classification, a direct comparison between the proposed methods is thus not always possible, nor fair. We study the particular effect of truthing when aggregating labels from multiple experts. We show that specific choices can have severe impact on the data distribution where it may be possible to achieve superior performance on one sample distribution but not on another. While we show that we can further improve on the state-of-the-art on one sample selection, we also find that on a more challenging sample selection, on the same database, the more advanced models underperform with respect to very simple baseline methods, highlighting that the selected data distribution may play an even more important role than the model architecture. This raises concerns about the validity of claimed methodological contributions. We believe the community should be aware of these pitfalls and make recommendations on how these can be avoided in future work. △ Less

Submitted 11 August, 2021; originally announced August 2021.

Comments: Accepted at PRIME, MICCAI 2021

arXiv:2108.04815 [pdf, other]

The Effect of the Loss on Generalization: Empirical Study on Synthetic Lung Nodule Data

Authors: Vasileios Baltatzis, Loic Le Folgoc, Sam Ellis, Octavio E. Martinez Manzanera, Kyriaki-Margarita Bintsi, Arjun Nair, Sujal Desai, Ben Glocker, Julia A. Schnabel

Abstract: Convolutional Neural Networks (CNNs) are widely used for image classification in a variety of fields, including medical imaging. While most studies deploy cross-entropy as the loss function in such tasks, a growing number of approaches have turned to a family of contrastive learning-based losses. Even though performance metrics such as accuracy, sensitivity and specificity are regularly used for t… ▽ More Convolutional Neural Networks (CNNs) are widely used for image classification in a variety of fields, including medical imaging. While most studies deploy cross-entropy as the loss function in such tasks, a growing number of approaches have turned to a family of contrastive learning-based losses. Even though performance metrics such as accuracy, sensitivity and specificity are regularly used for the evaluation of CNN classifiers, the features that these classifiers actually learn are rarely identified and their effect on the classification performance on out-of-distribution test samples is insufficiently explored. In this paper, motivated by the real-world task of lung nodule classification, we investigate the features that a CNN learns when trained and tested on different distributions of a synthetic dataset with controlled modes of variation. We show that different loss functions lead to different features being learned and consequently affect the generalization ability of the classifier on unseen data. This study provides some important insights into the design of deep learning solutions for medical imaging tasks. △ Less

Submitted 10 August, 2021; originally announced August 2021.

Comments: Accepted at iMIMIC, MICCAI 2021

arXiv:2108.00250 [pdf, other]

Bayesian analysis of the prevalence bias: learning and predicting from imbalanced data

Authors: Loic Le Folgoc, Vasileios Baltatzis, Amir Alansary, Sujal Desai, Anand Devaraj, Sam Ellis, Octavio E. Martinez Manzanera, Fahdi Kanavati, Arjun Nair, Julia Schnabel, Ben Glocker

Abstract: Datasets are rarely a realistic approximation of the target population. Say, prevalence is misrepresented, image quality is above clinical standards, etc. This mismatch is known as sampling bias. Sampling biases are a major hindrance for machine learning models. They cause significant gaps between model performance in the lab and in the real world. Our work is a solution to prevalence bias. Preval… ▽ More Datasets are rarely a realistic approximation of the target population. Say, prevalence is misrepresented, image quality is above clinical standards, etc. This mismatch is known as sampling bias. Sampling biases are a major hindrance for machine learning models. They cause significant gaps between model performance in the lab and in the real world. Our work is a solution to prevalence bias. Prevalence bias is the discrepancy between the prevalence of a pathology and its sampling rate in the training dataset, introduced upon collecting data or due to the practioner rebalancing the training batches. This paper lays the theoretical and computational framework for training models, and for prediction, in the presence of prevalence bias. Concretely a bias-corrected loss function, as well as bias-corrected predictive rules, are derived under the principles of Bayesian risk minimization. The loss exhibits a direct connection to the information gain. It offers a principled alternative to heuristic training losses and complements test-time procedures based on selecting an operating point from summary curves. It integrates seamlessly in the current paradigm of (deep) learning using stochastic backpropagation and naturally with Bayesian models. △ Less

Submitted 31 July, 2021; originally announced August 2021.

arXiv:2107.03974 [pdf, other]

Offline Meta-Reinforcement Learning with Online Self-Supervision

Authors: Vitchyr H. Pong, Ashvin Nair, Laura Smith, Catherine Huang, Sergey Levine

Abstract: Meta-reinforcement learning (RL) methods can meta-train policies that adapt to new tasks with orders of magnitude less data than standard RL, but meta-training itself is costly and time-consuming. If we can meta-train on offline data, then we can reuse the same static dataset, labeled once with rewards for different tasks, to meta-train policies that adapt to a variety of new tasks at meta-test ti… ▽ More Meta-reinforcement learning (RL) methods can meta-train policies that adapt to new tasks with orders of magnitude less data than standard RL, but meta-training itself is costly and time-consuming. If we can meta-train on offline data, then we can reuse the same static dataset, labeled once with rewards for different tasks, to meta-train policies that adapt to a variety of new tasks at meta-test time. Although this capability would make meta-RL a practical tool for real-world use, offline meta-RL presents additional challenges beyond online meta-RL or standard offline RL settings. Meta-RL learns an exploration strategy that collects data for adapting, and also meta-trains a policy that quickly adapts to data from a new task. Since this policy was meta-trained on a fixed, offline dataset, it might behave unpredictably when adapting to data collected by the learned exploration strategy, which differs systematically from the offline data and thus induces distributional shift. We propose a hybrid offline meta-RL algorithm, which uses offline data with rewards to meta-train an adaptive policy, and then collects additional unsupervised online data, without any reward labels to bridge this distribution shift. By not requiring reward labels for online collection, this data can be much cheaper to collect. We compare our method to prior work on offline meta-RL on simulated robot locomotion and manipulation tasks and find that using additional unsupervised online data collection leads to a dramatic improvement in the adaptive capabilities of the meta-trained policies, matching the performance of fully online meta-RL on a range of challenging domains that require generalization to new tasks. △ Less

Submitted 6 July, 2022; v1 submitted 8 July, 2021; originally announced July 2021.

Comments: 8.5 pages, 6 figures, accepted to ICML 2022

arXiv:2106.01958 [pdf, other]

Multiplierless MP-Kernel Machine For Energy-efficient Edge Devices

Authors: Abhishek Ramdas Nair, Pallab Kumar Nath, Shantanu Chakrabartty, Chetan Singh Thakur

Abstract: We present a novel framework for designing multiplierless kernel machines that can be used on resource-constrained platforms like intelligent edge devices. The framework uses a piecewise linear (PWL) approximation based on a margin propagation (MP) technique and uses only addition/subtraction, shift, comparison, and register underflow/overflow operations. We propose a hardware-friendly MP-based in… ▽ More We present a novel framework for designing multiplierless kernel machines that can be used on resource-constrained platforms like intelligent edge devices. The framework uses a piecewise linear (PWL) approximation based on a margin propagation (MP) technique and uses only addition/subtraction, shift, comparison, and register underflow/overflow operations. We propose a hardware-friendly MP-based inference and online training algorithm that has been optimized for a Field Programmable Gate Array (FPGA) platform. Our FPGA implementation eliminates the need for DSP units and reduces the number of LUTs. By reusing the same hardware for inference and training, we show that the platform can overcome classification errors and local minima artifacts that result from the MP approximation. The implementation of this proposed multiplierless MP-kernel machine on FPGA results in an estimated energy consumption of 13.4 pJ and power consumption of 107 mW with ~9k LUTs and FFs each for a 256 x 32 sized kernel making it superior in terms of power, performance, and area compared to other comparable implementations. △ Less

Submitted 9 September, 2022; v1 submitted 3 June, 2021; originally announced June 2021.

Showing 1–50 of 79 results for author: Nair, A