ACR: A Benchmark for Automatic Cohort Retrieval

Authors: Dung Ngoc Thai, Victor Ardulov, Jose Ulises Mena, Simran Tiwari, Gleb Erofeev, Ramy Eskander, Karim Tarabishy, Ravi B Parikh, Wael Salloum

Abstract: Identifying patient cohorts is fundamental to numerous healthcare tasks, including clinical trial recruitment and retrospective studies. Current cohort retrieval methods in healthcare organizations rely on automated queries of structured data combined with manual curation, which are time-consuming, labor-intensive, and often yield low-quality results. Recent advancements in large language models (LLMs) and information retrieval (IR) offer promising avenues to revolutionize these systems. Major challenges include managing extensive eligibility criteria and handling the longitudinal nature of unstructured Electronic Medical Records (EMRs) while ensuring that the solution remains cost-effective for real-world application. This paper introduces a new task, Automatic Cohort Retrieval (ACR), and evaluates the performance of LLMs and commercial, domain-specific neuro-symbolic approaches. We provide a benchmark task, a query dataset, an EMR dataset, and an evaluation framework. Our findings underscore the necessity for efficient, high-quality ACR systems capable of longitudinal reasoning across extensive patient databases.

Submitted 1 July, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

arXiv:2406.00611 [pdf, other]

DISCRET: Synthesizing Faithful Explanations For Treatment Effect Estimation

Authors: Yinjun Wu, Mayank Keoliya, Kan Chen, Neelay Velingker, Ziyang Li, Emily J Getzen, Qi Long, Mayur Naik, Ravi B Parikh, Eric Wong

Abstract: Designing faithful yet accurate AI models is challenging, particularly in the field of individual treatment effect estimation (ITE). ITE prediction models deployed in critical settings such as healthcare should ideally be (i) accurate, and (ii) provide faithful explanations. However, current solutions are inadequate: state-of-the-art black-box models do not supply explanations, post-hoc explainers for black-box models lack faithfulness guarantees, and self-interpretable models greatly compromise accuracy. To address these issues, we propose DISCRET, a self-interpretable ITE framework that synthesizes faithful, rule-based explanations for each sample. A key insight behind DISCRET is that explanations can serve dually as database queries to identify similar subgroups of samples. We provide a novel RL algorithm to efficiently synthesize these explanations from a large search space. We evaluate DISCRET on diverse tasks involving tabular, image, and text data. DISCRET outperforms the best self-interpretable models and has accuracy comparable to the best black-box models while providing faithful explanations. DISCRET is available at https://github.com/wuyinjun-1993/DISCRET-ICML2024.

Submitted 2 June, 2024; originally announced June 2024.

Comments: Accepted at ICML 2024. 22 pages, 5 figures

arXiv:2309.05088 [pdf]

Towards Trustworthy Artificial Intelligence for Equitable Global Health

Authors: Hong Qin, Jude Kong, Wandi Ding, Ramneek Ahluwalia, Christo El Morr, Zeynep Engin, Jake Okechukwu Effoduh, Rebecca Hwa, Serena Jingchuan Guo, Laleh Seyyed-Kalantari, Sylvia Kiwuwa Muyingo, Candace Makeda Moore, Ravi Parikh, Reva Schwartz, Dongxiao Zhu, Xiaoqian Wang, Yiye Zhang

Abstract: Artificial intelligence (AI) can potentially transform global health, but algorithmic bias can exacerbate social inequities and disparity. Trustworthy AI entails the intentional design to ensure equity and mitigate potential biases. To advance trustworthy AI in global health, we convened a workshop on Fairness in Machine Intelligence for Global Health (FairMI4GH). The event brought together a global mix of experts from various disciplines, community health practitioners, policymakers, and more. Topics covered included managing AI bias in socio-technical systems, AI's potential impacts on global health, and balancing data privacy with transparency. Panel discussions examined the cultural, political, and ethical dimensions of AI in global health. FairMI4GH aimed to stimulate dialogue, facilitate knowledge transfer, and spark innovative solutions. Drawing from NIST's AI Risk Management Framework, it provided suggestions for handling AI risks and biases. The need to mitigate data biases from the research design stage, adopt a human-centered approach, and advocate for AI transparency was recognized. Challenges such as updating legal frameworks, managing cross-border data sharing, and motivating developers to reduce bias were acknowledged. The event emphasized the necessity of diverse viewpoints and multi-dimensional dialogue for creating a fair and ethical AI framework for equitable global health.

Submitted 10 September, 2023; originally announced September 2023.

Comments: 7 pages

arXiv:2307.03882 [pdf, other]

The Busboy Problem: Efficient Tableware Decluttering Using Consolidation and Multi-Object Grasps

Authors: Kishore Srinivas, Shreya Ganti, Rishi Parikh, Ayah Ahmad, Wisdom Agboh, Mehmet Dogar, Ken Goldberg

Abstract: We present the "Busboy Problem": automating an efficient decluttering of cups, bowls, and silverware from a planar surface. As grasping and transporting individual items is highly inefficient, we propose policies to generate grasps for multiple items. We introduce the metric of Objects per Trip (OpT) carried by the robot to the collection bin to analyze the improvement seen as a result of our policies. In physical experiments with singulated items, we find that consolidation and multi-object grasps resulted in a 1.8x improvement in OpT, compared to methods without multi-object grasps. See https://sites.google.com/berkeley.edu/busboyproblem for code and supplemental materials.

Submitted 7 July, 2023; originally announced July 2023.

arXiv:2306.17162 [pdf, other]

Can Machines Garden? Systematically Comparing the AlphaGarden vs. Professional Horticulturalists

Authors: Simeon Adebola, Rishi Parikh, Mark Presten, Satvik Sharma, Shrey Aeron, Ananth Rao, Sandeep Mukherjee, Tomson Qu, Christina Wistrom, Eugen Solowjow, Ken Goldberg

Abstract: The AlphaGarden is an automated testbed for indoor polyculture farming which combines a first-order plant simulator, a gantry robot, a seed planting algorithm, plant phenotyping and tracking algorithms, irrigation sensors and algorithms, and custom pruning tools and algorithms. In this paper, we systematically compare the performance of the AlphaGarden to professional horticulturalists on the staff of the UC Berkeley Oxford Tract Greenhouse. The humans and the machine tend side-by-side polyculture gardens with the same seed arrangement. We compare performance in terms of canopy coverage, plant diversity, and water consumption. Results from two 60-day cycles suggest that the automated AlphaGarden performs comparably to professional horticulturalists in terms of coverage and diversity, and reduces water consumption by as much as 44%. Code, videos, and datasets are available at https://sites.google.com/berkeley.edu/systematiccomparison.

Submitted 29 June, 2023; originally announced June 2023.

Comments: International Conference on Robotics and Automation(ICRA) 2023 Oral

arXiv:2305.11759 [pdf, other]

Controlling the Extraction of Memorized Data from Large Language Models via Prompt-Tuning

Authors: Mustafa Safa Ozdayi, Charith Peris, Jack FitzGerald, Christophe Dupuy, Jimit Majmudar, Haidar Khan, Rahil Parikh, Rahul Gupta

Abstract: Large Language Models (LLMs) are known to memorize significant portions of their training data. Parts of this memorized content have been shown to be extractable by simply querying the model, which poses a privacy risk. We present a novel approach which uses prompt-tuning to control the extraction rates of memorized content in LLMs. We present two prompt training strategies to increase and decrease extraction rates, which correspond to an attack and a defense, respectively. We demonstrate the effectiveness of our techniques by using models from the GPT-Neo family on a public benchmark. For the 1.3B parameter GPT-Neo model, our attack yields a 9.3 percentage point increase in extraction rate compared to our baseline. Our defense can be tuned to achieve different privacy-utility trade-offs by a user-specified hyperparameter. We achieve an extraction rate reduction of up to 97.7% relative to our baseline, with a perplexity increase of 16.9%.

Submitted 19 May, 2023; originally announced May 2023.

Comments: 5 pages, 3 Figures, ACL 2023

arXiv:2208.10472 [pdf, other]

Automated Pruning of Polyculture Plants

Authors: Mark Presten, Rishi Parikh, Shrey Aeron, Sandeep Mukherjee, Simeon Adebola, Satvik Sharma, Mark Theis, Walter Teitelbaum, Ken Goldberg

Abstract: Polyculture farming has environmental advantages but requires substantially more pruning than monoculture farming. We present novel hardware and algorithms for automated pruning. Using an overhead camera to collect data from a physical scale garden testbed, the autonomous system utilizes a learned Plant Phenotyping convolutional neural network and a Bounding Disk Tracking algorithm to evaluate the individual plant distribution and estimate the state of the garden each day. From this garden state, AlphaGardenSim selects plants to autonomously prune. A trained neural network detects and targets specific prune points on the plant. Two custom-designed pruning tools, compatible with a FarmBot gantry system, are experimentally evaluated and execute autonomous cuts through controlled algorithms. We present results for four 60-day garden cycles. Results suggest the system can autonomously achieve 0.94 normalized plant diversity with pruning shears while maintaining an average canopy coverage of 0.84 by the end of the cycles. For code, videos, and datasets, see https://sites.google.com/berkeley.edu/pruningpolyculture.

Submitted 22 August, 2022; originally announced August 2022.

Comments: CASE 2022, 8 pages. arXiv admin note: substantial text overlap with arXiv:2111.06014

arXiv:2207.00911 [pdf, other]

Learning Switching Criteria for Sim2Real Transfer of Robotic Fabric Manipulation Policies

Authors: Satvik Sharma, Ellen Novoseller, Vainavi Viswanath, Zaynah Javed, Rishi Parikh, Ryan Hoque, Ashwin Balakrishna, Daniel S. Brown, Ken Goldberg

Abstract: Simulation-to-reality transfer has emerged as a popular and highly successful method to train robotic control policies for a wide variety of tasks. However, it is often challenging to determine when policies trained in simulation are ready to be transferred to the physical world. Deploying policies that have been trained with very little simulation data can result in unreliable and dangerous behaviors on physical hardware. On the other hand, excessive training in simulation can cause policies to overfit to the visual appearance and dynamics of the simulator. In this work, we study strategies to automatically determine when policies trained in simulation can be reliably transferred to a physical robot. We specifically study these ideas in the context of robotic fabric manipulation, in which successful sim2real transfer is especially challenging due to the difficulties of precisely modeling the dynamics and visual appearance of fabric. Results in a fabric smoothing task suggest that our switching criteria correlate well with performance in real. In particular, our confidence-based switching criteria achieve average final fabric coverage of 87.2-93.7% within 55-60% of the total training budget. See https://tinyurl.com/lsc-case for code and supplemental materials.

Submitted 2 July, 2022; originally announced July 2022.

Comments: CASE 2022. The first two authors contributed equally. 9 pages; 5 figures; 1 table

arXiv:2206.13476 [pdf, other]

Impact of Acoustic Event Tagging on Scene Classification in a Multi-Task Learning Framework

Authors: Rahil Parikh, Harshavardhan Sundar, Ming Sun, Chao Wang, Spyros Matsoukas

Abstract: Acoustic events are sounds with well-defined spectro-temporal characteristics which can be associated with the physical objects generating them. Acoustic scenes are collections of such acoustic events in no specific temporal order. Given this natural linkage between events and scenes, a common belief is that the ability to classify events must help in the classification of scenes. This has led to several efforts attempting to do well on Acoustic Event Tagging (AET) and Acoustic Scene Classification (ASC) using a multi-task network. However, in these efforts, improvement in one task does not guarantee an improvement in the other, suggesting a tension between ASC and AET. It is unclear if improvements in AET translate to improvements in ASC. We explore this conundrum through an extensive empirical study and show that under certain conditions, using AET as an auxiliary task in the multi-task network consistently improves ASC performance. Additionally, ASC performance further improves with the AET data-set size and is not sensitive to the choice of events or the number of events in the AET data-set. We conclude that this improvement in ASC performance comes from the regularization effect of using AET and not from the network's improved ability to discern between acoustic events.

Submitted 27 June, 2022; originally announced June 2022.

Comments: Accepted at ISCA Interspeech 2022

arXiv:2206.09556 [pdf, other]

An Empirical Analysis on the Vulnerabilities of End-to-End Speech Segregation Models

Authors: Rahil Parikh, Gaspar Rochette, Carol Espy-Wilson, Shihab Shamma

Abstract: End-to-end learning models have demonstrated a remarkable capability in performing speech segregation. Despite their wide scope of real-world applications, little is known about the mechanisms they employ to group and consequently segregate individual speakers. Knowing that harmonicity is a critical cue for these networks to group sources, in this work, we perform a thorough investigation on ConvTasnet and DPT-Net to analyze how they perform a harmonic analysis of the input mixture. We perform ablation studies where we apply low-pass, high-pass, and band-stop filters of varying pass-bands to empirically analyze the harmonics most critical for segregation. We also investigate how these networks decide which output channel to assign to an estimated source by introducing discontinuities in synthetic mixtures. We find that end-to-end networks are highly unstable, and perform poorly when confronted with deformations which are imperceptible to humans. Replacing the encoder in these networks with a spectrogram leads to lower overall performance, but much higher stability. This work helps us to understand what information these networks rely on for speech segregation, and exposes two sources of generalization errors. It also pinpoints the encoder as the part of the network responsible for these errors, allowing for a redesign with expert knowledge or transfer learning.

Submitted 19 June, 2022; originally announced June 2022.

Comments: Accepted at Interspeech 2022

arXiv:2203.13920 [pdf, other]

Canary Extraction in Natural Language Understanding Models

Authors: Rahil Parikh, Christophe Dupuy, Rahul Gupta

Abstract: Natural Language Understanding (NLU) models can be trained on sensitive information such as phone numbers, zip-codes etc. Recent literature has focused on Model Inversion Attacks (ModIvA) that can extract training data from model parameters. In this work, we present a version of such an attack by extracting canaries inserted in NLU training data. In the attack, an adversary with open-box access to the model reconstructs the canaries contained in the model's training set. We evaluate our approach by performing text completion on canaries and demonstrate that by using the prefix (non-sensitive) tokens of the canary, we can generate the full canary. As an example, our attack is able to reconstruct a four digit code in the training dataset of the NLU model with a probability of 0.5 in its best configuration. As countermeasures, we identify several defense mechanisms that, when combined, effectively eliminate the risk of ModIvA in our experiments.

Submitted 25 March, 2022; originally announced March 2022.

Comments: Accepted to ACL 2022, Main Conference

arXiv:2203.05780 [pdf, other]

Acoustic To Articulatory Speech Inversion Using Multi-Resolution Spectro-Temporal Representations Of Speech Signals

Authors: Rahil Parikh, Nadee Seneviratne, Ganesh Sivaraman, Shihab Shamma, Carol Espy-Wilson

Abstract: Multi-resolution spectro-temporal features of a speech signal represent how the brain perceives sounds by tuning cortical cells to different spectral and temporal modulations. These features produce a higher dimensional representation of the speech signals. The purpose of this paper is to evaluate how well the auditory cortex representation of speech signals contributes to estimating articulatory features of those corresponding signals. Since obtaining articulatory features from acoustic features of speech signals has been a challenging topic of interest for different speech communities, we investigate the possibility of using this multi-resolution representation of speech signals as acoustic features. We used the U. of Wisconsin X-ray Microbeam (XRMB) database of clean speech signals to train a feed-forward deep neural network (DNN) to estimate articulatory trajectories of six tract variables. The optimal set of multi-resolution spectro-temporal features to train the model was chosen using appropriate scale and rate vector parameters to obtain the best performing model. Experiments achieved a correlation of 0.675 with ground-truth tract variables. We compared the performance of this speech inversion system with prior experiments conducted using Mel Frequency Cepstral Coefficients (MFCCs).

Submitted 25 June, 2022; v1 submitted 11 March, 2022; originally announced March 2022.

Comments: Accepted at ISCA Interspeech 2022

arXiv:2203.04420 [pdf, other]

Harmonicity Plays a Critical Role in DNN Based Versus in Biologically-Inspired Monaural Speech Segregation Systems

Authors: Rahil Parikh, Ilya Kavalerov, Carol Espy-Wilson, Shihab Shamma

Abstract: Recent advancements in deep learning have led to drastic improvements in speech segregation models. Despite their success and growing applicability, few efforts have been made to analyze the underlying principles that these networks learn to perform segregation. Here we analyze the role of harmonicity on two state-of-the-art Deep Neural Network (DNN)-based models: Conv-TasNet and DPT-Net. We evaluate their performance with mixtures of natural speech versus slightly manipulated inharmonic speech, where harmonics are slightly frequency jittered. We find that performance deteriorates significantly if one source is even slightly harmonically jittered, e.g., an imperceptible 3% harmonic jitter degrades performance of Conv-TasNet from 15.4 dB to 0.70 dB. Training the model on inharmonic speech does not remedy this sensitivity, instead resulting in worse performance on natural speech mixtures, making inharmonicity a powerful adversarial factor in DNN models. Furthermore, additional analyses reveal that DNN algorithms deviate markedly from biologically inspired algorithms that rely primarily on timing cues and not harmonicity to segregate speech.

Submitted 8 March, 2022; originally announced March 2022.

Comments: 5 pages, IEEE International Conference on Acoustics, Speech, & Signal Processing (ICASSP), 2022

arXiv:2111.06014

AlphaGarden: Learning to Autonomously Tend a Polyculture Garden

Authors: Mark Presten, Yahav Avigal, Mark Theis, Satvik Sharma, Rishi Parikh, Shrey Aeron, Sandeep Mukherjee, Sebastian Oehme, Simeon Adebola, Walter Teitelbaum, Varun Kamat, Ken Goldberg

Abstract: This paper presents AlphaGarden: an autonomous polyculture garden that prunes and irrigates living plants in a 1.5m x 3.0m physical testbed. AlphaGarden uses an overhead camera and sensors to track the plant distribution and soil moisture. We model individual plant growth and interplant dynamics to train a policy that chooses actions to maximize leaf coverage and diversity. For autonomous pruning, AlphaGarden uses two custom-designed pruning tools and a trained neural network to detect prune points. We present results for four 60-day garden cycles. Results suggest AlphaGarden can autonomously achieve 0.96 normalized diversity with pruning shears while maintaining an average canopy coverage of 0.86 during the peak of the cycle. Code, datasets, and supplemental material can be found at https://github.com/BerkeleyAutomation/AlphaGarden.

Submitted 22 August, 2022; v1 submitted 10 November, 2021; originally announced November 2021.

Comments: Paper revised, extended, and resubmitted. See "Automated Pruning of Polyculture Plants."

arXiv:2109.11434 [pdf, other]

Exploring Machine Teaching with Children

Authors: Utkarsh Dwivedi, Jaina Gandhi, Raj Parikh, Merijke Coenraad, Elizabeth Bonsignore, Hernisa Kacorri

Abstract: Iteratively building and testing machine learning models can help children develop creativity, flexibility, and comfort with machine learning and artificial intelligence. We explore how children use machine teaching interfaces with a team of 14 children (aged 7-13 years) and adult co-designers. Children trained image classifiers and tested each other's models for robustness. Our study illuminates how children reason about ML concepts, offering these insights for designing machine teaching experiences for children: (i) ML metrics (e.g. confidence scores) should be visible for experimentation; (ii) ML activities should enable children to exchange models for promoting reflection and pattern recognition; and (iii) the interface should allow quick data inspection (e.g. images vs. gestures).

Submitted 27 September, 2021; v1 submitted 23 September, 2021; originally announced September 2021.

Comments: 11 pages, 8 images

Journal ref: IEEE Symposium on Visual Languages and Human-Centric Computing 2021

arXiv:2101.05954 [pdf, ps, other]

doi 10.1007/978-3-030-68790-8_27

Recent Advances in Video Question Answering: A Review of Datasets and Methods

Authors: Devshree Patel, Ratnam Parikh, Yesha Shastri

Abstract: Video Question Answering (VQA) is a recent emerging challenging task in the field of Computer Vision. Several visual information retrieval techniques like Video Captioning/Description and Video-guided Machine Translation have preceded the task of VQA. VQA helps to retrieve temporal and spatial information from the video scenes and interpret it. In this survey, we review a number of methods and datasets for the task of VQA. To the best of our knowledge, no previous survey has been conducted for the VQA task.

Submitted 18 March, 2021; v1 submitted 14 January, 2021; originally announced January 2021.

Comments: 18 pages, 5 tables, Video and Image Question Answering Workshop, 25th International Conference on Pattern Recognition

Journal ref: Pattern Recognition. ICPR International Workshops and Challenges. ICPR 2021. Lecture Notes in Computer Science, vol 12662. Springer

arXiv:2005.11313 [pdf, other]

Comparative Study of Machine Learning Models and BERT on SQuAD

Authors: Devshree Patel, Param Raval, Ratnam Parikh, Yesha Shastri

Abstract: This study aims to provide a comparative analysis of the performance of certain models popular in machine learning and the BERT model on the Stanford Question Answering Dataset (SQuAD). The analysis shows that the BERT model, which was once state-of-the-art on SQuAD, gives higher accuracy in comparison to other models. However, BERT requires a greater execution time even when only 100 samples are used. This shows that higher accuracy comes at the cost of more training time, whereas for the preliminary machine learning models, execution time on the full data is lower but accuracy is compromised.

Submitted 22 May, 2020; originally announced May 2020.

arXiv:1510.02104 [pdf, ps, other]

Building Resource Adaptive Software Systems (BRASS): Objectives and System Evaluation

Authors: Jeffrey Hughes, Cassandra Sparks, Alley Stoughton, Rinku Parikh, Albert Reuther, Suresh Jagannathan

Abstract: As modern software systems continue inexorably to increase in complexity and capability, users have become accustomed to periodic cycles of updating and upgrading to avoid obsolescence -- if at some cost in terms of frustration. In the case of the U.S. military, having access to well-functioning software systems and underlying content is critical to national security, but updates are no less problematic than among civilian users and often demand considerable time and expense. To address these challenges, DARPA has announced a new four-year research project to investigate the fundamental computational and algorithmic requirements necessary for software systems and data to remain robust and functional in excess of 100 years. The Building Resource Adaptive Software Systems, or BRASS, program seeks to realize foundational advances in the design and implementation of long-lived software systems that can dynamically adapt to changes in the resources they depend upon and environments in which they operate. MIT Lincoln Laboratory will provide the test framework and evaluation of proposed software tools in support of this revolutionary vision.

Submitted 7 October, 2015; originally announced October 2015.

arXiv:1306.4631 [pdf]

Table of Content detection using Machine Learning

Authors: Rachana Parikh, Avani R. Vasant

Abstract: Table of content (TOC) detection has drawn attention now a day because it plays an important role in digitization of multipage document. Generally book document is multipage document. So it becomes necessary to detect Table of Content page for easy navigation of multipage document and also to make information retrieval faster for desirable data from the multipage document. All the Table of content pages follow the different layout, different way of presenting the contents of the document like chapter, section, subsection etc. This paper introduces a new method to detect Table of content using machine learning technique with different features. With the main aim to detect Table of Content pages is to structure the document according to their contents.

Submitted 6 June, 2013; originally announced June 2013.

Comments: International Journal of Artificial Intelligence and Applications, May-2013

arXiv:cs/0003021 [pdf, ps, other]

Relevance Sensitive Non-Monotonic Inference on Belief Sequences

Authors: Samir Chopra, Konstantinos Georgatos, Rohit Parikh

Abstract: We present a method for relevance sensitive non-monotonic inference from belief sequences which incorporates insights pertaining to prioritized inference and relevance sensitive, inconsistency tolerant belief revision. Our model uses a finite, logically open sequence of propositional formulas as a representation for beliefs and defines a notion of inference from maxiconsistent subsets of formulas guided by two orderings: a temporal sequencing and an ordering based on relevance relations between the conclusion and formulas in the sequence. The relevance relations are ternary (using context as a parameter) as opposed to standard binary axiomatizations. The inference operation thus defined easily handles iterated revision by maintaining a revision history, blocks the derivation of inconsistent answers from a possibly inconsistent sequence and maintains the distinction between explicit and implicit beliefs. In doing so, it provides a finitely presented formalism and a plausible model of reasoning for automated agents.

Submitted 7 March, 2000; originally announced March 2000.

ACM Class: I.2.3

Showing 1–20 of 20 results for author: Parikh, R