BAGEL: Bootstrapping Agents by Guiding Exploration with Language

Authors: Shikhar Murty, Christopher Manning, Peter Shaw, Mandar Joshi, Kenton Lee

Abstract: Following natural language instructions by executing actions in digital environments (e.g. web-browsers and REST APIs) is a challenging task for language model (LM) agents. Unfortunately, LM agents often fail to generalize to new environments without human demonstrations. This work presents BAGEL, a method for bootstrapping LM agents without human supervision. BAGEL converts a seed set of randomly explored trajectories or synthetic instructions, into demonstrations, via round-trips between two noisy LM components: an LM labeler which converts a trajectory into a synthetic instruction, and a zero-shot LM agent which maps the synthetic instruction into a refined trajectory. By performing these round-trips iteratively, BAGEL quickly converts the initial distribution of trajectories towards those that are well-described by natural language. We use BAGEL demonstrations to adapt a zero shot LM agent at test time via in-context learning over retrieved demonstrations, and find improvements of over 2-13% absolute on ToolQA and MiniWob++, with up to 13x reduction in execution failures.

Submitted 8 June, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

Comments: ICML 2024 Camera ready version

arXiv:2403.02054 [pdf, other]

Large Language Model-Based Evolutionary Optimizer: Reasoning with elitism

Authors: Shuvayan Brahmachary, Subodh M. Joshi, Aniruddha Panda, Kaushik Koneripalli, Arun Kumar Sagotra, Harshil Patel, Ankush Sharma, Ameya D. Jagtap, Kaushic Kalyanaraman

Abstract: Large Language Models (LLMs) have demonstrated remarkable reasoning abilities, prompting interest in their application as black-box optimizers. This paper asserts that LLMs possess the capability for zero-shot optimization across diverse scenarios, including multi-objective and high-dimensional problems. We introduce a novel population-based method for numerical optimization using LLMs called Language-Model-Based Evolutionary Optimizer (LEO). Our hypothesis is supported through numerical examples, spanning benchmark and industrial engineering problems such as supersonic nozzle shape optimization, heat transfer, and windfarm layout optimization. We compare our method to several gradient-based and gradient-free optimization approaches. While LLMs yield comparable results to state-of-the-art methods, their imaginative nature and propensity to hallucinate demand careful handling. We provide practical guidelines for obtaining reliable answers from LLMs and discuss method limitations and potential research directions.

Submitted 4 March, 2024; originally announced March 2024.

arXiv:2402.19109 [pdf, other]

Confidence and Assurance of Percentiles

Authors: Sanjay M. Joshi

Abstract: Confidence interval of mean is often used when quoting statistics. The same rigor is often missing when quoting percentiles and tolerance or percentile intervals. This article derives the expression for confidence in percentiles of a sample population. Confidence intervals of median is compared to those of mean for a few sample distributions. The concept of assurance from reliability engineering is then extended to percentiles. The assurance level of sorted samples simply matches the confidence and percentile levels. Numerical method to compute assurance using Brent's optimization method is provided as an open-source python package.

Submitted 29 February, 2024; originally announced February 2024.

Comments: 5 pages, 4 Figures
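
As a rough illustration of what a confidence statement about a percentile can look like, the sketch below uses the standard distribution-free order-statistics result, which may differ from the expression derived in the article. For n independent samples from a continuous distribution, the probability that the k-th smallest sample lies above the true p-quantile equals the probability that fewer than k samples fall below that quantile, which is a binomial sum. The function name and its arguments are illustrative assumptions and are not the API of the article's Python package.

```python
from math import comb


def confidence_kth_sample_exceeds_quantile(n: int, k: int, p: float) -> float:
    """Probability that the k-th smallest of n samples exceeds the p-quantile.

    Assumes independent samples from a continuous distribution, so the
    number of samples below the p-quantile is Binomial(n, p).
    """
    if not (1 <= k <= n):
        raise ValueError("k must satisfy 1 <= k <= n")
    if not (0.0 < p < 1.0):
        raise ValueError("p must lie strictly between 0 and 1")
    # The k-th smallest sample exceeds the quantile exactly when at most
    # k - 1 samples fall below it.
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))


# Largest of 59 samples as a bound on the 95th percentile.
print(confidence_kth_sample_exceeds_quantile(n=59, k=59, p=0.95))
```

For k = n the sum collapses to 1 - p^n, so with 59 samples the largest observation bounds the 95th percentile with roughly 95% confidence.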

arXiv:2311.09612 [pdf, other]

Efficient End-to-End Visual Document Understanding with Rationale Distillation

Authors: Wang Zhu, Alekh Agarwal, Mandar Joshi, Robin Jia, Jesse Thomason, Kristina Toutanova

Abstract: Understanding visually situated language requires interpreting complex layouts of textual and visual elements. Pre-processing tools, such as optical character recognition (OCR), can map document image inputs to textual tokens, then large language models (LLMs) can reason over text. However, such methods have high computational and engineering complexity. Can small pretrained image-to-text models accurately understand visual documents through similar recognition and reasoning steps instead? We propose Rationale Distillation (RD), which incorporates the outputs of OCR tools, LLMs, and larger multimodal models as intermediate "rationales", and trains a small student model to predict both rationales and answers. On three visual document understanding benchmarks representing infographics, scanned documents, and figures, our Pix2Struct (282M parameters) student model finetuned with RD outperforms the base model by 4-5% absolute accuracy with only 1% higher computational cost.

Submitted 1 April, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

Comments: Accepted by NAACL 2024

arXiv:2311.07897 [pdf, other]

CPopQA: Ranking Cultural Concept Popularity by LLMs

Authors: Ming Jiang, Mansi Joshi

Abstract: Prior work has demonstrated large language models' (LLMs) potential to discern statistical tendencies within their pre-training corpora. Despite that, many examinations of LLMs' knowledge capacity focus on knowledge explicitly appearing in the training data or implicitly inferable from similar contexts. How well an LLM captures the corpus-level statistical trends of concepts for reasoning, especially long-tail ones, is still underexplored. In this study, we introduce a novel few-shot question-answering task (CPopQA) that examines LLMs' statistical ranking abilities for long-tail cultural concepts (e.g., holidays), with a specific focus on these concepts' popularity in the United States and the United Kingdom, respectively. We curate a dataset containing 459 holidays across 58 countries, generating a total of 6,000 QA testing pairs. Experiments on four strong LLMs show that large models are capable of ranking long-tail cultural concepts regarding their statistical tendency. Notably, GPT-3.5 displayed superior performance and exhibited its potential to identify geo-cultural proximity across continents.

Submitted 13 November, 2023; originally announced November 2023.

arXiv:2311.07191 [pdf, other]

Applying Large Language Models for Causal Structure Learning in Non Small Cell Lung Cancer

Authors: Narmada Naik, Ayush Khandelwal, Mohit Joshi, Madhusudan Atre, Hollis Wright, Kavya Kannan, Scott Hill, Giridhar Mamidipudi, Ganapati Srinivasa, Carlo Bifulco, Brian Piening, Kevin Matlock

Abstract: Causal discovery is becoming a key part in medical AI research. These methods can enhance healthcare by identifying causal links between biomarkers, demographics, treatments and outcomes. They can aid medical professionals in choosing more impactful treatments and strategies. In parallel, Large Language Models (LLMs) have shown great potential in identifying patterns and generating insights from text data. In this paper we investigate applying LLMs to the problem of determining the directionality of edges in causal discovery. Specifically, we test our approach on a deidentified set of Non Small Cell Lung Cancer(NSCLC) patients that have both electronic health record and genomic panel data. Graphs are validated using Bayesian Dirichlet estimators using tabular data. Our result shows that LLMs can accurately predict the directionality of edges in causal graphs, outperforming existing state-of-the-art methods. These findings suggests that LLMs can play a significant role in advancing causal discovery and help us better understand complex systems.

Submitted 13 November, 2023; originally announced November 2023.

arXiv:2306.00245 [pdf, other]

From Pixels to UI Actions: Learning to Follow Instructions via Graphical User Interfaces

Authors: Peter Shaw, Mandar Joshi, James Cohan, Jonathan Berant, Panupong Pasupat, Hexiang Hu, Urvashi Khandelwal, Kenton Lee, Kristina Toutanova

Abstract: Much of the previous work towards digital agents for graphical user interfaces (GUIs) has relied on text-based representations (derived from HTML or other structured data sources), which are not always readily available. These input representations have been often coupled with custom, task-specific action spaces. This paper focuses on creating agents that interact with the digital world using the same conceptual interface that humans commonly use -- via pixel-based screenshots and a generic action space corresponding to keyboard and mouse actions. Building upon recent progress in pixel-based pretraining, we show, for the first time, that it is possible for such agents to outperform human crowdworkers on the MiniWob++ benchmark of GUI-based instruction following tasks.

Submitted 6 December, 2023; v1 submitted 31 May, 2023; originally announced June 2023.

arXiv:2305.18565 [pdf, other]

PaLI-X: On Scaling up a Multilingual Vision and Language Model

Authors: Xi Chen, Josip Djolonga, Piotr Padlewski, Basil Mustafa, Soravit Changpinyo, Jialin Wu, Carlos Riquelme Ruiz, Sebastian Goodman, Xiao Wang, Yi Tay, Siamak Shakeri, Mostafa Dehghani, Daniel Salz, Mario Lucic, Michael Tschannen, Arsha Nagrani, Hexiang Hu, Mandar Joshi, Bo Pang, Ceslee Montgomery, Paulina Pietrzyk, Marvin Ritter, AJ Piergiovanni, Matthias Minderer, Filip Pavetic, et al. (18 additional authors not shown)

Abstract: We present the training recipe and results of scaling up PaLI-X, a multilingual vision and language model, both in terms of size of the components and the breadth of its training task mixture. Our model achieves new levels of performance on a wide-range of varied and complex tasks, including multiple image-based captioning and question-answering tasks, image-based document understanding and few-shot (in-context) learning, as well as object detection, video question answering, and video captioning. PaLI-X advances the state-of-the-art on most vision-and-language benchmarks considered (25+ of them). Finally, we observe emerging capabilities, such as complex counting and multilingual object detection, tasks that are not explicitly in the training mix.

Submitted 29 May, 2023; originally announced May 2023.

arXiv:2305.16578 [pdf, other]

Computation of Reliability Statistics for Finite Samples of Success-Failure Experiments

Authors: Sanjay M. Joshi

Abstract: Computational method for statistical measures of reliability, confidence, and assurance are available for infinite population size. If the population size is finite and small compared to the number of samples tested, these computational methods need to be improved for a better representation of reality. This article discusses how to compute reliability, confidence, and assurance statistics for finite number of samples. Graphs and tables are provided as examples and can be used for low number of test sample sizes. Two open-source python libraries are provided for computing reliability, confidence, and assurance with both infinite and finite number of samples.

Submitted 25 May, 2023; originally announced May 2023.

Comments: 6 pages, 4 figures, 1 table

arXiv:2303.07451 [pdf]

DRISHTI: Visual Navigation Assistant for Visually Impaired

Authors: Malay Joshi, Aditi Shukla, Jayesh Srivastava, Manya Rastogi

Abstract: In today's society, where independent living is becoming increasingly important, it can be extremely constricting for those who are blind. Blind and visually impaired (BVI) people face challenges because they need manual support to prompt information about their environment. In this work, we took our first step towards developing an affordable and high-performing eye wearable assistive device, DRISHTI, to provide visual navigation assistance for BVI people. This system comprises a camera module, ESP32 processor, Bluetooth module, smartphone and speakers. Using artificial intelligence, this system is proposed to detect and understand the nature of the users' path and obstacles ahead of the user in that path and then inform BVI users about it via audio output to enable them to acquire directions by themselves on their journey. This first step discussed in this paper involves establishing a proof-of-concept of achieving the right balance of affordability and performance by testing an initial software integration of a currency detection algorithm on a low-cost embedded arrangement. This work will lay the foundation for our upcoming works toward achieving the goal of assisting the maximum of BVI people around the globe in moving independently.

Submitted 13 March, 2023; originally announced March 2023.

Comments: Paper presented at International Conference on Advancements and Key Challenges in Green Energy and Computing (AKGEC 2023) is accepted to be published in the proceedings of the Journal of Physics

arXiv:2302.11154 [pdf, other]

Open-domain Visual Entity Recognition: Towards Recognizing Millions of Wikipedia Entities

Authors: Hexiang Hu, Yi Luan, Yang Chen, Urvashi Khandelwal, Mandar Joshi, Kenton Lee, Kristina Toutanova, Ming-Wei Chang

Abstract: Large-scale multi-modal pre-training models such as CLIP and PaLI exhibit strong generalization on various visual domains and tasks. However, existing image classification benchmarks often evaluate recognition on a specific domain (e.g., outdoor images) or a specific task (e.g., classifying plant species), which falls short of evaluating whether pre-trained foundational models are universal visual recognizers. To address this, we formally present the task of Open-domain Visual Entity recognitioN (OVEN), where a model need to link an image onto a Wikipedia entity with respect to a text query. We construct OVEN-Wiki by re-purposing 14 existing datasets with all labels grounded onto one single label space: Wikipedia entities. OVEN challenges models to select among six million possible Wikipedia entities, making it a general visual recognition benchmark with the largest number of labels. Our study on state-of-the-art pre-trained models reveals large headroom in generalizing to the massive-scale label space. We show that a PaLI-based auto-regressive visual recognition model performs surprisingly well, even on Wikipedia entities that have never been seen during fine-tuning. We also find existing pretrained models yield different strengths: while PaLI-based models obtain higher overall performance, CLIP-based models are better at recognizing tail entities.

Submitted 23 February, 2023; v1 submitted 22 February, 2023; originally announced February 2023.

Comments: Dataset available at https://open-vision-language.github.io/oven

arXiv:2212.10505 [pdf, other]

DePlot: One-shot visual language reasoning by plot-to-table translation

Authors: Fangyu Liu, Julian Martin Eisenschlos, Francesco Piccinno, Syrine Krichene, Chenxi Pang, Kenton Lee, Mandar Joshi, Wenhu Chen, Nigel Collier, Yasemin Altun

Abstract: Visual language such as charts and plots is ubiquitous in the human world. Comprehending plots and charts requires strong reasoning skills. Prior state-of-the-art (SOTA) models require at least tens of thousands of training examples and their reasoning capabilities are still much limited, especially on complex human-written queries. This paper presents the first one-shot solution to visual language reasoning. We decompose the challenge of visual language reasoning into two steps: (1) plot-to-text translation, and (2) reasoning over the translated text. The key in this method is a modality conversion module, named as DePlot, which translates the image of a plot or chart to a linearized table. The output of DePlot can then be directly used to prompt a pretrained large language model (LLM), exploiting the few-shot reasoning capabilities of LLMs. To obtain DePlot, we standardize the plot-to-table task by establishing unified task formats and metrics, and train DePlot end-to-end on this task. DePlot can then be used off-the-shelf together with LLMs in a plug-and-play fashion. Compared with a SOTA model finetuned on more than >28k data points, DePlot+LLM with just one-shot prompting achieves a 24.0% improvement over finetuned SOTA on human-written queries from the task of chart QA.

Submitted 23 May, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

Comments: ACL 2023 (Findings)

arXiv:2212.09662 [pdf, other]

MatCha: Enhancing Visual Language Pretraining with Math Reasoning and Chart Derendering

Authors: Fangyu Liu, Francesco Piccinno, Syrine Krichene, Chenxi Pang, Kenton Lee, Mandar Joshi, Yasemin Altun, Nigel Collier, Julian Martin Eisenschlos

Abstract: Visual language data such as plots, charts, and infographics are ubiquitous in the human world. However, state-of-the-art vision-language models do not perform well on these data. We propose MatCha (Math reasoning and Chart derendering pretraining) to enhance visual language models' capabilities in jointly modeling charts/plots and language data. Specifically, we propose several pretraining tasks that cover plot deconstruction and numerical reasoning which are the key capabilities in visual language modeling. We perform the MatCha pretraining starting from Pix2Struct, a recently proposed image-to-text visual language model. On standard benchmarks such as PlotQA and ChartQA, the MatCha model outperforms state-of-the-art methods by as much as nearly 20%. We also examine how well MatCha pretraining transfers to domains such as screenshots, textbook diagrams, and document figures and observe overall improvement, verifying the usefulness of MatCha pretraining on broader visual language tasks.

Submitted 23 May, 2023; v1 submitted 19 December, 2022; originally announced December 2022.

Comments: ACL 2023

arXiv:2212.08022 [pdf, other]

iCardo: A Machine Learning Based Smart Healthcare Framework for Cardiovascular Disease Prediction

Authors: Nidhi Sinha, Teena Jangid, Amit M. Joshi, Saraju P. Mohanty

Abstract: The point of care services and medication have become simpler with efficient consumer electronics devices in a smart healthcare system. Cardiovascular disease is a critical illness which causes heart failure, and early and prompt identification can lessen damage and prevent premature mortality. Machine learning has been used to predict cardiovascular disease (CVD) in the literature. The article explains choosing the best classifier model for the selected feature sets and the distinct feature sets selected using four feature selection models. The paper compares seven classifiers using each of the sixteen feature sets. Originally, the data had 56 attributes and 303 occurrences, of which 87 were in good health, and the remainder had cardiovascular disease (CVD). Demographic data with several features make up the four groups of overall features. Lasso, Tree-based algorithms, Chi-Square and RFE have all been used to choose the four distinct feature sets, each containing five, ten, fifteen, and twenty features, respectively. Seven distinct classifiers have been trained and evaluated for each of the sixteen feature sets. To determine the most effective blend of feature set and model, a total of 112 models have been trained, tested, and their performance has been compared. SVM classifier with fifteen chosen features is shown to be the best in terms of overall accuracy.
The healthcare data has been maintained in the cloud and would be accessible to patients, caretakers, and healthcare providers through integration with the Internet of Medical Things (IoMT) enabled smart healthcare. Subsequently, the feature selection model chooses the most appropriate feature for CVD prediction to calibrate the system, and the proposed framework can be utilised to anticipate CVD.

Submitted 7 December, 2022; originally announced December 2022.

Comments: 19 Pages, 9 Figures, 5 Tables
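
The experiment counts quoted in this abstract can be checked with a small enumeration. The sketch below assumes, as one reading of the abstract, that each of the four selection methods is run at each of the four feature-set sizes; the classifier names other than SVM are hypothetical placeholders, since the abstract does not list them.

```python
from itertools import product

# Feature selection methods and feature-set sizes named in the abstract.
selection_methods = ["Lasso", "Tree-based", "Chi-Square", "RFE"]
feature_counts = [5, 10, 15, 20]

# Only SVM is named in the abstract; the other six entries are placeholders.
classifiers = ["SVM"] + [f"classifier_{i}" for i in range(2, 8)]

# One feature set per (method, size) pair, one model per (feature set, classifier).
feature_sets = list(product(selection_methods, feature_counts))
experiments = list(product(feature_sets, classifiers))

print(len(feature_sets), len(experiments))  # prints "16 112"
```

The totals agree with the sixteen feature sets and 112 trained models stated in the abstract.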

arXiv:2211.07893 [pdf, other]

doi 10.1145/3533708

Federated Learning for Healthcare Domain - Pipeline, Applications and Challenges

Authors: Madhura Joshi, Ankit Pal, Malaikannan Sankarasubbu

Abstract: Federated learning is the process of developing machine learning models over datasets distributed across data centers such as hospitals, clinical research labs, and mobile devices while preventing data leakage. This survey examines previous research and studies on federated learning in the healthcare sector across a range of use cases and applications. Our survey shows what challenges, methods, and applications a practitioner should be aware of in the topic of federated learning. This paper aims to lay out existing research and list the possibilities of federated learning for healthcare industries.

Submitted 19 November, 2022; v1 submitted 14 November, 2022; originally announced November 2022.

Comments: ACM Transactions on Computing for Healthcare, Vol. 3, No. 4, Article 40. Publication date: October 2022

Journal ref: ACM Transactions on Computing for Healthcare, Vol. 3, No. 4, Article 40. Publication date: October 2022

arXiv:2210.03347 [pdf, other]

Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding

Authors: Kenton Lee, Mandar Joshi, Iulia Turc, Hexiang Hu, Fangyu Liu, Julian Eisenschlos, Urvashi Khandelwal, Peter Shaw, Ming-Wei Chang, Kristina Toutanova

Abstract: Visually-situated language is ubiquitous -- sources range from textbooks with diagrams to web pages with images and tables, to mobile apps with buttons and forms. Perhaps due to this diversity, previous work has typically relied on domain-specific recipes with limited sharing of the underlying data, model architectures, and objectives. We present Pix2Struct, a pretrained image-to-text model for purely visual language understanding, which can be finetuned on tasks containing visually-situated language. Pix2Struct is pretrained by learning to parse masked screenshots of web pages into simplified HTML. The web, with its richness of visual elements cleanly reflected in the HTML structure, provides a large source of pretraining data well suited to the diversity of downstream tasks. Intuitively, this objective subsumes common pretraining signals such as OCR, language modeling, image captioning. In addition to the novel pretraining strategy, we introduce a variable-resolution input representation and a more flexible integration of language and vision inputs, where language prompts such as questions are rendered directly on top of the input image. For the first time, we show that a single pretrained model can achieve state-of-the-art results in six out of nine tasks across four domains: documents, illustrations, user interfaces, and natural images.

Submitted 15 June, 2023; v1 submitted 7 October, 2022; originally announced October 2022.

Comments: Accepted at ICML

arXiv:2209.09395 [pdf, other]

doi 10.1109/OCEANS47191.2022.9977233

OysterSim: Underwater Simulation for Enhancing Oyster Reef Monitoring

Authors: Xiaomin Lin, Nitesh Jha, Mayank Joshi, Nare Karapetyan, Yiannis Aloimonos, Miao Yu

Abstract: Oysters are the living vacuum cleaners of the oceans. There is an exponential decline in the oyster population due to over-harvesting. With the current development of the automation and AI, robots are becoming an integral part of the environmental monitoring process that can be also utilized for oyster reef preservation. Nevertheless, the underwater environment poses many difficulties, both from the practical - dangerous and time consuming operations, and the technical perspectives - distorted perception and unreliable navigation. To this end, we present a simulated environment that can be used to improve oyster reef monitoring. The simulated environment can be used to create photo-realistic image datasets with multiple sensor data and ground truth location of a remotely operated vehicle(ROV). Currently, there are no photo-realistic image datasets for oyster reef monitoring. Thus, we want to provide a new benchmark suite to the underwater community.

Submitted 19 September, 2022; originally announced September 2022.

Journal ref: OCEANS 2022, Hampton Roads, 2022, pp. 1-6

arXiv:2205.04050 [pdf, other]

Few-shot Mining of Naturally Occurring Inputs and Outputs

Authors: Mandar Joshi, Terra Blevins, Mike Lewis, Daniel S. Weld, Luke Zettlemoyer

Abstract: Creating labeled natural language training data is expensive and requires significant human effort. We mine input output examples from large corpora using a supervised mining function trained using a small seed set of only 100 examples. The mining consists of two stages -- (1) a biencoder-based recall-oriented dense search which pairs inputs with potential outputs, and (2) a crossencoder-based filter which re-ranks the output of the biencoder stage for better precision. Unlike model-generated data augmentation, our method mines naturally occurring high-quality input output pairs to mimic the style of the seed set for multiple tasks. On SQuAD-style reading comprehension, augmenting the seed set with the mined data results in an improvement of 13 F1 over a BART-large baseline fine-tuned only on the seed set. Likewise, we see improvements of 1.46 ROUGE-L on Xsum abstractive summarization.

Submitted 9 May, 2022; originally announced May 2022.

arXiv:2204.07496 [pdf, other]

Improving Passage Retrieval with Zero-Shot Question Generation

Authors: Devendra Singh Sachan, Mike Lewis, Mandar Joshi, Armen Aghajanyan, Wen-tau Yih, Joelle Pineau, Luke Zettlemoyer

Abstract: We propose a simple and effective re-ranking method for improving passage retrieval in open question answering. The re-ranker re-scores retrieved passages with a zero-shot question generation model, which uses a pre-trained language model to compute the probability of the input question conditioned on a retrieved passage. This approach can be applied on top of any retrieval method (e.g. neural or keyword-based), does not require any domain- or task-specific training (and therefore is expected to generalize better to data distribution shifts), and provides rich cross-attention between query and passage (i.e. it must explain every token in the question). When evaluated on a number of open-domain retrieval datasets, our re-ranker improves strong unsupervised retrieval models by 6%-18% absolute and strong supervised models by up to 12% in terms of top-20 passage retrieval accuracy. We also obtain new state-of-the-art results on full open-domain question answering by simply adding the new re-ranker to existing models with no further changes.

Submitted 2 April, 2023; v1 submitted 15 April, 2022; originally announced April 2022.

Comments: EMNLP 2022 camera-ready version. Code is available at: https://github.com/DevSinghSachan/unsupervised-passage-reranking

arXiv:2203.01412 [pdf, other]

Effect of Timing Error: A Case Study of Navigation Camera

Authors: Sandeep S. Kulkarni, Sanjay M. Joshi

Abstract: We focus on the problem of timing errors in navigation camera as a case study in a broader problem of the effect of a timing error in cyber-physical systems. These systems rely on the requirement that certain things happen at the same time or certain things happen periodically at some period $T$. However, as these systems get more complex, timing errors can occur between the components thereby violating the assumption about events being simultaneous (or periodic). We consider the problem of a surgical navigation system where optical markers detected in the 2D pictures taken by two cameras are used to localize the markers in 3D space. A predefined array of such markers, known as a reference element, is used to navigate the corresponding CAD model of a surgical instrument on patient's images. The cameras rely on the assumption that the pictures from both cameras are taken exactly at the same time. If a timing error occurs then the instrument may have moved between the pictures. We find that, depending upon the location of the instrument, this can lead to a substantial error in the localization of the instrument. Specifically, we find that if the actual movement is $δ$ then the observed movement may be as high as $5δ$ in the operating range of the camera. Furthermore, we also identify potential issues that could affect the error in case there are changes to the camera system or to the operating range.

Submitted 1 March, 2022; originally announced March 2022.

arXiv:2202.06744 [pdf]

Overhead Management in Multi-Core Environment

Authors: Urmila Shrawankar, Mayuri Joshi

Abstract: In multi-core systems, various factors like inter-process communication, dependency, resource sharing and scheduling, level of parallelism, synchronization, number of available cores etc. influence the extent of possible High Performance Computing parallelization. These parameters, if not managed at the root level, later surface as overheads during execution. This paper emphasizes these parameters of parallelism, their parallelization overheads, and their effective management for optimal parallel execution under any domain. As a whole, we focus on the Dense Linear Algebra (DLA) domain and specifically on Matrix Multiplication and sorting domains. These domains are chosen as they find application in various sectors of scientific and mathematical applications. The comparative analysis of the results obtained clarifies the trade-off between serial and parallel execution of DLA problems, the surfacing overheads, and their possible and effective management.

Submitted 31 January, 2022; originally announced February 2022.

Comments: 06 pages, 05 figures, 03 tables

arXiv:2201.07520 [pdf, other]

CM3: A Causal Masked Multimodal Model of the Internet

Authors: Armen Aghajanyan, Bernie Huang, Candace Ross, Vladimir Karpukhin, Hu Xu, Naman Goyal, Dmytro Okhonko, Mandar Joshi, Gargi Ghosh, Mike Lewis, Luke Zettlemoyer

Abstract: We introduce CM3, a family of causally masked generative models trained over a large corpus of structured multi-modal documents that can contain both text and image tokens. Our new causally masked approach generates tokens left to right while also masking out a small number of long token spans that are generated at the end of the string, instead of their original positions. The causal masking objective provides a type of hybrid of the more common causal and masked language models, by enabling full generative modeling while also providing bidirectional context when generating the masked spans. We train causally masked language-image models on large-scale web and Wikipedia articles, where each document contains all of the text, hypertext markup, hyperlinks, and image tokens (from a VQVAE-GAN), provided in the order they appear in the original HTML source (before masking). The resulting CM3 models can generate rich structured, multi-modal outputs while conditioning on arbitrary masked document contexts, and thereby implicitly learn a wide range of text, image, and cross modal tasks. They can be prompted to recover, in a zero-shot fashion, the functionality of models such as DALL-E, GENRE, and HTLM. We set the new state-of-the-art in zero-shot summarization, entity linking, and entity disambiguation while maintaining competitive performance in the fine-tuning setting. We can generate images unconditionally, conditioned on text (like DALL-E) and do captioning all in a zero-shot setting with a single model.

Submitted 19 January, 2022; originally announced January 2022.

arXiv:2111.11298 [pdf, other]

Novel EEG based Schizophrenia Detection with IoMT Framework for Smart Healthcare

Authors: Geetanjali Sharma, Amit M. Joshi

Abstract: In the field of neuroscience, brain activity analysis is always considered an important area. Schizophrenia (Sz) is a brain disorder that severely affects the thinking, behaviour, and feelings of people all around the world. Electroencephalography (EEG) has proved to be an efficient biomarker in Sz detection. EEG is a non-linear time-series signal and utilizing it for investigation is rather crucial due to its non-linear structure. This paper aims to improve the performance of EEG based Sz detection using a deep learning approach. A novel hybrid deep learning model known as SzHNN (Schizophrenia Hybrid Neural Network), a combination of Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM), has been proposed. The CNN network is used for local feature extraction and the LSTM has been utilized for classification. The proposed model has been compared with CNN only, LSTM only, and machine learning-based models. All the models have been evaluated on two different datasets wherein Dataset 1 consists of 19 subjects and Dataset 2 consists of 16 subjects. Several experiments have been conducted for the same using various parametric settings on different frequency bands and using different sets of electrodes on the scalp. Based on all the experiments, it is evident that the proposed hybrid model (SzHNN) provides the highest classification accuracy of 99.9% in comparison to other existing models. The proposed model overcomes the influence of different frequency bands and even showed a much better accuracy of 91% with only 5 electrodes. The proposed model is also evaluated on the Internet of Medical Things (IoMT) framework for smart healthcare and remote monitoring applications.

Submitted 19 November, 2021; originally announced November 2021.

Comments: 18 pages, 9 Figures

arXiv:2109.04194 [pdf, other]

Novel Time Domain Based Upper-Limb Prosthesis Control using Incremental Learning Approach

Authors: Sidharth Pancholi, Amit M. Joshi, Deepak Joshi, Bradly S. Duerstock

Abstract: The upper limb of the body is vital for various kinds of human activities. The complete or partial loss of the upper limb would lead to a significant impact on the daily activities of amputees. EMG carries important information about human physique which helps to decode the various functionalities of the human arm. EMG signal based bionics and prostheses have gained huge research attention over the past decade. Conventional EMG-PR based prostheses struggle to give accurate performance due to the off-line training used and the inability to compensate for electrode position shift and change in arm position. This work proposes an online training and incremental learning based system for upper limb prosthetic applications. This system consists of an ADS1298 as AFE (analog front end) and a 32-bit ARM Cortex-M4 processor for DSP (digital signal processing). The system has been tested for both intact and amputated subjects. Time derivative moment based features have been implemented and utilized for effective pattern classification. Initially, the system was trained for four classes using the on-line training process; later, the number of classes was incremented on user demand up to eleven, and system performance was evaluated. The system yielded a completion rate of 100% for healthy and amputated subjects when four motions were considered. Further, completion rates of 94.33% and 92% were achieved when the number of classes increased to eleven for healthy subjects and amputees respectively. The motion efficacy test was also evaluated for all the subjects. The highest efficacy rates of 91.23% and 88.64% were observed for intact and amputated subjects respectively.

Submitted 13 January, 2024; v1 submitted 25 August, 2021; originally announced September 2021.

Comments: 15 Pages, 8 Figures, This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2107.08514 [pdf, other]

Classification of Upper Arm Movements from EEG signals using Machine Learning with ICA Analysis

Authors: Pranali Kokate, Sidharth Pancholi, Amit M. Joshi

Abstract: The Brain-Computer Interface system is a profoundly developing area of experimentation for motor activities which plays a vital role in decoding cognitive activities. Classification of Cognitive-Motor Imagery activities from EEG signals is a critical task. Hence, we propose a unique algorithm for classifying left/right-hand movements by utilizing a Multi-layer Perceptron Neural Network. Handcrafted statistical time domain and power spectral density frequency domain features were extracted, and a combined accuracy of 96.02% was obtained. Results were compared with the deep learning framework. In addition to accuracy, precision, F1-score, and recall were considered as the performance metrics. The intervention of unwanted signals contaminates the EEG signals, which influences the performance of the algorithm. Therefore, a novel approach was adopted to remove the artifacts using Independent Components Analysis, which boosted the performance. Following the selection of appropriate feature vectors that provided acceptable accuracy, the same method was used on all nine subjects. As a result, an intra-subject accuracy of 94.72% was obtained for the 9 subjects. The results show that the proposed approach would be useful to classify the upper limb movements accurately.

Submitted 18 July, 2021; originally announced July 2021.

Comments: 41 Pages, Figures 32, Table 9

arXiv:2107.06955 [pdf, ps, other]

HTLM: Hyper-Text Pre-Training and Prompting of Language Models

Authors: Armen Aghajanyan, Dmytro Okhonko, Mike Lewis, Mandar Joshi, Hu Xu, Gargi Ghosh, Luke Zettlemoyer

Abstract: We introduce HTLM, a hyper-text language model trained on a large-scale web crawl. Modeling hyper-text has a number of advantages: (1) it is easily gathered at scale, (2) it provides rich document-level and end-task-adjacent supervision (e.g. class and id attributes often encode document category information), and (3) it allows for new structured prompting that follows the established semantics of HTML (e.g. to do zero-shot summarization by infilling title tags for a webpage that contains the input text). We show that pretraining with a BART-style denoising loss directly on simplified HTML provides highly effective transfer for a wide range of end tasks and supervision levels. HTLM matches or exceeds the performance of comparably sized text-only LMs for zero-shot prompting and fine-tuning for classification benchmarks, while also setting new state-of-the-art performance levels for zero-shot summarization. We also find that hyper-text prompts provide more value to HTLM, in terms of data efficiency, than plain text prompts do for existing LMs, and that HTLM is highly effective at auto-prompting itself, by simply generating the most likely hyper-text formatting for any available training data. We will release all code and models to support future HTLM research.

Submitted 14 July, 2021; originally announced July 2021.

arXiv:2106.05365 [pdf, other]

DESCGEN: A Distantly Supervised Dataset for Generating Abstractive Entity Descriptions

Authors: Weijia Shi, Mandar Joshi, Luke Zettlemoyer

Abstract: Short textual descriptions of entities provide summaries of their key attributes and have been shown to be useful sources of background knowledge for tasks such as entity linking and question answering. However, generating entity descriptions, especially for new and long-tail entities, can be challenging since relevant information is often scattered across multiple sources with varied content and style. We introduce DESCGEN: given mentions spread over multiple documents, the goal is to generate an entity summary description. DESCGEN consists of 37K entity descriptions from Wikipedia and Fandom, each paired with nine evidence documents on average. The documents were collected using a combination of entity linking and hyperlinks to the Wikipedia and Fandom entity pages, which together provide high-quality distant supervision. The resulting summaries are more abstractive than those found in existing datasets and provide a better proxy for the challenge of describing new and emerging entities. We also propose a two-stage extract-then-generate baseline and show that there exists a large gap (19.9% in ROUGE-L) between state-of-the-art models and human performance, suggesting that the data will support significant future work.

Submitted 16 June, 2021; v1 submitted 9 June, 2021; originally announced June 2021.

Journal ref: ACL-IJCNLP 2021

arXiv:2106.04192 [pdf, other]

Realistic Evaluation Principles for Cross-document Coreference Resolution

Authors: Arie Cattan, Alon Eirew, Gabriel Stanovsky, Mandar Joshi, Ido Dagan

Abstract: We point out that common evaluation practices for cross-document coreference resolution have been unrealistically permissive in their assumed settings, yielding inflated results. We propose addressing this issue via two evaluation methodology principles. First, as in other tasks, models should be evaluated on predicted mentions rather than on gold mentions. Doing this raises a subtle issue regarding singleton coreference clusters, which we address by decoupling the evaluation of mention detection from that of coreference linking. Second, we argue that models should not exploit the synthetic topic structure of the standard ECB+ dataset, forcing models to confront the lexical ambiguity challenge, as intended by the dataset creators. We demonstrate empirically the drastic impact of our more realistic evaluation principles on a competitive model, yielding a score which is 33 F1 lower compared to evaluating by prior lenient practices.

Submitted 8 June, 2021; originally announced June 2021.

Comments: *SEM 2021

arXiv:2106.01210 [pdf, other]

Cross-document Coreference Resolution over Predicted Mentions

Authors: Arie Cattan, Alon Eirew, Gabriel Stanovsky, Mandar Joshi, Ido Dagan

Abstract: Coreference resolution has been mostly investigated within a single document scope, showing impressive progress in recent years based on end-to-end models. However, the more challenging task of cross-document (CD) coreference resolution remained relatively under-explored, with the few recent models applied only to gold mentions. Here, we introduce the first end-to-end model for CD coreference resolution from raw text, which extends the prominent model for within-document coreference to the CD setting. Our model achieves competitive results for event and entity coreference resolution on gold mentions. More importantly, we set first baseline results, on the standard ECB+ dataset, for CD coreference resolution over predicted mentions. Further, our model is simpler and more efficient than recent CD coreference resolution systems, while not using any external resources.

Submitted 2 June, 2021; originally announced June 2021.

Comments: Findings of ACL 2021

arXiv:2102.09866 [pdf]

KBCNMUJAL@HASOC-Dravidian-CodeMix-FIRE2020: Using Machine Learning for Detection of Hate Speech and Offensive Code-Mixed Social Media text

Authors: Varsha Pathak, Manish Joshi, Prasad Joshi, Monica Mundada, Tanmay Joshi

Abstract: This paper describes the system submitted by our team, KBCNMUJAL, for Task 2 of the shared task Hate Speech and Offensive Content Identification in Indo-European Languages (HASOC), at Forum for Information Retrieval Evaluation, December 16-20, 2020, Hyderabad, India. Datasets for two Dravidian languages, viz. Malayalam and Tamil, of 4000 observations each, were shared by the HASOC organizers. These datasets are used to train the machine using different machine learning algorithms, based on classification and regression models. The datasets consist of tweets or YouTube comments with two class labels, offensive and not offensive. The machine is trained to classify such social media messages in these two categories. Appropriate n-gram feature sets are extracted to learn the specific characteristics of the Hate Speech text messages. These feature models are based on TFIDF weights of n-grams. The referred work and respective experiments show that features such as word, character, and combined word and character n-grams could be used to identify the term patterns of offensive text contents. As a part of the HASOC shared task, the test data sets are made available by the HASOC track organizers. The best performing classification models developed for both languages are applied on the test datasets. The model which gives the highest accuracy on the training dataset for the Malayalam language was used to predict the categories of the respective test data. This system has obtained an F1 score of 0.77. Similarly, the best performing model for the Tamil language has obtained an F1 score of 0.87. This work has received 2nd and 3rd rank in this shared Task 2 for the Malayalam and Tamil languages respectively. The proposed system is named HASOC_kbcnmujal.

Submitted 19 February, 2021; originally announced February 2021.

arXiv:2102.07983 [pdf, other]

FEWS: Large-Scale, Low-Shot Word Sense Disambiguation with the Dictionary

Authors: Terra Blevins, Mandar Joshi, Luke Zettlemoyer

Abstract: Current models for Word Sense Disambiguation (WSD) struggle to disambiguate rare senses, despite reaching human performance on global WSD metrics. This stems from a lack of data for both modeling and evaluating rare senses in existing WSD datasets. In this paper, we introduce FEWS (Few-shot Examples of Word Senses), a new low-shot WSD dataset automatically extracted from example sentences in Wiktionary. FEWS has high sense coverage across different natural language domains and provides: (1) a large training set that covers many more senses than previous datasets and (2) a comprehensive evaluation set containing few- and zero-shot examples of a wide variety of senses. We establish baselines on FEWS with knowledge-based and neural WSD approaches and present transfer learning experiments demonstrating that models additionally trained with FEWS better capture rare senses in existing WSD datasets. Finally, we find humans outperform the best baseline models on FEWS, indicating that FEWS will support significant future work on low-shot WSD.

Submitted 16 February, 2021; originally announced February 2021.

Comments: EACL 2021

arXiv:2011.03713 [pdf, other]

Naturalization of Text by the Insertion of Pauses and Filler Words

Authors: Richa Sharma, Parth Vipul Shah, Ashwini M. Joshi

Abstract: In this article, we introduce a set of methods to naturalize text based on natural human speech. Voice-based interactions provide a natural way of interfacing with electronic systems and are seeing widespread adoption of late. These computerized voices can be naturalized to some degree by inserting pauses and filler words at appropriate positions. The first proposed text transformation method uses the frequency of bigrams in the training data to make appropriate insertions in the input sentence. It uses a probability distribution to choose the insertions from a set of all possible insertions. This method is fast and can be included before a Text-To-Speech module. The second method uses a Recurrent Neural Network to predict the next word to be inserted. It confirms the insertions given by the bigram method. Additionally, the degree of naturalization can be controlled in both these methods. Based on a blind survey, we conclude that the output of these text transformation methods is comparable to natural speech.

Submitted 7 November, 2020; originally announced November 2020.

Comments: Keywords: Text transformation, natural speech, bigram, RNN, filler words

arXiv:2010.03378 [pdf]

Descriptive analysis of computational methods for automating mammograms with practical applications

Authors: Aparna Bhale, Manish Joshi

Abstract: Mammography is a vital screening technique for early detection and identification of breast cancer, helping to decrease the mortality rate. Practical applications of mammograms are not limited to breast cancer detection and identification, but include task based lens design, image compression, image classification, content based image retrieval and a host of others. Mammography computational analysis methods are a useful tool for specialists to reveal hidden features and extract significant information in mammograms. Digital mammograms are mammography images available along with the conventional screen-film mammography to make automation of mammograms easier. In this paper, we descriptively discuss computational advancement in digital mammograms to serve as a compass for research and practice in the domain of computational mammography and related fields. The discussion focuses on research aiming at a variety of applications and automations of mammograms. It covers different perspectives on image pre-processing, feature extraction, application of mammograms, screen-film mammograms, digital mammograms and development of benchmark corpora for experimenting with digital mammograms.

Submitted 6 October, 2020; originally announced October 2020.

Comments: 33 pages and 2 Figures. A review paper of the research work related to mammography

arXiv:2009.11032 [pdf, other]

Streamlining Cross-Document Coreference Resolution: Evaluation and Modeling

Authors: Arie Cattan, Alon Eirew, Gabriel Stanovsky, Mandar Joshi, Ido Dagan

Abstract: Recent evaluation protocols for Cross-document (CD) coreference resolution have often been inconsistent or lenient, leading to incomparable results across works and overestimation of performance. To facilitate proper future research on this task, our primary contribution is proposing a pragmatic evaluation methodology which assumes access to only raw text -- rather than assuming gold mentions, disregards singleton prediction, and addresses typical targeted settings in CD coreference resolution. Aiming to set baseline results for future research that would follow our evaluation methodology, we build the first end-to-end model for this task. Our model adapts and extends recent neural models for within-document coreference resolution to address the CD coreference setting, which outperforms state-of-the-art results by a significant margin.

Submitted 23 October, 2020; v1 submitted 23 September, 2020; originally announced September 2020.

arXiv:2008.12905 [pdf, other]

Batching and Matching for Food Delivery in Dynamic Road Networks

Authors: Manas Joshi, Arshdeep Singh, Sayan Ranu, Amitabha Bagchi, Priyank Karia, Puneet Kala

Abstract: Given a stream of food orders and available delivery vehicles, how should orders be assigned to vehicles so that the delivery time is minimized? Several decisions have to be made: (1) assignment of orders to vehicles, (2) grouping orders into batches to cope with limited vehicle availability, and (3) adapting to dynamic positions of delivery vehicles. We show that the minimization problem is not only NP-hard but inapproximable in polynomial time. To mitigate this computational bottleneck, we develop an algorithm called FoodMatch, which maps the vehicle assignment problem to that of minimum weight perfect matching on a bipartite graph. To further reduce the quadratic construction cost of the bipartite graph, we deploy best-first search to only compute a subgraph that is highly likely to contain the minimum matching. The solution quality is further enhanced by reducing batching to a graph clustering problem and anticipating dynamic positions of vehicles through angular distance. Extensive experiments on food-delivery data from large metropolitan cities establish that FoodMatch is substantially better than baseline strategies on a number of metrics, while being efficient enough to handle real-world workloads.

Submitted 28 August, 2020; originally announced August 2020.

Comments: 12 pages, 9 figures, Accepted in ICDE 2021 as Short Paper

arXiv:2005.00652 [pdf, other]

An Information Bottleneck Approach for Controlling Conciseness in Rationale Extraction

Authors: Bhargavi Paranjape, Mandar Joshi, John Thickstun, Hannaneh Hajishirzi, Luke Zettlemoyer

Abstract: Decisions of complex language understanding models can be rationalized by limiting their inputs to a relevant subsequence of the original text. A rationale should be as concise as possible without significantly degrading task performance, but this balance can be difficult to achieve in practice. In this paper, we show that it is possible to better manage this trade-off by optimizing a bound on the Information Bottleneck (IB) objective. Our fully unsupervised approach jointly learns an explainer that predicts sparse binary masks over sentences, and an end-task predictor that considers only the extracted rationale. Using IB, we derive a learning objective that allows direct control of mask sparsity levels through a tunable sparse prior. Experiments on ERASER benchmark tasks demonstrate significant gains over norm-minimization techniques for both task performance and agreement with human rationales. Furthermore, we find that in the semi-supervised setting, a modest amount of gold rationales (25% of training examples) closes the gap with a model that uses the full input.

Submitted 2 November, 2020; v1 submitted 1 May, 2020; originally announced May 2020.

Comments: EMNLP 2020 main track accepted paper

arXiv:2004.12006 [pdf, other]

Contextualized Representations Using Textual Encyclopedic Knowledge

Authors: Mandar Joshi, Kenton Lee, Yi Luan, Kristina Toutanova

Abstract: We present a method to represent input texts by contextualizing them jointly with dynamically retrieved textual encyclopedic background knowledge from multiple documents. We apply our method to reading comprehension tasks by encoding questions and passages together with background sentences about the entities they mention. We show that integrating background knowledge from text is effective for tasks focusing on factual reasoning and allows direct reuse of powerful pretrained BERT-style encoders. Moreover, knowledge integration can be further improved with suitable pretraining via a self-supervised masked language model objective over words in background-augmented input text. On TriviaQA, our approach obtains improvements of 1.6 to 3.1 F1 over comparable RoBERTa models which do not integrate background knowledge dynamically. On MRQA, a large collection of diverse QA datasets, we see consistent gains in-domain along with large improvements out-of-domain on BioASQ (2.1 to 4.2 F1), TextbookQA (1.6 to 2.0 F1), and DuoRC (1.1 to 2.0 F1).

Submitted 13 July, 2021; v1 submitted 24 April, 2020; originally announced April 2020.

Comments: Added experiments comparing linkers

arXiv:1911.03052 [pdf, ps, other]

A Novel Approach for Partial Fingerprint Identification to Mitigate MasterPrint Generation

Authors: Mahesh Joshi, Bodhisatwa Mazumdar, Somnath Dey

Abstract: Partial fingerprint recognition is a method to recognize an individual when the sensor size has a small form factor in accepting a full fingerprint. It is also used in forensic research to identify the partial fingerprints collected from the crime scenes. But the distinguishing features in the partial fingerprint are relatively low due to small fingerprint captured by the sensor. Hence, the uniqueness of a partial fingerprint cannot be guaranteed, leading to a possibility that a single partial fingerprint may identify multiple subjects. A MasterPrint is a partial fingerprint that identifies at least 4% different individuals from the enrolled template database. A fingerprint identification system with such a flaw can play a significant role in convicting an innocent in a criminal case. We propose a partial fingerprint identification approach that aims to mitigate MasterPrint generation. The proposed method, when applied to partial fingerprint dataset cropped from standard FVC 2002 DB1(A) dataset showed significant improvement in reducing the count of MasterPrints. The experimental result demonstrates improved results on other parameters, such as True match Rate (TMR) and Equal Error Rate (EER), generally used to evaluate the performance of a fingerprint biometric system.

Submitted 8 November, 2019; originally announced November 2019.

arXiv:1910.07233 [pdf]

Rule based Approach for Word Normalization by resolving Transcription Ambiguity in Transliterated Search Queries

Authors: Varsha Pathak, Manish Joshi

Abstract: Query term matching with document term matching is the basic function of any best effort Information Retrieval models like Vector Space Model. In our problem of SMS based Information Systems we expect common people to participate in information search. Our system allows mobile users to formulate their queries in their own words, own transliteration style and spelling formation. To achieve this flexibility we have resolved the term level ambiguity due to inherent transcription noise in user query terms. We have developed a rule based approach to select most relevantly close standard term for each noisy term in the user query. We have used four different versions of the rule based algorithm with variation in the rule set. We have formulated this rule set including the basic Levenshtein minimum edit distance algorithm for term matching. This paper presents the experiments and corresponding results of Marathi and Hindi language literature information system. We have experimented on Marathi and Hindi literature which include songs, gazals, powadas, bharud and other types in a standard transliteration form like ITRANS.

Submitted 16 October, 2019; originally announced October 2019.

Comments: 11 pages, 2 figures, 2 tables, Unpublished

MSC Class: 68T35

arXiv:1908.09091 [pdf, ps, other]

BERT for Coreference Resolution: Baselines and Analysis

Authors: Mandar Joshi, Omer Levy, Daniel S. Weld, Luke Zettlemoyer

Abstract: We apply BERT to coreference resolution, achieving strong improvements on the OntoNotes (+3.9 F1) and GAP (+11.5 F1) benchmarks. A qualitative analysis of model predictions indicates that, compared to ELMo and BERT-base, BERT-large is particularly better at distinguishing between related but distinct entities (e.g., President and CEO). However, there is still room for improvement in modeling document-level context, conversations, and mention paraphrasing. Our code and models are publicly available.

Submitted 22 December, 2019; v1 submitted 24 August, 2019; originally announced August 2019.

Comments: Fix test set numbers for e2e-coref on GAP

arXiv:1907.11692 [pdf, ps, other]

RoBERTa: A Robustly Optimized BERT Pretraining Approach

Authors: Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov

Abstract: Language model pretraining has led to significant performance gains but careful comparison between different approaches is challenging. Training is computationally expensive, often done on private datasets of different sizes, and, as we will show, hyperparameter choices have significant impact on the final results. We present a replication study of BERT pretraining (Devlin et al., 2019) that carefully measures the impact of many key hyperparameters and training data size. We find that BERT was significantly undertrained, and can match or exceed the performance of every model published after it. Our best model achieves state-of-the-art results on GLUE, RACE and SQuAD. These results highlight the importance of previously overlooked design choices, and raise questions about the source of recently reported improvements. We release our models and code.

Submitted 26 July, 2019; originally announced July 2019.

arXiv:1907.10529 [pdf, other]

SpanBERT: Improving Pre-training by Representing and Predicting Spans

Authors: Mandar Joshi, Danqi Chen, Yinhan Liu, Daniel S. Weld, Luke Zettlemoyer, Omer Levy

Abstract: We present SpanBERT, a pre-training method that is designed to better represent and predict spans of text. Our approach extends BERT by (1) masking contiguous random spans, rather than random tokens, and (2) training the span boundary representations to predict the entire content of the masked span, without relying on the individual token representations within it. SpanBERT consistently outperforms BERT and our better-tuned baselines, with substantial gains on span selection tasks such as question answering and coreference resolution. In particular, with the same training data and model size as BERT-large, our single model obtains 94.6% and 88.7% F1 on SQuAD 1.1 and 2.0, respectively. We also achieve a new state of the art on the OntoNotes coreference resolution task (79.6% F1), strong performance on the TACRED relation extraction benchmark, and even show gains on GLUE.

Submitted 17 January, 2020; v1 submitted 24 July, 2019; originally announced July 2019.

Comments: Accepted at TACL

arXiv:1907.02014 [pdf, other]

Using AI for Economic Upliftment of Handicraft Industry

Authors: Nitya Raviprakash, Sonam Damani, Ankush Chatterjee, Meghana Joshi, Puneet Agrawal

Abstract: The handicraft industry is a strong pillar of Indian economy which provides large-scale employment opportunities to artisans in rural and underprivileged communities. However, in this era of globalization, diverse modern designs have rendered traditional designs old and monotonous, causing an alarming decline of handicraft sales. For this age-old industry to survive the global competition, it is imperative to integrate contemporary designs with Indian handicrafts. In this paper, we use novel AI techniques to generate contemporary designs for two popular Indian handicrafts - Ikat and Block Print. These techniques were successfully employed by communities across India to manufacture and sell products with greater appeal and revenue. The designs are evaluated to be significantly more likeable and marketable than the current designs used by artisans.

Submitted 31 May, 2019; originally announced July 2019.

arXiv:1906.00606 [pdf, other]

An Extensive Review of Computational Dance Automation Techniques and Applications

Authors: Manish Joshi, Sangeeta Jadhav

Abstract: Dance is an art and when technology meets this kind of art, it's a novel attempt in itself. Several researchers have attempted to automate several aspects of dance, right from dance notation to choreography. Furthermore, we have encountered several applications of dance automation like e-learning, heritage preservation, etc. Despite several attempts by researchers for more than two decades in various styles of dance all round the world, we found a review paper that portrays the research status in this area dating to 1990 \cite{politis1990computers}. Hence, we decide to come up with a comprehensive review article that showcases several aspects of dance automation. This paper is an attempt to review research work reported in the literature, categorize and group all research work completed so far in the field of automating dance. We have explicitly identified six major categories corresponding to the use of computers in dance automation namely dance representation, dance capturing, dance semantics, dance generation, dance processing approaches and applications of dance automation systems. We classified several research papers under these categories according to their research approach and functionality. With the help of proposed categories and subcategories one can easily determine the state of research and the new avenues left for exploration in the field of dance automation.

Submitted 3 June, 2019; originally announced June 2019.

Comments: 15 pages, 6 figures

arXiv:1810.12097 [pdf, other]

Ruuh: A Deep Learning Based Conversational Social Agent

Authors: Sonam Damani, Nitya Raviprakash, Umang Gupta, Ankush Chatterjee, Meghana Joshi, Khyatti Gupta, Kedhar Nath Narahari, Puneet Agrawal, Manoj Kumar Chinnakotla, Sneha Magapu, Abhishek Mathur

Abstract: Dialogue systems and conversational agents are becoming increasingly popular in modern society, but building an agent capable of holding intelligent conversation with its users is a challenging problem for artificial intelligence. In this demo, we demonstrate a deep learning based conversational social agent called "Ruuh" (facebook.com/Ruuh) designed by a team at Microsoft India to converse on a wide range of topics. Ruuh needs to think beyond the utilitarian notion of merely generating "relevant" responses and meet a wider range of user social needs, like expressing happiness when the user's favorite team wins, sharing a cute comment when shown pictures of the user's pet, and so on. The agent also needs to detect and respond to abusive language, sensitive topics and trolling behavior of the users. Many of these problems pose significant research challenges which will be demonstrated in our demo. Our agent has interacted with over 2 million real world users to date, which has generated over 150 million user conversations.

Submitted 22 October, 2018; originally announced October 2018.

Comments: 2 pages, 1 figure

arXiv:1810.08854 [pdf, other]

pair2vec: Compositional Word-Pair Embeddings for Cross-Sentence Inference

Authors: Mandar Joshi, Eunsol Choi, Omer Levy, Daniel S. Weld, Luke Zettlemoyer

Abstract: Reasoning about implied relationships (e.g., paraphrastic, common sense, encyclopedic) between pairs of words is crucial for many cross-sentence inference problems. This paper proposes new methods for learning and using embeddings of word pairs that implicitly represent background knowledge about such relationships. Our pairwise embeddings are computed as a compositional function on word representations, which is learned by maximizing the pointwise mutual information (PMI) with the contexts in which the two words co-occur. We add these representations to the cross-sentence attention layer of existing inference models (e.g. BiDAF for QA, ESIM for NLI), instead of extending or replacing existing word embeddings. Experiments show a gain of 2.7% on the recently released SQuAD2.0 and 1.3% on MultiNLI. Our representations also aid in better generalization with gains of around 6-7% on adversarial SQuAD datasets, and 8.8% on the adversarial entailment test set by Glockner et al. (2018).

Submitted 5 April, 2019; v1 submitted 20 October, 2018; originally announced October 2018.

Comments: NAACL camera ready

arXiv:1805.07116 [pdf, other]

Security Vulnerabilities Against Fingerprint Biometric System

Authors: Mahesh Joshi, Bodhisatwa Mazumdar, Somnath Dey

Abstract: The biometric system is an automatic identification and authentication system that uses unique biological traits, such as fingerprint, face, iris, voice, retina, etc. of an individual. Of all these systems, fingerprint biometric system is the most widely used because of its low cost, high matching speed, and relatively high matching accuracy. Due to the high efficiency of fingerprint biometric system in verifying a legitimate user, numerous government and private organizations are using this system for security purpose. This paper provides an overview of the fingerprint biometric system and gives details about various current security aspects related to the system. The security concerns that we address include multiple attacks on the system, associated threat models, biometric cryptosystems, current issues, challenges, opportunities, and open problems that exist in present day fingerprint biometric systems.

Submitted 18 May, 2018; originally announced May 2018.

arXiv:1705.06338 [pdf, other]

Distributed Vector Representation Of Shopping Items, The Customer And Shopping Cart To Build A Three Fold Recommendation System

Authors: Bibek Behera, Manoj Joshi, Abhilash KK, Mohammad Ansari Ismail

Abstract: The main idea of this paper is to represent shopping items through vectors because these vectors act as the base for building embeddings for customers and shopping carts. Also, these vectors are input to the mathematical models that act as either a recommendation engine or help in targeting potential customers. We have used exponential family embeddings as the tool to construct two basic vectors - product embeddings and context vectors. Using the basic vectors, we build combined embeddings, trip embeddings and customer embeddings. Combined embeddings mix linguistic properties of product names with their shopping patterns. The customer embeddings establish an understanding of the buying pattern of customers in a group and help in building customer profile. For example a customer profile can represent customers frequently buying pet-food. Identifying such profiles can help us bring out offers and discounts. Similarly, trip embeddings are used to build trip profiles. People happen to buy similar set of products in a trip and hence their trip embeddings can be used to predict the next product they would like to buy. This is a novel technique and the first of its kind to make recommendation using product, trip and customer embeddings.

Submitted 17 May, 2017; originally announced May 2017.

Comments: Cicling 2017

arXiv:1705.03551 [pdf, other]

TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension

Authors: Mandar Joshi, Eunsol Choi, Daniel S. Weld, Luke Zettlemoyer

Abstract: We present TriviaQA, a challenging reading comprehension dataset containing over 650K question-answer-evidence triples. TriviaQA includes 95K question-answer pairs authored by trivia enthusiasts and independently gathered evidence documents, six per question on average, that provide high quality distant supervision for answering the questions. We show that, in comparison to other recently introduced large-scale datasets, TriviaQA (1) has relatively complex, compositional questions, (2) has considerable syntactic and lexical variability between questions and corresponding answer-evidence sentences, and (3) requires more cross sentence reasoning to find answers. We also present two baseline algorithms: a feature-based classifier and a state-of-the-art neural network, that performs well on SQuAD reading comprehension. Neither approach comes close to human performance (23% and 40% vs. 80%), suggesting that TriviaQA is a challenging testbed that is worth significant future study. Data and code available at -- http://nlp.cs.washington.edu/triviaqa/

Submitted 13 May, 2017; v1 submitted 9 May, 2017; originally announced May 2017.

Comments: Added references, fixed typos, minor baseline update

arXiv:1512.01755 [pdf, ps, other]

A post-processing technique for stabilizing the discontinuous pressure projection operator in marginally-resolved incompressible inviscid flow

Authors: Sumedh M. Joshi, Peter J. Diamessis, Derek T. Steinmoeller, Marek Stastna, Greg N. Thomsen

Abstract: A method for post-processing the velocity after a pressure projection is developed that helps to maintain stability in an under-resolved, inviscid, discontinuous element-based simulation for use in environmental fluid mechanics process studies. The post-processing method is needed because of spurious divergence growth at element interfaces due to the discontinuous nature of the discretization used. This spurious divergence eventually leads to a numerical instability. Previous work has shown that a discontinuous element-local projection onto the space of divergence-free basis functions is capable of stabilizing the projection method, but the discontinuity inherent in this technique may lead to instability in under-resolved simulations. By enforcing inter-element discontinuity and requiring a divergence-free result in the weak sense only, a new post-processing technique is developed that simultaneously improves smoothness and reduces divergence in the pressure-projected velocity field. When compared against a non-post-processed velocity field, the post-processed velocity field remains stable far longer and exhibits better smoothness and conservation properties.

Submitted 6 December, 2015; originally announced December 2015.

Showing 1–50 of 55 results for author: Joshi, M