-
An Online Probabilistic Distributed Tracing System
Authors:
M. Toslali,
S. Qasim,
S. Parthasarathy,
F. A. Oliveira,
H. Huang,
G. Stringhini,
Z. Liu,
A. K. Coskun
Abstract:
Distributed tracing has become a fundamental tool for diagnosing performance issues in the cloud by recording causally ordered, end-to-end workflows of request executions. However, tracing in production workloads can introduce significant overheads due to the extensive instrumentation needed for identifying performance variations. This paper addresses the trade-off between the cost of tracing and…
▽ More
Distributed tracing has become a fundamental tool for diagnosing performance issues in the cloud by recording causally ordered, end-to-end workflows of request executions. However, tracing in production workloads can introduce significant overheads due to the extensive instrumentation needed for identifying performance variations. This paper addresses the trade-off between the cost of tracing and the utility of the "spans" within that trace through Astraea, an online probabilistic distributed tracing system. Astraea is based on our technique that combines online Bayesian learning and multi-armed bandit frameworks. This formulation enables Astraea to effectively steer tracing towards the useful instrumentation needed for accurate performance diagnosis. Astraea localizes performance variations using only 10-28% of available instrumentation, markedly reducing tracing overhead, storage, compute costs, and trace analysis time.
△ Less
Submitted 24 May, 2024;
originally announced May 2024.
-
REASONS: A benchmark for REtrieval and Automated citationS Of scieNtific Sentences using Public and Proprietary LLMs
Authors:
Deepa Tilwani,
Yash Saxena,
Ali Mohammadi,
Edward Raff,
Amit Sheth,
Srinivasan Parthasarathy,
Manas Gaur
Abstract:
Automatic citation generation for sentences in a document or report is paramount for intelligence analysts, cybersecurity, news agencies, and education personnel. In this research, we investigate whether large language models (LLMs) are capable of generating references based on two forms of sentence queries: (a) Direct Queries, LLMs are asked to provide author names of the given research article,…
▽ More
Automatic citation generation for sentences in a document or report is paramount for intelligence analysts, cybersecurity, news agencies, and education personnel. In this research, we investigate whether large language models (LLMs) are capable of generating references based on two forms of sentence queries: (a) Direct Queries, LLMs are asked to provide author names of the given research article, and (b) Indirect Queries, LLMs are asked to provide the title of a mentioned article when given a sentence from a different article. To demonstrate where LLM stands in this task, we introduce a large dataset called REASONS comprising abstracts of the 12 most popular domains of scientific research on arXiv. From around 20K research articles, we make the following deductions on public and proprietary LLMs: (a) State-of-the-art, often called anthropomorphic GPT-4 and GPT-3.5, suffers from high pass percentage (PP) to minimize the hallucination rate (HR). When tested with Perplexity.ai (7B), they unexpectedly made more errors; (b) Augmenting relevant metadata lowered the PP and gave the lowest HR; (c) Advance retrieval-augmented generation (RAG) using Mistral demonstrates consistent and robust citation support on indirect queries and matched performance to GPT-3.5 and GPT-4. The HR across all domains and models decreased by an average of 41.93%, and the PP was reduced to 0% in most cases. In terms of generation quality, the average F1 Score and BLEU were 68.09% and 57.51%, respectively; (d) Testing with adversarial samples showed that LLMs, including the Advance RAG Mistral, struggle to understand context, but the extent of this issue was small in Mistral and GPT-4-Preview. Our study contributes valuable insights into the reliability of RAG for automated citation generation tasks.
△ Less
Submitted 8 May, 2024; v1 submitted 3 May, 2024;
originally announced May 2024.
-
Class-Level Code Generation from Natural Language Using Iterative, Tool-Enhanced Reasoning over Repository
Authors:
Ajinkya Deshpande,
Anmol Agarwal,
Shashank Shet,
Arun Iyer,
Aditya Kanade,
Ramakrishna Bairi,
Suresh Parthasarathy
Abstract:
LLMs have demonstrated significant potential in code generation tasks, achieving promising results at the function or statement level across various benchmarks. However, the complexities associated with creating code artifacts like classes, particularly within the context of real-world software repositories, remain underexplored. Prior research treats class-level generation as an isolated task, ne…
▽ More
LLMs have demonstrated significant potential in code generation tasks, achieving promising results at the function or statement level across various benchmarks. However, the complexities associated with creating code artifacts like classes, particularly within the context of real-world software repositories, remain underexplored. Prior research treats class-level generation as an isolated task, neglecting the intricate dependencies & interactions that characterize real-world software environments. To address this gap, we introduce RepoClassBench, a comprehensive benchmark designed to rigorously evaluate LLMs in generating complex, class-level code within real-world repositories. RepoClassBench includes "Natural Language to Class generation" tasks across Java, Python & C# from a selection of repositories. We ensure that each class in our dataset not only has cross-file dependencies within the repository but also includes corresponding test cases to verify its functionality. We find that current models struggle with the realistic challenges posed by our benchmark, primarily due to their limited exposure to relevant repository contexts. To address this shortcoming, we introduce Retrieve-Repotools-Reflect (RRR), a novel approach that equips LLMs with static analysis tools to iteratively navigate & reason about repository-level context in an agent-based framework. Our experiments demonstrate that RRR significantly outperforms existing baselines on RepoClassBench, showcasing its effectiveness across programming languages & under various settings. Our findings emphasize the critical need for code-generation benchmarks to incorporate repo-level dependencies to more accurately reflect the complexities of software development. Our work shows the benefits of leveraging specialized tools to enhance LLMs' understanding of repository context. We plan to make our dataset & evaluation harness public.
△ Less
Submitted 5 June, 2024; v1 submitted 21 April, 2024;
originally announced May 2024.
-
HeteroMILE: a Multi-Level Graph Representation Learning Framework for Heterogeneous Graphs
Authors:
Yue Zhang,
Yuntian He,
Saket Gurukar,
Srinivasan Parthasarathy
Abstract:
Heterogeneous graphs are ubiquitous in real-world applications because they can represent various relationships between different types of entities. Therefore, learning embeddings in such graphs is a critical problem in graph machine learning. However, existing solutions for this problem fail to scale to large heterogeneous graphs due to their high computational complexity. To address this issue,…
▽ More
Heterogeneous graphs are ubiquitous in real-world applications because they can represent various relationships between different types of entities. Therefore, learning embeddings in such graphs is a critical problem in graph machine learning. However, existing solutions for this problem fail to scale to large heterogeneous graphs due to their high computational complexity. To address this issue, we propose a Multi-Level Embedding framework of nodes on a heterogeneous graph (HeteroMILE) - a generic methodology that allows contemporary graph embedding methods to scale to large graphs. HeteroMILE repeatedly coarsens the large sized graph into a smaller size while preserving the backbone structure of the graph before embedding it, effectively reducing the computational cost by avoiding time-consuming processing operations. It then refines the coarsened embedding to the original graph using a heterogeneous graph convolution neural network. We evaluate our approach using several popular heterogeneous graph datasets. The experimental results show that HeteroMILE can substantially reduce computational time (approximately 20x speedup) and generate an embedding of better quality for link prediction and node classification.
△ Less
Submitted 31 March, 2024;
originally announced April 2024.
-
Grounding from an AI and Cognitive Science Lens
Authors:
Goonmeet Bajaj,
Srinivasan Parthasarathy,
Valerie L. Shalin,
Amit Sheth
Abstract:
Grounding is a challenging problem, requiring a formal definition and different levels of abstraction. This article explores grounding from both cognitive science and machine learning perspectives. It identifies the subtleties of grounding, its significance for collaborative agents, and similarities and differences in grounding approaches in both communities. The article examines the potential of…
▽ More
Grounding is a challenging problem, requiring a formal definition and different levels of abstraction. This article explores grounding from both cognitive science and machine learning perspectives. It identifies the subtleties of grounding, its significance for collaborative agents, and similarities and differences in grounding approaches in both communities. The article examines the potential of neuro-symbolic approaches tailored for grounding tasks, showcasing how they can more comprehensively address grounding. Finally, we discuss areas for further exploration and development in grounding.
△ Less
Submitted 19 February, 2024;
originally announced February 2024.
-
Masked LoGoNet: Fast and Accurate 3D Image Analysis for Medical Domain
Authors:
Amin Karimi Monsefi,
Payam Karisani,
Mengxi Zhou,
Stacey Choi,
Nathan Doble,
Heng Ji,
Srinivasan Parthasarathy,
Rajiv Ramnath
Abstract:
Standard modern machine-learning-based imaging methods have faced challenges in medical applications due to the high cost of dataset construction and, thereby, the limited labeled training data available. Additionally, upon deployment, these methods are usually used to process a large volume of data on a daily basis, imposing a high maintenance cost on medical facilities. In this paper, we introdu…
▽ More
Standard modern machine-learning-based imaging methods have faced challenges in medical applications due to the high cost of dataset construction and, thereby, the limited labeled training data available. Additionally, upon deployment, these methods are usually used to process a large volume of data on a daily basis, imposing a high maintenance cost on medical facilities. In this paper, we introduce a new neural network architecture, termed LoGoNet, with a tailored self-supervised learning (SSL) method to mitigate such challenges. LoGoNet integrates a novel feature extractor within a U-shaped architecture, leveraging Large Kernel Attention (LKA) and a dual encoding strategy to capture both long-range and short-range feature dependencies adeptly. This is in contrast to existing methods that rely on increasing network capacity to enhance feature extraction. This combination of novel techniques in our model is especially beneficial in medical image segmentation, given the difficulty of learning intricate and often irregular body organ shapes, such as the spleen. Complementary, we propose a novel SSL method tailored for 3D images to compensate for the lack of large labeled datasets. The method combines masking and contrastive learning techniques within a multi-task learning framework and is compatible with both Vision Transformer (ViT) and CNN-based models. We demonstrate the efficacy of our methods in numerous tasks across two standard datasets (i.e., BTCV and MSD). Benchmark comparisons with eight state-of-the-art models highlight LoGoNet's superior performance in both inference time and accuracy.
△ Less
Submitted 9 February, 2024;
originally announced February 2024.
-
Modeling Sequences as Star Graphs to Address Over-smoothing in Self-attentive Sequential Recommendation
Authors:
Bo Peng,
Ziqi Chen,
Srinivasan Parthasarathy,
Xia Ning
Abstract:
Self-attention (SA) mechanisms have been widely used in developing sequential recommendation (SR) methods, and demonstrated state-of-the-art performance. However, in this paper, we show that self-attentive SR methods substantially suffer from the over-smoothing issue that item embeddings within a sequence become increasingly similar across attention blocks. As widely demonstrated in the literature…
▽ More
Self-attention (SA) mechanisms have been widely used in developing sequential recommendation (SR) methods, and demonstrated state-of-the-art performance. However, in this paper, we show that self-attentive SR methods substantially suffer from the over-smoothing issue that item embeddings within a sequence become increasingly similar across attention blocks. As widely demonstrated in the literature, this issue could lead to a loss of information in individual items, and significantly degrade models' scalability and performance. To address the over-smoothing issue, in this paper, we view items within a sequence constituting a star graph and develop a method, denoted as MSSG, for SR. Different from existing self-attentive methods, MSSG introduces an additional internal node to specifically capture the global information within the sequence, and does not require information propagation among items. This design fundamentally addresses the over-smoothing issue and enables MSSG a linear time complexity with respect to the sequence length. We compare MSSG with ten state-of-the-art baseline methods on six public benchmark datasets. Our experimental results demonstrate that MSSG significantly outperforms the baseline methods, with an improvement of as much as 10.10%. Our analysis shows the superior scalability of MSSG over the state-of-the-art self-attentive methods. Our complexity analysis and run-time performance comparison together show that MSSG is both theoretically and practically more efficient than self-attentive methods. Our analysis of the attention weights learned in SA-based methods indicates that on sparse recommendation data, modeling dependencies in all item pairs using the SA mechanism yields limited information gain, and thus, might not benefit the recommendation performance
△ Less
Submitted 13 November, 2023;
originally announced November 2023.
-
Towards Efficient and Effective Adaptation of Large Language Models for Sequential Recommendation
Authors:
Bo Peng,
Ben Burns,
Ziqi Chen,
Srinivasan Parthasarathy,
Xia Ning
Abstract:
In recent years, with large language models (LLMs) achieving state-of-the-art performance in context understanding, increasing efforts have been dedicated to developing LLM-enhanced sequential recommendation (SR) methods. Considering that most existing LLMs are not specifically optimized for recommendation tasks, adapting them for SR becomes a critical step in LLM-enhanced SR methods. Though numer…
▽ More
In recent years, with large language models (LLMs) achieving state-of-the-art performance in context understanding, increasing efforts have been dedicated to developing LLM-enhanced sequential recommendation (SR) methods. Considering that most existing LLMs are not specifically optimized for recommendation tasks, adapting them for SR becomes a critical step in LLM-enhanced SR methods. Though numerous adaptation methods have been developed, it still remains a significant challenge to adapt LLMs for SR both efficiently and effectively. To address this challenge, in this paper, we introduce a novel side sequential network adaptation method, denoted as SSNA, for LLM enhanced SR. SSNA features three key designs to allow both efficient and effective LLM adaptation. First, SSNA learns adapters separate from LLMs, while fixing all the pre-trained parameters within LLMs to allow efficient adaptation. In addition, SSNA adapts the top-a layers of LLMs jointly, and integrates adapters sequentially for enhanced effectiveness (i.e., recommendation performance). We compare SSNA against five state-of-the-art baseline methods on five benchmark datasets using three LLMs. The experimental results demonstrate that SSNA significantly outperforms all the baseline methods in terms of recommendation performance, and achieves substantial improvement over the best-performing baseline methods at both run-time and memory efficiency during training. Our analysis shows the effectiveness of integrating adapters in a sequential manner. Our parameter study demonstrates the effectiveness of jointly adapting the top-a layers of LLMs.
△ Less
Submitted 2 October, 2023;
originally announced October 2023.
-
Frustrated with Code Quality Issues? LLMs can Help!
Authors:
Nalin Wadhwa,
Jui Pradhan,
Atharv Sonwane,
Surya Prakash Sahu,
Nagarajan Natarajan,
Aditya Kanade,
Suresh Parthasarathy,
Sriram Rajamani
Abstract:
As software projects progress, quality of code assumes paramount importance as it affects reliability, maintainability and security of software. For this reason, static analysis tools are used in developer workflows to flag code quality issues. However, developers need to spend extra efforts to revise their code to improve code quality based on the tool findings. In this work, we investigate the u…
▽ More
As software projects progress, quality of code assumes paramount importance as it affects reliability, maintainability and security of software. For this reason, static analysis tools are used in developer workflows to flag code quality issues. However, developers need to spend extra efforts to revise their code to improve code quality based on the tool findings. In this work, we investigate the use of (instruction-following) large language models (LLMs) to assist developers in revising code to resolve code quality issues. We present a tool, CORE (short for COde REvisions), architected using a pair of LLMs organized as a duo comprised of a proposer and a ranker. Providers of static analysis tools recommend ways to mitigate the tool warnings and developers follow them to revise their code. The \emph{proposer LLM} of CORE takes the same set of recommendations and applies them to generate candidate code revisions. The candidates which pass the static quality checks are retained. However, the LLM may introduce subtle, unintended functionality changes which may go un-detected by the static analysis. The \emph{ranker LLM} evaluates the changes made by the proposer using a rubric that closely follows the acceptance criteria that a developer would enforce. CORE uses the scores assigned by the ranker LLM to rank the candidate revisions before presenting them to the developer. CORE could revise 59.2% Python files (across 52 quality checks) so that they pass scrutiny by both a tool and a human reviewer. The ranker LLM is able to reduce false positives by 25.8% in these cases. CORE produced revisions that passed the static analysis tool in 76.8% Java files (across 10 quality checks) comparable to 78.3% of a specialized program repair tool, with significantly much less engineering efforts.
△ Less
Submitted 22 September, 2023;
originally announced September 2023.
-
CodePlan: Repository-level Coding using LLMs and Planning
Authors:
Ramakrishna Bairi,
Atharv Sonwane,
Aditya Kanade,
Vageesh D C,
Arun Iyer,
Suresh Parthasarathy,
Sriram Rajamani,
B. Ashok,
Shashank Shet
Abstract:
Software engineering activities such as package migration, fixing errors reports from static analysis or testing, and adding type annotations or other specifications to a codebase, involve pervasively editing the entire repository of code. We formulate these activities as repository-level coding tasks.
Recent tools like GitHub Copilot, which are powered by Large Language Models (LLMs), have succ…
▽ More
Software engineering activities such as package migration, fixing errors reports from static analysis or testing, and adding type annotations or other specifications to a codebase, involve pervasively editing the entire repository of code. We formulate these activities as repository-level coding tasks.
Recent tools like GitHub Copilot, which are powered by Large Language Models (LLMs), have succeeded in offering high-quality solutions to localized coding problems. Repository-level coding tasks are more involved and cannot be solved directly using LLMs, since code within a repository is inter-dependent and the entire repository may be too large to fit into the prompt. We frame repository-level coding as a planning problem and present a task-agnostic framework, called CodePlan to solve it. CodePlan synthesizes a multi-step chain of edits (plan), where each step results in a call to an LLM on a code location with context derived from the entire repository, previous code changes and task-specific instructions. CodePlan is based on a novel combination of an incremental dependency analysis, a change may-impact analysis and an adaptive planning algorithm.
We evaluate the effectiveness of CodePlan on two repository-level tasks: package migration (C#) and temporal code edits (Python). Each task is evaluated on multiple code repositories, each of which requires inter-dependent changes to many files (between 2-97 files). Coding tasks of this level of complexity have not been automated using LLMs before. Our results show that CodePlan has better match with the ground truth compared to baselines. CodePlan is able to get 5/6 repositories to pass the validity checks (e.g., to build without errors and make correct code edits) whereas the baselines (without planning but with the same type of contextual information as CodePlan) cannot get any of the repositories to pass them.
△ Less
Submitted 21 September, 2023;
originally announced September 2023.
-
Multi-modality Meets Re-learning: Mitigating Negative Transfer in Sequential Recommendation
Authors:
Bo Peng,
Srinivasan Parthasarathy,
Xia Ning
Abstract:
Learning effective recommendation models from sparse user interactions represents a fundamental challenge in developing sequential recommendation methods. Recently, pre-training-based methods have been developed to tackle this challenge. Though promising, in this paper, we show that existing methods suffer from the notorious negative transfer issue, where the model adapted from the pre-trained mod…
▽ More
Learning effective recommendation models from sparse user interactions represents a fundamental challenge in developing sequential recommendation methods. Recently, pre-training-based methods have been developed to tackle this challenge. Though promising, in this paper, we show that existing methods suffer from the notorious negative transfer issue, where the model adapted from the pre-trained model results in worse performance compared to the model learned from scratch in the task of interest (i.e., target task). To address this issue, we develop a method, denoted as ANT, for transferable sequential recommendation. ANT mitigates negative transfer by 1) incorporating multi-modality item information, including item texts, images and prices, to effectively learn more transferable knowledge from related tasks (i.e., auxiliary tasks); and 2) better capturing task-specific knowledge in the target task using a re-learning-based adaptation strategy. We evaluate ANT against eight state-of-the-art baseline methods on five target tasks. Our experimental results demonstrate that ANT does not suffer from the negative transfer issue on any of the target tasks. The results also demonstrate that ANT substantially outperforms baseline methods in the target tasks with an improvement of as much as 15.2%. Our analysis highlights the superior effectiveness of our re-learning-based strategy compared to fine-tuning on the target tasks.
△ Less
Submitted 20 September, 2023; v1 submitted 18 September, 2023;
originally announced September 2023.
-
Shape-conditioned 3D Molecule Generation via Equivariant Diffusion Models
Authors:
Ziqi Chen,
Bo Peng,
Srinivasan Parthasarathy,
Xia Ning
Abstract:
Ligand-based drug design aims to identify novel drug candidates of similar shapes with known active molecules. In this paper, we formulated an in silico shape-conditioned molecule generation problem to generate 3D molecule structures conditioned on the shape of a given molecule. To address this problem, we developed a translation- and rotation-equivariant shape-guided generative model ShapeMol. Sh…
▽ More
Ligand-based drug design aims to identify novel drug candidates of similar shapes with known active molecules. In this paper, we formulated an in silico shape-conditioned molecule generation problem to generate 3D molecule structures conditioned on the shape of a given molecule. To address this problem, we developed a translation- and rotation-equivariant shape-guided generative model ShapeMol. ShapeMol consists of an equivariant shape encoder that maps molecular surface shapes into latent embeddings, and an equivariant diffusion model that generates 3D molecules based on these embeddings. Experimental results show that ShapeMol can generate novel, diverse, drug-like molecules that retain 3D molecular shapes similar to the given shape condition. These results demonstrate the potential of ShapeMol in designing drug candidates of desired 3D shapes binding to protein target pockets.
△ Less
Submitted 16 October, 2023; v1 submitted 22 August, 2023;
originally announced August 2023.
-
StaticFixer: From Static Analysis to Static Repair
Authors:
Naman Jain,
Shubham Gandhi,
Atharv Sonwane,
Aditya Kanade,
Nagarajan Natarajan,
Suresh Parthasarathy,
Sriram Rajamani,
Rahul Sharma
Abstract:
Static analysis tools are traditionally used to detect and flag programs that violate properties. We show that static analysis tools can also be used to perturb programs that satisfy a property to construct variants that violate the property. Using this insight we can construct paired data sets of unsafe-safe program pairs, and learn strategies to automatically repair property violations. We prese…
▽ More
Static analysis tools are traditionally used to detect and flag programs that violate properties. We show that static analysis tools can also be used to perturb programs that satisfy a property to construct variants that violate the property. Using this insight we can construct paired data sets of unsafe-safe program pairs, and learn strategies to automatically repair property violations. We present a system called \sysname, which automatically repairs information flow vulnerabilities using this approach. Since information flow properties are non-local (both to check and repair), \sysname also introduces a novel domain specific language (DSL) and strategy learning algorithms for synthesizing non-local repairs. We use \sysname to synthesize strategies for repairing two types of information flow vulnerabilities, unvalidated dynamic calls and cross-site scripting, and show that \sysname successfully repairs several hundred vulnerabilities from open source {\sc JavaScript} repositories, outperforming neural baselines built using {\sc CodeT5} and {\sc Codex}. Our datasets can be downloaded from \url{http://aka.ms/StaticFixer}.
△ Less
Submitted 23 July, 2023;
originally announced July 2023.
-
PolicyClusterGCN: Identifying Efficient Clusters for Training Graph Convolutional Networks
Authors:
Saket Gurukar,
Shaileshh Bojja Venkatakrishnan,
Balaraman Ravindran,
Srinivasan Parthasarathy
Abstract:
Graph convolutional networks (GCNs) have achieved huge success in several machine learning (ML) tasks on graph-structured data. Recently, several sampling techniques have been proposed for the efficient training of GCNs and to improve the performance of GCNs on ML tasks. Specifically, the subgraph-based sampling approaches such as ClusterGCN and GraphSAINT have achieved state-of-the-art performanc…
▽ More
Graph convolutional networks (GCNs) have achieved huge success in several machine learning (ML) tasks on graph-structured data. Recently, several sampling techniques have been proposed for the efficient training of GCNs and to improve the performance of GCNs on ML tasks. Specifically, the subgraph-based sampling approaches such as ClusterGCN and GraphSAINT have achieved state-of-the-art performance on the node classification tasks. These subgraph-based sampling approaches rely on heuristics -- such as graph partitioning via edge cuts -- to identify clusters that are then treated as minibatches during GCN training. In this work, we hypothesize that rather than relying on such heuristics, one can learn a reinforcement learning (RL) policy to compute efficient clusters that lead to effective GCN performance. To that end, we propose PolicyClusterGCN, an online RL framework that can identify good clusters for GCN training. We develop a novel Markov Decision Process (MDP) formulation that allows the policy network to predict ``importance" weights on the edges which are then utilized by a clustering algorithm (Graclus) to compute the clusters. We train the policy network using a standard policy gradient algorithm where the rewards are computed from the classification accuracies while training GCN using clusters given by the policy. Experiments on six real-world datasets and several synthetic datasets show that PolicyClusterGCN outperforms existing state-of-the-art models on node classification task.
△ Less
Submitted 25 June, 2023;
originally announced June 2023.
-
External Language Model Integration for Factorized Neural Transducers
Authors:
Michael Levit,
Sarangarajan Parthasarathy,
Cem Aksoylar,
Mohammad Sadegh Rasooli,
Shuangyu Chang
Abstract:
We propose an adaptation method for factorized neural transducers (FNT) with external language models. We demonstrate that both neural and n-gram external LMs add significantly more value when linearly interpolated with predictor output compared to shallow fusion, thus confirming that FNT forces the predictor to act like regular language models. Further, we propose a method to integrate class-base…
▽ More
We propose an adaptation method for factorized neural transducers (FNT) with external language models. We demonstrate that both neural and n-gram external LMs add significantly more value when linearly interpolated with predictor output compared to shallow fusion, thus confirming that FNT forces the predictor to act like regular language models. Further, we propose a method to integrate class-based n-gram language models into FNT framework resulting in accuracy gains similar to a hybrid setup. We show average gains of 18% WERR with lexical adaptation across various scenarios and additive gains of up to 60% WERR in one entity-rich scenario through a combination of class-based n-gram and neural LMs.
△ Less
Submitted 26 May, 2023;
originally announced May 2023.
-
FairMILE: Towards an Efficient Framework for Fair Graph Representation Learning
Authors:
Yuntian He,
Saket Gurukar,
Srinivasan Parthasarathy
Abstract:
Graph representation learning models have demonstrated great capability in many real-world applications. Nevertheless, prior research indicates that these models can learn biased representations leading to discriminatory outcomes. A few works have been proposed to mitigate the bias in graph representations. However, most existing works require exceptional time and computing resources for training…
▽ More
Graph representation learning models have demonstrated great capability in many real-world applications. Nevertheless, prior research indicates that these models can learn biased representations leading to discriminatory outcomes. A few works have been proposed to mitigate the bias in graph representations. However, most existing works require exceptional time and computing resources for training and fine-tuning. To this end, we study the problem of efficient fair graph representation learning and propose a novel framework FairMILE. FairMILE is a multi-level paradigm that can efficiently learn graph representations while enforcing fairness and preserving utility. It can work in conjunction with any unsupervised embedding approach and accommodate various fairness constraints. Extensive experiments across different downstream tasks demonstrate that FairMILE significantly outperforms state-of-the-art baselines in terms of running time while achieving a superior trade-off between fairness and utility.
△ Less
Submitted 17 October, 2023; v1 submitted 17 November, 2022;
originally announced November 2022.
-
Recursive Attentive Methods with Reused Item Representations for Sequential Recommendation
Authors:
Bo Peng,
Srinivasan Parthasarathy,
Xia Ning
Abstract:
Sequential recommendation aims to recommend the next item of users' interest based on their historical interactions. Recently, the self-attention mechanism has been adapted for sequential recommendation, and demonstrated state-of-the-art performance. However, in this manuscript, we show that the self-attention-based sequential recommendation methods could suffer from the localization-deficit issue…
▽ More
Sequential recommendation aims to recommend the next item of users' interest based on their historical interactions. Recently, the self-attention mechanism has been adapted for sequential recommendation, and demonstrated state-of-the-art performance. However, in this manuscript, we show that the self-attention-based sequential recommendation methods could suffer from the localization-deficit issue. As a consequence, in these methods, over the first few blocks, the item representations may quickly diverge from their original representations, and thus, impairs the learning in the following blocks. To mitigate this issue, in this manuscript, we develop a recursive attentive method with reused item representations (RAM) for sequential recommendation. We compare RAM with five state-of-the-art baseline methods on six public benchmark datasets. Our experimental results demonstrate that RAM significantly outperforms the baseline methods on benchmark datasets, with an improvement of as much as 11.3%. Our stability analysis shows that RAM could enable deeper and wider models for better performance. Our run-time performance comparison signifies that RAM could also be more efficient on benchmark datasets.
△ Less
Submitted 16 September, 2022;
originally announced September 2022.
-
Multilingual Transformer Language Model for Speech Recognition in Low-resource Languages
Authors:
Li Miao,
Jian Wu,
Piyush Behre,
Shuangyu Chang,
Sarangarajan Parthasarathy
Abstract:
It is challenging to train and deploy Transformer LMs for hybrid speech recognition 2nd pass re-ranking in low-resource languages due to (1) data scarcity in low-resource languages, (2) expensive computing costs for training and refreshing 100+ monolingual models, and (3) hosting inefficiency considering sparse traffic. In this study, we present a new way to group multiple low-resource locales tog…
▽ More
It is challenging to train and deploy Transformer LMs for hybrid speech recognition 2nd pass re-ranking in low-resource languages due to (1) data scarcity in low-resource languages, (2) expensive computing costs for training and refreshing 100+ monolingual models, and (3) hosting inefficiency considering sparse traffic. In this study, we present a new way to group multiple low-resource locales together and optimize the performance of Multilingual Transformer LMs in ASR. Our Locale-group Multilingual Transformer LMs outperform traditional multilingual LMs along with reducing maintenance costs and operating expenses. Further, for low-resource but high-traffic locales where deploying monolingual models is feasible, we show that fine-tuning our locale-group multilingual LMs produces better monolingual LM candidates than baseline monolingual LMs.
△ Less
Submitted 8 September, 2022;
originally announced September 2022.
-
Prospective Preference Enhanced Mixed Attentive Model for Session-based Recommendation
Authors:
Bo Peng,
Chang-Yu Tai,
Srinivasan Parthasarathy,
Xia Ning
Abstract:
Session-based recommendation aims to generate recommendations for the next item of users' interest based on a given session. In this manuscript, we develop prospective preference enhanced mixed attentive model (P2MAM) to generate session-based recommendations using two important factors: temporal patterns and estimates of users' prospective preferences. Unlike existing methods, P2MAM models the te…
▽ More
Session-based recommendation aims to generate recommendations for the next item of users' interest based on a given session. In this manuscript, we develop prospective preference enhanced mixed attentive model (P2MAM) to generate session-based recommendations using two important factors: temporal patterns and estimates of users' prospective preferences. Unlike existing methods, P2MAM models the temporal patterns using a light-weight while effective position-sensitive attention mechanism. In P2MAM, we also leverage the estimate of users' prospective preferences to signify important items, and generate better recommendations. Our experimental results demonstrate that P2MAM models significantly outperform the state-of-the-art methods in six benchmark datasets, with an improvement as much as 19.2%. In addition, our run-time performance comparison demonstrates that during testing, P2MAM models are much more efficient than the best baseline method, with a significant average speedup of 47.7 folds.
△ Less
Submitted 3 June, 2022;
originally announced June 2022.
-
MultiBiSage: A Web-Scale Recommendation System Using Multiple Bipartite Graphs at Pinterest
Authors:
Saket Gurukar,
Nikil Pancha,
Andrew Zhai,
Eric Kim,
Samson Hu,
Srinivasan Parthasarathy,
Charles Rosenberg,
Jure Leskovec
Abstract:
Graph Convolutional Networks (GCN) can efficiently integrate graph structure and node features to learn high-quality node embeddings. These embeddings can then be used for several tasks such as recommendation and search. At Pinterest, we have developed and deployed PinSage, a data-efficient GCN that learns pin embeddings from the Pin-Board graph. The Pin-Board graph contains pin and board entities…
▽ More
Graph Convolutional Networks (GCN) can efficiently integrate graph structure and node features to learn high-quality node embeddings. These embeddings can then be used for several tasks such as recommendation and search. At Pinterest, we have developed and deployed PinSage, a data-efficient GCN that learns pin embeddings from the Pin-Board graph. The Pin-Board graph contains pin and board entities and the graph captures the pin belongs to a board interaction. However, there exist several entities at Pinterest such as users, idea pins, creators, and there exist heterogeneous interactions among these entities such as add-to-cart, follow, long-click.
In this work, we show that training deep learning models on graphs that captures these diverse interactions would result in learning higher-quality pin embeddings than training PinSage on only the Pin-Board graph. To that end, we model the diverse entities and their diverse interactions through multiple bipartite graphs and propose a novel data-efficient MultiBiSage model. MultiBiSage can capture the graph structure of multiple bipartite graphs to learn high-quality pin embeddings. We take this pragmatic approach as it allows us to utilize the existing infrastructure developed at Pinterest -- such as Pixie system that can perform optimized random-walks on billion node graphs, along with existing training and deployment workflows. We train MultiBiSage on six bipartite graphs including our Pin-Board graph. Our offline metrics show that MultiBiSage significantly outperforms the deployed latest version of PinSage on multiple user engagement metrics.
△ Less
Submitted 21 May, 2022;
originally announced May 2022.
-
UBERT: A Novel Language Model for Synonymy Prediction at Scale in the UMLS Metathesaurus
Authors:
Thilini Wijesiriwardene,
Vinh Nguyen,
Goonmeet Bajaj,
Hong Yung Yip,
Vishesh Javangula,
Yuqing Mao,
Kin Wah Fung,
Srinivasan Parthasarathy,
Amit P. Sheth,
Olivier Bodenreider
Abstract:
The UMLS Metathesaurus integrates more than 200 biomedical source vocabularies. During the Metathesaurus construction process, synonymous terms are clustered into concepts by human editors, assisted by lexical similarity algorithms. This process is error-prone and time-consuming. Recently, a deep learning model (LexLM) has been developed for the UMLS Vocabulary Alignment (UVA) task. This work intr…
▽ More
The UMLS Metathesaurus integrates more than 200 biomedical source vocabularies. During the Metathesaurus construction process, synonymous terms are clustered into concepts by human editors, assisted by lexical similarity algorithms. This process is error-prone and time-consuming. Recently, a deep learning model (LexLM) has been developed for the UMLS Vocabulary Alignment (UVA) task. This work introduces UBERT, a BERT-based language model, pretrained on UMLS terms via a supervised Synonymy Prediction (SP) task replacing the original Next Sentence Prediction (NSP) task. The effectiveness of UBERT for UMLS Metathesaurus construction process is evaluated using the UMLS Vocabulary Alignment (UVA) task. We show that UBERT outperforms the LexLM, as well as biomedical BERT-based models. Key to the performance of UBERT are the synonymy prediction task specifically developed for UBERT, the tight alignment of training data to the UVA task, and the similarity of the models used for pretrained UBERT.
△ Less
Submitted 27 April, 2022;
originally announced April 2022.
-
Landmarks and Regions: A Robust Approach to Data Extraction
Authors:
Suresh Parthasarathy,
Lincy Pattanaik,
Anirudh Khatry,
Arun Iyer,
Arjun Radhakrishna,
Sriram Rajamani,
Mohammad Raza
Abstract:
We propose a new approach to extracting data items or field values from semi-structured documents. Examples of such problems include extracting passenger name, departure time and departure airport from a travel itinerary, or extracting price of an item from a purchase receipt. Traditional approaches to data extraction use machine learning or program synthesis to process the whole document to extra…
▽ More
We propose a new approach to extracting data items or field values from semi-structured documents. Examples of such problems include extracting passenger name, departure time and departure airport from a travel itinerary, or extracting price of an item from a purchase receipt. Traditional approaches to data extraction use machine learning or program synthesis to process the whole document to extract the desired fields. Such approaches are not robust to format changes in the document, and the extraction process typically fails even if changes are made to parts of the document that are unrelated to the desired fields of interest. We propose a new approach to data extraction based on the concepts of landmarks and regions. Humans routinely use landmarks in manual processing of documents to zoom in and focus their attention on small regions of interest in the document. Inspired by this human intuition, we use the notion of landmarks in program synthesis to automatically synthesize extraction programs that first extract a small region of interest, and then automatically extract the desired value from the region in a subsequent step. We have implemented our landmark-based extraction approach in a tool LRSyn, and show extensive evaluation on documents in HTML as well as scanned images of invoices and receipts. Our results show that our approach is robust to various types of format changes that routinely happen in real-world settings.
△ Less
Submitted 11 April, 2022;
originally announced April 2022.
-
FairEGM: Fair Link Prediction and Recommendation via Emulated Graph Modification
Authors:
Sean Current,
Yuntian He,
Saket Gurukar,
Srinivasan Parthasarathy
Abstract:
As machine learning becomes more widely adopted across domains, it is critical that researchers and ML engineers think about the inherent biases in the data that may be perpetuated by the model. Recently, many studies have shown that such biases are also imbibed in Graph Neural Network (GNN) models if the input graph is biased, potentially to the disadvantage of underserved and underrepresented co…
▽ More
As machine learning becomes more widely adopted across domains, it is critical that researchers and ML engineers think about the inherent biases in the data that may be perpetuated by the model. Recently, many studies have shown that such biases are also imbibed in Graph Neural Network (GNN) models if the input graph is biased, potentially to the disadvantage of underserved and underrepresented communities. In this work, we aim to mitigate the bias learned by GNNs by jointly optimizing two different loss functions: one for the task of link prediction and one for the task of demographic parity. We further implement three different techniques inspired by graph modification approaches: the Global Fairness Optimization (GFO), Constrained Fairness Optimization (CFO), and Fair Edge Weighting (FEW) models. These techniques mimic the effects of changing underlying graph structures within the GNN and offer a greater degree of interpretability over more integrated neural network methods. Our proposed models emulate microscopic or macroscopic edits to the input graph while training GNNs and learn node embeddings that are both accurate and fair under the context of link recommendations. We demonstrate the effectiveness of our approach on four real world datasets and show that we can improve the recommendation fairness by several factors at negligible cost to link prediction accuracy.
△ Less
Submitted 20 October, 2022; v1 submitted 27 January, 2022;
originally announced January 2022.
-
Jigsaw: Large Language Models meet Program Synthesis
Authors:
Naman Jain,
Skanda Vaidyanath,
Arun Iyer,
Nagarajan Natarajan,
Suresh Parthasarathy,
Sriram Rajamani,
Rahul Sharma
Abstract:
Large pre-trained language models such as GPT-3, Codex, and Google's language model are now capable of generating code from natural language specifications of programmer intent. We view these developments with a mixture of optimism and caution. On the optimistic side, such large language models have the potential to improve productivity by providing an automated AI pair programmer for every progra…
▽ More
Large pre-trained language models such as GPT-3, Codex, and Google's language model are now capable of generating code from natural language specifications of programmer intent. We view these developments with a mixture of optimism and caution. On the optimistic side, such large language models have the potential to improve productivity by providing an automated AI pair programmer for every programmer in the world. On the cautionary side, since these large language models do not understand program semantics, they offer no guarantees about quality of the suggested code. In this paper, we present an approach to augment these large language models with post-processing steps based on program analysis and synthesis techniques, that understand the syntax and semantics of programs. Further, we show that such techniques can make use of user feedback and improve with usage. We present our experiences from building and evaluating such a tool jigsaw, targeted at synthesizing code for using Python Pandas API using multi-modal inputs. Our experience suggests that as these large language models evolve for synthesizing code from intent, jigsaw has an important role to play in improving the accuracy of the systems.
△ Less
Submitted 6 December, 2021;
originally announced December 2021.
-
Semi-Supervised Deep Learning for Multiplex Networks
Authors:
Anasua Mitra,
Priyesh Vijayan,
Ranbir Sanasam,
Diganta Goswami,
Srinivasan Parthasarathy,
Balaraman Ravindran
Abstract:
Multiplex networks are complex graph structures in which a set of entities are connected to each other via multiple types of relations, each relation representing a distinct layer. Such graphs are used to investigate many complex biological, social, and technological systems. In this work, we present a novel semi-supervised approach for structure-aware representation learning on multiplex networks…
▽ More
Multiplex networks are complex graph structures in which a set of entities are connected to each other via multiple types of relations, each relation representing a distinct layer. Such graphs are used to investigate many complex biological, social, and technological systems. In this work, we present a novel semi-supervised approach for structure-aware representation learning on multiplex networks. Our approach relies on maximizing the mutual information between local node-wise patch representations and label correlated structure-aware global graph representations to model the nodes and cluster structures jointly. Specifically, it leverages a novel cluster-aware, node-contextualized global graph summary generation strategy for effective joint-modeling of node and cluster representations across the layers of a multiplex network. Empirically, we demonstrate that the proposed architecture outperforms state-of-the-art methods in a range of tasks: classification, clustering, visualization, and similarity search on seven real-world multiplex networks for various experiment settings.
△ Less
Submitted 5 October, 2021;
originally announced October 2021.
-
Factorized Neural Transducer for Efficient Language Model Adaptation
Authors:
Xie Chen,
Zhong Meng,
Sarangarajan Parthasarathy,
Jinyu Li
Abstract:
In recent years, end-to-end (E2E) based automatic speech recognition (ASR) systems have achieved great success due to their simplicity and promising performance. Neural Transducer based models are increasingly popular in streaming E2E based ASR systems and have been reported to outperform the traditional hybrid system in some scenarios. However, the joint optimization of acoustic model, lexicon an…
▽ More
In recent years, end-to-end (E2E) based automatic speech recognition (ASR) systems have achieved great success due to their simplicity and promising performance. Neural Transducer based models are increasingly popular in streaming E2E based ASR systems and have been reported to outperform the traditional hybrid system in some scenarios. However, the joint optimization of acoustic model, lexicon and language model in neural Transducer also brings about challenges to utilize pure text for language model adaptation. This drawback might prevent their potential applications in practice. In order to address this issue, in this paper, we propose a novel model, factorized neural Transducer, by factorizing the blank and vocabulary prediction, and adopting a standalone language model for the vocabulary prediction. It is expected that this factorization can transfer the improvement of the standalone language model to the Transducer for speech recognition, which allows various language model adaptation techniques to be applied. We demonstrate that the proposed factorized neural Transducer yields 15% to 20% WER improvements when out-of-domain text data is used for language model adaptation, at the cost of a minor degradation in WER on a general test set.
△ Less
Submitted 18 October, 2021; v1 submitted 27 September, 2021;
originally announced October 2021.
-
Privacy Policy Question Answering Assistant: A Query-Guided Extractive Summarization Approach
Authors:
Moniba Keymanesh,
Micha Elsner,
Srinivasan Parthasarathy
Abstract:
Existing work on making privacy policies accessible has explored new presentation forms such as color-coding based on the risk factors or summarization to assist users with conscious agreement. To facilitate a more personalized interaction with the policies, in this work, we propose an automated privacy policy question answering assistant that extracts a summary in response to the input user query…
▽ More
Existing work on making privacy policies accessible has explored new presentation forms such as color-coding based on the risk factors or summarization to assist users with conscious agreement. To facilitate a more personalized interaction with the policies, in this work, we propose an automated privacy policy question answering assistant that extracts a summary in response to the input user query. This is a challenging task because users articulate their privacy-related questions in a very different language than the legal language of the policy, making it difficult for the system to understand their inquiry. Moreover, existing annotated data in this domain are limited. We address these problems by paraphrasing to bring the style and language of the user's question closer to the language of privacy policies. Our content scoring module uses the existing in-domain data to find relevant information in the policy and incorporates it in a summary. Our pipeline is able to find an answer for 89% of the user queries in the privacyQA dataset.
△ Less
Submitted 29 September, 2021;
originally announced September 2021.
-
Evaluating Biomedical BERT Models for Vocabulary Alignment at Scale in the UMLS Metathesaurus
Authors:
Goonmeet Bajaj,
Vinh Nguyen,
Thilini Wijesiriwardene,
Hong Yung Yip,
Vishesh Javangula,
Srinivasan Parthasarathy,
Amit Sheth,
Olivier Bodenreider
Abstract:
The current UMLS (Unified Medical Language System) Metathesaurus construction process for integrating over 200 biomedical source vocabularies is expensive and error-prone as it relies on the lexical algorithms and human editors for deciding if the two biomedical terms are synonymous. Recent advances in Natural Language Processing such as Transformer models like BERT and its biomedical variants wit…
▽ More
The current UMLS (Unified Medical Language System) Metathesaurus construction process for integrating over 200 biomedical source vocabularies is expensive and error-prone as it relies on the lexical algorithms and human editors for deciding if the two biomedical terms are synonymous. Recent advances in Natural Language Processing such as Transformer models like BERT and its biomedical variants with contextualized word embeddings have achieved state-of-the-art (SOTA) performance on downstream tasks. We aim to validate if these approaches using the BERT models can actually outperform the existing approaches for predicting synonymy in the UMLS Metathesaurus. In the existing Siamese Networks with LSTM and BioWordVec embeddings, we replace the BioWordVec embeddings with the biomedical BERT embeddings extracted from each BERT model using different ways of extraction. In the Transformer architecture, we evaluate the use of the different biomedical BERT models that have been pre-trained using different datasets and tasks. Given the SOTA performance of these BERT models for other downstream tasks, our experiments yield surprisingly interesting results: (1) in both model architectures, the approaches employing these biomedical BERT-based models do not outperform the existing approaches using Siamese Network with BioWordVec embeddings for the UMLS synonymy prediction task, (2) the original BioBERT large model that has not been pre-trained with the UMLS outperforms the SapBERT models that have been pre-trained with the UMLS, and (3) using the Siamese Networks yields better performance for synonymy prediction when compared to using the biomedical BERT models.
△ Less
Submitted 15 October, 2021; v1 submitted 14 September, 2021;
originally announced September 2021.
-
Fairness-aware Summarization for Justified Decision-Making
Authors:
Moniba Keymanesh,
Tanya Berger-Wolf,
Micha Elsner,
Srinivasan Parthasarathy
Abstract:
In consequential domains such as recidivism prediction, facility inspection, and benefit assignment, it's important for individuals to know the decision-relevant information for the model's prediction. In addition, predictions should be fair both in terms of the outcome and the justification of the outcome. In other words, decision-relevant features should provide sufficient information for the pr…
▽ More
In consequential domains such as recidivism prediction, facility inspection, and benefit assignment, it's important for individuals to know the decision-relevant information for the model's prediction. In addition, predictions should be fair both in terms of the outcome and the justification of the outcome. In other words, decision-relevant features should provide sufficient information for the predicted outcome and should be independent of the membership of individuals in protected groups such as race and gender. In this work, we focus on the problem of (un)fairness in the justification of the text-based neural models. We tie the explanatory power of the model to fairness in the outcome and propose a fairness-aware summarization mechanism to detect and counteract the bias in such models. Given a potentially biased natural language explanation for a decision, we use a multi-task neural model and an attribution mechanism based on integrated gradients to extract high-utility and low-bias justifications in form of a summary. The extracted summary is then used for training a model to make decisions for individuals. Results on several real world datasets suggest that our method drastically limits the demographic leakage in the input (fairness in justification) while moderately enhancing the fairness in the outcome. Our model is also effective in detecting and counteracting several types of data poisoning attacks that synthesize race-coded reasoning or irrelevant justifications.
△ Less
Submitted 9 February, 2022; v1 submitted 13 July, 2021;
originally announced July 2021.
-
SYSML: StYlometry with Structure and Multitask Learning: Implications for Darknet Forum Migrant Analysis
Authors:
Pranav Maneriker,
Yuntian He,
Srinivasan Parthasarathy
Abstract:
Darknet market forums are frequently used to exchange illegal goods and services between parties who use encryption to conceal their identities. The Tor network is used to host these markets, which guarantees additional anonymization from IP and location tracking, making it challenging to link across malicious users using multiple accounts (sybils). Additionally, users migrate to new forums when o…
▽ More
Darknet market forums are frequently used to exchange illegal goods and services between parties who use encryption to conceal their identities. The Tor network is used to host these markets, which guarantees additional anonymization from IP and location tracking, making it challenging to link across malicious users using multiple accounts (sybils). Additionally, users migrate to new forums when one is closed, making it difficult to link users across multiple forums. We develop a novel stylometry-based multitask learning approach for natural language and interaction modeling using graph embeddings to construct low-dimensional representations of short episodes of user activity for authorship attribution. We provide a comprehensive evaluation of our methods across four different darknet forums demonstrating its efficacy over the state-of-the-art, with a lift of up to 2.5X on Mean Retrieval Rank and 2X on Recall@10.
△ Less
Submitted 20 September, 2021; v1 submitted 1 April, 2021;
originally announced April 2021.
-
eDarkTrends: Harnessing Social Media Trends in Substance use disorders for Opioid Listings on Cryptomarket
Authors:
Usha Lokala,
Francois Lamy,
Triyasha Ghosh Dastidar,
Kaushik Roy,
Raminta Daniulaityte,
Srinivasan Parthasarathy,
Amit Sheth
Abstract:
Opioid and substance misuse is rampant in the United States today, with the phenomenon known as the opioid crisis. The relationship between substance use and mental health has been extensively studied, with one possible relationship being substance misuse causes poor mental health. However, the lack of evidence on the relationship has resulted in opioids being largely inaccessible through legal me…
▽ More
Opioid and substance misuse is rampant in the United States today, with the phenomenon known as the opioid crisis. The relationship between substance use and mental health has been extensively studied, with one possible relationship being substance misuse causes poor mental health. However, the lack of evidence on the relationship has resulted in opioids being largely inaccessible through legal means. This study analyzes the substance misuse posts on social media with the opioids being sold through crypto market listings. We use the Drug Abuse Ontology, state-of-the-art deep learning, and BERT-based models to generate sentiment and emotion for the social media posts to understand user perception on social media by investigating questions such as, which synthetic opioids people are optimistic, neutral, or negative about or what kind of drugs induced fear and sorrow or what kind of drugs people love or thankful about or which drug people think negatively about or which opioids cause little to no sentimental reaction. We also perform topic analysis associated with the generated sentiments and emotions to understand which topics correlate with people's responses to various drugs. Our findings can help shape policy to help isolate opioid use cases where timely intervention may be required to prevent adverse consequences, prevent overdose-related deaths, and worsen the epidemic.
△ Less
Submitted 29 March, 2021;
originally announced March 2021.
-
Disentanglement for audio-visual emotion recognition using multitask setup
Authors:
Raghuveer Peri,
Srinivas Parthasarathy,
Charles Bradshaw,
Shiva Sundaram
Abstract:
Deep learning models trained on audio-visual data have been successfully used to achieve state-of-the-art performance for emotion recognition. In particular, models trained with multitask learning have shown additional performance improvements. However, such multitask models entangle information between the tasks, encoding the mutual dependencies present in label distributions in the real world da…
▽ More
Deep learning models trained on audio-visual data have been successfully used to achieve state-of-the-art performance for emotion recognition. In particular, models trained with multitask learning have shown additional performance improvements. However, such multitask models entangle information between the tasks, encoding the mutual dependencies present in label distributions in the real world data used for training. This work explores the disentanglement of multimodal signal representations for the primary task of emotion recognition and a secondary person identification task. In particular, we developed a multitask framework to extract low-dimensional embeddings that aim to capture emotion specific information, while containing minimal information related to person identity. We evaluate three different techniques for disentanglement and report results of up to 13% disentanglement while maintaining emotion recognition performance.
△ Less
Submitted 11 February, 2021;
originally announced February 2021.
-
Driving Style Representation in Convolutional Recurrent Neural Network Model of Driver Identification
Authors:
Sobhan Moosavi,
Pravar D. Mahajan,
Srinivasan Parthasarathy,
Colleen Saunders-Chukwu,
Rajiv Ramnath
Abstract:
Identifying driving styles is the task of analyzing the behavior of drivers in order to capture variations that will serve to discriminate different drivers from each other. This task has become a prerequisite for a variety of applications, including usage-based insurance, driver coaching, driver action prediction, and even in designing autonomous vehicles; because driving style encodes essential…
▽ More
Identifying driving styles is the task of analyzing the behavior of drivers in order to capture variations that will serve to discriminate different drivers from each other. This task has become a prerequisite for a variety of applications, including usage-based insurance, driver coaching, driver action prediction, and even in designing autonomous vehicles; because driving style encodes essential information needed by these applications. In this paper, we present a deep-neural-network architecture, we term D-CRNN, for building high-fidelity representations for driving style, that combine the power of convolutional neural networks (CNN) and recurrent neural networks (RNN). Using CNN, we capture semantic patterns of driver behavior from trajectories (such as a turn or a braking event). We then find temporal dependencies between these semantic patterns using RNN to encode driving style. We demonstrate the effectiveness of these techniques for driver identification by learning driving style through extensive experiments conducted on several large, real-world datasets, and comparing the results with the state-of-the-art deep-learning and non-deep-learning solutions. These experiments also demonstrate a useful example of bias removal, by presenting how we preprocess the input data by sampling dissimilar trajectories for each driver to prevent spatial memorization. Finally, this paper presents an analysis of the contribution of different attributes for driver identification; we find that engine RPM, Speed, and Acceleration are the best combination of features.
△ Less
Submitted 10 February, 2021;
originally announced February 2021.
-
Internal Language Model Training for Domain-Adaptive End-to-End Speech Recognition
Authors:
Zhong Meng,
Naoyuki Kanda,
Yashesh Gaur,
Sarangarajan Parthasarathy,
Eric Sun,
Liang Lu,
Xie Chen,
Jinyu Li,
Yifan Gong
Abstract:
The efficacy of external language model (LM) integration with existing end-to-end (E2E) automatic speech recognition (ASR) systems can be improved significantly using the internal language model estimation (ILME) method. In this method, the internal LM score is subtracted from the score obtained by interpolating the E2E score with the external LM score, during inference. To improve the ILME-based…
▽ More
The efficacy of external language model (LM) integration with existing end-to-end (E2E) automatic speech recognition (ASR) systems can be improved significantly using the internal language model estimation (ILME) method. In this method, the internal LM score is subtracted from the score obtained by interpolating the E2E score with the external LM score, during inference. To improve the ILME-based inference, we propose an internal LM training (ILMT) method to minimize an additional internal LM loss by updating only the E2E model components that affect the internal LM estimation. ILMT encourages the E2E model to form a standalone LM inside its existing components, without sacrificing ASR accuracy. After ILMT, the more modular E2E model with matched training and inference criteria enables a more thorough elimination of the source-domain internal LM, and therefore leads to a more effective integration of the target-domain external LM. Experimented with 30K-hour trained recurrent neural network transducer and attention-based encoder-decoder models, ILMT with ILME-based inference achieves up to 31.5% and 11.4% relative word error rate reductions from standard E2E training with Shallow Fusion on out-of-domain LibriSpeech and in-domain Microsoft production test sets, respectively.
△ Less
Submitted 22 April, 2021; v1 submitted 2 February, 2021;
originally announced February 2021.
-
A Tight Bound for Stochastic Submodular Cover
Authors:
Lisa Hellerstein,
Devorah Kletenik,
Srinivasan Parthasarathy
Abstract:
We show that the Adaptive Greedy algorithm of Golovin and Krause (2011) achieves an approximation bound of $(\ln (Q/η)+1)$ for Stochastic Submodular Cover: here $Q$ is the "goal value" and $η$ is the smallest non-zero marginal increase in utility deliverable by an item. (For integer-valued utility functions, we show a bound of $H(Q)$, where $H(Q)$ is the $Q^{th}$ Harmonic number.) Although this bo…
▽ More
We show that the Adaptive Greedy algorithm of Golovin and Krause (2011) achieves an approximation bound of $(\ln (Q/η)+1)$ for Stochastic Submodular Cover: here $Q$ is the "goal value" and $η$ is the smallest non-zero marginal increase in utility deliverable by an item. (For integer-valued utility functions, we show a bound of $H(Q)$, where $H(Q)$ is the $Q^{th}$ Harmonic number.) Although this bound was claimed by Golovin and Krause in the original version of their paper, the proof was later shown to be incorrect by Nan and Saligrama (2017). The subsequent corrected proof of Golovin and Krause (2017) gives a quadratic bound of $(\ln(Q/η) + 1)^2$. Other previous bounds for the problem are $56(\ln(Q/η) + 1)$, implied by work of Im et al. (2016) on a related problem, and $k(\ln (Q/η)+1)$, due to Deshpande et al. (2016) and Hellerstein and Kletenik (2018), where $k$ is the number of states. Our bound generalizes the well-known $(\ln~m + 1)$ approximation bound on the greedy algorithm for the classical Set Cover problem, where $m$ is the size of the ground set.
△ Less
Submitted 2 August, 2021; v1 submitted 1 February, 2021;
originally announced February 2021.
-
A Deep Generative Model for Molecule Optimization via One Fragment Modification
Authors:
Ziqi Chen,
Martin Renqiang Min,
Srinivasan Parthasarathy,
Xia Ning
Abstract:
Molecule optimization is a critical step in drug development to improve desired properties of drug candidates through chemical modification. We developed a novel deep generative model Modof over molecular graphs for molecule optimization. Modof modifies a given molecule through the prediction of a single site of disconnection at the molecule and the removal and/or addition of fragments at that sit…
▽ More
Molecule optimization is a critical step in drug development to improve desired properties of drug candidates through chemical modification. We developed a novel deep generative model Modof over molecular graphs for molecule optimization. Modof modifies a given molecule through the prediction of a single site of disconnection at the molecule and the removal and/or addition of fragments at that site. A pipeline of multiple, identical Modof models is implemented into Modof-pipe to modify an input molecule at multiple disconnection sites. Here we show that Modof-pipe is able to retain major molecular scaffolds, allow controls over intermediate optimization steps and better constrain molecule similarities. Modof-pipe outperforms the state-of-the-art methods on benchmark datasets: without molecular similarity constraints, Modof-pipe achieves 81.2% improvement in octanol-water partition coefficient penalized by synthetic accessibility and ring size; and 51.2%, 25.6% and 9.2% improvement if the optimized molecules are at least 0.2, 0.4 and 0.6 similar to those before optimization, respectively. Modof-pipe is further enhanced into Modof-pipem to allow modifying one molecule to multiple optimized ones. Modof-pipem achieves additional performance improvement as at least 17.8% better than Modof-pipe.
△ Less
Submitted 13 January, 2022; v1 submitted 8 December, 2020;
originally announced December 2020.
-
Detecting expressions with multimodal transformers
Authors:
Srinivas Parthasarathy,
Shiva Sundaram
Abstract:
Developing machine learning algorithms to understand person-to-person engagement can result in natural user experiences for communal devices such as Amazon Alexa. Among other cues such as voice activity and gaze, a person's audio-visual expression that includes tone of the voice and facial expression serves as an implicit signal of engagement between parties in a dialog. This study investigates de…
▽ More
Developing machine learning algorithms to understand person-to-person engagement can result in natural user experiences for communal devices such as Amazon Alexa. Among other cues such as voice activity and gaze, a person's audio-visual expression that includes tone of the voice and facial expression serves as an implicit signal of engagement between parties in a dialog. This study investigates deep-learning algorithms for audio-visual detection of user's expression. We first implement an audio-visual baseline model with recurrent layers that shows competitive results compared to current state of the art. Next, we propose the transformer architecture with encoder layers that better integrate audio-visual features for expressions tracking. Performance on the Aff-Wild2 database shows that the proposed methods perform better than baseline architecture with recurrent layers with absolute gains approximately 2% for arousal and valence descriptors. Further, multimodal architectures show significant improvements over models trained on single modalities with gains of up to 3.6%. Ablation studies show the significance of the visual modality for the expression detection on the Aff-Wild2 database.
△ Less
Submitted 30 November, 2020;
originally announced December 2020.
-
Self-Supervised learning with cross-modal transformers for emotion recognition
Authors:
Aparna Khare,
Srinivas Parthasarathy,
Shiva Sundaram
Abstract:
Emotion recognition is a challenging task due to limited availability of in-the-wild labeled datasets. Self-supervised learning has shown improvements on tasks with limited labeled datasets in domains like speech and natural language. Models such as BERT learn to incorporate context in word embeddings, which translates to improved performance in downstream tasks like question answering. In this wo…
▽ More
Emotion recognition is a challenging task due to limited availability of in-the-wild labeled datasets. Self-supervised learning has shown improvements on tasks with limited labeled datasets in domains like speech and natural language. Models such as BERT learn to incorporate context in word embeddings, which translates to improved performance in downstream tasks like question answering. In this work, we extend self-supervised training to multi-modal applications. We learn multi-modal representations using a transformer trained on the masked language modeling task with audio, visual and text features. This model is fine-tuned on the downstream task of emotion recognition. Our results on the CMU-MOSEI dataset show that this pre-training technique can improve the emotion recognition performance by up to 3% compared to the baseline.
△ Less
Submitted 20 November, 2020;
originally announced November 2020.
-
Internal Language Model Estimation for Domain-Adaptive End-to-End Speech Recognition
Authors:
Zhong Meng,
Sarangarajan Parthasarathy,
Eric Sun,
Yashesh Gaur,
Naoyuki Kanda,
Liang Lu,
Xie Chen,
Rui Zhao,
Jinyu Li,
Yifan Gong
Abstract:
The external language models (LM) integration remains a challenging task for end-to-end (E2E) automatic speech recognition (ASR) which has no clear division between acoustic and language models. In this work, we propose an internal LM estimation (ILME) method to facilitate a more effective integration of the external LM with all pre-existing E2E models with no additional model training, including…
▽ More
The external language models (LM) integration remains a challenging task for end-to-end (E2E) automatic speech recognition (ASR) which has no clear division between acoustic and language models. In this work, we propose an internal LM estimation (ILME) method to facilitate a more effective integration of the external LM with all pre-existing E2E models with no additional model training, including the most popular recurrent neural network transducer (RNN-T) and attention-based encoder-decoder (AED) models. Trained with audio-transcript pairs, an E2E model implicitly learns an internal LM that characterizes the training data in the source domain. With ILME, the internal LM scores of an E2E model are estimated and subtracted from the log-linear interpolation between the scores of the E2E model and the external LM. The internal LM scores are approximated as the output of an E2E model when eliminating its acoustic components. ILME can alleviate the domain mismatch between training and testing, or improve the multi-domain E2E ASR. Experimented with 30K-hour trained RNN-T and AED models, ILME achieves up to 15.5% and 6.8% relative word error rate reductions from Shallow Fusion on out-of-domain LibriSpeech and in-domain Microsoft production test sets, respectively.
△ Less
Submitted 3 November, 2020;
originally announced November 2020.
-
LSTM-LM with Long-Term History for First-Pass Decoding in Conversational Speech Recognition
Authors:
Xie Chen,
Sarangarajan Parthasarathy,
William Gale,
Shuangyu Chang,
Michael Zeng
Abstract:
LSTM language models (LSTM-LMs) have been proven to be powerful and yielded significant performance improvements over count based n-gram LMs in modern speech recognition systems. Due to its infinite history states and computational load, most previous studies focus on applying LSTM-LMs in the second-pass for rescoring purpose. Recent work shows that it is feasible and computationally affordable to…
▽ More
LSTM language models (LSTM-LMs) have been proven to be powerful and yielded significant performance improvements over count based n-gram LMs in modern speech recognition systems. Due to its infinite history states and computational load, most previous studies focus on applying LSTM-LMs in the second-pass for rescoring purpose. Recent work shows that it is feasible and computationally affordable to adopt the LSTM-LMs in the first-pass decoding within a dynamic (or tree based) decoder framework. In this work, the LSTM-LM is composed with a WFST decoder on-the-fly for the first-pass decoding. Furthermore, motivated by the long-term history nature of LSTM-LMs, the use of context beyond the current utterance is explored for the first-pass decoding in conversational speech recognition. The context information is captured by the hidden states of LSTM-LMs across utterance and can be used to guide the first-pass search effectively. The experimental results in our internal meeting transcription system show that significant performance improvements can be obtained by incorporating the contextual information with LSTM-LMs in the first-pass decoding, compared to applying the contextual information in the second-pass rescoring.
△ Less
Submitted 21 October, 2020;
originally announced October 2020.
-
Ranging success probability of PPP distributed automotive radar in presence of generalized fading
Authors:
Sudharsan Parthasarathy,
Rakshith Jagannath
Abstract:
In automotive radar applications, multiple radars are used in all vehicles for improving the imaging quality. However this causes radar-to-radar interference from neighbouring vehicles, thus reducing the imaging quality. One metric to measure the imaging quality is ranging success probability. The ranging success probability is the probability that a multiple radar system successfully detects an o…
▽ More
In automotive radar applications, multiple radars are used in all vehicles for improving the imaging quality. However this causes radar-to-radar interference from neighbouring vehicles, thus reducing the imaging quality. One metric to measure the imaging quality is ranging success probability. The ranging success probability is the probability that a multiple radar system successfully detects an object at a given range, under certain operating conditions. In state-of-the-art literature, closed form expressions for ranging success probability have been derived assuming no fading in desired signal component. Similarly in literature, though distribution of fading in interferers is assumed to be arbitrary, closed form expression is derived only for no-fading assumption in interferers. As fading is always present in a wireless channel, we have derived ranging success probability assuming desired channel experiences the popular Rayleigh fading. And we have assumed generalized $κ$-$μ$ shadowed fading for interfering channels that generalizes many popular fading models such as Rayleigh, Rician, Nakagami-$m$, $κ$-$μ$ etc. The interferers are assumed to be located on points drawn from a Poisson point process distribution. We have also studied how the relationship between shadowing component and number of clusters can affect the impact of LOS component on ranging success probability.
△ Less
Submitted 2 October, 2020;
originally announced October 2020.
-
Training Strategies to Handle Missing Modalities for Audio-Visual Expression Recognition
Authors:
Srinivas Parthasarathy,
Shiva Sundaram
Abstract:
Automatic audio-visual expression recognition can play an important role in communication services such as tele-health, VOIP calls and human-machine interaction. Accuracy of audio-visual expression recognition could benefit from the interplay between the two modalities. However, most audio-visual expression recognition systems, trained in ideal conditions, fail to generalize in real world scenario…
▽ More
Automatic audio-visual expression recognition can play an important role in communication services such as tele-health, VOIP calls and human-machine interaction. Accuracy of audio-visual expression recognition could benefit from the interplay between the two modalities. However, most audio-visual expression recognition systems, trained in ideal conditions, fail to generalize in real world scenarios where either the audio or visual modality could be missing due to a number of reasons such as limited bandwidth, interactors' orientation, caller initiated muting. This paper studies the performance of a state-of-the art transformer when one of the modalities is missing. We conduct ablation studies to evaluate the model in the absence of either modality. Further, we propose a strategy to randomly ablate visual inputs during training at the clip or frame level to mimic real world scenarios. Results conducted on in-the-wild data, indicate significant generalization in proposed models trained on missing cues, with gains up to 17% for frame level ablations, showing that these training strategies cope better with the loss of input modalities.
△ Less
Submitted 30 November, 2020; v1 submitted 1 October, 2020;
originally announced October 2020.
-
Multi-modal embeddings using multi-task learning for emotion recognition
Authors:
Aparna Khare,
Srinivas Parthasarathy,
Shiva Sundaram
Abstract:
General embeddings like word2vec, GloVe and ELMo have shown a lot of success in natural language tasks. The embeddings are typically extracted from models that are built on general tasks such as skip-gram models and natural language generation. In this paper, we extend the work from natural language understanding to multi-modal architectures that use audio, visual and textual information for machi…
▽ More
General embeddings like word2vec, GloVe and ELMo have shown a lot of success in natural language tasks. The embeddings are typically extracted from models that are built on general tasks such as skip-gram models and natural language generation. In this paper, we extend the work from natural language understanding to multi-modal architectures that use audio, visual and textual information for machine learning tasks. The embeddings in our network are extracted using the encoder of a transformer model trained using multi-task training. We use person identification and automatic speech recognition as the tasks in our embedding generation framework. We tune and evaluate the embeddings on the downstream task of emotion recognition and demonstrate that on the CMU-MOSEI dataset, the embeddings can be used to improve over previous state of the art results.
△ Less
Submitted 10 September, 2020;
originally announced September 2020.
-
Developing RNN-T Models Surpassing High-Performance Hybrid Models with Customization Capability
Authors:
Jinyu Li,
Rui Zhao,
Zhong Meng,
Yanqing Liu,
Wenning Wei,
Sarangarajan Parthasarathy,
Vadim Mazalov,
Zhenghao Wang,
Lei He,
Sheng Zhao,
Yifan Gong
Abstract:
Because of its streaming nature, recurrent neural network transducer (RNN-T) is a very promising end-to-end (E2E) model that may replace the popular hybrid model for automatic speech recognition. In this paper, we describe our recent development of RNN-T models with reduced GPU memory consumption during training, better initialization strategy, and advanced encoder modeling with future lookahead.…
▽ More
Because of its streaming nature, recurrent neural network transducer (RNN-T) is a very promising end-to-end (E2E) model that may replace the popular hybrid model for automatic speech recognition. In this paper, we describe our recent development of RNN-T models with reduced GPU memory consumption during training, better initialization strategy, and advanced encoder modeling with future lookahead. When trained with Microsoft's 65 thousand hours of anonymized training data, the developed RNN-T model surpasses a very well trained hybrid model with both better recognition accuracy and lower latency. We further study how to customize RNN-T models to a new domain, which is important for deploying E2E models to practical scenarios. By comparing several methods leveraging text-only data in the new domain, we found that updating RNN-T's prediction and joint networks using text-to-speech generated from domain-specific text is the most effective.
△ Less
Submitted 29 July, 2020;
originally announced July 2020.
-
DrugDBEmbed : Semantic Queries on Relational Database using Supervised Column Encodings
Authors:
Bortik Bandyopadhyay,
Pranav Maneriker,
Vedang Patel,
Saumya Yashmohini Sahai,
Ping Zhang,
Srinivasan Parthasarathy
Abstract:
Traditional relational databases contain a lot of latent semantic information that have largely remained untapped due to the difficulty involved in automatically extracting such information. Recent works have proposed unsupervised machine learning approaches to extract such hidden information by textifying the database columns and then projecting the text tokens onto a fixed dimensional semantic v…
▽ More
Traditional relational databases contain a lot of latent semantic information that have largely remained untapped due to the difficulty involved in automatically extracting such information. Recent works have proposed unsupervised machine learning approaches to extract such hidden information by textifying the database columns and then projecting the text tokens onto a fixed dimensional semantic vector space. However, in certain databases, task-specific class labels may be available, which unsupervised approaches are unable to lever in a principled manner. Also, when embeddings are generated at individual token level, then column encoding of multi-token text column has to be computed by taking the average of the vectors of the tokens present in that column for any given row. Such averaging approach may not produce the best semantic vector representation of the multi-token text column, as observed while encoding paragraphs or documents in natural language processing domain. With these shortcomings in mind, we propose a supervised machine learning approach using a Bi-LSTM based sequence encoder to directly generate column encodings for multi-token text columns of the DrugBank database, which contains gold standard drug-drug interaction (DDI) labels. Our text data driven encoding approach achieves very high Accuracy on the supervised DDI prediction task for some columns and we use those supervised column encodings to simulate and evaluate the Analogy SQL queries on relational data to demonstrate the efficacy of our technique.
△ Less
Submitted 5 July, 2020;
originally announced July 2020.
-
HPRA: Hyperedge Prediction using Resource Allocation
Authors:
Tarun Kumar,
K Darwin,
Srinivasan Parthasarathy,
Balaraman Ravindran
Abstract:
Many real-world systems involve higher-order interactions and thus demand complex models such as hypergraphs. For instance, a research article could have multiple collaborating authors, and therefore the co-authorship network is best represented as a hypergraph. In this work, we focus on the problem of hyperedge prediction. This problem has immense applications in multiple domains, such as predict…
▽ More
Many real-world systems involve higher-order interactions and thus demand complex models such as hypergraphs. For instance, a research article could have multiple collaborating authors, and therefore the co-authorship network is best represented as a hypergraph. In this work, we focus on the problem of hyperedge prediction. This problem has immense applications in multiple domains, such as predicting new collaborations in social networks, discovering new chemical reactions in metabolic networks, etc. Despite having significant importance, the problem of hyperedge prediction hasn't received adequate attention, mainly because of its inherent complexity. In a graph with $n$ nodes the number of potential edges is $\mathcal{O}(n^{2})$, whereas in a hypergraph, the number of potential hyperedges is $\mathcal{O}(2^{n})$. To avoid searching through such a huge space, current methods restrain the original problem in the following two ways. One class of algorithms assume the hypergraphs to be $k$-uniform. However, many real-world systems are not confined only to have interactions involving $k$ components. Thus, these algorithms are not suitable for many real-world applications. The second class of algorithms requires a candidate set of hyperedges from which the potential hyperedges are chosen. In the absence of domain knowledge, the candidate set can have $\mathcal{O}(2^{n})$ possible hyperedges, which makes this problem intractable. We propose HPRA - Hyperedge Prediction using Resource Allocation, the first of its kind algorithm, which overcomes these issues and predicts hyperedges of any cardinality without using any candidate hyperedge set. HPRA is a similarity-based method working on the principles of the resource allocation process. In addition to recovering missing hyperedges, we demonstrate that HPRA can predict future hyperedges in a wide range of hypergraphs.
△ Less
Submitted 19 June, 2020;
originally announced June 2020.
-
Multiresolution and Multimodal Speech Recognition with Transformers
Authors:
Georgios Paraskevopoulos,
Srinivas Parthasarathy,
Aparna Khare,
Shiva Sundaram
Abstract:
This paper presents an audio visual automatic speech recognition (AV-ASR) system using a Transformer-based architecture. We particularly focus on the scene context provided by the visual information, to ground the ASR. We extract representations for audio features in the encoder layers of the transformer and fuse video features using an additional crossmodal multihead attention layer. Additionally…
▽ More
This paper presents an audio visual automatic speech recognition (AV-ASR) system using a Transformer-based architecture. We particularly focus on the scene context provided by the visual information, to ground the ASR. We extract representations for audio features in the encoder layers of the transformer and fuse video features using an additional crossmodal multihead attention layer. Additionally, we incorporate a multitask training criterion for multiresolution ASR, where we train the model to generate both character and subword level transcriptions.
Experimental results on the How2 dataset, indicate that multiresolution training can speed up convergence by around 50% and relatively improves word error rate (WER) performance by upto 18% over subword prediction models. Further, incorporating visual information improves performance with relative gains upto 3.76% over audio only models.
Our results are comparable to state-of-the-art Listen, Attend and Spell-based architectures.
△ Less
Submitted 29 April, 2020;
originally announced April 2020.
-
Understanding Knowledge Gaps in Visual Question Answering: Implications for Gap Identification and Testing
Authors:
Goonmeet Bajaj,
Bortik Bandyopadhyay,
Daniel Schmidt,
Pranav Maneriker,
Christopher Myers,
Srinivasan Parthasarathy
Abstract:
Visual Question Answering (VQA) systems are tasked with answering natural language questions corresponding to a presented image. Traditional VQA datasets typically contain questions related to the spatial information of objects, object attributes, or general scene questions. Recently, researchers have recognized the need to improve the balance of such datasets to reduce the system's dependency on…
▽ More
Visual Question Answering (VQA) systems are tasked with answering natural language questions corresponding to a presented image. Traditional VQA datasets typically contain questions related to the spatial information of objects, object attributes, or general scene questions. Recently, researchers have recognized the need to improve the balance of such datasets to reduce the system's dependency on memorized linguistic features and statistical biases, while aiming for enhanced visual understanding. However, it is unclear whether any latent patterns exist to quantify and explain these failures. As an initial step towards better quantifying our understanding of the performance of VQA models, we use a taxonomy of Knowledge Gaps (KGs) to tag questions with one or more types of KGs. Each Knowledge Gap (KG) describes the reasoning abilities needed to arrive at a resolution. After identifying KGs for each question, we examine the skew in the distribution of questions for each KG. We then introduce a targeted question generation model to reduce this skew, which allows us to generate new types of questions for an image. These new questions can be added to existing VQA datasets to increase the diversity of questions and reduce the skew.
△ Less
Submitted 3 June, 2020; v1 submitted 7 April, 2020;
originally announced April 2020.
-
M2: Mixed Models with Preferences, Popularities and Transitions for Next-Basket Recommendation
Authors:
Bo Peng,
Zhiyun Ren,
Srinivasan Parthasarathy,
Xia Ning
Abstract:
Next-basket recommendation considers the problem of recommending a set of items into the next basket that users will purchase as a whole. In this paper, we develop a novel mixed model with preferences, popularities and transitions (M2) for the next-basket recommendation. This method models three important factors in next-basket generation process: 1) users' general preferences, 2) items' global po…
▽ More
Next-basket recommendation considers the problem of recommending a set of items into the next basket that users will purchase as a whole. In this paper, we develop a novel mixed model with preferences, popularities and transitions (M2) for the next-basket recommendation. This method models three important factors in next-basket generation process: 1) users' general preferences, 2) items' global popularities and 3) transition patterns among items. Unlike existing recurrent neural network-based approaches, M2 does not use the complicated networks to model the transitions among items, or generate embeddings for users. Instead, it has a simple encoder-decoder based approach (ed-Trans) to better model the transition patterns among items. We compared M2 with different combinations of the factors with 5 state-of-the-art next-basket recommendation methods on 4 public benchmark datasets in recommending the first, second and third next basket. Our experimental results demonstrate that M2 significantly outperforms the state-of-the-art methods on all the datasets in all the tasks, with an improvement of up to 22.1%. In addition, our ablation study demonstrates that the ed-Trans is more effective than recurrent neural networks in terms of the recommendation performance. We also have a thorough discussion on various experimental protocols and evaluation metrics for next-basket recommendation evaluation.
△ Less
Submitted 17 January, 2022; v1 submitted 3 April, 2020;
originally announced April 2020.
-
HAM: Hybrid Associations Models for Sequential Recommendation
Authors:
Bo Peng,
Zhiyun Ren,
Srinivasan Parthasarathy,
Xia Ning
Abstract:
Sequential recommendation aims to identify and recommend the next few items for a user that the user is most likely to purchase/review, given the user's purchase/rating trajectories. It becomes an effective tool to help users select favorite items from a variety of options. In this manuscript, we developed hybrid associations models (HAM) to generate sequential recommendations using three factors:…
▽ More
Sequential recommendation aims to identify and recommend the next few items for a user that the user is most likely to purchase/review, given the user's purchase/rating trajectories. It becomes an effective tool to help users select favorite items from a variety of options. In this manuscript, we developed hybrid associations models (HAM) to generate sequential recommendations using three factors: 1) users' long-term preferences, 2) sequential, high-order and low-order association patterns in the users' most recent purchases/ratings, and 3) synergies among those items. HAM uses simplistic pooling to represent a set of items in the associations, and element-wise product to represent item synergies of arbitrary orders. We compared HAM models with the most recent, state-of-the-art methods on six public benchmark datasets in three different experimental settings. Our experimental results demonstrate that HAM models significantly outperform the state of the art in all the experimental settings, with an improvement as much as 46.6%. In addition, our run-time performance comparison in testing demonstrates that HAM models are much more efficient than the state-of-the-art methods, and are able to achieve significant speedup as much as 139.7 folds.
△ Less
Submitted 4 January, 2021; v1 submitted 26 February, 2020;
originally announced February 2020.