-
A Quantum Algorithm Based Heuristic to Hide Sensitive Itemsets
Authors:
Abhijeet Ghoshal,
Yan Li,
Syam Menon,
Sumit Sarkar
Abstract:
Quantum devices use qubits to represent information, which allows them to exploit important properties from quantum physics, specifically superposition and entanglement. As a result, quantum computers have the potential to outperform the most advanced classical computers. In recent years, quantum algorithms have shown hints of this promise, and many algorithms have been proposed for the quantum do…
▽ More
Quantum devices use qubits to represent information, which allows them to exploit important properties from quantum physics, specifically superposition and entanglement. As a result, quantum computers have the potential to outperform the most advanced classical computers. In recent years, quantum algorithms have shown hints of this promise, and many algorithms have been proposed for the quantum domain. There are two key hurdles to solving difficult real-world problems on quantum computers. The first is on the hardware front -- the number of qubits in the most advanced quantum systems is too small to make the solution of large problems practical. The second involves the algorithms themselves -- as quantum computers use qubits, the algorithms that work there are fundamentally different from those that work on traditional computers. As a result of these constraints, research has focused on developing approaches to solve small versions of problems as proofs of concept -- recognizing that it would be possible to scale these up once quantum devices with enough qubits become available. Our objective in this paper is along the same lines. We present a quantum approach to solve a well-studied problem in the context of data sharing. This heuristic uses the well-known Quantum Approximate Optimization Algorithm (QAOA). We present results on experiments involving small datasets to illustrate how the problem could be solved using quantum algorithms. The results show that the method has potential and provide answers close to optimal. At the same time, we realize there are opportunities for improving the method further.
△ Less
Submitted 12 February, 2024;
originally announced February 2024.
-
Characterization of partial wetting by CMAS droplets using multiphase many-body dissipative particle dynamics and data-driven discovery based on PINNs
Authors:
Elham Kiyani,
Mahdi Kooshkbaghi,
Khemraj Shukla,
Rahul Babu Koneru,
Zhen Li,
Luis Bravo,
Anindya Ghoshal,
George Em Karniadakis,
Mikko Karttunen
Abstract:
The molten sand, a mixture of calcia, magnesia, alumina, and silicate, known as CMAS, is characterized by its high viscosity, density, and surface tension. The unique properties of CMAS make it a challenging material to deal with in high-temperature applications, requiring innovative solutions and materials to prevent its buildup and damage to critical equipment. Here, we use multiphase many-body…
▽ More
The molten sand, a mixture of calcia, magnesia, alumina, and silicate, known as CMAS, is characterized by its high viscosity, density, and surface tension. The unique properties of CMAS make it a challenging material to deal with in high-temperature applications, requiring innovative solutions and materials to prevent its buildup and damage to critical equipment. Here, we use multiphase many-body dissipative particle dynamics (mDPD) simulations to study the wetting dynamics of highly viscous molten CMAS droplets. The simulations are performed in three dimensions, with varying initial droplet sizes and equilibrium contact angles. We propose a coarse parametric ordinary differential equation (ODE) that captures the spreading radius behavior of the CMAS droplets. The ODE parameters are then identified based on the Physics-Informed Neural Network (PINN) framework. Subsequently, the closed form dependency of parameter values found by PINN on the initial radii and contact angles are given using symbolic regression. Finally, we employ Bayesian PINNs (B-PINNs) to assess and quantify the uncertainty associated with the discovered parameters. In brief, this study provides insight into spreading dynamics of CMAS droplets by fusing simple parametric ODE modeling and state-of-the-art machine learning techniques.
△ Less
Submitted 18 July, 2023;
originally announced July 2023.
-
Deep neural operators can serve as accurate surrogates for shape optimization: A case study for airfoils
Authors:
Khemraj Shukla,
Vivek Oommen,
Ahmad Peyvan,
Michael Penwarden,
Luis Bravo,
Anindya Ghoshal,
Robert M. Kirby,
George Em Karniadakis
Abstract:
Deep neural operators, such as DeepONets, have changed the paradigm in high-dimensional nonlinear regression from function regression to (differential) operator regression, paving the way for significant changes in computational engineering applications. Here, we investigate the use of DeepONets to infer flow fields around unseen airfoils with the aim of shape optimization, an important design pro…
▽ More
Deep neural operators, such as DeepONets, have changed the paradigm in high-dimensional nonlinear regression from function regression to (differential) operator regression, paving the way for significant changes in computational engineering applications. Here, we investigate the use of DeepONets to infer flow fields around unseen airfoils with the aim of shape optimization, an important design problem in aerodynamics that typically taxes computational resources heavily. We present results which display little to no degradation in prediction accuracy, while reducing the online optimization cost by orders of magnitude. We consider NACA airfoils as a test case for our proposed approach, as their shape can be easily defined by the four-digit parametrization. We successfully optimize the constrained NACA four-digit problem with respect to maximizing the lift-to-drag ratio and validate all results by comparing them to a high-order CFD solver. We find that DeepONets have low generalization error, making them ideal for generating solutions of unseen shapes. Specifically, pressure, density, and velocity fields are accurately inferred at a fraction of a second, hence enabling the use of general objective functions beyond the maximization of the lift-to-drag ratio considered in the current work.
△ Less
Submitted 1 February, 2023;
originally announced February 2023.
-
Improving Faithfulness of Abstractive Summarization by Controlling Confounding Effect of Irrelevant Sentences
Authors:
Asish Ghoshal,
Arash Einolghozati,
Ankit Arun,
Haoran Li,
Lili Yu,
Vera Gor,
Yashar Mehdad,
Scott Wen-tau Yih,
Asli Celikyilmaz
Abstract:
Lack of factual correctness is an issue that still plagues state-of-the-art summarization systems despite their impressive progress on generating seemingly fluent summaries. In this paper, we show that factual inconsistency can be caused by irrelevant parts of the input text, which act as confounders. To that end, we leverage information-theoretic measures of causal effects to quantify the amount…
▽ More
Lack of factual correctness is an issue that still plagues state-of-the-art summarization systems despite their impressive progress on generating seemingly fluent summaries. In this paper, we show that factual inconsistency can be caused by irrelevant parts of the input text, which act as confounders. To that end, we leverage information-theoretic measures of causal effects to quantify the amount of confounding and precisely quantify how they affect the summarization performance. Based on insights derived from our theoretical results, we design a simple multi-task model to control such confounding by leveraging human-annotated relevant sentences when available. Crucially, we give a principled characterization of data distributions where such confounding can be large thereby necessitating the use of human annotated relevant sentences to generate factual summaries. Our approach improves faithfulness scores by 20\% over strong baselines on AnswerSumm \citep{fabbri2021answersumm}, a conversation summarization dataset where lack of faithfulness is a significant issue due to the subjective nature of the task. Our best method achieves the highest faithfulness score while also achieving state-of-the-art results on standard metrics like ROUGE and METEOR. We corroborate these improvements through human evaluation.
△ Less
Submitted 18 January, 2024; v1 submitted 19 December, 2022;
originally announced December 2022.
-
CITADEL: Conditional Token Interaction via Dynamic Lexical Routing for Efficient and Effective Multi-Vector Retrieval
Authors:
Minghan Li,
Sheng-Chieh Lin,
Barlas Oguz,
Asish Ghoshal,
Jimmy Lin,
Yashar Mehdad,
Wen-tau Yih,
Xilun Chen
Abstract:
Multi-vector retrieval methods combine the merits of sparse (e.g. BM25) and dense (e.g. DPR) retrievers and have achieved state-of-the-art performance on various retrieval tasks. These methods, however, are orders of magnitude slower and need much more space to store their indices compared to their single-vector counterparts. In this paper, we unify different multi-vector retrieval models from a t…
▽ More
Multi-vector retrieval methods combine the merits of sparse (e.g. BM25) and dense (e.g. DPR) retrievers and have achieved state-of-the-art performance on various retrieval tasks. These methods, however, are orders of magnitude slower and need much more space to store their indices compared to their single-vector counterparts. In this paper, we unify different multi-vector retrieval models from a token routing viewpoint and propose conditional token interaction via dynamic lexical routing, namely CITADEL, for efficient and effective multi-vector retrieval. CITADEL learns to route different token vectors to the predicted lexical ``keys'' such that a query token vector only interacts with document token vectors routed to the same key. This design significantly reduces the computation cost while maintaining high accuracy. Notably, CITADEL achieves the same or slightly better performance than the previous state of the art, ColBERT-v2, on both in-domain (MS MARCO) and out-of-domain (BEIR) evaluations, while being nearly 40 times faster. Code and data are available at https://github.com/facebookresearch/dpr-scale.
△ Less
Submitted 18 November, 2022;
originally announced November 2022.
-
Variable Attention Masking for Configurable Transformer Transducer Speech Recognition
Authors:
Pawel Swietojanski,
Stefan Braun,
Dogan Can,
Thiago Fraga da Silva,
Arnab Ghoshal,
Takaaki Hori,
Roger Hsiao,
Henry Mason,
Erik McDermott,
Honza Silovsky,
Ruchir Travadi,
Xiaodan Zhuang
Abstract:
This work studies the use of attention masking in transformer transducer based speech recognition for building a single configurable model for different deployment scenarios. We present a comprehensive set of experiments comparing fixed masking, where the same attention mask is applied at every frame, with chunked masking, where the attention mask for each frame is determined by chunk boundaries,…
▽ More
This work studies the use of attention masking in transformer transducer based speech recognition for building a single configurable model for different deployment scenarios. We present a comprehensive set of experiments comparing fixed masking, where the same attention mask is applied at every frame, with chunked masking, where the attention mask for each frame is determined by chunk boundaries, in terms of recognition accuracy and latency. We then explore the use of variable masking, where the attention masks are sampled from a target distribution at training time, to build models that can work in different configurations. Finally, we investigate how a single configurable model can be used to perform both first pass streaming recognition and second pass acoustic rescoring. Experiments show that chunked masking achieves a better accuracy vs latency trade-off compared to fixed masking, both with and without FastEmit. We also show that variable masking improves the accuracy by up to 8% relative in the acoustic re-scoring scenario.
△ Less
Submitted 18 April, 2023; v1 submitted 2 November, 2022;
originally announced November 2022.
-
Optimizing Bilingual Neural Transducer with Synthetic Code-switching Text Generation
Authors:
Thien Nguyen,
Nathalie Tran,
Liuhui Deng,
Thiago Fraga da Silva,
Matthew Radzihovsky,
Roger Hsiao,
Henry Mason,
Stefan Braun,
Erik McDermott,
Dogan Can,
Pawel Swietojanski,
Lyan Verwimp,
Sibel Oyman,
Tresi Arvizo,
Honza Silovsky,
Arnab Ghoshal,
Mathieu Martel,
Bharat Ram Ambati,
Mohamed Ali
Abstract:
Code-switching describes the practice of using more than one language in the same sentence. In this study, we investigate how to optimize a neural transducer based bilingual automatic speech recognition (ASR) model for code-switching speech. Focusing on the scenario where the ASR model is trained without supervised code-switching data, we found that semi-supervised training and synthetic code-swit…
▽ More
Code-switching describes the practice of using more than one language in the same sentence. In this study, we investigate how to optimize a neural transducer based bilingual automatic speech recognition (ASR) model for code-switching speech. Focusing on the scenario where the ASR model is trained without supervised code-switching data, we found that semi-supervised training and synthetic code-switched data can improve the bilingual ASR system on code-switching speech. We analyze how each of the neural transducer's encoders contributes towards code-switching performance by measuring encoder-specific recall values, and evaluate our English/Mandarin system on the ASCEND data set. Our final system achieves 25% mixed error rate (MER) on the ASCEND English/Mandarin code-switching test set -- reducing the MER by 2.1% absolute compared to the previous literature -- while maintaining good accuracy on the monolingual test sets.
△ Less
Submitted 21 October, 2022;
originally announced October 2022.
-
Bilingual End-to-End ASR with Byte-Level Subwords
Authors:
Liuhui Deng,
Roger Hsiao,
Arnab Ghoshal
Abstract:
In this paper, we investigate how the output representation of an end-to-end neural network affects multilingual automatic speech recognition (ASR). We study different representations including character-level, byte-level, byte pair encoding (BPE), and byte-level byte pair encoding (BBPE) representations, and analyze their strengths and weaknesses. We focus on developing a single end-to-end model…
▽ More
In this paper, we investigate how the output representation of an end-to-end neural network affects multilingual automatic speech recognition (ASR). We study different representations including character-level, byte-level, byte pair encoding (BPE), and byte-level byte pair encoding (BBPE) representations, and analyze their strengths and weaknesses. We focus on developing a single end-to-end model to support utterance-based bilingual ASR, where speakers do not alternate between two languages in a single utterance but may change languages across utterances. We conduct our experiments on English and Mandarin dictation tasks, and we find that BBPE with penalty schemes can improve utterance-based bilingual ASR performance by 2% to 5% relative even with smaller number of outputs and fewer parameters. We conclude with analysis that indicates directions for further improving multilingual ASR.
△ Less
Submitted 1 May, 2022;
originally announced May 2022.
-
Towards Understanding the Behaviors of Optimal Deep Active Learning Algorithms
Authors:
Yilun Zhou,
Adithya Renduchintala,
Xian Li,
Sida Wang,
Yashar Mehdad,
Asish Ghoshal
Abstract:
Active learning (AL) algorithms may achieve better performance with fewer data because the model guides the data selection process. While many algorithms have been proposed, there is little study on what the optimal AL algorithm looks like, which would help researchers understand where their models fall short and iterate on the design. In this paper, we present a simulated annealing algorithm to s…
▽ More
Active learning (AL) algorithms may achieve better performance with fewer data because the model guides the data selection process. While many algorithms have been proposed, there is little study on what the optimal AL algorithm looks like, which would help researchers understand where their models fall short and iterate on the design. In this paper, we present a simulated annealing algorithm to search for this optimal oracle and analyze it for several tasks. We present qualitative and quantitative insights into the behaviors of this oracle, comparing and contrasting them with those of various heuristics. Moreover, we are able to consistently improve the heuristics using one particular insight. We hope that our findings can better inform future active learning research. The code is available at https://github.com/YilunZhou/optimal-active-learning.
△ Less
Submitted 20 February, 2021; v1 submitted 29 December, 2020;
originally announced January 2021.
-
FiD-Ex: Improving Sequence-to-Sequence Models for Extractive Rationale Generation
Authors:
Kushal Lakhotia,
Bhargavi Paranjape,
Asish Ghoshal,
Wen-tau Yih,
Yashar Mehdad,
Srinivasan Iyer
Abstract:
Natural language (NL) explanations of model predictions are gaining popularity as a means to understand and verify decisions made by large black-box pre-trained models, for NLP tasks such as Question Answering (QA) and Fact Verification. Recently, pre-trained sequence to sequence (seq2seq) models have proven to be very effective in jointly making predictions, as well as generating NL explanations.…
▽ More
Natural language (NL) explanations of model predictions are gaining popularity as a means to understand and verify decisions made by large black-box pre-trained models, for NLP tasks such as Question Answering (QA) and Fact Verification. Recently, pre-trained sequence to sequence (seq2seq) models have proven to be very effective in jointly making predictions, as well as generating NL explanations. However, these models have many shortcomings; they can fabricate explanations even for incorrect predictions, they are difficult to adapt to long input documents, and their training requires a large amount of labeled data. In this paper, we develop FiD-Ex, which addresses these shortcomings for seq2seq models by: 1) introducing sentence markers to eliminate explanation fabrication by encouraging extractive generation, 2) using the fusion-in-decoder architecture to handle long input contexts, and 3) intermediate fine-tuning on re-structured open domain QA datasets to improve few-shot performance. FiD-Ex significantly improves over prior work in terms of explanation metrics and task accuracy, on multiple tasks from the ERASER explainability benchmark, both in the fully supervised and in the few-shot settings.
△ Less
Submitted 31 December, 2020;
originally announced December 2020.
-
Low-Resource Domain Adaptation for Compositional Task-Oriented Semantic Parsing
Authors:
Xilun Chen,
Asish Ghoshal,
Yashar Mehdad,
Luke Zettlemoyer,
Sonal Gupta
Abstract:
Task-oriented semantic parsing is a critical component of virtual assistants, which is responsible for understanding the user's intents (set reminder, play music, etc.). Recent advances in deep learning have enabled several approaches to successfully parse more complex queries (Gupta et al., 2018; Rongali et al.,2020), but these models require a large amount of annotated training data to parse que…
▽ More
Task-oriented semantic parsing is a critical component of virtual assistants, which is responsible for understanding the user's intents (set reminder, play music, etc.). Recent advances in deep learning have enabled several approaches to successfully parse more complex queries (Gupta et al., 2018; Rongali et al.,2020), but these models require a large amount of annotated training data to parse queries on new domains (e.g. reminder, music).
In this paper, we focus on adapting task-oriented semantic parsers to low-resource domains, and propose a novel method that outperforms a supervised neural model at a 10-fold data reduction. In particular, we identify two fundamental factors for low-resource domain adaptation: better representation learning and better training techniques. Our representation learning uses BART (Lewis et al., 2019) to initialize our model which outperforms encoder-only pre-trained representations used in previous work. Furthermore, we train with optimization-based meta-learning (Finn et al., 2017) to improve generalization to low-resource domains. This approach significantly outperforms all baseline methods in the experiments on a newly collected multi-domain task-oriented semantic parsing dataset (TOPv2), which we release to the public.
△ Less
Submitted 7 October, 2020;
originally announced October 2020.
-
Online Automatic Speech Recognition with Listen, Attend and Spell Model
Authors:
Roger Hsiao,
Dogan Can,
Tim Ng,
Ruchir Travadi,
Arnab Ghoshal
Abstract:
The Listen, Attend and Spell (LAS) model and other attention-based automatic speech recognition (ASR) models have known limitations when operated in a fully online mode. In this paper, we analyze the online operation of LAS models to demonstrate that these limitations stem from the handling of silence regions and the reliability of online attention mechanism at the edge of input buffers. We propos…
▽ More
The Listen, Attend and Spell (LAS) model and other attention-based automatic speech recognition (ASR) models have known limitations when operated in a fully online mode. In this paper, we analyze the online operation of LAS models to demonstrate that these limitations stem from the handling of silence regions and the reliability of online attention mechanism at the edge of input buffers. We propose a novel and simple technique that can achieve fully online recognition while meeting accuracy and latency targets. For the Mandarin dictation task, our proposed approach can achieve a character error rate in online operation that is within 4% relative to an offline LAS model. The proposed online LAS model operates at 12% lower latency relative to a conventional neural network hidden Markov model hybrid of comparable accuracy. We have validated the proposed method through a production scale deployment, which, to the best of our knowledge, is the first such deployment of a fully online LAS model.
△ Less
Submitted 13 October, 2020; v1 submitted 12 August, 2020;
originally announced August 2020.
-
Improving Language Identification for Multilingual Speakers
Authors:
Andrew Titus,
Jan Silovsky,
Nanxin Chen,
Roger Hsiao,
Mary Young,
Arnab Ghoshal
Abstract:
Spoken language identification (LID) technologies have improved in recent years from discriminating largely distinct languages to discriminating highly similar languages or even dialects of the same language. One aspect that has been mostly neglected, however, is discrimination of languages for multilingual speakers, despite being a primary target audience of many systems that utilize LID technolo…
▽ More
Spoken language identification (LID) technologies have improved in recent years from discriminating largely distinct languages to discriminating highly similar languages or even dialects of the same language. One aspect that has been mostly neglected, however, is discrimination of languages for multilingual speakers, despite being a primary target audience of many systems that utilize LID technologies. As we show in this work, LID systems can have a high average accuracy for most combinations of languages while greatly underperforming for others when accented speech is present. We address this by using coarser-grained targets for the acoustic LID model and integrating its outputs with interaction context signals in a context-aware model to tailor the system to each user. This combined system achieves an average 97% accuracy across all language combinations while improving worst-case accuracy by over 60% relative to our baseline.
△ Less
Submitted 29 January, 2020;
originally announced January 2020.
-
Minimax bounds for structured prediction
Authors:
Kevin Bello,
Asish Ghoshal,
Jean Honorio
Abstract:
Structured prediction can be considered as a generalization of many standard supervised learning tasks, and is usually thought as a simultaneous prediction of multiple labels. One standard approach is to maximize a score function on the space of labels, which decomposes as a sum of unary and pairwise potentials, each depending on one or two specific labels, respectively. For this approach, several…
▽ More
Structured prediction can be considered as a generalization of many standard supervised learning tasks, and is usually thought as a simultaneous prediction of multiple labels. One standard approach is to maximize a score function on the space of labels, which decomposes as a sum of unary and pairwise potentials, each depending on one or two specific labels, respectively. For this approach, several learning and inference algorithms have been proposed over the years, ranging from exact to approximate methods while balancing the computational complexity. However, in contrast to binary and multiclass classification, results on the necessary number of samples for achieving learning is still limited, even for a specific family of predictors such as factor graphs. In this work, we provide minimax bounds for a class of factor-graph inference models for structured prediction. That is, we characterize the necessary sample complexity for any conceivable algorithm to achieve learning of factor-graph predictors.
△ Less
Submitted 2 June, 2019;
originally announced June 2019.
-
Learning Maximum-A-Posteriori Perturbation Models for Structured Prediction in Polynomial Time
Authors:
Asish Ghoshal,
Jean Honorio
Abstract:
MAP perturbation models have emerged as a powerful framework for inference in structured prediction. Such models provide a way to efficiently sample from the Gibbs distribution and facilitate predictions that are robust to random noise. In this paper, we propose a provably polynomial time randomized algorithm for learning the parameters of perturbed MAP predictors. Our approach is based on minimiz…
▽ More
MAP perturbation models have emerged as a powerful framework for inference in structured prediction. Such models provide a way to efficiently sample from the Gibbs distribution and facilitate predictions that are robust to random noise. In this paper, we propose a provably polynomial time randomized algorithm for learning the parameters of perturbed MAP predictors. Our approach is based on minimizing a novel Rademacher-based generalization bound on the expected loss of a perturbed MAP predictor, which can be computed in polynomial time. We obtain conditions under which our randomized learning algorithm can guarantee generalization to unseen examples.
△ Less
Submitted 21 May, 2018;
originally announced May 2018.
-
Learning linear structural equation models in polynomial time and sample complexity
Authors:
Asish Ghoshal,
Jean Honorio
Abstract:
The problem of learning structural equation models (SEMs) from data is a fundamental problem in causal inference. We develop a new algorithm --- which is computationally and statistically efficient and works in the high-dimensional regime --- for learning linear SEMs from purely observational data with arbitrary noise distribution. We consider three aspects of the problem: identifiability, computa…
▽ More
The problem of learning structural equation models (SEMs) from data is a fundamental problem in causal inference. We develop a new algorithm --- which is computationally and statistically efficient and works in the high-dimensional regime --- for learning linear SEMs from purely observational data with arbitrary noise distribution. We consider three aspects of the problem: identifiability, computational efficiency, and statistical efficiency. We show that when data is generated from a linear SEM over $p$ nodes and maximum degree $d$, our algorithm recovers the directed acyclic graph (DAG) structure of the SEM under an identifiability condition that is more general than those considered in the literature, and without faithfulness assumptions. In the population setting, our algorithm recovers the DAG structure in $\mathcal{O}(p(d^2 + \log p))$ operations. In the finite sample setting, if the estimated precision matrix is sparse, our algorithm has a smoothed complexity of $\widetilde{\mathcal{O}}(p^3 + pd^7)$, while if the estimated precision matrix is dense, our algorithm has a smoothed complexity of $\widetilde{\mathcal{O}}(p^5)$. For sub-Gaussian noise, we show that our algorithm has a sample complexity of $\mathcal{O}(\frac{d^8}{\varepsilon^2} \log (\frac{p}{\sqrtδ}))$ to achieve $\varepsilon$ element-wise additive error with respect to the true autoregression matrix with probability at most $1 - δ$, while for noise with bounded $(4m)$-th moment, with $m$ being a positive integer, our algorithm has a sample complexity of $\mathcal{O}(\frac{d^8}{\varepsilon^2} (\frac{p^2}δ)^{1/m})$.
△ Less
Submitted 14 July, 2017;
originally announced July 2017.
-
Learning Sparse Polymatrix Games in Polynomial Time and Sample Complexity
Authors:
Asish Ghoshal,
Jean Honorio
Abstract:
We consider the problem of learning sparse polymatrix games from observations of strategic interactions. We show that a polynomial time method based on $\ell_{1,2}$-group regularized logistic regression recovers a game, whose Nash equilibria are the $ε$-Nash equilibria of the game from which the data was generated (true game), in $\mathcal{O}(m^4 d^4 \log (pd))$ samples of strategy profiles --- wh…
▽ More
We consider the problem of learning sparse polymatrix games from observations of strategic interactions. We show that a polynomial time method based on $\ell_{1,2}$-group regularized logistic regression recovers a game, whose Nash equilibria are the $ε$-Nash equilibria of the game from which the data was generated (true game), in $\mathcal{O}(m^4 d^4 \log (pd))$ samples of strategy profiles --- where $m$ is the maximum number of pure strategies of a player, $p$ is the number of players, and $d$ is the maximum degree of the game graph. Under slightly more stringent separability conditions on the payoff matrices of the true game, we show that our method learns a game with the exact same Nash equilibria as the true game. We also show that $Ω(d \log (pm))$ samples are necessary for any method to consistently recover a game, with the same Nash-equilibria as the true game, from observations of strategic interactions. We verify our theoretical results through simulation experiments.
△ Less
Submitted 20 November, 2017; v1 submitted 18 June, 2017;
originally announced June 2017.
-
Learning Graphical Games from Behavioral Data: Sufficient and Necessary Conditions
Authors:
Asish Ghoshal,
Jean Honorio
Abstract:
In this paper we obtain sufficient and necessary conditions on the number of samples required for exact recovery of the pure-strategy Nash equilibria (PSNE) set of a graphical game from noisy observations of joint actions. We consider sparse linear influence games --- a parametric class of graphical games with linear payoffs, and represented by directed graphs of n nodes (players) and in-degree of…
▽ More
In this paper we obtain sufficient and necessary conditions on the number of samples required for exact recovery of the pure-strategy Nash equilibria (PSNE) set of a graphical game from noisy observations of joint actions. We consider sparse linear influence games --- a parametric class of graphical games with linear payoffs, and represented by directed graphs of n nodes (players) and in-degree of at most k. We show that one can efficiently recover the PSNE set of a linear influence game with $O(k^2 \log n)$ samples, under very general observation models. On the other hand, we show that $Ω(k \log n)$ samples are necessary for any procedure to recover the PSNE set from observations of joint actions.
△ Less
Submitted 3 March, 2017;
originally announced March 2017.
-
Learning Identifiable Gaussian Bayesian Networks in Polynomial Time and Sample Complexity
Authors:
Asish Ghoshal,
Jean Honorio
Abstract:
Learning the directed acyclic graph (DAG) structure of a Bayesian network from observational data is a notoriously difficult problem for which many hardness results are known. In this paper we propose a provably polynomial-time algorithm for learning sparse Gaussian Bayesian networks with equal noise variance --- a class of Bayesian networks for which the DAG structure can be uniquely identified f…
▽ More
Learning the directed acyclic graph (DAG) structure of a Bayesian network from observational data is a notoriously difficult problem for which many hardness results are known. In this paper we propose a provably polynomial-time algorithm for learning sparse Gaussian Bayesian networks with equal noise variance --- a class of Bayesian networks for which the DAG structure can be uniquely identified from observational data --- under high-dimensional settings. We show that $O(k^4 \log p)$ number of samples suffices for our method to recover the true DAG structure with high probability, where $p$ is the number of variables and $k$ is the maximum Markov blanket size. We obtain our theoretical guarantees under a condition called Restricted Strong Adjacency Faithfulness, which is strictly weaker than strong faithfulness --- a condition that other methods based on conditional independence testing need for their success. The sample complexity of our method matches the information-theoretic limits in terms of the dependence on $p$. We show that our method out-performs existing state-of-the-art methods for learning Gaussian Bayesian networks in terms of recovering the true DAG structure while being comparable in speed to heuristic methods.
△ Less
Submitted 3 March, 2017;
originally announced March 2017.
-
From Behavior to Sparse Graphical Games: Efficient Recovery of Equilibria
Authors:
Asish Ghoshal,
Jean Honorio
Abstract:
In this paper we study the problem of exact recovery of the pure-strategy Nash equilibria (PSNE) set of a graphical game from noisy observations of joint actions of the players alone. We consider sparse linear influence games --- a parametric class of graphical games with linear payoffs, and represented by directed graphs of n nodes (players) and in-degree of at most k. We present an $\ell_1$-regu…
▽ More
In this paper we study the problem of exact recovery of the pure-strategy Nash equilibria (PSNE) set of a graphical game from noisy observations of joint actions of the players alone. We consider sparse linear influence games --- a parametric class of graphical games with linear payoffs, and represented by directed graphs of n nodes (players) and in-degree of at most k. We present an $\ell_1$-regularized logistic regression based algorithm for recovering the PSNE set exactly, that is both computationally efficient --- i.e. runs in polynomial time --- and statistically efficient --- i.e. has logarithmic sample complexity. Specifically, we show that the sufficient number of samples required for exact PSNE recovery scales as $\mathcal{O}(\mathrm{poly}(k) \log n)$. We also validate our theoretical results using synthetic experiments.
△ Less
Submitted 19 October, 2016; v1 submitted 11 July, 2016;
originally announced July 2016.
-
Information-theoretic limits of Bayesian network structure learning
Authors:
Asish Ghoshal,
Jean Honorio
Abstract:
In this paper, we study the information-theoretic limits of learning the structure of Bayesian networks (BNs), on discrete as well as continuous random variables, from a finite number of samples. We show that the minimum number of samples required by any procedure to recover the correct structure grows as $Ω(m)$ and $Ω(k \log m + (k^2/m))$ for non-sparse and sparse BNs respectively, where $m$ is t…
▽ More
In this paper, we study the information-theoretic limits of learning the structure of Bayesian networks (BNs), on discrete as well as continuous random variables, from a finite number of samples. We show that the minimum number of samples required by any procedure to recover the correct structure grows as $Ω(m)$ and $Ω(k \log m + (k^2/m))$ for non-sparse and sparse BNs respectively, where $m$ is the number of variables and $k$ is the maximum number of parents per node. We provide a simple recipe, based on an extension of the Fano's inequality, to obtain information-theoretic limits of structure recovery for any exponential family BN. We instantiate our result for specific conditional distributions in the exponential family to characterize the fundamental limits of learning various commonly used BNs, such as conditional probability table based networks, gaussian BNs, noisy-OR networks, and logistic regression networks. En route to obtaining our main results, we obtain tight bounds on the number of sparse and non-sparse essential-DAGs. Finally, as a byproduct, we recover the information-theoretic limits of sparse variable selection for logistic regression.
△ Less
Submitted 3 March, 2017; v1 submitted 27 January, 2016;
originally announced January 2016.