-
Exploiting the equivalence between quantum neural networks and perceptrons
Authors:
Chris Mingard,
Jessica Pointing,
Charles London,
Yoonsoo Nam,
Ard A. Louis
Abstract:
Quantum machine learning models based on parametrized quantum circuits, also called quantum neural networks (QNNs), are considered to be among the most promising candidates for applications on near-term quantum devices. Here we explore the expressivity and inductive bias of QNNs by exploiting an exact mapping from QNNs with inputs $x$ to classical perceptrons acting on $x \otimes x$ (generalised t…
▽ More
Quantum machine learning models based on parametrized quantum circuits, also called quantum neural networks (QNNs), are considered to be among the most promising candidates for applications on near-term quantum devices. Here we explore the expressivity and inductive bias of QNNs by exploiting an exact mapping from QNNs with inputs $x$ to classical perceptrons acting on $x \otimes x$ (generalised to complex inputs). The simplicity of the perceptron architecture allows us to provide clear examples of the shortcomings of current QNN models, and the many barriers they face to becoming useful general-purpose learning algorithms. For example, a QNN with amplitude encoding cannot express the Boolean parity function for $n\geq 3$, which is but one of an exponential number of data structures that such a QNN is unable to express. Mapping a QNN to a classical perceptron simplifies training, allowing us to systematically study the inductive biases of other, more expressive embeddings on Boolean data. Several popular embeddings primarily produce an inductive bias towards functions with low class balance, reducing their generalisation performance compared to deep neural network architectures which exhibit much richer inductive biases. We explore two alternate strategies that move beyond standard QNNs. In the first, we use a QNN to help generate a classical DNN-inspired kernel. In the second we draw an analogy to the hierarchical structure of deep neural networks and construct a layered non-linear QNN that is provably fully expressive on Boolean data, while also exhibiting a richer inductive bias than simple QNNs. Finally, we discuss characteristics of the QNN literature that may obscure how hard it is to achieve quantum advantage over deep learning algorithms on classical data.
△ Less
Submitted 5 July, 2024;
originally announced July 2024.
-
Improved linearly ordered colorings of hypergraphs via SDP rounding
Authors:
Anand Louis,
Alantha Newman,
Arka Ray
Abstract:
We consider the problem of linearly ordered (LO) coloring of hypergraphs. A hypergraph has an LO coloring if there is a vertex coloring, using a set of ordered colors, so that (i) no edge is monochromatic, and (ii) each edge has a unique maximum color. It is an open question as to whether or not a 2-LO colorable 3-uniform hypergraph can be LO colored with 3 colors in polynomial time. Nakajima and…
▽ More
We consider the problem of linearly ordered (LO) coloring of hypergraphs. A hypergraph has an LO coloring if there is a vertex coloring, using a set of ordered colors, so that (i) no edge is monochromatic, and (ii) each edge has a unique maximum color. It is an open question as to whether or not a 2-LO colorable 3-uniform hypergraph can be LO colored with 3 colors in polynomial time. Nakajima and Zivný recently gave a polynomial-time algorithm to color such hypergraphs with $\widetilde{O}(n^{1/3})$ colors and asked if SDP methods can be used directly to obtain improved bounds. Our main result is to show how to use SDP-based rounding methods to produce an LO coloring with $\widetilde{O}(n^{1/5})$ colors for such hypergraphs. We first show that we can reduce the problem to cases with highly structured SDP solutions, which we call balanced hypergraphs. Then we show how to apply classic SDP-rounding tools in this case. We believe that the reduction to balanced hypergraphs is novel and could be of independent interest.
△ Less
Submitted 1 May, 2024;
originally announced May 2024.
-
An exactly solvable model for emergence and scaling laws
Authors:
Yoonsoo Nam,
Nayara Fonseca,
Seok Hyeong Lee,
Chris Mingard,
Ard A. Louis
Abstract:
Deep learning models can exhibit what appears to be a sudden ability to solve a new problem as training time, training data, or model size increases, a phenomenon known as emergence. In this paper, we present a framework where each new ability (a skill) is represented as a basis function. We solve a simple multi-linear model in this skill-basis, finding analytic expressions for the emergence of ne…
▽ More
Deep learning models can exhibit what appears to be a sudden ability to solve a new problem as training time, training data, or model size increases, a phenomenon known as emergence. In this paper, we present a framework where each new ability (a skill) is represented as a basis function. We solve a simple multi-linear model in this skill-basis, finding analytic expressions for the emergence of new skills, as well as for scaling laws of the loss with training time, data size, model size, and optimal compute ($C$). We compare our detailed calculations to direct simulations of a two-layer neural network trained on multitask sparse parity, where the tasks in the dataset are distributed according to a power-law. Our simple model captures, using a single fit parameter, the sigmoidal emergence of multiple new skills as training time, data size or model size increases in the neural network.
△ Less
Submitted 14 July, 2024; v1 submitted 26 April, 2024;
originally announced April 2024.
-
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
Authors:
Gemini Team,
Petko Georgiev,
Ving Ian Lei,
Ryan Burnell,
Libin Bai,
Anmol Gulati,
Garrett Tanzer,
Damien Vincent,
Zhufeng Pan,
Shibo Wang,
Soroosh Mariooryad,
Yifan Ding,
Xinyang Geng,
Fred Alcober,
Roy Frostig,
Mark Omernick,
Lexi Walker,
Cosmin Paduraru,
Christina Sorokin,
Andrea Tacchetti,
Colin Gaffney,
Samira Daruki,
Olcan Sercinoglu,
Zach Gleicher,
Juliette Love
, et al. (1092 additional authors not shown)
Abstract:
In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…
▽ More
In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k). Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content.
△ Less
Submitted 14 June, 2024; v1 submitted 8 March, 2024;
originally announced March 2024.
-
ColBERT-XM: A Modular Multi-Vector Representation Model for Zero-Shot Multilingual Information Retrieval
Authors:
Antoine Louis,
Vageesh Saxena,
Gijs van Dijck,
Gerasimos Spanakis
Abstract:
State-of-the-art neural retrievers predominantly focus on high-resource languages like English, which impedes their adoption in retrieval scenarios involving other languages. Current approaches circumvent the lack of high-quality labeled data in non-English languages by leveraging multilingual pretrained language models capable of cross-lingual transfer. However, these models require substantial t…
▽ More
State-of-the-art neural retrievers predominantly focus on high-resource languages like English, which impedes their adoption in retrieval scenarios involving other languages. Current approaches circumvent the lack of high-quality labeled data in non-English languages by leveraging multilingual pretrained language models capable of cross-lingual transfer. However, these models require substantial task-specific fine-tuning across multiple languages, often perform poorly in languages with minimal representation in the pretraining corpus, and struggle to incorporate new languages after the pretraining phase. In this work, we present a novel modular dense retrieval model that learns from the rich data of a single high-resource language and effectively zero-shot transfers to a wide array of languages, thereby eliminating the need for language-specific labeled data. Our model, ColBERT-XM, demonstrates competitive performance against existing state-of-the-art multilingual retrievers trained on more extensive datasets in various languages. Further analysis reveals that our modular approach is highly data-efficient, effectively adapts to out-of-distribution data, and significantly reduces energy consumption and carbon emissions. By demonstrating its proficiency in zero-shot scenarios, ColBERT-XM marks a shift towards more sustainable and inclusive retrieval systems, enabling effective information accessibility in numerous languages. We publicly release our code and models for the community.
△ Less
Submitted 22 February, 2024;
originally announced February 2024.
-
A synthetic data approach for domain generalization of NLI models
Authors:
Mohammad Javad Hosseini,
Andrey Petrov,
Alex Fabrikant,
Annie Louis
Abstract:
Natural Language Inference (NLI) remains an important benchmark task for LLMs. NLI datasets are a springboard for transfer learning to other semantic tasks, and NLI models are standard tools for identifying the faithfulness of model-generated text. There are several large scale NLI datasets today, and models have improved greatly by hill-climbing on these collections. Yet their realistic performan…
▽ More
Natural Language Inference (NLI) remains an important benchmark task for LLMs. NLI datasets are a springboard for transfer learning to other semantic tasks, and NLI models are standard tools for identifying the faithfulness of model-generated text. There are several large scale NLI datasets today, and models have improved greatly by hill-climbing on these collections. Yet their realistic performance on out-of-distribution/domain data is less well-understood. We explore the opportunity for synthetic high-quality datasets to adapt NLI models for zero-shot use in downstream applications across new and unseen text domains. We demonstrate a new approach for generating NLI data in diverse domains and lengths, so far not covered by existing training sets. The resulting examples have meaningful premises, the hypotheses are formed in creative ways rather than simple edits to a few premise tokens, and the labels have high accuracy. We show that models trained on this data ($685$K synthetic examples) have the best generalization to completely new downstream test settings. On the TRUE benchmark, a T5-small model trained with our data improves around $7\%$ on average compared to training on the best alternative dataset. The improvements are more pronounced for smaller models, while still meaningful on a T5 XXL model. We also demonstrate gains on test sets when in-domain training data is augmented with our domain-general synthetic data.
△ Less
Submitted 28 June, 2024; v1 submitted 19 February, 2024;
originally announced February 2024.
-
Gemini: A Family of Highly Capable Multimodal Models
Authors:
Gemini Team,
Rohan Anil,
Sebastian Borgeaud,
Jean-Baptiste Alayrac,
Jiahui Yu,
Radu Soricut,
Johan Schalkwyk,
Andrew M. Dai,
Anja Hauth,
Katie Millican,
David Silver,
Melvin Johnson,
Ioannis Antonoglou,
Julian Schrittwieser,
Amelia Glaese,
Jilin Chen,
Emily Pitler,
Timothy Lillicrap,
Angeliki Lazaridou,
Orhan Firat,
James Molloy,
Michael Isard,
Paul R. Barham,
Tom Hennigan,
Benjamin Lee
, et al. (1325 additional authors not shown)
Abstract:
This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…
▽ More
This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.
△ Less
Submitted 17 June, 2024; v1 submitted 18 December, 2023;
originally announced December 2023.
-
New Approximation Bounds for Small-Set Vertex Expansion
Authors:
Suprovat Ghoshal,
Anand Louis
Abstract:
The vertex expansion of the graph is a fundamental graph parameter. Given a graph $G=(V,E)$ and a parameter $δ\in (0,1/2]$, its $δ$-Small-Set Vertex Expansion (SSVE) is defined as \[ \min_{S : |S| = δ|V|} \frac{|{\partial^V(S)}|}{ \min \{ |S|, |S^c| \} } \] where $\partial^V(S)$ is the vertex boundary of a set $S$. The SSVE~problem, in addition to being of independent interest as a natural graph p…
▽ More
The vertex expansion of the graph is a fundamental graph parameter. Given a graph $G=(V,E)$ and a parameter $δ\in (0,1/2]$, its $δ$-Small-Set Vertex Expansion (SSVE) is defined as \[ \min_{S : |S| = δ|V|} \frac{|{\partial^V(S)}|}{ \min \{ |S|, |S^c| \} } \] where $\partial^V(S)$ is the vertex boundary of a set $S$. The SSVE~problem, in addition to being of independent interest as a natural graph partitioning problem, is also of interest due to its connections to the Strong Unique Games problem. We give a randomized algorithm running in time $n^{{\sf poly}(1/δ)}$, which outputs a set $S$ of size $Θ(δn)$, having vertex expansion at most \[ \max\left(O(\sqrt{φ^* \log d \log (1/δ)}) , \tilde{O}(d\log^2(1/δ)) \cdot φ^* \right), \] where $d$ is the largest vertex degree of the graph, and $φ^*$ is the optimal $δ$-SSVE. The previous best-known guarantees for this were the bi-criteria bounds of $\tilde{O}(1/δ)\sqrt{φ^* \log d}$ and $\tilde{O}(1/δ)φ^* \sqrt{\log n}$ due to Louis-Makarychev [TOC'16].
Our algorithm uses the basic SDP relaxation of the problem augmented with ${\rm poly}(1/δ)$ rounds of the Lasserre/SoS hierarchy. Our rounding algorithm is a combination of the rounding algorithms of Raghavendra-Tan [SODA'12] and Austrin-Benabbas-Georgiou [SODA'13]. A key component of our analysis is novel Gaussian rounding lemma for hyperedges which might be of independent interest.
△ Less
Submitted 28 November, 2023;
originally announced November 2023.
-
Interpretable Long-Form Legal Question Answering with Retrieval-Augmented Large Language Models
Authors:
Antoine Louis,
Gijs van Dijck,
Gerasimos Spanakis
Abstract:
Many individuals are likely to face a legal dispute at some point in their lives, but their lack of understanding of how to navigate these complex issues often renders them vulnerable. The advancement of natural language processing opens new avenues for bridging this legal literacy gap through the development of automated legal aid systems. However, existing legal question answering (LQA) approach…
▽ More
Many individuals are likely to face a legal dispute at some point in their lives, but their lack of understanding of how to navigate these complex issues often renders them vulnerable. The advancement of natural language processing opens new avenues for bridging this legal literacy gap through the development of automated legal aid systems. However, existing legal question answering (LQA) approaches often suffer from a narrow scope, being either confined to specific legal domains or limited to brief, uninformative responses. In this work, we propose an end-to-end methodology designed to generate long-form answers to any statutory law questions, utilizing a "retrieve-then-read" pipeline. To support this approach, we introduce and release the Long-form Legal Question Answering (LLeQA) dataset, comprising 1,868 expert-annotated legal questions in the French language, complete with detailed answers rooted in pertinent legal provisions. Our experimental results demonstrate promising performance on automatic evaluation metrics, but a qualitative analysis uncovers areas for refinement. As one of the only comprehensive, expert-annotated long-form LQA dataset, LLeQA has the potential to not only accelerate research towards resolving a significant real-world issue, but also act as a rigorous benchmark for evaluating NLP models in specialized domains. We publicly release our code, data, and models.
△ Less
Submitted 29 September, 2023;
originally announced September 2023.
-
Optimizing Group-Fair Plackett-Luce Ranking Models for Relevance and Ex-Post Fairness
Authors:
Sruthi Gorantla,
Eshaan Bhansali,
Amit Deshpande,
Anand Louis
Abstract:
In learning-to-rank (LTR), optimizing only the relevance (or the expected ranking utility) can cause representational harm to certain categories of items. Moreover, if there is implicit bias in the relevance scores, LTR models may fail to optimize for true relevance. Previous works have proposed efficient algorithms to train stochastic ranking models that achieve fairness of exposure to the groups…
▽ More
In learning-to-rank (LTR), optimizing only the relevance (or the expected ranking utility) can cause representational harm to certain categories of items. Moreover, if there is implicit bias in the relevance scores, LTR models may fail to optimize for true relevance. Previous works have proposed efficient algorithms to train stochastic ranking models that achieve fairness of exposure to the groups ex-ante (or, in expectation), which may not guarantee representation fairness to the groups ex-post, that is, after realizing a ranking from the stochastic ranking model. Typically, ex-post fairness is achieved by post-processing, but previous work does not train stochastic ranking models that are aware of this post-processing.
In this paper, we propose a novel objective that maximizes expected relevance only over those rankings that satisfy given representation constraints to ensure ex-post fairness. Building upon recent work on an efficient sampler for ex-post group-fair rankings, we propose a group-fair Plackett-Luce model and show that it can be efficiently optimized for our objective in the LTR framework.
Experiments on three real-world datasets show that our group-fair algorithm guarantees fairness alongside usually having better relevance compared to the LTR baselines. In addition, our algorithm also achieves better relevance than post-processing baselines, which also ensures ex-post fairness. Further, when implicit bias is injected into the training data, our algorithm typically outperforms existing LTR baselines in relevance.
△ Less
Submitted 25 August, 2023;
originally announced August 2023.
-
Sampling Individually-Fair Rankings that are Always Group Fair
Authors:
Sruthi Gorantla,
Anay Mehrotra,
Amit Deshpande,
Anand Louis
Abstract:
Rankings on online platforms help their end-users find the relevant information -- people, news, media, and products -- quickly. Fair ranking tasks, which ask to rank a set of items to maximize utility subject to satisfying group-fairness constraints, have gained significant interest in the Algorithmic Fairness, Information Retrieval, and Machine Learning literature. Recent works, however, identif…
▽ More
Rankings on online platforms help their end-users find the relevant information -- people, news, media, and products -- quickly. Fair ranking tasks, which ask to rank a set of items to maximize utility subject to satisfying group-fairness constraints, have gained significant interest in the Algorithmic Fairness, Information Retrieval, and Machine Learning literature. Recent works, however, identify uncertainty in the utilities of items as a primary cause of unfairness and propose introducing randomness in the output. This randomness is carefully chosen to guarantee an adequate representation of each item (while accounting for the uncertainty). However, due to this randomness, the output rankings may violate group fairness constraints. We give an efficient algorithm that samples rankings from an individually-fair distribution while ensuring that every output ranking is group fair. The expected utility of the output ranking is at least $α$ times the utility of the optimal fair solution. Here, $α$ depends on the utilities, position-discounts, and constraints -- it approaches 1 as the range of utilities or the position-discounts shrinks, or when utilities satisfy distributional assumptions. Empirically, we observe that our algorithm achieves individual and group fairness and that Pareto dominates the state-of-the-art baselines.
△ Less
Submitted 20 June, 2023;
originally announced June 2023.
-
LAIT: Efficient Multi-Segment Encoding in Transformers with Layer-Adjustable Interaction
Authors:
Jeremiah Milbauer,
Annie Louis,
Mohammad Javad Hosseini,
Alex Fabrikant,
Donald Metzler,
Tal Schuster
Abstract:
Transformer encoders contextualize token representations by attending to all other tokens at each layer, leading to quadratic increase in compute effort with the input length. In practice, however, the input text of many NLP tasks can be seen as a sequence of related segments (e.g., the sequence of sentences within a passage, or the hypothesis and premise in NLI). While attending across these segm…
▽ More
Transformer encoders contextualize token representations by attending to all other tokens at each layer, leading to quadratic increase in compute effort with the input length. In practice, however, the input text of many NLP tasks can be seen as a sequence of related segments (e.g., the sequence of sentences within a passage, or the hypothesis and premise in NLI). While attending across these segments is highly beneficial for many tasks, we hypothesize that this interaction can be delayed until later encoding stages.
To this end, we introduce Layer-Adjustable Interactions in Transformers (LAIT). Within LAIT, segmented inputs are first encoded independently, and then jointly. This partial two-tower architecture bridges the gap between a Dual Encoder's ability to pre-compute representations for segments and a fully self-attentive Transformer's capacity to model cross-segment attention. The LAIT framework effectively leverages existing pretrained Transformers and converts them into the hybrid of the two aforementioned architectures, allowing for easy and intuitive control over the performance-efficiency tradeoff. Experimenting on a wide range of NLP tasks, we find LAIT able to reduce 30-50% of the attention FLOPs on many tasks, while preserving high accuracy; in some practical settings, LAIT could reduce actual latency by orders of magnitude.
△ Less
Submitted 31 May, 2023;
originally announced May 2023.
-
Text-Blueprint: An Interactive Platform for Plan-based Conditional Generation
Authors:
Fantine Huot,
Joshua Maynez,
Shashi Narayan,
Reinald Kim Amplayo,
Kuzman Ganchev,
Annie Louis,
Anders Sandholm,
Dipanjan Das,
Mirella Lapata
Abstract:
While conditional generation models can now generate natural language well enough to create fluent text, it is still difficult to control the generation process, leading to irrelevant, repetitive, and hallucinated content. Recent work shows that planning can be a useful intermediate step to render conditional generation less opaque and more grounded. We present a web browser-based demonstration fo…
▽ More
While conditional generation models can now generate natural language well enough to create fluent text, it is still difficult to control the generation process, leading to irrelevant, repetitive, and hallucinated content. Recent work shows that planning can be a useful intermediate step to render conditional generation less opaque and more grounded. We present a web browser-based demonstration for query-focused summarization that uses a sequence of question-answer pairs, as a blueprint plan for guiding text generation (i.e., what to say and in what order). We illustrate how users may interact with the generated text and associated plan visualizations, e.g., by editing and modifying the blueprint in order to improve or control the generated output.
A short video demonstrating our system is available at https://goo.gle/text-blueprint-demo.
△ Less
Submitted 28 April, 2023;
originally announced May 2023.
-
Do deep neural networks have an inbuilt Occam's razor?
Authors:
Chris Mingard,
Henry Rees,
Guillermo Valle-Pérez,
Ard A. Louis
Abstract:
The remarkable performance of overparameterized deep neural networks (DNNs) must arise from an interplay between network architecture, training algorithms, and structure in the data. To disentangle these three components, we apply a Bayesian picture, based on the functions expressed by a DNN, to supervised learning. The prior over functions is determined by the network, and is varied by exploiting…
▽ More
The remarkable performance of overparameterized deep neural networks (DNNs) must arise from an interplay between network architecture, training algorithms, and structure in the data. To disentangle these three components, we apply a Bayesian picture, based on the functions expressed by a DNN, to supervised learning. The prior over functions is determined by the network, and is varied by exploiting a transition between ordered and chaotic regimes. For Boolean function classification, we approximate the likelihood using the error spectrum of functions on data. When combined with the prior, this accurately predicts the posterior, measured for DNNs trained with stochastic gradient descent. This analysis reveals that structured data, combined with an intrinsic Occam's razor-like inductive bias towards (Kolmogorov) simple functions that is strong enough to counteract the exponential growth of the number of functions with complexity, is a key to the success of DNNs.
△ Less
Submitted 13 April, 2023;
originally announced April 2023.
-
Finding the Law: Enhancing Statutory Article Retrieval via Graph Neural Networks
Authors:
Antoine Louis,
Gijs van Dijck,
Gerasimos Spanakis
Abstract:
Statutory article retrieval (SAR), the task of retrieving statute law articles relevant to a legal question, is a promising application of legal text processing. In particular, high-quality SAR systems can improve the work efficiency of legal professionals and provide basic legal assistance to citizens in need at no cost. Unlike traditional ad-hoc information retrieval, where each document is cons…
▽ More
Statutory article retrieval (SAR), the task of retrieving statute law articles relevant to a legal question, is a promising application of legal text processing. In particular, high-quality SAR systems can improve the work efficiency of legal professionals and provide basic legal assistance to citizens in need at no cost. Unlike traditional ad-hoc information retrieval, where each document is considered a complete source of information, SAR deals with texts whose full sense depends on complementary information from the topological organization of statute law. While existing works ignore these domain-specific dependencies, we propose a novel graph-augmented dense statute retriever (G-DSR) model that incorporates the structure of legislation via a graph neural network to improve dense retrieval performance. Experimental results show that our approach outperforms strong retrieval baselines on a real-world expert-annotated SAR dataset.
△ Less
Submitted 30 January, 2023;
originally announced January 2023.
-
Sparse Cuts in Hypergraphs from Random Walks on Simplicial Complexes
Authors:
Anand Louis,
Rameesh Paul,
Arka Ray
Abstract:
There are a lot of recent works on generalizing the spectral theory of graphs and graph partitioning to hypergraphs. There have been two broad directions toward this goal. One generalizes the notion of graph conductance to hypergraph conductance [LM16, CLTZ18]. In the second approach one can view a hypergraph as a simplicial complex and study its various topological properties [LM06, MW09, DKW16,…
▽ More
There are a lot of recent works on generalizing the spectral theory of graphs and graph partitioning to hypergraphs. There have been two broad directions toward this goal. One generalizes the notion of graph conductance to hypergraph conductance [LM16, CLTZ18]. In the second approach one can view a hypergraph as a simplicial complex and study its various topological properties [LM06, MW09, DKW16, PR17] and spectral properties [KM17, DK17, KO18a, KO18b, Opp20].
In this work, we attempt to bridge these two directions of study by relating the spectrum of {\em up-down walks} and {\em swap-walks} on the simplicial complex to hypergraph expansion. In surprising contrast to random-walks on graphs, we show that the spectral gap of swap-walks and up-down walks between level $m$ and $l$ with $1 < m \leq l$ can not be used to infer any bounds on hypergraph conductance. Moreover, we show that the spectral gap of swap-walks between $X(1)$ and $X(k-1)$ can not be used to infer any bounds on hypergraph conductance, whereas we give a Cheeger-like inequality relating the spectral of walks between level $1$ and $l$ for any $l \leq k$ to hypergraph expansion. This is a surprising difference between swaps-walks and up-down walks!
Finally, we also give a construction to show that the well-studied notion of {\em link expansion} in simplicial complexes can not be used to bound hypergraph expansion in a Cheeger-like manner.
△ Less
Submitted 3 October, 2023; v1 submitted 27 December, 2022;
originally announced December 2022.
-
Resolving Indirect Referring Expressions for Entity Selection
Authors:
Mohammad Javad Hosseini,
Filip Radlinski,
Silvia Pareti,
Annie Louis
Abstract:
Recent advances in language modeling have enabled new conversational systems. In particular, it is often desirable for people to make choices among specified options when using such systems. We address this problem of reference resolution, when people use natural expressions to choose between the entities. For example, given the choice `Should we make a Simnel cake or a Pandan cake?' a natural res…
▽ More
Recent advances in language modeling have enabled new conversational systems. In particular, it is often desirable for people to make choices among specified options when using such systems. We address this problem of reference resolution, when people use natural expressions to choose between the entities. For example, given the choice `Should we make a Simnel cake or a Pandan cake?' a natural response from a dialog participant may be indirect: `let's make the green one'. Such natural expressions have been little studied for reference resolution. We argue that robustly understanding such language has large potential for improving naturalness in dialog, recommendation, and search systems. We create AltEntities (Alternative Entities), a new public dataset of 42K entity pairs and expressions (referring to one entity in the pair), and develop models for the disambiguation problem. Consisting of indirect referring expressions across three domains, our corpus enables for the first time the study of how language models can be adapted to this task. We find they achieve 82%-87% accuracy in realistic settings, which while reasonable also invites further advances.
△ Less
Submitted 26 May, 2023; v1 submitted 21 December, 2022;
originally announced December 2022.
-
OpineSum: Entailment-based self-training for abstractive opinion summarization
Authors:
Annie Louis,
Joshua Maynez
Abstract:
A typical product or place often has hundreds of reviews, and summarization of these texts is an important and challenging problem. Recent progress on abstractive summarization in domains such as news has been driven by supervised systems trained on hundreds of thousands of news articles paired with human-written summaries. However for opinion texts, such large scale datasets are rarely available.…
▽ More
A typical product or place often has hundreds of reviews, and summarization of these texts is an important and challenging problem. Recent progress on abstractive summarization in domains such as news has been driven by supervised systems trained on hundreds of thousands of news articles paired with human-written summaries. However for opinion texts, such large scale datasets are rarely available. Unsupervised methods, self-training, and few-shot learning approaches bridge that gap. In this work, we present a novel self-training approach, OpineSum, for abstractive opinion summarization. The summaries in this approach are built using a novel application of textual entailment and capture the consensus of opinions across the various reviews for an item. This method can be used to obtain silver-standard summaries on a large scale and train both unsupervised and few-shot abstractive summarization systems. OpineSum achieves state-of-the-art performance in both settings.
△ Less
Submitted 21 December, 2022;
originally announced December 2022.
-
Little Red Riding Hood Goes Around the Globe:Crosslingual Story Planning and Generation with Large Language Models
Authors:
Evgeniia Razumovskaia,
Joshua Maynez,
Annie Louis,
Mirella Lapata,
Shashi Narayan
Abstract:
Previous work has demonstrated the effectiveness of planning for story generation exclusively in a monolingual setting focusing primarily on English. We consider whether planning brings advantages to automatic story generation across languages. We propose a new task of cross-lingual story generation with planning and present a new dataset for this task. We conduct a comprehensive study of differen…
▽ More
Previous work has demonstrated the effectiveness of planning for story generation exclusively in a monolingual setting focusing primarily on English. We consider whether planning brings advantages to automatic story generation across languages. We propose a new task of cross-lingual story generation with planning and present a new dataset for this task. We conduct a comprehensive study of different plans and generate stories in several languages, by leveraging the creative and reasoning capabilities of large pre-trained language models. Our results demonstrate that plans which structure stories into three acts lead to more coherent and interesting narratives, while allowing to explicitly control their content and structure.
△ Less
Submitted 25 March, 2024; v1 submitted 20 December, 2022;
originally announced December 2022.
-
Online Algorithms for Matchings with Proportional Fairness Constraints and Diversity Constraints
Authors:
Anand Louis,
Meghana Nasre,
Prajakta Nimbhorkar,
Govind S. Sankar
Abstract:
Matching problems with group-fairness constraints and diversity constraints have numerous applications such as in allocation problems, committee selection, school choice, etc. Moreover, online matching problems have lots of applications in ad allocations and other e-commerce problems like product recommendation in digital marketing.
We study two problems involving assigning {\em items} to {\em p…
▽ More
Matching problems with group-fairness constraints and diversity constraints have numerous applications such as in allocation problems, committee selection, school choice, etc. Moreover, online matching problems have lots of applications in ad allocations and other e-commerce problems like product recommendation in digital marketing.
We study two problems involving assigning {\em items} to {\em platforms}, where items belong to various {\em groups} depending on their attributes; the set of items are available offline and the platforms arrive online. In the first problem, we study online matchings with {\em proportional fairness constraints}. Here, each platform on arrival should either be assigned a set of items in which the fraction of items from each group is within specified bounds or be assigned no items; the goal is to assign items to platforms in order to maximize the number of items assigned to platforms.
In the second problem, we study online matchings with {\em diversity constraints}, i.e. for each platform, absolute lower bounds are specified for each group. Each platform on arrival should either be assigned a set of items that satisfy these bounds or be assigned no items; the goal is to maximize the set of platforms that get matched. We study approximation algorithms and hardness results for these problems. The technical core of our proofs is a new connection between these problems and the problem of matchings in hypergraphs.
Our experimental evaluation shows the performance of our algorithms on real-world and synthetic datasets exceeds our theoretical guarantees.
△ Less
Submitted 3 August, 2023; v1 submitted 24 August, 2022;
originally announced August 2022.
-
Socially Fair Center-based and Linear Subspace Clustering
Authors:
Sruthi Gorantla,
Kishen N. Gowda,
Amit Deshpande,
Anand Louis
Abstract:
Center-based clustering (e.g., $k$-means, $k$-medians) and clustering using linear subspaces are two most popular techniques to partition real-world data into smaller clusters. However, when the data consists of sensitive demographic groups, significantly different clustering cost per point for different sensitive groups can lead to fairness-related harms (e.g., different quality-of-service). The…
▽ More
Center-based clustering (e.g., $k$-means, $k$-medians) and clustering using linear subspaces are two most popular techniques to partition real-world data into smaller clusters. However, when the data consists of sensitive demographic groups, significantly different clustering cost per point for different sensitive groups can lead to fairness-related harms (e.g., different quality-of-service). The goal of socially fair clustering is to minimize the maximum cost of clustering per point over all groups. In this work, we propose a unified framework to solve socially fair center-based clustering and linear subspace clustering, and give practical, efficient approximation algorithms for these problems. We do extensive experiments to show that on multiple benchmark datasets our algorithms either closely match or outperform state-of-the-art baselines.
△ Less
Submitted 22 August, 2022;
originally announced August 2022.
-
Individual Fairness under Varied Notions of Group Fairness in Bipartite Matching - One Framework to Approximate Them All
Authors:
Atasi Panda,
Anand Louis,
Prajakta Nimbhorkar
Abstract:
We study the probabilistic assignment of items to platforms that satisfies both group and individual fairness constraints. Each item belongs to specific groups and has a preference ordering over platforms. Each platform enforces group fairness by limiting the number of items per group that can be assigned to it. There could be multiple optimal solutions that satisfy the group fairness constraints,…
▽ More
We study the probabilistic assignment of items to platforms that satisfies both group and individual fairness constraints. Each item belongs to specific groups and has a preference ordering over platforms. Each platform enforces group fairness by limiting the number of items per group that can be assigned to it. There could be multiple optimal solutions that satisfy the group fairness constraints, but this alone ignores item preferences. Our approach explores a `best of both worlds fairness' solution to get a randomized matching, which is ex-ante individually fair and ex-post group-fair. Thus, we seek a `probabilistic individually fair' distribution over `group-fair' matchings where each item has a `high' probability of matching to one of its top choices. This distribution is also ex-ante group-fair. Users can customize fairness constraints to suit their requirements. Our first result is a polynomial-time algorithm that computes a distribution over `group-fair' matchings such that the individual fairness constraints are approximately satisfied and the expected size of a matching is close to OPT. We empirically test this on real-world datasets. We present two additional polynomial-time bi-criteria approximation algorithms that users can choose from to balance group fairness and individual fairness trade-offs.
For disjoint groups, we provide an exact polynomial-time algorithm adaptable to additional lower `group fairness' bounds. Extending our model, we encompass `maxmin group fairness,' amplifying underrepresented groups, and `mindom group fairness,' reducing the representation of dominant groups.'
△ Less
Submitted 10 May, 2024; v1 submitted 21 August, 2022;
originally announced August 2022.
-
Conditional Generation with a Question-Answering Blueprint
Authors:
Shashi Narayan,
Joshua Maynez,
Reinald Kim Amplayo,
Kuzman Ganchev,
Annie Louis,
Fantine Huot,
Anders Sandholm,
Dipanjan Das,
Mirella Lapata
Abstract:
The ability to convey relevant and faithful information is critical for many tasks in conditional generation and yet remains elusive for neural seq-to-seq models whose outputs often reveal hallucinations and fail to correctly cover important details. In this work, we advocate planning as a useful intermediate representation for rendering conditional generation less opaque and more grounded. Our wo…
▽ More
The ability to convey relevant and faithful information is critical for many tasks in conditional generation and yet remains elusive for neural seq-to-seq models whose outputs often reveal hallucinations and fail to correctly cover important details. In this work, we advocate planning as a useful intermediate representation for rendering conditional generation less opaque and more grounded. Our work proposes a new conceptualization of text plans as a sequence of question-answer (QA) pairs. We enhance existing datasets (e.g., for summarization) with a QA blueprint operating as a proxy for both content selection (i.e.,~what to say) and planning (i.e.,~in what order). We obtain blueprints automatically by exploiting state-of-the-art question generation technology and convert input-output pairs into input-blueprint-output tuples. We develop Transformer-based models, each varying in how they incorporate the blueprint in the generated output (e.g., as a global plan or iteratively). Evaluation across metrics and datasets demonstrates that blueprint models are more factual than alternatives which do not resort to planning and allow tighter control of the generation output.
△ Less
Submitted 1 May, 2023; v1 submitted 1 July, 2022;
originally announced July 2022.
-
Approximating CSPs with Outliers
Authors:
Suprovat Ghoshal,
Anand Louis
Abstract:
Constraint satisfaction problems (CSPs) are ubiquitous in theoretical computer science. We study the problem of StrongCSPs, i.e. instances where a large induced sub-instance has a satisfying assignment. More formally, given a CSP instance $Ψ(V, E, [k], \{Π_{ij}\}_{(i,j) \in E})$ consisting of a set of vertices $V$, a set of edges $E$, alphabet $[k]$, a constraint $Π_{ij} \subset [k] \times [k]$ fo…
▽ More
Constraint satisfaction problems (CSPs) are ubiquitous in theoretical computer science. We study the problem of StrongCSPs, i.e. instances where a large induced sub-instance has a satisfying assignment. More formally, given a CSP instance $Ψ(V, E, [k], \{Π_{ij}\}_{(i,j) \in E})$ consisting of a set of vertices $V$, a set of edges $E$, alphabet $[k]$, a constraint $Π_{ij} \subset [k] \times [k]$ for each $(i,j) \in E$, the goal of this problem is to compute the largest subset $S \subseteq V$ such that the instance induced on $S$ has an assignment that satisfies all the constraints.
In this paper, we study approximation algorithms for Unique Games and related problems under the StrongCSP framework when the underlying constraint graph satisfies mild expansion properties. In particular, we show that given a Strong Unique Games instance whose optimal solution $S^*$ is supported on a regular low threshold rank graph, there exists an algorithm that runs in time exponential in the threshold rank, and recovers a large satisfiable sub-instance whose size is independent on the label set size and maximum degree of the graph. Our algorithm combines the techniques of Barak-Raghavendra-Steurer (FOCS'11), Guruswami-Sinop (FOCS'11) with several new ideas and runs in time exponential in the threshold rank of the optimal set. A key component of our algorithm is a new threshold rank based spectral decomposition, which is used to compute a "large" induced subgraph of "small" threshold rank; our techniques build on the work of Oveis Gharan and Rezaei (SODA'17) and could be of independent interest.
△ Less
Submitted 23 May, 2022;
originally announced May 2022.
-
Exact recovery algorithm for Planted Bipartite Graph in Semi-random Graphs
Authors:
Akash Kumar,
Anand Louis,
Rameesh Paul
Abstract:
The problem of finding the largest induced balanced bipartite subgraph in a given graph is NP-hard. This problem is closely related to the problem of finding the smallest Odd Cycle Transversal.
In this work, we consider the following model of instances: starting with a set of vertices $V$, a set $S \subseteq V$ of $k$ vertices is chosen and an arbitrary $d$-regular bipartite graph is added on it…
▽ More
The problem of finding the largest induced balanced bipartite subgraph in a given graph is NP-hard. This problem is closely related to the problem of finding the smallest Odd Cycle Transversal.
In this work, we consider the following model of instances: starting with a set of vertices $V$, a set $S \subseteq V$ of $k$ vertices is chosen and an arbitrary $d$-regular bipartite graph is added on it; edges between pairs of vertices in $S \times (V \setminus S)$ and $(V \setminus S) \times (V \setminus S)$ are added with probability $p$. Since for $d=0$, the problem reduces to recovering a planted independent set, we don't expect efficient algorithms for $k=o(\sqrt{n})$. This problem is a generalization of the planted balanced biclique problem where the bipartite graph induced on $S$ is a complete bipartite graph; [Lev18] gave an algorithm for recovering $S$ in this problem when $k=Ω(\sqrt{n})$.
Our main result is an efficient algorithm that recovers (w.h.p.) the planted bipartite graph when $k=Ω_p(\sqrt{n \log n})$ for a large range of parameters. Our results also hold for a natural semi-random model of instances, which involve the presence of a monotone adversary. Our proof shows that a natural SDP relaxation for the problem is integral by constructing an appropriate solution to it's dual formulation. Our main technical contribution is a new approach for constructing the dual solution where we calibrate the eigenvectors of the adjacency matrix to be the eigenvectors of the dual matrix. We believe that this approach may have applications to other recovery problems in semi-random models as well.
When $k=Ω(\sqrt{n})$, we give an algorithm for recovering $S$ whose running time is exponential in the number of small eigenvalues in graph induced on $S$; this algorithm is based on subspace enumeration techniques due to the works of [KT07,ABS10,Kol11].
△ Less
Submitted 12 May, 2022; v1 submitted 7 May, 2022;
originally announced May 2022.
-
Sampling Ex-Post Group-Fair Rankings
Authors:
Sruthi Gorantla,
Amit Deshpande,
Anand Louis
Abstract:
Randomized rankings have been of recent interest to achieve ex-ante fairer exposure and better robustness than deterministic rankings. We propose a set of natural axioms for randomized group-fair rankings and prove that there exists a unique distribution $D$ that satisfies our axioms and is supported only over ex-post group-fair rankings, i.e., rankings that satisfy given lower and upper bounds on…
▽ More
Randomized rankings have been of recent interest to achieve ex-ante fairer exposure and better robustness than deterministic rankings. We propose a set of natural axioms for randomized group-fair rankings and prove that there exists a unique distribution $D$ that satisfies our axioms and is supported only over ex-post group-fair rankings, i.e., rankings that satisfy given lower and upper bounds on group-wise representation in the top-$k$ ranks. Our problem formulation works even when there is implicit bias, incomplete relevance information, or only ordinal ranking is available instead of relevance scores or utility values.
We propose two algorithms to sample a random group-fair ranking from the distribution $D$ mentioned above. Our first dynamic programming-based algorithm samples ex-post group-fair rankings uniformly at random in time $O(k^2\ell)$, where $\ell$ is the number of groups. Our second random walk-based algorithm samples ex-post group-fair rankings from a distribution $δ$-close to $D$ in total variation distance and has expected running time $O^*(k^2\ell^2)$, when there is a sufficient gap between the given upper and lower bounds on the group-wise representation. The former does exact sampling, but the latter runs significantly faster on real-world data sets for larger values of $k$. We give empirical evidence that our algorithms compare favorably against recent baselines for fairness and ranking utility on real-world data sets.
△ Less
Submitted 29 May, 2023; v1 submitted 2 March, 2022;
originally announced March 2022.
-
A Statutory Article Retrieval Dataset in French
Authors:
Antoine Louis,
Gerasimos Spanakis
Abstract:
Statutory article retrieval is the task of automatically retrieving law articles relevant to a legal question. While recent advances in natural language processing have sparked considerable interest in many legal tasks, statutory article retrieval remains primarily untouched due to the scarcity of large-scale and high-quality annotated datasets. To address this bottleneck, we introduce the Belgian…
▽ More
Statutory article retrieval is the task of automatically retrieving law articles relevant to a legal question. While recent advances in natural language processing have sparked considerable interest in many legal tasks, statutory article retrieval remains primarily untouched due to the scarcity of large-scale and high-quality annotated datasets. To address this bottleneck, we introduce the Belgian Statutory Article Retrieval Dataset (BSARD), which consists of 1,100+ French native legal questions labeled by experienced jurists with relevant articles from a corpus of 22,600+ Belgian law articles. Using BSARD, we benchmark several state-of-the-art retrieval approaches, including lexical and dense architectures, both in zero-shot and supervised setups. We find that fine-tuned dense retrieval models significantly outperform other systems. Our best performing baseline achieves 74.8% R@100, which is promising for the feasibility of the task and indicates there is still room for improvement. By the specificity of the domain and addressed task, BSARD presents a unique challenge problem for future research on legal information retrieval. Our dataset and source code are publicly available.
△ Less
Submitted 15 March, 2022; v1 submitted 26 August, 2021;
originally announced August 2021.
-
Matchings with Group Fairness Constraints: Online and Offline Algorithms
Authors:
Govind S. Sankar,
Anand Louis,
Meghana Nasre,
Prajakta Nimbhorkar
Abstract:
We consider the problem of assigning items to platforms in the presence of group fairness constraints. In the input, each item belongs to certain categories, called classes in this paper. Each platform specifies the group fairness constraints through an upper bound on the number of items it can serve from each class. Additionally, each platform also has an upper bound on the total number of items…
▽ More
We consider the problem of assigning items to platforms in the presence of group fairness constraints. In the input, each item belongs to certain categories, called classes in this paper. Each platform specifies the group fairness constraints through an upper bound on the number of items it can serve from each class. Additionally, each platform also has an upper bound on the total number of items it can serve. The goal is to assign items to platforms so as to maximize the number of items assigned while satisfying the upper bounds of each class. In some cases, there is a revenue associated with matching an item to a platform, then the goal is to maximize the revenue generated.
This problem models several important real-world problems like ad-auctions, scheduling, resource allocations, school choice etc.We also show an interesting connection to computing a generalized maximum independent set on hypergraphs and ranking items under group fairness constraints.
We show that if the classes are arbitrary, then the problem is NP-hard and has a strong inapproximability. We consider the problem in both online and offline settings under natural restrictions on the classes. Under these restrictions, the problem continues to remain NP-hard but admits approximation algorithms with small approximation factors. We also implement some of the algorithms. Our experiments show that the algorithms work well in practice both in terms of efficiency and the number of items that get assigned to some platform.
△ Less
Submitted 20 May, 2021;
originally announced May 2021.
-
Independent Sets in Semi-random Hypergraphs
Authors:
Yash Khanna,
Anand Louis,
Rameesh Paul
Abstract:
A set of vertices in a hypergraph is called an independent set if no hyperedge is completely contained inside the set. Given a hypergraph, computing its largest size independent set is an NP-hard problem.
In this work, we study the independent set problem on hypergraphs in a natural semi-random family of instances. Our semi-random model is inspired by the Feige-Kilian model [FK01]. This popular…
▽ More
A set of vertices in a hypergraph is called an independent set if no hyperedge is completely contained inside the set. Given a hypergraph, computing its largest size independent set is an NP-hard problem.
In this work, we study the independent set problem on hypergraphs in a natural semi-random family of instances. Our semi-random model is inspired by the Feige-Kilian model [FK01]. This popular model has also been studied in the works of [FK01, Ste17, MMT20] etc. McKenzie, Mehta, and Trevisan [MMT20] gave algorithms for computing independent sets in such a semi-random family of graphs. The algorithms by McKenzie et al. [MMT20] are based on rounding a "crude-SDP". We generalize their results and techniques to hypergraphs for an analogous family of hypergraph instances. Our algorithms are based on rounding the "crude-SDP" of McKenzie et al. [MMT20], augmented with "Lasserre/SoS like" hierarchy of constraints. Analogous to the results of McKenzie et al. [MMT20], we study the ranges of input parameters where we can recover the planted independent set or a large independent set.
△ Less
Submitted 2 April, 2021;
originally announced April 2021.
-
Why flatness does and does not correlate with generalization for deep neural networks
Authors:
Shuofeng Zhang,
Isaac Reid,
Guillermo Valle Pérez,
Ard Louis
Abstract:
The intuition that local flatness of the loss landscape is correlated with better generalization for deep neural networks (DNNs) has been explored for decades, spawning many different flatness measures. Recently, this link with generalization has been called into question by a demonstration that many measures of flatness are vulnerable to parameter re-scaling which arbitrarily changes their value…
▽ More
The intuition that local flatness of the loss landscape is correlated with better generalization for deep neural networks (DNNs) has been explored for decades, spawning many different flatness measures. Recently, this link with generalization has been called into question by a demonstration that many measures of flatness are vulnerable to parameter re-scaling which arbitrarily changes their value without changing neural network outputs.
Here we show that, in addition, some popular variants of SGD such as Adam and Entropy-SGD, can also break the flatness-generalization correlation. As an alternative to flatness measures, we use a function based picture and propose using the log of Bayesian prior upon initialization, $\log P(f)$, as a predictor of the generalization when a DNN converges on function $f$ after training to zero error. The prior is directly proportional to the Bayesian posterior for functions that give zero error on a test set. For the case of image classification, we show that $\log P(f)$ is a significantly more robust predictor of generalization than flatness measures are.
Whilst local flatness measures fail under parameter re-scaling, the prior/posterior, which is global quantity, remains invariant under re-scaling. Moreover, the correlation with generalization as a function of data complexity remains good for different variants of SGD.
△ Less
Submitted 21 June, 2021; v1 submitted 10 March, 2021;
originally announced March 2021.
-
Double-descent curves in neural networks: a new perspective using Gaussian processes
Authors:
Ouns El Harzli,
Bernardo Cuenca Grau,
Guillermo Valle-Pérez,
Ard A. Louis
Abstract:
Double-descent curves in neural networks describe the phenomenon that the generalisation error initially descends with increasing parameters, then grows after reaching an optimal number of parameters which is less than the number of data points, but then descends again in the overparameterized regime. In this paper, we use techniques from random matrix theory to characterize the spectral distribut…
▽ More
Double-descent curves in neural networks describe the phenomenon that the generalisation error initially descends with increasing parameters, then grows after reaching an optimal number of parameters which is less than the number of data points, but then descends again in the overparameterized regime. In this paper, we use techniques from random matrix theory to characterize the spectral distribution of the empirical feature covariance matrix as a width-dependent perturbation of the spectrum of the neural network Gaussian process (NNGP) kernel, thus establishing a novel connection between the NNGP literature and the random matrix theory literature in the context of neural networks. Our analytical expression allows us to study the generalisation behavior of the corresponding kernel and GP regression, and provides a new interpretation of the double-descent phenomenon, namely as governed by the discrepancy between the width-dependent empirical kernel and the width-independent NNGP kernel.
△ Less
Submitted 25 May, 2023; v1 submitted 14 February, 2021;
originally announced February 2021.
-
Generalization bounds for deep learning
Authors:
Guillermo Valle-Pérez,
Ard A. Louis
Abstract:
Generalization in deep learning has been the topic of much recent theoretical and empirical research. Here we introduce desiderata for techniques that predict generalization errors for deep learning models in supervised learning. Such predictions should 1) scale correctly with data complexity; 2) scale correctly with training set size; 3) capture differences between architectures; 4) capture diffe…
▽ More
Generalization in deep learning has been the topic of much recent theoretical and empirical research. Here we introduce desiderata for techniques that predict generalization errors for deep learning models in supervised learning. Such predictions should 1) scale correctly with data complexity; 2) scale correctly with training set size; 3) capture differences between architectures; 4) capture differences between optimization algorithms; 5) be quantitatively not too far from the true error (in particular, be non-vacuous); 6) be efficiently computable; and 7) be rigorous. We focus on generalization error upper bounds, and introduce a categorisation of bounds depending on assumptions on the algorithm and data. We review a wide range of existing approaches, from classical VC dimension to recent PAC-Bayesian bounds, commenting on how well they perform against the desiderata.
We next use a function-based picture to derive a marginal-likelihood PAC-Bayesian bound. This bound is, by one definition, optimal up to a multiplicative constant in the asymptotic limit of large training sets, as long as the learning curve follows a power law, which is typically found in practice for deep learning problems. Extensive empirical analysis demonstrates that our marginal-likelihood PAC-Bayes bound fulfills desiderata 1-3 and 5. The results for 6 and 7 are promising, but not yet fully conclusive, while only desideratum 4 is currently beyond the scope of our bound. Finally, we comment on why this function-based bound performs significantly better than current parameter-based PAC-Bayes bounds.
△ Less
Submitted 9 December, 2020; v1 submitted 7 December, 2020;
originally announced December 2020.
-
On the Problem of Underranking in Group-Fair Ranking
Authors:
Sruthi Gorantla,
Amit Deshpande,
Anand Louis
Abstract:
Search and recommendation systems, such as search engines, recruiting tools, online marketplaces, news, and social media, output ranked lists of content, products, and sometimes, people. Credit ratings, standardized tests, risk assessments output only a score, but are also used implicitly for ranking. Bias in such ranking systems, especially among the top ranks, can worsen social and economic ineq…
▽ More
Search and recommendation systems, such as search engines, recruiting tools, online marketplaces, news, and social media, output ranked lists of content, products, and sometimes, people. Credit ratings, standardized tests, risk assessments output only a score, but are also used implicitly for ranking. Bias in such ranking systems, especially among the top ranks, can worsen social and economic inequalities, polarize opinions, and reinforce stereotypes. On the other hand, a bias correction for minority groups can cause more harm if perceived as favoring group-fair outcomes over meritocracy. In this paper, we formulate the problem of underranking in group-fair rankings, which was not addressed in previous work. Most group-fair ranking algorithms post-process a given ranking and output a group-fair ranking. We define underranking based on how close the group-fair rank of each item is to its original rank, and prove a lower bound on the trade-off achievable for simultaneous underranking and group fairness in ranking. We give a fair ranking algorithm that takes any given ranking and outputs another ranking with simultaneous underranking and group fairness guarantees comparable to the lower bound we prove. Our algorithm works with group fairness constraints for any number of groups. Our experimental results confirm the theoretical trade-off between underranking and group fairness, and also show that our algorithm achieves the best of both when compared to the state-of-the-art baselines.
△ Less
Submitted 18 February, 2021; v1 submitted 24 September, 2020;
originally announced October 2020.
-
"I'd rather just go to bed": Understanding Indirect Answers
Authors:
Annie Louis,
Dan Roth,
Filip Radlinski
Abstract:
We revisit a pragmatic inference problem in dialog: understanding indirect responses to questions. Humans can interpret 'I'm starving.' in response to 'Hungry?', even without direct cue words such as 'yes' and 'no'. In dialog systems, allowing natural responses rather than closed vocabularies would be similarly beneficial. However, today's systems are only as sensitive to these pragmatic moves as…
▽ More
We revisit a pragmatic inference problem in dialog: understanding indirect responses to questions. Humans can interpret 'I'm starving.' in response to 'Hungry?', even without direct cue words such as 'yes' and 'no'. In dialog systems, allowing natural responses rather than closed vocabularies would be similarly beneficial. However, today's systems are only as sensitive to these pragmatic moves as their language model allows. We create and release the first large-scale English language corpus 'Circa' with 34,268 (polar question, indirect answer) pairs to enable progress on this task. The data was collected via elaborate crowdsourcing, and contains utterances with yes/no meaning, as well as uncertain, middle-ground, and conditional responses. We also present BERT-based neural models to predict such categories for a question-answer pair. We find that while transfer learning from entailment works reasonably, performance is not yet sufficient for robust dialog. Our models reach 82-88% accuracy for a 4-class distinction, and 74-85% for 6 classes.
△ Less
Submitted 7 October, 2020;
originally announced October 2020.
-
Robust Identifiability in Linear Structural Equation Models of Causal Inference
Authors:
Karthik Abinav Sankararaman,
Anand Louis,
Navin Goyal
Abstract:
In this work, we consider the problem of robust parameter estimation from observational data in the context of linear structural equation models (LSEMs). LSEMs are a popular and well-studied class of models for inferring causality in the natural and social sciences. One of the main problems related to LSEMs is to recover the model parameters from the observational data. Under various conditions on…
▽ More
In this work, we consider the problem of robust parameter estimation from observational data in the context of linear structural equation models (LSEMs). LSEMs are a popular and well-studied class of models for inferring causality in the natural and social sciences. One of the main problems related to LSEMs is to recover the model parameters from the observational data. Under various conditions on LSEMs and the model parameters the prior work provides efficient algorithms to recover the parameters. However, these results are often about generic identifiability. In practice, generic identifiability is not sufficient and we need robust identifiability: small changes in the observational data should not affect the parameters by a huge amount. Robust identifiability has received far less attention and remains poorly understood. Sankararaman et al. (2019) recently provided a set of sufficient conditions on parameters under which robust identifiability is feasible. However, a limitation of their work is that their results only apply to a small sub-class of LSEMs, called ``bow-free paths.'' In this work, we significantly extend their work along multiple dimensions. First, for a large and well-studied class of LSEMs, namely ``bow free'' models, we provide a sufficient condition on model parameters under which robust identifiability holds, thereby removing the restriction of paths required by prior work. We then show that this sufficient condition holds with high probability which implies that for a large set of parameters robust identifiability holds and that for such parameters, existing algorithms already achieve robust identifiability. Finally, we validate our results on both simulated and real-world datasets.
△ Less
Submitted 14 July, 2020;
originally announced July 2020.
-
Is SGD a Bayesian sampler? Well, almost
Authors:
Chris Mingard,
Guillermo Valle-Pérez,
Joar Skalse,
Ard A. Louis
Abstract:
Overparameterised deep neural networks (DNNs) are highly expressive and so can, in principle, generate almost any function that fits a training dataset with zero error. The vast majority of these functions will perform poorly on unseen data, and yet in practice DNNs often generalise remarkably well. This success suggests that a trained DNN must have a strong inductive bias towards functions with l…
▽ More
Overparameterised deep neural networks (DNNs) are highly expressive and so can, in principle, generate almost any function that fits a training dataset with zero error. The vast majority of these functions will perform poorly on unseen data, and yet in practice DNNs often generalise remarkably well. This success suggests that a trained DNN must have a strong inductive bias towards functions with low generalisation error. Here we empirically investigate this inductive bias by calculating, for a range of architectures and datasets, the probability $P_{SGD}(f\mid S)$ that an overparameterised DNN, trained with stochastic gradient descent (SGD) or one of its variants, converges on a function $f$ consistent with a training set $S$. We also use Gaussian processes to estimate the Bayesian posterior probability $P_B(f\mid S)$ that the DNN expresses $f$ upon random sampling of its parameters, conditioned on $S$.
Our main findings are that $P_{SGD}(f\mid S)$ correlates remarkably well with $P_B(f\mid S)$ and that $P_B(f\mid S)$ is strongly biased towards low-error and low complexity functions. These results imply that strong inductive bias in the parameter-function map (which determines $P_B(f\mid S)$), rather than a special property of SGD, is the primary explanation for why DNNs generalise so well in the overparameterised regime.
While our results suggest that the Bayesian posterior $P_B(f\mid S)$ is the first order determinant of $P_{SGD}(f\mid S)$, there remain second order differences that are sensitive to hyperparameter tuning. A function probability picture, based on $P_{SGD}(f\mid S)$ and/or $P_B(f\mid S)$, can shed new light on the way that variations in architecture or hyperparameter settings such as batch size, learning rate, and optimiser choice, affect DNN performance.
△ Less
Submitted 24 October, 2020; v1 submitted 26 June, 2020;
originally announced June 2020.
-
Group Fairness for Knapsack Problems
Authors:
Deval Patel,
Arindam Khan,
Anand Louis
Abstract:
We study the knapsack problem with group fairness constraints. The input of the problem consists of a knapsack of bounded capacity and a set of items, each item belongs to a particular category and has and associated weight and value. The goal of this problem is to select a subset of items such that all categories are fairly represented, the total weight of the selected items does not exceed the c…
▽ More
We study the knapsack problem with group fairness constraints. The input of the problem consists of a knapsack of bounded capacity and a set of items, each item belongs to a particular category and has and associated weight and value. The goal of this problem is to select a subset of items such that all categories are fairly represented, the total weight of the selected items does not exceed the capacity of the knapsack,and the total value is maximized. We study the fairness parameters such as the bounds on the total value of items from each category, the total weight of items from each category, and the total number of items from each category. We give approximation algorithms for these problems. These fairness notions could also be extended to the min-knapsack problem. The fair knapsack problems encompass various important problems, such as participatory budgeting, fair budget allocation, advertising.
△ Less
Submitted 16 January, 2021; v1 submitted 14 June, 2020;
originally announced June 2020.
-
Approximation Algorithms and Hardness for Strong Unique Games
Authors:
Suprovat Ghoshal,
Anand Louis
Abstract:
The UNIQUE GAMES problem is a central problem in algorithms and complexity theory. Given an instance of UNIQUE GAMES, the STRONG UNIQUE GAMES problem asks to find the largest subset of vertices, such that the UNIQUE GAMES instance induced on them is completely satisfiable. In this paper, we give new algorithmic and hardness results for STRONG UNIQUE GAMES. Given an instance with label set size…
▽ More
The UNIQUE GAMES problem is a central problem in algorithms and complexity theory. Given an instance of UNIQUE GAMES, the STRONG UNIQUE GAMES problem asks to find the largest subset of vertices, such that the UNIQUE GAMES instance induced on them is completely satisfiable. In this paper, we give new algorithmic and hardness results for STRONG UNIQUE GAMES. Given an instance with label set size $k$ where a set of $(1 - ε)$-fraction of the vertices induce an instance that is completely satisfiable, our first algorithm produces a set of $1 - \widetilde{O}({k^2}) ε\sqrt{\log n}$ fraction of the vertices such that the UNIQUE GAMES induced on them is completely satisfiable. In the same setting, our second algorithm produces a set of $1 - \widetilde{O}({k^2}) \sqrt{ε\log d}$ (here $d$ is the largest vertex degree of the graph) fraction of the vertices such that the UNIQUE GAMES induced on them is completely satisfiable. The technical core of our results is a new connection between STRONG UNIQUE GAMES and Small-Set-Vertex-Expansion in graphs. Complementing this, assuming the Unique Games Conjecture, we prove that it is NP-hard to compute a set of size larger than $1 - Ω( \sqrt{ε\log k \log d})$ for which all the constraints induced on this set are satisfied.
Given an undirected graph $G(V,E)$ the ODD CYCLE TRANSVERSAL problem asks to delete the least fraction of vertices to make the induced graph on the remaining vertices bipartite. As a corollary to our main algorithmic results, we obtain an algorithm that outputs a set $S$ such the graph induced on $V \setminus S$ is bipartite, and $|S|/n \leq O(\sqrt{ε\log d})$ (here $d$ is the largest vertex degree and $ε$ is the optimal fraction of vertices that need to be deleted). Assuming the Unique Games Conjecture, we prove a matching (up to constant factors) hardness.
△ Less
Submitted 18 May, 2020;
originally announced May 2020.
-
Planted Models for the Densest $k$-Subgraph Problem
Authors:
Yash Khanna,
Anand Louis
Abstract:
Given an undirected graph $G$, the Densest $k$-subgraph problem (DkS) asks to compute a set $S \subset V$ of cardinality $\left\lvert S\right\rvert \leq k$ such that the weight of edges inside $S$ is maximized. This is a fundamental NP-hard problem whose approximability, inspite of many decades of research, is yet to be settled. The current best known approximation algorithm due to Bhaskara et al.…
▽ More
Given an undirected graph $G$, the Densest $k$-subgraph problem (DkS) asks to compute a set $S \subset V$ of cardinality $\left\lvert S\right\rvert \leq k$ such that the weight of edges inside $S$ is maximized. This is a fundamental NP-hard problem whose approximability, inspite of many decades of research, is yet to be settled. The current best known approximation algorithm due to Bhaskara et al. (2010) computes a $\mathcal{O}\left({n^{1/4 + ε}}\right)$ approximation in time $n^{\mathcal{O}\left(1/ε\right)}$, for any $ε> 0$.
We ask what are some "easier" instances of this problem? We propose some natural semi-random models of instances with a planted dense subgraph, and study approximation algorithms for computing the densest subgraph in them. These models are inspired by the semi-random models of instances studied for various other graph problems such as the independent set problem, graph partitioning problems etc. For a large range of parameters of these models, we get significantly better approximation factors for the Densest $k$-subgraph problem. Moreover, our algorithm recovers a large part of the planted solution.
△ Less
Submitted 7 November, 2020; v1 submitted 29 April, 2020;
originally announced April 2020.
-
$λ_\infty$ & Maximum Variance Embedding: Measuring and Optimizing Connectivity of A Graph Metric
Authors:
Majid Farhadi,
Anand Louis,
Mohit Singh,
Prasad Tetali
Abstract:
Bobkov, Houdré, and the last author [2000] introduced a Poincaré-type functional parameter, $λ_\infty$, of a graph and related it to connectivity of the graph via Cheeger-type inequalities. A work by the second author, Raghavendra, and Vempala [2013] related the complexity of $λ_\infty$ to the so-called small-set expansion (SSE) problem and further set forth the desiderata for NP-hardness of this…
▽ More
Bobkov, Houdré, and the last author [2000] introduced a Poincaré-type functional parameter, $λ_\infty$, of a graph and related it to connectivity of the graph via Cheeger-type inequalities. A work by the second author, Raghavendra, and Vempala [2013] related the complexity of $λ_\infty$ to the so-called small-set expansion (SSE) problem and further set forth the desiderata for NP-hardness of this optimization problem. We confirm the conjecture that computing $λ_\infty$ is NP-hard for weighted trees.
Beyond measuring connectivity in many applications we want to optimize it. This, via convex duality, leads to a problem in machine learning known as the Maximum Variance Embedding (MVE). The output is a function from vertices to a low dim Euclidean space, subject to bounds on Euclidean distances between neighbors. The objective is to maximize output variance. Special cases of MVE into $n$ and $1$ dims lead to absolute algebraic connectivity [1990] and spread constant [1998], that measure connectivity of the graph and its Cartesian $n$-power, respectively. MVE has other applications in measuring diffusion speed and robustness of networks, clustering, and dimension reduction.
We show that computing MVE in tree-width dims is NP-hard, while only one additional dim beyond width of a given tree-decomposition makes the problem in P. We show that MVE of a tree in 2 dims defines a non-convex yet benign optimization landscape, i.e., local=global optima. We further develop a linear time combinatorial algorithm for this case. Finally, we denote approximate Maximum Variance Embedding is tractable in significantly lower dims. For trees and general graphs, for which Maximum Variance Embedding cannot be solved in less than $2$ and $Ω(n)$ dims, we provide $1+\varepsilon$ approximation algorithms for embedding into $1$ and $O(\log n /\varepsilon^2)$ dims, respectively.
△ Less
Submitted 30 June, 2022; v1 submitted 11 March, 2020;
originally announced March 2020.
-
Planted Models for $k$-way Edge and Vertex Expansion
Authors:
Anand Louis,
Rakesh Venkat
Abstract:
Graph partitioning problems are a central topic of study in algorithms and complexity theory. Edge expansion and vertex expansion, two popular graph partitioning objectives, seek a $2$-partition of the vertex set of the graph that minimizes the considered objective. However, for many natural applications, one might require a graph to be partitioned into $k$ parts, for some $k \geq 2$. For a $k$-pa…
▽ More
Graph partitioning problems are a central topic of study in algorithms and complexity theory. Edge expansion and vertex expansion, two popular graph partitioning objectives, seek a $2$-partition of the vertex set of the graph that minimizes the considered objective. However, for many natural applications, one might require a graph to be partitioned into $k$ parts, for some $k \geq 2$. For a $k$-partition $S_1, \ldots, S_k$ of the vertex set of a graph $G = (V,E)$, the $k$-way edge expansion (resp. vertex expansion) of $\{S_1, \ldots, S_k\}$ is defined as $\max_{i \in [k]} Φ(S_i)$, and the balanced $k$-way edge expansion (resp. vertex expansion) of $G$ is defined as \[ \min_{ \{S_1, \ldots, S_k\} \in \mathcal{P}_k} \max_{i \in [k]} Φ(S_i) \, , \] where $\mathcal{P}_k$ is the set of all balanced $k$-partitions of $V$ (i.e each part of a $k$-partition in $\mathcal{P}_k$ should have cardinality $|V|/k$), and $Φ(S)$ denotes the edge expansion (resp. vertex expansion) of $S \subset V$. We study a natural planted model for graphs where the vertex set of a graph has a $k$-partition $S_1, \ldots, S_k$ such that the graph induced on each $S_i$ has large expansion, but each $S_i$ has small edge expansion (resp. vertex expansion) in the graph. We give bi-criteria approximation algorithms for computing the balanced $k$-way edge expansion (resp. vertex expansion) of instances in this planted model.
△ Less
Submitted 20 October, 2019;
originally announced October 2019.
-
Neural networks are a priori biased towards Boolean functions with low entropy
Authors:
Chris Mingard,
Joar Skalse,
Guillermo Valle-Pérez,
David Martínez-Rubio,
Vladimir Mikulik,
Ard A. Louis
Abstract:
Understanding the inductive bias of neural networks is critical to explaining their ability to generalise. Here, for one of the simplest neural networks -- a single-layer perceptron with n input neurons, one output neuron, and no threshold bias term -- we prove that upon random initialisation of weights, the a priori probability $P(t)$ that it represents a Boolean function that classifies t points…
▽ More
Understanding the inductive bias of neural networks is critical to explaining their ability to generalise. Here, for one of the simplest neural networks -- a single-layer perceptron with n input neurons, one output neuron, and no threshold bias term -- we prove that upon random initialisation of weights, the a priori probability $P(t)$ that it represents a Boolean function that classifies t points in ${0,1}^n$ as 1 has a remarkably simple form: $P(t) = 2^{-n}$ for $0\leq t < 2^n$.
Since a perceptron can express far fewer Boolean functions with small or large values of t (low entropy) than with intermediate values of t (high entropy) there is, on average, a strong intrinsic a-priori bias towards individual functions with low entropy. Furthermore, within a class of functions with fixed t, we often observe a further intrinsic bias towards functions of lower complexity. Finally, we prove that, regardless of the distribution of inputs, the bias towards low entropy becomes monotonically stronger upon adding ReLU layers, and empirically show that increasing the variance of the bias term has a similar effect.
△ Less
Submitted 2 January, 2020; v1 submitted 25 September, 2019;
originally announced September 2019.
-
Countering the Effects of Lead Bias in News Summarization via Multi-Stage Training and Auxiliary Losses
Authors:
Matt Grenander,
Yue Dong,
Jackie Chi Kit Cheung,
Annie Louis
Abstract:
Sentence position is a strong feature for news summarization, since the lead often (but not always) summarizes the key points of the article. In this paper, we show that recent neural systems excessively exploit this trend, which although powerful for many inputs, is also detrimental when summarizing documents where important content should be extracted from later parts of the article. We propose…
▽ More
Sentence position is a strong feature for news summarization, since the lead often (but not always) summarizes the key points of the article. In this paper, we show that recent neural systems excessively exploit this trend, which although powerful for many inputs, is also detrimental when summarizing documents where important content should be extracted from later parts of the article. We propose two techniques to make systems sensitive to the importance of content in different parts of the article. The first technique employs 'unbiased' data; i.e., randomly shuffled sentences of the source document, to pretrain the model. The second technique uses an auxiliary ROUGE-based loss that encourages the model to distribute importance scores throughout a document by mimicking sentence-level ROUGE scores on the training data. We show that these techniques significantly improve the performance of a competitive reinforcement learning based extractive system, with the auxiliary loss being more powerful than pretraining.
△ Less
Submitted 8 September, 2019;
originally announced September 2019.
-
Approximation Algorithms for Partially Colorable Graphs
Authors:
Suprovat Ghoshal,
Anand Louis,
Rahul Raychaudhury
Abstract:
Graph coloring problems are a central topic of study in the theory of algorithms. We study the problem of partially coloring partially colorable graphs. For $α\leq 1$ and $k \in \mathbb{Z}^+$, we say that a graph $G=(V,E)$ is $α$-partially $k$-colorable, if there exists a subset $S\subset V$ of cardinality $ |S | \geq α| V |$ such that the graph induced on $S$ is $k$-colorable. Partial $k$-colorab…
▽ More
Graph coloring problems are a central topic of study in the theory of algorithms. We study the problem of partially coloring partially colorable graphs. For $α\leq 1$ and $k \in \mathbb{Z}^+$, we say that a graph $G=(V,E)$ is $α$-partially $k$-colorable, if there exists a subset $S\subset V$ of cardinality $ |S | \geq α| V |$ such that the graph induced on $S$ is $k$-colorable. Partial $k$-colorability is a more robust structural property of a graph than $k$-colorability. For graphs that arise in practice, partial $k$-colorability might be a better notion to use than $k$-colorability, since data arising in practice often contains various forms of noise.
We give a polynomial time algorithm that takes as input a $(1 - ε)$-partially $3$-colorable graph $G$ and a constant $γ\in [ε, 1/10]$, and colors a $(1 - ε/γ)$ fraction of the vertices using $\tilde{O}\left(n^{0.25 + O(γ^{1/2})} \right)$ colors. We also study natural semi-random families of instances of partially $3$-colorable graphs and partially $2$-colorable graphs, and give stronger bi-criteria approximation guarantees for these family of instances.
△ Less
Submitted 30 August, 2019;
originally announced August 2019.
-
Stability of Linear Structural Equation Models of Causal Inference
Authors:
Karthik Abinav Sankararaman,
Anand Louis,
Navin Goyal
Abstract:
We consider the numerical stability of the parameter recovery problem in Linear Structural Equation Model ($\LSEM$) of causal inference. A long line of work starting from Wright (1920) has focused on understanding which sub-classes of $\LSEM$ allow for efficient parameter recovery. Despite decades of study, this question is not yet fully resolved. The goal of this paper is complementary to this li…
▽ More
We consider the numerical stability of the parameter recovery problem in Linear Structural Equation Model ($\LSEM$) of causal inference. A long line of work starting from Wright (1920) has focused on understanding which sub-classes of $\LSEM$ allow for efficient parameter recovery. Despite decades of study, this question is not yet fully resolved. The goal of this paper is complementary to this line of work; we want to understand the stability of the recovery problem in the cases when efficient recovery is possible. Numerical stability of Pearl's notion of causality was first studied in Schulman and Srivastava (2016) using the concept of condition number where they provide ill-conditioned examples. In this work, we provide a condition number analysis for the $\LSEM$. First we prove that under a sufficient condition, for a certain sub-class of $\LSEM$ that are \emph{bow-free} (Brito and Pearl (2002)), the parameter recovery is stable. We further prove that \emph{randomly} chosen input parameters for this family satisfy the condition with a substantial probability. Hence for this family, on a large subset of parameter space, recovery is numerically stable. Next we construct an example of $\LSEM$ on four vertices with \emph{unbounded} condition number. We then corroborate our theoretical findings via simulations as well as real-world experiments for a sociology application. Finally, we provide a general heuristic for estimating the condition number of any $\LSEM$ instance.
△ Less
Submitted 17 August, 2020; v1 submitted 16 May, 2019;
originally announced May 2019.
-
What comes next? Extractive summarization by next-sentence prediction
Authors:
Jingyun Liu,
Jackie C. K. Cheung,
Annie Louis
Abstract:
Existing approaches to automatic summarization assume that a length limit for the summary is given, and view content selection as an optimization problem to maximize informativeness and minimize redundancy within this budget. This framework ignores the fact that human-written summaries have rich internal structure which can be exploited to train a summarization system. We present NEXTSUM, a novel…
▽ More
Existing approaches to automatic summarization assume that a length limit for the summary is given, and view content selection as an optimization problem to maximize informativeness and minimize redundancy within this budget. This framework ignores the fact that human-written summaries have rich internal structure which can be exploited to train a summarization system. We present NEXTSUM, a novel approach to summarization based on a model that predicts the next sentence to include in the summary using not only the source article, but also the summary produced so far. We show that such a model successfully captures summary-specific discourse moves, and leads to better content selection performance, in addition to automatically predicting how long the target summary should be. We perform experiments on the New York Times Annotated Corpus of summaries, where NEXTSUM outperforms lead and content-model summarization baselines by significant margins. We also show that the lengths of summaries produced by our system correlates with the lengths of the human-written gold standards.
△ Less
Submitted 12 January, 2019;
originally announced January 2019.
-
HyperGCN: A New Method of Training Graph Convolutional Networks on Hypergraphs
Authors:
Naganand Yadati,
Madhav Nimishakavi,
Prateek Yadav,
Vikram Nitin,
Anand Louis,
Partha Talukdar
Abstract:
In many real-world network datasets such as co-authorship, co-citation, email communication, etc., relationships are complex and go beyond pairwise. Hypergraphs provide a flexible and natural modeling tool to model such complex relationships. The obvious existence of such complex relationships in many real-world networks naturaly motivates the problem of learning with hypergraphs. A popular learni…
▽ More
In many real-world network datasets such as co-authorship, co-citation, email communication, etc., relationships are complex and go beyond pairwise. Hypergraphs provide a flexible and natural modeling tool to model such complex relationships. The obvious existence of such complex relationships in many real-world networks naturaly motivates the problem of learning with hypergraphs. A popular learning paradigm is hypergraph-based semi-supervised learning (SSL) where the goal is to assign labels to initially unlabeled vertices in a hypergraph. Motivated by the fact that a graph convolutional network (GCN) has been effective for graph-based SSL, we propose HyperGCN, a novel GCN for SSL on attributed hypergraphs. Additionally, we show how HyperGCN can be used as a learning-based approach for combinatorial optimisation on NP-hard hypergraph problems. We demonstrate HyperGCN's effectiveness through detailed experimentation on real-world hypergraphs.
△ Less
Submitted 22 May, 2019; v1 submitted 7 September, 2018;
originally announced September 2018.
-
Deep Learning to Detect Redundant Method Comments
Authors:
Annie Louis,
Santanu Kumar Dash,
Earl T. Barr,
Charles Sutton
Abstract:
Comments in software are critical for maintenance and reuse. But apart from prescriptive advice, there is little practical support or quantitative understanding of what makes a comment useful. In this paper, we introduce the task of identifying comments which are uninformative about the code they are meant to document. To address this problem, we introduce the notion of comment entailment from cod…
▽ More
Comments in software are critical for maintenance and reuse. But apart from prescriptive advice, there is little practical support or quantitative understanding of what makes a comment useful. In this paper, we introduce the task of identifying comments which are uninformative about the code they are meant to document. To address this problem, we introduce the notion of comment entailment from code, high entailment indicating that a comment's natural language semantics can be inferred directly from the code. Although not all entailed comments are low quality, comments that are too easily inferred, for example, comments that restate the code, are widely discouraged by authorities on software style. Based on this, we develop a tool called CRAIC which scores method-level comments for redundancy. Highly redundant comments can then be expanded or alternately removed by the developer. CRAIC uses deep language models to exploit large software corpora without requiring expensive manual annotations of entailment. We show that CRAIC can perform the comment entailment task with good agreement with human judgements. Our findings also have implications for documentation tools. For example, we find that common tags in Javadoc are at least two times more predictable from code than non-Javadoc sentences, suggesting that Javadoc tags are less informative than more free-form comments
△ Less
Submitted 12 June, 2018;
originally announced June 2018.
-
Semi-Random Graphs with Planted Sparse Vertex Cuts: Algorithms for Exact and Approximate Recovery
Authors:
Anand Louis,
Rakesh Venkat
Abstract:
The problem of computing the vertex expansion of a graph is an NP-hard problem. The current best worst-case approximation guarantees for computing the vertex expansion of a graph are a $O(\sqrt{\log n})$-approximation algorithm due to Feige, Hajiaghayi and Lee [SIAM J. Comp., 2008], and $O(\sqrt{OPT \log d})$ bound in graphs having vertex degrees at most $d$, due to Louis, Raghavendra and Vempala…
▽ More
The problem of computing the vertex expansion of a graph is an NP-hard problem. The current best worst-case approximation guarantees for computing the vertex expansion of a graph are a $O(\sqrt{\log n})$-approximation algorithm due to Feige, Hajiaghayi and Lee [SIAM J. Comp., 2008], and $O(\sqrt{OPT \log d})$ bound in graphs having vertex degrees at most $d$, due to Louis, Raghavendra and Vempala [FOCS 2013].
We study a natural semi-random model of graphs with sparse vertex cuts. For certain ranges of parameters, we give an algorithm to recover the planted sparse vertex cut exactly. For a larger range of parameters, we give a constant factor bi-criteria approximation algorithm to compute the graph's balanced vertex expansion. Our algorithms are based on studying a semidefinite programming relaxation for the balanced vertex expansion of the graph.
In addition to being a family of instances that will help us to better understand the complexity of the computation of vertex expansion, our model can also be used in the study of community detection where only a few nodes from each community interact with nodes from other communities. There has been a lot of work on studying random and semi-random graphs with planted sparse edge cuts. To the best of our knowledge, our model of semi-random graphs with planted sparse vertex cuts has not been studied before.
△ Less
Submitted 24 May, 2018;
originally announced May 2018.
-
Deep learning generalizes because the parameter-function map is biased towards simple functions
Authors:
Guillermo Valle-Pérez,
Chico Q. Camargo,
Ard A. Louis
Abstract:
Deep neural networks (DNNs) generalize remarkably well without explicit regularization even in the strongly over-parametrized regime where classical learning theory would instead predict that they would severely overfit. While many proposals for some kind of implicit regularization have been made to rationalise this success, there is no consensus for the fundamental reason why DNNs do not strongly…
▽ More
Deep neural networks (DNNs) generalize remarkably well without explicit regularization even in the strongly over-parametrized regime where classical learning theory would instead predict that they would severely overfit. While many proposals for some kind of implicit regularization have been made to rationalise this success, there is no consensus for the fundamental reason why DNNs do not strongly overfit. In this paper, we provide a new explanation. By applying a very general probability-complexity bound recently derived from algorithmic information theory (AIT), we argue that the parameter-function map of many DNNs should be exponentially biased towards simple functions. We then provide clear evidence for this strong simplicity bias in a model DNN for Boolean functions, as well as in much larger fully connected and convolutional networks applied to CIFAR10 and MNIST. As the target functions in many real problems are expected to be highly structured, this intrinsic simplicity bias helps explain why deep networks generalize well on real world problems. This picture also facilitates a novel PAC-Bayes approach where the prior is taken over the DNN input-output function space, rather than the more conventional prior over parameter space. If we assume that the training algorithm samples parameters close to uniformly within the zero-error region then the PAC-Bayes theorem can be used to guarantee good expected generalization for target functions producing high-likelihood training sets. By exploiting recently discovered connections between DNNs and Gaussian processes to estimate the marginal likelihood, we produce relatively tight generalization PAC-Bayes error bounds which correlate well with the true error on realistic datasets such as MNIST and CIFAR10 and for architectures including convolutional and fully connected networks.
△ Less
Submitted 21 April, 2019; v1 submitted 22 May, 2018;
originally announced May 2018.