subscribe to arXiv mailings

Weisfeiler-Leman at the margin: When more expressivity matters

Authors: Billy J. Franks, Christopher Morris, Ameya Velingker, Floris Geerts

Abstract: The Weisfeiler-Leman algorithm ($1$-WL) is a well-studied heuristic for the graph isomorphism problem. Recently, the algorithm has played a prominent role in understanding the expressive power of message-passing graph neural networks (MPNNs) and being effective as a graph kernel. Despite its success, $1$-WL faces challenges in distinguishing non-isomorphic graphs, leading to the development of mor… ▽ More The Weisfeiler-Leman algorithm ($1$-WL) is a well-studied heuristic for the graph isomorphism problem. Recently, the algorithm has played a prominent role in understanding the expressive power of message-passing graph neural networks (MPNNs) and being effective as a graph kernel. Despite its success, $1$-WL faces challenges in distinguishing non-isomorphic graphs, leading to the development of more expressive MPNN and kernel architectures. However, the relationship between enhanced expressivity and improved generalization performance remains unclear. Here, we show that an architecture's expressivity offers limited insights into its generalization performance when viewed through graph isomorphism. Moreover, we focus on augmenting $1$-WL and MPNNs with subgraph information and employ classical margin theory to investigate the conditions under which an architecture's increased expressivity aligns with improved generalization performance. In addition, we show that gradient flow pushes the MPNN's weights toward the maximum margin solution. Further, we introduce variations of expressive $1$-WL-based kernel and MPNN architectures with provable generalization properties. Our empirical study confirms the validity of our theoretical findings. △ Less

Submitted 28 May, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

Comments: Accepted at ICML 2024. arXiv admin note: text overlap with arXiv:2301.11039

arXiv:2310.01668 [pdf, other]

Locality-Aware Graph-Rewiring in GNNs

Authors: Federico Barbero, Ameya Velingker, Amin Saberi, Michael Bronstein, Francesco Di Giovanni

Abstract: Graph Neural Networks (GNNs) are popular models for machine learning on graphs that typically follow the message-passing paradigm, whereby the feature of a node is updated recursively upon aggregating information over its neighbors. While exchanging messages over the input graph endows GNNs with a strong inductive bias, it can also make GNNs susceptible to over-squashing, thereby preventing them f… ▽ More Graph Neural Networks (GNNs) are popular models for machine learning on graphs that typically follow the message-passing paradigm, whereby the feature of a node is updated recursively upon aggregating information over its neighbors. While exchanging messages over the input graph endows GNNs with a strong inductive bias, it can also make GNNs susceptible to over-squashing, thereby preventing them from capturing long-range interactions in the given graph. To rectify this issue, graph rewiring techniques have been proposed as a means of improving information flow by altering the graph connectivity. In this work, we identify three desiderata for graph-rewiring: (i) reduce over-squashing, (ii) respect the locality of the graph, and (iii) preserve the sparsity of the graph. We highlight fundamental trade-offs that occur between spatial and spectral rewiring techniques; while the former often satisfy (i) and (ii) but not (iii), the latter generally satisfy (i) and (iii) at the expense of (ii). We propose a novel rewiring framework that satisfies all of (i)--(iii) through a locality-aware sequence of rewiring operations. We then discuss a specific instance of such rewiring framework and validate its effectiveness on several real-world benchmarks, showing that it either matches or significantly outperforms existing rewiring approaches. △ Less

Submitted 4 May, 2024; v1 submitted 2 October, 2023; originally announced October 2023.

arXiv:2306.01869 [pdf, other]

Fast $(1+\varepsilon)$-Approximation Algorithms for Binary Matrix Factorization

Authors: Ameya Velingker, Maximilian Vötsch, David P. Woodruff, Samson Zhou

Abstract: We introduce efficient $(1+\varepsilon)$-approximation algorithms for the binary matrix factorization (BMF) problem, where the inputs are a matrix $\mathbf{A}\in\{0,1\}^{n\times d}$, a rank parameter $k>0$, as well as an accuracy parameter $\varepsilon>0$, and the goal is to approximate $\mathbf{A}$ as a product of low-rank factors $\mathbf{U}\in\{0,1\}^{n\times k}$ and… ▽ More We introduce efficient $(1+\varepsilon)$-approximation algorithms for the binary matrix factorization (BMF) problem, where the inputs are a matrix $\mathbf{A}\in\{0,1\}^{n\times d}$, a rank parameter $k>0$, as well as an accuracy parameter $\varepsilon>0$, and the goal is to approximate $\mathbf{A}$ as a product of low-rank factors $\mathbf{U}\in\{0,1\}^{n\times k}$ and $\mathbf{V}\in\{0,1\}^{k\times d}$. Equivalently, we want to find $\mathbf{U}$ and $\mathbf{V}$ that minimize the Frobenius loss $\|\mathbf{U}\mathbf{V} - \mathbf{A}\|_F^2$. Before this work, the state-of-the-art for this problem was the approximation algorithm of Kumar et. al. [ICML 2019], which achieves a $C$-approximation for some constant $C\ge 576$. We give the first $(1+\varepsilon)$-approximation algorithm using running time singly exponential in $k$, where $k$ is typically a small integer. Our techniques generalize to other common variants of the BMF problem, admitting bicriteria $(1+\varepsilon)$-approximation algorithms for $L_p$ loss functions and the setting where matrix operations are performed in $\mathbb{F}_2$. Our approach can be implemented in standard big data models, such as the streaming or distributed models. △ Less

Submitted 2 June, 2023; originally announced June 2023.

Comments: ICML 2023

arXiv:2303.06147 [pdf, other]

Exphormer: Sparse Transformers for Graphs

Authors: Hamed Shirzad, Ameya Velingker, Balaji Venkatachalam, Danica J. Sutherland, Ali Kemal Sinop

Abstract: Graph transformers have emerged as a promising architecture for a variety of graph learning and representation tasks. Despite their successes, though, it remains challenging to scale graph transformers to large graphs while maintaining accuracy competitive with message-passing networks. In this paper, we introduce Exphormer, a framework for building powerful and scalable graph transformers. Exphor… ▽ More Graph transformers have emerged as a promising architecture for a variety of graph learning and representation tasks. Despite their successes, though, it remains challenging to scale graph transformers to large graphs while maintaining accuracy competitive with message-passing networks. In this paper, we introduce Exphormer, a framework for building powerful and scalable graph transformers. Exphormer consists of a sparse attention mechanism based on two mechanisms: virtual global nodes and expander graphs, whose mathematical characteristics, such as spectral expansion, pseduorandomness, and sparsity, yield graph transformers with complexity only linear in the size of the graph, while allowing us to prove desirable theoretical properties of the resulting transformer models. We show that incorporating Exphormer into the recently-proposed GraphGPS framework produces models with competitive empirical results on a wide variety of graph datasets, including state-of-the-art results on three datasets. We also show that Exphormer can scale to datasets on larger graphs than shown in previous graph transformer architectures. Code can be found at \url{https://github.com/hamed1375/Exphormer}. △ Less

Submitted 24 July, 2023; v1 submitted 10 March, 2023; originally announced March 2023.

arXiv:2206.11941 [pdf, other]

Affinity-Aware Graph Networks

Authors: Ameya Velingker, Ali Kemal Sinop, Ira Ktena, Petar Veličković, Sreenivas Gollapudi

Abstract: Graph Neural Networks (GNNs) have emerged as a powerful technique for learning on relational data. Owing to the relatively limited number of message passing steps they perform -- and hence a smaller receptive field -- there has been significant interest in improving their expressivity by incorporating structural aspects of the underlying graph. In this paper, we explore the use of affinity measure… ▽ More Graph Neural Networks (GNNs) have emerged as a powerful technique for learning on relational data. Owing to the relatively limited number of message passing steps they perform -- and hence a smaller receptive field -- there has been significant interest in improving their expressivity by incorporating structural aspects of the underlying graph. In this paper, we explore the use of affinity measures as features in graph neural networks, in particular measures arising from random walks, including effective resistance, hitting and commute times. We propose message passing networks based on these features and evaluate their performance on a variety of node and graph property prediction tasks. Our architecture has lower computational complexity, while our features are invariant to the permutations of the underlying graph. The measures we compute allow the network to exploit the connectivity properties of the graph, thereby allowing us to outperform relevant benchmarks for a wide variety of tasks, often with significantly fewer message passing steps. On one of the largest publicly available graph regression datasets, OGB-LSC-PCQM4Mv1, we obtain the best known single-model validation MAE at the time of writing. △ Less

Submitted 23 June, 2022; originally announced June 2022.

arXiv:2112.03548 [pdf, ps, other]

Private Robust Estimation by Stabilizing Convex Relaxations

Authors: Pravesh K. Kothari, Pasin Manurangsi, Ameya Velingker

Abstract: We give the first polynomial time and sample $(ε, δ)$-differentially private (DP) algorithm to estimate the mean, covariance and higher moments in the presence of a constant fraction of adversarial outliers. Our algorithm succeeds for families of distributions that satisfy two well-studied properties in prior works on robust estimation: certifiable subgaussianity of directional moments and certifi… ▽ More We give the first polynomial time and sample $(ε, δ)$-differentially private (DP) algorithm to estimate the mean, covariance and higher moments in the presence of a constant fraction of adversarial outliers. Our algorithm succeeds for families of distributions that satisfy two well-studied properties in prior works on robust estimation: certifiable subgaussianity of directional moments and certifiable hypercontractivity of degree 2 polynomials. Our recovery guarantees hold in the "right affine-invariant norms": Mahalanobis distance for mean, multiplicative spectral and relative Frobenius distance guarantees for covariance and injective norms for higher moments. Prior works obtained private robust algorithms for mean estimation of subgaussian distributions with bounded covariance. For covariance estimation, ours is the first efficient algorithm (even in the absence of outliers) that succeeds without any condition-number assumptions. Our algorithms arise from a new framework that provides a general blueprint for modifying convex relaxations for robust estimation to satisfy strong worst-case stability guarantees in the appropriate parameter norms whenever the algorithms produce witnesses of correctness in their run. We verify such guarantees for a modification of standard sum-of-squares (SoS) semidefinite programming relaxations for robust estimation. Our privacy guarantees are obtained by combining stability guarantees with a new "estimate dependent" noise injection mechanism in which noise scales with the eigenvalues of the estimated covariance. We believe this framework will be useful more generally in obtaining DP counterparts of robust estimators. Independently of our work, Ashtiani and Liaw [AL21] also obtained a polynomial time and sample private robust estimation algorithm for Gaussian distributions. △ Less

Submitted 7 December, 2021; originally announced December 2021.

arXiv:2106.13078 [pdf, other]

Linear Space Streaming Lower Bounds for Approximating CSPs

Authors: Chi-Ning Chou, Alexander Golovnev, Madhu Sudan, Ameya Velingker, Santhoshini Velusamy

Abstract: We consider the approximability of constraint satisfaction problems in the streaming setting. For every constraint satisfaction problem (CSP) on $n$ variables taking values in $\{0,\ldots,q-1\}$, we prove that improving over the trivial approximability by a factor of $q$ requires $Ω(n)$ space even on instances with $O(n)$ constraints. We also identify a broad subclass of problems for which any imp… ▽ More We consider the approximability of constraint satisfaction problems in the streaming setting. For every constraint satisfaction problem (CSP) on $n$ variables taking values in $\{0,\ldots,q-1\}$, we prove that improving over the trivial approximability by a factor of $q$ requires $Ω(n)$ space even on instances with $O(n)$ constraints. We also identify a broad subclass of problems for which any improvement over the trivial approximability requires $Ω(n)$ space. The key technical core is an optimal, $q^{-(k-1)}$-inapproximability for the \textsf{Max $k$-LIN}-$\bmod\; q$ problem, which is the Max CSP problem where every constraint is given by a system of $k-1$ linear equations $\bmod\; q$ over $k$ variables. Our work builds on and extends the breakthrough work of Kapralov and Krachun (Proc. STOC 2019) who showed a linear lower bound on any non-trivial approximation of the MaxCut problem in graphs. MaxCut corresponds roughly to the case of \textsf{Max $k$-LIN}-$\bmod\; q$ with ${k=q=2}$. For general CSPs in the streaming setting, prior results only yielded $Ω(\sqrt{n})$ space bounds. In particular no linear space lower bound was known for an approximation factor less than $1/2$ for {\em any} CSP. Extending the work of Kapralov and Krachun to \textsf{Max $k$-LIN}-$\bmod\; q$ to $k>2$ and $q>2$ (while getting optimal hardness results) is the main technical contribution of this work. Each one of these extensions provides non-trivial technical challenges that we overcome in this work. △ Less

Submitted 24 April, 2022; v1 submitted 24 June, 2021; originally announced June 2021.

arXiv:2003.09756 [pdf, ps, other]

Scaling up Kernel Ridge Regression via Locality Sensitive Hashing

Authors: Michael Kapralov, Navid Nouri, Ilya Razenshteyn, Ameya Velingker, Amir Zandieh

Abstract: Random binning features, introduced in the seminal paper of Rahimi and Recht (2007), are an efficient method for approximating a kernel matrix using locality sensitive hashing. Random binning features provide a very simple and efficient way of approximating the Laplace kernel but unfortunately do not apply to many important classes of kernels, notably ones that generate smooth Gaussian processes,… ▽ More Random binning features, introduced in the seminal paper of Rahimi and Recht (2007), are an efficient method for approximating a kernel matrix using locality sensitive hashing. Random binning features provide a very simple and efficient way of approximating the Laplace kernel but unfortunately do not apply to many important classes of kernels, notably ones that generate smooth Gaussian processes, such as the Gaussian kernel and Matern kernel. In this paper, we introduce a simple weighted version of random binning features and show that the corresponding kernel function generates Gaussian processes of any desired smoothness. We show that our weighted random binning features provide a spectral approximation to the corresponding kernel matrix, leading to efficient algorithms for kernel ridge regression. Experiments on large scale regression datasets show that our method outperforms the accuracy of random Fourier features method. △ Less

Submitted 21 March, 2020; originally announced March 2020.

arXiv:2002.01919 [pdf, other]

Pure Differentially Private Summation from Anonymous Messages

Authors: Badih Ghazi, Noah Golowich, Ravi Kumar, Pasin Manurangsi, Rasmus Pagh, Ameya Velingker

Abstract: The shuffled (aka anonymous) model has recently generated significant interest as a candidate distributed privacy framework with trust assumptions better than the central model but with achievable errors smaller than the local model. We study pure differentially private (DP) protocols in the shuffled model for summation, a basic and widely used primitive: - For binary summation where each of n u… ▽ More The shuffled (aka anonymous) model has recently generated significant interest as a candidate distributed privacy framework with trust assumptions better than the central model but with achievable errors smaller than the local model. We study pure differentially private (DP) protocols in the shuffled model for summation, a basic and widely used primitive: - For binary summation where each of n users holds a bit as an input, we give a pure $ε$-DP protocol for estimating the number of ones held by the users up to an error of $O_ε(1)$, and each user sends $O_ε(\log n)$ messages each of 1 bit. This is the first pure protocol in the shuffled model with error $o(\sqrt{n})$ for constant $ε$. Using this protocol, we give a pure $ε$-DP protocol that performs summation of real numbers in $[0, 1]$ up to an error of $O_ε(1)$, and where each user sends $O_ε(\log^3 n)$ messages each of $O(\log\log n)$ bits. - In contrast, we show that for any pure $ε$-DP protocol for binary summation in the shuffled model having absolute error $n^{0.5-Ω(1)}$, the per user communication has to be at least $Ω_ε(\sqrt{\log n})$ bits. This implies the first separation between the (bounded-communication) multi-message shuffled model and the central model, and the first separation between pure and approximate DP protocols in the shuffled model. To prove our lower bound, we consider (a generalization of) the following question: given $γ$ in $(0, 1)$, what is the smallest m for which there are two random variables $X^0, X^1$ supported on $\{0, \dots ,m\}$ such that (i) the total variation distance between $X^0$ and $X^1$ is at least $1-γ$, and (ii) the moment generating functions of $X^0$ and $X^1$ are within a constant factor of each other everywhere? We show that the answer is $m = Θ(\sqrt{\log(1/γ)})$. △ Less

Submitted 5 February, 2020; originally announced February 2020.

Comments: 40 pages, 3 figures

arXiv:1909.11073 [pdf, ps, other]

Private Aggregation from Fewer Anonymous Messages

Authors: Badih Ghazi, Pasin Manurangsi, Rasmus Pagh, Ameya Velingker

Abstract: Consider the setup where $n$ parties are each given a number $x_i \in \mathbb{F}_q$ and the goal is to compute the sum $\sum_i x_i$ in a secure fashion and with as little communication as possible. We study this problem in the anonymized model of Ishai et al. (FOCS 2006) where each party may broadcast anonymous messages on an insecure channel. We present a new analysis of the one-round "split an… ▽ More Consider the setup where $n$ parties are each given a number $x_i \in \mathbb{F}_q$ and the goal is to compute the sum $\sum_i x_i$ in a secure fashion and with as little communication as possible. We study this problem in the anonymized model of Ishai et al. (FOCS 2006) where each party may broadcast anonymous messages on an insecure channel. We present a new analysis of the one-round "split and mix" protocol of Ishai et al. In order to achieve the same security parameter, our analysis reduces the required number of messages by a $Θ(\log n)$ multiplicative factor. We complement our positive result with lower bounds showing that the dependence of the number of messages on the domain size, the number of parties, and the security parameter is essentially tight. Using a reduction of Balle et al. (2019), our improved analysis of the protocol of Ishai et al. yields, in the same model, an $\left(\varepsilon, δ\right)$-differentially private protocol for aggregation that, for any constant $\varepsilon > 0$ and any $δ= \frac{1}{\mathrm{poly}(n)}$, incurs only a constant error and requires only a constant number of messages per party. Previously, such a protocol was known only for $Ω(\log n)$ messages per party. △ Less

Submitted 15 October, 2019; v1 submitted 24 September, 2019; originally announced September 2019.

Comments: 31 pages; 1 table

arXiv:1909.01410 [pdf, ps, other]

Oblivious Sketching of High-Degree Polynomial Kernels

Authors: Thomas D. Ahle, Michael Kapralov, Jakob B. T. Knudsen, Rasmus Pagh, Ameya Velingker, David Woodruff, Amir Zandieh

Abstract: Kernel methods are fundamental tools in machine learning that allow detection of non-linear dependencies between data without explicitly constructing feature vectors in high dimensional spaces. A major disadvantage of kernel methods is their poor scalability: primitives such as kernel PCA or kernel ridge regression generally take prohibitively large quadratic space and (at least) quadratic time, a… ▽ More Kernel methods are fundamental tools in machine learning that allow detection of non-linear dependencies between data without explicitly constructing feature vectors in high dimensional spaces. A major disadvantage of kernel methods is their poor scalability: primitives such as kernel PCA or kernel ridge regression generally take prohibitively large quadratic space and (at least) quadratic time, as kernel matrices are usually dense. Some methods for speeding up kernel linear algebra are known, but they all invariably take time exponential in either the dimension of the input point set (e.g., fast multipole methods suffer from the curse of dimensionality) or in the degree of the kernel function. Oblivious sketching has emerged as a powerful approach to speeding up numerical linear algebra over the past decade, but our understanding of oblivious sketching solutions for kernel matrices has remained quite limited, suffering from the aforementioned exponential dependence on input parameters. Our main contribution is a general method for applying sketching solutions developed in numerical linear algebra over the past decade to a tensoring of data points without forming the tensoring explicitly. This leads to the first oblivious sketch for the polynomial kernel with a target dimension that is only polynomially dependent on the degree of the kernel function, as well as the first oblivious sketch for the Gaussian kernel on bounded datasets that does not suffer from an exponential dependence on the dimensionality of input data points. △ Less

Submitted 22 December, 2020; v1 submitted 3 September, 2019; originally announced September 2019.

arXiv:1908.11358 [pdf, other]

On the Power of Multiple Anonymous Messages

Authors: Badih Ghazi, Noah Golowich, Ravi Kumar, Rasmus Pagh, Ameya Velingker

Abstract: An exciting new development in differential privacy is the shuffled model, in which an anonymous channel enables non-interactive, differentially private protocols with error much smaller than what is possible in the local model, while relying on weaker trust assumptions than in the central model. In this paper, we study basic counting problems in the shuffled model and establish separations betwee… ▽ More An exciting new development in differential privacy is the shuffled model, in which an anonymous channel enables non-interactive, differentially private protocols with error much smaller than what is possible in the local model, while relying on weaker trust assumptions than in the central model. In this paper, we study basic counting problems in the shuffled model and establish separations between the error that can be achieved in the single-message shuffled model and in the shuffled model with multiple messages per user. For the problem of frequency estimation for $n$ users and a domain of size $B$, we obtain: - A nearly tight lower bound of $\tildeΩ( \min(\sqrt[4]{n}, \sqrt{B}))$ on the error in the single-message shuffled model. This implies that the protocols obtained from the amplification via shuffling work of Erlingsson et al. (SODA 2019) and Balle et al. (Crypto 2019) are essentially optimal for single-message protocols. A key ingredient in the proof is a lower bound on the error of locally-private frequency estimation in the low-privacy (aka high $ε$) regime. - Protocols in the multi-message shuffled model with $poly(\log{B}, \log{n})$ bits of communication per user and $poly\log{B}$ error, which provide an exponential improvement on the error compared to what is possible with single-message algorithms. For the related selection problem on a domain of size $B$, we prove: - A nearly tight lower bound of $Ω(B)$ on the number of users in the single-message shuffled model. This significantly improves on the $Ω(B^{1/17})$ lower bound obtained by Cheu et al. (Eurocrypt 2019), and when combined with their $\tilde{O}(\sqrt{B})$-error multi-message protocol, implies the first separation between single-message and multi-message protocols for this problem. △ Less

Submitted 19 May, 2020; v1 submitted 29 August, 2019; originally announced August 2019.

Comments: 70 pages, 2 figures, 3 tables

arXiv:1906.08320 [pdf, other]

Scalable and Differentially Private Distributed Aggregation in the Shuffled Model

Authors: Badih Ghazi, Rasmus Pagh, Ameya Velingker

Abstract: Federated learning promises to make machine learning feasible on distributed, private datasets by implementing gradient descent using secure aggregation methods. The idea is to compute a global weight update without revealing the contributions of individual users. Current practical protocols for secure aggregation work in an "honest but curious" setting where a curious adversary observing all comm… ▽ More Federated learning promises to make machine learning feasible on distributed, private datasets by implementing gradient descent using secure aggregation methods. The idea is to compute a global weight update without revealing the contributions of individual users. Current practical protocols for secure aggregation work in an "honest but curious" setting where a curious adversary observing all communication to and from the server cannot learn any private information assuming the server is honest and follows the protocol. A more scalable and robust primitive for privacy-preserving protocols is shuffling of user data, so as to hide the origin of each data item. Highly scalable and secure protocols for shuffling, so-called mixnets, have been proposed as a primitive for privacy-preserving analytics in the Encode-Shuffle-Analyze framework by Bittau et al., which was later analytically studied by Erlingsson et al. and Cheu et al.. The recent papers by Cheu et al., and Balle et al. have given protocols for secure aggregation that achieve differential privacy guarantees in this "shuffled model". Their protocols come at a cost, though: Either the expected aggregation error or the amount of communication per user scales as a polynomial $n^{Ω(1)}$ in the number of users $n$. In this paper we propose simple and more efficient protocol for aggregation in the shuffled model, where communication as well as error increases only polylogarithmically in $n$. Our new technique is a conceptual "invisibility cloak" that makes users' data almost indistinguishable from random noise while introducing zero distortion on the sum. △ Less

Submitted 2 December, 2019; v1 submitted 19 June, 2019; originally announced June 2019.

Comments: 17 pages, 1 figure, 1 table, 2 algorithms

arXiv:1902.10633 [pdf, ps, other]

Dimension-independent Sparse Fourier Transform

Authors: Michael Kapralov, Ameya Velingker, Amir Zandieh

Abstract: The Discrete Fourier Transform (DFT) is a fundamental computational primitive, and the fastest known algorithm for computing the DFT is the FFT (Fast Fourier Transform) algorithm. One remarkable feature of FFT is the fact that its runtime depends only on the size $N$ of the input vector, but not on the dimensionality of the input domain: FFT runs in time $O(N\log N)$ irrespective of whether the DF… ▽ More The Discrete Fourier Transform (DFT) is a fundamental computational primitive, and the fastest known algorithm for computing the DFT is the FFT (Fast Fourier Transform) algorithm. One remarkable feature of FFT is the fact that its runtime depends only on the size $N$ of the input vector, but not on the dimensionality of the input domain: FFT runs in time $O(N\log N)$ irrespective of whether the DFT in question is on $\mathbb{Z}_N$ or $\mathbb{Z}_n^d$ for some $d>1$, where $N=n^d$. The state of the art for Sparse FFT, i.e. the problem of computing the DFT of a signal that has at most $k$ nonzeros in Fourier domain, is very different: all current techniques for sublinear time computation of Sparse FFT incur an exponential dependence on the dimension $d$ in the runtime. In this paper we give the first algorithm that computes the DFT of a $k$-sparse signal in time $\text{poly}(k, \log N)$ in any dimension $d$, avoiding the curse of dimensionality inherent in all previously known techniques. Our main tool is a new class of filters that we refer to as adaptive aliasing filters: these filters allow isolating frequencies of a $k$-Fourier sparse signal using $O(k)$ samples in time domain and $O(k\log N)$ runtime per frequency, in any dimension $d$. We also investigate natural average case models of the input signal: (1) worst case support in Fourier domain with randomized coefficients and (2) random locations in Fourier domain with worst case coefficients. Our techniques lead to an $\widetilde O(k^2)$ time algorithm for the former and an $\widetilde O(k)$ time algorithm for the latter. △ Less

Submitted 27 February, 2019; originally announced February 2019.

arXiv:1812.08723 [pdf, ps, other]

A Universal Sampling Method for Reconstructing Signals with Simple Fourier Transforms

Authors: Haim Avron, Michael Kapralov, Cameron Musco, Christopher Musco, Ameya Velingker, Amir Zandieh

Abstract: Reconstructing continuous signals from a small number of discrete samples is a fundamental problem across science and engineering. In practice, we are often interested in signals with 'simple' Fourier structure, such as bandlimited, multiband, and Fourier sparse signals. More broadly, any prior knowledge about a signal's Fourier power spectrum can constrain its complexity. Intuitively, signals wit… ▽ More Reconstructing continuous signals from a small number of discrete samples is a fundamental problem across science and engineering. In practice, we are often interested in signals with 'simple' Fourier structure, such as bandlimited, multiband, and Fourier sparse signals. More broadly, any prior knowledge about a signal's Fourier power spectrum can constrain its complexity. Intuitively, signals with more highly constrained Fourier structure require fewer samples to reconstruct. We formalize this intuition by showing that, roughly, a continuous signal from a given class can be approximately reconstructed using a number of samples proportional to the *statistical dimension* of the allowed power spectrum of that class. Further, in nearly all settings, this natural measure tightly characterizes the sample complexity of signal reconstruction. Surprisingly, we also show that, up to logarithmic factors, a universal non-uniform sampling strategy can achieve this optimal complexity for *any class of signals*. We present a simple and efficient algorithm for recovering a signal from the samples taken. For bandlimited and sparse signals, our method matches the state-of-the-art. At the same time, it gives the first computationally and sample efficient solution to a broad range of problems, including multiband signal reconstruction and kriging and Gaussian process regression tasks in one dimension. Our work is based on a novel connection between randomized linear algebra and signal reconstruction with constrained Fourier structure. We extend tools based on statistical leverage score sampling and column-based matrix reconstruction to the approximation of continuous linear operators that arise in signal reconstruction. We believe that these extensions are of independent interest and serve as a foundation for tackling a broad range of continuous time problems using randomized methods. △ Less

Submitted 20 December, 2018; originally announced December 2018.

arXiv:1804.09893 [pdf, other]

Random Fourier Features for Kernel Ridge Regression: Approximation Bounds and Statistical Guarantees

Authors: Haim Avron, Michael Kapralov, Cameron Musco, Christopher Musco, Ameya Velingker, Amir Zandieh

Abstract: Random Fourier features is one of the most popular techniques for scaling up kernel methods, such as kernel ridge regression. However, despite impressive empirical results, the statistical properties of random Fourier features are still not well understood. In this paper we take steps toward filling this gap. Specifically, we approach random Fourier features from a spectral matrix approximation po… ▽ More Random Fourier features is one of the most popular techniques for scaling up kernel methods, such as kernel ridge regression. However, despite impressive empirical results, the statistical properties of random Fourier features are still not well understood. In this paper we take steps toward filling this gap. Specifically, we approach random Fourier features from a spectral matrix approximation point of view, give tight bounds on the number of Fourier features required to achieve a spectral approximation, and show how spectral matrix approximation bounds imply statistical guarantees for kernel ridge regression. Qualitatively, our results are twofold: on the one hand, we show that random Fourier feature approximation can provably speed up kernel ridge regression under reasonable assumptions. At the same time, we show that the method is suboptimal, and sampling from a modified distribution in Fourier space, given by the leverage function of the kernel, yields provably better performance. We study this optimal sampling distribution for the Gaussian kernel, achieving a nearly complete characterization for the case of low-dimensional bounded datasets. Based on this characterization, we propose an efficient sampling scheme with guarantees superior to random Fourier features in this regime. △ Less

Submitted 21 May, 2018; v1 submitted 26 April, 2018; originally announced April 2018.

Comments: An extended abstract of this work appears in the Proceedings of the 34th International Conference on Machine Learning (ICML 2017)

arXiv:1605.08792 [pdf, other]

Bridging the Capacity Gap Between Interactive and One-Way Communication

Authors: Bernhard Haeupler, Ameya Velingker

Abstract: We study the communication rate of coding schemes for interactive communication that transform any two-party interactive protocol into a protocol that is robust to noise. Recently, Haeupler (FOCS '14) showed that if an $ε> 0$ fraction of transmissions are corrupted, adversarially or randomly, then it is possible to achieve a communication rate of $1 - \widetilde{O}(\sqrtε)$. Furthermore, Haeuple… ▽ More We study the communication rate of coding schemes for interactive communication that transform any two-party interactive protocol into a protocol that is robust to noise. Recently, Haeupler (FOCS '14) showed that if an $ε> 0$ fraction of transmissions are corrupted, adversarially or randomly, then it is possible to achieve a communication rate of $1 - \widetilde{O}(\sqrtε)$. Furthermore, Haeupler conjectured that this rate is optimal for general input protocols. This stands in contrast to the classical setting of one-way communication in which error-correcting codes are known to achieve an optimal communication rate of $1 - Θ(H(ε)) = 1 - \widetildeΘ(ε)$. In this work, we show that the quadratically smaller rate loss of the one-way setting can also be achieved in interactive coding schemes for a very natural class of input protocols. We introduce the notion of average message length, or the average number of bits a party sends before receiving a reply, as a natural parameter for measuring the level of interactivity in a protocol. Moreover, we show that any protocol with average message length $\ell = Ω(\mathrm{poly}(1/ε))$ can be simulated by a protocol with optimal communication rate $1 - Θ(H(ε))$ over an oblivious adversarial channel with error fraction $ε$. Furthermore, under the additional assumption of access to public shared randomness, the optimal communication rate is achieved ratelessly, i.e., the communication rate adapts automatically to the actual error rate $ε$ without having to specify it in advance. This shows that the capacity gap between one-way and interactive communication can be bridged even for very small (constant in $ε$) average message lengths, which are likely to be found in many applications. △ Less

Submitted 27 May, 2016; originally announced May 2016.

arXiv:1502.08048 [pdf, other]

Approximating Nearest Neighbor Distances

Authors: Michael B. Cohen, Brittany Terese Fasy, Gary L. Miller, Amir Nayyeri, Donald R. Sheehy, Ameya Velingker

Abstract: Several researchers proposed using non-Euclidean metrics on point sets in Euclidean space for clustering noisy data. Almost always, a distance function is desired that recognizes the closeness of the points in the same cluster, even if the Euclidean cluster diameter is large. Therefore, it is preferred to assign smaller costs to the paths that stay close to the input points. In this paper, we co… ▽ More Several researchers proposed using non-Euclidean metrics on point sets in Euclidean space for clustering noisy data. Almost always, a distance function is desired that recognizes the closeness of the points in the same cluster, even if the Euclidean cluster diameter is large. Therefore, it is preferred to assign smaller costs to the paths that stay close to the input points. In this paper, we consider the most natural metric with this property, which we call the nearest neighbor metric. Given a point set P and a path $γ$, our metric charges each point of $γ$ with its distance to P. The total charge along $γ$ determines its nearest neighbor length, which is formally defined as the integral of the distance to the input points along the curve. We describe a $(3+\varepsilon)$-approximation algorithm and a $(1+\varepsilon)$-approximation algorithm to compute the nearest neighbor metric. Both approximation algorithms work in near-linear time. The former uses shortest paths on a sparse graph using only the input points. The latter uses a sparse sample of the ambient space, to find good approximate geodesic paths. △ Less

Submitted 27 February, 2015; originally announced February 2015.

Comments: corrected author name

arXiv:1502.07410 [pdf, other]

Towards Constructing Ramanujan Graphs Using Shift Lifts

Authors: Karthekeyan Chandrasekaran, Ameya Velingker

Abstract: In a breakthrough work, Marcus-Spielman-Srivastava recently showed that every $d$-regular bipartite Ramanujan graph has a 2-lift that is also $d$-regular bipartite Ramanujan. As a consequence, a straightforward iterative brute-force search algorithm leads to the construction of a $d$-regular bipartite Ramanujan graph on $N$ vertices in time $2^{O(dN)}$. Shift $k$-lifts studied by Agarwal-Kolla-Mad… ▽ More In a breakthrough work, Marcus-Spielman-Srivastava recently showed that every $d$-regular bipartite Ramanujan graph has a 2-lift that is also $d$-regular bipartite Ramanujan. As a consequence, a straightforward iterative brute-force search algorithm leads to the construction of a $d$-regular bipartite Ramanujan graph on $N$ vertices in time $2^{O(dN)}$. Shift $k$-lifts studied by Agarwal-Kolla-Madan lead to a natural approach for constructing Ramanujan graphs more efficiently. The number of possible shift $k$-lifts of a $d$-regular $n$-vertex graph is $k^{nd/2}$. Suppose the following holds for $k=2^{Ω(n)}$: There exists a shift $k$-lift that maintains the Ramanujan property of $d$-regular bipartite graphs on $n$ vertices for all $n$. (*) Then, by performing a similar brute-force search algorithm, one would be able to construct an $N$-vertex bipartite Ramanujan graph in time $2^{O(d\,log^2 N)}$. Furthermore, if (*) holds for all $k \geq 2$, then one would obtain an algorithm that runs in $\mathrm{poly}_d(N)$ time. In this work, we take a first step towards proving (*) by showing the existence of shift $k$-lifts that preserve the Ramanujan property in $d$-regular bipartite graphs for $k=3,4$. △ Less

Submitted 31 August, 2015; v1 submitted 25 February, 2015; originally announced February 2015.

arXiv:1411.6993 [pdf, ps, other]

An Entropy Sumset Inequality and Polynomially Fast Convergence to Shannon Capacity Over All Alphabets

Authors: Venkatesan Guruswami, Ameya Velingker

Abstract: We prove a lower estimate on the increase in entropy when two copies of a conditional random variable $X | Y$, with $X$ supported on $\mathbb{Z}_q=\{0,1,\dots,q-1\}$ for prime $q$, are summed modulo $q$. Specifically, given two i.i.d copies $(X_1,Y_1)$ and $(X_2,Y_2)$ of a pair of random variables $(X,Y)$, with $X$ taking values in $\mathbb{Z}_q$, we show \[ H(X_1 + X_2 \mid Y_1, Y_2) - H(X|Y) \ge… ▽ More We prove a lower estimate on the increase in entropy when two copies of a conditional random variable $X | Y$, with $X$ supported on $\mathbb{Z}_q=\{0,1,\dots,q-1\}$ for prime $q$, are summed modulo $q$. Specifically, given two i.i.d copies $(X_1,Y_1)$ and $(X_2,Y_2)$ of a pair of random variables $(X,Y)$, with $X$ taking values in $\mathbb{Z}_q$, we show \[ H(X_1 + X_2 \mid Y_1, Y_2) - H(X|Y) \ge α(q) \cdot H(X|Y) (1-H(X|Y)) \] for some $α(q) > 0$, where $H(\cdot)$ is the normalized (by factor $\log_2 q$) entropy. Our motivation is an effective analysis of the finite-length behavior of polar codes, and the assumption of $q$ being prime is necessary. For $X$ supported on infinite groups without a finite subgroup and no conditioning, a sumset inequality for the absolute increase in (unnormalized) entropy was shown by Tao (2010). We use our sumset inequality to analyze Arıkan's construction of polar codes and prove that for any $q$-ary source $X$, where $q$ is any fixed prime, and any $ε> 0$, polar codes allow {\em efficient} data compression of $N$ i.i.d. copies of $X$ into $(H(X)+ε)N$ $q$-ary symbols, as soon as $N$ is polynomially large in $1/ε$. We can get capacity-achieving source codes with similar guarantees for composite alphabets, by factoring $q$ into primes and combining different polar codes for each prime in factorization. A consequence of our result for noisy channel coding is that for {\em all} discrete memoryless channels, there are explicit codes enabling reliable communication within $ε> 0$ of the symmetric Shannon capacity for a block length and decoding complexity bounded by a polynomial in $1/ε$. The result was previously shown for the special case of binary input channels (Guruswami-Xia '13 and Hassani-Alishahi-Urbanke '13), and this work extends the result to channels over any alphabet. △ Less

Submitted 25 November, 2014; originally announced November 2014.

arXiv:1304.0524 [pdf, other]

A Fast Algorithm for Well-Spaced Points and Approximate Delaunay Graphs

Authors: Gary L. Miller, Donald R. Sheehy, Ameya Velingker

Abstract: We present a new algorithm that produces a well-spaced superset of points conforming to a given input set in any dimension with guaranteed optimal output size. We also provide an approximate Delaunay graph on the output points. Our algorithm runs in expected time $O(2^{O(d)}(n\log n + m))$, where $n$ is the input size, $m$ is the output point set size, and $d$ is the ambient dimension. The constan… ▽ More We present a new algorithm that produces a well-spaced superset of points conforming to a given input set in any dimension with guaranteed optimal output size. We also provide an approximate Delaunay graph on the output points. Our algorithm runs in expected time $O(2^{O(d)}(n\log n + m))$, where $n$ is the input size, $m$ is the output point set size, and $d$ is the ambient dimension. The constants only depend on the desired element quality bounds. To gain this new efficiency, the algorithm approximately maintains the Voronoi diagram of the current set of points by storing a superset of the Delaunay neighbors of each point. By retaining quality of the Voronoi diagram and avoiding the storage of the full Voronoi diagram, a simple exponential dependence on $d$ is obtained in the running time. Thus, if one only wants the approximate neighbors structure of a refined Delaunay mesh conforming to a set of input points, the algorithm will return a size $2^{O(d)}m$ graph in $2^{O(d)}(n\log n + m)$ expected time. If $m$ is superlinear in $n$, then we can produce a hierarchically well-spaced superset of size $2^{O(d)}n$ in $2^{O(d)}n\log n$ expected time. △ Less

Submitted 1 April, 2013; originally announced April 2013.

Comments: Full version

ACM Class: F.2.2

arXiv:1207.1140 [pdf, ps, other]

Restricted Isometry of Fourier Matrices and List Decodability of Random Linear Codes

Authors: Mahdi Cheraghchi, Venkatesan Guruswami, Ameya Velingker

Abstract: We prove that a random linear code over F_q, with probability arbitrarily close to 1, is list decodable at radius (1-1/q-ε) with list size L=O(1/ε^2) and rate R=Ω_q(ε^2/(log^3(1/ε))). Up to the polylogarithmic factor in (1/ε) and constant factors depending on q, this matches the lower bound L=Ω_q(1/ε^2) for the list size and upper bound R=O_q(ε^2) for the rate. Previously only existence (and not a… ▽ More We prove that a random linear code over F_q, with probability arbitrarily close to 1, is list decodable at radius (1-1/q-ε) with list size L=O(1/ε^2) and rate R=Ω_q(ε^2/(log^3(1/ε))). Up to the polylogarithmic factor in (1/ε) and constant factors depending on q, this matches the lower bound L=Ω_q(1/ε^2) for the list size and upper bound R=O_q(ε^2) for the rate. Previously only existence (and not abundance) of such codes was known for the special case q=2 (Guruswami, Håstad, Sudan and Zuckerman, 2002). In order to obtain our result, we employ a relaxed version of the well known Johnson bound on list decoding that translates the average Hamming distance between codewords to list decoding guarantees. We furthermore prove that the desired average-distance guarantees hold for a code provided that a natural complex matrix encoding the codewords satisfies the Restricted Isometry Property with respect to the Euclidean norm (RIP-2). For the case of random binary linear codes, this matrix coincides with a random submatrix of the Hadamard-Walsh transform matrix that is well studied in the compressed sensing literature. Finally, we improve the analysis of Rudelson and Vershynin (2008) on the number of random frequency samples required for exact reconstruction of k-sparse signals of length N. Specifically, we improve the number of samples from O(k log(N) log^2(k) (log k + loglog N)) to O(k log(N) log^3(k)). The proof involves bounding the expected supremum of a related Gaussian process by using an improved analysis of the metric defined by the process. This improvement is crucial for our application in list decoding. △ Less

Submitted 4 July, 2012; originally announced July 2012.

Comments: Preliminary full version

Showing 1–22 of 22 results for author: Velingker, A