subscribe to arXiv mailings

Closing the Computational-Query Depth Gap in Parallel Stochastic Convex Optimization

Authors: Arun Jambulapati, Aaron Sidford, Kevin Tian

Abstract: We develop a new parallel algorithm for minimizing Lipschitz, convex functions with a stochastic subgradient oracle. The total number of queries made and the query depth, i.e., the number of parallel rounds of queries, match the prior state-of-the-art, [CJJLLST23], while improving upon the computational depth by a polynomial factor for sufficiently small accuracy. When combined with previous state… ▽ More We develop a new parallel algorithm for minimizing Lipschitz, convex functions with a stochastic subgradient oracle. The total number of queries made and the query depth, i.e., the number of parallel rounds of queries, match the prior state-of-the-art, [CJJLLST23], while improving upon the computational depth by a polynomial factor for sufficiently small accuracy. When combined with previous state-of-the-art methods our result closes a gap between the best-known query depth and the best-known computational depth of parallel algorithms. Our method starts with a ball acceleration framework of previous parallel methods, i.e., [CJJJLST20, ACJJS21], which reduce the problem to minimizing a regularized Gaussian convolution of the function constrained to Euclidean balls. By developing and leveraging new stability properties of the Hessian of this induced function, we depart from prior parallel algorithms and reduce these ball-constrained optimization problems to stochastic unconstrained quadratic minimization problems. Although we are unable to prove concentration of the asymmetric matrices that we use to approximate this Hessian, we nevertheless develop an efficient parallel method for solving these quadratics. Interestingly, our algorithms can be improved using fast matrix multiplication and use nearly-linear work if the matrix multiplication exponent is 2. △ Less

Submitted 11 June, 2024; originally announced June 2024.

arXiv:2403.03905 [pdf, other]

Black-Box $k$-to-$1$-PCA Reductions: Theory and Applications

Authors: Arun Jambulapati, Syamantak Kumar, Jerry Li, Shourya Pandey, Ankit Pensia, Kevin Tian

Abstract: The $k$-principal component analysis ($k$-PCA) problem is a fundamental algorithmic primitive that is widely-used in data analysis and dimensionality reduction applications. In statistical settings, the goal of $k$-PCA is to identify a top eigenspace of the covariance matrix of a distribution, which we only have black-box access to via samples. Motivated by these settings, we analyze black-box def… ▽ More The $k$-principal component analysis ($k$-PCA) problem is a fundamental algorithmic primitive that is widely-used in data analysis and dimensionality reduction applications. In statistical settings, the goal of $k$-PCA is to identify a top eigenspace of the covariance matrix of a distribution, which we only have black-box access to via samples. Motivated by these settings, we analyze black-box deflation methods as a framework for designing $k$-PCA algorithms, where we model access to the unknown target matrix via a black-box $1$-PCA oracle which returns an approximate top eigenvector, under two popular notions of approximation. Despite being arguably the most natural reduction-based approach to $k$-PCA algorithm design, such black-box methods, which recursively call a $1$-PCA oracle $k$ times, were previously poorly-understood. Our main contribution is significantly sharper bounds on the approximation parameter degradation of deflation methods for $k$-PCA. For a quadratic form notion of approximation we term ePCA (energy PCA), we show deflation methods suffer no parameter loss. For an alternative well-studied approximation notion we term cPCA (correlation PCA), we tightly characterize the parameter regimes where deflation methods are feasible. Moreover, we show that in all feasible regimes, $k$-cPCA deflation algorithms suffer no asymptotic parameter loss for any constant $k$. We apply our framework to obtain state-of-the-art $k$-PCA algorithms robust to dataset contamination, improving prior work in sample complexity by a $\mathsf{poly}(k)$ factor. △ Less

Submitted 11 June, 2024; v1 submitted 6 March, 2024; originally announced March 2024.

arXiv:2402.13187 [pdf, other]

Testing Calibration in Nearly-Linear Time

Authors: Lunjia Hu, Arun Jambulapati, Kevin Tian, Chutong Yang

Abstract: In the recent literature on machine learning and decision making, calibration has emerged as a desirable and widely-studied statistical property of the outputs of binary prediction models. However, the algorithmic aspects of measuring model calibration have remained relatively less well-explored. Motivated by [BGHN23], which proposed a rigorous framework for measuring distances to calibration, we… ▽ More In the recent literature on machine learning and decision making, calibration has emerged as a desirable and widely-studied statistical property of the outputs of binary prediction models. However, the algorithmic aspects of measuring model calibration have remained relatively less well-explored. Motivated by [BGHN23], which proposed a rigorous framework for measuring distances to calibration, we initiate the algorithmic study of calibration through the lens of property testing. We define the problem of calibration testing from samples where given $n$ draws from a distribution $\mathcal{D}$ on $(predictions, binary outcomes)$, our goal is to distinguish between the case where $\mathcal{D}$ is perfectly calibrated, and the case where $\mathcal{D}$ is $\varepsilon$-far from calibration. We make the simple observation that the empirical smooth calibration linear program can be reformulated as an instance of minimum-cost flow on a highly-structured graph, and design an exact dynamic programming-based solver for it which runs in time $O(n\log^2(n))$, and solves the calibration testing problem information-theoretically optimally in the same time. This improves upon state-of-the-art black-box linear program solvers requiring $Ω(n^ω)$ time, where $ω> 2$ is the exponent of matrix multiplication. We also develop algorithms for tolerant variants of our testing problem improving upon black-box linear program solvers, and give sample complexity lower bounds for alternative calibration measures to the one considered in this work. Finally, we present experiments showing the testing problem we define faithfully captures standard notions of calibration, and that our algorithms scale efficiently to accommodate large sample sizes. △ Less

Submitted 21 June, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

arXiv:2311.18145 [pdf, ps, other]

Sparsifying generalized linear models

Authors: Arun Jambulapati, James R. Lee, Yang P. Liu, Aaron Sidford

Abstract: We consider the sparsification of sums $F : \mathbb{R}^n \to \mathbb{R}$ where $F(x) = f_1(\langle a_1,x\rangle) + \cdots + f_m(\langle a_m,x\rangle)$ for vectors $a_1,\ldots,a_m \in \mathbb{R}^n$ and functions $f_1,\ldots,f_m : \mathbb{R} \to \mathbb{R}_+$. We show that $(1+\varepsilon)$-approximate sparsifiers of $F$ with support size… ▽ More We consider the sparsification of sums $F : \mathbb{R}^n \to \mathbb{R}$ where $F(x) = f_1(\langle a_1,x\rangle) + \cdots + f_m(\langle a_m,x\rangle)$ for vectors $a_1,\ldots,a_m \in \mathbb{R}^n$ and functions $f_1,\ldots,f_m : \mathbb{R} \to \mathbb{R}_+$. We show that $(1+\varepsilon)$-approximate sparsifiers of $F$ with support size $\frac{n}{\varepsilon^2} (\log \frac{n}{\varepsilon})^{O(1)}$ exist whenever the functions $f_1,\ldots,f_m$ are symmetric, monotone, and satisfy natural growth bounds. Additionally, we give efficient algorithms to compute such a sparsifier assuming each $f_i$ can be evaluated efficiently. Our results generalize the classic case of $\ell_p$ sparsification, where $f_i(z) = |z|^p$, for $p \in (0, 2]$, and give the first near-linear size sparsifiers in the well-studied setting of the Huber loss function and its generalizations, e.g., $f_i(z) = \min\{|z|^p, |z|^2\}$ for $0 < p \leq 2$. Our sparsification algorithm can be applied to give near-optimal reductions for optimizing a variety of generalized linear models including $\ell_p$ regression for $p \in (1, 2]$ to high accuracy, via solving $(\log n)^{O(1)}$ sparse regression instances with $m \le n(\log n)^{O(1)}$, plus runtime proportional to the number of nonzero entries in the vectors $a_1, \dots, a_m$. △ Less

Submitted 29 November, 2023; originally announced November 2023.

arXiv:2311.10886 [pdf, other]

A Whole New Ball Game: A Primal Accelerated Method for Matrix Games and Minimizing the Maximum of Smooth Functions

Authors: Yair Carmon, Arun Jambulapati, Yujia Jin, Aaron Sidford

Abstract: We design algorithms for minimizing $\max_{i\in[n]} f_i(x)$ over a $d$-dimensional Euclidean or simplex domain. When each $f_i$ is $1$-Lipschitz and $1$-smooth, our method computes an $ε$-approximate solution using $\widetilde{O}(n ε^{-1/3} + ε^{-2})$ gradient and function evaluations, and $\widetilde{O}(n ε^{-4/3})$ additional runtime. For large $n$, our evaluation complexity is optimal up to pol… ▽ More We design algorithms for minimizing $\max_{i\in[n]} f_i(x)$ over a $d$-dimensional Euclidean or simplex domain. When each $f_i$ is $1$-Lipschitz and $1$-smooth, our method computes an $ε$-approximate solution using $\widetilde{O}(n ε^{-1/3} + ε^{-2})$ gradient and function evaluations, and $\widetilde{O}(n ε^{-4/3})$ additional runtime. For large $n$, our evaluation complexity is optimal up to polylogarithmic factors. In the special case where each $f_i$ is linear -- which corresponds to finding a near-optimal primal strategy in a matrix game -- our method finds an $ε$-approximate solution in runtime $\widetilde{O}(n (d/ε)^{2/3} + nd + dε^{-2})$. For $n>d$ and $ε=1/\sqrt{n}$ this improves over all existing first-order methods. When additionally $d = ω(n^{8/11})$ our runtime also improves over all known interior point methods. Our algorithm combines three novel primitives: (1) A dynamic data structure which enables efficient stochastic gradient estimation in small $\ell_2$ or $\ell_1$ balls. (2) A mirror descent algorithm tailored to our data structure implementing an oracle which minimizes the objective over these balls. (3) A simple ball oracle acceleration framework suitable for non-Euclidean geometry. △ Less

Submitted 17 November, 2023; originally announced November 2023.

arXiv:2310.18265 [pdf, other]

Structured Semidefinite Programming for Recovering Structured Preconditioners

Authors: Arun Jambulapati, Jerry Li, Christopher Musco, Kirankumar Shiragur, Aaron Sidford, Kevin Tian

Abstract: We develop a general framework for finding approximately-optimal preconditioners for solving linear systems. Leveraging this framework we obtain improved runtimes for fundamental preconditioning and linear system solving problems including the following. We give an algorithm which, given positive definite $\mathbf{K} \in \mathbb{R}^{d \times d}$ with $\mathrm{nnz}(\mathbf{K})$ nonzero entries, com… ▽ More We develop a general framework for finding approximately-optimal preconditioners for solving linear systems. Leveraging this framework we obtain improved runtimes for fundamental preconditioning and linear system solving problems including the following. We give an algorithm which, given positive definite $\mathbf{K} \in \mathbb{R}^{d \times d}$ with $\mathrm{nnz}(\mathbf{K})$ nonzero entries, computes an $ε$-optimal diagonal preconditioner in time $\widetilde{O}(\mathrm{nnz}(\mathbf{K}) \cdot \mathrm{poly}(κ^\star,ε^{-1}))$, where $κ^\star$ is the optimal condition number of the rescaled matrix. We give an algorithm which, given $\mathbf{M} \in \mathbb{R}^{d \times d}$ that is either the pseudoinverse of a graph Laplacian matrix or a constant spectral approximation of one, solves linear systems in $\mathbf{M}$ in $\widetilde{O}(d^2)$ time. Our diagonal preconditioning results improve state-of-the-art runtimes of $Ω(d^{3.5})$ attained by general-purpose semidefinite programming, and our solvers improve state-of-the-art runtimes of $Ω(d^ω)$ where $ω> 2.3$ is the current matrix multiplication constant. We attain our results via new algorithms for a class of semidefinite programs (SDPs) we call matrix-dictionary approximation SDPs, which we leverage to solve an associated problem we call matrix-dictionary recovery. △ Less

Submitted 27 October, 2023; originally announced October 2023.

Comments: Merge of arXiv:1812.06295 and arXiv:2008.01722

arXiv:2305.09049 [pdf, ps, other]

Sparsifying sums of norms

Authors: Arun Jambulapati, James R. Lee, Yang P. Liu, Aaron Sidford

Abstract: For any norms $N_1,\ldots,N_m$ on $\mathbb{R}^n$ and $N(x) := N_1(x)+\cdots+N_m(x)$, we show there is a sparsified norm $\tilde{N}(x) = w_1 N_1(x) + \cdots + w_m N_m(x)$ such that $|N(x) - \tilde{N}(x)| \leq εN(x)$ for all $x \in \mathbb{R}^n$, where $w_1,\ldots,w_m$ are non-negative weights, of which only $O(ε^{-2} n \log(n/ε) (\log n)^{2.5} )$ are non-zero. Additionally, if $N$ is… ▽ More For any norms $N_1,\ldots,N_m$ on $\mathbb{R}^n$ and $N(x) := N_1(x)+\cdots+N_m(x)$, we show there is a sparsified norm $\tilde{N}(x) = w_1 N_1(x) + \cdots + w_m N_m(x)$ such that $|N(x) - \tilde{N}(x)| \leq εN(x)$ for all $x \in \mathbb{R}^n$, where $w_1,\ldots,w_m$ are non-negative weights, of which only $O(ε^{-2} n \log(n/ε) (\log n)^{2.5} )$ are non-zero. Additionally, if $N$ is $\mathrm{poly}(n)$-equivalent to the Euclidean norm on $\mathbb{R}^n$, then such weights can be found with high probability in time $O(m (\log n)^{O(1)} + \mathrm{poly}(n)) T$, where $T$ is the time required to evaluate a norm $N_i$. This immediately yields analogous statements for sparsifying sums of symmetric submodular functions. More generally, we show how to sparsify sums of $p$th powers of norms when the sum is $p$-uniformly smooth. △ Less

Submitted 30 November, 2023; v1 submitted 15 May, 2023; originally announced May 2023.

arXiv:2305.08434 [pdf, other]

Linear-Sized Sparsifiers via Near-Linear Time Discrepancy Theory

Authors: Arun Jambulapati, Victor Reis, Kevin Tian

Abstract: Discrepancy theory provides powerful tools for producing higher-quality objects which "beat the union bound" in fundamental settings throughout combinatorics and computer science. However, this quality has often come at the price of more expensive algorithms. We introduce a new framework for bridging this gap, by allowing for the efficient implementation of discrepancy-theoretic primitives. Our fr… ▽ More Discrepancy theory provides powerful tools for producing higher-quality objects which "beat the union bound" in fundamental settings throughout combinatorics and computer science. However, this quality has often come at the price of more expensive algorithms. We introduce a new framework for bridging this gap, by allowing for the efficient implementation of discrepancy-theoretic primitives. Our framework repeatedly solves regularized optimization problems to low accuracy to approximate the partial coloring method of [Rot17], and simplifies and generalizes recent work of [JSS23] on fast algorithms for Spencer's theorem. In particular, our framework only requires that the discrepancy body of interest has exponentially large Gaussian measure and is expressible as a sublevel set of a symmetric, convex function. We combine this framework with new tools for proving Gaussian measure lower bounds to give improved algorithms for a variety of sparsification and coloring problems. As a first application, we use our framework to obtain an $\widetilde{O}(m \cdot ε^{-3.5})$ time algorithm for constructing an $ε$-approximate spectral sparsifier of an $m$-edge graph, matching the sparsity of [BSS14] up to constant factors and improving upon the $\widetilde{O}(m \cdot ε^{-6.5})$ runtime of [LeeS17]. We further give a state-of-the-art algorithm for constructing graph ultrasparsifiers and an almost-linear time algorithm for constructing linear-sized degree-preserving sparsifiers via discrepancy theory; in the latter case, such sparsifiers were not known to exist previously. We generalize these results to their analogs in sparsifying isotropic sums of positive semidefinite matrices. Finally, to demonstrate the versatility of our technique, we obtain a nearly-input-sparsity time constructive algorithm for Spencer's theorem (where we recover a recent result of [JSS23]). △ Less

Submitted 15 May, 2023; originally announced May 2023.

arXiv:2303.07774 [pdf, other]

Testing Causality for High Dimensional Data

Authors: Arun Jambulapati, Hilaf Hasson, Youngsuk Park, Yuyang Wang

Abstract: Determining causal relationship between high dimensional observations are among the most important tasks in scientific discoveries. In this paper, we revisited the \emph{linear trace method}, a technique proposed in~\citep{janzing2009telling,zscheischler2011testing} to infer the causal direction between two random variables of high dimensions. We strengthen the existing results significantly by pr… ▽ More Determining causal relationship between high dimensional observations are among the most important tasks in scientific discoveries. In this paper, we revisited the \emph{linear trace method}, a technique proposed in~\citep{janzing2009telling,zscheischler2011testing} to infer the causal direction between two random variables of high dimensions. We strengthen the existing results significantly by providing an improved tail analysis in addition to extending the results to nonlinear trace functionals with sharper confidence bounds under certain distributional assumptions. We obtain our results by interpreting the trace estimator in the causal regime as a function over random orthogonal matrices, where the concentration of Lipschitz functions over such space could be applied. We additionally propose a novel ridge-regularized variant of the estimator in \cite{zscheischler2011testing}, and give provable bounds relating the ridge-estimated terms to their ground-truth counterparts. We support our theoretical results with encouraging experiments on synthetic datasets, more prominently, under high-dimension low sample size regime. △ Less

Submitted 14 March, 2023; originally announced March 2023.

arXiv:2301.00457 [pdf, other]

ReSQueing Parallel and Private Stochastic Convex Optimization

Authors: Yair Carmon, Arun Jambulapati, Yujia Jin, Yin Tat Lee, Daogao Liu, Aaron Sidford, Kevin Tian

Abstract: We introduce a new tool for stochastic convex optimization (SCO): a Reweighted Stochastic Query (ReSQue) estimator for the gradient of a function convolved with a (Gaussian) probability density. Combining ReSQue with recent advances in ball oracle acceleration [CJJJLST20, ACJJS21], we develop algorithms achieving state-of-the-art complexities for SCO in parallel and private settings. For a SCO obj… ▽ More We introduce a new tool for stochastic convex optimization (SCO): a Reweighted Stochastic Query (ReSQue) estimator for the gradient of a function convolved with a (Gaussian) probability density. Combining ReSQue with recent advances in ball oracle acceleration [CJJJLST20, ACJJS21], we develop algorithms achieving state-of-the-art complexities for SCO in parallel and private settings. For a SCO objective constrained to the unit ball in $\mathbb{R}^d$, we obtain the following results (up to polylogarithmic factors). We give a parallel algorithm obtaining optimization error $ε_{\text{opt}}$ with $d^{1/3}ε_{\text{opt}}^{-2/3}$ gradient oracle query depth and $d^{1/3}ε_{\text{opt}}^{-2/3} + ε_{\text{opt}}^{-2}$ gradient queries in total, assuming access to a bounded-variance stochastic gradient estimator. For $ε_{\text{opt}} \in [d^{-1}, d^{-1/4}]$, our algorithm matches the state-of-the-art oracle depth of [BJLLS19] while maintaining the optimal total work of stochastic gradient descent. Given $n$ samples of Lipschitz loss functions, prior works [BFTT19, BFGT20, AFKT21, KLL21] established that if $n \gtrsim d ε_{\text{dp}}^{-2}$, $(ε_{\text{dp}}, δ)$-differential privacy is attained at no asymptotic cost to the SCO utility. However, these prior works all required a superlinear number of gradient queries. We close this gap for sufficiently large $n \gtrsim d^2 ε_{\text{dp}}^{-3}$, by using ReSQue to design an algorithm with near-linear gradient query complexity in this regime. △ Less

Submitted 27 October, 2023; v1 submitted 1 January, 2023; originally announced January 2023.

arXiv:2209.10539 [pdf, ps, other]

Chaining, Group Leverage Score Overestimates, and Fast Spectral Hypergraph Sparsification

Authors: Arun Jambulapati, Yang P. Liu, Aaron Sidford

Abstract: We present an algorithm that given any $n$-vertex, $m$-edge, rank $r$ hypergraph constructs a spectral sparsifier with $O(n \varepsilon^{-2} \log n \log r)$ hyperedges in nearly-linear $\widetilde{O}(mr)$ time. This improves in both size and efficiency over a line of work (Bansal-Svensson-Trevisan 2019, Kapralov-Krauthgamer-Tardos-Yoshida 2021) for which the previous best size was… ▽ More We present an algorithm that given any $n$-vertex, $m$-edge, rank $r$ hypergraph constructs a spectral sparsifier with $O(n \varepsilon^{-2} \log n \log r)$ hyperedges in nearly-linear $\widetilde{O}(mr)$ time. This improves in both size and efficiency over a line of work (Bansal-Svensson-Trevisan 2019, Kapralov-Krauthgamer-Tardos-Yoshida 2021) for which the previous best size was $O(\min\{n \varepsilon^{-4} \log^3 n,nr^3 \varepsilon^{-2} \log n\})$ and runtime was $\widetilde{O}(mr + n^{O(1)})$. Independent Result: In an independent work, Lee (Lee 2022) also shows how to compute a spectral hypergraph sparsifier with $O(n \varepsilon^{-2} \log n \log r)$ hyperedges. △ Less

Submitted 21 September, 2022; originally announced September 2022.

arXiv:2208.11644 [pdf, ps, other]

A Slightly Improved Bound for the KLS Constant

Authors: Arun Jambulapati, Yin Tat Lee, Santosh S. Vempala

Abstract: We refine the recent breakthrough technique of Klartag and Lehec to obtain an improved polylogarithmic bound for the KLS constant. We refine the recent breakthrough technique of Klartag and Lehec to obtain an improved polylogarithmic bound for the KLS constant. △ Less

Submitted 6 October, 2022; v1 submitted 24 August, 2022; originally announced August 2022.

Comments: minor revision fixing typos

arXiv:2206.08627 [pdf, other]

RECAPP: Crafting a More Efficient Catalyst for Convex Optimization

Authors: Yair Carmon, Arun Jambulapati, Yujia Jin, Aaron Sidford

Abstract: The accelerated proximal point algorithm (APPA), also known as "Catalyst", is a well-established reduction from convex optimization to approximate proximal point computation (i.e., regularized minimization). This reduction is conceptually elegant and yields strong convergence rate guarantees. However, these rates feature an extraneous logarithmic term arising from the need to compute each proximal… ▽ More The accelerated proximal point algorithm (APPA), also known as "Catalyst", is a well-established reduction from convex optimization to approximate proximal point computation (i.e., regularized minimization). This reduction is conceptually elegant and yields strong convergence rate guarantees. However, these rates feature an extraneous logarithmic term arising from the need to compute each proximal point to high accuracy. In this work, we propose a novel Relaxed Error Criterion for Accelerated Proximal Point (RECAPP) that eliminates the need for high accuracy subproblem solutions. We apply RECAPP to two canonical problems: finite-sum and max-structured minimization. For finite-sum problems, we match the best known complexity, previously obtained by carefully-designed problem-specific algorithms. For minimizing $\max_y f(x,y)$ where $f$ is convex in $x$ and strongly-concave in $y$, we improve on the best known (Catalyst-based) bound by a logarithmic factor. △ Less

Submitted 17 June, 2022; originally announced June 2022.

Comments: Accepted at ICML'22

arXiv:2205.15371 [pdf, other]

Optimal and Adaptive Monteiro-Svaiter Acceleration

Authors: Yair Carmon, Danielle Hausler, Arun Jambulapati, Yujia Jin, Aaron Sidford

Abstract: We develop a variant of the Monteiro-Svaiter (MS) acceleration framework that removes the need to solve an expensive implicit equation at every iteration. Consequently, for any $p\ge 2$ we improve the complexity of convex optimization with Lipschitz $p$th derivative by a logarithmic factor, matching a lower bound. We also introduce an MS subproblem solver that requires no knowledge of problem para… ▽ More We develop a variant of the Monteiro-Svaiter (MS) acceleration framework that removes the need to solve an expensive implicit equation at every iteration. Consequently, for any $p\ge 2$ we improve the complexity of convex optimization with Lipschitz $p$th derivative by a logarithmic factor, matching a lower bound. We also introduce an MS subproblem solver that requires no knowledge of problem parameters, and implement it as either a second- or first-order method by solving linear systems or applying MinRes, respectively. On logistic regression our method outperforms previous second-order momentum methods, but under-performs Newton's method; simply iterating our first-order adaptive subproblem solver performs comparably to L-BFGS. △ Less

Submitted 28 November, 2022; v1 submitted 30 May, 2022; originally announced May 2022.

arXiv:2205.06167 [pdf, ps, other]

Optimal Methods for Higher-Order Smooth Monotone Variational Inequalities

Authors: Deeksha Adil, Brian Bullins, Arun Jambulapati, Sushant Sachdeva

Abstract: In this work, we present new simple and optimal algorithms for solving the variational inequality (VI) problem for $p^{th}$-order smooth, monotone operators -- a problem that generalizes convex optimization and saddle-point problems. Recent works (Bullins and Lai (2020), Lin and Jordan (2021), Jiang and Mokhtari (2022)) present methods that achieve a rate of $\tilde{O}(ε^{-2/(p+1)})$ for… ▽ More In this work, we present new simple and optimal algorithms for solving the variational inequality (VI) problem for $p^{th}$-order smooth, monotone operators -- a problem that generalizes convex optimization and saddle-point problems. Recent works (Bullins and Lai (2020), Lin and Jordan (2021), Jiang and Mokhtari (2022)) present methods that achieve a rate of $\tilde{O}(ε^{-2/(p+1)})$ for $p\geq 1$, extending results by (Nemirovski (2004)) and (Monteiro and Svaiter (2012)) for $p=1,2$. A drawback to these approaches, however, is their reliance on a line search scheme. We provide the first $p^{\textrm{th}}$-order method that achieves a rate of $O(ε^{-2/(p+1)}).$ Our method does not rely on a line search routine, thereby improving upon previous rates by a logarithmic factor. Building on the Mirror Prox method of Nemirovski (2004), our algorithm works even in the constrained, non-Euclidean setting. Furthermore, we prove the optimality of our algorithm by constructing matching lower bounds. These are the first lower bounds for smooth MVIs beyond convex optimization for $p > 1$. This establishes a separation between solving smooth MVIs and smooth convex optimization, and settles the oracle complexity of solving $p^{\textrm{th}}$-order smooth MVIs. △ Less

Submitted 31 May, 2022; v1 submitted 12 May, 2022; originally announced May 2022.

Comments: 21 Pages

arXiv:2204.12721 [pdf, other]

Regularized Box-Simplex Games and Dynamic Decremental Bipartite Matching

Authors: Arun Jambulapati, Yujia Jin, Aaron Sidford, Kevin Tian

Abstract: Box-simplex games are a family of bilinear minimax objectives which encapsulate graph-structured problems such as maximum flow [She17], optimal transport [JST19], and bipartite matching [AJJ+22]. We develop efficient near-linear time, high-accuracy solvers for regularized variants of these games. Beyond the immediate applications of such solvers for computing Sinkhorn distances, a prominent tool i… ▽ More Box-simplex games are a family of bilinear minimax objectives which encapsulate graph-structured problems such as maximum flow [She17], optimal transport [JST19], and bipartite matching [AJJ+22]. We develop efficient near-linear time, high-accuracy solvers for regularized variants of these games. Beyond the immediate applications of such solvers for computing Sinkhorn distances, a prominent tool in machine learning, we show that these solvers can be used to obtain improved running times for maintaining a (fractional) $ε$-approximate maximum matching in a dynamic decremental bipartite graph against an adaptive adversary. We give a generic framework which reduces this dynamic matching problem to solving regularized graph-structured optimization problems to high accuracy. Through our reduction framework, our regularized box-simplex game solver implies a new algorithm for dynamic decremental bipartite matching in total time $\tilde{O}(m \cdot ε^{-3})$, from an initial graph with $m$ edges and $n$ nodes. We further show how to use recent advances in flow optimization [CKL+22] to improve our runtime to $m^{1 + o(1)} \cdot ε^{-2}$, thereby demonstrating the versatility of our reduction-based approach. These results improve upon the previous best runtime of $\tilde{O}(m \cdot ε^{-4})$ [BGS20] and illustrate the utility of using regularized optimization problem solvers for designing dynamic algorithms. △ Less

Submitted 13 June, 2022; v1 submitted 27 April, 2022; originally announced April 2022.

Comments: Accepted at ICALP'22

arXiv:2112.00722 [pdf, ps, other]

Faster Maxflow via Improved Dynamic Spectral Vertex Sparsifiers

Authors: Jan van den Brand, Yu Gao, Arun Jambulapati, Yin Tat Lee, Yang P. Liu, Richard Peng, Aaron Sidford

Abstract: We make several advances broadly related to the maintenance of electrical flows in weighted graphs undergoing dynamic resistance updates, including: 1. More efficient dynamic spectral vertex sparsification, achieved by faster length estimation of random walks in weighted graphs using Morris counters [Morris 1978, Nelson-Yu 2020]. 2. A direct reduction from detecting edges with large energy in… ▽ More We make several advances broadly related to the maintenance of electrical flows in weighted graphs undergoing dynamic resistance updates, including: 1. More efficient dynamic spectral vertex sparsification, achieved by faster length estimation of random walks in weighted graphs using Morris counters [Morris 1978, Nelson-Yu 2020]. 2. A direct reduction from detecting edges with large energy in dynamic electric flows to dynamic spectral vertex sparsifiers. 3. A procedure for turning algorithms for estimating a sequence of vectors under updates from an oblivious adversary to one that tolerates adaptive adversaries via the Gaussian-mechanism from differential privacy. Combining these pieces with modifications to prior robust interior point frameworks gives an algorithm that on graphs with $m$ edges computes a mincost flow with edge costs and capacities in $[1, U]$ in time $\widetilde{O}(m^{3/2-1/58} \log^2 U)$. In prior and independent work, [Axiotis-Mądry-Vladu FOCS 2021] also obtained an improved algorithm for sparse mincost flows on capacitated graphs. Our algorithm implies a $\widetilde{O}(m^{3/2-1/58} \log U)$ time maxflow algorithm, improving over the $\widetilde{O}(m^{3/2-1/328}\log U)$ time maxflow algorithm of [Gao-Liu-Peng FOCS 2021]. △ Less

Submitted 1 December, 2021; originally announced December 2021.

Comments: 63 pages

arXiv:2111.01848 [pdf, ps, other]

Improved Iteration Complexities for Overconstrained $p$-Norm Regression

Authors: Arun Jambulapati, Yang P. Liu, Aaron Sidford

Abstract: In this paper we obtain improved iteration complexities for solving $\ell_p$ regression. We provide methods which given any full-rank $\mathbf{A} \in \mathbb{R}^{n \times d}$ with $n \geq d$, $b \in \mathbb{R}^n$, and $p \geq 2$ solve $\min_{x \in \mathbb{R}^d} \left\|\mathbf{A} x - b\right\|_p$ to high precision in time dominated by that of solving $\widetilde{O}_p(d^{\frac{p-2}{3p-2}})$ linear s… ▽ More In this paper we obtain improved iteration complexities for solving $\ell_p$ regression. We provide methods which given any full-rank $\mathbf{A} \in \mathbb{R}^{n \times d}$ with $n \geq d$, $b \in \mathbb{R}^n$, and $p \geq 2$ solve $\min_{x \in \mathbb{R}^d} \left\|\mathbf{A} x - b\right\|_p$ to high precision in time dominated by that of solving $\widetilde{O}_p(d^{\frac{p-2}{3p-2}})$ linear systems in $\mathbf{A}^\top \mathbf{D} \mathbf{A}$ for positive diagonal matrices $\mathbf{D}$. This improves upon the previous best iteration complexity of $\widetilde{O}_p(n^{\frac{p-2}{3p-2}})$ (Adil, Kyng, Peng, Sachdeva 2019). As a corollary, we obtain an $\widetilde{O}(d^{1/3}ε^{-2/3})$ iteration complexity for approximate $\ell_\infty$ regression. Further, for $q \in (1, 2]$ and dual norm $q = p/(p-1)$ we provide an algorithm that solves $\ell_q$ regression in $\widetilde{O}(d^{\frac{p-2}{2p-2}})$ iterations. To obtain this result we analyze row reweightings (closely inspired by $\ell_p$-norm Lewis weights) which allow a closer connection between $\ell_2$ and $\ell_p$ regression. We provide adaptations of two different iterative optimization frameworks which leverage this connection and yield our results. The first framework is based on iterative refinement and multiplicative weights based width reduction and the second framework is based on highly smooth acceleration. Both approaches yield $\widetilde{O}_p(d^{\frac{p-2}{3p-2}})$ iteration methods but the second has a polynomial dependence on $p$ (as opposed to the exponential dependence of the first algorithm) and provides a new alternative to the previous state-of-the-art methods for $\ell_p$ regression for large $p$. △ Less

Submitted 10 November, 2021; v1 submitted 2 November, 2021; originally announced November 2021.

Comments: 30 pages

arXiv:2106.11938 [pdf, ps, other]

Robust Regression Revisited: Acceleration and Improved Estimation Rates

Authors: Arun Jambulapati, Jerry Li, Tselil Schramm, Kevin Tian

Abstract: We study fast algorithms for statistical regression problems under the strong contamination model, where the goal is to approximately optimize a generalized linear model (GLM) given adversarially corrupted samples. Prior works in this line of research were based on the robust gradient descent framework of Prasad et. al., a first-order method using biased gradient queries, or the Sever framework of… ▽ More We study fast algorithms for statistical regression problems under the strong contamination model, where the goal is to approximately optimize a generalized linear model (GLM) given adversarially corrupted samples. Prior works in this line of research were based on the robust gradient descent framework of Prasad et. al., a first-order method using biased gradient queries, or the Sever framework of Diakonikolas et. al., an iterative outlier-removal method calling a stationary point finder. We present nearly-linear time algorithms for robust regression problems with improved runtime or estimation guarantees compared to the state-of-the-art. For the general case of smooth GLMs (e.g. logistic regression), we show that the robust gradient descent framework of Prasad et. al. can be accelerated, and show our algorithm extends to optimizing the Moreau envelopes of Lipschitz GLMs (e.g. support vector machines), answering several open questions in the literature. For the well-studied case of robust linear regression, we present an alternative approach obtaining improved estimation rates over prior nearly-linear time algorithms. Interestingly, our method starts with an identifiability proof introduced in the context of the sum-of-squares algorithm of Bakshi and Prasad, which achieved optimal error rates while requiring large polynomial runtime and sample complexity. We reinterpret their proof within the Sever framework and obtain a dramatically faster and more sample-efficient algorithm under fewer distributional assumptions. △ Less

Submitted 22 June, 2021; originally announced June 2021.

Comments: 47 pages

arXiv:2106.09481 [pdf, other]

Stochastic Bias-Reduced Gradient Methods

Authors: Hilal Asi, Yair Carmon, Arun Jambulapati, Yujia Jin, Aaron Sidford

Abstract: We develop a new primitive for stochastic optimization: a low-bias, low-cost estimator of the minimizer $x_\star$ of any Lipschitz strongly-convex function. In particular, we use a multilevel Monte-Carlo approach due to Blanchet and Glynn to turn any optimal stochastic gradient method into an estimator of $x_\star$ with bias $δ$, variance $O(\log(1/δ))$, and an expected sampling cost of… ▽ More We develop a new primitive for stochastic optimization: a low-bias, low-cost estimator of the minimizer $x_\star$ of any Lipschitz strongly-convex function. In particular, we use a multilevel Monte-Carlo approach due to Blanchet and Glynn to turn any optimal stochastic gradient method into an estimator of $x_\star$ with bias $δ$, variance $O(\log(1/δ))$, and an expected sampling cost of $O(\log(1/δ))$ stochastic gradient evaluations. As an immediate consequence, we obtain cheap and nearly unbiased gradient estimators for the Moreau-Yoshida envelope of any Lipschitz convex function, allowing us to perform dimension-free randomized smoothing. We demonstrate the potential of our estimator through four applications. First, we develop a method for minimizing the maximum of $N$ functions, improving on recent results and matching a lower bound up to logarithmic factors. Second and third, we recover state-of-the-art rates for projection-efficient and gradient-efficient optimization using simple algorithms with a transparent analysis. Finally, we show that an improved version of our estimator would yield a nearly linear-time, optimal-utility, differentially-private non-smooth stochastic optimization method. △ Less

Submitted 28 October, 2021; v1 submitted 17 June, 2021; originally announced June 2021.

arXiv:2105.01778 [pdf, other]

Thinking Inside the Ball: Near-Optimal Minimization of the Maximal Loss

Authors: Yair Carmon, Arun Jambulapati, Yujia Jin, Aaron Sidford

Abstract: We characterize the complexity of minimizing $\max_{i\in[N]} f_i(x)$ for convex, Lipschitz functions $f_1,\ldots, f_N$. For non-smooth functions, existing methods require $O(Nε^{-2})$ queries to a first-order oracle to compute an $ε$-suboptimal point and $\tilde{O}(Nε^{-1})$ queries if the $f_i$ are $O(1/ε)$-smooth. We develop methods with improved complexity bounds of… ▽ More We characterize the complexity of minimizing $\max_{i\in[N]} f_i(x)$ for convex, Lipschitz functions $f_1,\ldots, f_N$. For non-smooth functions, existing methods require $O(Nε^{-2})$ queries to a first-order oracle to compute an $ε$-suboptimal point and $\tilde{O}(Nε^{-1})$ queries if the $f_i$ are $O(1/ε)$-smooth. We develop methods with improved complexity bounds of $\tilde{O}(Nε^{-2/3} + ε^{-8/3})$ in the non-smooth case and $\tilde{O}(Nε^{-2/3} + \sqrt{N}ε^{-1})$ in the $O(1/ε)$-smooth case. Our methods consist of a recently proposed ball optimization oracle acceleration algorithm (which we refine) and a careful implementation of said oracle for the softmax function. We also prove an oracle complexity lower bound scaling as $Ω(Nε^{-2/3})$, showing that our dependence on $N$ is optimal up to polylogarithmic factors. △ Less

Submitted 4 May, 2021; originally announced May 2021.

arXiv:2011.08806 [pdf, other]

Ultrasparse Ultrasparsifiers and Faster Laplacian System Solvers

Authors: Arun Jambulapati, Aaron Sidford

Abstract: In this paper we provide an $O(m (\log \log n)^{O(1)} \log(1/ε))$-expected time algorithm for solving Laplacian systems on $n$-node $m$-edge graphs, improving improving upon the previous best expected runtime of $O(m \sqrt{\log n} (\log \log n)^{O(1)} \log(1/ε))$ achieved by (Cohen, Kyng, Miller, Pachocki, Peng, Rao, Xu 2014). To obtain this result we provide efficient constructions of $\ell_p$-st… ▽ More In this paper we provide an $O(m (\log \log n)^{O(1)} \log(1/ε))$-expected time algorithm for solving Laplacian systems on $n$-node $m$-edge graphs, improving improving upon the previous best expected runtime of $O(m \sqrt{\log n} (\log \log n)^{O(1)} \log(1/ε))$ achieved by (Cohen, Kyng, Miller, Pachocki, Peng, Rao, Xu 2014). To obtain this result we provide efficient constructions of $\ell_p$-stretch graph approximations with improved stretch and sparsity bounds. Additionally, as motivation for this work, we show that for every set of vectors in $\mathbb{R}^d$ (not just those induced by graphs) and all $k > 1$ there exist ultrasparsifiers with $d-1 + O(d/\sqrt{k})$ re-weighted vectors of relative condition number at most $k$. For small $k$, this improves upon the previous best known relative condition number of $\tilde{O}(\sqrt{k \log d})$, which is only known for the graph case. △ Less

Submitted 31 March, 2023; v1 submitted 17 November, 2020; originally announced November 2020.

Comments: 56 pages. Updated version includes slightly improved running time and new sparsity bounds for graph ultrasparsifiers

arXiv:2011.03495 [pdf, other]

Semi-Streaming Bipartite Matching in Fewer Passes and Optimal Space

Authors: Sepehr Assadi, Arun Jambulapati, Yujia Jin, Aaron Sidford, Kevin Tian

Abstract: We provide $\widetilde{O}(ε^{-1})$-pass semi-streaming algorithms for computing $(1-ε)$-approximate maximum cardinality matchings in bipartite graphs. Our most efficient methods are deterministic and use optimal, $O(n)$, space, improving upon the space complexity of the previous state-of-the-art $\widetilde{O}(ε^{-1})$-pass algorithm of Ahn and Guha. To obtain our results we provide semi-streaming… ▽ More We provide $\widetilde{O}(ε^{-1})$-pass semi-streaming algorithms for computing $(1-ε)$-approximate maximum cardinality matchings in bipartite graphs. Our most efficient methods are deterministic and use optimal, $O(n)$, space, improving upon the space complexity of the previous state-of-the-art $\widetilde{O}(ε^{-1})$-pass algorithm of Ahn and Guha. To obtain our results we provide semi-streaming adaptations of more general continuous optimization tools. Further, we leverage these techniques to obtain improvements for streaming variants of approximate linear programming, optimal transport, exact matching, transshipment, and shortest path problems. △ Less

Submitted 3 August, 2021; v1 submitted 6 November, 2020; originally announced November 2020.

Comments: 54 pages, new version adds applications to transshipment

arXiv:2008.01722 [pdf, ps, other]

Fast and Near-Optimal Diagonal Preconditioning

Authors: Arun Jambulapati, Jerry Li, Christopher Musco, Aaron Sidford, Kevin Tian

Abstract: The convergence rates of iterative methods for solving a linear system $\mathbf{A} x = b$ typically depend on the condition number of the matrix $\mathbf{A}$. Preconditioning is a common way of speeding up these methods by reducing that condition number in a computationally inexpensive way. In this paper, we revisit the decades-old problem of how to best improve $\mathbf{A}$'s condition number by… ▽ More The convergence rates of iterative methods for solving a linear system $\mathbf{A} x = b$ typically depend on the condition number of the matrix $\mathbf{A}$. Preconditioning is a common way of speeding up these methods by reducing that condition number in a computationally inexpensive way. In this paper, we revisit the decades-old problem of how to best improve $\mathbf{A}$'s condition number by left or right diagonal rescaling. We make progress on this problem in several directions. First, we provide new bounds for the classic heuristic of scaling $\mathbf{A}$ by its diagonal values (a.k.a. Jacobi preconditioning). We prove that this approach reduces $\mathbf{A}$'s condition number to within a quadratic factor of the best possible scaling. Second, we give a solver for structured mixed packing and covering semidefinite programs (MPC SDPs) which computes a constant-factor optimal scaling for $\mathbf{A}$ in $\widetilde{O}(\text{nnz}(\mathbf{A}) \cdot \text{poly}(κ^\star))$ time; this matches the cost of solving the linear system after scaling up to a $\widetilde{O}(\text{poly}(κ^\star))$ factor. Third, we demonstrate that a sufficiently general width-independent MPC SDP solver would imply near-optimal runtimes for the scaling problems we consider, and natural variants concerned with measures of average conditioning. Finally, we highlight connections of our preconditioning techniques to semi-random noise models, as well as applications in reducing risk in several statistical regression models. △ Less

Submitted 3 November, 2021; v1 submitted 4 August, 2020; originally announced August 2020.

Comments: 46 pages

arXiv:2006.06980 [pdf, ps, other]

Robust Sub-Gaussian Principal Component Analysis and Width-Independent Schatten Packing

Authors: Arun Jambulapati, Jerry Li, Kevin Tian

Abstract: We develop two methods for the following fundamental statistical task: given an $ε$-corrupted set of $n$ samples from a $d$-dimensional sub-Gaussian distribution, return an approximate top eigenvector of the covariance matrix. Our first robust PCA algorithm runs in polynomial time, returns a $1 - O(ε\logε^{-1})$-approximate top eigenvector, and is based on a simple iterative filtering approach. Ou… ▽ More We develop two methods for the following fundamental statistical task: given an $ε$-corrupted set of $n$ samples from a $d$-dimensional sub-Gaussian distribution, return an approximate top eigenvector of the covariance matrix. Our first robust PCA algorithm runs in polynomial time, returns a $1 - O(ε\logε^{-1})$-approximate top eigenvector, and is based on a simple iterative filtering approach. Our second, which attains a slightly worse approximation factor, runs in nearly-linear time and sample complexity under a mild spectral gap assumption. These are the first polynomial-time algorithms yielding non-trivial information about the covariance of a corrupted sub-Gaussian distribution without requiring additional algebraic structure of moments. As a key technical tool, we develop the first width-independent solvers for Schatten-$p$ norm packing semidefinite programs, giving a $(1 + ε)$-approximate solution in $O(p\log(\tfrac{nd}ε)ε^{-1})$ input-sparsity time iterations (where $n$, $d$ are problem dimensions). △ Less

Submitted 12 June, 2020; originally announced June 2020.

Comments: 35 pages

arXiv:2003.08078 [pdf, other]

Acceleration with a Ball Optimization Oracle

Authors: Yair Carmon, Arun Jambulapati, Qijia Jiang, Yujia Jin, Yin Tat Lee, Aaron Sidford, Kevin Tian

Abstract: Consider an oracle which takes a point $x$ and returns the minimizer of a convex function $f$ in an $\ell_2$ ball of radius $r$ around $x$. It is straightforward to show that roughly $r^{-1}\log\frac{1}ε$ calls to the oracle suffice to find an $ε$-approximate minimizer of $f$ in an $\ell_2$ unit ball. Perhaps surprisingly, this is not optimal: we design an accelerated algorithm which attains an… ▽ More Consider an oracle which takes a point $x$ and returns the minimizer of a convex function $f$ in an $\ell_2$ ball of radius $r$ around $x$. It is straightforward to show that roughly $r^{-1}\log\frac{1}ε$ calls to the oracle suffice to find an $ε$-approximate minimizer of $f$ in an $\ell_2$ unit ball. Perhaps surprisingly, this is not optimal: we design an accelerated algorithm which attains an $ε$-approximate minimizer with roughly $r^{-2/3} \log \frac{1}ε$ oracle queries, and give a matching lower bound. Further, we implement ball optimization oracles for functions with locally stable Hessians using a variant of Newton's method. The resulting algorithm applies to a number of problems of practical and theoretical import, improving upon previous results for logistic and $\ell_\infty$ regression and achieving guarantees comparable to the state-of-the-art for $\ell_p$ regression. △ Less

Submitted 18 March, 2020; originally announced March 2020.

Comments: 37 pages

arXiv:2002.04830 [pdf, ps, other]

Positive Semidefinite Programming: Mixed, Parallel, and Width-Independent

Authors: Arun Jambulapati, Yin Tat Lee, Jerry Li, Swati Padmanabhan, Kevin Tian

Abstract: We give the first approximation algorithm for mixed packing and covering semidefinite programs (SDPs) with polylogarithmic dependence on width. Mixed packing and covering SDPs constitute a fundamental algorithmic primitive with recent applications in combinatorial optimization, robust learning, and quantum complexity. The current approximate solvers for positive semidefinite programming can handle… ▽ More We give the first approximation algorithm for mixed packing and covering semidefinite programs (SDPs) with polylogarithmic dependence on width. Mixed packing and covering SDPs constitute a fundamental algorithmic primitive with recent applications in combinatorial optimization, robust learning, and quantum complexity. The current approximate solvers for positive semidefinite programming can handle only pure packing instances, and technical hurdles prevent their generalization to a wider class of positive instances. For a given multiplicative accuracy of $ε$, our algorithm takes $O(\log^3(ndρ) \cdot ε^{-3})$ parallelizable iterations, where $n$, $d$ are dimensions of the problem and $ρ$ is a width parameter of the instance, generalizing or improving all previous parallel algorithms in the positive linear and semidefinite programming literature. When specialized to pure packing SDPs, our algorithm's iteration complexity is $O(\log^2 (nd) \cdot ε^{-2})$, a slight improvement and derandomization of the state-of-the-art (Allen-Zhu et. al. '16, Peng et. al. '16, Wang et. al. '15). For a wide variety of structured instances commonly found in applications, the iterations of our algorithm run in nearly-linear time. In doing so, we give matrix analytic techniques for overcoming obstacles that have stymied prior approaches to this open problem, as stated in past works (Peng et. al. '16, Mahoney et. al. '16). Crucial to our analysis are a simplification of existing algorithms for mixed positive linear programs, achieved by removing an asymmetry caused by modifying covering constraints, and a suite of matrix inequalities whose proofs are based on analyzing the Schur complements of matrices in a higher dimension. We hope that both our algorithm and techniques open the door to improved solvers for positive semidefinite programming, as well as its applications. △ Less

Submitted 12 July, 2021; v1 submitted 12 February, 2020; originally announced February 2020.

Comments: There is an error in this manuscript. This version notes the source of the error on the first page

arXiv:1906.00618 [pdf, other]

A Direct $\tilde{O}(1/ε)$ Iteration Parallel Algorithm for Optimal Transport

Authors: Arun Jambulapati, Aaron Sidford, Kevin Tian

Abstract: Optimal transportation, or computing the Wasserstein or ``earth mover's'' distance between two distributions, is a fundamental primitive which arises in many learning and statistical settings. We give an algorithm which solves this problem to additive $ε$ with $\tilde{O}(1/ε)$ parallel depth, and $\tilde{O}\left(n^2/ε\right)$ work. Barring a breakthrough on a long-standing algorithmic open problem… ▽ More Optimal transportation, or computing the Wasserstein or ``earth mover's'' distance between two distributions, is a fundamental primitive which arises in many learning and statistical settings. We give an algorithm which solves this problem to additive $ε$ with $\tilde{O}(1/ε)$ parallel depth, and $\tilde{O}\left(n^2/ε\right)$ work. Barring a breakthrough on a long-standing algorithmic open problem, this is optimal for first-order methods. Blanchet et. al. '18, Quanrud '19 obtained similar runtimes through reductions to positive linear programming and matrix scaling. However, these reduction-based algorithms use complicated subroutines which may be deemed impractical due to requiring solvers for second-order iterations (matrix scaling) or non-parallelizability (positive LP). The fastest practical algorithms run in time $\tilde{O}(\min(n^2 / ε^2, n^{2.5} / ε))$ (Dvurechensky et. al. '18, Lin et. al. '19). We bridge this gap by providing a parallel, first-order, $\tilde{O}(1/ε)$ iteration algorithm without worse dependence on dimension, and provide preliminary experimental evidence that our algorithm may enjoy improved practical performance. We obtain this runtime via a primal-dual extragradient method, motivated by recent theoretical improvements to maximum flow (Sherman '17). △ Less

Submitted 3 June, 2019; originally announced June 2019.

Comments: 23 pages, 2 figures

arXiv:1905.08841 [pdf, ps, other]

Parallel Reachability in Almost Linear Work and Square Root Depth

Authors: Arun Jambulapati, Yang P. Liu, Aaron Sidford

Abstract: In this paper we provide a parallel algorithm that given any $n$-node $m$-edge directed graph and source vertex $s$ computes all vertices reachable from $s$ with $\tilde{O}(m)$ work and $n^{1/2 + o(1)}$ depth with high probability in $n$ . This algorithm also computes a set of $\tilde{O}(n)$ edges which when added to the graph preserves reachability and ensures that the diameter of the resulting g… ▽ More In this paper we provide a parallel algorithm that given any $n$-node $m$-edge directed graph and source vertex $s$ computes all vertices reachable from $s$ with $\tilde{O}(m)$ work and $n^{1/2 + o(1)}$ depth with high probability in $n$ . This algorithm also computes a set of $\tilde{O}(n)$ edges which when added to the graph preserves reachability and ensures that the diameter of the resulting graph is at most $n^{1/2 + o(1)}$. Our result improves upon the previous best known almost linear work reachability algorithm due to Fineman which had depth $\tilde{O}(n^{2/3})$. Further, we show how to leverage this algorithm to achieve improved distributed algorithms for single source reachability in the CONGEST model. In particular, we provide a distributed algorithm that given a $n$-node digraph of undirected hop-diameter $D$ solves the single source reachability problem with $\tilde{O}(n^{1/2} + n^{1/3 + o(1)} D^{2/3})$ rounds of the communication in the CONGEST model with high probability in $n$. Our algorithm is nearly optimal whenever $D = O(n^{1/4 - ε})$ for any constant $ε> 0$ and is the first nearly optimal algorithm for general graphs whose diameter is $Ω(n^δ)$ for any constant $δ$. △ Less

Submitted 5 December, 2019; v1 submitted 21 May, 2019; originally announced May 2019.

Comments: 38 pages. v2 fixes a small typo in Section 4 found by Aaron Bernstein. v3 fixes some overflow issues. v4 fixes the proof of Lemma 5.1. We thank Aaron Bernstein for pointing this out

arXiv:1812.06295 [pdf, ps, other]

Efficient Structured Matrix Recovery and Nearly-Linear Time Algorithms for Solving Inverse Symmetric $M$-Matrices

Authors: Arun Jambulapati, Kirankumar Shiragur, Aaron Sidford

Abstract: In this paper we show how to recover a spectral approximations to broad classes of structured matrices using only a polylogarithmic number of adaptive linear measurements to either the matrix or its inverse. Leveraging this result we obtain faster algorithms for variety of linear algebraic problems. Key results include: $\bullet$ A nearly linear time algorithm for solving the inverse of symmetri… ▽ More In this paper we show how to recover a spectral approximations to broad classes of structured matrices using only a polylogarithmic number of adaptive linear measurements to either the matrix or its inverse. Leveraging this result we obtain faster algorithms for variety of linear algebraic problems. Key results include: $\bullet$ A nearly linear time algorithm for solving the inverse of symmetric $M$-matrices, a strict superset of Laplacians and SDD matrices. $\bullet$ An $\tilde{O}(n^2)$ time algorithm for solving $n \times n$ linear systems that are constant spectral approximations of Laplacians or more generally, SDD matrices. $\bullet$ An $\tilde{O}(n^2)$ algorithm to recover a spectral approximation of a $n$-vertex graph using only $\tilde{O}(1)$ matrix-vector multiplies with its Laplacian matrix. The previous best results for each problem either used a trivial number of queries to exactly recover the matrix or a trivial $O(n^ω)$ running time, where $ω$ is the matrix multiplication constant. We achieve these results by generalizing recent semidefinite programming based linear sized sparsifier results of Lee and Sun (2017) and providing iterative methods inspired by the semistreaming sparsification results of Kapralov, Lee, Musco, Musco and Sidford (2014) and input sparsity time linear system solving results of Li, Miller, and Peng (2013). We hope that by initiating study of these natural problems, expanding the robustness and scope of recent nearly linear time linear system solving research, and providing general matrix recovery machinery this work may serve as a stepping stone for faster algorithms. △ Less

Submitted 15 December, 2018; originally announced December 2018.

Comments: 33 pages

arXiv:1810.07717 [pdf, ps, other]

Towards Optimal Running Times for Optimal Transport

Authors: Jose Blanchet, Arun Jambulapati, Carson Kent, Aaron Sidford

Abstract: In this work, we provide faster algorithms for approximating the optimal transport distance, e.g. earth mover's distance, between two discrete probability distributions $μ, ν\in Δ^n$. Given a cost function $C : [n] \times [n] \to \mathbb{R}_{\geq 0}$ where $C(i,j) \leq 1$ quantifies the penalty of transporting a unit of mass from $i$ to $j$, we show how to compute a coupling $X$ between $r$ and… ▽ More In this work, we provide faster algorithms for approximating the optimal transport distance, e.g. earth mover's distance, between two discrete probability distributions $μ, ν\in Δ^n$. Given a cost function $C : [n] \times [n] \to \mathbb{R}_{\geq 0}$ where $C(i,j) \leq 1$ quantifies the penalty of transporting a unit of mass from $i$ to $j$, we show how to compute a coupling $X$ between $r$ and $c$ in time $\widetilde{O}\left(n^2 /ε\right)$ whose expected transportation cost is within an additive $ε$ of optimal. This improves upon the previously best known running time for this problem of $\widetilde{O}\left(\text{min}\left\{ n^{9/4}/ε, n^2/ε^2 \right\}\right)$. We achieve our results by providing reductions from optimal transport to canonical optimization problems for which recent algorithmic efforts have provided nearly-linear time algorithms. Leveraging nearly linear time algorithms for solving packing linear programs and for solving the matrix balancing problem, we obtain two separate proofs of our stated running time. Further, one of our algorithms is easily parallelized and can be implemented with depth $\widetilde{O}(1/ε)$. Moreover, we show that further algorithmic improvements to our result would be surprising in the sense that any improvement would yield an $o(n^{2.5})$ algorithm for \textit{maximum cardinality bipartite matching}, for which currently the only known algorithms for achieving such a result are based on fast-matrix multiplication. △ Less

Submitted 28 January, 2020; v1 submitted 17 October, 2018; originally announced October 2018.

arXiv:1810.02348 [pdf, ps, other]

Perron-Frobenius Theory in Nearly Linear Time: Positive Eigenvectors, M-matrices, Graph Kernels, and Other Applications

Authors: AmirMahdi Ahmadinejad, Arun Jambulapati, Amin Saberi, Aaron Sidford

Abstract: In this paper we provide nearly linear time algorithms for several problems closely associated with the classic Perron-Frobenius theorem, including computing Perron vectors, i.e. entrywise non-negative eigenvectors of non-negative matrices, and solving linear systems in asymmetric M-matrices, a generalization of Laplacian systems. The running times of our algorithms depend nearly linearly on the i… ▽ More In this paper we provide nearly linear time algorithms for several problems closely associated with the classic Perron-Frobenius theorem, including computing Perron vectors, i.e. entrywise non-negative eigenvectors of non-negative matrices, and solving linear systems in asymmetric M-matrices, a generalization of Laplacian systems. The running times of our algorithms depend nearly linearly on the input size and polylogarithmically on the desired accuracy and problem condition number. Leveraging these results we also provide improved running times for a broader range of problems including computing random walk-based graph kernels, computing Katz centrality, and more. The running times of our algorithms improve upon previously known results which either depended polynomially on the condition number of the problem, required quadratic time, or only applied to special cases. We obtain these results by providing new iterative methods for reducing these problems to solving linear systems in Row-Column Diagonally Dominant (RCDD) matrices. Our methods are related to the classic shift-and-invert preconditioning technique for eigenvector computation and constitute the first alternative to the result in Cohen et al. (2016) for reducing stationary distribution computation and solving directed Laplacian systems to solving RCDD systems. △ Less

Submitted 4 October, 2018; originally announced October 2018.

arXiv:1711.00571 [pdf, ps, other]

Efficient $\widetilde{O}(n/ε)$ Spectral Sketches for the Laplacian and its Pseudoinverse

Authors: Arun Jambulapati, Aaron Sidford

Abstract: In this paper we consider the problem of efficiently computing $ε$-sketches for the Laplacian and its pseudoinverse. Given a Laplacian and an error tolerance $ε$, we seek to construct a function $f$ such that for any vector $x$ (chosen obliviously from $f$), with high probability $(1-ε) x^\top A x \leq f(x) \leq (1 + ε) x^\top A x$ where $A$ is either the Laplacian or its pseudoinverse. Our goal i… ▽ More In this paper we consider the problem of efficiently computing $ε$-sketches for the Laplacian and its pseudoinverse. Given a Laplacian and an error tolerance $ε$, we seek to construct a function $f$ such that for any vector $x$ (chosen obliviously from $f$), with high probability $(1-ε) x^\top A x \leq f(x) \leq (1 + ε) x^\top A x$ where $A$ is either the Laplacian or its pseudoinverse. Our goal is to construct such a sketch $f$ efficiently and to store it in the least space possible. We provide nearly-linear time algorithms that, when given a Laplacian matrix $\mathcal{L} \in \mathbb{R}^{n \times n}$ and an error tolerance $ε$, produce $\tilde{O}(n/ε)$-size sketches of both $\mathcal{L}$ and its pseudoinverse. Our algorithms improve upon the previous best sketch size of $\widetilde{O}(n / ε^{1.6})$ for sketching the Laplacian form by Andoni et al (2015) and $O(n / ε^2)$ for sketching the Laplacian pseudoinverse by Batson, Spielman, and Srivastava (2008). Furthermore we show how to compute all-pairs effective resistances from $\widetilde{O}(n/ε)$ size sketch in $\widetilde{O}(n^2/ε)$ time. This improves upon the previous best running time of $\widetilde{O}(n^2/ε^2)$ by Spielman and Srivastava (2008). △ Less

Submitted 7 January, 2018; v1 submitted 1 November, 2017; originally announced November 2017.

Comments: Accepted to SODA 2018; v2 fixes a small bug in the proof of lemma 3. This does not affect correctness of any of our results

Showing 1–33 of 33 results for author: Jambulapati, A