Skip to main content

Showing 51–87 of 87 results for author: Kakade, S M

  1. arXiv:1610.03774  [pdf, other

    stat.ML cs.DS cs.LG

    Parallelizing Stochastic Gradient Descent for Least Squares Regression: mini-batching, averaging, and model misspecification

    Authors: Prateek Jain, Sham M. Kakade, Rahul Kidambi, Praneeth Netrapalli, Aaron Sidford

    Abstract: This work characterizes the benefits of averaging schemes widely used in conjunction with stochastic gradient descent (SGD). In particular, this work provides a sharp analysis of: (1) mini-batching, a method of averaging many samples of a stochastic gradient to both reduce the variance of the stochastic gradient estimate and for parallelizing SGD and (2) tail-averaging, a method involving averagin… ▽ More

    Submitted 31 July, 2018; v1 submitted 12 October, 2016; originally announced October 2016.

    Comments: 39 pages. Published in the Journal of Machine Learning Research (JMLR)

  2. arXiv:1605.08754  [pdf, other

    cs.DS cs.LG math.NA math.OC

    Faster Eigenvector Computation via Shift-and-Invert Preconditioning

    Authors: Dan Garber, Elad Hazan, Chi Jin, Sham M. Kakade, Cameron Musco, Praneeth Netrapalli, Aaron Sidford

    Abstract: We give faster algorithms and improved sample complexities for estimating the top eigenvector of a matrix $Σ$ -- i.e. computing a unit vector $x$ such that $x^T Σx \ge (1-ε)λ_1(Σ)$: Offline Eigenvector Estimation: Given an explicit $A \in \mathbb{R}^{n \times d}$ with $Σ= A^TA$, we show how to compute an $ε$ approximate top eigenvector in time… ▽ More

    Submitted 25 May, 2016; originally announced May 2016.

    Comments: Appearing in ICML 2016. Combination of work in arXiv:1509.05647 and arXiv:1510.08896

  3. arXiv:1605.08370  [pdf, ps, other

    cs.LG math.OC stat.ML

    Provable Efficient Online Matrix Completion via Non-convex Stochastic Gradient Descent

    Authors: Chi Jin, Sham M. Kakade, Praneeth Netrapalli

    Abstract: Matrix completion, where we wish to recover a low rank matrix by observing a few entries from it, is a widely studied problem in both theory and practice with wide applications. Most of the provable algorithms so far on this problem have been restricted to the offline setting where they provide an estimate of the unknown matrix using all observations simultaneously. However, in many applications,… ▽ More

    Submitted 26 May, 2016; originally announced May 2016.

  4. arXiv:1604.03930  [pdf, ps, other

    cs.LG math.OC stat.ML

    Efficient Algorithms for Large-scale Generalized Eigenvector Computation and Canonical Correlation Analysis

    Authors: Rong Ge, Chi Jin, Sham M. Kakade, Praneeth Netrapalli, Aaron Sidford

    Abstract: This paper considers the problem of canonical-correlation analysis (CCA) (Hotelling, 1936) and, more broadly, the generalized eigenvector problem for a pair of symmetric matrices. These are two fundamental problems in data analysis and scientific computing with numerous applications in machine learning and statistics (Shi and Malik, 2000; Hardoon et al., 2004; Witten et al., 2009). We provide si… ▽ More

    Submitted 27 May, 2016; v1 submitted 13 April, 2016; originally announced April 2016.

    Comments: International Conference on Machine Learning (ICML) 2016

  5. arXiv:1602.06929  [pdf, ps, other

    cs.LG cs.DS cs.NE stat.ML

    Streaming PCA: Matching Matrix Bernstein and Near-Optimal Finite Sample Guarantees for Oja's Algorithm

    Authors: Prateek Jain, Chi Jin, Sham M. Kakade, Praneeth Netrapalli, Aaron Sidford

    Abstract: This work provides improved guarantees for streaming principle component analysis (PCA). Given $A_1, \ldots, A_n\in \mathbb{R}^{d\times d}$ sampled independently from distributions satisfying $\mathbb{E}[A_i] = Σ$ for $Σ\succeq \mathbf{0}$, this work provides an $O(d)$-space linear-time single-pass streaming algorithm for estimating the top eigenvector of $Σ$. The algorithm nearly matches (and in… ▽ More

    Submitted 28 March, 2016; v1 submitted 22 February, 2016; originally announced February 2016.

    Comments: Updated title

  6. arXiv:1602.06586  [pdf, ps, other

    cs.LG

    Recovering Structured Probability Matrices

    Authors: Qingqing Huang, Sham M. Kakade, Weihao Kong, Gregory Valiant

    Abstract: We consider the problem of accurately recovering a matrix B of size M by M , which represents a probability distribution over M2 outcomes, given access to an observed matrix of "counts" generated by taking independent samples from the distribution B. How can structural properties of the underlying matrix B be leveraged to yield computationally efficient and information theoretically optimal recons… ▽ More

    Submitted 5 February, 2018; v1 submitted 21 February, 2016; originally announced February 2016.

  7. arXiv:1510.08896  [pdf, other

    cs.DS cs.LG math.NA math.OC

    Robust Shift-and-Invert Preconditioning: Faster and More Sample Efficient Algorithms for Eigenvector Computation

    Authors: Chi Jin, Sham M. Kakade, Cameron Musco, Praneeth Netrapalli, Aaron Sidford

    Abstract: We provide faster algorithms and improved sample complexities for approximating the top eigenvector of a matrix. Offline Setting: Given an $n \times d$ matrix $A$, we show how to compute an $ε$ approximate top eigenvector in time $\tilde O ( [nnz(A) + \frac{d \cdot sr(A)}{gap^2}]\cdot \log 1/ε)$ and $\tilde O([\frac{nnz(A)^{3/4} (d \cdot sr(A))^{1/4}}{\sqrt{gap}}]\cdot \log1/ε)$. Here $sr(A)$ is… ▽ More

    Submitted 29 May, 2016; v1 submitted 29 October, 2015; originally announced October 2015.

    Comments: Manuscript outdated. Updated version at arxiv:1605.08754

  8. arXiv:1509.07943  [pdf, other

    cs.LG

    Super-Resolution Off the Grid

    Authors: Qingqing Huang, Sham M. Kakade

    Abstract: Super-resolution is the problem of recovering a superposition of point sources using bandlimited measurements, which may be corrupted with noise. This signal processing problem arises in numerous imaging problems, ranging from astronomy to biology to spectroscopy, where it is common to take (coarse) Fourier measurements of an object. Of particular interest is in obtaining estimation procedures whi… ▽ More

    Submitted 25 September, 2015; originally announced September 2015.

  9. arXiv:1507.05854  [pdf, other

    math.NA cs.DS math.OC

    Global Convergence of Non-Convex Gradient Descent for Computing Matrix Squareroot

    Authors: Prateek Jain, Chi Jin, Sham M. Kakade, Praneeth Netrapalli

    Abstract: While there has been a significant amount of work studying gradient descent techniques for non-convex optimization problems over the last few years, all existing results establish either local convergence with good rates or global convergence with highly suboptimal rates, for many problems of interest. In this paper, we take the first step in getting the best of both worlds -- establishing global… ▽ More

    Submitted 9 March, 2017; v1 submitted 21 July, 2015; originally announced July 2015.

    Comments: Appear in AISTATS 2017

  10. arXiv:1506.07512  [pdf, other

    stat.ML cs.DS cs.LG

    Un-regularizing: approximate proximal point and faster stochastic algorithms for empirical risk minimization

    Authors: Roy Frostig, Rong Ge, Sham M. Kakade, Aaron Sidford

    Abstract: We develop a family of accelerated stochastic algorithms that minimize sums of convex functions. Our algorithms improve upon the fastest running time for empirical risk minimization (ERM), and in particular linear least-squares regression, across a wide range of problem settings. To achieve this, we establish a framework based on the classical proximal point algorithm. Namely, we provide several a… ▽ More

    Submitted 24 June, 2015; originally announced June 2015.

  11. arXiv:1503.00424  [pdf, other

    cs.LG

    Learning Mixtures of Gaussians in High Dimensions

    Authors: Rong Ge, Qingqing Huang, Sham M. Kakade

    Abstract: Efficiently learning mixture of Gaussians is a fundamental problem in statistics and learning theory. Given samples coming from a random one out of k Gaussian distributions in Rn, the learning problem asks to estimate the means and the covariance matrices of these Gaussians. This learning problem arises in many areas ranging from the natural sciences to the social sciences, and has also found many… ▽ More

    Submitted 9 March, 2015; v1 submitted 2 March, 2015; originally announced March 2015.

  12. arXiv:1412.6606  [pdf, other

    stat.ML cs.LG

    Competing with the Empirical Risk Minimizer in a Single Pass

    Authors: Roy Frostig, Rong Ge, Sham M. Kakade, Aaron Sidford

    Abstract: In many estimation problems, e.g. linear and logistic regression, we wish to minimize an unknown objective given only unbiased samples of the objective function. Furthermore, we aim to achieve this using as few samples as possible. In the absence of computational constraints, the minimizer of a sample average of observed data -- commonly referred to as either the empirical risk minimizer (ERM) or… ▽ More

    Submitted 25 February, 2015; v1 submitted 20 December, 2014; originally announced December 2014.

  13. arXiv:1310.1949  [pdf, other

    cs.LG stat.ML

    Least Squares Revisited: Scalable Approaches for Multi-class Prediction

    Authors: Alekh Agarwal, Sham M. Kakade, Nikos Karampatziakis, Le Song, Gregory Valiant

    Abstract: This work provides simple algorithms for multi-class (and multi-label) prediction in settings where both the number of examples n and the data dimension d are relatively large. These robust and parameter free algorithms are essentially iterative least-squares updates and very versatile both in theory and in practice. On the theoretical front, we present several variants with convergence guarantees… ▽ More

    Submitted 21 October, 2013; v1 submitted 7 October, 2013; originally announced October 2013.

  14. arXiv:1302.2684  [pdf, ps, other

    cs.LG cs.SI stat.ML

    A Tensor Approach to Learning Mixed Membership Community Models

    Authors: Anima Anandkumar, Rong Ge, Daniel Hsu, Sham M. Kakade

    Abstract: Community detection is the task of detecting hidden communities from observed interactions. Guaranteed community detection has so far been mostly limited to models with non-overlapping communities such as the stochastic block model. In this paper, we remove this restriction, and provide guaranteed community detection for a family of probabilistic network models with overlapping communities, termed… ▽ More

    Submitted 24 October, 2013; v1 submitted 11 February, 2013; originally announced February 2013.

  15. arXiv:1211.5414  [pdf, ps, other

    cs.DS cs.LG math.NA stat.ML

    Analysis of a randomized approximation scheme for matrix multiplication

    Authors: Daniel Hsu, Sham M. Kakade, Tong Zhang

    Abstract: This note gives a simple analysis of a randomized approximation scheme for matrix multiplication proposed by Sarlos (2006) based on a random rotation followed by uniform column sampling. The result follows from a matrix version of Bernstein's inequality and a tail inequality for quadratic forms in subgaussian random vectors.

    Submitted 23 November, 2012; originally announced November 2012.

  16. arXiv:1210.7559  [pdf, ps, other

    cs.LG math.NA stat.ML

    Tensor decompositions for learning latent variable models

    Authors: Anima Anandkumar, Rong Ge, Daniel Hsu, Sham M. Kakade, Matus Telgarsky

    Abstract: This work considers a computationally and statistically efficient parameter estimation method for a wide class of latent variable models---including Gaussian mixture models, hidden Markov models, and latent Dirichlet allocation---which exploits a certain tensor structure in their low-order observable moments (typically, of second- and third-order). Specifically, parameter estimation is reduced to… ▽ More

    Submitted 13 November, 2014; v1 submitted 29 October, 2012; originally announced October 2012.

    Journal ref: Journal of Machine Learning Research, 15(Aug):2773-2832, 2014

  17. arXiv:1209.5350  [pdf, other

    stat.ML cs.LG stat.AP

    Learning Topic Models and Latent Bayesian Networks Under Expansion Constraints

    Authors: Animashree Anandkumar, Daniel Hsu, Adel Javanmard, Sham M. Kakade

    Abstract: Unsupervised estimation of latent variable models is a fundamental problem central to numerous applications of machine learning and statistics. This work presents a principled approach for estimating broad classes of such models, including probabilistic topic models and latent linear Bayesian networks, using only second-order observed moments. The sufficient conditions for identifiability of these… ▽ More

    Submitted 24 May, 2013; v1 submitted 24 September, 2012; originally announced September 2012.

    Comments: 38 pages, 6 figures, 2 tables, applications in topic models and Bayesian networks are studied. Simulation section is added

  18. arXiv:1207.1388  [pdf

    cs.AI cs.FL

    Planning in POMDPs Using Multiplicity Automata

    Authors: Eyal Even-Dar, Sham M. Kakade, Yishay Mansour

    Abstract: Planning and learning in Partially Observable MDPs (POMDPs) are among the most challenging tasks in both the AI and Operation Research communities. Although solutions to these problems are intractable in general, there might be special cases, such as structured POMDPs, which can be solved efficiently. A natural and possibly efficient way to represent a POMDP is through the predictive state represe… ▽ More

    Submitted 4 July, 2012; originally announced July 2012.

    Comments: Appears in Proceedings of the Twenty-First Conference on Uncertainty in Artificial Intelligence (UAI2005)

    Report number: UAI-P-2005-PG-185-192

  19. arXiv:1206.5766  [pdf, ps, other

    cs.LG stat.ML

    Learning mixtures of spherical Gaussians: moment methods and spectral decompositions

    Authors: Daniel Hsu, Sham M. Kakade

    Abstract: This work provides a computationally efficient and statistically consistent moment-based estimator for mixtures of spherical Gaussians. Under the condition that component means are in general position, a simple spectral decomposition technique yields consistent parameter estimates from low-order observable moments, without additional minimum separation assumptions needed by previous computationall… ▽ More

    Submitted 28 October, 2012; v1 submitted 25 June, 2012; originally announced June 2012.

  20. arXiv:1206.3137  [pdf, other

    stat.ML cs.LG

    Identifiability and Unmixing of Latent Parse Trees

    Authors: Daniel Hsu, Sham M. Kakade, Percy Liang

    Abstract: This paper explores unsupervised learning of parsing models along two directions. First, which models are identifiable from infinite data? We use a general technique for numerically checking identifiability based on the rank of a Jacobian matrix, and apply it to several standard constituency and dependency parsing models. Second, for identifiable models, how do we estimate the parameters efficient… ▽ More

    Submitted 14 June, 2012; originally announced June 2012.

  21. arXiv:1204.6703  [pdf, ps, other

    cs.LG stat.ML

    A Spectral Algorithm for Latent Dirichlet Allocation

    Authors: Animashree Anandkumar, Dean P. Foster, Daniel Hsu, Sham M. Kakade, Yi-Kai Liu

    Abstract: The problem of topic modeling can be seen as a generalization of the clustering problem, in that it posits that observations are generated due to multiple latent factors (e.g., the words in each document are generated as a mixture of several active topics, as opposed to just one). This increased representational power comes at the cost of a more challenging unsupervised learning problem of estimat… ▽ More

    Submitted 17 January, 2013; v1 submitted 30 April, 2012; originally announced April 2012.

    Comments: Changed title to match conference version, which appears in Advances in Neural Information Processing Systems 25, 2012

  22. arXiv:1203.0697  [pdf, ps, other

    stat.ML cs.AI cs.LG

    Learning High-Dimensional Mixtures of Graphical Models

    Authors: A. Anandkumar, D. Hsu, F. Huang, S. M. Kakade

    Abstract: We consider unsupervised estimation of mixtures of discrete graphical models, where the class variable corresponding to the mixture components is hidden and each mixture component over the observed variables can have a potentially different Markov graph structure and parameters. We propose a novel approach for estimating the mixture components, and our output is a tree-mixture model which serves a… ▽ More

    Submitted 30 June, 2012; v1 submitted 3 March, 2012; originally announced March 2012.

  23. arXiv:1203.0683  [pdf, ps, other

    cs.LG stat.ML

    A Method of Moments for Mixture Models and Hidden Markov Models

    Authors: Animashree Anandkumar, Daniel Hsu, Sham M. Kakade

    Abstract: Mixture models are a fundamental tool in applied statistics and machine learning for treating data taken from multiple subpopulations. The current practice for estimating the parameters of such models relies on local search heuristics (e.g., the EM algorithm) which are prone to failure, and existing consistent methods are unfavorable due to their high computational and sample complexity which typi… ▽ More

    Submitted 5 September, 2012; v1 submitted 3 March, 2012; originally announced March 2012.

  24. arXiv:1202.3079  [pdf, ps, other

    cs.LG stat.ML

    Towards minimax policies for online linear optimization with bandit feedback

    Authors: Sébastien Bubeck, Nicolò Cesa-Bianchi, Sham M. Kakade

    Abstract: We address the online linear optimization problem with bandit feedback. Our contribution is twofold. First, we provide an algorithm (based on exponential weights) with a regret of order $\sqrt{d n \log N}$ for any finite action set with $N$ actions, under the assumption that the instantaneous loss is bounded by 1. This shaves off an extraneous $\sqrt{d}$ factor compared to previous works, and give… ▽ More

    Submitted 14 February, 2012; originally announced February 2012.

  25. arXiv:1110.2842  [pdf, ps, other

    math.PR cs.LG

    A tail inequality for quadratic forms of subgaussian random vectors

    Authors: Daniel Hsu, Sham M. Kakade, Tong Zhang

    Abstract: We prove an exponential probability tail inequality for positive semidefinite quadratic forms in a subgaussian random vector. The bound is analogous to one that holds when the vector has independent Gaussian entries.

    Submitted 13 October, 2011; originally announced October 2011.

  26. arXiv:1107.1744  [pdf, other

    math.OC cs.LG eess.SY

    Stochastic convex optimization with bandit feedback

    Authors: Alekh Agarwal, Dean P. Foster, Daniel Hsu, Sham M. Kakade, Alexander Rakhlin

    Abstract: This paper addresses the problem of minimizing a convex, Lipschitz function $f$ over a convex, compact set $\xset$ under a stochastic bandit feedback model. In this model, the algorithm is allowed to observe noisy realizations of the function value $f(x)$ at any query point $x \in \xset$. The quantity of interest is the regret of the algorithm, which is the sum of the function values at algorithm'… ▽ More

    Submitted 8 October, 2011; v1 submitted 8 July, 2011; originally announced July 2011.

  27. arXiv:1107.1283  [pdf, other

    cs.LG stat.ML

    Spectral Methods for Learning Multivariate Latent Tree Structure

    Authors: Animashree Anandkumar, Kamalika Chaudhuri, Daniel Hsu, Sham M. Kakade, Le Song, Tong Zhang

    Abstract: This work considers the problem of learning the structure of multivariate linear tree models, which include a variety of directed tree graphical models with continuous, discrete, and mixed latent variables such as linear-Gaussian models, hidden Markov models, Gaussian mixture models, and Markov evolutionary trees. The setting is one where we only have samples from certain observed variables in the… ▽ More

    Submitted 8 November, 2011; v1 submitted 6 July, 2011; originally announced July 2011.

  28. arXiv:1106.2363  [pdf, ps, other

    math.ST cs.AI cs.LG stat.ML

    Random design analysis of ridge regression

    Authors: Daniel Hsu, Sham M. Kakade, Tong Zhang

    Abstract: This work gives a simultaneous analysis of both the ordinary least squares estimator and the ridge regression estimator in the random design setting under mild assumptions on the covariate/response distributions. In particular, the analysis provides sharp results on the ``out-of-sample'' prediction error, as opposed to the ``in-sample'' (fixed design) error. The analysis also reveals the effect of… ▽ More

    Submitted 24 March, 2014; v1 submitted 12 June, 2011; originally announced June 2011.

  29. arXiv:1105.0875  [pdf, ps, other

    stat.ML

    A Risk Comparison of Ordinary Least Squares vs Ridge Regression

    Authors: Paramveer S. Dhillon, Dean P. Foster, Sham M. Kakade, Lyle H. Ungar

    Abstract: We compare the risk of ridge regression to a simple variant of ordinary least squares, in which one simply projects the data onto a finite dimensional subspace (as specified by a Principal Component Analysis) and then performs an ordinary (un-regularized) least squares regression in this subspace. This note shows that the risk of this ordinary least squares method is within a constant factor (name… ▽ More

    Submitted 31 May, 2013; v1 submitted 4 May, 2011; originally announced May 2011.

    Comments: Appearing in JMLR 14, June 2013

  30. arXiv:1104.1672  [pdf, ps, other

    math.PR cs.LG stat.ML

    Dimension-free tail inequalities for sums of random matrices

    Authors: Daniel Hsu, Sham M. Kakade, Tong Zhang

    Abstract: We derive exponential tail inequalities for sums of random matrices with no dependence on the explicit matrix dimensions. These are similar to the matrix versions of the Chernoff bound and Bernstein inequality except with the explicit matrix dimensions replaced by a trace quantity that can be small even when the dimension is large or infinite. Some applications to principal component analysis and… ▽ More

    Submitted 16 April, 2011; v1 submitted 9 April, 2011; originally announced April 2011.

  31. arXiv:1011.1518  [pdf, ps, other

    stat.ML cs.LG math.NA

    Robust Matrix Decomposition with Outliers

    Authors: Daniel Hsu, Sham M. Kakade, Tong Zhang

    Abstract: Suppose a given observation matrix can be decomposed as the sum of a low-rank matrix and a sparse matrix (outliers), and the goal is to recover these individual components from the observed sum. Such additive decompositions have applications in a variety of numerical problems including system identification, latent variable graphical modeling, and principal components analysis. We study conditions… ▽ More

    Submitted 3 December, 2010; v1 submitted 5 November, 2010; originally announced November 2010.

    Comments: Corrected comparisons to previous work of Candes et al (2009)

  32. arXiv:1001.4598  [pdf, ps, other

    cs.GT

    An Optimal Dynamic Mechanism for Multi-Armed Bandit Processes

    Authors: Sham M. Kakade, Ilan Lobel, Hamid Nazerzadeh

    Abstract: We consider the problem of revenue-optimal dynamic mechanism design in settings where agents' types evolve over time as a function of their (both public and private) experience with items that are auctioned repeatedly over an infinite horizon. A central question here is understanding what natural restrictions on the environment permit the design of optimal mechanisms (note that even in the simpler… ▽ More

    Submitted 15 October, 2010; v1 submitted 26 January, 2010; originally announced January 2010.

  33. Gaussian Process Optimization in the Bandit Setting: No Regret and Experimental Design

    Authors: Niranjan Srinivas, Andreas Krause, Sham M. Kakade, Matthias Seeger

    Abstract: Many applications require optimizing an unknown, noisy function that is expensive to evaluate. We formalize this task as a multi-armed bandit problem, where the payoff function is either sampled from a Gaussian process (GP) or has low RKHS norm. We resolve the important open problem of deriving regret bounds for this setting, which imply novel convergence rates for GP optimization. We analyze GP-U… ▽ More

    Submitted 9 June, 2010; v1 submitted 20 December, 2009; originally announced December 2009.

  34. arXiv:0911.0054  [pdf, ps, other

    cs.LG stat.ML

    Learning Exponential Families in High-Dimensions: Strong Convexity and Sparsity

    Authors: Sham M. Kakade, Ohad Shamir, Karthik Sridharan, Ambuj Tewari

    Abstract: The versatility of exponential families, along with their attendant convexity properties, make them a popular and effective statistical model. A central issue is learning these models in high-dimensions, such as when there is some sparsity pattern of the optimal parameter. This work characterizes a certain strong convexity property of general exponential families, which allow their generalization… ▽ More

    Submitted 16 May, 2015; v1 submitted 30 October, 2009; originally announced November 2009.

    Comments: Errata added. Incorrect claim about cumulants of the Bernoulli distribution fixed

    ACM Class: I.2.6

  35. arXiv:0910.0610  [pdf, ps, other

    cs.LG stat.ML

    Regularization Techniques for Learning with Matrices

    Authors: Sham M. Kakade, Shai Shalev-Shwartz, Ambuj Tewari

    Abstract: There is growing body of learning problems for which it is natural to organize the parameters into matrix, so as to appropriately regularize the parameters under some matrix norm (in order to impose some more sophisticated prior knowledge). This work describes and analyzes a systematic method for constructing such matrix-based, regularization methods. In particular, we focus on how the underlying… ▽ More

    Submitted 17 October, 2010; v1 submitted 4 October, 2009; originally announced October 2009.

    ACM Class: I.2.6

  36. arXiv:0902.1284  [pdf, ps, other

    cs.LG

    Multi-Label Prediction via Compressed Sensing

    Authors: Daniel Hsu, Sham M. Kakade, John Langford, Tong Zhang

    Abstract: We consider multi-label prediction problems with large output spaces under the assumption of output sparsity -- that the target (label) vectors have small support. We develop a general theory for a variant of the popular error correcting output code scheme, using ideas from compressed sensing for exploiting this sparsity. The method can be regarded as a simple reduction from multi-label regressi… ▽ More

    Submitted 2 June, 2009; v1 submitted 7 February, 2009; originally announced February 2009.

  37. arXiv:0811.4413  [pdf, ps, other

    cs.LG cs.AI

    A Spectral Algorithm for Learning Hidden Markov Models

    Authors: Daniel Hsu, Sham M. Kakade, Tong Zhang

    Abstract: Hidden Markov Models (HMMs) are one of the most fundamental and widely used statistical tools for modeling discrete time series. In general, learning HMMs from data is computationally hard (under cryptographic assumptions), and practitioners typically resort to search heuristics which suffer from the usual local optima issues. We prove that under a natural separation condition (bounds on the small… ▽ More

    Submitted 6 July, 2012; v1 submitted 26 November, 2008; originally announced November 2008.

    Comments: Published in JCSS Special Issue "Learning Theory 2009"

    Journal ref: Journal of Computer and System Sciences, 78(5):1460-1480, 2012