Skip to main content

Showing 1–35 of 35 results for author: Carmon, Y

  1. arXiv:2406.19146  [pdf, other

    cs.LG cs.CL

    Resolving Discrepancies in Compute-Optimal Scaling of Language Models

    Authors: Tomer Porian, Mitchell Wortsman, Jenia Jitsev, Ludwig Schmidt, Yair Carmon

    Abstract: Kaplan et al. and Hoffmann et al. developed influential scaling laws for the optimal model size as a function of the compute budget, but these laws yield substantially different predictions. We explain the discrepancy by reproducing the Kaplan scaling law on two datasets (OpenWebText2 and RefinedWeb) and identifying three factors causing the difference: last layer computational cost, warmup durati… ▽ More

    Submitted 27 June, 2024; originally announced June 2024.

  2. arXiv:2406.11794  [pdf, other

    cs.LG cs.CL

    DataComp-LM: In search of the next generation of training sets for language models

    Authors: Jeffrey Li, Alex Fang, Georgios Smyrnis, Maor Ivgi, Matt Jordan, Samir Gadre, Hritik Bansal, Etash Guha, Sedrick Keh, Kushal Arora, Saurabh Garg, Rui Xin, Niklas Muennighoff, Reinhard Heckel, Jean Mercat, Mayee Chen, Suchin Gururangan, Mitchell Wortsman, Alon Albalak, Yonatan Bitton, Marianna Nezhurina, Amro Abbas, Cheng-Yu Hsieh, Dhruba Ghosh, Josh Gardner , et al. (34 additional authors not shown)

    Abstract: We introduce DataComp for Language Models (DCLM), a testbed for controlled dataset experiments with the goal of improving language models. As part of DCLM, we provide a standardized corpus of 240T tokens extracted from Common Crawl, effective pretraining recipes based on the OpenLM framework, and a broad suite of 53 downstream evaluations. Participants in the DCLM benchmark can experiment with dat… ▽ More

    Submitted 20 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: Project page: https://www.datacomp.ai/dclm/

  3. arXiv:2404.00666  [pdf, other

    cs.LG math.OC

    Accelerated Parameter-Free Stochastic Optimization

    Authors: Itai Kreisler, Maor Ivgi, Oliver Hinder, Yair Carmon

    Abstract: We propose a method that achieves near-optimal rates for smooth stochastic convex optimization and requires essentially no prior knowledge of problem parameters. This improves on prior work which requires knowing at least the initial distance to optimality d0. Our method, U-DoG, combines UniXGrad (Kavis et al., 2019) and DoG (Ivgi et al., 2023) with novel iterate stabilization techniques. It requi… ▽ More

    Submitted 5 July, 2024; v1 submitted 31 March, 2024; originally announced April 2024.

  4. arXiv:2403.08540  [pdf, other

    cs.CL cs.LG

    Language models scale reliably with over-training and on downstream tasks

    Authors: Samir Yitzhak Gadre, Georgios Smyrnis, Vaishaal Shankar, Suchin Gururangan, Mitchell Wortsman, Rulin Shao, Jean Mercat, Alex Fang, Jeffrey Li, Sedrick Keh, Rui Xin, Marianna Nezhurina, Igor Vasiljevic, Jenia Jitsev, Luca Soldaini, Alexandros G. Dimakis, Gabriel Ilharco, Pang Wei Koh, Shuran Song, Thomas Kollar, Yair Carmon, Achal Dave, Reinhard Heckel, Niklas Muennighoff, Ludwig Schmidt

    Abstract: Scaling laws are useful guides for derisking expensive training runs, as they predict performance of large models using cheaper, small-scale experiments. However, there remain gaps between current scaling studies and how language models are ultimately trained and evaluated. For instance, scaling is usually studied in the compute-optimal training regime (i.e., "Chinchilla optimal" regime). In contr… ▽ More

    Submitted 14 June, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

  5. arXiv:2402.10898  [pdf, other

    math.OC cs.LG stat.ML

    The Price of Adaptivity in Stochastic Convex Optimization

    Authors: Yair Carmon, Oliver Hinder

    Abstract: We prove impossibility results for adaptivity in non-smooth stochastic convex optimization. Given a set of problem parameters we wish to adapt to, we define a "price of adaptivity" (PoA) that, roughly speaking, measures the multiplicative increase in suboptimality due to uncertainty in these parameters. When the initial distance to the optimum is unknown but a gradient norm bound is known, we show… ▽ More

    Submitted 27 June, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

    Comments: Accepted for presentation at the Conference on Learning Theory (COLT) 2024; to appear in proceedings as an extended abstract

  6. arXiv:2311.10886  [pdf, other

    cs.DS cs.LG math.OC

    A Whole New Ball Game: A Primal Accelerated Method for Matrix Games and Minimizing the Maximum of Smooth Functions

    Authors: Yair Carmon, Arun Jambulapati, Yujia Jin, Aaron Sidford

    Abstract: We design algorithms for minimizing $\max_{i\in[n]} f_i(x)$ over a $d$-dimensional Euclidean or simplex domain. When each $f_i$ is $1$-Lipschitz and $1$-smooth, our method computes an $ε$-approximate solution using $\widetilde{O}(n ε^{-1/3} + ε^{-2})$ gradient and function evaluations, and $\widetilde{O}(n ε^{-4/3})$ additional runtime. For large $n$, our evaluation complexity is optimal up to pol… ▽ More

    Submitted 17 November, 2023; originally announced November 2023.

  7. arXiv:2305.13064  [pdf, other

    cs.LG math.OC stat.ML

    Gradient Descent Monotonically Decreases the Sharpness of Gradient Flow Solutions in Scalar Networks and Beyond

    Authors: Itai Kreisler, Mor Shpigel Nacson, Daniel Soudry, Yair Carmon

    Abstract: Recent research shows that when Gradient Descent (GD) is applied to neural networks, the loss almost never decreases monotonically. Instead, the loss oscillates as gradient descent converges to its ''Edge of Stability'' (EoS). Here, we find a quantity that does decrease monotonically throughout GD training: the sharpness attained by the gradient flow solution (GFS)-the solution that would be obtai… ▽ More

    Submitted 22 May, 2023; originally announced May 2023.

  8. arXiv:2304.14108  [pdf, other

    cs.CV cs.CL cs.LG

    DataComp: In search of the next generation of multimodal datasets

    Authors: Samir Yitzhak Gadre, Gabriel Ilharco, Alex Fang, Jonathan Hayase, Georgios Smyrnis, Thao Nguyen, Ryan Marten, Mitchell Wortsman, Dhruba Ghosh, Jieyu Zhang, Eyal Orgad, Rahim Entezari, Giannis Daras, Sarah Pratt, Vivek Ramanujan, Yonatan Bitton, Kalyani Marathe, Stephen Mussmann, Richard Vencu, Mehdi Cherti, Ranjay Krishna, Pang Wei Koh, Olga Saukh, Alexander Ratner, Shuran Song , et al. (9 additional authors not shown)

    Abstract: Multimodal datasets are a critical component in recent breakthroughs such as Stable Diffusion and GPT-4, yet their design does not receive the same research attention as model architectures or training algorithms. To address this shortcoming in the ML ecosystem, we introduce DataComp, a testbed for dataset experiments centered around a new candidate pool of 12.8 billion image-text pairs from Commo… ▽ More

    Submitted 20 October, 2023; v1 submitted 27 April, 2023; originally announced April 2023.

    Comments: NeurIPS 2023 Datasets and Benchmarks Track

  9. arXiv:2302.12022  [pdf, other

    cs.LG math.OC

    DoG is SGD's Best Friend: A Parameter-Free Dynamic Step Size Schedule

    Authors: Maor Ivgi, Oliver Hinder, Yair Carmon

    Abstract: We propose a tuning-free dynamic SGD step size formula, which we call Distance over Gradients (DoG). The DoG step sizes depend on simple empirical quantities (distance from the initial point and norms of gradients) and have no ``learning rate'' parameter. Theoretically, we show that a slight variation of the DoG formula enjoys strong parameter-free convergence guarantees for stochastic convex opti… ▽ More

    Submitted 16 July, 2023; v1 submitted 8 February, 2023; originally announced February 2023.

    Comments: To appear at ICML 2023

  10. arXiv:2301.00457  [pdf, other

    math.OC cs.CR cs.DS cs.LG stat.ML

    ReSQueing Parallel and Private Stochastic Convex Optimization

    Authors: Yair Carmon, Arun Jambulapati, Yujia Jin, Yin Tat Lee, Daogao Liu, Aaron Sidford, Kevin Tian

    Abstract: We introduce a new tool for stochastic convex optimization (SCO): a Reweighted Stochastic Query (ReSQue) estimator for the gradient of a function convolved with a (Gaussian) probability density. Combining ReSQue with recent advances in ball oracle acceleration [CJJJLST20, ACJJS21], we develop algorithms achieving state-of-the-art complexities for SCO in parallel and private settings. For a SCO obj… ▽ More

    Submitted 27 October, 2023; v1 submitted 1 January, 2023; originally announced January 2023.

  11. arXiv:2211.15724  [pdf, other

    cs.LG stat.ML

    Malign Overfitting: Interpolation Can Provably Preclude Invariance

    Authors: Yoav Wald, Gal Yona, Uri Shalit, Yair Carmon

    Abstract: Learned classifiers should often possess certain invariance properties meant to encourage fairness, robustness, or out-of-distribution generalization. However, multiple recent works empirically demonstrate that common invariance-inducing regularizers are ineffective in the over-parameterized regime, in which classifiers perfectly fit (i.e. interpolate) the training data. This suggests that the phe… ▽ More

    Submitted 3 July, 2024; v1 submitted 28 November, 2022; originally announced November 2022.

  12. arXiv:2206.08627  [pdf, other

    math.OC cs.DS cs.LG

    RECAPP: Crafting a More Efficient Catalyst for Convex Optimization

    Authors: Yair Carmon, Arun Jambulapati, Yujia Jin, Aaron Sidford

    Abstract: The accelerated proximal point algorithm (APPA), also known as "Catalyst", is a well-established reduction from convex optimization to approximate proximal point computation (i.e., regularized minimization). This reduction is conceptually elegant and yields strong convergence rate guarantees. However, these rates feature an extraneous logarithmic term arising from the need to compute each proximal… ▽ More

    Submitted 17 June, 2022; originally announced June 2022.

    Comments: Accepted at ICML'22

  13. arXiv:2205.15371  [pdf, other

    math.OC cs.DS

    Optimal and Adaptive Monteiro-Svaiter Acceleration

    Authors: Yair Carmon, Danielle Hausler, Arun Jambulapati, Yujia Jin, Aaron Sidford

    Abstract: We develop a variant of the Monteiro-Svaiter (MS) acceleration framework that removes the need to solve an expensive implicit equation at every iteration. Consequently, for any $p\ge 2$ we improve the complexity of convex optimization with Lipschitz $p$th derivative by a logarithmic factor, matching a lower bound. We also introduce an MS subproblem solver that requires no knowledge of problem para… ▽ More

    Submitted 28 November, 2022; v1 submitted 30 May, 2022; originally announced May 2022.

  14. arXiv:2205.02160  [pdf, other

    math.OC cs.LG stat.ML

    Making SGD Parameter-Free

    Authors: Yair Carmon, Oliver Hinder

    Abstract: We develop an algorithm for parameter-free stochastic convex optimization (SCO) whose rate of convergence is only a double-logarithmic factor larger than the optimal rate for the corresponding known-parameter setting. In contrast, the best previously known rates for parameter-free SCO are based on online parameter-free regret bounds, which contain unavoidable excess logarithmic terms compared to t… ▽ More

    Submitted 1 March, 2024; v1 submitted 4 May, 2022; originally announced May 2022.

  15. arXiv:2203.13225  [pdf, other

    math.OC cs.DS cs.LG

    Distributionally Robust Optimization via Ball Oracle Acceleration

    Authors: Yair Carmon, Danielle Hausler

    Abstract: We develop and analyze algorithms for distributionally robust optimization (DRO) of convex losses. In particular, we consider group-structured and bounded $f$-divergence uncertainty sets. Our approach relies on an accelerated method that queries a ball optimization oracle, i.e., a subroutine that minimizes the objective within a small ball around the query point. Our main contribution is efficient… ▽ More

    Submitted 24 March, 2022; originally announced March 2022.

  16. arXiv:2203.05482  [pdf, other

    cs.LG cs.CL cs.CV

    Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time

    Authors: Mitchell Wortsman, Gabriel Ilharco, Samir Yitzhak Gadre, Rebecca Roelofs, Raphael Gontijo-Lopes, Ari S. Morcos, Hongseok Namkoong, Ali Farhadi, Yair Carmon, Simon Kornblith, Ludwig Schmidt

    Abstract: The conventional recipe for maximizing model accuracy is to (1) train multiple models with various hyperparameters and (2) pick the individual model which performs best on a held-out validation set, discarding the remainder. In this paper, we revisit the second step of this procedure in the context of fine-tuning large pre-trained models, where fine-tuned models often appear to lie in a single low… ▽ More

    Submitted 1 July, 2022; v1 submitted 10 March, 2022; originally announced March 2022.

    Comments: ICML 2022. The last three authors contributed equally

  17. arXiv:2202.06387  [pdf, other

    cs.CL cs.LG math.NA

    Scaling Laws Under the Microscope: Predicting Transformer Performance from Small Scale Experiments

    Authors: Maor Ivgi, Yair Carmon, Jonathan Berant

    Abstract: Neural scaling laws define a predictable relationship between a model's parameter count and its performance after training in the form of a power law. However, most research to date has not explicitly investigated whether scaling laws can be used to accelerate model development. In this work, we perform such an empirical investigation across a wide range of language understanding tasks, starting f… ▽ More

    Submitted 18 October, 2022; v1 submitted 13 February, 2022; originally announced February 2022.

    Comments: Findings of EMNLP 2022

  18. arXiv:2107.04649  [pdf, other

    cs.LG stat.ML

    Accuracy on the Line: On the Strong Correlation Between Out-of-Distribution and In-Distribution Generalization

    Authors: John Miller, Rohan Taori, Aditi Raghunathan, Shiori Sagawa, Pang Wei Koh, Vaishaal Shankar, Percy Liang, Yair Carmon, Ludwig Schmidt

    Abstract: For machine learning systems to be reliable, we must understand their performance in unseen, out-of-distribution environments. In this paper, we empirically show that out-of-distribution performance is strongly correlated with in-distribution performance for a wide range of models and distribution shifts. Specifically, we demonstrate strong correlations between in-distribution and out-of-distribut… ▽ More

    Submitted 7 October, 2021; v1 submitted 9 July, 2021; originally announced July 2021.

  19. arXiv:2107.00469  [pdf, ps, other

    math.OC cs.LG

    Never Go Full Batch (in Stochastic Convex Optimization)

    Authors: Idan Amir, Yair Carmon, Tomer Koren, Roi Livni

    Abstract: We study the generalization performance of $\text{full-batch}$ optimization algorithms for stochastic convex optimization: these are first-order methods that only access the exact gradient of the empirical risk (rather than gradients with respect to individual data points), that include a wide range of algorithms such as gradient descent, mirror descent, and their regularized and/or accelerated va… ▽ More

    Submitted 29 June, 2021; originally announced July 2021.

  20. arXiv:2106.09481  [pdf, other

    math.OC cs.DS cs.LG

    Stochastic Bias-Reduced Gradient Methods

    Authors: Hilal Asi, Yair Carmon, Arun Jambulapati, Yujia Jin, Aaron Sidford

    Abstract: We develop a new primitive for stochastic optimization: a low-bias, low-cost estimator of the minimizer $x_\star$ of any Lipschitz strongly-convex function. In particular, we use a multilevel Monte-Carlo approach due to Blanchet and Glynn to turn any optimal stochastic gradient method into an estimator of $x_\star$ with bias $δ$, variance $O(\log(1/δ))$, and an expected sampling cost of… ▽ More

    Submitted 28 October, 2021; v1 submitted 17 June, 2021; originally announced June 2021.

  21. arXiv:2105.01778  [pdf, other

    math.OC cs.DS cs.LG

    Thinking Inside the Ball: Near-Optimal Minimization of the Maximal Loss

    Authors: Yair Carmon, Arun Jambulapati, Yujia Jin, Aaron Sidford

    Abstract: We characterize the complexity of minimizing $\max_{i\in[N]} f_i(x)$ for convex, Lipschitz functions $f_1,\ldots, f_N$. For non-smooth functions, existing methods require $O(Nε^{-2})$ queries to a first-order oracle to compute an $ε$-suboptimal point and $\tilde{O}(Nε^{-1})$ queries if the $f_i$ are $O(1/ε)$-smooth. We develop methods with improved complexity bounds of… ▽ More

    Submitted 4 May, 2021; originally announced May 2021.

  22. arXiv:2010.05893  [pdf, other

    math.OC cs.LG stat.ML

    Large-Scale Methods for Distributionally Robust Optimization

    Authors: Daniel Levy, Yair Carmon, John C. Duchi, Aaron Sidford

    Abstract: We propose and analyze algorithms for distributionally robust optimization of convex losses with conditional value at risk (CVaR) and $χ^2$ divergence uncertainty sets. We prove that our algorithms require a number of gradient evaluations independent of training set size and number of parameters, making them suitable for large-scale applications. For $χ^2$ uncertainty sets these are the first such… ▽ More

    Submitted 10 December, 2020; v1 submitted 12 October, 2020; originally announced October 2020.

    Comments: 63 pages, NeurIPS 2020

  23. arXiv:2009.08447  [pdf, other

    cs.DS cs.LG math.OC

    Coordinate Methods for Matrix Games

    Authors: Yair Carmon, Yujia Jin, Aaron Sidford, Kevin Tian

    Abstract: We develop primal-dual coordinate methods for solving bilinear saddle-point problems of the form $\min_{x \in \mathcal{X}} \max_{y\in\mathcal{Y}} y^\top A x$ which contain linear programming, classification, and regression as special cases. Our methods push existing fully stochastic sublinear methods and variance-reduced methods towards their limits in terms of per-iteration complexity and sample… ▽ More

    Submitted 17 September, 2020; originally announced September 2020.

    Comments: Accepted at FOCS 2020

  24. arXiv:2006.13476  [pdf, other

    cs.LG math.OC stat.ML

    Second-Order Information in Non-Convex Stochastic Optimization: Power and Limitations

    Authors: Yossi Arjevani, Yair Carmon, John C. Duchi, Dylan J. Foster, Ayush Sekhari, Karthik Sridharan

    Abstract: We design an algorithm which finds an $ε$-approximate stationary point (with $\|\nabla F(x)\|\le ε$) using $O(ε^{-3})$ stochastic gradient and Hessian-vector products, matching guarantees that were previously available only under a stronger assumption of access to multiple queries with the same random seed. We prove a lower bound which establishes that this rate is optimal and---surprisingly---tha… ▽ More

    Submitted 24 June, 2020; originally announced June 2020.

    Comments: Accepted to CONFERENCE ON LEARNING THEORY (COLT) 2020

  25. arXiv:2003.08078  [pdf, other

    math.OC cs.DS

    Acceleration with a Ball Optimization Oracle

    Authors: Yair Carmon, Arun Jambulapati, Qijia Jiang, Yujia Jin, Yin Tat Lee, Aaron Sidford, Kevin Tian

    Abstract: Consider an oracle which takes a point $x$ and returns the minimizer of a convex function $f$ in an $\ell_2$ ball of radius $r$ around $x$. It is straightforward to show that roughly $r^{-1}\log\frac{1}ε$ calls to the oracle suffice to find an $ε$-approximate minimizer of $f$ in an $\ell_2$ unit ball. Perhaps surprisingly, this is not optimal: we design an accelerated algorithm which attains an… ▽ More

    Submitted 18 March, 2020; originally announced March 2020.

    Comments: 37 pages

  26. arXiv:1912.02365  [pdf, other

    math.OC cs.IT cs.LG stat.ML

    Lower Bounds for Non-Convex Stochastic Optimization

    Authors: Yossi Arjevani, Yair Carmon, John C. Duchi, Dylan J. Foster, Nathan Srebro, Blake Woodworth

    Abstract: We lower bound the complexity of finding $ε$-stationary points (with gradient norm at most $ε$) using stochastic first-order methods. In a well-studied model where algorithms access smooth, potentially non-convex functions through queries to an unbiased stochastic gradient oracle with bounded variance, we prove that (in the worst case) any algorithm requires at least $ε^{-4}$ queries to find an… ▽ More

    Submitted 27 February, 2022; v1 submitted 4 December, 2019; originally announced December 2019.

    Comments: Correction to hard instance dimensions in Theorem 3

  27. arXiv:1907.02056  [pdf, other

    math.OC cs.DS cs.LG

    Variance Reduction for Matrix Games

    Authors: Yair Carmon, Yujia Jin, Aaron Sidford, Kevin Tian

    Abstract: We present a randomized primal-dual algorithm that solves the problem $\min_{x} \max_{y} y^\top A x$ to additive error $ε$ in time $\mathrm{nnz}(A) + \sqrt{\mathrm{nnz}(A)n}/ε$, for matrix $A$ with larger dimension $n$ and $\mathrm{nnz}(A)$ nonzero entries. This improves the best known exact gradient methods by a factor of $\sqrt{\mathrm{nnz}(A)/n}$ and is faster than fully stochastic gradient met… ▽ More

    Submitted 30 November, 2019; v1 submitted 3 July, 2019; originally announced July 2019.

  28. arXiv:1905.13736  [pdf, other

    stat.ML cs.CV cs.LG

    Unlabeled Data Improves Adversarial Robustness

    Authors: Yair Carmon, Aditi Raghunathan, Ludwig Schmidt, Percy Liang, John C. Duchi

    Abstract: We demonstrate, theoretically and empirically, that adversarial robustness can significantly benefit from semisupervised learning. Theoretically, we revisit the simple Gaussian model of Schmidt et al. that shows a sample complexity gap between standard and robust classification. We prove that unlabeled data bridges this gap: a simple semisupervised learning procedure (self-training) achieves high… ▽ More

    Submitted 13 January, 2022; v1 submitted 31 May, 2019; originally announced May 2019.

    Comments: Corrected some math typos in the proof of Lemma 1

  29. arXiv:1903.02675  [pdf, other

    cs.LG cs.DS math.OC stat.ML

    A Rank-1 Sketch for Matrix Multiplicative Weights

    Authors: Yair Carmon, John C. Duchi, Aaron Sidford, Kevin Tian

    Abstract: We show that a simple randomized sketch of the matrix multiplicative weight (MMW) update enjoys (in expectation) the same regret bounds as MMW, up to a small constant factor. Unlike MMW, where every step requires full matrix exponentiation, our steps require only a single product of the form $e^A b$, which the Lanczos method approximates efficiently. Our key technique is to view the sketch as a… ▽ More

    Submitted 12 August, 2019; v1 submitted 6 March, 2019; originally announced March 2019.

  30. arXiv:1612.00547  [pdf, other

    math.OC cs.DS

    Gradient Descent Finds the Cubic-Regularized Non-Convex Newton Step

    Authors: Yair Carmon, John C. Duchi

    Abstract: We consider the minimization of non-convex quadratic forms regularized by a cubic term, which exhibit multiple saddle points and poor local minima. Nonetheless, we prove that, under mild assumptions, gradient descent approximates the $\textit{global minimum}$ to within $\varepsilon$ accuracy in $O(\varepsilon^{-1}\log(1/\varepsilon))$ steps for large $\varepsilon$ and $O(\log(1/\varepsilon))$ step… ▽ More

    Submitted 30 August, 2022; v1 submitted 1 December, 2016; originally announced December 2016.

    Comments: Corrected Lemma 4.6(iii) and changed the title and some notation to match the journal version of the paper

  31. arXiv:1611.00756  [pdf, other

    math.OC cs.DS

    Accelerated Methods for Non-Convex Optimization

    Authors: Yair Carmon, John C. Duchi, Oliver Hinder, Aaron Sidford

    Abstract: We present an accelerated gradient method for non-convex optimization problems with Lipschitz continuous first and second derivatives. The method requires time $O(ε^{-7/4} \log(1/ ε) )$ to find an $ε$-stationary point, meaning a point $x$ such that $\|\nabla f(x)\| \le ε$. The method improves upon the $O(ε^{-2} )$ complexity of gradient descent and provides the additional second-order guarantee th… ▽ More

    Submitted 2 February, 2017; v1 submitted 2 November, 2016; originally announced November 2016.

  32. arXiv:1605.08361  [pdf, ps, other

    stat.ML cs.LG cs.NE

    No bad local minima: Data independent training error guarantees for multilayer neural networks

    Authors: Daniel Soudry, Yair Carmon

    Abstract: We use smoothed analysis techniques to provide guarantees on the training loss of Multilayer Neural Networks (MNNs) at differentiable local minima. Specifically, we examine MNNs with piecewise linear activation functions, quadratic loss and a single output, under mild over-parametrization. We prove that for a MNN with one hidden layer, the training error is zero at every differentiable local minim… ▽ More

    Submitted 30 May, 2016; v1 submitted 26 May, 2016; originally announced May 2016.

  33. arXiv:1401.1480  [pdf, other

    cs.IT

    Lower Bounds and Approximations for the Information Rate of the ISI Channel

    Authors: Yair Carmon, Shlomo Shamai

    Abstract: We consider the discrete-time intersymbol interference (ISI) channel model, with additive Gaussian noise and fixed i.i.d. inputs. In this setting, we investigate the expression put forth by Shamai and Laroia as a conjectured lower bound for the input-output mutual information after application of a MMSE-DFE receiver. A low-SNR expansion is used to prove that the conjectured bound does not hold und… ▽ More

    Submitted 7 January, 2014; originally announced January 2014.

    Comments: 21 pages, 4 figures

  34. Comparison of the Achievable Rates in OFDM and Single Carrier Modulation with I.I.D. Inputs

    Authors: Yair Carmon, Shlomo Shamai, Tsachy Weissman

    Abstract: We compare the maximum achievable rates in single-carrier and OFDM modulation schemes, under the practical assumptions of i.i.d. finite alphabet inputs and linear ISI with additive Gaussian noise. We show that the Shamai-Laroia approximation serves as a bridge between the two rates: while it is well known that this approximation is often a lower bound on the single-carrier achievable rate, it is r… ▽ More

    Submitted 1 November, 2014; v1 submitted 24 June, 2013; originally announced June 2013.

    Comments: Revised version of IEEE IT submission. Includes new results on uniform inputs

  35. Information, Estimation, and Lookahead in the Gaussian channel

    Authors: Kartik Venkat, Tsachy Weissman, Yair Carmon, Shlomo Shamai

    Abstract: We consider mean squared estimation with lookahead of a continuous-time signal corrupted by additive white Gaussian noise. We show that the mutual information rate function, i.e., the mutual information rate as function of the signal-to-noise ratio (SNR), does not, in general, determine the minimum mean squared error (MMSE) with fixed finite lookahead, in contrast to the special cases with 0 and i… ▽ More

    Submitted 8 February, 2013; originally announced February 2013.

    Comments: 30 pages, 10 figures, submitted to IEEE Transactions on Information Theory