
Showing 1–2 of 2 results for author: Hseu, J

  1. arXiv:1904.00962  [pdf, other]

    cs.LG cs.AI cs.CL stat.ML

    Large Batch Optimization for Deep Learning: Training BERT in 76 minutes

    Authors: Yang You, Jing Li, Sashank Reddi, Jonathan Hseu, Sanjiv Kumar, Srinadh Bhojanapalli, Xiaodan Song, James Demmel, Kurt Keutzer, Cho-Jui Hsieh

    Abstract: Training large deep neural networks on massive datasets is computationally very challenging. There has been a recent surge in interest in using large batch stochastic optimization methods to tackle this issue. The most prominent algorithm in this line of research is LARS, which by employing layerwise adaptive learning rates trains ResNet on ImageNet in a few minutes. However, LARS performs poorly fo…

    Submitted 3 January, 2020; v1 submitted 1 April, 2019; originally announced April 2019.

    Comments: Published as a conference paper at ICLR 2020

  2. arXiv:1901.08256  [pdf, other]

    cs.LG stat.ML

    Large-Batch Training for LSTM and Beyond

    Authors: Yang You, Jonathan Hseu, Chris Ying, James Demmel, Kurt Keutzer, Cho-Jui Hsieh

    Abstract: Large-batch training approaches have enabled researchers to utilize large-scale distributed processing and greatly accelerate deep-neural net (DNN) training. For example, by scaling the batch size from 256 to 32K, researchers have been able to reduce the training time of ResNet50 on ImageNet from 29 hours to 2.2 minutes (Ying et al., 2018). In this paper, we propose a new approach called linear-ep…

    Submitted 24 January, 2019; originally announced January 2019.

    Comments: Preprint. Work in progress. We may update this draft soon.