AI505 Paper List for Sharing

1. Adam: A Method for Stochastic Optimization (a minimal update-rule sketch appears after this list)
https://arxiv.org/pdf/1412.6980.pdf

2. SVRG: Accelerating Stochastic Gradient Descent using Predictive Variance Reduction
https://papers.nips.cc/paper/4937-accelerating-stochastic-gradient-descent-using-predictive-variance-reduction.pdf

3. SGD: General Analysis and Improved Rates
https://arxiv.org/pdf/1901.09401.pdf

4. A Closer Look at Deep Learning Heuristics: Learning Rate Restarts, Warmup and Distillation
https://openreview.net/pdf?id=r14EOsCqKX

5. QSGD: Communication-Efficient SGD via Gradient Quantization and Encoding
https://arxiv.org/abs/1610.02132

6. SAGA: A Fast Incremental Gradient Method With Support for Non-Strongly Convex Composite Objectives
https://arxiv.org/abs/1407.0202

7. SGDR: Stochastic Gradient Descent with Warm Restarts
https://arxiv.org/pdf/1608.03983.pdf

8. Scaling SGD Batch Size to 32K for ImageNet Training
https://people.eecs.berkeley.edu/~youyang/publications/batch32k.pdf

9. Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
https://arxiv.org/pdf/1706.02677.pdf

10. Don’t Decay the Learning Rate, Increase the Batch Size
https://arxiv.org/abs/1711.00489
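
For quick reference, here is a minimal sketch of the Adam update rule described in paper [1] (Kingma & Ba, 2015), using the paper's default hyperparameters. The function and variable names below are illustrative only, not taken from any released code.

import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # Update biased first- and second-moment estimates of the gradient.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias-correct the moment estimates (t is the 1-based step counter).
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Parameter update with a per-coordinate adaptive step size.
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v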


[References]

  1. Kingma, D. P. & Ba, J. L. (2015) Adam: A Method for Stochastic Optimization, ICLR
  2. Johnson, R. & Zhang, T. (2013) Accelerating Stochastic Gradient Descent using Predictive Variance Reduction, NeurIPS
  3. Gower, R. M. et al. (2019) SGD: General Analysis and Improved Rates, ICML, PMLR 97:5200-5209
  4. Gotmare, A. et al. (2019) A Closer Look at Deep Learning Heuristics: Learning Rate Restarts, Warmup and Distillation, ICLR
  5. Alistarh, D. et al. (2017) QSGD: Communication-Efficient SGD via Gradient Quantization and Encoding, NeurIPS
  6. Defazio, A. et al. (2014) SAGA: A Fast Incremental Gradient Method With Support for Non-Strongly Convex Composite Objectives, NeurIPS
  7. Loshchilov, I. & Hutter, F. (2017) SGDR: Stochastic Gradient Descent with Warm Restarts, ICLR
  8. You, Y. et al. (2017) Scaling SGD Batch Size to 32K for ImageNet Training, Technical Report No. UCB/EECS-2017-156
  9. Goyal, P. et al. (2017) Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour, arXiv
  10. Smith, S. L. et al. (2017) Don’t Decay the Learning Rate, Increase the Batch Size, arXiv
