AI505 Paper List for Sharing

1. Adam: A Method for Stochastic Optimization (a minimal update-rule sketch appears after this list)
https://arxiv.org/pdf/1412.6980.pdf

2. SVRG: Accelerating Stochastic Gradient Descent using Predictive Variance Reduction
https://papers.nips.cc/paper/4937-accelerating-stochastic-gradient-descent-using-predictive-variance-reduction.pdf

3. SGD: General Analysis and Improved Rates
https://arxiv.org/pdf/1901.09401.pdf

4. A Closer Look at Deep Learning Heuristics: Learning Rate Restarts, Warmup and Distillation
https://openreview.net/pdf?id=r14EOsCqKX

5. QSGD: Communication-Efficient SGD via Gradient Quantization and Encoding
https://arxiv.org/abs/1610.02132

6. SAGA: A Fast Incremental Gradient Method With Support for Non-Strongly Convex Composite Objectives
https://arxiv.org/abs/1407.0202

7. SGDR: Stochastic Gradient Descent with Warm Restarts
https://arxiv.org/pdf/1608.03983.pdf

8. Scaling SGD Batch Size to 32K for ImageNet Training
https://people.eecs.berkeley.edu/~youyang/publications/batch32k.pdf

9. Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
https://arxiv.org/pdf/1706.02677.pdf

10. Don’t Decay the Learning Rate, Increase the Batch Size
https://arxiv.org/abs/1711.00489
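
For quick reference, here is a minimal sketch of the Adam update rule described in paper [1] (Kingma & Ba, 2015), using the paper's default hyperparameters. The function and variable names below are illustrative only, not taken from any released code.

import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # Update biased first- and second-moment estimates of the gradient.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias-correct the moment estimates (t is the 1-based step counter).
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Parameter update with a per-coordinate adaptive step size.
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v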


[References]

  1. Kingma, D. P. & Ba, J. L. (2015) Adam: A Method for Stochastic Optimization, ICLR
  2. Johnson, R. & Zhang, T. (2013) Accelerating Stochastic Gradient Descent using Predictive Variance Reduction, NeurIPS
  3. Gower, R. M. et al. (2019) SGD: General Analysis and Improved Rates, ICML, PMLR 97:5200-5209
  4. Gotmare, A. et al. (2019) A Closer Look at Deep Learning Heuristics: Learning Rate Restarts, Warmup and Distillation, ICLR
  5. Alistarh, D. et al. (2017) QSGD: Communication-Efficient SGD via Gradient Quantization and Encoding, NeurIPS
  6. Defazio, A. et al. (2014) SAGA: A Fast Incremental Gradient Method With Support for Non-Strongly Convex Composite Objectives, NeurIPS
  7. Loshchilov, I. & Hutter, F. (2017) SGDR: Stochastic Gradient Descent with Warm Restarts, ICLR
  8. You, Y. et al. (2017) Scaling SGD Batch Size to 32K for ImageNet Training, Technical Report No. UCB/EECS-2017-156
  9. Goyal, P. et al. (2017) Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour, arXiv
  10. Smith, S. L. et al. (2017) Don’t Decay the Learning Rate, Increase the Batch Size, arXiv
