AI505 Paper list for share

1. Adam: A Method for Stochastic Optimization
   https://arxiv.org/pdf/1412.6980.pdf
2. SVRG: Accelerating Stochastic Gradient Descent Using Predictive Variance Reduction
   https://papers.nips.cc/paper/4937-accelerating-stochastic-gradient-descent-using-predictive-variance-reduction.pdf
3. SGD: General Analysis and Improved Rates
   https://arxiv.org/pdf/1901.09401.pdf
4. A Closer Look at Deep Learning Heuristics: Learning Rate Restarts, Warmup and Distillation
   https://openreview.net/pdf?id=r14EOsCqKX
5. QSGD: Communication-Efficient SGD via Gradient Quantization and Encoding
   https://arxiv.org/abs/1610.02132
6. SAGA: A Fast Incremental Gradient Method With Support for Non-Strongly …