A Mini-Batch Proximal Stochastic Recursive Gradient Algorithm with Diagonal Barzilai–Borwein Stepsize

Published in: Journal of the Operations Research Society of China

Abstract

Many machine learning problems can be formulated as minimizing the sum of a smooth loss function and a non-smooth regularization term. Proximal stochastic gradient methods are popular for solving such composite optimization problems. We propose a mini-batch proximal stochastic recursive gradient algorithm, SRG-DBB, which incorporates the diagonal Barzilai–Borwein (DBB) stepsize strategy to capture the local geometry of the problem. The linear convergence and complexity of SRG-DBB are analyzed for strongly convex functions. We further establish the linear convergence of SRG-DBB under a non-strong convexity condition. Moreover, SRG-DBB is proved to converge sublinearly in the general convex case. Numerical experiments on standard data sets indicate that SRG-DBB performs better than, or comparably to, the proximal stochastic recursive gradient algorithm with best-tuned scalar or BB stepsizes, and that it outperforms several advanced mini-batch proximal stochastic gradient methods.
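The abstract describes SRG-DBB only at a high level. As a rough, hedged illustration of the kind of update involved, the sketch below combines a SARAH-style mini-batch recursive gradient estimator with a per-coordinate (diagonal) Barzilai–Borwein stepsize and a variable-metric proximal step, applied to an L1-regularized least-squares problem. The function names, the specific diagonal BB formula s_i y_i / s_i^2 with clipping safeguards, and all constants are illustrative assumptions and need not match the paper's exact SRG-DBB rules.

```python
# Illustrative sketch only (not the authors' exact SRG-DBB): a SARAH-type
# mini-batch recursive gradient loop with a per-coordinate (diagonal)
# Barzilai-Borwein stepsize, applied to the L1-regularized least-squares
# problem  min_x (1/(2n)) * ||Ax - b||^2 + lam * ||x||_1.
import numpy as np


def soft_threshold(z, tau):
    """Elementwise proximal operator of tau * ||.||_1 (tau may be a vector)."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)


def srg_dbb_sketch(A, b, lam, epochs=20, batch=64, seed=0):
    rng = np.random.default_rng(seed)
    n, d = A.shape
    batch = min(batch, n)
    inner = max(1, n // batch)          # inner iterations per outer epoch
    x = np.zeros(d)
    D = np.ones(d)                      # diagonal curvature estimate; stepsize is 1/D_i
    x_old_epoch = g_old_epoch = None

    def grad(w, idx=None):              # (mini-batch) gradient of the smooth part
        Ai, bi = (A, b) if idx is None else (A[idx], b[idx])
        return Ai.T @ (Ai @ w - bi) / len(bi)

    for _ in range(epochs):
        g_full = grad(x)                # full gradient at the start of the epoch
        if x_old_epoch is not None:     # diagonal BB update from epoch-to-epoch differences
            s, y = x - x_old_epoch, g_full - g_old_epoch
            # one common diagonal BB rule, d_i ~ s_i * y_i / s_i^2, with a crude
            # safeguard interval; the paper's exact DBB formula may differ
            D = np.clip(np.where(np.abs(s) > 1e-12, s * y / (s * s + 1e-12), D),
                        1e-2, 1e2)
        x_old_epoch, g_old_epoch = x.copy(), g_full.copy()

        v = g_full                      # recursive (SARAH-style) gradient estimator
        for _ in range(inner):
            idx = rng.choice(n, size=batch, replace=False)
            x_new = soft_threshold(x - v / D, lam / D)   # variable-metric proximal step
            v = v + grad(x_new, idx) - grad(x, idx)      # SARAH recursion
            x = x_new
    return x
```

A call such as srg_dbb_sketch(A, b, lam=0.01) runs 20 outer epochs, refreshing the full gradient and the diagonal stepsize once per epoch, which mirrors the epoch structure typical of proximal SARAH/SVRG-type methods.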

Notes

  1. https://www.csie.ntu.edu.tw/~cjlin/libsvm

Author information

Contributions

T.-T. Yu: methodology, convergence analysis, numerical experiments, writing; X.-W. Liu: methodology, analysis, writing, funding acquisition; Y.-H. Dai: methodology, writing, funding acquisition; J. Sun: analysis, writing.

Corresponding author

Correspondence to Xin-Wei Liu.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

This work was supported by the National Natural Science Foundation of China (Nos. 11671116, 11701137, 12071108, 11991020, 11991021 and 12021001), the Major Research Plan of the NSFC (No. 91630202), the Strategic Priority Research Program of Chinese Academy of Sciences (No. XDA27000000), and the Natural Science Foundation of Hebei Province (No. A2021202010).

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Yu, TT., Liu, XW., Dai, YH. et al. A Mini-Batch Proximal Stochastic Recursive Gradient Algorithm with Diagonal Barzilai–Borwein Stepsize. J. Oper. Res. Soc. China 11, 277–307 (2023). https://doi.org/10.1007/s40305-022-00436-2

