Abstract
Many machine learning problems can be formulated as minimizing the sum of a smooth loss function and a non-smooth regularization term, and proximal stochastic gradient methods are a popular choice for such composite optimization problems. We propose a mini-batch proximal stochastic recursive gradient algorithm, SRG-DBB, which incorporates a diagonal Barzilai–Borwein (DBB) stepsize strategy to capture the local geometry of the problem. We analyze the linear convergence and complexity of SRG-DBB for strongly convex objectives, establish its linear convergence under a non-strong convexity condition, and prove that it converges sublinearly in the general convex case. Numerical experiments on standard data sets show that SRG-DBB performs better than, or comparably to, the proximal stochastic recursive gradient algorithm with best-tuned scalar or BB stepsizes, and that it outperforms several advanced mini-batch proximal stochastic gradient methods.
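To give a concrete (non-authoritative) picture of the ingredients named above, the following NumPy sketch combines a SARAH-style recursive gradient estimator, a proximal step for an ℓ1 regularizer, and a safeguarded diagonal Barzilai–Borwein stepsize on a logistic-regression problem. The ℓ1/logistic setup, the coordinate-wise rule d_j = s_j/y_j, the safeguard interval, and all function names are illustrative assumptions, not the paper's exact SRG-DBB updates or parameters.

```python
import numpy as np

def soft_threshold(z, thresh):
    """Prox of lam*||x||_1 under a diagonal metric: coordinate-wise
    soft-thresholding; `thresh` may be a vector of per-coordinate thresholds."""
    return np.sign(z) * np.maximum(np.abs(z) - thresh, 0.0)

def logistic_grad(w, A, b):
    """Gradient of the average logistic loss over rows of A, labels b in {-1,+1}."""
    z = np.clip(b * (A @ w), -30.0, 30.0)  # clip for numerical stability
    return -(A.T @ (b / (1.0 + np.exp(z)))) / A.shape[0]

def srg_dbb_sketch(A, b, lam=1e-3, epochs=20, batch=32,
                   d_min=1e-4, d_max=1e2, seed=0):
    """Illustrative mini-batch proximal recursive-gradient loop with a
    safeguarded diagonal BB stepsize (a sketch, not the paper's exact method)."""
    rng = np.random.default_rng(seed)
    n, p = A.shape
    w = np.zeros(p)
    d = np.ones(p)                       # diagonal stepsizes, one per coordinate
    w_prev, g_prev = w.copy(), logistic_grad(w, A, b)
    for _ in range(epochs):
        g = logistic_grad(w, A, b)       # full gradient at the epoch start
        # Diagonal BB rule (assumed form): d_j = s_j / y_j with s = w - w_prev,
        # y = g - g_prev, truncated into [d_min, d_max] as a safeguard.
        s, y = w - w_prev, g - g_prev
        ok = np.abs(y) > 1e-12
        d[ok] = np.clip(s[ok] / y[ok], d_min, d_max)
        w_prev, g_prev = w.copy(), g.copy()
        v = g.copy()                     # SARAH-style: v_0 is the full gradient
        for _ in range(n // batch):
            w_new = soft_threshold(w - d * v, lam * d)  # prox step, diagonal metric
            idx = rng.choice(n, batch, replace=False)
            # recursive update: v <- grad_batch(w_new) - grad_batch(w) + v
            v += logistic_grad(w_new, A[idx], b[idx]) - logistic_grad(w, A[idx], b[idx])
            w = w_new
    return w

# Tiny synthetic usage example
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 50))
w_true = np.zeros(50); w_true[:5] = 1.0
b = np.sign(A @ w_true + 0.1 * rng.standard_normal(200))
w_hat = srg_dbb_sketch(A, b)
print("nonzero coefficients:", np.count_nonzero(np.abs(w_hat) > 1e-6))
```

The coordinate-wise ratio with truncation is only the simplest safeguarded diagonal-BB variant; the paper derives its diagonal stepsize and its convergence guarantees from a more careful analysis.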



Author information
Contributions
T.-T. Yu: methodology, convergence analysis, numerical experiments, writing; X.-W. Liu: methodology, analysis, writing, funding acquisition; Y.-H. Dai: methodology, writing, funding acquisition; J. Sun: analysis, writing.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
This work was supported by the National Natural Science Foundation of China (Nos. 11671116, 11701137, 12071108, 11991020, 11991021 and 12021001), the Major Research Plan of the NSFC (No. 91630202), the Strategic Priority Research Program of Chinese Academy of Sciences (No. XDA27000000), and the Natural Science Foundation of Hebei Province (No. A2021202010).
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Yu, TT., Liu, XW., Dai, YH. et al. A Mini-Batch Proximal Stochastic Recursive Gradient Algorithm with Diagonal Barzilai–Borwein Stepsize. J. Oper. Res. Soc. China 11, 277–307 (2023). https://doi.org/10.1007/s40305-022-00436-2
Keywords
- Stochastic recursive gradient
- Proximal gradient algorithm
- Barzilai–Borwein method
- Composite optimization