Abstract
Many machine learning problems can be formulated as minimizing the sum of a smooth loss function and a non-smooth regularization term, and proximal stochastic gradient methods are a popular choice for such composite optimization problems. We propose a mini-batch proximal stochastic recursive gradient algorithm, SRG-DBB, which incorporates a diagonal Barzilai–Borwein (DBB) stepsize strategy to capture the local geometry of the problem. We analyze the linear convergence and complexity of SRG-DBB for strongly convex objectives, establish its linear convergence under a non-strong convexity condition, and prove that it converges sublinearly in the general convex case. Numerical experiments on standard data sets show that SRG-DBB performs better than, or comparably to, the proximal stochastic recursive gradient algorithm with best-tuned scalar or BB stepsizes, and that it outperforms several advanced mini-batch proximal stochastic gradient methods.
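To give a concrete (non-authoritative) picture of the ingredients named above, the following NumPy sketch combines a SARAH-style recursive gradient estimator, a proximal step for an ℓ1 regularizer, and a safeguarded diagonal Barzilai–Borwein stepsize on a logistic-regression problem. The ℓ1/logistic setup, the coordinate-wise rule d_j = s_j/y_j, the safeguard interval, and all function names are illustrative assumptions, not the paper's exact SRG-DBB updates or parameters.

```python
import numpy as np

def soft_threshold(z, thresh):
    """Prox of lam*||x||_1 under a diagonal metric: coordinate-wise
    soft-thresholding; `thresh` may be a vector of per-coordinate thresholds."""
    return np.sign(z) * np.maximum(np.abs(z) - thresh, 0.0)

def logistic_grad(w, A, b):
    """Gradient of the average logistic loss over rows of A, labels b in {-1,+1}."""
    z = np.clip(b * (A @ w), -30.0, 30.0)  # clip for numerical stability
    return -(A.T @ (b / (1.0 + np.exp(z)))) / A.shape[0]

def srg_dbb_sketch(A, b, lam=1e-3, epochs=20, batch=32,
                   d_min=1e-4, d_max=1e2, seed=0):
    """Illustrative mini-batch proximal recursive-gradient loop with a
    safeguarded diagonal BB stepsize (a sketch, not the paper's exact method)."""
    rng = np.random.default_rng(seed)
    n, p = A.shape
    w = np.zeros(p)
    d = np.ones(p)                       # diagonal stepsizes, one per coordinate
    w_prev, g_prev = w.copy(), logistic_grad(w, A, b)
    for _ in range(epochs):
        g = logistic_grad(w, A, b)       # full gradient at the epoch start
        # Diagonal BB rule (assumed form): d_j = s_j / y_j with s = w - w_prev,
        # y = g - g_prev, truncated into [d_min, d_max] as a safeguard.
        s, y = w - w_prev, g - g_prev
        ok = np.abs(y) > 1e-12
        d[ok] = np.clip(s[ok] / y[ok], d_min, d_max)
        w_prev, g_prev = w.copy(), g.copy()
        v = g.copy()                     # SARAH-style: v_0 is the full gradient
        for _ in range(n // batch):
            w_new = soft_threshold(w - d * v, lam * d)  # prox step, diagonal metric
            idx = rng.choice(n, batch, replace=False)
            # recursive update: v <- grad_batch(w_new) - grad_batch(w) + v
            v += logistic_grad(w_new, A[idx], b[idx]) - logistic_grad(w, A[idx], b[idx])
            w = w_new
    return w

# Tiny synthetic usage example
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 50))
w_true = np.zeros(50); w_true[:5] = 1.0
b = np.sign(A @ w_true + 0.1 * rng.standard_normal(200))
w_hat = srg_dbb_sketch(A, b)
print("nonzero coefficients:", np.count_nonzero(np.abs(w_hat) > 1e-6))
```

The coordinate-wise ratio with truncation is only the simplest safeguarded diagonal-BB variant; the paper derives its diagonal stepsize and its convergence guarantees from a more careful analysis.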



Author information
Contributions
T.-T. Yu: methodology, convergence analysis, numerical experiments, writing; X.-W. Liu: methodology, analysis, writing, funding acquisition; Y.-H. Dai: methodology, writing, funding acquisition; J. Sun: analysis, writing.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
This work was supported by the National Natural Science Foundation of China (Nos. 11671116, 11701137, 12071108, 11991020, 11991021 and 12021001), the Major Research Plan of the NSFC (No. 91630202), the Strategic Priority Research Program of Chinese Academy of Sciences (No. XDA27000000), and the Natural Science Foundation of Hebei Province (No. A2021202010).
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Yu, TT., Liu, XW., Dai, YH. et al. A Mini-Batch Proximal Stochastic Recursive Gradient Algorithm with Diagonal Barzilai–Borwein Stepsize. J. Oper. Res. Soc. China 11, 277–307 (2023). https://doi.org/10.1007/s40305-022-00436-2
Keywords
- Stochastic recursive gradient
- Proximal gradient algorithm
- Barzilai–Borwein method
- Composite optimization