Abstract
In this article, we study approximation properties of the variation spaces corresponding to shallow neural networks with a variety of activation functions. We introduce two main tools for estimating the metric entropy, approximation rates, and n-widths of these spaces. First, we introduce the notion of a smoothly parameterized dictionary and give upper bounds on the nonlinear approximation rates, metric entropy, and n-widths of its absolute convex hull. The upper bounds depend upon the order of smoothness of the parameterization. We apply this result to dictionaries of ridge functions corresponding to shallow neural networks, where the resulting bounds improve upon existing results in many cases. Second, we provide a method for lower bounding the metric entropy and n-widths of variation spaces that contain certain classes of ridge functions. This result gives sharp lower bounds on the \(L^2\)-approximation rates, metric entropy, and n-widths of the variation spaces corresponding to neural networks with a range of important activation functions, including ReLU\(^k\) activation functions and sigmoidal activation functions with bounded variation.
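For orientation, the following displays sketch the central objects behind these statements in standard notation; the symbols \(\mathbb{D}\), \(\mathcal{K}_1(\mathbb{D})\), and \(\mathbb{P}^d_k\) follow common usage in this literature and are assumptions on our part rather than quotations from the article. Given a dictionary \(\mathbb{D}\) in a Banach space \(X\), the unit ball of the variation space is the closed absolute convex hull of \(\mathbb{D}\), and the variation norm is its gauge:
\[
B_1(\mathbb{D}) = \overline{\operatorname{conv}}(\pm\mathbb{D}), \qquad \|f\|_{\mathcal{K}_1(\mathbb{D})} = \inf\{t > 0 : f \in t\,B_1(\mathbb{D})\}.
\]
For shallow ReLU\(^k\) networks, with \(\sigma_k(t) = \max(0,t)^k\), the relevant dictionary of ridge functions on a bounded domain \(\Omega \subset \mathbb{R}^d\) is
\[
\mathbb{P}^d_k = \{\sigma_k(\omega \cdot x + b) : \omega \in S^{d-1},\ b \in [c_1, c_2]\},
\]
so that \(\mathcal{K}_1(\mathbb{P}^d_k)\) consists of functions expressible as (limits of) \(\ell^1\)-bounded superpositions \(\sum_j a_j\,\sigma_k(\omega_j \cdot x + b_j)\). In this notation, a sharp \(L^2\) rate for \(n\)-term approximation from the dictionary takes the form
\[
\inf_{f_n \in \Sigma_n(\mathbb{P}^d_k)} \|f - f_n\|_{L^2(\Omega)} \lesssim n^{-\frac{1}{2} - \frac{2k+1}{2d}}\, \|f\|_{\mathcal{K}_1(\mathbb{P}^d_k)},
\]
where \(\Sigma_n\) denotes \(n\)-term linear combinations of dictionary elements.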
Acknowledgements
We would like to thank Professors Russel Caflisch, Ronald DeVore, Weinan E, Albert Cohen, Stephan Wojtowytsch, Jason Klusowski, and Lei Wu for helpful discussions. This work was supported by the Verne M. Willaman Chair Fund at the Pennsylvania State University and the National Science Foundation (Grant Nos. DMS-1819157 and DMS-2111387).
Communicated by Albert Cohen.
Cite this article
Siegel, J.W., Xu, J. Sharp Bounds on the Approximation Rates, Metric Entropy, and n-Widths of Shallow Neural Networks. Found Comput Math 24, 481–537 (2024). https://doi.org/10.1007/s10208-022-09595-3