Aronszajn, Nachman. 1950. “Theory of Reproducing Kernels.”
Transactions of the American Mathematical Society 68 (3):
337–404.
Arora, Raman, Amitabh Basu, Poorya Mianjy, and Anirbit Mukherjee. 2018.
“Understanding Deep Neural Networks with Rectified Linear
Units.” In International Conference on Learning
Representations (ICLR).
Arora, Sanjeev, Nadav Cohen, Wei Hu, and Yuping Luo. 2019.
“Implicit Regularization in Deep Matrix Factorization.”
Advances in Neural Information Processing Systems (NeurIPS) 32.
Bach, Francis. 2017. “Breaking the Curse of Dimensionality with
Convex Neural Networks.” Journal of Machine Learning
Research 18 (1): 629–81.
Barron, Andrew R. 1993. “Universal Approximation Bounds for
Superpositions of a Sigmoidal Function.” IEEE Transactions on
Information Theory 39 (3): 930–45.
Bartolucci, Francesca, Ernesto De Vito, Lorenzo Rosasco, and Stefano
Vigogna. 2023. “Understanding Neural Networks with Reproducing
Kernel Banach Spaces.” Applied and Computational
Harmonic Analysis 62: 194–236.
Bartolucci, Francesca, Ernesto De Vito, Lorenzo Rosasco, and Stefano
Vigogna. 2024. “Neural Reproducing Kernel Banach Spaces
and Representer Theorems for Deep Networks.” arXiv Preprint
arXiv:2403.08750.
Bietti, Alberto, Joan Bruna, Clayton Sanford, and Min Jae Song. 2022.
“Learning Single-Index Models with Shallow Neural
Networks.” In Advances in Neural Information Processing
Systems.
Boyer, Claire, Antonin Chambolle, Yohann De Castro, Vincent Duval,
Frédéric De Gournay, and Pierre Weiss. 2019. “On Representer
Theorems and Convex Regularization.” SIAM Journal on
Optimization 29 (2): 1260–81.
Bredies, Kristian, and Marcello Carioni. 2020. “Sparsity of
Solutions for Variational Inverse Problems with Finite-Dimensional
Data.” Calculus of Variations and Partial Differential
Equations 59 (1): 1–26.
Burer, Samuel, and Renato D. C. Monteiro. 2003. “A Nonlinear
Programming Algorithm for Solving Semidefinite Programs via Low-Rank
Factorization.” Mathematical Programming 95 (2): 329–57.
Candès, Emmanuel J., Justin Romberg, and Terence Tao. 2006.
“Robust Uncertainty Principles: Exact Signal Reconstruction from
Highly Incomplete Frequency Information.” IEEE Transactions
on Information Theory 52 (2): 489–509.
Carl, Bernd. 1981. “Entropy Numbers, s-Numbers, and Eigenvalue
Problems.” Journal of Functional Analysis 41 (3):
290–306.
Chandrasekaran, Venkat, Benjamin Recht, Pablo A. Parrilo, and Alan S.
Willsky. 2012. “The Convex Geometry of Linear Inverse
Problems.” Foundations of Computational Mathematics 12
(6): 805–49.
Chen, Zhengdao. 2024. “Neural Hilbert Ladders:
Multi-Layer Neural Networks in Function Space.” Journal of
Machine Learning Research 25 (109): 1–65.
Dai, Zhen, Mina Karzand, and Nathan Srebro. 2021. “Representation
Costs of Linear Neural Networks: Analysis and Design.”
Advances in Neural Information Processing Systems 34: 26884–96.
Damian, Alexandru, Jason Lee, and Mahdi Soltanolkotabi. 2022.
“Neural Networks Can Learn Representations with Gradient
Descent.” In Conference on Learning Theory, 5413–52.
PMLR.
DeVore, Ronald A. 1998. “Nonlinear Approximation.” Acta
Numerica 7: 51–150.
DeVore, Ronald, Robert D. Nowak, Rahul Parhi, and Jonathan W. Siegel.
2025. “Weighted Variation Spaces and Approximation by Shallow
ReLU Networks.” Applied and Computational
Harmonic Analysis 74: 101713.
Donoho, David L. 2000. “High-Dimensional Data Analysis: The Curses
and Blessings of Dimensionality.” AMS Math Challenges
Lecture.
Donoho, David L. 2006. “Compressed Sensing.” IEEE Transactions on
Information Theory 52 (4): 1289–1306.
Donoho, David L., and Iain M. Johnstone. 1994. “Ideal Spatial
Adaptation by Wavelet Shrinkage.” Biometrika 81 (3):
425–55.
Donoho, David L., and Iain M. Johnstone. 1995. “Adapting to Unknown Smoothness via Wavelet
Shrinkage.” Journal of the American Statistical
Association 90 (432): 1200–1224.
Donoho, David L., and Iain M. Johnstone. 1998. “Minimax Estimation via Wavelet Shrinkage.”
The Annals of Statistics 26 (3): 879–921.
Donoho, David L., Richard C. Liu, and Brenda MacGibbon. 1990.
“Minimax Risk over Hyperrectangles, and Implications.”
Annals of Statistics 18 (3): 1416–37.
E, Weinan, Chao Ma, and Lei Wu. 2022. “The Barron
Space and the Flow-Induced Function Spaces for Neural Network
Models.” Constructive Approximation 55 (1): 369–406.
E, Weinan, and Stephan Wojtowytsch. 2020. “On the
Banach Spaces Associated with Multi-Layer ReLU
Networks: Function Representation, Approximation Theory and
Gradient Descent Dynamics.” CSIAM Transactions on Applied
Mathematics 1 (3): 387–440.
Ergen, Tolga, and Mert Pilanci. 2021. “Convex Geometry and Duality
of Over-Parameterized Neural Networks.” Journal of Machine
Learning Research.
Fisher, Stephen D., and Joseph W. Jerome. 1975. “Spline Solutions
to L1 Extremal Problems in One and Several Variables.” Journal of
Approximation Theory 13 (1): 73–83.
Geer, Sara van de. 2000. Empirical Processes in
M-Estimation. Vol. 6. Cambridge University Press.
Golubeva, Anna, Guy Gur-Ari, and Behnam Neyshabur. 2021. “Are
Wider Nets Better Given the Same Number of Parameters?” In
International Conference on Learning Representations.
Grandvalet, Yves. 1998. “Least Absolute Shrinkage Is Equivalent to
Quadratic Penalization.” In International Conference on
Artificial Neural Networks, 201–6. Springer.
Gunasekar, Suriya, Jason D. Lee, Daniel Soudry, and Nati Srebro. 2018.
“Implicit Bias of Gradient Descent on Linear Convolutional
Networks.” Advances in Neural Information Processing
Systems 31.
Heeringa, Tjeerd Jan, Len Spek, and Christoph Brune. 2025. “Deep
Networks Are Reproducing Kernel Chains.” arXiv Preprint
arXiv:2501.03697.
Jacot, Arthur. 2023a. “Implicit Bias of Large Depth Networks: A
Notion of Rank for Nonlinear Functions.” In International
Conference on Learning Representations (ICLR).
Jacot, Arthur. 2023b. “Bottleneck Structure in Learned Features:
Low-Dimension vs. Regularity Tradeoff.” Advances in Neural
Information Processing Systems 36: 23607–29.
Jacot, Arthur, Franck Gabriel, and Clément Hongler. 2018. “Neural
Tangent Kernel: Convergence and Generalization in Neural
Networks.” Advances in Neural Information Processing
Systems 31.
Jacot, Arthur, Eugene Golikov, Clément Hongler, and Franck Gabriel.
2022. “Feature Learning in L2-Regularized
DNNs: Attraction/Repulsion and Sparsity.”
Advances in Neural Information Processing Systems 35: 6763–74.
Jacot, Arthur, and Alexandre Kaiser. 2025. “Hamiltonian Mechanics
of Feature Learning: Bottleneck Structure in Leaky
ResNets.” In Conference on Parsimony
and Learning (CPAL).
Jacot, Arthur, Peter Súkeník, Zihan Wang, and Marco Mondelli. 2024.
“Wide Neural Networks Trained with Weight Decay Provably Exhibit
Neural Collapse.” arXiv Preprint arXiv:2410.04887.
Klusowski, Jason M., and Andrew R. Barron. 2018. “Approximation by
Combinations of ReLU and Squared ReLU Ridge
Functions with ℓ1 and ℓ0 Controls.”
IEEE Transactions on Information Theory 64 (12): 7649–56.
Kůrková, Věra, and Marcello Sanguineti. 2001. “Bounds on Rates of
Variable-Basis and Neural-Network Approximation.” IEEE
Transactions on Information Theory 47 (6): 2659–65.
Li, Ker-Chau. 1991. “Sliced Inverse Regression for Dimension
Reduction.” Journal of the American Statistical
Association 86 (414): 316–27.
Lin, Rong Rong, Hai Zhang Zhang, and Jun Zhang. 2022. “On
Reproducing Kernel Banach Spaces: Generic
Definitions and Unified Framework of Constructions.” Acta
Mathematica Sinica, English Series 38 (8): 1459–83.
Mammen, Enno, and Sara van de Geer. 1997. “Locally Adaptive
Regression Splines.” Annals of Statistics 25 (1):
387–413.
Matoušek, Jiří. 1996. “Improved Upper Bounds for Approximation by
Zonotopes.” Acta Mathematica 177 (1): 55–73.
McCarty, Sarah. 2023. “Piecewise Linear Functions Representable
with Infinite Width Shallow ReLU Neural Networks.”
Proceedings of the American Mathematical Society, Series B 10
(27): 296–310.
Meyer, Yves. 1992. Wavelets and Operators. Vol. 37. Cambridge
University Press.
Mhaskar, Hrushikesh N. 2004. “On the Tractability of Multivariate
Integration and Approximation by Neural Networks.” Journal of
Complexity 20 (4): 561–90.
Mousavi-Hosseini, Alireza, Sejun Park, Manuela Girotti, Ioannis
Mitliagkas, and Murat A. Erdogdu. 2022. “Neural Networks
Efficiently Learn Low-Dimensional Representations with
SGD.” In International Conference
on Learning Representations.
Nichani, Eshaan, Alex Damian, and Jason D. Lee. 2023. “Provable
Guarantees for Nonlinear Feature Learning in Three-Layer Neural
Networks.” Advances in Neural Information Processing
Systems 36: 10828–75.
Ongie, Greg, Rebecca Willett, Daniel Soudry, and Nathan Srebro. 2020.
“A Function Space View of Bounded Norm Infinite Width
ReLU Nets: The Multivariate Case.” In
International Conference on Learning Representations.
Parhi, Rahul, and Robert D. Nowak. 2021. “Banach Space Representer
Theorems for Neural Networks and Ridge Splines.” Journal of
Machine Learning Research 22 (43): 1–40.
Parhi, Rahul, and Robert D. Nowak. 2022. “What Kinds of Functions Do Deep Neural Networks Learn?
Insights from Variational Spline Theory.” SIAM
Journal on Mathematics of Data Science 4 (2): 464–89.
Parhi, Rahul, and Robert D. Nowak. 2023. “Near-Minimax Optimal Estimation with Shallow
ReLU Neural Networks.” IEEE Transactions on
Information Theory 69 (2): 1125–40.
Parkinson, Suzanna, Greg Ongie, and Rebecca Willett. 2023.
“ReLU Neural Networks with Linear Layers Are Biased
Towards Single- and Multi-Index Models.” arXiv Preprint
arXiv:2305.15598.
Petrushev, Pencho P. 1988. “Direct and Converse Theorems for
Spline and Rational Approximation and Besov Spaces.”
In Function Spaces and Applications: Proceedings of the US-Swedish
Seminar Held in Lund, Sweden, June 15–21, 1986, 363–77. Springer.
Radhakrishnan, Adityanarayanan, Daniel Beaglehole, Parthe Pandit, and
Mikhail Belkin. 2024. “Mechanism for Feature Learning in Neural
Networks and Backpropagation-Free Machine Learning Models.”
Science 383 (6690): 1461–67.
Razin, Noam, and Nadav Cohen. 2020. “Implicit Regularization in
Deep Learning May Not Be Explainable by Norms.” Advances in
Neural Information Processing Systems 33: 21174–87.
Razin, Noam, Asaf Maman, and Nadav Cohen. 2021. “Implicit
Regularization in Tensor Factorization.” In International
Conference on Machine Learning (ICML), 8913–24.
Razin, Noam, Asaf Maman, and Nadav Cohen. 2022. “Implicit Regularization in Hierarchical Tensor
Factorization and Deep Convolutional Neural Networks.” In
International Conference on Machine Learning, 18422–62. PMLR.
Razin, Noam, Tom Verbin, and Nadav Cohen. 2023. “On the Ability of
Graph Neural Networks to Model Interactions Between Vertices.”
Advances in Neural Information Processing Systems 36: 26501–45.
Sahiner, Arda, Tolga Ergen, John M. Pauly, and Mert Pilanci. 2021.
“Vector-Output ReLU Neural Network Problems Are
Copositive Programs: Convex Analysis of Two Layer Networks and
Polynomial-Time Algorithms.” In International Conference on
Learning Representations.
Savarese, Pedro, Itay Evron, Daniel Soudry, and Nathan Srebro. 2019.
“How Do Infinite Width Bounded Norm Networks Look in Function
Space?” In Conference on Learning Theory (COLT),
2667–90. PMLR.
Schmidt-Hieber, Johannes. 2020. “Nonparametric Regression Using
Deep Neural Networks with ReLU Activation Function.”
Annals of Statistics 48 (4): 1875–97.
Schölkopf, Bernhard, and Alexander J. Smola. 2002. Learning with
Kernels: Support Vector Machines, Regularization, Optimization, and
Beyond. Adaptive Computation and Machine Learning. MIT Press.
Shenouda, Joseph, Rahul Parhi, Kangwook Lee, and Robert D. Nowak. 2024.
“Variation Spaces for Multi-Output Neural Networks: Insights on
Multi-Task Learning and Network Compression.” Journal of
Machine Learning Research 25 (231): 1–40.
Siegel, Jonathan W. 2023. “Optimal Approximation of Zonoids and
Uniform Approximation by Shallow Neural Networks.” arXiv
Preprint arXiv:2307.15285.
Siegel, Jonathan W., and Jinchao Xu. 2020. “Approximation Rates for
Neural Networks with General Activation Functions.” Neural
Networks 128: 313–21.
Siegel, Jonathan W., and Jinchao Xu. 2023. “Characterization of the Variation Spaces Corresponding
to Shallow Neural Networks.” Constructive Approximation
57 (3): 1109–32.
Srebro, Nathan, Jason Rennie, and Tommi Jaakkola. 2004.
“Maximum-Margin Matrix Factorization.” Advances in
Neural Information Processing Systems 17.
Tibshirani, Robert. 1996. “Regression Shrinkage and Selection via
the Lasso.” Journal of the Royal Statistical Society Series
B: Statistical Methodology 58 (1): 267–88.
Timor, Nadav, Gal Vardi, and Ohad Shamir. 2023. “Implicit
Regularization Towards Rank Minimization in ReLU
Networks.” In International Conference on Algorithmic
Learning Theory, 1429–59. PMLR.
Unser, Michael. 2021. “A Unifying Representer Theorem for Inverse
Problems and Machine Learning.” Foundations of Computational
Mathematics 21 (4): 941–60.
Unser, Michael. 2023. “Ridges, Neural Networks, and the Radon
Transform.” Journal of Machine Learning Research 24
(37): 1–33.
Varshney, Prateek, and Mert Pilanci. 2024. “Convex Distillation:
Efficient Compression of Deep Networks via Convex Optimization.”
arXiv Preprint arXiv:2410.06567.
Wahba, Grace. 1990. Spline Models for Observational Data. Vol.
59. SIAM.
Wang, Yifei, Tolga Ergen, and Mert Pilanci. 2023. “Parallel Deep
Neural Networks Have Zero Duality Gap.” In International
Conference on Learning Representations.
Wen, Yuxiao, and Arthur Jacot. 2024. “Which Frequencies Do
CNNs Need? Emergent Bottleneck Structure in Feature
Learning.” In International Conference on Machine Learning
(ICML).
Wilson, Andrew Gordon, Zhiting Hu, Ruslan Salakhutdinov, and Eric P.
Xing. 2016. “Deep Kernel Learning.” In Artificial
Intelligence and Statistics, 370–78. PMLR.
Yang, Liu, Jifan Zhang, Joseph Shenouda, Dimitris Papailiopoulos,
Kangwook Lee, and Robert D. Nowak. 2022. “A Better Way to Decay:
Proximal Gradient Training Algorithms for Neural Nets.” In
OPT 2022: Optimization for Machine Learning (NeurIPS 2022
Workshop).
Zeno, Chen, Greg Ongie, Yaniv Blumenfeld, Nir Weinberger, and Daniel
Soudry. 2023. “How Do Minimum-Norm Shallow Denoisers Look in
Function Space?” Advances in Neural Information Processing
Systems 36: 57520–57.
Zhang, Haizhang, Yuesheng Xu, and Jun Zhang. 2009. “Reproducing
Kernel Banach Spaces for Machine Learning.”
Journal of Machine Learning Research 10: 2741–75.
Zhang, Kaiqi, and Yu-Xiang Wang. 2023. “Deep Learning Meets
Nonparametric Regression: Are Weight-Decayed DNNs Locally
Adaptive?” In International Conference on Learning
Representations.