http://dx.doi.org/10.4134/JKMS.j200406

MARGIN-BASED GENERALIZATION FOR CLASSIFICATIONS WITH INPUT NOISE  

Choe, Hi Jun (Department of Mathematics, Yonsei University)
Koh, Hayeong (Software Testing & Certification Laboratory, Telecommunication Technology Association)
Lee, Jimin (Center for Mathematical Analysis & Computation, Yonsei University)
Publication Information
Journal of the Korean Mathematical Society, v.59, no.2, 2022, pp. 217-233
Abstract
Although machine learning shows state-of-the-art performance in a variety of fields, a theoretical understanding of how it works is still lacking. Recently, theoretical approaches have been actively studied, and among the results are those concerning the margin and its distribution. In this paper, we focus in particular on the role of the margin under perturbations of inputs and parameters. We prove generalization bounds for two cases, a linear model for binary classification and neural networks for multi-class classification, when the inputs carry normally distributed random noise. For binary classification, the additional generalization term caused by the random noise is related to the margin and is exponentially inversely proportional to the noise level. For neural networks, the additional generalization term depends on (input dimension) × (norms of the input and the weights). These results are obtained within the PAC-Bayesian framework. By considering random noise and the margin together, this paper should contribute to a better understanding of model sensitivity and to the construction of robust generalization bounds.
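For orientation, here is a minimal sketch of the standard PAC-Bayesian margin bound that the abstract builds on (the general form going back to McAllester and to Langford and Shawe-Taylor); the paper's specific constants and its noise-induced contribution are only indicated by the placeholder $\Delta_{\mathrm{noise}}$, which is an assumption of this sketch, not the paper's exact statement:

\[
L_0(f) \;\le\; \widehat{L}_\gamma(f)
 \;+\; O\!\left(\sqrt{\frac{\mathrm{KL}(Q \,\|\, P) + \ln(m/\delta)}{m-1}}\right)
 \;+\; \Delta_{\mathrm{noise}},
\]

where $L_0(f)$ is the expected 0-1 loss of the classifier $f$, $\widehat{L}_\gamma(f)$ is the empirical margin loss at level $\gamma > 0$ over $m$ i.i.d. samples, $P$ is a prior and $Q$ a posterior distribution over classifiers, and $\delta$ is the confidence parameter. As a further hedged illustration of the binary linear case, if the input noise is $\xi \sim \mathcal{N}(0, \sigma^2 I)$ and $w$ is the weight vector, the probability that the noise overturns a prediction made with margin $\gamma$ obeys the standard Gaussian tail bound

\[
\Pr\big[\, w \cdot \xi \le -\gamma \,\big]
 \;=\; \Phi\!\left(-\frac{\gamma}{\sigma \|w\|}\right)
 \;\le\; \exp\!\left(-\frac{\gamma^2}{2\sigma^2 \|w\|^2}\right),
\]

which is consistent with (though not identical to) the abstract's claim that the extra term is governed by the margin and the noise level.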
Keywords
Generalization bound; PAC-Bayesian; margin loss function