On the Flatness of Loss Surface for Two-layered ReLU Networks

Jiezhang Cao, Qingyao Wu, Yuguang Yan, Li Wang, Mingkui Tan*

*Corresponding author for this work

Research output: Contribution to journal › Conference article › peer-review

3 Citations (Scopus)

Abstract

Deep learning has achieved unprecedented practical success in many applications. Despite this empirical success, however, the theoretical understanding of deep neural networks remains a major open problem. In this paper, we explore properties of two-layered ReLU networks. For simplicity, we assume that the optimal model parameters (also called the ground-truth parameters) are known. We further assume that the network receives Gaussian input and is trained by minimizing the expected squared loss between the prediction function of the network and a target function. To conduct the analysis, we propose a normal equation for critical points and study the invariances under three kinds of transformations, namely scale transformation, rotation transformation and perturbation transformation. We prove that these transformations keep the loss of a critical point invariant and can therefore induce flat regions. Consequently, how to escape from such flat regions is vital in training neural networks.
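As a concrete illustration of the scale-transformation invariance mentioned in the abstract, the following sketch relies only on the positive homogeneity of ReLU, ReLU(cz) = c ReLU(z) for c > 0. The specific parameterization f(x) = aᵀReLU(Wx), the teacher/student setup, and helper names such as predict and expected_sq_loss are assumptions made for illustration and are not necessarily the exact formulation used in the paper.

import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

rng = np.random.default_rng(0)
d, k, n = 5, 4, 10000               # input dim, hidden units, Monte Carlo samples
W = rng.normal(size=(k, d))         # first-layer (student) weights
a = rng.normal(size=k)              # second-layer (student) weights
W_star = rng.normal(size=(k, d))    # ground-truth (teacher) first-layer weights
a_star = rng.normal(size=k)         # ground-truth (teacher) second-layer weights

X = rng.normal(size=(n, d))         # Gaussian input, as assumed in the abstract

def predict(W, a, X):
    # Hypothetical two-layer ReLU network: f(x) = a^T ReLU(W x)
    return relu(X @ W.T) @ a

def expected_sq_loss(W, a):
    # Monte Carlo estimate of the expected squared loss E_x[(f(x) - f*(x))^2] / 2
    diff = predict(W, a, X) - predict(W_star, a_star, X)
    return 0.5 * np.mean(diff ** 2)

# Scale transformation: scale each hidden unit's input weights by c_j > 0 and its
# output weight by 1/c_j. Positive homogeneity of ReLU keeps the prediction, and
# hence the loss, unchanged along this direction in parameter space.
c = rng.uniform(0.5, 2.0, size=k)
loss_before = expected_sq_loss(W, a)
loss_after = expected_sq_loss(c[:, None] * W, a / c)
print(loss_before, loss_after)      # the two values coincide up to floating-point error

Running this, the two printed losses agree, showing how such reparameterizations trace out loss-invariant (flat) sets around any parameter setting, including critical points.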

Original language: English
Pages (from-to): 545-560
Number of pages: 16
Journal: Journal of Machine Learning Research
Volume: 77
Publication status: Published - 2017
Externally published: Yes
Event: 9th Asian Conference on Machine Learning, ACML 2017 - Seoul, Korea, Republic of
Duration: 15 Nov 2017 → 17 Nov 2017
