TY - JOUR
T1 - On the Flatness of Loss Surface for Two-layered ReLU Networks
AU - Cao, Jiezhang
AU - Wu, Qingyao
AU - Yan, Yuguang
AU - Wang, Li
AU - Tan, Mingkui
N1 - Publisher Copyright:
© 2017 J. Cao, Q. Wu, Y. Yan, L. Wang & M. Tan.
PY - 2017
Y1 - 2017
AB - Deep learning has achieved unprecedented practical success in many applications. Despite this empirical success, the theoretical understanding of deep neural networks remains a major open problem. In this paper, we explore properties of two-layered ReLU networks. For simplicity, we assume that the optimal model parameters (also called ground-truth parameters) are known. We then assume that the network receives Gaussian input and is trained by minimizing the expected squared loss between the prediction function of the network and a target function. To conduct the analysis, we propose a normal equation for critical points and study invariances under three kinds of transformations, namely scale, rotation, and perturbation transformations. We prove that these transformations keep the loss of a critical point invariant and can therefore give rise to flat regions. Consequently, how to escape from such flat regions is vital in training neural networks.
KW - Critical points
KW - Flatness
KW - Loss surface
KW - Two-layered ReLU network
UR - http://www.scopus.com/inward/record.url?scp=85070922352&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85070922352
SN - 1532-4435
VL - 77
SP - 545
EP - 560
JO - Journal of Machine Learning Research
JF - Journal of Machine Learning Research
T2 - 9th Asian Conference on Machine Learning, ACML 2017
Y2 - 15 November 2017 through 17 November 2017
ER -