TY - JOUR
T1 - On the Flatness of Loss Surface for Two-layered ReLU Networks
AU - Cao, Jiezhang
AU - Wu, Qingyao
AU - Yan, Yuguang
AU - Wang, Li
AU - Tan, Mingkui
N1 - Publisher Copyright:
© 2017 J. Cao, Q. Wu, Y. Yan, L. Wang & M. Tan.
PY - 2017
Y1 - 2017
AB - Deep learning has achieved unprecedented practical success in many applications. Despite this empirical success, the theoretical understanding of deep neural networks remains a major open problem. In this paper, we explore properties of two-layered ReLU networks. For simplicity, we assume that the optimal model parameters (also called ground-truth parameters) are known. We then assume that the network receives Gaussian input and is trained by minimizing the expected squared loss between the prediction function of the network and a target function. To conduct the analysis, we propose a normal equation for critical points and study invariances under three kinds of transformations, namely scale, rotation, and perturbation transformations. We prove that these transformations keep the loss of a critical point invariant and can therefore give rise to flat regions. Consequently, how to escape from such flat regions is vital in training neural networks.
KW - Critical points
KW - Flatness
KW - Loss surface
KW - Two-layered ReLU network
UR - http://www.scopus.com/inward/record.url?scp=85070922352&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85070922352
SN - 1532-4435
VL - 77
SP - 545
EP - 560
JO - Journal of Machine Learning Research
JF - Journal of Machine Learning Research
T2 - 9th Asian Conference on Machine Learning, ACML 2017
Y2 - 15 November 2017 through 17 November 2017
ER -