Learning two-layer ReLU networks is nearly as easy as learning linear classifiers on separable data

Qiuling Yang; Alireza Sadeghi; Gang Wang; Jian Sun

doi:10.1109/TSP.2021.3094911

Qiuling Yang, Alireza Sadeghi, Gang Wang^*, Jian Sun

^*Corresponding author for this work

School of Automation

Research output: Contribution to journal › Article › Peer-reviewed

13 citations (Scopus)

Abstract

Neural networks with non-linear rectified linear unit (ReLU) activation functions have demonstrated remarkable performance in many fields. It has been observed that a sufficiently wide and/or deep ReLU network can accurately fit the training data, with a small generalization error on the testing data. Nevertheless, existing analytical results on provably training ReLU networks are mostly limited to over-parameterized cases, or they require assumptions on the data distribution. In this paper, training a two-layer ReLU network for binary classification of linearly separable data is revisited. Adopting the hinge loss as classification criterion yields a non-convex objective function with infinite local minima and saddle points. Instead, a modified loss is proposed which enables (stochastic) gradient descent to attain a globally optimal solution. Enticingly, the solution found is globally optimal for the hinge loss too. In addition, an upper bound on the number of iterations required to find a global minimum is derived. To ensure generalization performance, a convex max-margin formulation for two-layer ReLU network classifiers is discussed. Connections between the sought max-margin ReLU network and the max-margin support vector machine are drawn. Finally, an algorithm-dependent theoretical quantification of the generalization performance is developed using classical compression bounds. Numerical tests using synthetic and real data validate the analytical results.
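As a rough illustration of the setting the abstract describes, the sketch below evaluates the plain hinge loss of a two-layer ReLU network on a tiny linearly separable data set. It is not the modified loss proposed in the paper, which the abstract does not specify. The function names, the toy data, and the choice to fix the second-layer weights `v` at +1 and -1 are all assumptions made for this example.

```python
def relu(z):
    """Rectified linear unit."""
    return z if z > 0.0 else 0.0


def forward(x, W, v):
    """Two-layer ReLU network: f(x) = sum_j v[j] * relu(<W[j], x>)."""
    return sum(
        vj * relu(sum(wi * xi for wi, xi in zip(wj, x)))
        for wj, vj in zip(W, v)
    )


def hinge_loss(X, y, W, v):
    """Average hinge loss max(0, 1 - y * f(x)) over the data set."""
    return sum(
        max(0.0, 1.0 - yi * forward(xi, W, v)) for xi, yi in zip(X, y)
    ) / len(X)


# Hypothetical toy data: two points separated by the first coordinate.
X = [(1.0, 0.0), (-1.0, 0.0)]
y = [1, -1]

# Assumed fixed second-layer weights, one positive and one negative unit.
v = [1.0, -1.0]

# All-zero hidden weights: every unit outputs 0, so each sample incurs loss 1.
W_zero = [(0.0, 0.0), (0.0, 0.0)]

# Hand-picked hidden weights that classify both points with margin 2.
W_good = [(2.0, 0.0), (-2.0, 0.0)]

print(hinge_loss(X, y, W_zero, v))  # 1.0
print(hinge_loss(X, y, W_good, v))  # 0.0
```

The loss depends on the hidden weights only through the ReLU outputs, which is what makes it non-convex in `W`. At `W_zero` every unit is inactive even though a zero-loss solution such as `W_good` exists.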

Original language	English
Article number	9477126
Pages (from-to)	4416-4427
Number of pages	12
Journal	IEEE Transactions on Signal Processing
Volume	69
DOI	https://doi.org/10.1109/TSP.2021.3094911
Publication status	Published - 2021


Cite this

@article{2f3f641b95924d25ae3b061dbf8d6ee8,

title = "Learning two-layer ReLU networks is nearly as easy as learning linear classifiers on separable data",

abstract = "Neural networks with non-linear rectified linear unit (ReLU) activation functions have demonstrated remarkable performance in many fields. It has been observed that a sufficiently wide and/or deep ReLU network can accurately fit the training data, with a small generalization error on the testing data. Nevertheless, existing analytical results on provably training ReLU networks are mostly limited to over-parameterized cases, or they require assumptions on the data distribution. In this paper, training a two-layer ReLU network for binary classification of linearly separable data is revisited. Adopting the hinge loss as classification criterion yields a non-convex objective function with infinite local minima and saddle points. Instead, a modified loss is proposed which enables (stochastic) gradient descent to attain a globally optimal solution. Enticingly, the solution found is globally optimal for the hinge loss too. In addition, an upper bound on the number of iterations required to find a global minimum is derived. To ensure generalization performance, a convex max-margin formulation for two-layer ReLU network classifiers is discussed. Connections between the sought max-margin ReLU network and the max-margin support vector machine are drawn. Finally, an algorithm-dependent theoretical quantification of the generalization performance is developed using classical compression bounds. Numerical tests using synthetic and real data validate the analytical results.",

keywords = "Convex loss, Finite iterations, Generalization, Global optimality, Max-margin, ReLU network",

author = "Qiuling Yang and Alireza Sadeghi and Gang Wang and Jian Sun",

note = "Publisher Copyright: {\textcopyright} 1991-2012 IEEE.",

year = "2021",

doi = "10.1109/TSP.2021.3094911",

language = "English",

volume = "69",

pages = "4416--4427",

journal = "IEEE Transactions on Signal Processing",

issn = "1053-587X",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - JOUR

T1 - Learning two-layer ReLU networks is nearly as easy as learning linear classifiers on separable data

AU - Yang, Qiuling

AU - Sadeghi, Alireza

AU - Wang, Gang

AU - Sun, Jian

PY - 2021

Y1 - 2021

N2 - Neural networks with non-linear rectified linear unit (ReLU) activation functions have demonstrated remarkable performance in many fields. It has been observed that a sufficiently wide and/or deep ReLU network can accurately fit the training data, with a small generalization error on the testing data. Nevertheless, existing analytical results on provably training ReLU networks are mostly limited to over-parameterized cases, or they require assumptions on the data distribution. In this paper, training a two-layer ReLU network for binary classification of linearly separable data is revisited. Adopting the hinge loss as classification criterion yields a non-convex objective function with infinite local minima and saddle points. Instead, a modified loss is proposed which enables (stochastic) gradient descent to attain a globally optimal solution. Enticingly, the solution found is globally optimal for the hinge loss too. In addition, an upper bound on the number of iterations required to find a global minimum is derived. To ensure generalization performance, a convex max-margin formulation for two-layer ReLU network classifiers is discussed. Connections between the sought max-margin ReLU network and the max-margin support vector machine are drawn. Finally, an algorithm-dependent theoretical quantification of the generalization performance is developed using classical compression bounds. Numerical tests using synthetic and real data validate the analytical results.

AB - Neural networks with non-linear rectified linear unit (ReLU) activation functions have demonstrated remarkable performance in many fields. It has been observed that a sufficiently wide and/or deep ReLU network can accurately fit the training data, with a small generalization error on the testing data. Nevertheless, existing analytical results on provably training ReLU networks are mostly limited to over-parameterized cases, or they require assumptions on the data distribution. In this paper, training a two-layer ReLU network for binary classification of linearly separable data is revisited. Adopting the hinge loss as classification criterion yields a non-convex objective function with infinite local minima and saddle points. Instead, a modified loss is proposed which enables (stochastic) gradient descent to attain a globally optimal solution. Enticingly, the solution found is globally optimal for the hinge loss too. In addition, an upper bound on the number of iterations required to find a global minimum is derived. To ensure generalization performance, a convex max-margin formulation for two-layer ReLU network classifiers is discussed. Connections between the sought max-margin ReLU network and the max-margin support vector machine are drawn. Finally, an algorithm-dependent theoretical quantification of the generalization performance is developed using classical compression bounds. Numerical tests using synthetic and real data validate the analytical results.

KW - Convex loss

KW - Finite iterations

KW - Generalization

KW - Global optimality

KW - Max-margin

KW - ReLU network

UR - http://www.scopus.com/inward/record.url?scp=85113369853&partnerID=8YFLogxK

U2 - 10.1109/TSP.2021.3094911

DO - 10.1109/TSP.2021.3094911

M3 - Article

AN - SCOPUS:85113369853

SN - 1053-587X

VL - 69

SP - 4416

EP - 4427

JO - IEEE Transactions on Signal Processing

JF - IEEE Transactions on Signal Processing

M1 - 9477126

ER -
