TY - JOUR
T1 - Accelerating ELM training over data streams
AU - Ji, Hangxu
AU - Wu, Gang
AU - Wang, Guoren
N1 - Publisher Copyright:
© 2020, Springer-Verlag GmbH Germany, part of Springer Nature.
PY - 2021/1
Y1 - 2021/1
N2 - In the field of machine learning, offline training and online training are equally important because they coexist in many real applications. The extreme learning machine (ELM) offers fast learning speed and high accuracy for offline training, and the online sequential ELM (OS-ELM) is a variant of ELM that supports online training. With the explosive growth of data volume, running these algorithms on distributed computing platforms is an unstoppable trend, but there is currently no efficient distributed framework that supports both ELM and OS-ELM. Apache Flink is an open-source stream-based distributed platform for both offline and online data processing, with good scalability, high throughput, and fault tolerance, so it can be used to accelerate both ELM and OS-ELM. In this paper, we first study the characteristics of ELM, OS-ELM, and distributed computing platforms, and then propose an efficient stream-based distributed framework for both ELM and OS-ELM, named ELM-SDF, which is implemented on Flink. We then evaluate the algorithms in this framework with synthetic data on a distributed cluster. In summary, the advantages of the proposed framework are highlighted as follows. (1) FLELM trains consistently faster than ELM on Hadoop and Spark, and it also scales better. (2) FLOS-ELM achieves better response time and throughput than OS-ELM on Hadoop and Spark when incremental training samples arrive. (3) The response time and throughput of FLOS-ELM are further improved in native-stream processing mode when incremental data samples arrive continuously.
AB - In the field of machine learning, offline training and online training are equally important because they coexist in many real applications. The extreme learning machine (ELM) offers fast learning speed and high accuracy for offline training, and the online sequential ELM (OS-ELM) is a variant of ELM that supports online training. With the explosive growth of data volume, running these algorithms on distributed computing platforms is an unstoppable trend, but there is currently no efficient distributed framework that supports both ELM and OS-ELM. Apache Flink is an open-source stream-based distributed platform for both offline and online data processing, with good scalability, high throughput, and fault tolerance, so it can be used to accelerate both ELM and OS-ELM. In this paper, we first study the characteristics of ELM, OS-ELM, and distributed computing platforms, and then propose an efficient stream-based distributed framework for both ELM and OS-ELM, named ELM-SDF, which is implemented on Flink. We then evaluate the algorithms in this framework with synthetic data on a distributed cluster. In summary, the advantages of the proposed framework are highlighted as follows. (1) FLELM trains consistently faster than ELM on Hadoop and Spark, and it also scales better. (2) FLOS-ELM achieves better response time and throughput than OS-ELM on Hadoop and Spark when incremental training samples arrive. (3) The response time and throughput of FLOS-ELM are further improved in native-stream processing mode when incremental data samples arrive continuously.
KW - Extreme learning machine
KW - Flink
KW - Offline training
KW - Online training
UR - http://www.scopus.com/inward/record.url?scp=85089784256&partnerID=8YFLogxK
U2 - 10.1007/s13042-020-01158-8
DO - 10.1007/s13042-020-01158-8
M3 - Article
AN - SCOPUS:85089784256
SN - 1868-8071
VL - 12
SP - 87
EP - 102
JO - International Journal of Machine Learning and Cybernetics
JF - International Journal of Machine Learning and Cybernetics
IS - 1
ER -