Accelerating ELM training over data streams

Hangxu Ji, Gang Wu*, Guoren Wang

*此作品的通讯作者

科研成果: 期刊稿件文章同行评审

1 引用 (Scopus)

摘要

In the field of machine learning, offline training and online training occupy the same important position because they coexist in many real applications. The extreme learning machine (ELM) has the characteristics of fast learning speed and high accuracy for offline training, and online sequential ELM (OS-ELM) is a variant of ELM that supports online training. With the explosive growth of data volume, running these algorithms on distributed computing platforms is an unstoppable trend, but there is currently no efficient distributed framework to support both ELM and OS-ELM. Apache Flink is an open-source stream-based distributed platform for both offline processing and online data processing with good scalability, high throughput, and fault-tolerant ability, so it can be used to accelerate both ELM and OS-ELM. In this paper, we first research the characteristics of ELM, OS-ELM and distributed computing platforms, then propose an efficient stream-based distributed framework for both ELM and OS-ELM, named ELM-SDF, which is implemented on Flink. We then evaluate the algorithms in this framework with synthetic data on distributed cluster. In summary, the advantages of the proposed framework are highlighted as follows. (1) The training speed of FLELM is always faster than ELM on Hadoop and Spark, and its scalability behaves better as well. (2) Response time and throughput of FLOS-ELM achieve better performance than OS-ELM on Hadoop and Spark when the incremental training samples arrive. (3) The response time and throughput of FLOS-ELM behave better in native-stream processing mode when the incremental data samples are continuously arriving.

源语言英语
页(从-至)87-102
页数16
期刊International Journal of Machine Learning and Cybernetics
12
1
DOI
出版状态已出版 - 1月 2021

指纹

探究 'Accelerating ELM training over data streams' 的科研主题。它们共同构成独一无二的指纹。

引用此