TY - JOUR
T1 - Parallel online sequential extreme learning machine based on MapReduce
AU - Wang, Botao
AU - Huang, Shan
AU - Qiu, Junhao
AU - Liu, Yu
AU - Wang, Guoren
N1 - Publisher Copyright: © 2014 Elsevier B.V.
PY - 2015/2/3
Y1 - 2015/2/3
N2 - In this age of big data, analyzing big data is a very challenging problem. MapReduce is a simple, scalable and fault-tolerant data processing framework that enables us to process a massive volume of data. Many machine learning algorithms have been designed based on MapReduce, but there are only a few works related to parallel extreme learning machine (ELM), which is a fast and accurate learning algorithm. Online sequential extreme learning machine (OS-ELM) is one of the improved ELM algorithms to support online sequential learning efficiently. In this paper, we first analyze the dependency relationships of matrix calculations of OS-ELM, then propose a parallel online sequential extreme learning machine (POS-ELM) based on MapReduce. POS-ELM is evaluated with real and synthetic data with the maximum number of training data 1280K and the maximum number of attributes 128. The experimental results show that the training accuracy and testing accuracy of POS-ELM are at the same level as those of OS-ELM and ELM, and it has good scalability with regard to the number of training data and the number of attributes. Compared to original ELM and OS-ELM, where the capability to process large scale data is bounded by the limitation of resources within a single processing unit, POS-ELM can deal with much larger scale data. The larger the number of training data is, the higher the speedup of POS-ELM is. It can be concluded that POS-ELM has more powerful capability than both ELM and OS-ELM for large scale learning.
AB - In this age of big data, analyzing big data is a very challenging problem. MapReduce is a simple, scalable and fault-tolerant data processing framework that enables us to process a massive volume of data. Many machine learning algorithms have been designed based on MapReduce, but there are only a few works related to parallel extreme learning machine (ELM), which is a fast and accurate learning algorithm. Online sequential extreme learning machine (OS-ELM) is one of the improved ELM algorithms to support online sequential learning efficiently. In this paper, we first analyze the dependency relationships of matrix calculations of OS-ELM, then propose a parallel online sequential extreme learning machine (POS-ELM) based on MapReduce. POS-ELM is evaluated with real and synthetic data with the maximum number of training data 1280K and the maximum number of attributes 128. The experimental results show that the training accuracy and testing accuracy of POS-ELM are at the same level as those of OS-ELM and ELM, and it has good scalability with regard to the number of training data and the number of attributes. Compared to original ELM and OS-ELM, where the capability to process large scale data is bounded by the limitation of resources within a single processing unit, POS-ELM can deal with much larger scale data. The larger the number of training data is, the higher the speedup of POS-ELM is. It can be concluded that POS-ELM has more powerful capability than both ELM and OS-ELM for large scale learning.
KW - Extreme learning machine
KW - Large scale learning
KW - MapReduce
KW - Online sequential learning
KW - Parallel classification
UR - http://www.scopus.com/inward/record.url?scp=84922020238&partnerID=8YFLogxK
U2 - 10.1016/j.neucom.2014.03.076
DO - 10.1016/j.neucom.2014.03.076
M3 - Article
AN - SCOPUS:84922020238
SN - 0925-2312
VL - 149
SP - 224
EP - 232
JO - Neurocomputing
JF - Neurocomputing
IS - Part A
ER -