TY - JOUR
T1 - An experimental evaluation of extreme learning machines on several hardware devices
AU - Li, Liang
AU - Wang, Guoren
AU - Wu, Gang
AU - Zhang, Qi
N1 - Publisher Copyright:
© 2019, Springer-Verlag London Ltd., part of Springer Nature.
PY - 2020/9/1
Y1 - 2020/9/1
N2 - As an important learning algorithm, the extreme learning machine (ELM) is known for its excellent learning speed. As ELM's applications in classification and regression expand, the demand for its real-time performance is increasing. Although hardware acceleration is an obvious solution, how to select the appropriate acceleration hardware for ELM-based applications deserves further discussion. For this purpose, we designed and evaluated optimized ELM algorithms on three kinds of state-of-the-art acceleration hardware, i.e., the multi-core CPU, the Graphics Processing Unit (GPU), and the Field-Programmable Gate Array (FPGA), all of which are well suited to matrix multiplication optimization. The experimental results show that these optimized algorithms achieve speedup ratios of 10–800 on the acceleration hardware. We therefore suggest (1) using the GPU to accelerate ELM algorithms for large datasets and (2) using the FPGA for small datasets because of its lower power consumption, especially in embedded applications. We have also released our source code.
KW - Extreme learning machine
KW - FPGA
KW - GPU
KW - Hardware
KW - Multi-core
UR - http://www.scopus.com/inward/record.url?scp=85073833462&partnerID=8YFLogxK
DO - 10.1007/s00521-019-04481-6
M3 - Article
AN - SCOPUS:85073833462
SN - 0941-0643
VL - 32
SP - 14385
EP - 14397
JO - Neural Computing and Applications
JF - Neural Computing and Applications
IS - 18
ER -