TY - GEN
T1 - Automatic Operator Performance Tuning in a Machine Learning System on Edge
AU - Xu, Peng
AU - Chang, Xinyu
AU - Zhao, Jianxin
AU - Liu, Chi Harold
N1 - Publisher Copyright: © 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - With the current large-scale deployment of machine learning technologies, such as those on cloud servers and on edge and IoT hardware, machine learning systems have become widely prevalent. Practical requirements have driven improvements in their performance in both academia and industry. However, requirements vary greatly across applications, and directly using off-the-shelf systems might not be sufficient in many cases. In this work, we first propose a series of techniques to optimize the performance of the convolution operation, one of the most important operations in constructing deep learning networks. We also propose applying the automated empirical optimisation of software approach to improve the performance of operators in machine learning systems, most notably across various hardware platforms. Evaluation against existing libraries on different hardware devices shows the efficiency of our proposed method.
KW - automatic tuning
KW - convolution
KW - machine learning system
KW - optimization
UR - http://www.scopus.com/inward/record.url?scp=85152928988&partnerID=8YFLogxK
U2 - 10.1109/ICPADS56603.2022.00109
DO - 10.1109/ICPADS56603.2022.00109
M3 - Conference contribution
AN - SCOPUS:85152928988
T3 - Proceedings of the International Conference on Parallel and Distributed Systems - ICPADS
SP - 802
EP - 809
BT - Proceedings - 2022 IEEE 28th International Conference on Parallel and Distributed Systems, ICPADS 2022
PB - IEEE Computer Society
T2 - 28th IEEE International Conference on Parallel and Distributed Systems, ICPADS 2022
Y2 - 10 January 2023 through 12 January 2023
ER -