TY - GEN
T1 - Study on Unbalanced Binary Classification with Unknown Misclassification Costs
AU - Gao, J.
AU - Gong, L.
AU - Wang, J. Y.
AU - Mo, Z. C.
N1 - Publisher Copyright: © 2018 IEEE.
PY - 2018/7/2
Y1 - 2018/7/2
N2 - With the rapid development of big data and machine learning technologies, many fields have begun to adopt related algorithms and methods. Classification algorithms are widely used in financial risk identification, fault diagnosis, medical diagnosis, and similar applications. However, the datasets in these cases are often unbalanced, and standard methods fail to classify instances correctly. Many methods, such as over-sampling, under-sampling, and ensemble methods, have been proposed to improve classifier performance, but which one to choose for a given dataset remains an open problem. This paper therefore aims to reach an experimental conclusion on which kind of method generally performs best on unbalanced classification problems. Specifically, we evaluate the performance of 13 kinds of methods for unbalanced classification on several unbalanced datasets with different numbers of instances and different ratios of positive instances, and draw a conclusion from the results.
AB - With the rapid development of big data and machine learning technologies, many fields have begun to adopt related algorithms and methods. Classification algorithms are widely used in financial risk identification, fault diagnosis, medical diagnosis, and similar applications. However, the datasets in these cases are often unbalanced, and standard methods fail to classify instances correctly. Many methods, such as over-sampling, under-sampling, and ensemble methods, have been proposed to improve classifier performance, but which one to choose for a given dataset remains an open problem. This paper therefore aims to reach an experimental conclusion on which kind of method generally performs best on unbalanced classification problems. Specifically, we evaluate the performance of 13 kinds of methods for unbalanced classification on several unbalanced datasets with different numbers of instances and different ratios of positive instances, and draw a conclusion from the results.
KW - Binary Classification
KW - Unbalanced Data
UR - http://www.scopus.com/inward/record.url?scp=85061837986&partnerID=8YFLogxK
U2 - 10.1109/IEEM.2018.8607671
DO - 10.1109/IEEM.2018.8607671
M3 - Conference contribution
AN - SCOPUS:85061837986
T3 - IEEE International Conference on Industrial Engineering and Engineering Management
SP - 1538
EP - 1542
BT - 2018 IEEE International Conference on Industrial Engineering and Engineering Management, IEEM 2018
PB - IEEE Computer Society
T2 - 2018 IEEE International Conference on Industrial Engineering and Engineering Management, IEEM 2018
Y2 - 16 December 2018 through 19 December 2018
ER -