TY - JOUR
T1 - An Intelligible Risk Stratification Model Based on Pairwise and Size Constrained Kmeans
AU - Han, Longfei
AU - Luo, Senlin
AU - Wang, Huaiqing
AU - Pan, Limin
AU - Ma, Xincheng
AU - Zhang, Tiemei
N1 - Publisher Copyright: © 2013 IEEE.
PY - 2017/9
Y1 - 2017/9
N2 - Having a system to stratify individuals according to risk is key to clinical disease prevention. It allows individuals identified at different risk tiers to benefit from further investigation and intervention. However, the same risk score estimated for two different persons does not mean that they need the same further investigation, nor that their health conditions are similar. Moreover, users do not know a priori what most of the risk tiers are or how many tiers should be found in risk stratification. In this paper, the proposed pairwise and size constrained Kmeans (PSCKmeans) method simultaneously integrates limited supervised information and size constraints to screen the high-risk population based on similarity measurement, and obtains a feasible and balanced stratification solution that avoids clusters with few points. Results on the China Health and Nutrition Survey public dataset and a follow-up dataset show that the proposed PSCKmeans method can naturally grade the risk of diabetes into four tiers, and achieves 73.8%, 85.1%, and 0.95% sensitivity, specificity, and ratio of minimum to expected, respectively, on testing data. The proposed method compares favorably with eight previous semisupervised clustering methods; this demonstrates that semisupervised clustering that unifies multiple forms of constraints can guide a good partition that is more relevant for the domain and can find new categories through prior knowledge. Finally, this risk stratification model can provide a tool for risk stratification of clinical disease and can be used for further intervention for people with similar health conditions.
AB - Having a system to stratify individuals according to risk is key to clinical disease prevention. It allows individuals identified at different risk tiers to benefit from further investigation and intervention. However, the same risk score estimated for two different persons does not mean that they need the same further investigation, nor that their health conditions are similar. Moreover, users do not know a priori what most of the risk tiers are or how many tiers should be found in risk stratification. In this paper, the proposed pairwise and size constrained Kmeans (PSCKmeans) method simultaneously integrates limited supervised information and size constraints to screen the high-risk population based on similarity measurement, and obtains a feasible and balanced stratification solution that avoids clusters with few points. Results on the China Health and Nutrition Survey public dataset and a follow-up dataset show that the proposed PSCKmeans method can naturally grade the risk of diabetes into four tiers, and achieves 73.8%, 85.1%, and 0.95% sensitivity, specificity, and ratio of minimum to expected, respectively, on testing data. The proposed method compares favorably with eight previous semisupervised clustering methods; this demonstrates that semisupervised clustering that unifies multiple forms of constraints can guide a good partition that is more relevant for the domain and can find new categories through prior knowledge. Finally, this risk stratification model can provide a tool for risk stratification of clinical disease and can be used for further intervention for people with similar health conditions.
KW - Pairwise constraints
KW - risk assessment
KW - semisupervised clustering
KW - size constraints
KW - type 2 diabetes
UR - http://www.scopus.com/inward/record.url?scp=85029940918&partnerID=8YFLogxK
U2 - 10.1109/JBHI.2016.2633403
DO - 10.1109/JBHI.2016.2633403
M3 - Article
C2 - 27913364
AN - SCOPUS:85029940918
SN - 2168-2194
VL - 21
SP - 1288
EP - 1296
JO - IEEE Journal of Biomedical and Health Informatics
JF - IEEE Journal of Biomedical and Health Informatics
IS - 5
M1 - 7762039
ER -