TY - JOUR
T1 - Medical Knowledge-Driven Contrastive Learning for Similar Patient Retrieval
AU - Meng, Fanqing
AU - Feng, Chong
AU - Shi, Ge
AU - Liu, Xia
AU - Wang, Bo
AU - Zhang, Kaiyuan
AU - Zhuang, Yan
N1 - Publisher Copyright: © 2013 IEEE.
PY - 2026
Y1 - 2026
N2 - Similar patient retrieval is a fundamental task in medical informatics, aiming to identify patients with similar clinical characteristics to assist in diagnosis and treatment plan recommendation. While traditional methods relying on lexical features or medical ontologies often fail to capture implicit semantic relationships, recent advancements in dense retrieval methods powered by deep learning have shown promise yet face challenges in adapting to specific tasks such as similar patient retrieval. To address these limitations, we propose a medical knowledge-driven contrastive learning approach to enhance the representation capacity of general-purpose embedding models for medical text. Specifically, our approach introduces a novel negative sampling strategy leveraging International Classification of Diseases (ICD) codes to identify hard negatives. However, due to data imbalance issues, this method struggles to adequately mine negative examples. To overcome this limitation, we develop an external knowledge-based negative sampling method that incorporates both statistical and ambiguous knowledge, thereby enhancing the model’s ability to differentiate between fine-grained medical conditions and complex clinical scenarios. We then integrate these methods into a contrastive learning framework to train more robust patient representations. Extensive experiments on real-world medical datasets show that our proposed method achieves significant improvements over existing state-of-the-art baseline models.
KW - Similar patient retrieval
KW - contrastive learning
KW - medical knowledge
KW - negative sampling
UR - https://www.scopus.com/pages/publications/105038648698
U2 - 10.1109/JBHI.2026.3690515
DO - 10.1109/JBHI.2026.3690515
M3 - Article
C2 - 42085409
AN - SCOPUS:105038648698
SN - 2168-2194
JO - IEEE Journal of Biomedical and Health Informatics
JF - IEEE Journal of Biomedical and Health Informatics
ER -