MedMM: A Multimodal Fusion Framework for 3D Medical Image Classification with Multigranular Text Guidance

Shanbo Zhao, Meihui Zhang*, Xiaoqin Zhu, Junjie Li, Yunyun Duan, Zhizheng Zhuo, Yaou Liu, Chuyang Ye

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Deep learning approaches are widely used in medical image analysis and have shown impressive results on many analytical tasks. However, the textual information that accompanies medical images is often underutilized in existing methods, despite its great semantic value and its potential to provide multigranular guidance for medical image analysis. Meanwhile, many medical images, such as magnetic resonance (MR) images, are 3D volumes consisting of multiple slices, which contain more complex and redundant information and are therefore especially hard to represent. In this paper, we propose a multimodal fusion framework for 3D medical image classification, which uses the medical text paired with each 3D medical image to guide the generation and aggregation of image features. Results show that our method significantly outperforms uni-modal and multimodal baseline methods. Ablation studies validate the effectiveness of each component, and visualization results also reveal the strong ability of our model to capture fine-grained and coarse-grained information.
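
The abstract does not spell out how the text guides feature aggregation. As a rough illustration only, the sketch below shows one common way such guidance can work: attention pooling, where each slice feature is weighted by its similarity to a text embedding. The function name `text_guided_pool`, the scaled dot-product scoring, and the toy vectors are all assumptions made for this example and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def text_guided_pool(slice_feats, text_feat):
    """Pool per-slice features into one volume-level feature.

    Hypothetical helper: each slice is weighted by the scaled dot product
    between its feature vector and the text embedding, so slices that
    match the text contribute more to the pooled feature.
    """
    d = len(text_feat)
    scores = [dot(f, text_feat) / math.sqrt(d) for f in slice_feats]
    weights = softmax(scores)
    pooled = [sum(w * f[i] for w, f in zip(weights, slice_feats)) for i in range(d)]
    return pooled, weights


# Toy data (assumed): three slices with 2-dimensional features and one text embedding.
slice_feats = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
text_feat = [2.0, 0.0]
pooled, weights = text_guided_pool(slice_feats, text_feat)
print(weights)  # the first slice, most aligned with the text, gets the largest weight
print(pooled)
```

In this toy setup the weights sum to one and the slice most similar to the text dominates the pooled feature; a real system would use learned encoders and projections in place of the fixed vectors.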

Original language: English
Title of host publication: Proceedings - 2024 10th International Conference on Big Data Computing and Communications, BIGCOM 2024
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 42-49
Number of pages: 8
Edition: 2024
ISBN (Electronic): 9798331509538
DOIs: https://doi.org/10.1109/BIGCOM65357.2024.00015
Publication status: Published - 2024
Externally published: Yes
Event: 10th International Conference on Big Data Computing and Communications, BIGCOM 2024 - Dalian, China
Duration: 9 Aug 2024 to 11 Aug 2024

Conference

Conference: 10th International Conference on Big Data Computing and Communications, BIGCOM 2024
Country/Territory: China
City: Dalian
Period: 9/08/24 to 11/08/24

Keywords

  • 3D medical image classification
  • multi-modal feature interaction and fusion
  • vision-language modeling

Cite this

Zhao, S., Zhang, M., Zhu, X., Li, J., Duan, Y., Zhuo, Z., Liu, Y., & Ye, C. (2024). MedMM: A Multimodal Fusion Framework for 3D Medical Image Classification with Multigranular Text Guidance. In Proceedings - 2024 10th International Conference on Big Data Computing and Communications, BIGCOM 2024 (2024 ed., pp. 42-49). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/BIGCOM65357.2024.00015