MI2A: A Multimodal Information Interaction Architecture for Automated Diagnosis of Lung Nodules Using PET/CT Imaging

  • Kai Li
  • Tongtong Li
  • Lei Zhang
  • Junfeng Mao
  • Xuerong Shi
  • Zhijun Yao*
  • Lei Fang*
  • Bin Hu*
  • *Corresponding author for this work
  • Lanzhou University
  • Hexi University
  • The 940th Hospital of Joint Logistics Support Force of Chinese PLA
  • Gansu University of Chinese Medicine
  • Taikang Tongji (Wuhan) Hospital
  • Beijing Institute of Technology
  • Chinese Academy of Sciences

Research output: Contribution to journal › Article › peer-review

Abstract

Lung cancer is one of the most common malignancies globally, and malignant nodules are an early indicator of the disease, so accurate early diagnosis of lung nodules is imperative. Positron emission tomography–computed tomography (PET/CT) is a noninvasive imaging technique that provides both anatomical and metabolic information and plays a crucial role in cancer diagnosis. Existing deep learning-based multimodal fusion strategies often rely on simple concatenation of features from the two modalities, overlooking the intricate interactions between them. In this study, we propose a multimodal information interaction architecture (MI2A) for the automated diagnosis of lung nodules using PET/CT imaging. First, the lung parenchymal regions are cropped as regions of interest (ROIs) using a pretrained U-Net model. Second, higher-order multimodal features from PET/CT scans are extracted using a custom-designed PET–CT imaging encoder (PCIE) module and integrated using a cross-attention multimodal encoder (CAME) module. Predictions are then generated using multipath pooling layers and a multilayer perceptron (MLP) layer. In addition, an alignment loss function is designed to minimize the discrepancy between modality features during training. Finally, the proposed model was evaluated on an actual clinical dataset, achieving accuracy (Acc), precision (Prec), recall (Rec), specificity (Spec), and F1-score (F1) of 0.9179, 0.8972, 0.8937, 0.9335, and 0.8954, respectively. The findings also revealed that certain benign lesions, particularly those related to inflammatory or infectious conditions, displayed high metabolic activity, which was the main factor limiting the model's performance. This insight provides a promising direction for future research.
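
The five reported metrics can all be derived from a binary confusion matrix. The sketch below is illustrative only: it assumes the standard binary definitions with malignant as the positive class, which the abstract does not state explicitly, and the `Confusion` class, `binary_metrics`, `f1_from`, and the example counts are hypothetical names and values, not taken from the paper. As a sanity check, the harmonic mean of the reported precision (0.8972) and recall (0.8937) is consistent with the reported F1-score of 0.8954.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Binary confusion counts (assumption: malignant = positive class)."""
    tp: int  # malignant nodules predicted malignant
    fp: int  # benign nodules predicted malignant
    tn: int  # benign nodules predicted benign
    fn: int  # malignant nodules predicted benign


def f1_from(prec: float, rec: float) -> float:
    """F1-score as the harmonic mean of precision and recall."""
    return 2 * prec * rec / (prec + rec)


def binary_metrics(c: Confusion) -> dict:
    """Standard binary metrics; assumes every denominator is nonzero."""
    prec = c.tp / (c.tp + c.fp)
    rec = c.tp / (c.tp + c.fn)
    return {
        "acc": (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn),
        "prec": prec,
        "rec": rec,
        "spec": c.tn / (c.tn + c.fp),
        "f1": f1_from(prec, rec),
    }


# Hypothetical counts, for illustration only (not from the study).
example = binary_metrics(Confusion(tp=80, fp=10, tn=100, fn=10))

# Consistency check on the values reported in the abstract.
reported_f1 = round(f1_from(0.8972, 0.8937), 4)
print(reported_f1)  # 0.8954
```

If the authors averaged metrics per class or per cross-validation fold, the individual values would not need to satisfy this identity exactly, so the agreement here is suggestive rather than conclusive.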

Original language: English
Pages (from-to): 28547-28559
Number of pages: 13
Journal: IEEE Sensors Journal
Volume: 25
Issue number: 15
Publication status: Published - 2025
Externally published: Yes

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 3 - Good Health and Well-being

Keywords

  • Deep learning
  • lung nodules classification
  • multimodal fusion
  • positron emission tomography–computed tomography (PET/CT)
