SiamAT: transformer-based target-aware siamese tracking network for target tracking in ultrasound image sequences

Research output: Contribution to journal › Article › peer-review

Abstract

Objective. Accurate target localization is essential for effective radiation therapy, yet respiratory motion introduces considerable uncertainty. Ultrasound-based motion tracking offers a non-invasive solution, but the presence of similar anatomical structures and significant changes in appearance severely hinder robust and accurate tracking. Robust tracking is essential to ensure accuracy and improve treatment outcomes.

Approach. We propose SiamAT, a tracking framework comprising a Siamese-like feature extraction network with a Swin-Transformer backbone, an attention-based target-aware module, and multi-task prediction heads. The target-aware module adaptively integrates template and search features to generate target-aware representations for precise localization.

Main results. The proposed SiamAT is evaluated on a public dataset, i.e. the MICCAI 2015 challenge on liver US tracking, and on our clinical dataset provided by the Chinese People's Liberation Army General Hospital. Experimental results demonstrate that the method achieves accurate and robust tracking (0.60±0.30 mm and 0.47±0.31 mm tracking errors on the two datasets) and outperforms existing methods. The proposed method runs at approximately 36 fps on a GPU.

Significance. The proposed SiamAT provides clinicians with an effective tool for analyzing 2D motion in US images in clinical settings, assisting in the planning of optimal treatment strategies while reducing diagnostic and therapeutic risks.
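The abstract does not give implementation details of SiamAT itself. As a rough illustration of the general Siamese-tracking principle it builds on (matching a fixed template against a larger search region and taking the response peak as the target location), here is a minimal NumPy sketch; the toy feature maps, function names, and the plain cross-correlation matcher are illustrative assumptions, not the paper's actual transformer-based architecture.

```python
import numpy as np

def cross_correlate(search, template):
    """Slide the template over the search feature map (valid mode)
    and return a response map of similarity scores."""
    th, tw = template.shape
    sh, sw = search.shape
    out = np.empty((sh - th + 1, sw - tw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(search[y:y + th, x:x + tw] * template)
    return out

def locate(search, template):
    """Return the (row, col) of the peak response, i.e. the
    top-left corner of the best template match."""
    resp = cross_correlate(search, template)
    return tuple(int(i) for i in np.unravel_index(np.argmax(resp), resp.shape))

# Toy example: plant the template at a known offset in an empty search map.
template = np.arange(16, dtype=float).reshape(4, 4)
search = np.zeros((16, 16))
search[5:9, 7:11] = template  # target placed at row 5, col 7
print(locate(search, template))  # → (5, 7)
```

In a learned tracker such as SiamAT, the raw pixel patches above would be replaced by deep features from a shared backbone, and the fixed correlation by learned attention-based fusion and prediction heads; the localization-by-peak idea is the common thread.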

Original language: English
Journal: Physics in Medicine and Biology
Volume: 70
Issue number: 23
DOIs
Publication status: Published - 27 Nov 2025
Externally published: Yes

Keywords

  • siamese tracking
  • target tracking
  • transformer
  • ultrasound sequence
