GISE-TTT: A Framework for Global Information Segmentation and Enhancement

  • Fenglei Hao
  • Yuliang Yang
  • Zhengran Zhao
  • Ruiyuan Su
  • Yukun Qiao
  • Mengyu Zhu*

* Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

This paper addresses the challenge of capturing global temporal dependencies in long video sequences for Video Object Segmentation (VOS). Existing architectures often fail to model these dependencies effectively across extended temporal horizons. To overcome this limitation, we introduce GISE-TTT, a novel architecture that integrates Test-Time Training (TTT) layers into transformer-based frameworks through a co-designed hierarchical approach. The TTT layer systematically condenses historical temporal information into hidden states that encode globally coherent contextual representations. By leveraging multistage contextual aggregation through hierarchical concatenation, our framework progressively refines spatiotemporal dependencies across network layers. This design provides the first systematic empirical evidence that distributing global information across multiple network layers is critical for optimal dependency utilization in video segmentation tasks. Ablation studies demonstrate that incorporating TTT modules at high-level feature stages significantly enhances global modeling capabilities, thereby improving the network's ability to capture long-range temporal relationships. Extensive experiments on DAVIS 2017 show that GISE-TTT achieves a 3.2% improvement in segmentation accuracy over the baseline model, providing comprehensive evidence that global information should be strategically leveraged throughout the network architecture.
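
The abstract's statement that a TTT layer condenses history into a hidden state can be illustrated with a toy example. The sketch below is not the GISE-TTT implementation. It assumes the general TTT formulation in which the hidden state is the weight matrix of a small inner model, updated by one gradient step of a self-supervised loss per incoming feature vector. The function names (matvec, ttt_linear_scan), the learning rate lr, the plain reconstruction loss, and the use of raw feature vectors without learned projections are all simplifying assumptions made for illustration.

```python
# Toy TTT-style layer (illustrative sketch only, NOT the GISE-TTT code).
# Assumptions: linear inner model, identity projections, plain
# reconstruction loss, one gradient step per time step.

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def ttt_linear_scan(frames, lr=0.1):
    """Scan a sequence of feature vectors with a linear TTT-style layer.

    The hidden state is the weight matrix W. For each vector x_t the layer
    takes one gradient step on 0.5 * ||W x_t - x_t||^2, then emits
    z_t = W x_t computed with the updated state. W therefore accumulates
    information from every earlier time step.
    """
    d = len(frames[0])
    W = [[0.0] * d for _ in range(d)]  # hidden state starts empty
    outputs = []
    for x in frames:
        # Reconstruction error of the current state on the new input.
        err = [p - t for p, t in zip(matvec(W, x), x)]
        # Gradient of the loss w.r.t. W is the outer product err * x^T.
        for i in range(d):
            for j in range(d):
                W[i][j] -= lr * err[i] * x[j]
        outputs.append(matvec(W, x))
    return outputs, W


# The same feature repeated three times is reconstructed better each step.
outputs, W = ttt_linear_scan([[1.0, 0.0]] * 3, lr=0.5)
print([z[0] for z in outputs])  # [0.5, 0.75, 0.875]
```

In this toy setting the state W is the only memory carried between time steps, so its size stays fixed however long the sequence is. The paper's hierarchical concatenation of such states across network stages is not modeled here.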

Original language: English
Title of host publication: 2025 IEEE 8th International Conference on Computer and Communication Engineering Technology, CCET 2025
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 226-230
Number of pages: 5
Edition: 2025
ISBN (Electronic): 9798331558109
DOIs
Publication status: Published - 2025
Externally published: Yes
Event: 8th IEEE International Conference on Computer and Communication Engineering Technology, CCET 2025 - Beijing, China
Duration: 15 Aug 2025 to 17 Aug 2025

Conference

Conference: 8th IEEE International Conference on Computer and Communication Engineering Technology, CCET 2025
Country/Territory: China
City: Beijing
Period: 15/08/25 to 17/08/25

Keywords

  • Global Information
  • TTT
  • Video Object Segmentation
