Abstract
This paper addresses the challenge of capturing global temporal dependencies in long video sequences for Video Object Segmentation (VOS). Existing architectures often fail to model these dependencies effectively across extended temporal horizons. To overcome this limitation, we introduce GISE-TTT, a novel architecture that integrates Test-Time Training (TTT) layers into transformer-based frameworks through a co-designed hierarchical approach. The TTT layer systematically condenses historical temporal information into hidden states that encode globally coherent contextual representations. By leveraging multistage contextual aggregation through hierarchical concatenation, our framework progressively refines spatiotemporal dependencies across network layers. This design yields the first systematic empirical evidence that distributing global information across multiple network layers is critical for optimal dependency utilization in video segmentation tasks. Ablation studies demonstrate that incorporating TTT modules at high-level feature stages significantly enhances global modeling capabilities, thereby improving the network's ability to capture long-range temporal relationships. Extensive experiments on DAVIS 2017 show that GISE-TTT achieves a 3.2% improvement in segmentation accuracy over the baseline model, providing comprehensive evidence that global information should be strategically leveraged throughout the network architecture.
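The idea of a TTT layer condensing history into a hidden state can be illustrated with a minimal NumPy sketch. It is not the GISE-TTT implementation: the function name `ttt_linear_scan`, the reconstruction loss, the learning rate, and the final concatenation step are all assumptions chosen for illustration, since the abstract does not specify them. The hidden state here is the weight matrix of a small linear model that takes one gradient step per frame on a self-supervised reconstruction loss.

```python
import numpy as np


def ttt_linear_scan(frames, lr=0.1):
    """Run a simplified TTT-style layer over per-frame feature vectors.

    frames: array-like of shape (T, d), one feature vector per frame.
    Returns (outputs, W); W is the final hidden state, a (d, d) matrix.
    Hypothetical sketch: loss and update rule are illustrative assumptions.
    """
    frames = np.asarray(frames, dtype=float)
    T, d = frames.shape
    W = np.zeros((d, d))              # hidden state: weights of a linear model
    outputs = np.empty_like(frames)
    for t in range(T):
        x = frames[t]
        err = W @ x - x               # residual of loss 0.5 * ||W x - x||^2
        grad = np.outer(err, x)       # gradient of that loss w.r.t. W
        W = W - lr * grad             # one inner-loop update per frame
        outputs[t] = W @ x            # read out with the updated hidden state
    return outputs, W


def concat_global_context(frames, outputs):
    """Assumed stand-in for concatenation-based aggregation: stack the
    original features with the TTT outputs along the channel axis."""
    return np.concatenate([np.asarray(frames, dtype=float), outputs], axis=-1)
```

In this sketch, `W` summarizes every frame seen so far, so each output depends on the whole history rather than on a fixed window. Calling `concat_global_context(frames, outputs)` doubles the channel dimension, which a later layer could consume as local features plus global context.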
| Original language | English |
|---|---|
| Title of host publication | 2025 IEEE 8th International Conference on Computer and Communication Engineering Technology, CCET 2025 |
| Publisher | Institute of Electrical and Electronics Engineers Inc. |
| Pages | 226-230 |
| Number of pages | 5 |
| Edition | 2025 |
| ISBN (electronic) | 9798331558109 |
| DOI | |
| Publication status | Published - 2025 |
| Externally published | Yes |
| Event | 8th IEEE International Conference on Computer and Communication Engineering Technology, CCET 2025 - Beijing, China. Duration: 15 Aug 2025 → 17 Aug 2025 |
Conference
| Conference | 8th IEEE International Conference on Computer and Communication Engineering Technology, CCET 2025 |
|---|---|
| Country/Territory | China |
| City | Beijing |
| Period | 15/08/25 → 17/08/25 |
Fingerprint
Dive into the research topics of 'GISE-TTT: A Framework for Global Information Segmentation and Enhancement'. Together they form a unique fingerprint.