Video Object Segmentation with Dynamic Query Modulation

Hantao Zhou; Runze Hu; Xiu Li

doi:10.1109/ICME57554.2024.10687816


Hantao Zhou, Runze Hu*, Xiu Li*

*Corresponding author for this work

School of Information and Electronics

Tsinghua University

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Spatial-temporal memory-based methods, which store intermediate frame segmentations as memory for long-range context modeling, have recently shown impressive results in semi-supervised video object segmentation (SVOS). However, these methods face two key limitations: 1) they rely on non-local pixel-level matching to read memory, which yields noisy retrieved features for segmentation; 2) they segment each object independently, without interaction. These shortcomings make memory-based methods struggle with similar-object and multi-object segmentation. To address these issues, we propose a query modulation method, termed QMVOS. This method summarizes object features into dynamic queries and then treats them as dynamic filters for mask prediction, thereby providing the model with high-level descriptions and object-level perception. Efficient and effective multi-object interactions are realized through inter-query attention. Extensive experiments demonstrate that our method can bring significant improvements to the memory-based SVOS method and achieve competitive performance on standard SVOS benchmarks. The code is available at https://github.com/zht8506/QMVOS.
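The mechanism summarized above (pool each object's features into a query, let the queries attend to one another, then apply each query as a dynamic filter over pixel features) can be illustrated with a minimal pure-Python sketch. This is not the QMVOS implementation. The pooling rule (masked average pooling), the single-head attention without learned projections, the 1x1 dot-product filter and the per-pixel softmax over objects are all assumptions chosen for brevity, and every function name is hypothetical. The authors' code is in the linked repository.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # one row per pixel (features) or per object (queries, masks)


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(xs: Vector) -> Vector:
    # Subtract the max for numerical stability.
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def summarize_queries(features: Matrix, masks: Matrix, eps: float = 1e-6) -> Matrix:
    """Hypothetical: masked average pooling of P x C pixel features into one query per object.

    masks is K x P, holding soft object masks in [0, 1] (e.g. from a previous frame).
    """
    channels = len(features[0])
    queries = []
    for mask in masks:
        weight = sum(mask) + eps  # eps avoids division by zero for an empty mask
        queries.append(
            [sum(m * f[c] for m, f in zip(mask, features)) / weight for c in range(channels)]
        )
    return queries


def inter_query_attention(queries: Matrix) -> Matrix:
    """Hypothetical: single-head scaled dot-product self-attention among object queries.

    No learned projections; a residual connection keeps each query's own content.
    """
    channels = len(queries[0])
    scale = 1.0 / math.sqrt(channels)
    updated = []
    for q in queries:
        weights = softmax([dot(q, k) * scale for k in queries])
        mixed = [sum(w * k[c] for w, k in zip(weights, queries)) for c in range(channels)]
        updated.append([a + b for a, b in zip(q, mixed)])
    return updated


def dynamic_filter_logits(queries: Matrix, features: Matrix) -> Matrix:
    """Hypothetical: each query acts as a 1x1 dynamic filter, logit[k][p] = <q_k, f_p>."""
    return [[dot(q, f) for f in features] for q in queries]


def pixel_probabilities(logits: Matrix) -> Matrix:
    """Assumed aggregation: softmax over objects at every pixel; returns P x K."""
    num_pixels = len(logits[0])
    return [softmax([row[p] for row in logits]) for p in range(num_pixels)]


if __name__ == "__main__":
    # Toy frame: 4 pixels with 2-channel features; object 0 covers pixels 0-1, object 1 covers 2-3.
    feats = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
    prev_masks = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
    qs = inter_query_attention(summarize_queries(feats, prev_masks))
    probs = pixel_probabilities(dynamic_filter_logits(qs, feats))
    labels = [max(range(len(p)), key=p.__getitem__) for p in probs]
    print(labels)  # -> [0, 0, 1, 1]
```

The point of the sketch is the contrast with pixel-level memory reading: each object is represented by a single vector, so objects can exchange information cheaply (a K x K attention rather than a pixel-to-pixel affinity), and the mask is produced by correlating that object-level descriptor with every pixel.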

Original language	English
Title of host publication	2024 IEEE International Conference on Multimedia and Expo, ICME 2024
Publisher	IEEE Computer Society
ISBN (Electronic)	9798350390155
DOIs	https://doi.org/10.1109/ICME57554.2024.10687816
Publication status	Published - 2024
Event	2024 IEEE International Conference on Multimedia and Expo, ICME 2024 - Niagara Falls, Canada. Duration: 15 Jul 2024 → 19 Jul 2024

Publication series

Name	Proceedings - IEEE International Conference on Multimedia and Expo
ISSN (Print)	1945-7871
ISSN (Electronic)	1945-788X

Conference

Conference	2024 IEEE International Conference on Multimedia and Expo, ICME 2024
Country/Territory	Canada
City	Niagara Falls
Period	15/07/24 → 19/07/24

Keywords

Memory bank
Object query
SVOS


Cite this

@inproceedings{11af949923c1446d8f3f26ff8a615807,

title = "Video Object Segmentation with Dynamic Query Modulation",

abstract = "Storing intermediate frame segmentations as memory for long-range context modeling, spatial-temporal memory-based methods have recently showcased impressive results in semi-supervised video object segmentation (SVOS). However, these methods face two key limitations: 1) relying on non-local pixel-level matching to read memory, resulting in noisy retrieved features for segmentation; 2) segmenting each object independently without interaction. These shortcomings make the memory-based methods struggle in similar object and multi-object segmentation. To address these issues, we propose a query modulation method, termed QMVOS. This method summarizes object features into dynamic queries and then treats them as dynamic filters for mask prediction, thereby providing high-level descriptions and object-level perception for the model. Efficient and effective multi-object interactions are realized through inter-query attention. Extensive experiments demonstrate that our method can bring significant improvements to the memory-based SVOS method and achieve competitive performance on standard SVOS benchmarks. The code is available at https://github.com/zht8506/QMVOS.",

keywords = "Memory bank, Object query, SVOS",

author = "Hantao Zhou and Runze Hu and Xiu Li",

note = "Publisher Copyright: {\textcopyright} 2024 IEEE.; 2024 IEEE International Conference on Multimedia and Expo, ICME 2024 ; Conference date: 15-07-2024 Through 19-07-2024",

year = "2024",

doi = "10.1109/ICME57554.2024.10687816",

language = "English",

series = "Proceedings - IEEE International Conference on Multimedia and Expo",

publisher = "IEEE Computer Society",

booktitle = "2024 IEEE International Conference on Multimedia and Expo, ICME 2024",

address = "United States",

}

Zhou, H, Hu, R & Li, X 2024, Video Object Segmentation with Dynamic Query Modulation. in 2024 IEEE International Conference on Multimedia and Expo, ICME 2024. Proceedings - IEEE International Conference on Multimedia and Expo, IEEE Computer Society, 2024 IEEE International Conference on Multimedia and Expo, ICME 2024, Niagara Falls, Canada, 15/07/24. https://doi.org/10.1109/ICME57554.2024.10687816

TY - GEN

T1 - Video Object Segmentation with Dynamic Query Modulation

AU - Zhou, Hantao

AU - Hu, Runze

AU - Li, Xiu

PY - 2024

Y1 - 2024

N2 - Storing intermediate frame segmentations as memory for long-range context modeling, spatial-temporal memory-based methods have recently showcased impressive results in semi-supervised video object segmentation (SVOS). However, these methods face two key limitations: 1) relying on non-local pixel-level matching to read memory, resulting in noisy retrieved features for segmentation; 2) segmenting each object independently without interaction. These shortcomings make the memory-based methods struggle in similar object and multi-object segmentation. To address these issues, we propose a query modulation method, termed QMVOS. This method summarizes object features into dynamic queries and then treats them as dynamic filters for mask prediction, thereby providing high-level descriptions and object-level perception for the model. Efficient and effective multi-object interactions are realized through inter-query attention. Extensive experiments demonstrate that our method can bring significant improvements to the memory-based SVOS method and achieve competitive performance on standard SVOS benchmarks. The code is available at https://github.com/zht8506/QMVOS.

AB - Storing intermediate frame segmentations as memory for long-range context modeling, spatial-temporal memory-based methods have recently showcased impressive results in semi-supervised video object segmentation (SVOS). However, these methods face two key limitations: 1) relying on non-local pixel-level matching to read memory, resulting in noisy retrieved features for segmentation; 2) segmenting each object independently without interaction. These shortcomings make the memory-based methods struggle in similar object and multi-object segmentation. To address these issues, we propose a query modulation method, termed QMVOS. This method summarizes object features into dynamic queries and then treats them as dynamic filters for mask prediction, thereby providing high-level descriptions and object-level perception for the model. Efficient and effective multi-object interactions are realized through inter-query attention. Extensive experiments demonstrate that our method can bring significant improvements to the memory-based SVOS method and achieve competitive performance on standard SVOS benchmarks. The code is available at https://github.com/zht8506/QMVOS.

KW - Memory bank

KW - Object query

KW - SVOS

UR - http://www.scopus.com/inward/record.url?scp=85206572881&partnerID=8YFLogxK

U2 - 10.1109/ICME57554.2024.10687816

DO - 10.1109/ICME57554.2024.10687816

M3 - Conference contribution

AN - SCOPUS:85206572881

T3 - Proceedings - IEEE International Conference on Multimedia and Expo

BT - 2024 IEEE International Conference on Multimedia and Expo, ICME 2024

PB - IEEE Computer Society

T2 - 2024 IEEE International Conference on Multimedia and Expo, ICME 2024

Y2 - 15 July 2024 through 19 July 2024

ER -
