GTMS: A Gradient-Driven Tree-Guided Mask-Free Referring Image Segmentation Method

  • Haoxin Lyu
  • Tianxiong Zhong
  • Sanyuan Zhao*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

Referring image segmentation (RIS) aims to segment the object of interest described by a given natural language expression. Because fully supervised methods require expensive pixel-wise labeling, mask-free solutions supervised by low-cost labels are highly desirable. However, existing mask-free RIS methods suffer from complicated architectures or make insufficient use of structural and semantic information, resulting in unsatisfactory performance. In this paper, we propose a gradient-driven, tree-guided mask-free RIS method, GTMS, which utilizes both structural and semantic information while using only a bounding box as the supervision signal. Specifically, we first represent the structural information of the input image as a tree. Meanwhile, we use gradient information to explore the regions semantically related to the text feature. Finally, the structural and semantic information are used to refine the output of the segmentation model and generate pseudo labels, which in turn are used to optimize the model. To verify the effectiveness of our method, we conduct experiments on three benchmarks, i.e., RefCOCO/+/g. Our method achieves SOTA performance among mask-free RIS methods and even outperforms many fully supervised RIS methods. Specifically, GTMS attains 66.54%, 69.98% and 63.41% IoU on RefCOCO Val-Test, TestA and TestB. Our code will be available at https://github.com/eternalld/GTMS.
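
The abstract reports results as IoU and names a bounding box as the only supervision signal. As a rough illustration of those two ideas only, the sketch below computes the IoU of two binary masks and zeroes out predicted pixels that fall outside a box. It is not the GTMS refinement procedure: the helper names `iou` and `clip_to_box`, the nested-list mask layout, the exclusive box bounds, and the convention of returning 0.0 for an empty union are all assumptions made for this example, and the abstract does not say which IoU variant the benchmark numbers use.

```python
from typing import List, Tuple

Mask = List[List[int]]  # binary mask stored as rows of 0/1 (assumed layout)


def clip_to_box(mask: Mask, box: Tuple[int, int, int, int]) -> Mask:
    """Zero every pixel outside box = (x0, y0, x1, y1); upper bounds are exclusive."""
    x0, y0, x1, y1 = box
    return [
        [v if (x0 <= x < x1 and y0 <= y < y1) else 0 for x, v in enumerate(row)]
        for y, row in enumerate(mask)
    ]


def iou(pred: Mask, target: Mask) -> float:
    """Intersection over union of two equally sized binary masks."""
    inter = sum(p & t for pr, tr in zip(pred, target) for p, t in zip(pr, tr))
    union = sum(p | t for pr, tr in zip(pred, target) for p, t in zip(pr, tr))
    # Assumed convention: two empty masks score 0.0 rather than raising.
    return inter / union if union else 0.0


# Toy 3x3 example: the prediction has one stray pixel outside the box.
pred = [[1, 1, 0], [1, 1, 0], [0, 0, 1]]
target = [[1, 1, 0], [1, 1, 0], [0, 0, 0]]
box = (0, 0, 2, 2)

raw_score = iou(pred, target)                         # 4 shared pixels / 5 in the union
clipped_score = iou(clip_to_box(pred, box), target)   # stray pixel removed by the box
```

In this toy case the box removes the single false-positive pixel, so the clipped prediction matches the target exactly. A box alone cannot separate object from background inside its own bounds, which is the gap the abstract's structural and semantic cues are meant to fill.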

Original language: English
Title of host publication: Computer Vision – ECCV 2024 - 18th European Conference, Proceedings
Editors: Aleš Leonardis, Elisa Ricci, Stefan Roth, Olga Russakovsky, Torsten Sattler, Gül Varol
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 288-304
Number of pages: 17
ISBN (Print): 9783031728471
DOIs
Publication status: Published - 2025
Event: 18th European Conference on Computer Vision, ECCV 2024 - Milan, Italy
Duration: 29 Sept 2024 to 4 Oct 2024

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 15124 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 18th European Conference on Computer Vision, ECCV 2024
Country/Territory: Italy
City: Milan
Period: 29/09/24 to 4/10/24

Keywords

  • Referring Image Segmentation
  • Weak supervision
