Zero-Shot Semantic Segmentation Research of Vision Language Models

  • Beijing Institute of Technology

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

In this paper, we systematically survey recent semantic segmentation methods based on vision language models (VLMs), focusing on two major paradigms: VLM-based cross-modal models and large language model (LLM)-enhanced interactive models. We analyze the characteristics and strategies of representative methods in both paradigms, covering zero-shot learning, vision-language prompting strategies, and multi-round interactive reasoning. We summarize and compare their segmentation accuracy and computational performance, and show that VLM-based cross-modal models remain competitive on structured datasets owing to their efficiency and simplicity, while LLM-enhanced methods offer greater flexibility and reasoning capability in complex instruction-driven tasks. Our study highlights the advantages of both paradigms and proposes a future direction that combines lightweight visual foundation models with high-level semantic reasoning.

Original language: English
Title of host publication: 2026 12th International Conference on Automation, Robotics, and Applications, ICARA 2026
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 457-462
Number of pages: 6
Edition: 2026
ISBN (Electronic): 9798331563530
DOI
Publication status: Published - 2026
Externally published
Event: 12th International Conference on Automation, Robotics and Applications, ICARA 2026 - Istanbul, Turkey
Duration: 5 Feb 2026 to 7 Feb 2026

Conference

Conference: 12th International Conference on Automation, Robotics and Applications, ICARA 2026
Country/Territory: Turkey
City: Istanbul
Period: 5/02/26 to 7/02/26
