
Toward the unification of generative and discriminative visual foundation model: a survey

  • Xu Liu
  • Tong Zhou
  • Chong Wang
  • Yuping Wang
  • Yuanxin Wang
  • Qinjingwen Cao
  • Weizhi Du
  • Yonghuan Yang
  • Junjun He
  • Yu Qiao
  • Yiqing Shen*
  • *Corresponding author for this work
  • University of California at Los Angeles
  • Rice University
  • Xinxiang Medical College
  • University of Michigan, Ann Arbor
  • Carnegie Mellon University
  • University of Illinois at Urbana-Champaign
  • Wal-Mart Stores
  • Santa Clara University
  • Shanghai AI Laboratory
  • Johns Hopkins University

Research output: Contribution to journal, Article, peer-reviewed

Abstract

The advent of foundation models, which are pre-trained on vast datasets, has ushered in a new era of computer vision, characterized by their robustness and remarkable zero-shot generalization capabilities. Mirroring the transformative impact of foundation models like large language models in natural language processing, visual foundation models (VFMs) have become a catalyst for groundbreaking developments in computer vision. This review paper delineates the pivotal trajectories of VFMs, emphasizing their scalability and proficiency in generative tasks such as text-to-image synthesis, as well as their adeptness in discriminative tasks including image segmentation. While generative and discriminative models have historically charted distinct paths, we undertake a comprehensive examination of the recent strides made by VFMs in both domains, elucidating their origins, seminal breakthroughs, and pivotal methodologies. Additionally, we collate and discuss the extensive resources that facilitate the development of VFMs and address the challenges that pave the way for future research endeavors. A crucial direction for forthcoming innovation is the amalgamation of generative and discriminative paradigms. The nascent application of generative models within discriminative contexts signifies the early stages of this confluence. This survey aspires to be a contemporary compendium for scholars and practitioners alike, charting the course of VFMs and illuminating their multifaceted landscape.

Original language: English
Pages (from-to): 3371-3412
Number of pages: 42
Journal: Visual Computer
Volume: 41
Issue: 5
DOI
Publication status: Published - Mar 2025
Externally published
