Toward the unification of generative and discriminative visual foundation model: a survey

  • Xu Liu
  • Tong Zhou
  • Chong Wang
  • Yuping Wang
  • Yuanxin Wang
  • Qinjingwen Cao
  • Weizhi Du
  • Yonghuan Yang
  • Junjun He
  • Yu Qiao
  • Yiqing Shen*
  • *Corresponding author for this work
  • University of California at Los Angeles
  • Rice University
  • Xinxiang Medical College
  • University of Michigan, Ann Arbor
  • Carnegie Mellon University
  • University of Illinois at Urbana-Champaign
  • Wal-Mart Stores
  • Santa Clara University
  • Shanghai AI Laboratory
  • Johns Hopkins University

Research output: Contribution to journal › Article › peer-review

Abstract

The advent of foundation models, which are pre-trained on vast datasets, has ushered in a new era of computer vision, characterized by their robustness and remarkable zero-shot generalization capabilities. Mirroring the transformative impact of foundation models like large language models in natural language processing, visual foundation models (VFMs) have become a catalyst for groundbreaking developments in computer vision. This review paper delineates the pivotal trajectories of VFMs, emphasizing their scalability and proficiency in generative tasks such as text-to-image synthesis, as well as their adeptness in discriminative tasks including image segmentation. While generative and discriminative models have historically charted distinct paths, we undertake a comprehensive examination of the recent strides made by VFMs in both domains, elucidating their origins, seminal breakthroughs, and pivotal methodologies. Additionally, we collate and discuss the extensive resources that facilitate the development of VFMs and address the challenges that pave the way for future research endeavors. A crucial direction for forthcoming innovation is the amalgamation of generative and discriminative paradigms. The nascent application of generative models within discriminative contexts signifies the early stages of this confluence. This survey aspires to be a contemporary compendium for scholars and practitioners alike, charting the course of VFMs and illuminating their multifaceted landscape.

Original language: English
Pages (from-to): 3371-3412
Number of pages: 42
Journal: Visual Computer
Volume: 41
Issue number: 5
Publication status: Published - Mar 2025
Externally published: Yes

Keywords

  • Deep learning
  • Foundation model
  • Visual foundation model
