TSCM: Efficient image synthesis using latent diffusion

  • Zhiwei Lin*
  • Xiaolong Wang
  • Songchuan Zhang
  • Tao Wang
  • Zhonghua Guo
  • Tai Liu

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

This paper introduces a novel method for distilling diffusion models, called the Two Step Consistency Model (TSCM). TSCM significantly increases the generation speed of diffusion models, particularly in generating guiding images, achieving real-time, high-resolution output. Unlike traditional consistency models, TSCM avoids the marginal effects associated with distillation by first resolving the diffusion model in one step and then executing the consistency model calculations. Furthermore, TSCM incorporates Low-Rank Adaptation (LoRA) to prevent a significant increase in space complexity. Notably, TSCM requires only a minor increase in storage space to significantly improve inference speed, and it supports training while keeping the original model weights frozen. To validate TSCM's effectiveness, distillation experiments were conducted on the DeepFashion and LAION-5B datasets. On DeepFashion, TSCM improved the Fréchet Inception Distance (FID) from 11.238 to 10.352 and the Learned Perceptual Image Patch Similarity (LPIPS) from 0.1921 to 0.1824 (lower is better for both metrics). On LAION-5B, TSCM effectively avoided image blurring when the number of generation steps was reduced, and its CLIP score increased by 4.21 compared to previous sampling methods. Additionally, TSCM reduces generation time by a factor of 8.8 on average while maintaining the same level of image generation quality.
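The abstract's point about LoRA (a small storage overhead while the original weights stay frozen) can be illustrated with a generic low-rank adapter. The sketch below is not the TSCM implementation, which this record does not describe; the layer sizes, the rank, and the helper names (`matmul`, `transpose`, `lora_forward`, `lora_param_counts`) are all assumptions chosen for illustration.

```python
import random

def matmul(p, q):
    """Multiply two matrices given as lists of rows."""
    return [[sum(pi * qj for pi, qj in zip(row, col)) for col in zip(*q)]
            for row in p]

def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]

def lora_forward(x, w_frozen, a, b, scale=1.0):
    """Compute x W^T + scale * (x A^T) B^T.

    w_frozen (d_out x d_in) is never modified; only the low-rank factors
    a (rank x d_in) and b (d_out x rank) would be trained.
    """
    base = matmul(x, transpose(w_frozen))
    update = matmul(matmul(x, transpose(a)), transpose(b))
    return [[u + scale * v for u, v in zip(row_base, row_upd)]
            for row_base, row_upd in zip(base, update)]

def lora_param_counts(d_out, d_in, rank):
    """Return (parameters in the full weight, parameters in the adapter)."""
    return d_out * d_in, rank * (d_in + d_out)

# Hypothetical sizes, chosen only to keep the example small.
random.seed(0)
d_out, d_in, rank = 64, 128, 4
w_frozen = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
a = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]
b = [[0.0] * rank for _ in range(d_out)]  # zero init: adapter starts as a no-op
x = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(2)]

y = lora_forward(x, w_frozen, a, b)
full, adapter = lora_param_counts(d_out, d_in, rank)
print(f"full weight: {full} parameters, adapter: {adapter} parameters")
```

With these assumed sizes the adapter stores 768 values against 8192 for the full weight matrix, which is the kind of storage saving the abstract attributes to LoRA. Because `b` starts at zero, the adapted layer initially reproduces the frozen layer's output exactly.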

Original language: English
Pages (from-to): 57-62
Number of pages: 6
Journal: Pattern Recognition Letters
Volume: 202
DOIs
Publication status: Published - Apr 2026
Externally published: Yes

Keywords

  • Consistency models
  • Diffusion models
  • Human image generation
