Abstract
This paper introduces a novel method for distilling diffusion models, called the Two-Step Consistency Model (TSCM). TSCM significantly accelerates the generation speed of diffusion models, particularly in generating guiding images, achieving real-time, high-resolution output. Unlike traditional consistency models, TSCM avoids the marginal effects associated with distillation by first solving the diffusion model in one step and then performing the consistency-model computation. Furthermore, TSCM incorporates Low-Rank Adaptation (LoRA) to prevent a significant increase in spatial complexity: it requires only a minor increase in storage space to substantially improve inference speed, and it supports training while keeping the original model weights frozen. To validate TSCM's effectiveness, distillation experiments were conducted on the DeepFashion and LAION-5B datasets. On DeepFashion, TSCM improved the Fréchet Inception Distance (FID) from 11.238 to 10.352 and the Learned Perceptual Image Patch Similarity (LPIPS) from 0.1921 to 0.1824. On LAION-5B, TSCM effectively avoided the image blurring that otherwise accompanies a reduced number of generation steps, and its CLIP score increased by 4.21 compared to previous sampling methods. Additionally, TSCM reduces generation time by an average factor of 8.8 while maintaining the same level of image generation quality.
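The two-step idea described above — one solver step of the base diffusion model followed by a single consistency-model evaluation — can be sketched as follows. This is a minimal, hedged illustration only: the toy 1-D "models", function names, and the Euler step are assumptions for exposition and are not the authors' implementation.

```python
import numpy as np

def base_model_eps(x, t):
    """Toy stand-in for the pretrained diffusion model's noise prediction.

    Illustrative only; a real model would be a trained neural network.
    """
    return x / (1.0 + t)

def consistency_model(x, t):
    """Toy stand-in for the distilled consistency model f(x, t) -> x0.

    In practice this would be the base model plus LoRA adapters, so the
    frozen base weights are reused and only small low-rank matrices are
    stored (the paper's claim of a minor storage increase).
    """
    return x / (1.0 + t)

def tscm_sample(x_T, t_start=1.0, t_mid=0.5):
    # Step 1: one solver step of the base diffusion model
    # (a plain Euler step here, as an assumed example).
    eps = base_model_eps(x_T, t_start)
    x_mid = x_T - (t_start - t_mid) * eps
    # Step 2: a single consistency-model call maps the intermediate
    # state directly to a clean sample, instead of many denoising steps.
    return consistency_model(x_mid, t_mid)

sample = tscm_sample(np.ones(4))
```

Because the whole trajectory collapses to two network evaluations, the speedup over a many-step sampler scales roughly with the number of steps removed, which is consistent with the reported average 8.8x reduction in generation time.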
| Original language | English |
|---|---|
| Pages (from-to) | 57-62 |
| Number of pages | 6 |
| Journal | Pattern Recognition Letters |
| Volume | 202 |
| DOIs | |
| Publication status | Published - Apr 2026 |
| Externally published | Yes |
Keywords
- Consistency models
- Diffusion models
- Human image generation
Title: TSCM: Efficient image synthesis using latent diffusion