Latent Consistency Models

Synthesizing High-Resolution Images with Few-Step Inference

Institute for Interdisciplinary Information Sciences, Tsinghua University
"LCMs: The next generation of generative models after Latent Diffusion Models (LDMs)."

Abstract

We propose Latent Consistency Models (LCMs) to overcome the slow iterative sampling process of Latent Diffusion Models (LDMs), enabling fast inference with minimal steps on any pre-trained LDM (e.g., Stable Diffusion).
Viewing the guided reverse diffusion process as solving an augmented probability flow ODE (PF-ODE), LCMs predict its solution directly in latent space, achieving very fast inference in only a few steps.
A high-quality 768x768 LCM for 2- to 4-step inference, distilled from Stable Diffusion, requires only 32 A100 GPU hours of training (8 nodes for 4 hours).
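
To make the idea of "predicting the PF-ODE solution directly" concrete, here is a minimal sketch of multistep consistency sampling on a toy 1-D problem. Everything in it is an assumption chosen for illustration and not the LCM implementation: the data distribution is a Gaussian N(MU, S^2), the noise schedule is a simple cosine schedule, and `consistency_fn` is the closed-form PF-ODE solution map for that toy case, standing in for the learned network. Guidance and the latent-space autoencoder are omitted entirely.

```python
import math
import random

# Toy data distribution N(MU, S^2); hypothetical values for illustration only.
MU, S = 2.0, 0.5


def alpha_sigma(t):
    """Assumed variance-preserving schedule on t in [0, 1]: z_t = a*z_0 + s*eps."""
    return math.cos(0.5 * math.pi * t), math.sin(0.5 * math.pi * t)


def consistency_fn(z, t):
    """Map a noisy point z_t to the origin z_0 of its PF-ODE trajectory.

    For Gaussian data the PF-ODE flow is linear, so the map is available in
    closed form. In an LCM this role is played by a trained neural network.
    """
    a, s = alpha_sigma(t)
    std_t = math.sqrt(a * a * S * S + s * s)  # std of the marginal at time t
    return MU + S * (z - a * MU) / std_t


def multistep_sample(timesteps, rng):
    """Multistep consistency sampling over decreasing timesteps starting at 1.0."""
    z = rng.gauss(0.0, 1.0)                 # pure noise at t = 1
    z0 = consistency_fn(z, timesteps[0])    # one-step prediction of the sample
    for t in timesteps[1:]:
        a, s = alpha_sigma(t)
        z_t = a * z0 + s * rng.gauss(0.0, 1.0)  # re-noise the estimate to time t
        z0 = consistency_fn(z_t, t)             # jump back to t = 0 again
    return z0


if __name__ == "__main__":
    rng = random.Random(0)
    print("1-step sample:", multistep_sample([1.0], rng))
    print("4-step sample:", multistep_sample([1.0, 0.75, 0.5, 0.25], rng))
```

Because the toy consistency function is exact, a single step already yields a sample from the data distribution. With a learned, imperfect model, the extra re-noise-and-predict steps are what allow 2- to 4-step inference to trade a little compute for quality.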

Few-Step Generated Images

Images generated by Latent Consistency Models (LCMs). LCMs can be distilled from any pre-trained Stable Diffusion (SD) model in only 4,000 training steps (~32 A100 GPU hours) to generate high-quality 768 x 768 images in 2 to 4 steps, or even a single step, significantly accelerating text-to-image generation. Here we use LCM to distill the Dreamshaper-V7 version of SD in just 4,000 training iterations.

4-Step Inference

2-Step Inference

1-Step Inference

More Generation Results (4-Step)

More images generated with LCM 4-step inference (768 x 768 resolution). We use LCM to distill the Dreamshaper-V7 version of SD in just 4,000 training iterations.

More Generation Results (2-Step)

More images generated with LCM 2-step inference (768 x 768 resolution). We use LCM to distill the Dreamshaper-V7 version of SD in just 4,000 training iterations.

Latent Consistency Fine-tuning (LCF)

LCF is a fine-tuning method designed for pretrained LCMs. It enables efficient few-step inference on customized datasets without a teacher diffusion model, making it possible to fine-tune a pretrained LCM directly.

4-step LCMs obtained with Latent Consistency Fine-tuning (LCF) on two customized datasets: the Pokemon dataset (left) and the Simpsons dataset (right). Through LCF, the LCM produces images in customized styles.

BibTeX

@misc{luo2023latent,
  title={Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference},
  author={Simian Luo and Yiqin Tan and Longbo Huang and Jian Li and Hang Zhao},
  year={2023},
  eprint={2310.04378},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}