Vision

Stable Diffusion

Stable Diffusion is a deep-learning text-to-image model released in 2022. It is primarily used to generate detailed images conditioned on text descriptions, but it can also be applied to other tasks such as inpainting, outpainting, and image-to-image translation.

Explanation

Stable Diffusion operates as a latent diffusion model. Instead of working directly in pixel space (which is computationally expensive), it compresses the image into a lower-dimensional latent space using a variational autoencoder (VAE). During training, the diffusion process iteratively adds Gaussian noise to this latent representation until it becomes pure noise.

Image generation reverses this process: starting from random noise in the latent space, the model progressively removes noise based on the provided text prompt or other conditioning information, guided by a neural network (typically a U-Net architecture) trained to predict the noise added at each step. Finally, the denoised latent representation is decoded back into pixel space by the VAE decoder, producing the final image.

Its efficiency and open-source availability have made Stable Diffusion a popular choice for image generation.
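The forward (noising) and reverse (denoising) steps described above can be sketched with a toy NumPy loop. This is a minimal illustration, not Stable Diffusion itself: the `predict_noise` callable stands in for the trained U-Net (which in the real model also receives the text embedding), the small array stands in for a VAE-compressed latent, and the schedule values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 50                                    # number of diffusion steps (illustrative)
betas = np.linspace(1e-4, 0.02, T)        # linear noise schedule (assumed)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)           # cumulative product used in closed-form noising

def add_noise(latent, t):
    """Forward process: noise a clean latent to step t in closed form."""
    eps = rng.standard_normal(latent.shape)
    noisy = np.sqrt(alpha_bars[t]) * latent + np.sqrt(1.0 - alpha_bars[t]) * eps
    return noisy, eps

def denoise(latent_shape, predict_noise):
    """Reverse process: start from pure noise and step back to t = 0."""
    x = rng.standard_normal(latent_shape)
    for t in reversed(range(T)):
        eps_hat = predict_noise(x, t)     # stand-in for the trained U-Net
        # DDPM-style mean update; the stochastic term is omitted at the last step
        x = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        if t > 0:
            x = x + np.sqrt(betas[t]) * rng.standard_normal(latent_shape)
    return x

# A 4x8x8 array standing in for a VAE-compressed image latent.
clean = rng.standard_normal((4, 8, 8))
noisy, eps = add_noise(clean, T - 1)
# With a random "denoiser" the output is meaningless noise; in the real model
# the predicted latent would be passed through the VAE decoder to get pixels.
restored = denoise(clean.shape, lambda x, t: rng.standard_normal(x.shape))
```

In the real pipeline the denoiser is conditioned on the text prompt, and far fewer sampler steps are typically used thanks to faster schedulers; the structure of the loop, however, is the same.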

Related Terms