Overview

10 The Flux Workflow

This chapter introduces Flux as a powerful, open image generation model and walks through how to use it effectively in ComfyUI. It contrasts Flux’s workflow with familiar Stable Diffusion pipelines, highlighting its modular model loading, new guidance mechanism, and a more nuanced sampling stack. The focus is practical: get set up, understand what’s different, and learn the few levers that most strongly shape output quality, style adherence, and realism.

Setup centers on loading separate components rather than a single checkpoint: two CLIP encoders (for stronger prompt understanding), a VAE, and the Flux.1-dev UNet. Because Flux is memory-intensive, the chapter recommends an fp8 quantized UNet variant to substantially cut VRAM usage with only minor quality trade-offs, making high-end consumer GPUs and unified-memory Macs more viable. It also explains the DualCLIPLoader’s role in prompt adherence, notes when custom VAEs matter, and underscores that you can run the workflow without mastering every node—once the files are in place, you can prompt and go.
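The fp8 recommendation is easy to sanity-check with back-of-envelope arithmetic. A quick sketch, assuming the Flux.1-dev UNet has roughly 12 billion parameters (the figure Black Forest Labs cites); activations and the text encoders add more on top of this:

```python
def unet_weight_gb(params_billion: float, bytes_per_param: int) -> float:
    """Approximate weight memory in GB for a given precision."""
    return params_billion * 1e9 * bytes_per_param / 1024**3

# fp16 stores 2 bytes per parameter; fp8 stores 1, halving the weight footprint.
fp16 = unet_weight_gb(12, 2)
fp8 = unet_weight_gb(12, 1)
print(f"fp16: {fp16:.1f} GB, fp8: {fp8:.1f} GB")
```

The halved weight footprint is what puts Flux within reach of 16–24 GB consumer GPUs.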

Flux’s “Guidance” replaces traditional CFG behavior and has a larger, more direct impact: lowering it can both improve style fidelity (e.g., manuscript-like outputs) and reduce the “plasticky” look in photorealistic renders. Sampling is more involved: RandomNoise adds seed-control modes (fixed, increment, decrement, randomize); ModelSamplingFlux introduces max_shift (highly impactful on color separation and overall look) and base_shift (subtle, and sometimes a no-op at certain resolutions); and BasicScheduler exposes denoise, which changes results dramatically within a narrow range and should be adjusted in small steps. The chapter closes by noting batch-size constraints, resolution inputs, and the overarching theme: experiment with Guidance, max_shift, and denoise to steer Flux, and use quantization to fit your hardware.

Figures in this chapter:
  • Drag this image to ComfyUI to get the Flux workflow.
  • The Flux workflow is notably different from SD1.5 and SDXL.
  • Our adorable first image from Flux.
  • The Load Diffusion Model node allows you to select the UNet.
  • Using the fp8 model has only a minor impact on the quality of our output.
  • The DualCLIPLoader allows us to use two CLIP encoders.
  • FluxGuidance and BasicGuider nodes.
  • The Guidance parameter in Flux can help the model adhere to style better.
  • Lower Guidance also helps to make images look less “plasticky”.
  • Sampling is notably more complex when working with Flux.
  • RandomNoise adds a control_after_generate option.
  • Noise schedule when max_shift and base_shift are both 0.0.
  • With a max_shift of 1.5, the noise removal is no longer uniform.
  • max_shift can have a notable impact on the resulting image.
  • base_shift slightly modifies the denoising curve.
  • While there is a difference in the images, it is more subtle.
  • Under some aspect ratios, base_shift has no impact at all!
  • Denoise should be changed in very small increments.

Summary

  • The Flux workflow has quite a few differences from the standard Stable Diffusion workflow.
  • When working with models requiring high memory, it is helpful to look for quantized versions of these models.
  • Part of the reason Flux is so good at prompt adherence is that it uses two CLIP encoders.
  • Guidance in Flux is similar to CFG in SD1.5 and SDXL, but doesn’t allow for a negative prompt.
  • Reducing Guidance can help the model pay more attention to the style recommended in the prompt.
  • Flux allows for much more nuanced control over scheduling.
  • Max shift can have a pretty major impact on your final result, but base shift is less impactful and in many cases will have no impact at all.
  • Reducing the Denoise value can make the image look nicer, but should only be done in very small steps as it can quickly degrade the image.

FAQ

How do I load the Flux workflow into ComfyUI?
Go to the ComfyUI Flux examples page (https://comfyanonymous.github.io/ComfyUI_examples/flux/) and drag the example image into the ComfyUI canvas. The workflow is embedded in the image metadata. After dropping it, set your prompt and click “Queue prompt.” Be sure you’ve downloaded and placed the required model files first (see next FAQ).
Which files do I need for Flux, and where do I put them?
You need four files:
  • CLIP text encoders (place in ComfyUI/models/clip): clip_l.safetensors and t5xxl_fp16.safetensors from https://huggingface.co/comfyanonymous/flux_text_encoders/tree/main
  • VAE (place in ComfyUI/models/vae): ae.safetensors from https://huggingface.co/black-forest-labs/FLUX.1-schnell/blob/main/ae.safetensors
  • Flux model (place in ComfyUI/models/unet): flux1-dev.safetensors from https://huggingface.co/black-forest-labs/FLUX.1-dev/tree/main
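A small script can confirm everything landed in the right folders before you queue a prompt. A sketch, assuming a default ComfyUI directory layout and the filenames from the download list above (adjust both if yours differ):

```python
from pathlib import Path

# Expected locations, relative to the ComfyUI install directory.
# Filenames follow the download list above; swap in flux1-dev-fp8.safetensors
# if you are using the quantized UNet instead.
REQUIRED = {
    "models/clip/clip_l.safetensors",
    "models/clip/t5xxl_fp16.safetensors",
    "models/vae/ae.safetensors",
    "models/unet/flux1-dev.safetensors",
}

def missing_models(comfy_root: Path) -> list[str]:
    """Return the required Flux files that are not present yet."""
    return sorted(rel for rel in REQUIRED if not (comfy_root / rel).is_file())

if __name__ == "__main__":
    for rel in missing_models(Path("ComfyUI")):
        print("missing:", rel)
```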
My GPU runs out of VRAM with Flux.1-dev. How can I make it work?
Use the quantized fp8 UNet to cut VRAM roughly in half: download flux1-dev-fp8.safetensors from https://huggingface.co/Comfy-Org/flux1-dev/blob/main/flux1-dev-fp8.safetensors and put it in ComfyUI/models/unet. In the Load Diffusion Model node, set unet_name to flux1-dev-fp8.safetensors. Expect only minor quality differences. Also keep batch_size at 1. On Apple Silicon, unified memory may let you run the full model.
Why doesn’t Flux use a single checkpoint like SD1.5/SDXL?
Flux splits components: you load the UNet (Load Diffusion Model), the VAE (Load VAE), and two text encoders via DualCLIPLoader. Unlike SD1.5/SDXL “checkpoint” files, Flux requires these parts separately, and the UNet goes in models/unet (not models/checkpoints).
What does the DualCLIPLoader do, and why are there two text encoders?
Flux uses two encoders (clip_l.safetensors and t5xxl_fp16.safetensors) to improve prompt adherence—each can capture different aspects of your text. The DualCLIPLoader node loads both and has a type option (e.g., flux, sd3, sdxl) you can experiment with.
How is Guidance in Flux different from CFG, and what values should I try?
Flux’s Guidance (via FluxGuidance and BasicGuider) isn’t the same as SD’s CFG and doesn’t rely on a negative prompt. It strongly affects both style and realism. The default is 3.5; lowering Guidance often improves style adherence and reduces the “plasticky” look in photos (e.g., try around 1.3–2.0). Very high values can look distorted/cartoonish.
Why does my seed change every run, and how do I make results reproducible?
The RandomNoise node has control_after_generate. By default it’s randomize, which picks a new seed each run. Set it to fixed to reuse the same seed, or use increment/decrement to step the seed by 1 per generation.
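The four modes fit in a few lines. A sketch of the behavior (not ComfyUI’s actual code):

```python
import random

def next_seed(seed: int, mode: str) -> int:
    """Mimic RandomNoise's control_after_generate modes."""
    if mode == "fixed":
        return seed                      # reuse the same seed -> reproducible
    if mode == "increment":
        return seed + 1                  # step up by 1 each run
    if mode == "decrement":
        return seed - 1                  # step down by 1 each run
    if mode == "randomize":
        return random.randrange(2**32)   # fresh seed every run (the default)
    raise ValueError(f"unknown mode: {mode}")

seed = 42
for _ in range(3):
    # a generation would run here; then the seed updates for the next run
    seed = next_seed(seed, "increment")
print(seed)  # 45
```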
What do max_shift and base_shift in ModelSamplingFlux control?
They shape the denoising schedule (how much noise is removed per step). max_shift has a noticeable effect: higher values often yield more color separation and different structure; try starting at 0.0 and experimenting around 1.5–3.0. base_shift is subtle and can have no effect at common resolutions like 1024×1024; it’s usually fine to leave base_shift at 0 and focus on max_shift.
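The resolution dependence can be made concrete. The sketch below is based on my reading of ComfyUI’s ModelSamplingFlux, so treat the constants (256 and 4096 tokens) as assumptions: the effective shift is interpolated between base_shift and max_shift by latent token count, which is why base_shift drops out entirely at 1024×1024.

```python
def flux_shift(width: int, height: int, base_shift: float, max_shift: float) -> float:
    """Interpolate the schedule shift from the latent token count.

    Token count runs from 256 (at 256x256) up to 4096 (at 1024x1024);
    base_shift applies at the low end, max_shift at the high end.
    """
    tokens = (width // 16) * (height // 16)   # /8 VAE downscale, then /2 patchify
    m = (max_shift - base_shift) / (4096 - 256)
    return tokens * m + base_shift - 256 * m

# At 1024x1024, tokens == 4096: the result equals max_shift exactly,
# so changing base_shift has no effect at this resolution.
print(round(flux_shift(1024, 1024, base_shift=0.5, max_shift=1.15), 6))
```

At smaller resolutions the token count sits between the endpoints, and base_shift starts to matter again.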
What does the BasicScheduler’s denoise slider do, and how should I set it?
denoise controls how completely the noise is removed (1.0 = full). Reducing slightly (e.g., ~0.9) can add a more illustrative look, but quality degrades rapidly below that. Change it in very small increments near 1.0. The node also sets the scheduler type and the number of steps.
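One way to picture denoise: the scheduler computes a longer schedule and keeps only its tail, so sampling starts from a lower noise level. This is a toy sketch with a linear noise ramp; ComfyUI’s real schedules are not linear, and the truncation rule here is an assumption based on its BasicScheduler:

```python
def schedule(steps: int, denoise: float = 1.0) -> list[float]:
    """Toy linear sigma schedule with denoise-style truncation.

    With denoise < 1.0, a longer schedule is computed and only the last
    `steps + 1` sigmas are kept, so sampling starts below full noise.
    """
    total = int(steps / denoise)
    # Linear ramp from 1.0 (pure noise) down to 0.0 over `total` steps.
    sigmas = [1.0 - i / total for i in range(total + 1)]
    return sigmas[-(steps + 1):]

print(schedule(4, denoise=1.0))   # starts at 1.0 (full noise)
print(schedule(4, denoise=0.8))   # starts lower: less of the image is re-noised
```

This also shows why small decrements matter: even denoise=0.8 skips the highest-noise portion of the schedule entirely.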
What is EmptySD3LatentImage for, and how should I configure it?
It creates the empty latent image that the sampler’s noise is applied to. Set batch_size (most hardware can only handle 1 with Flux). Height and width are provided by separate nodes; keep their control_after_generate set to fixed—auto-changing dimensions between runs isn’t practical.
