Overview

1 Making our First Image: “A damn fine cup of coffee”

The chapter introduces Stable Diffusion through the lens of remix culture, likening it to early hip-hop’s art of combining familiar elements into something new. It invites readers to channel imagination into images, while acknowledging that compelling results come from learning tools and techniques rather than one-click magic. A key theme is empowerment: because Stable Diffusion is open source, a thriving community continually extends its capabilities, and everything in the book can be run on consumer hardware, making exploration accessible and fun.

Readers are guided through the essentials of text-to-image generation using CompVis’s implementation and the prompt “A damn fine cup of coffee.” The chapter explains how to run the script, why generating many images helps, and how iterations versus batch size affect throughput and GPU VRAM, noting that larger batches offer little real speed advantage. It introduces seeds for reproducibility and variety, then shows how height and width constraints (and especially aspect ratio) can meaningfully alter composition and content, all while reminding readers that higher pixel counts demand more VRAM.

The chapter then focuses on prompt engineering: favor clear, descriptive language over vague poetry, add contextual scene details, and specify artistic styles to steer results, iterating as you go. Examples evolve from a plain “cup of black coffee” to adding a diner setting and stylistic cues like “surrealist painting” or “wood etching” to achieve mood and avoid the uncanny valley. It closes by demonstrating a simple source edit that disables the NSFW safety checker (the mechanism behind the notorious Rick Astley placeholder image), emphasizing the user control open source provides and setting up more powerful, user-friendly workflows in subsequent chapters.

Figures

  • Getting my imagination on the screen.
  • Browsing an infinite library of Pulp Sci-Fi that never was.
  • Who knew monks were such avid readers of sci-fi?
  • Envisioning ancient aliens.
  • The initial 6 images created by our prompt: “A damn fine cup of coffee.”
  • Average seconds to create an image, comparing iterations and batch size.
  • Generating 30 images at once.
  • Creating 6 different images with seed 12345.
  • Images with a 5:3 landscape aspect ratio.
  • Images with a 3:5 aspect ratio using the same seed.
  • Images with a 3:7 aspect ratio using the same seed.
  • Images with a 4:1 aspect ratio using the same seed.
  • A poetic prompt does not always yield poetic images.
  • A straightforward prompt yields more cups of black coffee.
  • Adding a scene to an image can help provide context.
  • Choosing a landscape aspect ratio helps display the counter.
  • Creating surrealistic images.
  • Images in the style of a wood etching.
  • I would say that’s a damn fine cup of coffee!
  • Being “Rick-rolled” by Stable Diffusion.

Summary

  • Generating with Stable Diffusion is an iterative process, in which we are constantly revising our settings and prompts.
  • Even with many ways to improve images, it’s always a good idea to generate a variety of them and see whether a particular one stands out as pleasing.
  • Our prompts should be clear and descriptive. Giving some context for the object we’re prompting can change the image dramatically, and describing the style can further change the feeling of the images we generate.
  • The aspect ratio we use to generate an image can have a major impact on how the image looks. Consider whether the image you want to create would look better as a square, a landscape, or a portrait.
  • Because Stable Diffusion is open source we (as well as the entire community of users) can change and extend its behavior.

FAQ

How do I set up the exact tools used in this chapter?
Install the CompVis Stable Diffusion repository (https://github.com/CompVis/stable-diffusion) and download the v1.5 model checkpoint from https://huggingface.co/runwayml/stable-diffusion-v1-5. Activate the conda environment you created during setup (e.g., run: conda activate ldm).
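For reference, a minimal setup sketch. The checkpoint filename and the default checkpoint location are my assumptions (based on the Hugging Face download page and my reading of the CompVis script); adjust them to match your copy:

    # clone the CompVis repository and enter it
    git clone https://github.com/CompVis/stable-diffusion.git
    cd stable-diffusion

    # create and activate the conda environment the repo defines
    conda env create -f environment.yaml
    conda activate ldm

    # put the downloaded v1.5 checkpoint where the script looks by default
    mkdir -p models/ldm/stable-diffusion-v1
    mv ~/Downloads/v1-5-pruned-emaonly.ckpt models/ldm/stable-diffusion-v1/model.ckpt
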
How do I generate my first image from the prompt “A damn fine cup of coffee”?
From the repo root, run: python ./scripts/txt2img.py --prompt="A damn fine cup of coffee" [optionally add --ckpt=/path/to/model.ckpt]. By default it produces 6 images saved to outputs/txt2img-samples/samples.
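Spelled out with the defaults made explicit (seed 42 and the 2 × 3 iteration/batch split come from this FAQ; 512×512 is my reading of the script’s defaults), and with a placeholder --ckpt path:

    python ./scripts/txt2img.py \
        --prompt="A damn fine cup of coffee" \
        --ckpt=/path/to/model.ckpt \
        --seed=42 --n_iter=2 --n_samples=3 --H=512 --W=512

    # the 6 results land here by default
    ls outputs/txt2img-samples/samples
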
I get “model.ckpt not found.” How do I fix it?
Either place the model checkpoint where the script expects it (per the book’s install steps), or point to it explicitly with --ckpt, for example: --ckpt=/path/to/stable-diffusion-v1-5.ckpt.
Why did I get 6 images by default, and how can I create 30?
Defaults are n_iter=2 (iterations) and n_samples=3 (batch size), yielding 2 × 3 = 6 images. To get 30, pick values whose product is 30 (e.g., --n_samples=5 --n_iter=6). Keep in mind that higher batch sizes consume more GPU VRAM.
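For example (any factoring of 30 works; a smaller batch size conserves VRAM at little cost in speed):

    # 6 iterations × a batch of 5 = 30 images in one run
    python ./scripts/txt2img.py --prompt="A damn fine cup of coffee" \
        --n_samples=5 --n_iter=6
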
What’s the difference between iterations (n_iter) and batch size (n_samples)?
Batch size generates multiple images in parallel on the GPU (at a higher VRAM cost), while iterations generate them sequentially. In practice, larger batches often don’t speed things up much, so many users prefer small batches (even 1) to conserve VRAM.
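Put concretely, this run also yields 6 images while holding only one image in GPU memory at a time:

    # batch of 1, six sequential iterations
    python ./scripts/txt2img.py --prompt="A damn fine cup of coffee" \
        --n_samples=1 --n_iter=6
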
Why did the first 6 images repeat when I generated more, and how do I avoid repeats?
Stable Diffusion is seeded. With the same seed and settings, you’ll reproduce the same images. The default seed is 42. Pass a different value with --seed to get new results, or vary other parameters (like size) to change outcomes.
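For instance, using the seed from the chapter’s figures:

    # a new seed gives a different, but still reproducible, set of images
    python ./scripts/txt2img.py --prompt="A damn fine cup of coffee" \
        --seed=12345
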
Which image sizes are valid, and how do height and width affect results?
Use dimensions that are multiples of 8 (the chapter sticks to multiples of 128). Larger height × width uses more VRAM. Changing the aspect ratio (e.g., landscape with H=384, W=640 vs. portrait with H=640, W=384) can noticeably change composition and subject placement.
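As commands (both runs keep the default seed so the comparison isolates the aspect ratio; --H and --W are the script’s height and width flags):

    # 5:3 landscape: 640 wide by 384 tall
    python ./scripts/txt2img.py --prompt="A damn fine cup of coffee" --W=640 --H=384
    # 3:5 portrait: 384 wide by 640 tall
    python ./scripts/txt2img.py --prompt="A damn fine cup of coffee" --W=384 --H=640
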
How can I improve results with prompt engineering in this example?
Be clear and descriptive (e.g., “A cup of black coffee” instead of poetic phrasing). Add context (“on a diner counter”) and stylistic cues (“surrealist painting”, “wood etching”). Iterate: generate, inspect, tweak, and repeat.
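Sketched as successive runs (the combined phrasings are illustrative; the book’s exact prompts may differ):

    # start plain and descriptive
    python ./scripts/txt2img.py --prompt="A cup of black coffee"
    # add scene context
    python ./scripts/txt2img.py --prompt="A cup of black coffee on a diner counter"
    # steer the style
    python ./scripts/txt2img.py --prompt="A cup of black coffee on a diner counter, surrealist painting"
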
Some outputs look uncanny or have odd artifacts (like extra handles). What can I do?
Try non-photorealistic styles to avoid the uncanny valley, adjust the seed and generate more images to curate better results, and refine your prompt and context. Later techniques can further reduce artifacts, but iteration helps a lot.
What’s the “NSFW safety checker” image about, and can I turn the filter off?
If the safety checker flags an image, it may be replaced (famously with a Rick Astley placeholder). Because the code is open source, you can edit scripts/txt2img.py (search for check_safety) and modify that function to return the original image and indicate no NSFW detected. Do this responsibly and at your discretion.
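A sketch of the edit, assuming the stock CompVis script (the function name comes from this FAQ; verify the signature against your copy before changing it):

    # locate the filter in the script
    grep -n "check_safety" scripts/txt2img.py

    # then edit the function so it passes images through unchanged, e.g.:
    #
    #     def check_safety(x_image):
    #         # return the images as-is and report no NSFW content
    #         return x_image, False
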
