Overview

1 Making our First Image: “A damn fine cup of coffee”

The chapter introduces Stable Diffusion through the lens of remix culture, likening it to early hip-hop’s art of combining familiar elements into something new. It invites readers to channel imagination into images, while acknowledging that compelling results come from learning tools and techniques rather than one-click magic. A key theme is empowerment: because Stable Diffusion is open source, a thriving community continually extends its capabilities, and everything in the book can be run on consumer hardware, making exploration accessible and fun.

Readers are guided through the essentials of text-to-image generation using CompVis’s implementation and the prompt “A damn fine cup of coffee.” The chapter explains how to run the script, why generating many images helps, and how iterations versus batch size affect throughput and GPU VRAM, noting that larger batches offer little real speed advantage. It introduces seeds for reproducibility and variety, then shows how height and width constraints (and especially aspect ratio) can meaningfully alter composition and content, all while reminding readers that higher pixel counts demand more VRAM.

The chapter then focuses on prompt engineering: favor clear, descriptive language over vague poetry, add contextual scene details, and specify artistic styles to steer results, iterating as you go. Examples evolve from a plain “cup of black coffee” to adding a diner setting and stylistic cues like “surrealist painting” or “wood etching” to achieve mood and avoid the uncanny valley. It closes by demonstrating a simple source edit that disables the NSFW safety checker (the mechanism behind the notorious Rick Astley placeholder image), emphasizing the user control open source provides and setting up more powerful, user-friendly workflows in subsequent chapters.

Figures

  • Getting my imagination on the screen.
  • Browsing an infinite library of Pulp Sci-Fi that never was.
  • Who knew monks were such avid readers of sci-fi?
  • Envisioning ancient aliens.
  • The initial 6 images created by our prompt: “A damn fine cup of coffee.”
  • Average seconds to create an image, comparing iterations and batch size.
  • Generating 30 images at once.
  • Creating 6 different images with seed 12345.
  • Images with a 5:3 landscape aspect ratio.
  • Images with a 3:5 aspect ratio using the same seed.
  • Images with a 3:7 aspect ratio using the same seed.
  • Images with a 4:1 aspect ratio using the same seed.
  • A poetic prompt does not always yield poetic images.
  • A straightforward prompt yields more cups of black coffee.
  • Adding a scene to an image can help provide context.
  • Choosing a landscape aspect ratio helps display the counter.
  • Creating surrealistic images.
  • Images in the style of a wood etching.
  • I would say that’s a damn fine cup of coffee!
  • Being “Rick-rolled” by Stable Diffusion.

Summary

  • Generating with Stable Diffusion is an iterative process, in which we are constantly revising our settings and prompts.
  • Even with many ways to improve images, it’s always a good idea to generate a variety of them and see whether a particular one stands out as pleasing.
  • Our prompts should be clear and descriptive. Giving some context for the object we’re prompting can change the image dramatically, and describing the style can further change the feeling of the images we generate.
  • The aspect ratio we use to generate an image can have a major impact on how the image looks. Consider whether the image you want to create would look better as a square, a landscape, or a portrait.
  • Because Stable Diffusion is open source we (as well as the entire community of users) can change and extend its behavior.

FAQ

How do I set up the exact tools used in this chapter?
Install the CompVis Stable Diffusion repository (https://github.com/CompVis/stable-diffusion) and download the v1.5 model checkpoint from https://huggingface.co/runwayml/stable-diffusion-v1-5. Activate the conda environment you created during setup (e.g., run: conda activate ldm).
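For reference, a minimal setup sketch. The checkpoint filename and the default checkpoint location are my assumptions (based on the Hugging Face download page and my reading of the CompVis script); adjust them to match your copy:

    # clone the CompVis repository and enter it
    git clone https://github.com/CompVis/stable-diffusion.git
    cd stable-diffusion

    # create and activate the conda environment the repo defines
    conda env create -f environment.yaml
    conda activate ldm

    # put the downloaded v1.5 checkpoint where the script looks by default
    mkdir -p models/ldm/stable-diffusion-v1
    mv ~/Downloads/v1-5-pruned-emaonly.ckpt models/ldm/stable-diffusion-v1/model.ckpt
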
How do I generate my first image from the prompt “A damn fine cup of coffee”?
From the repo root, run: python ./scripts/txt2img.py --prompt="A damn fine cup of coffee" [optionally add --ckpt=/path/to/model.ckpt]. By default it produces 6 images saved to outputs/txt2img-samples/samples.
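Spelled out with the defaults made explicit (seed 42 and the 2 × 3 iteration/batch split come from this FAQ; 512×512 is my reading of the script’s defaults), and with a placeholder --ckpt path:

    python ./scripts/txt2img.py \
        --prompt="A damn fine cup of coffee" \
        --ckpt=/path/to/model.ckpt \
        --seed=42 --n_iter=2 --n_samples=3 --H=512 --W=512

    # the 6 results land here by default
    ls outputs/txt2img-samples/samples
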
I get “model.ckpt not found.” How do I fix it?
Either place the model checkpoint where the script expects it (per the book’s install steps), or point to it explicitly with --ckpt, for example: --ckpt=/path/to/stable-diffusion-v1-5.ckpt.
Why did I get 6 images by default, and how can I create 30?
Defaults are n_iter=2 (iterations) and n_samples=3 (batch size), yielding 2 × 3 = 6 images. To get 30, pick values whose product is 30 (e.g., --n_samples=5 --n_iter=6). Keep in mind that higher batch sizes consume more GPU VRAM.
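For example (any factoring of 30 works; a smaller batch size conserves VRAM at little cost in speed):

    # 6 iterations × a batch of 5 = 30 images in one run
    python ./scripts/txt2img.py --prompt="A damn fine cup of coffee" \
        --n_samples=5 --n_iter=6
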
What’s the difference between iterations (n_iter) and batch size (n_samples)?
Batch size generates multiple images in parallel on the GPU (at a higher VRAM cost), while iterations generate them sequentially. In practice, larger batches often don’t speed things up much, so many users prefer small batches (even 1) to conserve VRAM.
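Put concretely, this run also yields 6 images while holding only one image in GPU memory at a time:

    # batch of 1, six sequential iterations
    python ./scripts/txt2img.py --prompt="A damn fine cup of coffee" \
        --n_samples=1 --n_iter=6
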
Why did the first 6 images repeat when I generated more, and how do I avoid repeats?
Stable Diffusion is seeded. With the same seed and settings, you’ll reproduce the same images. The default seed is 42. Pass a different value with --seed to get new results, or vary other parameters (like size) to change outcomes.
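For instance, using the seed from the chapter’s figures:

    # a new seed gives a different, but still reproducible, set of images
    python ./scripts/txt2img.py --prompt="A damn fine cup of coffee" \
        --seed=12345
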
Which image sizes are valid, and how do height and width affect results?
Use dimensions that are multiples of 8 (the chapter sticks to multiples of 128). Larger height × width uses more VRAM. Changing the aspect ratio (e.g., landscape with H=384, W=640 vs. portrait with H=640, W=384) can noticeably change composition and subject placement.
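As commands (both runs keep the default seed so the comparison isolates the aspect ratio; --H and --W are the script’s height and width flags):

    # 5:3 landscape: 640 wide by 384 tall
    python ./scripts/txt2img.py --prompt="A damn fine cup of coffee" --W=640 --H=384
    # 3:5 portrait: 384 wide by 640 tall
    python ./scripts/txt2img.py --prompt="A damn fine cup of coffee" --W=384 --H=640
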
How can I improve results with prompt engineering in this example?
Be clear and descriptive (e.g., “A cup of black coffee” instead of poetic phrasing). Add context (“on a diner counter”) and stylistic cues (“surrealist painting”, “wood etching”). Iterate: generate, inspect, tweak, and repeat.
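Sketched as successive runs (the combined phrasings are illustrative; the book’s exact prompts may differ):

    # start plain and descriptive
    python ./scripts/txt2img.py --prompt="A cup of black coffee"
    # add scene context
    python ./scripts/txt2img.py --prompt="A cup of black coffee on a diner counter"
    # steer the style
    python ./scripts/txt2img.py --prompt="A cup of black coffee on a diner counter, surrealist painting"
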
Some outputs look uncanny or have odd artifacts (like extra handles). What can I do?
Try non-photorealistic styles to avoid the uncanny valley, adjust the seed and generate more images to curate better results, and refine your prompt and context. Later techniques can further reduce artifacts, but iteration helps a lot.
What’s the “NSFW safety checker” image about, and can I turn the filter off?
If the safety checker flags an image, it may be replaced (famously with a Rick Astley placeholder). Because the code is open source, you can edit scripts/txt2img.py (search for check_safety) and modify that function to return the original image and indicate no NSFW detected. Do this responsibly and at your discretion.
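A sketch of the edit, assuming the stock CompVis script (the function name comes from this FAQ; verify the signature against your copy before changing it):

    # locate the filter in the script
    grep -n "check_safety" scripts/txt2img.py

    # then edit the function so it passes images through unchanged, e.g.:
    #
    #     def check_safety(x_image):
    #         # return the images as-is and report no NSFW content
    #         return x_image, False
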
