Overview

5 What’s up with you, Directed Acyclic Word Graph?

This chapter tackles practical ways to store and search large word lists for exact membership and rapid prefix enumeration, problems that show up in autocomplete and word games. It frames the required operations with a simple interface and compares options: hash sets are excellent for membership tests but are essentially unusable for prefix queries, while sorted lists enable binary search with O(m log n) time but can still be costly per match and consume notable memory. To improve both speed and space, the chapter pursues data structures that share common substrings and highlights a key theoretical lens: the connection to finite state automata.
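The sorted-list option mentioned above is easy to sketch. The chapter's own code is in C#; this is an illustrative Python version using the standard bisect module, showing the O(m log n) binary search for the first candidate followed by a linear scan that pays O(m) per emitted match:

```python
from bisect import bisect_left

def starts_with(words, prefix):
    """Enumerate all words with the given prefix from a sorted list.

    bisect_left finds the first candidate in O(m log n); the scan
    then costs O(m) per emitted match for the prefix check."""
    i = bisect_left(words, prefix)
    while i < len(words) and words[i].startswith(prefix):
        yield words[i]
        i += 1

words = sorted(["car", "care", "cares", "cars",
                "fir", "fire", "firer", "firers", "firs"])
print(list(starts_with(words, "fir")))
# → ['fir', 'fire', 'firer', 'firers', 'firs']
```

Note that the per-match prefix check is what makes enumeration cost O(m) per result even after the start position is found.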

Tries address time performance by deduplicating prefixes: words are paths labeled by characters, with nodes marking word ends. They’re easy to build in O(mn) total time and support O(m) Contains and StartsWith, a clear win over sorted lists. However, naïve implementations are memory-hungry: many nodes, large per-node dictionary overhead, and vast numbers of redundant leaves and repeated suffixes in real lexicons. Viewing a trie as a deterministic finite-state automaton reveals that many nodes are equivalent; minimizing the automaton would remove them—ideally without first constructing a large, wasteful trie.
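The trie structure just described can be sketched compactly. The book's implementation is in C#; this Python rendering (class and method names are illustrative) shows the O(m) Contains and StartsWith walks, and builds output strings from a list of characters to avoid accidentally quadratic concatenation during the depth-first traversal:

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # char -> TrieNode
        self.is_end = False  # a word ends at this node

class Trie:
    def __init__(self, words=()):
        self.root = TrieNode()
        for w in words:
            self.add(w)

    def add(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_end = True

    def contains(self, word):           # O(m)
        node = self._walk(word)
        return node is not None and node.is_end

    def starts_with(self, prefix):      # O(m) to reach the prefix node
        node = self._walk(prefix)
        if node is None:
            return
        # DFS from the prefix node; pushing children in reverse-sorted
        # order makes the pop order alphabetical.
        stack = [(node, list(prefix))]
        while stack:
            n, chars = stack.pop()
            if n.is_end:
                yield "".join(chars)
            for ch in sorted(n.children, reverse=True):
                stack.append((n.children[ch], chars + [ch]))

    def _walk(self, s):
        node = self.root
        for ch in s:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

t = Trie(["car", "care", "cares", "cars",
          "fir", "fire", "firer", "firers", "firs"])
```

A dictionary per node is exactly the per-node overhead the chapter warns about: fast, but memory-hungry at scale.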

The solution is a Directed Acyclic Word Graph (DAWG), a minimal deterministic automaton that deduplicates both prefixes and suffixes. Using words in sorted order, the Daciuk–Watson–Watson algorithm incrementally maintains an almost-minimal graph by finding the common prefix with the previous word, recursively “fixing up” the previous word’s suffix with a canonical set of nodes and a structural equivalence test backed by a hash set, then appending the new suffix. The result preserves O(m) queries and achieves O(mn) construction while dramatically shrinking space in practice: for the ENABLE benchmark, nodes fall from about 388,000 in a trie to roughly 54,000 in a DAWG, with far fewer edges as well. Although object overhead still costs memory, DAWGs can be packed into tight byte arrays when needed, yielding an elegant, memory-efficient, high-performance approach to word-list search.

Figure captions:
  • The trie for a nine-word list, laid out left to right, in alphabetical order from top to bottom. Node numbers are not data in the tree; they are there only so we can refer to nodes in the discussion that follows. Double-circle nodes represent ends of words.
  • The trie for the word list “car”.
  • The trie for the word list “car”, “care”.
  • A trie with nine words: car, care, cares, cars, fir, fire, firer, firers, firs.
  • Deduplicating all the leaves of the trie produces an equivalent word graph.
  • We’ve added car, care, cares, and cars. The graph is optimized for everything except “cars”. Node 6 is identical to node 5, but node 6 is in “cars”, so we have not fixed it yet.
  • After adding “fir”, the graph is optimized for every word except “fir”. State 9 could be replaced with state 5, so this graph is not optimized for the final word.
  • Do you notice something odd about this graph after adding “fire”?
  • After adding “firer” and “firers”, the nodes for “firers” are still not optimized.
  • We look for optimizations moving backwards from the end of the previous word added, and first discover that node 12 is redundant; node 11 should have an edge to node 5 instead.
  • We’ve now fully optimized the “ers” portion of “firers” and are ready to add “firs”.
  • “firs” is added, and everything is optimized except for node 13, which is redundant with node 4.
  • This is the smallest possible DAWG that represents those nine words.

Summing Up

  • Storing n words of average length m in a list such that we can generate all the words matching a particular prefix is a common problem. The number of words can be very large.
  • Hash sets are great at determining if a given word is contained in a set, but not at all good at generating completions given a prefix.
  • We can get O(m log n) search performance by using binary search on a sorted list, which is probably acceptable, but this solution uses a surprisingly large amount of memory if we use off-the-shelf parts.
  • We can get O(m) search performance by using a trie to deduplicate prefixes, but the number of redundant nodes is potentially enormous.
  • By noticing that tries are a special case of finite state automata, we could use any one of many algorithms that minimize FSAs without changing the language they recognize. But starting with the huge, space-inefficient trie in memory and running an expensive, difficult minimization algorithm is not great; what we really want is to build an optimized DAWG one word at a time.
  • Efficiently detecting when two nodes are redundant and can be safely merged is a key sub-problem in graph minimization; if we are careful about not mutating a node after it is in the canonical set of optimized nodes, we can do so very efficiently.
  • We can make DAWGs very small in memory if we need to; this is exactly what game developers did back in the day.
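The redundancy test and the hash-set safety rule from the bullets above can be sketched briefly. In this hypothetical Python rendering (the book uses a C# IEqualityComparer<Node> and a HashSet; names like structure key and canonicalize are illustrative), a node's structural key stands in for the equality comparer, and the comments flag why mutation after insertion would be a bug:

```python
class Node:
    def __init__(self, is_end=False, edges=None):
        self.is_end = is_end
        self.edges = dict(edges or {})  # char -> Node
    def key(self):
        # Structural equivalence: same end-of-word flag, same edge labels,
        # and the same child objects (by identity, not deep equality;
        # children are assumed to be canonical already).
        return (self.is_end,
                tuple(sorted((c, id(n)) for c, n in self.edges.items())))

canonical = {}  # structural key -> canonical node

def canonicalize(node):
    # Only call this once `node` is fully optimized. It must never be
    # mutated afterwards: the dict files it under its current key, and a
    # mutation would leave it stored under a stale key, so later
    # structurally identical nodes would silently fail to merge.
    return canonical.setdefault(node.key(), node)

leaf_a = Node(is_end=True)
leaf_b = Node(is_end=True)              # structurally identical
assert canonicalize(leaf_a) is leaf_a   # first one becomes canonical
assert canonicalize(leaf_b) is leaf_a   # the duplicate is merged away
```

Because children are compared by reference, the test is O(number of edges) rather than a deep graph comparison, which is what keeps the per-node fixup cheap.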

FAQ

What problems does this chapter tackle, and what operations must a word list support?
It focuses on storing very large word lists and searching them efficiently, especially for autocomplete and word games. The two core operations are Contains(word), which tests membership, and StartsWith(prefix), which enumerates all words that begin with a prefix. The chapter compares several data structures (hash sets, sorted lists, tries, DAWGs) for time and space trade-offs and shows how to build tries and minimal DAWGs from a sorted word list.

Why are hash sets a poor choice for prefix searches?
Hash sets excel at Contains with O(m) time (m = word length) because hashing processes each character once. However, they are essentially useless for StartsWith: good hash functions deliberately scatter similar strings across buckets, so there is no efficient way to enumerate all words sharing a prefix.

How does a sorted list implementation work, and what are its time costs?
With a sorted List<string>, Contains uses binary search in O(m log n). StartsWith binary-searches to the first candidate in O(m log n), then scans forward, checking the prefix of each match in O(m) per result. Overall: O(m log n) to find the start plus O(m × number of matches). Memory with off-the-shelf strings can be several megabytes; a custom UTF-8 byte array plus offsets can cut that substantially.

What is a trie, and how does it improve search performance?
A trie deduplicates common prefixes by representing words as paths from a root through character-labeled edges. Contains and StartsWith both run in O(m) to reach the node for the prefix; emitting each result costs O(m) to construct the string. Building a trie over n words of average length m takes O(mn). One implementation tip: use a StringBuilder during the depth-first traversal to avoid accidentally quadratic string concatenation.

What does a trie node store, and why hide the edge dictionary?
Each node tracks a boolean IsEnd (a word ends here), the outgoing edge labels, and the child nodes. Expose operations like HasEdge, FollowEdge, TryFollowEdge, and TryAddNewNode instead of publishing the Dictionary<char, Node> directly. This keeps the API self-descriptive (nodes and edges, not dictionaries), preserves the flexibility to change implementations, and keeps all common operations O(1).

Do tries actually save memory on large word lists?
They save character storage by deduplicating prefixes, but they have heavy per-node overhead. For ENABLE (172,820 words; 1,570,508 characters), the trie had 387,881 nodes, including 111,507 identical leaf nodes. The object overhead (dictionaries, headers, references) can erase much of the character savings; search is faster, but memory usage can still be high.

What is a DAWG (a.k.a. DAFSA), and why is it more compact than a trie?
A DAWG is a Directed Acyclic Word Graph that deduplicates suffixes as well as prefixes, producing a minimal deterministic automaton for the word set. On ENABLE, the DAWG had 54,167 nodes versus 387,881 in the trie (about 86% fewer) and 122,975 edges versus 387,880. Many English words share suffixes (-s, -ed, -ing), so suffix deduplication yields huge wins. Conceptually, both tries and DAWGs are finite state automata; the DAWG is the minimized FSA for the same language.

How is a minimal DAWG built from a sorted word list?
The algorithm adds words in ordinal (character-by-character) sorted order while maintaining this invariant: the graph is minimal for all words except the last one added. For each new word: find the common prefix with the previous word, optimize the previous word’s suffix right to left (merging equivalent nodes via a canonical set), then append the new suffix. After the final word, fix up the last suffix. Time remains O(mn). Important: use ordinal sorting (beware cases like German “ß” versus “ss”).

How are node equivalence and hash-set safety handled during minimization?
Two nodes are equivalent iff (1) both are end-of-word or both are not; (2) they have the same number of outgoing edges; (3) they have the same set of edge labels; and (4) corresponding edges point to the same child nodes by reference. Use an IEqualityComparer<Node> that compares these and computes a hash from IsEnd plus the ordered edge labels and child references. The rule: never mutate objects once they are in a hash-based set; only add nodes to the canonical set after they are fully optimized, and mutate edges (ReplaceEdge) before a node becomes canonical.

How do DAWGs perform in practice, and how can they be packed into very little memory?
Search with a DAWG is the same as with a trie: O(m) to reach the prefix node, O(m) per emitted word. Building is O(mn), like a trie, plus cheap O(1) per-node fixups. A dictionary-backed DAWG uses fewer bytes than a naive string list but more than a custom flat string store. For extreme constraints, pack the nodes and edges into a single byte array (for example, a 4-byte header with an end flag and alphabet bitmask, followed by 3-byte child offsets). ENABLE’s DAWG fits in roughly 585,593 bytes with such a layout, small enough for 640KB-era machines.
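The byte-array packing described above can be sketched as well. The book's exact layout may differ in detail, so treat this Python rendering (pack and packed_contains are hypothetical names, and it assumes lowercase a–z words) as one plausible realization of a 4-byte header carrying a 26-bit label bitmask plus an end-of-word flag, followed by 3-byte little-endian child offsets in alphabetical order:

```python
class Node:
    def __init__(self, is_end=False, edges=None):
        self.is_end = is_end
        self.edges = dict(edges or {})  # char ('a'..'z') -> Node

def pack(root):
    """Pack a DAWG into a flat byte array. Per node: a 4-byte header
    (low 26 bits = bitmask of outgoing labels, bit 26 = end-of-word),
    then one 3-byte offset per set bit, in alphabetical order."""
    order, seen = [], set()
    def visit(n):
        if id(n) in seen:
            return
        seen.add(id(n))
        order.append(n)
        for c in sorted(n.edges):
            visit(n.edges[c])
    visit(root)
    # First pass: byte offset of each node in the final array.
    offsets, pos = {}, 0
    for n in order:
        offsets[id(n)] = pos
        pos += 4 + 3 * len(n.edges)
    out = bytearray()
    for n in order:
        mask = 0
        for c in n.edges:
            mask |= 1 << (ord(c) - ord("a"))
        header = mask | ((1 << 26) if n.is_end else 0)
        out += header.to_bytes(4, "little")
        for c in sorted(n.edges):
            out += offsets[id(n.edges[c])].to_bytes(3, "little")
    return bytes(out)

def packed_contains(data, word):
    pos = 0
    for ch in word:
        header = int.from_bytes(data[pos:pos + 4], "little")
        mask = header & ((1 << 26) - 1)
        bit = ord(ch) - ord("a")
        if not (mask >> bit) & 1:
            return False
        # Rank of this label: how many set bits precede it in the mask.
        rank = bin(mask & ((1 << bit) - 1)).count("1")
        epos = pos + 4 + 3 * rank
        pos = int.from_bytes(data[epos:epos + 3], "little")
    return bool((int.from_bytes(data[pos:pos + 4], "little") >> 26) & 1)

# Tiny hand-built DAWG for {"car", "care"}.
n_e = Node(is_end=True)
n_r = Node(is_end=True, edges={"e": n_e})
n_a = Node(edges={"r": n_r})
n_c = Node(edges={"a": n_a})
root = Node(edges={"c": n_c})
data = pack(root)
```

The rank computation is the key trick: because edge offsets are stored in label order, counting the set bits below the wanted label indexes directly into the right 3-byte slot with no per-node table.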
