Build an AI Agent (From Scratch) you own this product

Agents that reason, plan, and act autonomously

Jungjun Hur and Younghee Song

MEAP began October 2025
Last updated January 2026
Publication in Summer 2026 (estimated)

ISBN 9781633434615
375 pages (estimated)

Included with a Manning Online subscription

printed in black & white

resources: Source code Book forum Source code on GitHub

table of content

PART 1: BUILDING YOUR FIRST LLM AGENT

1 What is an AI agent?

1.1 The age of AI agents

1.2 Understanding LLM agents

1.2.1 What is an LLM?

1.2.2 What is an LLM Agent?

1.3 Workflow vs agent

1.3.1 Workflow: Developer-defined flow

1.3.2 Agent: LLM-directed flow

1.3.3 Combining workflows and agents in practice

1.4 Tasks that require agents

1.4.1 Tasks that require an LLM

1.4.2 Conditions for using agents

1.4.3 GAIA: An agent gym

1.5 Context engineering

1.5.1 Why agents fail

1.5.2 From prompt engineering to context engineering

1.5.3 Bigger context is not always better

1.5.4 Five context engineering strategies

1.5.5 The journey of this book

1.6 Prerequisites for reading this book

1.7 Summary

2 The brain of AI agents: LLMs

2.1 Choosing LLMs for agents

2.1.1 Starting with closed LLMs

2.1.2 Expanding to open LLMs

2.1.3 Essential LLM capabilities for agents

2.2 LLM API basics for building agents

2.2.1 Setting up the development environment

2.2.2 Getting started with the OpenAI API

2.2.3 Unifying providers with LiteLLM

2.2.4 Conversation management: Handling stateless APIs

2.2.5 Structured output

2.2.6 Asynchronous calls

2.3 Enhancing agent intelligence: Prompt engineering

2.3.1 The role of system prompts

2.3.2 Guidelines for agent prompts

2.4 Experiencing LLM limitations with GAIA

2.4.1 Why GAIA? Setting goals for agent development

2.4.2 Experiment setup

2.4.3 Results and analysis: Why LLMs need tools

2.5 Summary

3 Enabling actions: Tool use

3.1 LLM tools

3.1.1 Why do we need LLM tools?

3.1.2 Types of LLM tools

3.2 How LLMs use tools

3.2.1 How tool calling works

3.2.2 How can an LLM choose tools

3.2.3 Guidelines for effective tool calling

3.3 Building tools and tool definitions for LLMs

3.3.1 Implementing a web search tool

3.3.2 Converting to tool definitions

3.3.3 End-to-end tool execution

3.3.4 The challenges of custom tools

3.4 MCP: Standardizing tools

3.4.1 The core of MCP: server–client architecture

3.4.2 Hands-on: Running an MCP server

3.4.3 Understanding the MCP client

3.4.4 Hands-on: Implementing an MCP server

3.5 Summary

4 Implementing a basic ReAct agent

4.1 How ReAct agents work

4.1.1 The think-act cycle

4.1.2 From text parsing to tool calling

4.2 Agent architecture overview

4.2.1 The completed agent

4.2.2 Information flow: The core design

4.2.3 Components we need to build

4.3 ExecutionContext: The agent’s central storage

4.3.1 What happens during agent execution?

4.3.2 Implementing ExecutionContext

4.4 Tool abstraction

4.4.1 Why we need a unified tool interface

4.4.2 BaseTool: The foundation

4.4.3 FunctionTool: Wrapping functions

4.4.4 Integrating MCP Tools

4.5 LLM Communication layer

4.5.1 Why a communication layer?

4.5.2 LlmRequest: Selecting what to send

4.5.3 LlmResponse: Standardizing what we receive

4.5.4 LlmClient: The provider adapter

4.5.5 Putting it together

4.6 Implementing the agent

4.6.1 Agent class structure

4.6.2 The run() Method

4.6.3 The step() method

4.6.4 The think() and act() methods

4.7 Adding structured output

4.7.1 The approach: Tools as output formatters

4.7.2 Modifying the agent

4.7.3 Using structured output in practice

4.8 Testing with the GAIA benchmark

4.8.1 From LLM to agent

4.8.2 Results

4.9 Summary

PART 2: DEVELOPING ADVANCED AGENT CAPABILITIES

5 Building knowledge bases with RAG

5.1 The problem of using internal data

5.1.1 The simple case: Single file

5.1.2 What if there are multiple files?

5.1.3 What if the data is large or extensive?

5.2 Types of search methods

5.2.1 Keyword search

5.2.2 Vector search

5.2.3 Graph search

5.2.4 Structure-based search

5.3 Practicing vector search

5.3.1 Embedding: Converting text to vectors

5.3.2 Chunking: Dividing long text into search units

5.3.3 Implementing vector search

5.3.4 Exercise: Finding relevant information from web search results

5.3.5 Structure-based search

5.3.6 Preparing the GAIA dataset

5.3.7 Implementing file system tools

5.3.8 Connecting tools to the agent

5.3.9 Solving GAIA zip file problems

5.4 Extending agents with callbacks

5.4.1 The need for agent extension

5.4.2 Implementing tool callbacks

5.4.3 Human in the loop: Tool execution approval

5.4.4 Compressing search results

5.5 Summary

6 Adding memory to your agent

6.1 The anatomy of agent memory

6.1.1 Limitations of the current memory architecture

6.1.2 Context engineering and memory

6.2 Managing context during execution

6.2.1 Separating storage from presentation

6.2.2 Sliding window strategy

6.2.3 Token counting

6.2.4 Compaction strategy

6.2.5 Summarization strategy

6.2.6 Hierarchical context management

6.3 Continuous execution: Session and state management

6.3.1 The session class

6.3.2 Managing sessions with SessionManager

6.3.3 Integrating sessions into the agent

6.3.4 Basic example: Multi-turn conversation

6.3.5 Data structures for tool confirmation

6.3.6 Extending tools for confirmation

6.3.7 Implementing pause and resume in the agent

6.3.8 Complete example: Human-in-the-loop workflow

6.4 Long-term memory: Accumulating knowledge across sessions

6.4.1 The structure of long-term memory

6.4.2 Information extraction: Structured output

6.4.3 Building a vector store with ChromaDB

6.4.4 Implementing TaskMemoryManager

6.4.5 Retrieving memories

6.5 Summary

7 Planning and reflection for complex tasks

7.1 Giving agents time to think

7.1.1 The limitations of ReAct

7.1.2 How human experts work

7.1.3 Why time to think matters

7.2 Planning: Setting direction

7.2.1 When is planning necessary?

7.2.2 Implementing the planning tool

7.2.3 Planning tool usage example

7.2.4 Extension directions

7.3 Reflection: Checking and correcting

7.3.1 When is reflection necessary?

7.3.2 Implementing the reflection tool

7.3.3 The real value of reflection: Failure recovery

7.3.4 Running an agent that uses reflection for research synthesis

7.4 Integrating planning and reflection

7.4.1 Failure modes and solutions

7.5 Summary

8 Creating code agents that write their own tools

9 Coordinating multi-agent systems

PART 3: DEPLOYING AGENTS IN PRODUCTION

10 Evaluating performance and monitoring agent behavior

Overview

1 What is an AI agent?

This chapter introduces the modern landscape of AI agents and sets the philosophy of the book: understand and build agents from first principles before relying on frameworks. It surveys how agents show up in practice—from personal assistants and customer-facing systems to specialized coding and research tools—and argues that Large Language Models (LLMs) power nearly all of them. The authors position agent building as an exercise in debugging: to fix failures, you must know how the parts work. They also preview key themes that guide the rest of the book: LLMs as the agent’s “brain,” the distinction between workflows and agents, the GAIA benchmark for measuring progress, and the centrality of context engineering.

The core definition of an agent is LLM + tools + loop: the model decides what to do next, invokes external tools (search, code execution, databases), ingests results back into its context, and iterates until it chooses to stop. This autonomy distinguishes agents from plain LLM calls and from traditional, developer-defined workflows. The chapter maps a spectrum from predictable workflows (single calls, chains, routers) to agentic systems that direct their own multi-step processes and can even write new tools. It offers practical guidance on when agents are warranted—tasks with unstructured inputs, high input diversity, and uncertain step counts—while underscoring trade-offs: higher cost, latency, and error propagation. In production, hybrid designs often work best, embedding agents inside workflow stages for controlled flexibility, cost management, and safer failure handling.

To evaluate agent capabilities, the chapter adopts GAIA, a benchmark of multi-step, real-world questions that demand reasoning, retrieval, and calculation—ideal for iterating on agent designs and quantifying improvements. It then broadens prompt engineering into context engineering: the discipline of curating everything the model sees—system instructions, conversation state, tool outputs, and retrieved knowledge—at the right time and granularity. Most real failures come from missing information rather than insufficient model intelligence, and larger contexts can degrade performance, so relevance and focus matter. The chapter outlines five strategies—Generation, Retrieval, Write, Reduce, and Isolate—that will be layered through the book’s implementation roadmap, alongside practical prerequisites (Python, environment setup, API keys, and cost awareness) to equip readers to build, measure, and iteratively improve agents from scratch.

Example of a language model’s generalization capability.

User requests flow through the research agent, which branches into multiple searches and synthesis.

The LLM Agent's decision loop is an iterative process of LLM decision-making and tool use.

Progression of agency levels in LLM applications.

LLMs can only produce accurate, high-quality responses when sufficient information is provided in the context.

Even with large context windows, longer inputs can degrade model performance(Source: https://research.trychroma.com/context-rot).

An overview of the journey through the book

Summary

AI agents span a wide spectrum, from personal assistants like ChatGPT and Claude to customer-facing agents and specialized tools like Claude Code and Cursor. All share a common foundation: LLMs as their decision-making core.
An LLM agent consists of three elements: the LLM (brain), tools (means of interacting with the external world), and a loop (iterative process until goal completion). The LLM decides which tool to use and when to stop.
Workflows are developer-defined execution flows where LLMs perform specific steps. Agents are LLM-directed flows where the model dynamically determines its own process. Production systems often combine both approaches.
Use agents when tasks require multiple unpredictable steps, provide sufficient value to justify costs, and allow for error detection. The GAIA benchmark provides ideal practice problems for agent development.
Context engineering is the discipline of providing the right information at the right time. Five strategies (Generation, Retrieval, Write, Reduce, Isolate) form the framework for building effective agents throughout this book.

FAQ

What is an AI (LLM) agent?

An LLM agent is a program that uses a Large Language Model as its decision-making core, interacts with the external world through tools, and operates in a loop until a goal is achieved. In short: LLM + tools + loop. The model decides which action to take next and when to stop based on the current context.

How do LLMs enable agent behavior if they only predict the next token?

LLMs use their generalization and reasoning abilities to choose actions, not just produce text. When paired with tools (search, code execution, APIs) and an iterative loop (reason, act, observe), the model can plan multi-step tasks, gather missing information, and decide when the task is complete.

What types of AI agents are common today?

Three broad types: 1) Personal agents (general-purpose assistants that adapt to many tasks). 2) Customer-facing agents (business-aligned assistants that follow policies and handle transactions). 3) Specialized agents (domain tools like coding or deep research agents that often run asynchronously).

How does the agent loop work?

The loop repeats: 1) The LLM evaluates the context and decides if a tool is needed. 2) A selected tool is executed. 3) Tool results are added back into the context. 4) The LLM decides to continue or stop. This supports tasks with unpredictable numbers of steps.

How are agents different from simple LLM calls and traditional workflows?

Workflows are developer-defined and predictable (single call, chains, routers). Agents are LLM-directed: they choose actions and tools dynamically and iterate until done. Use workflows for structure and reliability; use agents where flexibility and open-ended problem solving are required.

How do I decide if a task needs an LLM at all?

Prefer an LLM when: 1) The task involves unstructured data (text, images, audio) needing flexible interpretation. 2) Inputs and requests are diverse and hard to predefine. If the task is deterministic over structured data, traditional code or a narrow model is cheaper and more reliable.

When should I use an agent instead of a single call or workflow?

Consider: 1) Task complexity (unknown number of steps or paths). 2) Task value (benefit outweighs extra cost/latency of multi-call loops). 3) Error cost and detectability (can you catch or tolerate mistakes?). Agents trade higher cost/latency for flexibility.

What is the GAIA benchmark and why use it?

GAIA is a dataset of questions that require multi-step reasoning, web search, and calculations—ideal for agent evaluation. It offers clear answers for fast feedback, the right difficulty for iterative development, and minimal domain knowledge requirements.

What is context engineering, and why do agents fail without it?

Context engineering is the practice of providing the right information, at the right time, in the right form to the LLM (prompts, history, tool results, retrieved docs, etc.). Agents often fail not from lack of intelligence but from missing information in context. Good context design improves accuracy and reliability.

Is a bigger context always better? What strategies help?

No. Longer contexts can degrade performance (context rot, lost-in-the-middle). Focus on relevance using five strategies: 1) Generation (plans, reflections). 2) Retrieval (bring in needed info). 3) Write (persist to memory/workspace). 4) Reduce (summarize/filter). 5) Isolate (separate tasks/tools or agents).

pro $24.99 per month

access to all Manning books, MEAPs, liveVideos, liveProjects, and audiobooks!
choose one free eBook per month to keep
exclusive 50% discount on all purchases
renews monthly, pause or cancel renewal anytime

lite $19.99 per month

access to all Manning books, including MEAPs!

team

5, 10 or 20 seats+ for your team - learn more

eBook

pdf, ePub, online

$47.99 $35.99

you save $12.00 (25%)

pro $24.99 per month

access to all Manning books, MEAPs, liveVideos, liveProjects, and audiobooks!
choose one free eBook per month to keep
exclusive 50% discount on all purchases
renews monthly, pause or cancel renewal anytime

lite $19.99 per month

access to all Manning books, including MEAPs!

team

5, 10 or 20 seats+ for your team - learn more

eBook

$47.99 $35.99

you save $12.00 (25%)

eBook

pdf, ePub, online

$47.99 $35.99

you save $12.00 (25%)

pro $24.99 per month

access to all Manning books, MEAPs, liveVideos, liveProjects, and audiobooks!
choose one free eBook per month to keep
exclusive 50% discount on all purchases
renews monthly, pause or cancel renewal anytime

lite $19.99 per month

access to all Manning books, including MEAPs!

team

5, 10 or 20 seats+ for your team - learn more