Part 1: Our First LLM Agent

1 What are LLM Agents and Multi-Agent Systems?

1.1 Where LLM Agents and Multi-Agent Systems are useful

1.1.1 Report generation

1.1.2 Web search and deep search

1.1.3 Agentic RAG

1.1.4 Coding LLM agents

1.1.5 Computer use

1.1.6 Enhancing applications with MAS

1.2 What is an LLM agent?

1.2.1 Prerequisite LLM capabilities

1.3 The processing loop

1.4 Important enhancements and patterns

1.5 Protocols for LLM agents

1.6 Multi-Agent Systems

1.7 Our LLM agent framework

1.8 How to use this book

1.8.1 Code

1.8.2 The basics of UML diagrams

1.9 Our roadmap

1.10 Summary

2 Working with Tools

2.1 BaseTool: a blueprint for tools

2.1.1 Implementing ToolCall and ToolCallResult

2.1.2 Implementing BaseTool

2.1.3 The AsyncBaseTool

2.2 SimpleFunctionTool: a subclass of BaseTool

2.2.1 Implementing SimpleFunctionTool

2.2.2 The AsyncSimpleFunctionTool

2.3 PydanticFunctionTool: another subclass of BaseTool

2.4 Summary

3 Working with LLMs

3.1 BaseLLM: a blueprint for LLMs

3.1.1 Implementing CompleteResult, ChatMessage, and ChatRole

3.1.2 Implementing BaseLLM

3.2 OllamaLLM: a subclass of BaseLLM

3.2.1 Implementing OllamaLLM

3.2.2 Hailstone tool call with OllamaLLM

3.3 Summary

4 The LLM Agent Class

4.1 LLMAgent: a simple LLM agent class

4.1.1 Implementing Task and TaskResult

4.1.2 Implementing LLMAgent: Part 1

4.2 Implementing a processing loop for LLMAgent

4.2.1 Designing a processing loop

4.2.2 Implementing TaskStep, TaskStepResult, and NextStepDecision

4.2.3 Implementing LLMAgent: Part 2

4.3 Putting it all together: Hailstone LLM agent

4.4 Summary

5 Capstone: Our first LLM agent

Overview

1 What are LLM Agents and Multi-Agent Systems?

Large language models can propose plans in natural language but cannot execute them on their own. This chapter introduces LLM agents—systems that translate an LLM’s intentions into concrete actions by orchestrating tools—and explains why this extra layer is necessary. It also introduces multi-agent systems (MAS), where multiple specialized LLM agents collaborate on different parts of a task, and sets expectations for a hands-on journey toward understanding, building, and evaluating these systems from the ground up.

The chapter explains how agents work by leveraging two prerequisite LLM capabilities: planning and tool-calling. Within an iterative processing loop, an agent formulates and adapts plans, issues structured tool calls, ingests results, and progresses through sub-steps until completion or a stopping condition. Enhancements that improve reliability and performance include memory modules (to reuse past results and trajectories) and human-in-the-loop checkpoints (to review or correct critical steps). The chapter also highlights emerging standards: the Model Context Protocol (MCP) for integrating third‑party tools and resources, and the Agent2Agent (A2A) protocol for inter-agent collaboration, enabling heterogeneous agents to coordinate effectively.

Practical applications span report generation, web and deep research, agentic RAG over private knowledge, coding support, and computer-use automation—often benefiting from MAS when complex tasks can be decomposed into focused subtasks. Finally, the chapter outlines the book’s roadmap for building a complete framework from scratch: start with core interfaces for tools and LLMs and an agent with a processing loop; add MCP compatibility; incorporate memory and human-in-the-loop patterns; and conclude with A2A-based multi-agent coordination and capstone projects that demonstrate end-to-end agentic systems in realistic workflows.

The applications for LLM agents are many, including agentic RAG, report generation, deep search and computer use, all of which can benefit from MAS.

An LLM agent consists of a backbone LLM and its equipped tools.

LLM agents utilize the planning capability of backbone LLMs to formulate initial plans for tasks, as well as to adapt current plans based on the results of past steps or actions taken towards task completion.

An illustration of the tool-equipping process, in which a textual description of the tool (its name, description, and parameters) is provided to the LLM agent.

The tool-calling process, where any equipped tool can be used.

A mental model of an LLM agent performing a task through its processing loop, where tool calling and planning are used repeatedly. The task is executed through a series of sub-steps, a typical approach for performing tasks.

An LLM agent that has access to memory modules where it can store key information of task executions and load this back into its context for future tasks.

A mental model of the LLM agent processing loop that has memory modules for saving and loading important information obtained during task execution.

An LLM agent processing loop with access to human operators. The processing loop is effectively paused each time a human operator is required to provide input.

Multiple LLM agents collaborating to complete an overarching task. The outcomes of each LLM agent’s processing loop are combined to form the overall task result.

A first look at the llm-agents-from-scratch framework that we’ll build together.

A simple UML class diagram that shows two classes from the llm-agents-from-scratch framework. The BaseTool class lives in the base module, while the ToolCallResult lives in the data_structures module. The attributes and methods of both classes are indicated in their respective class diagrams and the relation between them is also described.

A UML sequence diagram that illustrates the flow of a tool call. First, an LLM agent prepares a ToolCall object and invokes the BaseTool, which initiates the processing of the tool call. Once completed, the BaseTool class constructs a ToolCallResult, which is then sent back to the LLM agent.

The build plan for our llm-agents-from-scratch framework. We will build this framework in four stages. In the first stage, we’ll implement the interfaces for tools and LLMs, as well as our LLM agent class. In the second stage, we’ll make our LLM agent MCP compatible so that MCP tools can be equipped to the backbone LLM. In stage three, we will implement the human-in-the-loop pattern and add memory modules to our LLM agent. And, in the fourth and final stage, we’ll incorporate A2A and other multi-agent coordination logic into our framework to enable building MAS.

Summary

LLMs have become very powerful text generators that have been applied successfully to tasks like text summarization, question-answering, and text classification, but they have a critical limitation: they cannot act; they can only express an intent to act (such as making a tool call) through text. That’s where LLM agents come in: they supply the ability to carry out the intended actions.
Applications for LLM agents are many, such as report generation, deep research, computer use and coding.
With MAS, individual LLM agents collaborate to collectively perform tasks.
Many applications for LLM agents can further benefit from MAS. In principle, MAS excel when complex tasks can be decomposed into smaller subtasks, where specialized LLM agents outperform general-purpose LLM agents.
LLM agents are systems composed of an LLM and tools that can act autonomously to perform tasks.
LLM agents use a processing loop to execute tasks. Tool calling and planning capabilities are key components of that processing loop.
Protocols like MCP and A2A have helped to create a vibrant LLM agent ecosystem that is powering the growth of LLM agents and their applications. MCP is a protocol developed by Anthropic that has paved the way for LLM agents to use third-party provided tools.
A2A is a protocol developed by Google to standardize how agent-to-agent interactions are conducted in MAS.
Building an LLM agent requires infrastructure elements like interfaces for LLMs, tools, and tasks.
We’ll build LLM agents, MAS, and all the required infrastructure from scratch into a Python framework called llm-agents-from-scratch.

FAQ

What is an LLM agent, and why aren’t raw LLMs considered agents?

An LLM agent is an autonomous system built around a backbone LLM plus tools. While LLMs can plan and express intent in text, they cannot act. An LLM agent orchestrates tool calls and executes plans generated by the LLM to perform tasks on a user’s behalf.

Which backbone LLM capabilities are required for an effective agent?

Two core capabilities: (1) Planning—formulating and adapting a plan across steps, and (2) Tool-calling—emitting well-formed requests to use tools with parameters. Reasoning LLMs often do better at planning and can be strong backbones.

How does tool-calling work in practice?

You equip the LLM with tool descriptions (name, purpose, parameters). The LLM then generates a structured tool-call request selecting a tool and filling parameters. The application executes the tool, returns results to the LLM, and the LLM synthesizes those results into the next action or answer.
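The equip, request, execute, and return steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the framework's actual API: `ToolSpec`, `describe_tools`, and `execute_tool_call` are hypothetical names, the JSON shape of the tool-call request is an assumption, and the LLM's output is replaced by a hard-coded string.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class ToolSpec:
    """Hypothetical container for what the LLM is told about a tool."""
    name: str
    description: str
    parameters: dict[str, str]  # parameter name -> type, as plain strings
    func: Callable[..., Any]


def next_hailstone(x: int) -> int:
    """One step of the hailstone (Collatz) sequence; an illustrative tool."""
    return x // 2 if x % 2 == 0 else 3 * x + 1


TOOLS = {
    "next_hailstone": ToolSpec(
        name="next_hailstone",
        description="Compute the next number in the hailstone sequence.",
        parameters={"x": "integer"},
        func=next_hailstone,
    )
}


def describe_tools(tools: dict[str, ToolSpec]) -> str:
    """Render the text description that would be placed in the LLM's context."""
    return json.dumps(
        [
            {"name": t.name, "description": t.description, "parameters": t.parameters}
            for t in tools.values()
        ],
        indent=2,
    )


def execute_tool_call(raw_request: str, tools: dict[str, ToolSpec]) -> str:
    """Parse a structured tool-call request and run the selected tool."""
    request = json.loads(raw_request)
    tool = tools.get(request["name"])
    if tool is None:
        # Report the problem as text so the LLM could react to it.
        return json.dumps({"error": f"unknown tool: {request['name']}"})
    result = tool.func(**request["arguments"])
    return json.dumps({"name": tool.name, "result": result})


# Stand-in for what a backbone LLM might emit after reading the descriptions.
llm_output = '{"name": "next_hailstone", "arguments": {"x": 7}}'
print(execute_tool_call(llm_output, TOOLS))
```

The point of the sketch is the division of labor: the LLM only ever produces text (the request), while the surrounding application owns the actual function call and feeds the result back as text.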

What is the agent’s processing loop?

A repeated cycle that drives task completion:
- Initialize with the user’s request or prior progress
- Plan the next step
- Optionally call one or more tools
- Synthesize results and update the plan
- Stop on success or a predefined condition (e.g., max steps)

Agents can log a “trajectory” of plans, tool calls, and results for debugging and improvement.
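To make the cycle concrete, here is a rough Python sketch of such a loop. It assumes a scripted planner function in place of a real backbone LLM, and `Decision`, `scripted_planner`, and `run_agent` are hypothetical names chosen for illustration; the book's own classes may look quite different.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class Decision:
    """Hypothetical planner output: either a tool call or a final answer."""
    tool_name: Optional[str] = None
    tool_args: dict[str, Any] = field(default_factory=dict)
    final_answer: Optional[str] = None


def next_hailstone(x: int) -> int:
    """Illustrative tool: one step of the hailstone sequence."""
    return x // 2 if x % 2 == 0 else 3 * x + 1


def scripted_planner(task: str, trajectory: list[dict[str, Any]]) -> Decision:
    """Stand-in for the backbone LLM: call the tool until the value is 1."""
    current = trajectory[-1]["result"] if trajectory else int(task)
    if current == 1:
        return Decision(final_answer=f"Reached 1 after {len(trajectory)} tool calls.")
    return Decision(tool_name="next_hailstone", tool_args={"x": current})


def run_agent(
    task: str,
    planner: Callable[[str, list[dict[str, Any]]], Decision],
    tools: dict[str, Callable[..., Any]],
    max_steps: int = 50,
) -> tuple[str, list[dict[str, Any]]]:
    """Plan, act, record, and repeat until success or the step limit."""
    trajectory: list[dict[str, Any]] = []
    for _ in range(max_steps):
        decision = planner(task, trajectory)        # plan the next step
        if decision.final_answer is not None:       # stop on success
            return decision.final_answer, trajectory
        result = tools[decision.tool_name](**decision.tool_args)  # call a tool
        trajectory.append(                          # log the step
            {"tool": decision.tool_name, "args": decision.tool_args, "result": result}
        )
    return "Stopped: step limit reached.", trajectory  # predefined stop condition


answer, steps = run_agent("6", scripted_planner, {"next_hailstone": next_hailstone})
print(answer)
print([s["result"] for s in steps])
```

The `trajectory` list doubles as the planner's view of prior progress and as the debugging log mentioned above; in a real agent, the planner would be an LLM call that receives the trajectory as context.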

Where are LLM agents and MAS useful today?

Common applications include:
- Report generation (with monitoring for hallucinations)
- Web search and deep research (multi-step browse, synthesize, report)
- Agentic RAG (retrieving internal knowledge to answer queries)
- Coding agents (including sandboxed interpreters)
- Computer use/RPA-like automation (operating apps and OS to complete tasks)

When should I use a Multi-Agent System (MAS) instead of a single agent?

Use MAS when a complex task decomposes into specialized subtasks where focused agents outperform a generalist (e.g., separate agents for retrieval vs. synthesis, or frontend vs. backend coding). MAS can coordinate across different frameworks via A2A, but they add coordination complexity and have their own failure modes.
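The retrieval-versus-synthesis split can be pictured with a toy coordinator. In this sketch the two "agents" are plain Python functions and the corpus is a hard-coded dictionary; all names are hypothetical, and in an actual MAS each specialist would be an LLM agent running its own processing loop.

```python
# Toy knowledge base, assumed purely for illustration.
CORPUS = {
    "mcp": "MCP standardizes access to third-party tools.",
    "a2a": "A2A standardizes agent-to-agent communication.",
}


def retrieval_agent(subtask: str) -> str:
    """Specialist 1: return corpus entries whose key appears in the subtask."""
    hits = [text for key, text in CORPUS.items() if key in subtask.lower()]
    return " ".join(hits) if hits else "No documents found."


def synthesis_agent(notes: str) -> str:
    """Specialist 2: turn retrieved notes into a short answer."""
    return f"Summary: {notes}"


def coordinator(task: str) -> str:
    """Decompose the task, hand each part to a specialist, combine the outcomes."""
    notes = retrieval_agent(task)
    return synthesis_agent(notes)


print(coordinator("Compare MCP and A2A"))
```

Even in this tiny form, the coordinator is a new component with its own failure modes: if retrieval returns nothing useful, synthesis has nothing to work with.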

What enhancements and patterns improve agent effectiveness?

Two key additions:
- Memory modules: store and later load useful artifacts (e.g., past results, trajectories) to save time and improve consistency.
- Human-in-the-loop (HITL): pause for human review/approval at critical steps to prevent cascading errors; trades off speed for accuracy and safety.
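Both patterns can be wrapped around a task executor, as in the sketch below. It assumes the simplest possible versions: memory is a dictionary keyed by the task text, and the human checkpoint is a callback that returns `True` or `False`. `SimpleMemory` and `run_with_enhancements` are hypothetical names, not part of the book's framework.

```python
from typing import Callable, Optional


class SimpleMemory:
    """Hypothetical memory module: maps a task description to a stored result."""

    def __init__(self) -> None:
        self._store: dict[str, str] = {}

    def load(self, task: str) -> Optional[str]:
        return self._store.get(task)

    def save(self, task: str, result: str) -> None:
        self._store[task] = result


def run_with_enhancements(
    task: str,
    execute: Callable[[str], str],
    memory: SimpleMemory,
    approve: Callable[[str], bool],
) -> str:
    """Check memory first, then pause for human approval before executing."""
    cached = memory.load(task)
    if cached is not None:
        return cached                      # reuse a past result
    if not approve(task):                  # human-in-the-loop checkpoint
        return "Rejected by human operator."
    result = execute(task)
    memory.save(task, result)              # store for future tasks
    return result


calls: list[str] = []


def execute(task: str) -> str:
    """Stand-in for a full agent run; records that it was invoked."""
    calls.append(task)
    return task.upper()


memory = SimpleMemory()
auto_approve = lambda proposed: True       # a real checkpoint might prompt via input()

print(run_with_enhancements("summarize q3 report", execute, memory, auto_approve))
print(run_with_enhancements("summarize q3 report", execute, memory, auto_approve))
print(len(calls))
```

The second call is served from memory, so the executor runs only once. The approval callback is where the speed-for-safety trade-off shows up, since every checkpoint blocks until a person responds.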

What are MCP and A2A, and why do they matter?

MCP (Model Context Protocol) standardizes how agents access third-party tools and other resources, enabling a rich ecosystem of ready-to-use capabilities. A2A (Agent2Agent) standardizes agent-to-agent communication, allowing agents built on different frameworks to collaborate within MAS.

How do LLM agents differ from LAM and RL agents?

LLM agents repurpose pre-trained LLMs to plan and call tools for general tasks. LAM agents use Large Action Models specialized to predict action sequences in specific domains (e.g., GUI control). RL agents are trained to maximize rewards in an environment via learned policies; by contrast, LLM agents aren’t trained as policies over environment states.

What will I build in this book’s framework, and what’s the roadmap?

You’ll build an agent framework (llm-agents-from-scratch) with modules for base LLM/tool interfaces, concrete tools and LLMs (e.g., an Ollama-backed LLM), the LLMAgent and processing loop, data structures (tasks, tool calls, results), and utilities.

Roadmap:
1) Core agent, tools, and LLM interfaces
2) MCP compatibility (plus an MCP server you build)
3) Memory and human-in-the-loop
4) A2A and multi-agent coordination to build full MAS

