Overview

1 What are LLM Agents and Multi-Agent Systems?

This chapter introduces why Large Language Models (LLMs) need agents and what multi-agent systems add on top. While LLMs are excellent at expressing intent, they cannot act without surrounding orchestration. LLM agents bridge this gap by turning intentions into actions through tool-calling, enabling real-world applications such as report generation, web and deep research, retrieval-augmented generation, coding assistance, and full computer use. The chapter motivates when a single agent suffices and when multi-agent systems—collections of specialized agents collaborating—deliver better outcomes on complex, decomposable tasks.

Mechanically, an LLM agent pairs a backbone LLM with tools and runs a processing loop that repeatedly plans, calls tools, and synthesizes results until the task is done. Effective agents rely on two core LLM capabilities: planning to set and adapt next actions, and tool-calling to execute those actions with structured requests. The chapter also covers key enhancements that boost performance and reliability: memory modules to reuse past results, human-in-the-loop checkpoints to prevent cascading errors, and careful monitoring to mitigate hallucinations. It highlights emerging protocols that standardize interoperability—Model Context Protocol (MCP) for integrating third-party tools and resources, and Agent2Agent (A2A) for inter-agent communication—along with clarifications on adjacent ideas such as reasoning LLMs and Large Action Model agents.
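The processing loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the framework's actual implementation: `ScriptedLLM`, `run_processing_loop`, and the JSON reply shape (`tool`, `arguments`, `final_answer`) are hypothetical names chosen for this example, and the scripted model simply replays canned replies in place of a real backbone LLM.

```python
import json
from dataclasses import dataclass
from typing import Callable


@dataclass
class ScriptedLLM:
    """Hypothetical stand-in for a backbone LLM: replays canned replies in order."""
    responses: list[str]

    def complete(self, prompt: str) -> str:
        return self.responses.pop(0)


def run_processing_loop(llm, tools: dict[str, Callable[..., str]], task: str, max_steps: int = 5):
    """Repeatedly ask the LLM for the next action until it gives a final answer."""
    trajectory = []  # step-by-step record of tool calls and results
    context = f"Task: {task}"
    for _ in range(max_steps):
        reply = json.loads(llm.complete(context))
        if "final_answer" in reply:
            trajectory.append(("final_answer", reply["final_answer"]))
            return reply["final_answer"], trajectory
        # Otherwise the reply is a structured tool-call request.
        name, arguments = reply["tool"], reply["arguments"]
        result = tools[name](**arguments)
        trajectory.append((name, result))
        # Feed the tool result back so the next plan can adapt to it.
        context += f"\nTool {name} returned: {result}"
    raise RuntimeError("max_steps reached without a final answer")


llm = ScriptedLLM([
    '{"tool": "add", "arguments": {"a": 2, "b": 3}}',
    '{"final_answer": "2 + 3 = 5"}',
])
answer, trajectory = run_processing_loop(
    llm, {"add": lambda a, b: str(a + b)}, "Add 2 and 3"
)
```

The `max_steps` cap is one possible stopping condition; a real agent would likely also handle malformed replies and unknown tool names.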

For scenarios that benefit from specialization and coordination, multi-agent systems let focused agents own subtasks and combine their outputs into an overall result, with A2A enabling cross-framework collaboration. The chapter closes with a hands-on roadmap for building a complete framework from scratch: first defining tool and LLM interfaces and an LLMAgent with a processing loop; then adding MCP compatibility; next incorporating memory and human-in-the-loop patterns; and finally implementing multi-agent coordination. The goal is to equip readers with a deep mental model and practical implementation skills to confidently use existing frameworks or craft bespoke agentic solutions.

The applications for LLM agents are many, including agentic RAG, report generation, deep search and computer use, all of which can benefit from multi-agent systems (MAS).
An LLM agent consists of a backbone LLM and its equipped tools.
LLM agents utilize the planning capability of backbone LLMs to formulate initial plans for tasks, as well as to adapt current plans based on the results of past steps or actions taken towards task completion.
An illustration of the tool-equipping process, where a textual description of the tool that contains the tool’s name, description and its parameters is provided to the LLM agent.
The tool-calling process, where any equipped tool can be used.
A mental model of an LLM agent performing a task through its processing loop, where tool calling and planning are used repeatedly. The task is executed through a series of sub-steps, a typical approach for performing tasks.
An LLM agent that has access to memory modules where it can store key information of task executions and load this back into its context for future tasks.
A mental model of the LLM agent processing loop that has memory modules for saving and loading important information obtained during task execution.
An LLM agent processing loop with access to human operators. The processing loop is effectively paused each time a human operator is required to provide input.
Multiple LLM agents collaborating to complete an overarching task. The outcomes of each LLM agent’s processing loop are combined to form the overall task result.
A first look at the llm-agents-from-scratch framework that we’ll build together.
A simple UML class diagram that shows two classes from the llm-agents-from-scratch framework. The BaseTool class lives in the base module, while the ToolCallResult lives in the data_structures module. The attributes and methods of both classes are indicated in their respective class diagrams and the relation between them is also described.
A UML sequence diagram that illustrates the flow of a tool call. First, an LLM agent prepares a ToolCall object and invokes the BaseTool, which initiates the processing of the tool call. Once completed, the BaseTool class constructs a ToolCallResult which then gets sent back to the LLM agent.
The build plan for our llm-agents-from-scratch framework. We will build this framework in four stages. In the first stage, we’ll implement the interfaces for tools and LLMs, as well as our LLM agent class. In the second stage, we’ll make our LLM agent MCP compatible so that MCP tools can be equipped to the backbone LLM. In stage three, we will implement the human-in-the-loop pattern and add memory modules to our LLM agent. And, in the fourth and final stage, we’ll incorporate A2A and other multi-agent coordination logic into our framework to enable building MAS.
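The tool-equipping and tool-call flow described in the captions above might look roughly like the sketch below. The class names `BaseTool`, `ToolCall`, and `ToolCallResult` come from the text, but every attribute and method shown here (`tool_name`, `arguments`, `content`, `error`, `describe`) is an assumption made for illustration, as is the example `AddTool`; the framework's real definitions may differ.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any


@dataclass
class ToolCall:
    """A structured request to run one tool (field names are assumed)."""
    tool_name: str
    arguments: dict[str, Any]


@dataclass
class ToolCallResult:
    """What a tool sends back to the agent (field names are assumed)."""
    tool_call: ToolCall
    content: Any
    error: bool = False


class BaseTool(ABC):
    name: str
    description: str
    parameters: dict[str, Any]

    def describe(self) -> dict[str, Any]:
        """Textual description handed to the LLM when the tool is equipped."""
        return {
            "name": self.name,
            "description": self.description,
            "parameters": self.parameters,
        }

    @abstractmethod
    def __call__(self, tool_call: ToolCall) -> ToolCallResult:
        """Process the tool call and wrap the outcome in a ToolCallResult."""


class AddTool(BaseTool):
    """Hypothetical example tool."""
    name = "add"
    description = "Add two numbers."
    parameters = {"a": "number", "b": "number"}

    def __call__(self, tool_call: ToolCall) -> ToolCallResult:
        try:
            total = tool_call.arguments["a"] + tool_call.arguments["b"]
            return ToolCallResult(tool_call, total)
        except (KeyError, TypeError) as exc:
            # Report failures as data so the agent can adapt its plan.
            return ToolCallResult(tool_call, str(exc), error=True)


result = AddTool()(ToolCall("add", {"a": 2, "b": 3}))
```

Returning errors inside `ToolCallResult`, rather than raising, is one design choice among several; it lets the agent see the failure and plan a different next step.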

Summary

  • LLMs have become very powerful text generators that have been applied successfully to tasks like text summarization, question-answering, and text classification, but they have a critical limitation: they cannot act; they can only express an intent to act (such as making a tool call) through text. That’s where LLM agents come in: they add the ability to carry out the intended actions.
  • Applications for LLM agents are many, such as report generation, deep research, computer use and coding.
  • With MAS, individual LLM agents collaborate to collectively perform tasks.
  • Many applications for LLM agents can further benefit from MAS. In principle, MAS excel when complex tasks can be decomposed into smaller subtasks, where specialized LLM agents outperform general-purpose LLM agents.
  • LLM agents are systems composed of an LLM and tools that can act autonomously to perform tasks.
  • LLM agents use a processing loop to execute tasks. Tool calling and planning capabilities are key components of that processing loop.
  • Protocols like MCP and A2A have helped to create a vibrant LLM agent ecosystem that is powering the growth of LLM agents and their applications. MCP is a protocol developed by Anthropic that has paved the way for LLM agents to use third-party provided tools.
  • A2A is a protocol developed by Google to standardize how agent-to-agent interactions are conducted in MAS.
  • Building an LLM agent requires infrastructure elements like interfaces for LLMs, tools, and tasks.
  • We’ll build LLM agents, MAS, and all the required infrastructure from scratch into a Python framework called llm-agents-from-scratch.
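To make the decomposition idea from the summary concrete, here is a toy sketch of specialized agents owning subtasks whose outcomes are combined into one result. It assumes an "agent" can be modeled as a plain callable from a subtask string to an outcome string; `make_specialist` and `run_mas` are hypothetical helpers for this example, and no real LLM or A2A communication is involved.

```python
from typing import Callable

# Assumption: an agent maps a subtask description to an outcome string.
Agent = Callable[[str], str]


def make_specialist(role: str) -> Agent:
    """Build a toy specialist that tags its output with its role."""
    def agent(subtask: str) -> str:
        return f"[{role}] {subtask}"
    return agent


def run_mas(subtasks: dict[str, str], agents: dict[str, Agent]) -> str:
    """Route each subtask to the agent that owns that role, then combine outcomes."""
    outcomes = []
    for role, subtask in subtasks.items():
        if role not in agents:
            raise KeyError(f"no agent registered for role {role!r}")
        outcomes.append(agents[role](subtask))
    return "\n".join(outcomes)


agents = {
    "frontend": make_specialist("frontend"),
    "backend": make_specialist("backend"),
}
combined = run_mas(
    {"frontend": "build the login form", "backend": "add the /login endpoint"},
    agents,
)
```

In a real system each specialist would run its own processing loop, and the combining step would usually be more than string concatenation.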

FAQ

What is an LLM agent, and why aren’t raw LLMs sufficient?
An LLM agent is an autonomous system that wraps a backbone LLM with tools and orchestration so the model’s plans and tool-call requests are actually executed. Raw LLMs only generate text; they can describe what to do but cannot act without an external system to run tools, collect results, and feed them back for synthesis.

How do LLM agents turn intentions into actions with tool-calling?
Agents provide the LLM with descriptions of available tools (names, parameters, purpose). The LLM outputs a structured tool-call request (often JSON) selecting a tool and supplying parameters. The application executes the tool, returns results to the LLM, and the LLM uses those results to plan next steps or produce an answer.

What does the agent’s processing loop look like?
The agent executes a task as a series of sub-steps. At each step it synthesizes progress, plans the next action, may call tools, and adapts based on prior results until completion or a stopping condition. The step-by-step record (plans, tool calls, results) is the agent’s trajectory or rollout, valuable for debugging and improvement.

What backbone LLM capabilities are required?
Two essentials: planning and tool-calling. The LLM should propose sensible next actions and adapt plans as results arrive, and it must reliably produce well-formed tool-call requests when appropriate. Reasoning-tuned models often plan better, and tool usage is typically taught via supervised fine-tuning.

Which real-world applications benefit from LLM agents and MAS?
Common uses include report generation, web search and deep research, agentic RAG over internal knowledge, coding with sandboxed code interpreters, and end-to-end computer use (e.g., purchasing tickets). Multi-agent systems can enhance these by assigning specialized agents to different parts of the workflow.

When should I use a multi-agent system instead of a single agent?
Use MAS when a complex task decomposes into focused subtasks where specialized agents outperform a generalist (e.g., domain-specific summarization plus structured report writing, or front-end vs. back-end coding). MAS can improve quality and efficiency, though they add coordination complexity and potential failure modes.

How do memory and human-in-the-loop improve agents?
Memory modules store useful artifacts, such as past trajectories and tool results, and load them into context for future tasks to save time and reduce repetition. Human-in-the-loop lets people approve critical plans or validate outputs, reducing cascading errors at the cost of longer execution due to waiting for human input.

What is the Model Context Protocol (MCP), and why is it important?
MCP is a standard for connecting agents to third-party tools and resources. Its rapid ecosystem growth provides a wide catalog of MCP-compatible tools, making it much easier to equip agents with capabilities beyond what you build yourself.

What is Agent2Agent (A2A), and how does it enable MAS?
A2A is a protocol for agent-to-agent communication. It allows agents built with different frameworks to collaborate, supporting cross-framework multi-agent systems and enabling richer, interoperable workflows.

How do LLM agents differ from RL agents and LAM agents?
RL agents learn policies to maximize rewards through interactions with an environment; LLM agents repurpose pre-trained LLMs to generate plans and tool calls without learning an explicit task policy. LAM agents use a backbone Large Action Model specialized to predict action sequences in narrow domains (e.g., GUI control), whereas LLM agents are more general-purpose.
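
As a small illustration of the memory and human-in-the-loop answer above, the sketch below reuses a stored result when the same task recurs and pauses for approval before accepting a new one. It is only a sketch under assumptions: `MemoryModule`, `run_task`, and the `approve` callback are hypothetical names, memory is a plain in-process dictionary keyed by task text, and the human operator is modeled as a function that returns True or False.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class MemoryModule:
    """Hypothetical in-process memory keyed by task text."""
    store: dict[str, str] = field(default_factory=dict)

    def save(self, task: str, result: str) -> None:
        self.store[task] = result

    def load(self, task: str) -> Optional[str]:
        return self.store.get(task)


def run_task(
    task: str,
    execute: Callable[[str], str],
    memory: MemoryModule,
    approve: Callable[[str, str], bool],
) -> str:
    """Reuse a remembered result if present; otherwise execute and ask for approval."""
    remembered = memory.load(task)
    if remembered is not None:
        return remembered  # skip repeated work
    result = execute(task)
    # The loop effectively pauses here until the operator responds.
    if not approve(task, result):
        raise PermissionError(f"human operator rejected the result for: {task}")
    memory.save(task, result)  # only approved results are remembered
    return result
```

Saving only approved results is one way to keep a rejected output from being replayed on later tasks; the approval step is also where the extra waiting time mentioned above comes from.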
