1 What is an AI agent?
This chapter introduces the modern landscape of AI agents and sets the philosophy of the book: understand and build agents from first principles before relying on frameworks. It surveys how agents show up in practice—from personal assistants and customer-facing systems to specialized coding and research tools—and argues that Large Language Models (LLMs) power nearly all of them. The authors position agent building as an exercise in debugging: to fix failures, you must know how the parts work. They also preview key themes that guide the rest of the book: LLMs as the agent’s “brain,” the distinction between workflows and agents, the GAIA benchmark for measuring progress, and the centrality of context engineering.
The core definition of an agent is LLM + tools + loop: the model decides what to do next, invokes external tools (search, code execution, databases), ingests results back into its context, and iterates until it chooses to stop. This autonomy distinguishes agents from plain LLM calls and from traditional, developer-defined workflows. The chapter maps a spectrum from predictable workflows (single calls, chains, routers) to agentic systems that direct their own multi-step processes and can even write new tools. It offers practical guidance on when agents are warranted (tasks with unstructured inputs, high input diversity, and uncertain step counts) while underscoring the trade-offs: higher cost, higher latency, and a greater risk that early errors propagate through later steps. In production, hybrid designs often work best, embedding agents inside workflow stages for controlled flexibility, cost management, and safer failure handling.
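The LLM + tools + loop definition can be illustrated with a minimal sketch. This is not the book's implementation: `fake_llm`, `calculator`, `TOOLS`, and `run_agent` are hypothetical names, and the model is replaced by a stub so the example runs without an API key. A real agent would call an LLM where `fake_llm` appears.

```python
from typing import Callable


def calculator(expression: str) -> str:
    """Hypothetical tool: evaluate a simple 'a op b' arithmetic expression."""
    left, op, right = expression.split()
    a, b = float(left), float(right)
    results = {"+": a + b, "-": a - b, "*": a * b}
    return str(results[op])


# Tool registry: the means by which the agent acts on the outside world.
TOOLS: dict[str, Callable[[str], str]] = {"calculator": calculator}


def fake_llm(context: list[str]) -> dict:
    """Stand-in for a real model call (an assumption for illustration).

    It asks for the calculator once, then stops after seeing a tool result.
    """
    tool_results = [m for m in context if m.startswith("tool: ")]
    if not tool_results:
        return {"action": "tool", "name": "calculator", "input": "6 * 7"}
    return {"action": "final", "answer": tool_results[-1].removeprefix("tool: ")}


def run_agent(task: str, llm: Callable[[list[str]], dict] = fake_llm,
              max_steps: int = 5) -> str:
    """Loop: the model decides, a tool runs, the result re-enters the context."""
    context = [f"user: {task}"]
    for _ in range(max_steps):
        decision = llm(context)
        if decision["action"] == "final":  # the model chooses to stop
            return decision["answer"]
        tool = TOOLS[decision["name"]]  # the model chooses the tool
        context.append(f"tool: {tool(decision['input'])}")
    raise RuntimeError("agent did not finish within max_steps")


answer = run_agent("What is 6 times 7?")
```

The `max_steps` cap is one simple guard against the cost and latency trade-offs mentioned above: because the model controls the loop, the developer bounds how long it can run.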
To evaluate agent capabilities, the chapter adopts GAIA, a benchmark of multi-step, real-world questions that demand reasoning, retrieval, and calculation—ideal for iterating on agent designs and quantifying improvements. It then broadens prompt engineering into context engineering: the discipline of curating everything the model sees—system instructions, conversation state, tool outputs, and retrieved knowledge—at the right time and granularity. Most real failures come from missing information rather than insufficient model intelligence, and larger contexts can degrade performance, so relevance and focus matter. The chapter outlines five strategies—Generation, Retrieval, Write, Reduce, and Isolate—that will be layered through the book’s implementation roadmap, alongside practical prerequisites (Python, environment setup, API keys, and cost awareness) to equip readers to build, measure, and iteratively improve agents from scratch.
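The idea that relevance and focus matter more than sheer context size can be sketched as a small context builder. This is an illustrative assumption, not the book's method: `score` and `build_context` are hypothetical helpers, relevance is approximated by crude word overlap, and the budget is counted in words rather than model tokens.

```python
def score(query: str, snippet: str) -> int:
    """Crude relevance score: number of query words that appear in the snippet."""
    query_words = set(query.lower().split())
    snippet_words = set(snippet.lower().split())
    return len(query_words & snippet_words)


def build_context(system: str, query: str, snippets: list[str],
                  budget_words: int) -> str:
    """Keep the most relevant snippets that fit within a word budget."""
    ranked = sorted(snippets, key=lambda s: score(query, s), reverse=True)
    selected: list[str] = []
    used = 0
    for snippet in ranked:
        cost = len(snippet.split())
        # Skip snippets with no overlap or that would exceed the budget.
        if score(query, snippet) == 0 or used + cost > budget_words:
            continue
        selected.append(snippet)
        used += cost
    return "\n".join([system, *selected, f"Question: {query}"])


snippets = [
    "The GAIA benchmark contains multi-step questions.",
    "Our office cafeteria opens at nine.",
    "GAIA questions require reasoning, retrieval, and calculation.",
]
context = build_context(
    "You are a helpful assistant.",
    "What does the GAIA benchmark measure?",
    snippets,
    budget_words=20,
)
```

Here the unrelated cafeteria snippet is dropped even though it would fit, so the model sees only material that bears on the question. A production system would likely replace the word-overlap score with a proper retrieval method.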
Example of a language model’s generalization capability.
User requests flow through the research agent, which branches into multiple searches and synthesis.
The LLM agent's decision loop is an iterative process of LLM decision-making and tool use.
Progression of agency levels in LLM applications.
LLMs can only produce accurate, high-quality responses when sufficient information is provided in the context.
Even with large context windows, longer inputs can degrade model performance (Source: https://research.trychroma.com/context-rot).
An overview of the journey through the book.
Summary
- AI agents span a wide spectrum, from personal assistants like ChatGPT and Claude to customer-facing agents and specialized tools like Claude Code and Cursor. All share a common foundation: LLMs as their decision-making core.
- An LLM agent consists of three elements: the LLM (brain), tools (means of interacting with the external world), and a loop (iterative process until goal completion). The LLM decides which tool to use and when to stop.
- Workflows are developer-defined execution flows where LLMs perform specific steps. Agents are LLM-directed flows where the model dynamically determines its own process. Production systems often combine both approaches.
- Use agents when tasks require multiple unpredictable steps, provide sufficient value to justify costs, and allow for error detection. The GAIA benchmark provides ideal practice problems for agent development.
- Context engineering is the discipline of providing the right information at the right time. Five strategies (Generation, Retrieval, Write, Reduce, Isolate) form the framework for building effective agents throughout this book.