1 Introduction to AI Agents and Applications
Large language models have shifted from novelty to necessity, powering applications that understand, generate, and act on natural language. This chapter lays the foundation for building such systems, highlighting why frameworks matter and how modular tools turn LLM prototypes into reliable products. It frames LLM apps as a new software layer—on par with databases and web interfaces—while introducing the core patterns and components that make them practical at scale.
The chapter outlines recurring challenges—bringing proprietary data to models, designing robust prompts and chains, orchestrating multi-step workflows and tools, controlling latency and cost, and monitoring behavior in production. LangChain, LangGraph, and LangSmith address these with a modular, composable architecture: loaders and splitters transform raw content into Documents; embedding models and vector stores enable retrieval; retrievers and prompt templates assemble context-rich inputs; LLMs generate outputs that parsers can structure; and the Runnable interface with LCEL connects everything consistently. Beyond linear pipelines, LangGraph supports graph-shaped, branching workflows suited to complex agents.
Three application families anchor the discussion: engines (task-focused services like summarization and Q&A), chatbots (conversational systems with memory and guardrails), and AI agents (LLM-guided planners that select tools, execute multi-step tasks, and integrate heterogeneous data sources). Retrieval-Augmented Generation (RAG) emerges as a core pattern for grounding responses in trusted knowledge, improving accuracy and cost-efficiency. The chapter contrasts adaptation techniques—prompt engineering, RAG, and fine-tuning—clarifying their trade-offs and when each is most effective. It also surveys model selection criteria, including task fit, context window, speed and cost, multilingual needs, instruction versus reasoning capabilities, and open-source versus proprietary deployment choices.
Finally, the chapter previews the hands-on path ahead: you will build engines, chatbots, and agents with LangChain and LangGraph; learn to evaluate, debug, and monitor with LangSmith; and master advanced RAG techniques. By internalizing the patterns—modularity, composability, and extensibility—you’ll be equipped to design production-grade LLM applications that are maintainable, grounded, and adaptable to a rapidly evolving model ecosystem.
1.9 Summary
FAQ
What is LangChain and how does it help build LLM applications?
LangChain is a modular framework that standardizes common LLM app patterns—data ingestion, chunking, embeddings, retrieval, prompting, and orchestration—so you don’t have to rebuild them from scratch. It provides interchangeable components (loaders, splitters, embeddings, vector stores, retrievers, prompt templates) and composable interfaces to speed up development and reduce boilerplate. Its design is guided by three principles: modularity, composability, and extensibility.
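As an illustrative sketch of that composability (assuming the `langchain-openai` package and an OpenAI API key in the environment; the model name is just an example), a prompt template, chat model, and output parser snap together into one small pipeline:

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

# Three interchangeable components: prompt template, chat model, output parser.
prompt = ChatPromptTemplate.from_template("Summarize in one sentence: {text}")
llm = ChatOpenAI(model="gpt-4o-mini")  # any chat model integration works here
parser = StrOutputParser()

# Composed with LCEL's pipe operator into a single runnable pipeline.
chain = prompt | llm | parser
print(chain.invoke({"text": "LangChain standardizes common LLM app patterns."}))
```

Any of the three pieces can be swapped for another implementation without touching the rest of the chain, which is the practical payoff of modularity and extensibility.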
How do the main components in LangChain’s architecture work together?
A typical flow, sketched end-to-end in code after this list, is:
- Document Loader pulls data into Document objects with metadata.
- Text Splitter chunks documents to fit model context windows.
- Embedding Model converts chunks into vectors.
- Vector Store saves embeddings and chunks for fast similarity search.
- Retriever fetches the most relevant chunks for a query.
- Prompt Template assembles user input plus retrieved context.
- LLM Cache (optional) returns previously computed responses for repeated prompts to save cost and latency.
- LLM/ChatModel generates the answer.
- Output Parser structures the response (e.g., JSON) for downstream use.
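Hedged as an illustrative end-to-end sketch (the file path, model name, and chunk sizes are placeholders; assumes the `langchain-openai`, `langchain-community`, and `langchain-text-splitters` packages plus a recent `langchain-core` that ships `InMemoryVectorStore`; the optional cache step is omitted), the flow above maps to code roughly like this:

```python
from langchain_community.document_loaders import TextLoader
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.vectorstores import InMemoryVectorStore
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load and split: raw file -> Document chunks that fit the context window.
docs = TextLoader("handbook.txt").load()
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_documents(docs)

# Embed and store: chunks -> vectors indexed for similarity search.
store = InMemoryVectorStore.from_documents(chunks, OpenAIEmbeddings())

# Retrieve: fetch the chunks most relevant to the question.
retriever = store.as_retriever(search_kwargs={"k": 4})

# Prompt: combine the user question with the retrieved context.
prompt = ChatPromptTemplate.from_template(
    "Answer using only this context:\n{context}\n\nQuestion: {question}"
)

# Generate and parse the model's answer.
question = "What is the vacation policy?"
context = "\n\n".join(d.page_content for d in retriever.invoke(question))
chain = prompt | ChatOpenAI(model="gpt-4o-mini") | StrOutputParser()
print(chain.invoke({"context": context, "question": question}))
```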
What are Documents, Document Loaders, and Text Splitters?
In LangChain, a Document is a unit of text plus metadata (e.g., source, page). Document Loaders extract content from files, databases, or the web and wrap it as Documents. Text Splitters break long texts into smaller Document chunks to respect context limits and improve retrieval quality during both indexing and querying.
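A brief sketch of that step (the PDF path is a placeholder; assumes `langchain-community` and `pypdf` are installed) showing how loader output carries metadata that the resulting chunks inherit:

```python
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Each page becomes a Document with page_content plus metadata (source, page).
pages = PyPDFLoader("employee_handbook.pdf").load()

# Split pages into overlapping chunks sized for the model's context window;
# every chunk keeps the metadata of the page it came from.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_documents(pages)

print(chunks[0].page_content[:200])
print(chunks[0].metadata)  # e.g. {'source': 'employee_handbook.pdf', 'page': 0}
```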
What are embeddings, vector stores, and retrievers—and why do they matter?
Embeddings are numerical vectors that capture semantic meaning of text. Vector stores index these vectors alongside the original chunks to enable similarity search. Retrievers query the store to return the most relevant Documents for a prompt. Together, they power Retrieval-Augmented Generation (RAG). For relationship-heavy data, knowledge graphs (e.g., Neo4j) can complement vector stores, and LangChain integrates with both.
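A small sketch (the sample texts are made up; assumes `langchain-openai` and a recent `langchain-core` with the in-memory vector store) of embedding a few chunks, indexing them, and retrieving by semantic similarity:

```python
from langchain_core.documents import Document
from langchain_core.vectorstores import InMemoryVectorStore
from langchain_openai import OpenAIEmbeddings

docs = [
    Document(page_content="Employees accrue 20 vacation days per year.",
             metadata={"source": "hr"}),
    Document(page_content="The VPN requires two-factor authentication.",
             metadata={"source": "it"}),
]

# Embed each chunk and index the vectors next to the original text.
store = InMemoryVectorStore.from_documents(docs, OpenAIEmbeddings())

# Similarity search returns the chunks closest in meaning, with scores.
for doc, score in store.similarity_search_with_score(
        "How much paid time off do I get?", k=1):
    print(round(score, 3), doc.page_content)

# A retriever wraps the same lookup behind the standard Runnable interface.
retriever = store.as_retriever(search_kwargs={"k": 1})
print(retriever.invoke("How much paid time off do I get?"))
```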
What is Retrieval-Augmented Generation (RAG), and how does a Q&A engine use it?
RAG augments LLM prompts with context retrieved at query time from a local knowledge base.
- Ingestion: load documents, split into chunks, create embeddings, and store in a vector database.
- Query: embed the user question, retrieve similar chunks, and add them to the prompt for grounded answers.
Benefits include efficiency (lower token use), accuracy (reduced hallucinations), and flexibility (swappable embedding models and vector stores). Prompts should instruct the LLM to rely only on the provided context, and you can ask it to cite its sources.
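As a sketch of the query side (the `retriever` is assumed to come from an ingestion step like the one above; the prompt wording and model name are illustrative), the prompt restricts the model to the retrieved context and asks it to name its sources:

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

prompt = ChatPromptTemplate.from_template(
    "Answer the question using only the context below. "
    "If the context is insufficient, say so. Cite the sources you used.\n\n"
    "Context:\n{context}\n\nQuestion: {question}"
)

def format_docs(docs):
    # Prefix each chunk with its source so the model can cite it.
    return "\n\n".join(
        f"[{d.metadata.get('source', '?')}] {d.page_content}" for d in docs
    )

question = "How much paid time off do I get?"
context = format_docs(retriever.invoke(question))  # retriever built at ingestion
chain = prompt | ChatOpenAI(model="gpt-4o-mini") | StrOutputParser()
print(chain.invoke({"context": context, "question": question}))
```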
How do engines, chatbots, and agents differ—and when should I use each?
- Engines: single-purpose backends (e.g., summarization, Q&A via RAG), often exposed via REST. Use for focused capabilities.
- Chatbots: conversational interfaces that maintain context and memory, often blending prompts with retrieval. Use for interactive, multi-turn experiences.
- Agents: adaptive systems that plan and execute multi-step workflows using tools/APIs. Use for tasks requiring decision-making, orchestration, and tool use.
What is an AI agent, and how does LangGraph support agent workflows?
An AI agent iteratively decides which tools to use, executes them, evaluates results, and continues until it completes a task. LangGraph lets you express these workflows as graphs with branching and control flow, offering prebuilt agent/orchestrator patterns, tool integrations, and support for human-in-the-loop steps. Agents can also access tools exposed via the Model Context Protocol (MCP), simplifying integrations.
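A minimal sketch of a tool-using agent built with LangGraph's prebuilt ReAct-style helper (assumes the `langgraph` and `langchain-openai` packages; the order-status tool is a toy placeholder):

```python
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent

@tool
def get_order_status(order_id: str) -> str:
    """Look up the shipping status of an order."""
    # Placeholder: a real tool would call an internal API or database.
    return f"Order {order_id} shipped yesterday."

# The agent loops: the LLM decides whether to call a tool, observes the
# result, and continues until it can answer the user's request.
agent = create_react_agent(ChatOpenAI(model="gpt-4o-mini"), [get_order_status])

result = agent.invoke({"messages": [("user", "Where is order 1234?")]})
print(result["messages"][-1].content)
```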
What are the Runnable interface and LCEL, and why do they matter?
Runnable and the LangChain Expression Language (LCEL) provide a consistent way to compose components (loaders, retrievers, LLMs, parsers) into pipelines and graphs without brittle glue code. Benefits include cleaner composition, easier debugging, streaming support, and maintainability as apps grow.
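Because every Runnable shares the same calling conventions, one composed chain can be invoked, batched, or streamed without extra glue code; a brief sketch (the model name is an example, assumes `langchain-openai`):

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

chain = (
    ChatPromptTemplate.from_template("Explain {topic} in one paragraph.")
    | ChatOpenAI(model="gpt-4o-mini")
    | StrOutputParser()
)

# The same pipeline supports single calls, batches, and token streaming.
print(chain.invoke({"topic": "vector stores"}))
print(chain.batch([{"topic": "embeddings"}, {"topic": "retrievers"}]))
for token in chain.stream({"topic": "output parsers"}):
    print(token, end="", flush=True)
```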
How can I adapt an LLM to my domain: prompt engineering, RAG, or fine-tuning?
- Prompt engineering: fastest and cheapest; use clear instructions, roles, and few-shot examples (a short sketch follows this list). Works well for many tasks but may lack domain grounding.
- RAG: adds trusted external context at runtime for accuracy and transparency; usually outperforms fine-tuning when knowledge changes frequently.
- Fine-tuning: best for specialized domains/styles when you have quality data; higher cost/complexity, but efficient at inference. Techniques like LoRA reduce costs.
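For the prompt-engineering route, a minimal few-shot sketch (the role, example tickets, and labels are invented for illustration) that sets a system role and demonstrates the expected output format:

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

# A role instruction plus two worked examples guide the model's output format.
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a support triage assistant. Classify each ticket as "
               "billing, technical, or other. Reply with the label only."),
    ("human", "I was charged twice this month."),
    ("ai", "billing"),
    ("human", "The app crashes when I upload a photo."),
    ("ai", "technical"),
    ("human", "{ticket}"),
])

chain = prompt | ChatOpenAI(model="gpt-4o-mini") | StrOutputParser()
print(chain.invoke({"ticket": "How do I reset my password?"}))
```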
How should I choose an LLM for my application?
- Purpose: general tasks vs specialized (e.g., code).
- Context window: longer inputs vs cost/latency trade-offs.
- Multilingual capability: needed languages and quality.
- Model size/speed: latency vs accuracy vs cost.
- Instruction vs reasoning models: follow a plan vs figure out the plan.
- Open-source vs proprietary: privacy/control vs convenience/performance.
LangChain’s standardized interfaces make it easy to swap models as needs evolve.
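As a sketch of that swapability (assumes a LangChain version that ships the `init_chat_model` helper and that you hold credentials for each provider; the model names are placeholders):

```python
from langchain.chat_models import init_chat_model
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_template("Translate to French: {text}")

# The same chain runs against different providers; only the model id changes.
for model_id, provider in (("gpt-4o-mini", "openai"),
                           ("claude-3-5-haiku-latest", "anthropic")):
    llm = init_chat_model(model_id, model_provider=provider)
    chain = prompt | llm | StrOutputParser()
    print(model_id, "->", chain.invoke({"text": "Good morning"}))
```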