AI Engineering in Practice

Richard Davies, Rafael Fischer

MEAP began May 2024
Last updated February 2026
Publication in Fall 2026 (estimated)

ISBN 9781633436305
225 pages (estimated)

Included with a Manning Online subscription

printed in black & white

available in Korean, Russian

resources: Book forum

table of contents

1 AI Engineering - The Blueprint

1.1 What is AI Engineering?

1.1.1 From Prompts to Production Systems

1.1.2 When You Need AI Engineering vs. Simple Prompts

1.2 Why AI Engineering Delivers Results

1.2.1 Customer Support at Scale

1.2.2 Document Intelligence in Legal Services

1.2.3 Workflow Automation in Operations

1.3 The Blueprint: How Production AI Systems Work

1.3.1 Complete System Architecture

1.3.2 Following the Transaction

1.3.3 The Five Engineering Layers

1.3.4 Diagnosing System Failures

1.4 What You’re Building Toward

1.5 Summary

2 Foundation Models: Language & Embedding

2.1 Introduction to Foundation Models

2.1.1 Why are they called Foundation Models?

2.1.2 Applications and Possibilities of Foundation Models

2.1.3 Key Characteristics

2.1.4 Types of Foundation Models

2.1.5 Why This Matters for Developers

2.2 Architecture of a Foundation Model

2.2.1 Pre Training

2.2.2 Post Training

2.2.3 Inference: How Models Generate Answers

2.2.4 Distribution of Foundation Models

2.3 Challenges and Trade-offs of Foundation Models

2.4 Embedding Models

2.4.1 What are embeddings?

2.4.2 How are embeddings used?

2.4.3 Practical Applications of Embeddings

2.4.4 Vector Storage and Search

2.4.5 Embedding Model Training

2.4.6 Embedding vs Language Models

2.4.7 Key Considerations

2.5 Conclusion

2.6 Summary

PART 1: FUNDAMENTALS OF PROMPT ENGINEERING

3 Prompt Design: Structural Elements

3.1 Instructions

3.1.1 Practical Example 1: Generating a Shakespearean Sonnet

3.1.2 Constraints

3.1.3 Practical Example 2: Creating a Recipe (with Constraints)

3.1.4 Hands-On Practice

3.2 Context

3.2.1 Practical Example 1: Generating a News Article

3.2.2 Practical Example 2: Generating Dialogue for a Science Fiction Novel

3.2.3 Hands-On Practice

3.3 Input Parameters

3.3.1 Practical Example 1: Sentiment Analysis

3.3.2 Practical Example 2: Personalized Product Recommendation

3.3.3 Hands-On Practice

3.4 Output Format

3.4.1 Output Indicator Pattern

3.4.2 Practical Example 1: Generating a Product Review

3.4.3 Template Pattern (Aka. Fill-in-the-Blanks Pattern)

3.4.4 Practical Example 2: Generating a Recipe

3.4.5 Hands-On Practice

3.5 Delimiters

3.5.1 The Delimiter Pattern

3.5.2 Practical Example 1: Generating Article Summaries

3.5.3 Practical Example 2: Answering Questions Based on a Document

3.5.4 Hands-On Practice

3.6 Combining Structural Elements

3.6.1 Practical Example 1: Generating Product Descriptions

3.6.2 Practical Example 2: Writing News Article Summaries

3.6.3 Hands-On Practice

3.7 Summary

4 Prompt Design: Linguistic Elements

4.1 Precision

4.1.1 Practical Example 1: Social Media Content Creation

4.1.2 Practical Example 2: Visual Content Generation

4.1.3 Hands-On Practice

4.2 Directness

4.2.1 Practical Example 1: Job Application Follow-Up Email

4.2.2 Practical Example 2: Business Logo Design

4.2.3 Hands-On Practice

4.3 Brevity

4.3.1 Practical Example 1: Customer Feedback Analysis

4.3.2 Practical Example 2: Visual Content Generation

4.3.3 Hands-On Practice

4.4 Additional Linguistic Considerations

4.4.1 Positive versus Negative Framing

4.4.2 Tone and Register Calibration

4.4.3 Visual Content Framing

4.4.4 Question versus Statement Formatting

4.5 Summary

5 Prompt Patterns

5.1 Role Assignment Pattern

5.1.1 Practical Example: Email Subject Line Optimization

5.1.2 Hands-On Practice

5.2 Delimiter Pattern

5.2.1 Practical Example: Brand Positioning Analysis

5.2.2 Hands-On Practice

5.3 Template Pattern

5.3.1 Practical Example: Competitive Analysis Reports

5.3.2 Hands-On Practice

5.4 Tail Generation Pattern

5.4.1 Practical Example: Customer Service Consistency

5.4.2 Hands-On Practice

5.5 Self-Reflection Pattern

5.5.1 Practical Example: Study Plan Development

5.5.2 Hands-On Practice

5.6 Inversion Pattern

5.6.1 Practical Example: Marketing Strategy Validation

5.6.2 Hands-On Practice

5.7 Refinement Pattern

5.7.1 Practical Example: Study Explanation Improvement

5.7.2 Hands-On Practice

5.8 Style Transfer Pattern

5.8.1 Practical Example: Technical Explanation Adaptation

5.8.2 Hands-On Practice

5.9 Comment Driven Generation

5.9.1 Practical Example: Research Paper Introduction Refinement

5.9.2 Hands-On Practice

5.10 Summary

6 Prompt Templates

6.1 Practical Example 1: Product Description Generator

6.1.1 Prompt Template Components

6.1.2 Prompt Template

6.1.3 Prompt (Variables Injected)

6.2 Practical Example 2: Personalized Workout Plan Generator

6.2.1 Prompt Template Components

6.2.2 Prompt Template

6.2.3 Prompt (Variables Injected)

6.3 Hands-On Practice

6.3.1 Exercise 1: Customer Support Email

6.3.2 Exercise 2: Market Research Analysis

6.3.3 Exercise 3: Technical Documentation Template

6.3.4 Exercise 4: Investment Analysis Template

6.3.5 Exercise 5: Content Marketing Strategy Template

6.3.6 Exercise 6: Learning Module Design Template

6.4 Summary

7 Prompt Types

7.1 Chat Model Architecture

7.1.1 Three Message Types

7.1.2 What You Design: System and User Prompts

7.2 System Prompts

7.2.1 Purpose and Principles

7.2.2 Quick Reference: Mapping Chapter 3 Elements to Chat Prompts

7.2.3 Practical Example: Linear Issue Assistant

7.2.4 Practical Example: Healthcare Patient Summary Assistant

7.2.5 Hands-On Practice

7.3 User Prompts

7.3.1 Types and Templates

7.3.2 Practical Example: Customer Testimonial to Case Study Assistant

7.3.3 Hands-On Practice

7.4 Prompt Decomposition Framework

7.4.1 The Core Question

7.4.2 Worked Example: Decomposing Step-by-Step

7.4.3 Quick Decisions

7.4.4 When to Use Architectural Separation

7.4.5 Common Mistakes

7.4.6 Hands-On Practice

7.5 Summary

8 Prompt Sampling

9 Advanced Prompt Patterns

10 Prompt Security & Guardrails

11 Prompt Management & Observability

PART 2: APPLIED AI ENGINEERING

12 Text Generation

13 Workflows I – Chaining (Part 1)

14 Workflows II – Routing (Part 2)

15 Retrieval Augmented Generation (RAG)

16 Agents

17 Evaluation

18 Optimization

19 Context Engineering

20 Vibe Coding

Overview

1 AI Engineering - The Blueprint

This chapter introduces AI Engineering as the disciplined practice of building production systems with modern AI, contrasting it with one-off prompt crafting. It explains the demo-to-production gap: what works in a single chat often fails at scale without architecture to manage quality, latency, cost, and risk. The blueprint centers on five integrated layers—prompt routing, retrieval-augmented generation (RAG), structured prompts, autonomous agents, and operational infrastructure—showing when simple prompting suffices and when engineering is required for integration, consistency, security, and economic viability.

Real-world outcomes illustrate the stakes. An airline’s chatbot fiasco stemmed from missing guardrails and monitoring, while Klarna’s assistant succeeded by combining routing to the right model, grounding answers in company data, layered validation, and continuous evaluation. The chapter walks a single customer query through the full pipeline: classification and routing to balance cost and capability; semantic retrieval to ground responses; structured prompting to control tone and format; automated checks for policy compliance and hallucinations; and confidence-based escalation to humans. The message is clear: reliability emerges from modular layers that isolate failures and make systems observable and maintainable.

Beyond principles, the chapter provides a pragmatic diagnostic lens that maps symptoms to architectural fixes—runaway costs to routing, hallucinations to missing RAG, inconsistent outputs to weak prompts, policy violations to absent validation, multi-step brittleness to workflow or agent gaps, and vulnerabilities to missing security controls. Case studies in support, legal analysis, and operations show measurable gains in accuracy, speed, and cost. Finally, the roadmap frames how foundational prompt skills evolve into production AI systems, extending software engineering fundamentals with AI-specific patterns to deliver trustworthy, scalable applications.

The Demo-to-Production Gap

Production AI System Architecture

Summary

Ad-hoc prompting collapses at production scale - Air Canada's chatbot hallucinated policies costing $3.2M, while Klarna's engineered system handled 2.3M conversations monthly through systematic architecture, not better prompts.
The demo-to-production gap emerges at scale - single-case success fails when serving thousands daily, exposing edge cases, context limits, cost explosions, and security vulnerabilities invisible in testing.
Even simple tasks hide engineering complexity - product descriptions need parameterized templates, structured schemas, validation frameworks, and performance monitoring to sustain quality beyond initial demos.
Production reliability comes from layered defenses - routing cuts costs 60-80%, RAG reduces hallucinations through verified grounding, and validation catches errors like the $3,650 in unauthorized gift cards promised to 73 customers.
Behind successful interactions lies invisible infrastructure - Sarah's two-minute payment resolution required routing, knowledge retrieval, synthesis guardrails, validation, and confidence scoring that simpler approaches cannot provide.
This blueprint transforms isolated techniques into production systems - you'll build architectures that prevent Air Canada's disasters while achieving Klarna's scale, handling thousands of daily interactions with measurable reliability.

FAQ

What is AI Engineering and how is it different from Prompt Engineering?

AI Engineering is software engineering that incorporates modern AI (LLMs, embeddings, vector databases) to solve problems with unstructured data. It extends software fundamentals—architecture, testing, error handling, monitoring—with AI-specific patterns like retrieval, validation, routing, and agents. Prompt Engineering focuses on communicating effectively with models; AI Engineering builds production systems around those prompts for reliability, scalability, cost control, and governance.

When should I move beyond simple prompting to AI Engineering?

Adopt AI Engineering when any of the following apply: outputs must feed databases or trigger APIs; quality must be consistent across thousands of users; failures have consequences (customer-facing, financial, legal, medical); costs matter at scale; or you face security threats like prompt injection or data exfiltration. Simple prompting is fine for personal productivity and low-stakes drafts that humans review.

What are the five architectural layers in the blueprint?

The blueprint comprises: 1) Prompt Routing—send each request to the right model/pipeline by topic and complexity; 2) RAG (Retrieval Augmented Generation)—ground responses in authoritative documents via semantic search; 3) Prompt Engineering—structured instructions, formats, and templates for consistent outputs; 4) Autonomous Agents—multi-step, tool-using workflows for complex tasks; 5) Operational Infrastructure—evaluation, monitoring, cost optimization, security, and lifecycle management.

How does prompt routing reduce cost without hurting quality?

Routing classifies requests (e.g., simple vs. complex) and directs them to the most appropriate, cost-effective resource, reserving advanced models for truly hard cases. In practice this typically cuts costs by 60–80%. The chapter’s example shows monthly spend around $2,580 with routing versus $15,000 if every query hits premium models—while maintaining quality by escalating complex queries when needed.
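
The routing idea and the cost comparison above can be sketched in a few lines. This is a minimal illustration, not the book's implementation: the tier names, per-query prices, and keyword hints are hypothetical, and a production router would more likely rely on a trained classifier. Only the two monthly figures ($2,580 and $15,000) come from the answer above.

```python
# Hypothetical model tiers and per-query prices, for illustration only.
PRICE_PER_QUERY = {"small": 0.002, "premium": 0.03}

# Hypothetical hints that a query is hard; a production router would
# more likely use a trained classifier or a small model.
COMPLEX_HINTS = ("dispute", "legal", "chargeback", "escalate")


def route(query: str) -> str:
    """Pick a tier: 'premium' only when the query looks complex."""
    text = query.lower()
    looks_complex = len(text.split()) > 40 or any(h in text for h in COMPLEX_HINTS)
    return "premium" if looks_complex else "small"


def monthly_cost(queries: list[str]) -> float:
    """Total cost when each query is billed at its routed tier."""
    return sum(PRICE_PER_QUERY[route(q)] for q in queries)


def savings_fraction(routed_cost: float, unrouted_cost: float) -> float:
    """Share of spend avoided by routing."""
    return 1 - routed_cost / unrouted_cost


# The two monthly figures quoted in the answer above.
print(f"{savings_fraction(2580, 15000):.1%}")
```

With the quoted figures the savings come to roughly 83%, slightly above the typical 60 to 80% range.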

What is RAG and how does it reduce hallucinations?

RAG (Retrieval Augmented Generation) searches a knowledge base for relevant, up-to-date documents (via a vector database for semantic similarity) and injects them into the prompt. Grounding generation in authoritative sources—with citations—dramatically reduces hallucinations and prevents invented policies or promises.
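
The retrieve-then-inject step can be illustrated with a toy example. It assumes a bag-of-words vector as a stand-in for a real embedding model and an in-memory dictionary in place of a vector database; the knowledge base entries and function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical knowledge base; a real system would store embeddings in a vector database.
KNOWLEDGE_BASE = {
    "refund-policy": "refunds are issued within 14 days of a returned purchase",
    "shipping": "standard shipping takes 3 to 5 business days",
    "payments": "a failed payment can be retried from the billing page",
}


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: word counts as a sparse vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, docs: dict, k: int = 2) -> list:
    """Return the k (doc_id, text) pairs most similar to the query."""
    q = embed(query)
    ranked = sorted(docs.items(), key=lambda item: cosine(q, embed(item[1])), reverse=True)
    return ranked[:k]


def build_prompt(query: str, docs: dict, k: int = 2) -> str:
    """Inject the retrieved sources, tagged with their ids, into the prompt."""
    context = "\n".join(f"[{doc_id}] {text}" for doc_id, text in retrieve(query, docs, k))
    return (
        "Answer using only the sources below and cite their ids.\n"
        f"Sources:\n{context}\n\nQuestion: {query}"
    )


print(build_prompt("how long do refunds take", KNOWLEDGE_BASE, k=1))
```

Tagging each source with an id is what lets a later validation step check that citations in the answer point at documents that were actually retrieved.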

What validation and quality controls should sit between the model and the user?

Use a validation layer that includes: policy compliance checks; hallucination detection (e.g., model-as-a-judge verifying claims appear in sources); tone and style checks; citation validation; and confidence scoring with thresholds that trigger human review. These guardrails catch costly errors before they reach customers.
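
A minimal sketch of such a validation layer follows. It covers only three of the checks listed above (policy compliance, citation presence, citation validity) and uses the fraction of passed checks as a crude confidence score. The forbidden phrases, the threshold, and the bracketed citation format are assumptions made for illustration; hallucination detection with a model-as-a-judge and tone checks are left out.

```python
import re
from dataclasses import dataclass, field

# Hypothetical policy list and review threshold, for illustration only.
FORBIDDEN_PROMISES = ("gift card", "guaranteed refund")
REVIEW_THRESHOLD = 0.75
# Assumed citation format: a source id in square brackets, e.g. [refund-policy].
CITATION = re.compile(r"\[([\w-]+)\]")


@dataclass
class ValidationResult:
    failed_checks: list = field(default_factory=list)
    confidence: float = 0.0
    needs_human_review: bool = True


def validate(response: str, source_ids: set) -> ValidationResult:
    """Run each check and flag the response for review if confidence is low."""
    cited = CITATION.findall(response)
    checks = {
        "policy": not any(p in response.lower() for p in FORBIDDEN_PROMISES),
        "has_citation": bool(cited),
        "citations_valid": all(c in source_ids for c in cited),
    }
    failed = [name for name, ok in checks.items() if not ok]
    confidence = 1 - len(failed) / len(checks)
    return ValidationResult(failed, confidence, confidence < REVIEW_THRESHOLD)


print(validate("Refunds take 14 days [refund-policy].", {"refund-policy"}))
print(validate("We will send you a gift card.", {"refund-policy"}))
```

With three checks and a 0.75 threshold, any single failure routes the response to human review; a real system would weight checks differently.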

How does a single customer query flow through the production pipeline?

A typical flow: the router classifies the query and selects the pipeline; RAG retrieves relevant policies and procedures; structured prompts synthesize a grounded, on-brand response with citations; the validation layer checks policy alignment, groundedness, tone, and citations; finally, the system delivers a response with a confidence score, auto-escalating to human review if below threshold.
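
The flow above can be expressed as a small orchestration function that takes each stage as a callable. This is a sketch under stated assumptions: the function and field names, the 0.8 threshold, and the stub stages are hypothetical, and validation is reduced to returning a single confidence number.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Outcome:
    pipeline: str
    response: str
    confidence: float
    escalated: bool


def handle_query(
    query: str,
    route: Callable[[str], str],
    retrieve: Callable[[str], Sequence[str]],
    generate: Callable[[str, str, Sequence[str]], str],
    validate: Callable[[str, Sequence[str]], float],
    threshold: float = 0.8,  # hypothetical escalation threshold
) -> Outcome:
    """Run one query through routing, retrieval, generation, and validation."""
    pipeline = route(query)
    sources = retrieve(query)
    draft = generate(pipeline, query, sources)
    confidence = validate(draft, sources)
    # Below the threshold the response goes to a human instead of the customer.
    return Outcome(pipeline, draft, confidence, escalated=confidence < threshold)


# Stub stages so the flow can be run end to end.
outcome = handle_query(
    "Why did my payment fail?",
    route=lambda q: "billing",
    retrieve=lambda q: ["[payments] A failed payment can be retried from the billing page."],
    generate=lambda p, q, s: f"You can retry the payment from the billing page {s[0].split()[0]}.",
    validate=lambda draft, s: 0.9 if "[payments]" in draft else 0.4,
)
print(outcome)
```

Passing the stages in as callables keeps each layer replaceable and testable on its own, which matches the modular framing of the blueprint.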

How do I diagnose common failures in production AI systems?

Match symptoms to layers: high API costs → routing inefficiency; hallucinations → missing/weak RAG; inconsistent quality → prompt engineering gaps; policy violations/operational risk → insufficient validation; failure on multi-step tasks → inadequate chaining/agent design; security incidents from malicious inputs → missing defensive patterns and privilege separation.
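
The symptom-to-layer mapping above is essentially a lookup table. The sketch below encodes it as a dictionary; the key wording and the `diagnose` helper are hypothetical conveniences, while the pairings themselves are taken from the answer above.

```python
# Symptom -> likely architectural cause, as listed in the answer above.
DIAGNOSTICS = {
    "high api costs": "routing inefficiency",
    "hallucinations": "missing or weak RAG",
    "inconsistent quality": "prompt engineering gaps",
    "policy violations": "insufficient validation",
    "multi-step task failures": "inadequate chaining or agent design",
    "security incidents": "missing defensive patterns and privilege separation",
}


def diagnose(symptom: str) -> str:
    """Look up the layer to inspect first for a given symptom."""
    key = symptom.strip().lower()
    return DIAGNOSTICS.get(key, "unknown symptom: inspect logs layer by layer")


print(diagnose("Hallucinations"))
```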

How do I scale a simple content-generation prompt into a production workflow?

Move from an ad-hoc prompt to a parameterized template with explicit instructions and output formats; enforce structured outputs (e.g., JSON) for database insertion; add validation to catch regressions; monitor costs and performance; apply routing to choose models per item; and architect around context limits with retrieval and batching. This enables thousands of items/day within budget and consistent quality.
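
The first steps of that progression (a parameterized template plus enforced structured output) might look like the sketch below. The template wording, field names, and helper functions are hypothetical; the model call itself is omitted, and `raw` stands for whatever text the model returns.

```python
import json
from string import Template

# Hypothetical template and schema for a product description generator.
PROMPT_TEMPLATE = Template(
    "Write a product description for $name aimed at $audience.\n"
    'Return only JSON with keys "title" (string), "description" (string), '
    'and "keywords" (list of strings).'
)
REQUIRED_FIELDS = {"title": str, "description": str, "keywords": list}


def render_prompt(name: str, audience: str) -> str:
    """Fill the template; substitute() raises KeyError if a variable is missing."""
    return PROMPT_TEMPLATE.substitute(name=name, audience=audience)


def parse_output(raw: str) -> dict:
    """Parse model output and reject anything that does not match the schema."""
    data = json.loads(raw)  # raises ValueError (JSONDecodeError) on malformed JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for key, expected in REQUIRED_FIELDS.items():
        if not isinstance(data.get(key), expected):
            raise ValueError(f"field {key!r} is missing or not a {expected.__name__}")
    return data


print(render_prompt("trail shoes", "weekend runners"))
print(parse_output('{"title": "T", "description": "D", "keywords": ["grip"]}'))
```

Rejecting malformed output before database insertion gives the validation and monitoring steps a clear failure signal to count and retry on.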

What real results can AI Engineering deliver compared to ad-hoc prompting?

Case studies show transformative outcomes: customer support improved accuracy and response times with routing, RAG, and validation; legal document analysis dropped from ~40 hours to ~2 hours with ~95% extraction accuracy under human review; manufacturing intake achieved near real-time processing with ~98% accuracy via schemas, validation, retries, and API integration. Systematic architecture—not ad-hoc prompts—drives reliability, scale, and cost efficiency.

pro $24.99 per month

access to all Manning books, MEAPs, liveVideos, liveProjects, and audiobooks!
choose one free eBook per month to keep
exclusive 50% discount on all purchases
renews monthly, pause or cancel renewal anytime

lite $19.99 per month

access to all Manning books, including MEAPs!

team

5, 10 or 20 seats+ for your team - learn more

eBook

pdf, ePub, online

$47.99 $35.99

you save $12.00 (25%)
