Learn AI Data Engineering in a Month of Lunches

David Melillo

MEAP began September 2025
Last updated February 2026
Publication in Summer 2026 (estimated)

ISBN 9781633435728
225 pages (estimated)

Included with a Manning Online subscription

printed in black & white

Resources: source code (on GitHub), book forum

Table of Contents

Part 1: Core Concepts of Data Engineering with AI

1 Before You Begin

1.1 Why AI Matters to Data Engineering

1.2 Is This Book for You?

1.2.1 The Many Uses for AI

1.2.2 The Many Flavors of AI

1.3 How to Use This Book

1.3.1 The Main Chapters

1.3.2 Hands-on Labs

1.3.3 Chapter Setup Files

1.4 Setting Up Your Environment

1.4.1 Installing PostgreSQL and pgAdmin

1.4.2 Installing Jupyter Lab for Python Work

1.4.3 Creating an OpenAI Account

1.5 Being Immediately Effective with AI and Data Engineering

2 Advantages & Disadvantages of Using a Coding Companion

2.1 Mental Model: The Data Engineer and the Coding Companion

2.2 Advantages of Using an AI/LLM Coding Companion

2.2.1 Rapid Code Generation for Data Engineering Tasks

2.3 Disadvantages of Using a Coding Companion

2.3.1 Introduction to the Pagila Dataset

2.3.2 Example: Asking a Simple Question

2.4 Lab

2.5 Lab Answers

3 Using a Coding Companion with SQL

3.1 Zero-Shot Prompting

3.2 Few-Shot Prompting

3.3 Chain-of-Thought Prompting

3.4 Self-Consistency Prompting

3.5 Tree-of-Thought Prompting

3.6 Role-Playing, Domain Priming, Prompt Chaining and Beyond

3.7 Lab

3.8 Lab Answers

4 Using a Coding Companion with Python

4.1 Interacting with APIs Using AI Coding Companions & Python

4.1.1 Fetching Data from an API

4.1.2 Enhancing API Calls with AI Coding Companions and API Documentation

4.2 Unnesting Complex JSON Objects with AI Companions & Python

4.2.1 Simple Example: Flattening a Single Nested Field

4.2.2 Complex Example: Extracting Deeply Nested & Combined Fields

4.3 Using AI to Implement Regex Patterns

4.3.1 Extracting Phone Numbers from Text

4.3.2 Normalizing Phone Numbers with Regex and AI

4.3.3 Extracting Number Components into a DataFrame

4.4 Lab

4.5 Lab Answers

5 Using the OpenAI API in Data Workflows

5.1 Initial Setup and Data Extraction

5.2 Preprocessing Articles

5.3 Using ChatGPT for Sentiment Analysis

5.3.1 Understanding the ChatGPT API and Chat Completions Endpoint

5.3.2 Raw API Response Processing

5.4 Iteration - Normalizing Sentiment Output, Logging & Consolidation

5.4.1 Normalizing Sentiment Output

5.4.2 Logging & Consolidation

5.5 Lab

5.6 Lab Answers

Part 2: Data Cleaning & Transformation Pipelines with AI

6 AI & Data Quality

6.1 Identifying Data Quality Issues

6.2 Fixing Data Quality Issues

6.2.1 Understanding Data Classes

6.2.2 Using response_format

6.2.3 Working with Multiple Messages

6.3 Fixing Structural and Format Issues

6.4 Lab

6.5 Lab Answers

7 AI and Advanced Data Transformations

7.1 Complex Text Processing with Regular Expressions

7.2 Handling Hierarchical and Nested Data Structures

7.3 Entity Resolution

7.4 Time Series and Date-Time Transformations

7.5 Lab

7.6 Lab Answers

8 AI and The Data Lifecycle

8.1 From AI insights to data pipelines

8.1.1 Evolving AI Integration

8.1.2 Understanding ETL and ELT

8.2 Extracting News Data with AI

8.2.1 Extracting the Raw API Payload

8.2.2 Extracting Data with AI

8.3 Transforming News Data with AI

8.3.1 The Transformation Prompt

8.3.2 The AI Data Engineering Code Harness

8.3.3 The Transformation Pipeline

8.4 Loading News Data with AI

8.4.1 The Contract and Prompt

8.4.2 Response Handling

8.5 Lab

8.6 Lab Answers

9 Data Cleaning and Transformation Pipelines in Practice

9.1 Data Orchestration

9.1.1 Apache Airflow

9.1.2 Beyond Scheduling

9.1.3 Task Framework

9.2 Event Driven Architecture

9.2.1 What are events?

9.2.2 Pub/Sub and Beyond

9.3 Pipelines in Practice

9.3.1 Inspecting the Data & Inferring the Schema

9.3.2 Extracting the basics

9.3.3 Data Quality Transformations

9.3.4 Advanced Transformations

9.3.5 Analysis

9.4 Lab

9.5 Lab Answers

Part 3: Generating Data with AI

10 Introduction to Web Scraping

11 Identifying Opportunities for AI-Generated Data

12 Handling Unstructured Data with AI

13 Data Scraping & AI

Part 4: Data Cleaning & Transformation Pipelines with AI

14 Introduction to Agentic Workflows for Data Engineers

15 Generating Subject Matter Expertise with AI

16 SME and Agentic Workflows: Decision Paths and Data Activation

17 Practical Application: AI-Driven Outreach for Marketing and Sales

Appendices

Appendix A: Setting Up Your Environment

Appendix B: Prompt Engineering Reference

Appendix C: Using the OpenAI API

Appendix D: Dataset Index

Appendix E: Troubleshooting Common Errors

Overview

1 Before You Begin

Artificial intelligence has emerged as a defining technological shift, comparable to the rise of the internet and cloud computing. Unlike earlier AI waves limited by compute and rigid rules, today’s scalable models and abundant data are delivering real-world value across industries—raising new questions for creatives, educators, and software professionals alike. This book sets aside debates about replacement to emphasize augmentation: AI enhances human expertise. In data engineering specifically, as AI abstracts repetitive infrastructure tasks, engineers are expected to move closer to business impact, focusing on logic, insight, and outcomes while collaborating with analysts and data scientists across the data lifecycle.

For data engineers, AI already functions as a capable coding companion—generating and scaffolding code, proposing pipeline designs, interfacing naturally with popular Python libraries, and even critiquing prompts or debugging implementations. Its role stretches from automating ingestion and transformations to enforcing data quality, converting unstructured inputs into structured formats, and flagging anomalies. The same tools accelerate adjacent work: they suggest features for data scientists, speed exploratory analysis, translate questions into SQL for analysts, and streamline reporting. Beyond the data stack, familiar applications span assistants, transportation, healthcare, media, finance, translation, and e-commerce, while in data engineering AI also aids governance by detecting inconsistencies, enforcing policies, and generating synthetic datasets for testing.

This book is intended for practitioners who work with data and want to move beyond casual prompting toward programmatic AI for ingestion, transformation, and enrichment at scale. It’s useful to experienced engineers seeking automation, analysts and scientists extracting structure from messy sources, and AI builders operationalizing workflows; while familiarity with SQL, Python, and AI concepts helps, the guidance is hands-on and accessible. Organized in a “Month of Lunches” cadence, chapters progress from coding companions and prompt engineering to transformations, feature extraction, automation, structured data extraction, agentic workflows, and production-grade patterns. Each chapter includes a short lab and a practical setup guide to reduce friction when configuring tools such as a SQL database, a Python notebook environment, and an AI API. By the end, you’ll treat AI not as a shortcut, but as a multi-tool for rapid development, automation of drudgery, and informed human oversight where it matters most.

Being Immediately Effective with AI and Data Engineering

This book is about practical application. Where many books dive deep into LLM architectures and AI theory, this one focuses on making you effective immediately.

By the end of the first few chapters, you’ll be using AI to generate and validate SQL queries, clean and transform datasets, extract insights from unstructured data, automate feature engineering, and integrate AI into your data pipelines. This book is designed to be hands-on, applied, and immediately useful. Let’s get started!

FAQ

What is the main message of “Before You Begin” in Learn AI Data Engineering in a Month of Lunches?

Chapter 1 frames modern AI as a force-multiplier for humans, not a replacement. It encourages using AI to automate drudgery so you can focus on creativity, critical thinking, and business impact—especially in data engineering. Agentic systems are acknowledged, but the emphasis is on human-in-the-loop workflows.

Why does AI matter specifically to data engineering?

AI shifts data engineers closer to business logic by offloading repetitive and infrastructure-heavy tasks. It speeds ingestion and transformation, assists with data quality (e.g., anomaly flagging), converts unstructured inputs to structured formats, and helps maintain cost-effective, scalable workflows.

How can AI help me write, scaffold, and review data engineering code?

Tools like ChatGPT, GitHub Copilot, and Claude can generate scripts, scaffold ETL/ELT pipelines, and provide natural-language interfaces to libraries (pandas, NumPy, scikit-learn). They can also critique prompts, debug issues, and compare implementation options across frameworks—acting as coding companions and reviewers.

How does AI assist different data personas (engineers, scientists, analysts)?

Data engineers: automate pipeline steps, assist coding, flag anomalies, convert unstructured to structured data.
Data scientists: suggest features, speed up EDA, summarize trends, prototype models and hypotheses.
Data analysts: translate English to SQL, automate analysis and summaries, accelerate dashboards, flag trends/anomalies.

Who is this book for, and what prior knowledge helps?

It’s for data engineers, analysts, data scientists, and AI enthusiasts who want to go beyond chat-based tools to programmatic ingestion, transformation, and enrichment at scale. Familiarity with SQL, Python, and basic AI concepts helps, but the hands-on approach keeps it accessible.

How is the book structured, and how should I use it?

It follows the Month of Lunches format: about 40 minutes of reading plus 20 minutes of practice per chapter. Early chapters cover AI coding companions and prompt engineering; middle chapters focus on transformations and automation; later chapters explore structured extraction, agentic workflows, and programmatic AI applications.

What hands-on labs and setup files are provided?

Nearly every chapter includes a lab to build real AI-enhanced data workflows. Each has a dedicated setup guide in the companion GitHub repo with prerequisites, install steps, environment variables, API key management, datasets, and troubleshooting. Browse the setup/ directory for chapter-by-chapter guides.

What tools do I need to install before starting?

You’ll set up PostgreSQL and pgAdmin for SQL, Jupyter Lab for Python, and an OpenAI account for AI tasks. Setup guides: PostgreSQL/pgAdmin: postgres_setup.md, Jupyter Lab: jupyter_setup.md, OpenAI: openai_setup.md.
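Before working through the setup guides, a quick preflight check can show which of these prerequisites are already in place. The following is a minimal sketch, not taken from the book: the function name `check_environment` is hypothetical, and it assumes that the PostgreSQL client is exposed as `psql` on your PATH, that Jupyter Lab is installed as the `jupyterlab` Python package, and that the API key lives in an environment variable named `OPENAI_API_KEY`. Follow the setup guides if they specify something different.

```python
import importlib.util
import os
import shutil


def check_environment(env=None):
    """Return a dict mapping each assumed prerequisite to True (found) or False (missing)."""
    # Allow a plain dict to be passed in for testing; default to the real environment.
    env = os.environ if env is None else env
    return {
        # PostgreSQL command-line client on PATH (assumption: installed alongside the server).
        "psql": shutil.which("psql") is not None,
        # Jupyter Lab importable from this interpreter (assumption: package name "jupyterlab").
        "jupyterlab": importlib.util.find_spec("jupyterlab") is not None,
        # API key present and non-blank (assumption: variable is named OPENAI_API_KEY).
        "OPENAI_API_KEY": bool(env.get("OPENAI_API_KEY", "").strip()),
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'found' if ok else 'missing'}")
```

The check only reports presence. It does not verify that the database server is running or that the API key is valid, so treat a "found" result as a starting point rather than confirmation of a working setup.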

Which AI models does the book use, and what are the alternatives?

The book primarily uses OpenAI GPT models for their strong alignment with data engineering workflows. It also surveys alternatives—Anthropic Claude, Google Gemini (Vertex AI), Meta LLaMA, Mistral, xAI Grok, Cohere Command R, and AI21—so you can choose models based on strengths like safety, multimodality, openness, RAG focus, or ecosystem fit.

What outcomes should I expect after finishing the book?

You’ll treat AI as a practical multi-tool: rapidly prototype data workflows, automate tedious tasks, extract structured data from unstructured sources, improve data quality and governance, and know where human judgment adds the most value.

