Overview

1 What is deep learning?

Artificial intelligence has surged into public consciousness, often wrapped in grand promises and dire warnings. This chapter cuts through the noise with clear definitions and a pragmatic view of what deep learning is, what it does well, and where its limits lie—equipping you to separate meaningful progress from hype and to participate in building these systems responsibly.

AI is the broad effort to automate intellectual tasks; historically it began with symbolic, rule-based systems. Machine learning flipped the paradigm: instead of handcrafting rules, it learns them from data by optimizing performance on examples. Central to this is representation learning—finding transformations of data that make a task easier. Deep learning is the modern approach that stacks many such transformations in layers (neural networks), progressively distilling information into features that are highly predictive for the task.

Training a deep network means adjusting millions or billions of parameters (weights) so that the model’s predictions match targets. A loss function measures error; an optimizer uses feedback from that loss to nudge weights in better directions via backpropagation. Repeating this loop over vast datasets yields models that perform remarkably well across perception, language, and decision-making tasks.
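The loop just described can be sketched in miniature. The snippet below is an illustrative toy, not a real network: a single-weight linear model, mean squared error as the loss, and plain gradient descent standing in for the optimizer.

```python
# A minimal sketch of the training loop described above: a one-weight
# linear model y = w * x, mean squared error as the loss, and plain
# gradient descent as the optimizer. Real networks have millions of
# weights and backpropagate through many layers, but the feedback loop
# is the same shape.

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # targets follow y = 2x
w = 0.0                                       # initial weight
learning_rate = 0.05

for epoch in range(200):
    # Gradient of the mean squared error with respect to w
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    w -= learning_rate * grad                 # nudge the weight downhill

print(round(w, 3))  # converges to 2.0, the rule hidden in the data
```

Each pass over the data shrinks the loss a little; after a couple hundred iterations the weight settles at the value that best explains the examples.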

Deep learning’s success rests on three pillars: simplicity (it automates feature engineering, replacing complex pipelines with end-to-end models), scalability (it thrives on parallel hardware and massive data), and versatility (models can be continually refined and repurposed—forming the basis of foundation models). Generative AI exemplifies this shift: trained with self-supervised objectives that turn unlabeled data into a learning signal, these systems act like fuzzy knowledge bases that can be steered by prompting.

Results have been striking—fluent chatbots and code assistants, photorealistic image generation, human-level transcription and image classification, improved translation and recommendation, autonomous driving in several cities, and accelerating science (e.g., protein structure prediction). Yet the chapter urges skepticism toward near-term AGI claims: today’s systems are powerful cognitive automation, not autonomous intelligence. History warns that hype cycles can cool; while a severe “AI winter” seems unlikely given tangible value, expectations and investment may need to realign with reality. The pragmatic takeaway: deep learning is transformative and here to stay, but progress is uneven—measure it by delivered capabilities, not forecasts.

1.12 The promise of AI

Although we may have unrealistic short-term expectations for AI, the long-term picture is looking bright. We’re only getting started in applying deep learning to many important problems for which it could prove transformative, from medical diagnoses to digital assistants.

In 2017, in this very book, I wrote:

Right now, it may seem hard to believe that AI could have a large impact on our world, because it isn’t yet widely deployed – much as, back in 1995, it would have been difficult to believe in the future impact of the internet. Back then, most people didn’t see how the internet was relevant to them and how it was going to change their lives. The same is true for deep learning and AI today. But make no mistake: AI is coming. In a not-so-distant future, AI will be your assistant, even your friend; it will answer your questions, help educate your kids, and watch over your health. It will deliver your groceries to your door and drive you from point A to point B. It will be your interface to an increasingly complex and information-intensive world. And, even more important, AI will help humanity as a whole move forward, by assisting human scientists in new breakthrough discoveries across all scientific fields, from genomics to mathematics.

Fast-forward to 2025: most of these things have either come true or are on the verge of doing so – and this is just the beginning.

The AI revolution, once a distant vision, is now rapidly unfolding before our eyes. On the way, we may face a few setbacks – in much the same way the internet industry was overhyped in 1998–1999 and suffered from a crash that dried up investment throughout the early 2000s. But we’ll get there eventually. AI will end up being applied to nearly every process that makes up our society and our daily lives, much like the internet is today.

Don’t believe the short-term hype, but do believe in the long-term vision. It may take a while for AI to be deployed to its true potential – a potential the full extent of which no one has yet dared to dream – but AI is coming, and it will transform our world in a fantastic way.


FAQ

How do AI, machine learning, and deep learning relate to each other?
Artificial intelligence (AI) is the broad effort to automate intellectual tasks normally performed by humans. Machine learning (ML) is a subfield of AI where systems learn rules from data rather than being hardcoded. Deep learning (DL) is a subfield of ML that learns multiple successive layers of increasingly abstract representations, typically using neural networks.

How is machine learning different from traditional programming?
In traditional programming, humans write explicit rules that map inputs to outputs. In machine learning, the computer infers the rules by analyzing many input–output examples. The system is trained on data and discovers statistical patterns that let it generalize to new cases.
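The contrast can be made concrete with a deliberately tiny, hypothetical task (deciding whether a package needs extra postage). The "learning" here is reduced to finding a single threshold, which is nothing like real training, but it shows the inversion: the rule comes out of the examples rather than out of a programmer's head.

```python
# Traditional programming: a human writes the rule explicitly.
def needs_extra_postage_rule(weight_kg):
    return weight_kg > 2.0  # threshold chosen by a person

# Machine learning (toy version): infer the threshold from labeled
# examples instead of hardcoding it.
examples = [(0.5, False), (1.8, False), (2.2, True), (3.1, True)]

def learn_threshold(examples):
    # Simplest possible "training": put the boundary halfway between
    # the heaviest negative and the lightest positive example.
    heaviest_no = max(w for w, label in examples if not label)
    lightest_yes = min(w for w, label in examples if label)
    return (heaviest_no + lightest_yes) / 2

threshold = learn_threshold(examples)  # 2.0, discovered from data

def needs_extra_postage_learned(weight_kg):
    return weight_kg > threshold
```

Both functions end up behaving the same way; the difference is where the rule came from.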

What are the essential ingredients of a machine-learning setup?
- Input data points (for example, images, audio, or text)
- Expected outputs or targets (labels, transcripts, tags, etc.)
- A way to measure performance (a loss/metric) that provides feedback to improve the model during training

What does “learning representations” mean, and why does it matter?
A representation is an alternative way of encoding data that makes a task easier (for example, RGB vs. HSV for color). Learning representations means automatically transforming raw inputs into forms that make simple rules sufficient to solve the task. Good representations turn hard problems into easy ones.
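The RGB-vs-HSV example can be made concrete with the standard library. "Is this color reddish?" is clumsy to express on raw RGB values, but after re-encoding the same data as HSV, a single comparison on the hue channel suffices (the 30° tolerance below is an arbitrary choice for illustration).

```python
import colorsys

def is_reddish(r, g, b):
    # Same data, new representation: re-encode RGB as HSV, then a
    # simple rule on the hue channel solves the task.
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    return h < 1/12 or h > 11/12  # hue within 30 degrees of pure red

print(is_reddish(0.9, 0.1, 0.1))  # True: a strong red
print(is_reddish(0.1, 0.1, 0.9))  # False: blue
```

Here a human picked the transformation; deep learning's contribution is finding such task-friendly transformations automatically, layer by layer.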

What does the “deep” in deep learning refer to?
“Deep” refers to models with many successive layers, each learning a more abstract representation than the previous one. Stacking layers creates a hierarchy that distills useful information for the task. Despite the name “neural,” these models are mathematical constructs, not brain simulators.

How do neural networks learn in practice?
Each layer has parameters called weights. A loss function measures how far predictions are from targets. An optimizer uses backpropagation to adjust weights in the direction that reduces loss. Starting from random weights, repeating this feedback loop over many examples (epochs) yields a trained model.
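Backpropagation itself can be sketched for the smallest possible "deep" model: two stacked scalar weights and one training example. This is a hand-rolled illustration, not a real framework's machinery, and it omits the nonlinearities real layers have, but the chain rule that carries the loss gradient backward through each layer is the genuine article.

```python
# Backpropagation through two stacked layers, for a single scalar
# example: prediction = w2 * (w1 * x). The chain rule sends the loss
# gradient backward through each layer's weight in turn.

x, target = 1.0, 4.0
w1, w2 = 0.5, 0.5                # initial weights
lr = 0.05

for step in range(500):
    h = w1 * x                   # forward pass, layer 1
    pred = w2 * h                # forward pass, layer 2
    loss = (pred - target) ** 2  # how wrong are we?

    dpred = 2 * (pred - target)  # gradient of loss w.r.t. prediction
    dw2 = dpred * h              # backprop into layer 2's weight
    dh = dpred * w2              # gradient flowing back into h
    dw1 = dh * x                 # backprop into layer 1's weight

    w1 -= lr * dw1               # optimizer step
    w2 -= lr * dw2

print(round(w1 * w2, 2))  # the two layers jointly converge: product is 4.0
```

The key point is that `dh` lets the error signal pass through layer 2 to reach layer 1; with millions of weights and dozens of layers, frameworks automate exactly this bookkeeping.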

What makes deep learning especially impactful compared to earlier ML?
- Simplicity: It automates feature engineering, often replacing complex pipelines with end-to-end models.
- Scalability: Training parallelizes well on GPUs/accelerators and scales with data and compute.
- Versatility and reusability: Models can be updated with more data and repurposed (fine-tuned) across tasks; large “foundation models” enable broad transfer.

What is generative AI, and how does self-supervised learning fit in?
Generative AI models produce text, images, audio, or code by learning to reconstruct or predict parts of their inputs (for example, the next word in a sentence). This self-supervised setup uses unlabeled data at massive scale, creating foundation models that can be adapted or prompted to perform many tasks.
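The essence of self-supervision is that the training signal comes from the data itself: the "label" for each word is simply the word that follows it. A bigram count table, as below, is a very distant ancestor of next-token language models, but it makes the point that no human annotation is involved.

```python
from collections import Counter, defaultdict

# Self-supervision sketch: the target for each word is the next word
# in unlabeled text, so raw text alone provides the training signal.
corpus = "the cat sat on the mat the cat ran".split()

next_word_counts = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    next_word_counts[current][nxt] += 1   # learn from raw text

def predict_next(word):
    # Most frequent continuation seen during "training"
    return next_word_counts[word].most_common(1)[0][0]

print(predict_next("the"))  # "cat" (seen twice, vs. "mat" once)
```

Modern generative models replace the count table with billions of learned weights, but the objective (predict the next token) is the same idea at vastly greater scale.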

What has deep learning achieved so far?
Notable breakthroughs include: versatile chatbots (e.g., ChatGPT, Gemini), coding assistants, photorealistic image generation, human-level image and speech recognition, strong handwriting/OCR, much improved machine translation and text-to-speech, production autonomous driving in several cities, better recommenders, and superhuman play in games like Go and Chess.

Should we believe the hype about near-term AGI? Could there be an “AI winter”?
Today’s systems excel at cognitive automation under well-defined objectives, not open-ended human-like intelligence. Near-term AGI claims deserve skepticism. Hype cycles can lead to disappointment and funding pullbacks (“AI winters”), though given current real-world value, any future slowdown would likely be mild rather than a collapse.
