Overview
1 Getting to know Kafka as an Architect
Modern architects are moving from brittle point-to-point integrations toward event-driven systems that decouple producers from consumers and turn real-time data into immediate action. This chapter positions Apache Kafka as the backbone for that shift: a durable, high-throughput event streaming platform that fans out events to many consumers with low latency, enabling use cases such as fraud detection, personalized experiences, and operational automation. Rather than focusing on code, the chapter frames the architectural choices—fit, event design, patterns, and governance—that determine sustainable Kafka adoption.
From an architect’s perspective, the move from synchronous request-response (e.g., REST) to event-driven architecture (EDA) trades tight coupling for autonomy and resilience. Services publish events about state changes; interested consumers react asynchronously, tolerate downtime, and scale independently. This freedom introduces new responsibilities: handling eventual consistency, idempotency, and ordering, while balancing latency and operational overhead against business needs.
At its core, Kafka builds reliability on a distributed cluster of brokers and a log-centric storage model. Producers write messages that are durably persisted and replicated; consumers pull and can replay data, enabling recovery and reprocessing. Delivery is governed by acknowledgments and retention policies. Cluster metadata and coordination are managed by KRaft controllers, which provide high availability through leader election and fault detection. The commit log underpins immutability and ordering, turning streams of changes into a source of truth that can be rewound.
Treating data as a contract is central. Schema Registry externalizes message structure and versioning, enforcing compatibility while keeping brokers fast and schema-agnostic. Kafka Connect moves data between Kafka and external systems via configurable connectors, reducing custom code for ingestion and delivery. For transformation and routing, streaming frameworks (such as Kafka Streams or Flink) implement stateless and stateful operations, joins, and exactly-once workflows—placing business logic in a dedicated processing layer instead of bloating producers or consumers.
Operationally, the chapter highlights sizing (topics, partitions, replication), monitoring, security, and governance as first-order design concerns. Teams must weigh on-premises control against managed-cloud convenience, considering performance tuning, upgrade cadence, tooling limits, and total cost of ownership. Finally, it clarifies where Kafka fits: both as a reliable message backbone and as an event store for patterns like event sourcing, while noting its limits as a general-purpose query engine. The result is a pragmatic lens for deciding when and how to introduce Kafka—and how to guide an enterprise through the organizational and technical changes that follow.
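As a minimal sketch of how those sizing decisions surface in practice, a topic can be created with an explicit partition count, replication factor, and durability-related settings through the standard Java Admin client. The topic name, partition count, and retention values below are illustrative assumptions, not recommendations.

```java
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.config.TopicConfig;

public class CreateOrdersTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed cluster address

        try (Admin admin = Admin.create(props)) {
            // Partition count bounds consumer parallelism; replication factor bounds fault tolerance.
            NewTopic orders = new NewTopic("orders", 12, (short) 3)
                    .configs(Map.of(
                            TopicConfig.MIN_IN_SYNC_REPLICAS_CONFIG, "2",    // survive one broker outage with acks=all
                            TopicConfig.RETENTION_MS_CONFIG, "604800000"));  // 7 days of replayable history
            admin.createTopics(List.of(orders)).all().get();
        }
    }
}
```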
1.10 Summary
FAQ
When should I choose Kafka and event-driven architecture over synchronous request–response APIs?
Kafka's pub-sub model excels when multiple services must react to the same change, when low-latency fan-out is needed, and when data volumes are high. It reduces brittle, chained dependencies and enables autonomous evolution of services. Choose it for real-time use cases like fraud detection, personalization, and operational alerts, and when asynchronous resilience matters more than tightly coordinated request flows.
What new challenges does an event-driven approach introduce, and how are they addressed?
EDA removes tight coupling but introduces concerns around eventual consistency, idempotency, and out-of-order delivery. Architects handle these with local copies of data, careful keying and partitioning for ordering where needed, idempotent consumers, and replay to recover from failures. Accepting temporary divergence and designing for convergence is key.
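As one illustration of keying for ordering, a producer can use the entity identifier as the record key so that all events for that entity land on the same partition and are consumed in order. The topic name and account ID below are hypothetical; this is a sketch, not a prescribed design.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class KeyedPaymentProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");   // assumed cluster address
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            String accountId = "acct-42"; // hypothetical entity id used as the record key
            // Records with the same key hash to the same partition, so per-account order is preserved.
            producer.send(new ProducerRecord<>("payments", accountId, "{\"amount\":99.50}"));
        }
    }
}
```

Consumers stay idempotent by tracking an event identifier and ignoring duplicates, which makes retries and replay safe.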
What are the core components of Kafka and how do they interact?
Producers publish messages to topics hosted by brokers; brokers persist and replicate messages for durability; consumers pull (poll) messages from brokers. Applications can be both producers and consumers. Messages are written to disk, and tiered storage can offload older data to cheaper storage while keeping recent data local.
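The pull model can be seen in a minimal consumer loop; the broker never pushes, the client polls at its own pace. The bootstrap address, topic, and consumer group below are assumptions for the sketch.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class OrderEventsConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // assumed cluster address
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "billing-service");          // hypothetical consumer group
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("orders"));
            while (true) {
                // The consumer pulls batches when it is ready; brokers only persist and serve the log.
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            record.partition(), record.offset(), record.value());
                }
            }
        }
    }
}
```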
How does Kafka provide reliable delivery and fault tolerance?
Producers receive acknowledgments from brokers and retry on failure; brokers persist messages to disk and replicate them across the cluster so another broker can take over on failure. Consumers track progress and can resume after interruptions, and replay allows reprocessing within the configured retention period.
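To make those delivery guarantees concrete, a producer can be configured to wait for acknowledgment from all in-sync replicas and to retry idempotently. The settings below are commonly used values offered as a sketch, not a prescription; the topic and record contents are assumptions.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class DurableProducerConfig {
    public static KafkaProducer<String, String> create() {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");    // assumed cluster address
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        props.put(ProducerConfig.ACKS_CONFIG, "all");                   // wait for all in-sync replicas
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");    // retries cannot create duplicates
        props.put(ProducerConfig.DELIVERY_TIMEOUT_MS_CONFIG, "120000"); // total time to keep retrying
        return new KafkaProducer<>(props);
    }

    public static void main(String[] args) {
        try (KafkaProducer<String, String> producer = create()) {
            // send() returns a future; get() surfaces any failure after retries are exhausted.
            producer.send(new ProducerRecord<>("orders", "order-1001", "created")).get();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
```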
What is Kafka’s commit log and why does it matter architecturally?
Kafka appends messages to an ordered, immutable log. This preserves arrival order, supports durable history, and enables replay to rebuild state or recover from errors. Corrections are made by emitting new events, not by mutating or deleting existing ones.
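Replay looks like this in practice: a consumer rewinds its assigned partitions to the beginning and reprocesses history within the retention window. The topic and group names are assumptions, and the single-poll flow is simplified for illustration.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ReplayFromBeginning {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed cluster address
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "audit-replay");            // hypothetical replay group
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("orders"));
            consumer.poll(Duration.ofSeconds(5));             // simplified: first poll joins the group and gets assignments
            consumer.seekToBeginning(consumer.assignment());  // rewind: the log is immutable, so history is intact
            for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(1))) {
                System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
            }
        }
    }
}
```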
How is cluster metadata managed, and what role does KRaft play?
KRaft controllers manage the metadata log, broker registrations, partition assignments, and broker health via heartbeats. One controller is active while others are hot standbys, ensuring fast failover. Servers can run as brokers, controllers, or both; this control plane replaces the need for external coordination in modern deployments.
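One way to observe this control plane from the outside is the Admin API's metadata-quorum description, available in recent Kafka versions (roughly 3.3 and later). The sketch below assumes a reachable KRaft cluster at an assumed address.

```java
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.QuorumInfo;

public class DescribeKraftQuorum {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed cluster address

        try (Admin admin = Admin.create(props)) {
            QuorumInfo quorum = admin.describeMetadataQuorum().quorumInfo().get();
            System.out.println("Active controller (leader): " + quorum.leaderId());
            // Voters replicate the metadata log; the non-leaders are the hot standbys.
            quorum.voters().forEach(v ->
                    System.out.println("Voter " + v.replicaId() + " logEndOffset=" + v.logEndOffset()));
        }
    }
}
```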
Why do I need a Schema Registry if Kafka brokers don’t enforce structure?
Brokers treat messages as opaque bytes for performance, so structure and compatibility must be managed elsewhere. Schema Registry stores versioned schemas; producers register or reference a schema ID embedded in messages, and consumers fetch the schema to deserialize. Compatibility checks and versioning make evolving contracts safer across teams.
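A hedged sketch of the producer side, assuming Confluent's Schema Registry and its Avro serializer are available on the classpath and the registry runs at an assumed URL: the serializer registers or looks up the schema and embeds its ID in each message, while the broker continues to see only bytes.

```java
import java.util.Properties;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;

public class AvroOrderProducer {
    private static final String ORDER_SCHEMA = """
            {"type":"record","name":"OrderCreated","fields":[
              {"name":"orderId","type":"string"},
              {"name":"amount","type":"double"}]}""";

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");   // assumed cluster address
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                "io.confluent.kafka.serializers.KafkaAvroSerializer");          // registers/looks up the schema
        props.put("schema.registry.url", "http://localhost:8081");              // assumed registry URL

        Schema schema = new Schema.Parser().parse(ORDER_SCHEMA);
        GenericRecord event = new GenericData.Record(schema);
        event.put("orderId", "order-1001");
        event.put("amount", 99.50);

        try (KafkaProducer<String, Object> producer = new KafkaProducer<>(props)) {
            // Only producer and consumer interpret the schema ID; compatibility checks happen in the registry.
            producer.send(new ProducerRecord<>("orders", "order-1001", event));
        }
    }
}
```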
What is Kafka Connect, and when should I use it instead of custom code?
Kafka Connect is a configuration-driven framework for moving data between Kafka and external systems using pluggable source/sink connectors. It reduces custom producer/consumer code for common integrations (databases, warehouses, object stores) and runs as its own scalable cluster. Use it to operationalize data pipelines quickly and consistently.
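Because Connect is driven by configuration rather than code, integration usually means submitting a connector definition to the Connect REST API. The sketch below uses the file-source connector that ships with Kafka; the worker URL, file path, and topic name are assumptions.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RegisterFileSourceConnector {
    public static void main(String[] args) throws Exception {
        // Connector configuration: no custom producer code, just declarative settings.
        String connectorJson = """
                {
                  "name": "demo-file-source",
                  "config": {
                    "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
                    "tasks.max": "1",
                    "file": "/var/log/app/events.log",
                    "topic": "app-events"
                  }
                }""";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8083/connectors"))   // assumed Connect worker REST endpoint
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(connectorJson))
                .build();

        HttpResponse<String> response =
                HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```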
Where should transformation or routing logic live in a Kafka-based architecture?
Options include the producer (emit specialized events), each consumer (filter locally), or a dedicated processing layer. Streaming frameworks like Kafka Streams or Apache Flink implement the processing layer, enabling content-based routing, filtering, joins, aggregations, and stateful logic with low latency and exactly-once semantics.
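A small Kafka Streams sketch of content-based routing in such a processing layer; the topic names, application ID, and routing predicate are illustrative assumptions.

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class PaymentRouter {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "payment-router");      // hypothetical application id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");   // assumed cluster address
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> payments = builder.stream("payments");

        // Content-based routing: suspicious events go to one topic, the rest to another.
        payments.filter((key, value) -> value.contains("\"suspicious\":true"))
                .to("payments-flagged");
        payments.filterNot((key, value) -> value.contains("\"suspicious\":true"))
                .to("payments-clean");

        new KafkaStreams(builder.build(), props).start();
    }
}
```

Keeping this logic in a dedicated topology leaves producers emitting plain facts and consumers reading only the topics they care about.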
Can Kafka replace a database for storing state?
Kafka can retain events indefinitely and serve as the system of record for event-sourcing, letting services rebuild state by replaying change logs. However, it’s not optimized for ad hoc queries or complex filtering, so most architectures pair Kafka with databases or projections tailored to query needs. Think of Kafka as the durable event backbone, not a general-purpose query store.
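Rebuilding state by replaying a change log looks roughly like the sketch below, which folds events into an in-memory map that serves as the queryable projection; queries hit the projection, not Kafka. The topic, group, and bounded polling loop are simplifying assumptions.

```java
import java.time.Duration;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class CustomerProjection {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed cluster address
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "customer-projection");     // hypothetical consumer group
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");       // start from the oldest retained event
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);

        Map<String, String> latestByCustomer = new HashMap<>(); // the queryable projection, not Kafka itself

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("customer-events"));
            for (int i = 0; i < 10; i++) { // simplified: a real projection would run continuously
                for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(1))) {
                    latestByCustomer.put(record.key(), record.value()); // fold the change log into current state
                }
            }
            System.out.println("Rebuilt state for " + latestByCustomer.size() + " customers");
        }
    }
}
```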