Overview

1 Getting to know Kafka as an architect

Kafka for Architects opens by positioning Kafka as a foundation for modern, event-driven systems, explaining why architects must think beyond running clusters to designing for decoupling, resilience, and long-term sustainability. The chapter introduces the principles of event-driven architecture, the publish-subscribe model, and Kafka’s log-centric design, and it frames where Kafka excels—high-throughput pipelines, real-time analytics, and enterprise integration. It also outlines the broader ecosystem and the kinds of strategic tradeoffs architects will face when deciding how and when to apply Kafka.

The text contrasts traditional synchronous, request-response integrations with event-driven patterns that favor autonomy, low latency, and fan-out, while acknowledging new responsibilities around eventual consistency, ordering, and idempotency. It summarizes Kafka’s core building blocks—producers, brokers, and consumers; durable storage with replication; acknowledgments and replay; and controllers managing cluster metadata—to show how reliability and scalability are achieved. These concepts are tied to real-world needs such as fraud detection, recommendations, and telemetry, where massive volumes and near-real-time processing make Kafka a compelling choice.
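The mechanics summarized above, an append-only log, pull-based consumption, and replay, can be illustrated with a minimal in-memory sketch. This is plain Python, not the real Kafka client; all class and variable names are illustrative:

```python
# Minimal in-memory sketch of Kafka's core mechanics: an append-only log,
# pull-based consumers tracking their own offsets, and replay from an offset.
# Illustrative only -- not the real Kafka protocol or client API.

class Topic:
    def __init__(self):
        self.log = []  # append-only: records are never edited in place

    def append(self, record) -> int:
        """Producer side: append a record and return its offset."""
        self.log.append(record)
        return len(self.log) - 1

class Consumer:
    def __init__(self, topic: Topic):
        self.topic = topic
        self.offset = 0  # each consumer tracks its own position

    def poll(self, max_records: int = 10):
        """Pull model: the consumer fetches records at its own pace."""
        batch = self.topic.log[self.offset:self.offset + max_records]
        self.offset += len(batch)
        return batch

    def seek(self, offset: int):
        """Replay: rewind to any retained offset and reprocess."""
        self.offset = offset

orders = Topic()
for event in ["created", "paid", "shipped"]:
    orders.append(event)

c = Consumer(orders)
print(c.poll())   # ['created', 'paid', 'shipped']
c.seek(0)         # replay from the beginning
print(c.poll(1))  # ['created']
```

Because consumers own their offsets, two consumers of the same topic can sit at different positions without coordinating, which is what makes independent scaling and reprocessing possible.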

Design and operations receive equal emphasis: schemas act as data contracts managed externally via Schema Registry; Kafka Connect moves data between systems through configuration rather than custom code; and stream processing frameworks (e.g., Kafka Streams, Flink) transform, enrich, and route events in motion. The chapter highlights operational concerns—sizing, monitoring, security, disaster recovery, and the on-premises versus managed-service decision—then discusses two primary usage modes: durable event delivery and long-retention logs that enable event sourcing and state reconstruction, while noting why Kafka does not replace databases. It closes by distilling what makes Kafka different: its immutable, replicated commit log, ecosystem breadth, and suitability as a scalable, reliable backbone for data-centric architectures.

  • Request-response design pattern: a service calls another directly and waits for the reply, coupling the two at runtime.
  • The EDA style of communication: systems communicate by publishing events that describe changes, allowing others to react asynchronously.
  • The key components in the Kafka ecosystem: producers, brokers, and consumers.
  • Structure of a Kafka cluster: brokers handle client traffic; KRaft controllers manage metadata and coordination.
  • Publish-subscribe example: CustomerService publishes a “customer updated” event to a channel; all subscribers receive it independently.
  • Acknowledgments: once the cluster accepts a message, it sends an acknowledgment to the service. If no acknowledgment arrives within the timeout, the service treats the send as failed and retries.
  • Working with Schema Registry: schemas are managed by a separate Schema Registry cluster; messages carry only a schema ID, which clients use to fetch (and cache) the writer schema.
  • The Kafka Connect architecture: connectors integrate Kafka with external systems, moving data in and out.
  • An example of a streaming application: RoutingService implements content-based routing, consuming messages from Addresses and, based on their contents (e.g., address type), publishing them to ShippingAddresses or BillingAddresses.
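The RoutingService example above can be sketched in a few lines. This in-memory stand-in uses a dict of lists in place of Kafka topics; the topic names follow the chapter's example, everything else is illustrative:

```python
# Sketch of content-based routing: RoutingService consumes from an Addresses
# topic and republishes each message to ShippingAddresses or BillingAddresses
# based on its address type. In-memory stand-in for Kafka topics.

from collections import defaultdict

topics = defaultdict(list)  # topic name -> list of messages

def route(message: dict):
    """Choose the destination topic from the message contents."""
    destination = ("ShippingAddresses" if message["type"] == "shipping"
                   else "BillingAddresses")
    topics[destination].append(message)

for msg in [{"type": "shipping", "street": "1 Main St"},
            {"type": "billing", "street": "2 Oak Ave"}]:
    topics["Addresses"].append(msg)

for msg in topics["Addresses"]:
    route(msg)

print(len(topics["ShippingAddresses"]),
      len(topics["BillingAddresses"]))  # 1 1
```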

Summary

  • There are two primary communication patterns between services: request-response and event-driven.
  • In the event-driven approach, services communicate by triggering events.
  • The key components of the Kafka ecosystem include brokers, producers, consumers, Schema Registry, Kafka Connect, and streaming applications.
  • Cluster metadata management is handled by KRaft controllers.
  • Kafka is versatile and well-suited for various industries and use cases, including real-time data processing, log aggregation, and microservices communication.
  • Kafka components can be deployed both on-premises and in the cloud.
  • The platform supports two main use cases: message delivery and state storage.

FAQ

What business problems does Kafka address compared to traditional request-response (REST) integrations?
Kafka decouples producers and consumers, enabling low-latency, asynchronous fan-out without brittle point-to-point dependencies. This reduces coordination complexity and the risk of cascading failures common in chained synchronous calls. Services can evolve and scale independently, improving flexibility, resilience, and autonomy.

How does event-driven architecture (EDA) work with Kafka?
In EDA, producers publish events to a channel (topic), and any number of consumers independently subscribe and react. Communication is asynchronous and reliable, so senders and receivers can be offline at different times. This model supports local copies of data, replay, and independent scaling—while introducing tradeoffs such as latency, idempotency, out-of-order handling, and eventual consistency.

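The fan-out at the heart of this model can be shown with a minimal sketch: one published event reaches every subscriber independently, unlike a point-to-point call chain. Plain Python, with hypothetical handler names; not a Kafka API:

```python
# Sketch of publish-subscribe fan-out: a single "customer updated" event is
# delivered to every registered subscriber, each reacting on its own.

subscribers = []

def subscribe(handler):
    subscribers.append(handler)

def publish(event: dict):
    for handler in subscribers:  # each subscriber gets its own copy
        handler(event)

seen_by_email, seen_by_audit = [], []
subscribe(seen_by_email.append)   # e.g., an email-notification service
subscribe(seen_by_audit.append)   # e.g., an audit-log service

publish({"type": "customer.updated", "id": 42})
print(len(seen_by_email), len(seen_by_audit))  # 1 1
```

Adding a third subscriber requires no change to the publisher, which is the decoupling the FAQ entry describes.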
What are producers, brokers, and consumers in Kafka, and how do they interact?
Producers send messages to a Kafka cluster of brokers; brokers persist and replicate messages; consumers pull messages from brokers by subscribing to topics. Kafka uses a pull model for consumption and stores messages durably (with options for tiered storage). A single application can act as both producer and consumer.

What are KRaft controllers and what role do they play in a Kafka cluster?
KRaft controllers manage cluster metadata and coordination. One controller is active while others are hot standbys. Brokers send heartbeats to the active controller; if a broker fails or the controller becomes unavailable, a new active controller is elected. Controllers maintain the metadata log for changes like topic creation and partition assignments.

How does Kafka ensure reliable delivery and fault tolerance?
Producers receive acknowledgments from brokers and retry on timeouts. Messages are persisted and replicated across brokers to survive node failures. Consumers track progress and can resume after outages; they can also replay retained data to reprocess messages as needed within configured retention periods.

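The acknowledge-or-retry loop on the producer side can be sketched as follows. The `flaky_send` function simulates a broker that fails transiently; both names are illustrative, and this is not the Kafka client's actual retry implementation:

```python
# Sketch of producer-side acknowledgments and retries: a send either returns
# an ack within the timeout or is retried, up to a limit.

def send_with_retries(send, record, retries: int = 3):
    for attempt in range(1, retries + 1):
        try:
            return send(record)        # ack received: success
        except TimeoutError:
            if attempt == retries:     # retries exhausted: surface the failure
                raise

attempts = {"n": 0}

def flaky_send(record):
    """Simulated broker: times out twice, then acknowledges."""
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise TimeoutError("no ack within timeout")
    return {"offset": 0}  # the acknowledged write's position in the log

ack = send_with_retries(flaky_send, b"payment-event")
print(attempts["n"], ack)  # 3 {'offset': 0}
```

Note that retrying after a missed ack can write the same record twice, which is why the chapter pairs retries with idempotency as a design responsibility.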
What is Kafka’s commit log and why is it important?
The commit log is an append-only, ordered sequence of messages stored by brokers. It preserves arrival order and provides immutability—messages aren’t edited or deleted individually. Consumers can replay the log to reconstruct state, making it a strong fit for event sourcing and auditability.

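Reconstructing state from the log is just a fold over the ordered events. A minimal sketch, using a hypothetical account-balance example rather than anything from the chapter:

```python
# Sketch of event sourcing: replaying an immutable, ordered commit log to
# rebuild current state (here, an account balance) from scratch.

log = [  # append-only: order is preserved, entries are never mutated
    {"type": "deposited", "amount": 100},
    {"type": "withdrawn", "amount": 30},
    {"type": "deposited", "amount": 5},
]

def replay(events) -> int:
    """Fold the event sequence into the state it describes."""
    balance = 0
    for e in events:
        balance += e["amount"] if e["type"] == "deposited" else -e["amount"]
    return balance

print(replay(log))  # 75
```

Because the log itself is the source of truth, the same replay can rebuild state after a crash or populate a brand-new read model, within the topic's retention period.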
If brokers don’t interpret messages, how are data contracts enforced?
Schema Registry stores immutable, versioned schemas and assigns each an ID. Producers embed the schema ID in messages; consumers fetch (and cache) schemas to deserialize correctly. Compatibility checks help evolve schemas without breaking consumers, keeping producers and consumers aligned as data changes.

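The embed-an-ID-not-a-schema pattern can be sketched like this. The real Confluent wire format prefixes a magic byte and a 4-byte schema ID and typically carries Avro rather than JSON; this simplified version uses a single ID byte and a plain dict standing in for the registry service:

```python
# Simplified sketch of the Schema Registry pattern: messages carry a small
# schema ID instead of the full schema; consumers look the schema up once
# from the registry and cache it for later messages.

import json

registry = {1: {"fields": ["id", "email"]}}  # immutable, versioned schemas
cache = {}

def serialize(schema_id: int, payload: dict) -> bytes:
    """Producer side: prefix the payload with its schema ID."""
    return bytes([schema_id]) + json.dumps(payload).encode()

def deserialize(message: bytes) -> dict:
    """Consumer side: resolve the writer schema by ID, caching the lookup."""
    schema_id = message[0]
    if schema_id not in cache:          # fetch once, then reuse
        cache[schema_id] = registry[schema_id]
    return json.loads(message[1:])

msg = serialize(1, {"id": 7, "email": "a@b.c"})
print(deserialize(msg)["id"])  # 7
```

Keeping schemas out of the messages keeps payloads small, while the registry remains the single place where compatibility rules are enforced.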
What is Kafka Connect and when should I use it instead of custom code?
Kafka Connect is a framework for configuring data movement between Kafka and external systems (databases, warehouses, object stores) via pluggable source and sink connectors. It runs as a separate cluster and lets teams build pipelines through configuration, avoiding custom producer/consumer code—for example, streaming DB changes into Kafka with a JDBC source and writing them to a target DB with a sink.

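As a sketch of "configuration rather than code," a JDBC source connector streaming new rows from an `orders` table into Kafka might be configured roughly like this. Property names follow Confluent's JDBC source connector; treat the exact names and values as illustrative, and check the connector's documentation before use:

```json
{
  "name": "orders-source",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "connection.url": "jdbc:postgresql://db:5432/shop",
    "table.whitelist": "orders",
    "mode": "incrementing",
    "incrementing.column.name": "id",
    "topic.prefix": "db-"
  }
}
```

Submitting this to the Connect cluster's REST API starts the pipeline; no producer code is written or deployed.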
Where should event transformations happen, and what tools help?
Transformation can occur in the producer, in each consumer, or in a dedicated processing layer. For complex or stateful logic (e.g., filtering, joins, aggregations, windowing, content-based routing), use stream processing frameworks like Kafka Streams or Apache Flink. These integrate with Kafka and support exactly-once semantics.

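To make "stateful logic" concrete, here is a sketch of one such operation: counting events per 60-second tumbling window. Kafka Streams or Flink would run this kind of logic continuously with fault tolerance and exactly-once guarantees; this is plain Python over a fixed list, with illustrative event data:

```python
# Sketch of a stateful stream operation: counting events per 60-second
# tumbling window, keyed by event type.

from collections import Counter

WINDOW = 60  # window size in seconds

def window_counts(events):
    """events: (timestamp_seconds, key) pairs -> Counter keyed by
    (window_start, key)."""
    counts = Counter()
    for ts, key in events:
        window_start = ts - ts % WINDOW   # floor to the window boundary
        counts[(window_start, key)] += 1
    return counts

events = [(3, "click"), (45, "click"), (61, "click"), (75, "view")]
result = window_counts(events)
print(result[(0, "click")], result[(60, "click")], result[(60, "view")])
# 2 1 1
```

A real streaming job must also handle out-of-order and late events, which is exactly the kind of complexity that justifies a dedicated framework over hand-rolled consumer code.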
What operational and deployment decisions should architects plan for?
Key considerations include on-premises versus managed cloud deployments, monitoring, scaling, tuning, security (encryption, authentication, authorization), disaster recovery, and cost estimation (infrastructure, development, operations, and potential licensing). Managed services reduce administration but may limit version control, deep tuning, and tool choices. Alternatives like Pulsar, Kinesis, Pub/Sub, and Event Hubs also exist, and hybrid approaches are possible.
