Overview

1 Introduction

This chapter introduces the book’s mission: to equip practitioners to build low-latency applications with clear definitions, practical techniques, and end-to-end mental models. It defines latency as the time between a cause and its observed effect, a lens that applies from user interactions to kernel-level packet handling. The text motivates why latency matters: slowdowns degrade user experience, compound across layers, and can determine business outcomes. It sets the stage by contrasting latency with related concepts and by positioning the work as a practical, comprehensive guide that balances theory with hands-on diagnosis and remediation.

Latency is measured in time units and manifests across the stack, from nanoseconds in CPU caches and DRAM to microseconds for SSD/NVMe access and milliseconds across networks (e.g., intercontinental round trips). Physical limits, notably the speed of light, constrain how low some latencies can go. Real examples illustrate compounding delays: a web page load spans DNS, network transit, server work, dependent services, and client rendering; even a light switch reveals variability and user-perceived lag. Human perception anchors targets: roughly 100 ms feels instantaneous, ~1 s can still feel responsive with feedback, and operations beyond ~10 s need progress cues or streaming to keep users engaged.
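
To make these constants tangible, the following minimal sketch (our own illustration, not code from the chapter, assuming a C++17 toolchain on a typical desktop) times a chain of dependent DRAM reads with std::chrono::steady_clock and reports the average per-read latency in nanoseconds:

    #include <chrono>
    #include <cstddef>
    #include <cstdio>
    #include <vector>

    int main() {
        // A buffer much larger than typical CPU caches, so reads mostly hit DRAM.
        std::vector<std::size_t> next(1 << 24);
        // Build a dependent chain: each read yields the index of the next read,
        // so the CPU cannot overlap the loads. (A randomly shuffled chain would
        // defeat the hardware prefetcher more reliably than this fixed stride.)
        for (std::size_t i = 0; i < next.size(); ++i)
            next[i] = (i + 12345) % next.size();

        const int reads = 1000000;
        std::size_t idx = 0;
        auto start = std::chrono::steady_clock::now();
        for (int i = 0; i < reads; ++i)
            idx = next[idx];  // one dependent memory read per iteration
        auto end = std::chrono::steady_clock::now();

        auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count();
        // Averaging over a million reads tames clock and scheduling noise, which
        // would swamp any attempt to time a single nanosecond-scale event.
        // Printing idx also keeps the compiler from optimizing the loop away.
        std::printf("average read latency: ~%.1f ns (idx=%zu)\n",
                    static_cast<double>(ns) / reads, idx);
    }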

The chapter distinguishes latency from bandwidth and throughput: bandwidth is capacity, throughput is the achieved delivery rate, while latency is the delay per request, and improving one can trade off against another. It highlights the principle that bandwidth can often be added, but high latency is harder to mask, and it illustrates latency–throughput trade-offs via pipelining: higher overall throughput can increase per-item latency. Beyond user experience and real-time requirements (hard vs. soft), efficiency also drives latency work: free hardware speedups have waned with the end of Dennard scaling and the shift to multicore, so software efficiency matters more than ever. Finally, it addresses latency–energy trade-offs: techniques like busy polling can minimize delay but raise power draw, while sleep/wake strategies save energy at the cost of responsiveness; the optimal choice depends on workload patterns and goals.

[Figure: the length of a nanosecond, a wire about 30 cm long, the distance light travels in one nanosecond. Source: https://americanhistory.si.edu/collections/search/object/nmah_692464]
Processing without pipelining. We first perform step W (washing) in full and only then perform step D (drying). As W takes 30 minutes and D takes 60 minutes, the two steps together take 90 minutes. Therefore, we say that the latency to wash and dry a load of clothes is 90 minutes and the throughput is 1/90 loads of laundry per minute.
Processing with pipelining. We perform step W (washing) in full, and as soon as its load moves into the dryer, we start washing the next load. In parallel, we perform step D (drying) on the previously washed load. Because D is the bottleneck, once the pipeline is full a washed load waits 30 minutes for the dryer, so the latency of a single load rises to 120 minutes (30 minutes washing, 30 minutes waiting, 60 minutes drying), worse than without pipelining. However, a load now completes every 60 minutes, so throughput increases to 1/60 loads of laundry per minute, which means we can complete four loads in the time the non-pipelined process completes three.
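
The arithmetic in these captions can be checked mechanically. The sketch below is our own illustration, assuming the same numbers as above (W = 30 minutes, D = 60 minutes, four loads, and a washer that holds its finished load until the dryer is free); it prints each load's completion time and latency under both schemes:

    #include <algorithm>
    #include <cstdio>

    int main() {
        const int W = 30, D = 60;  // wash and dry times in minutes
        const int loads = 4;

        // Without pipelining: each load occupies us for W + D minutes.
        for (int i = 1; i <= loads; ++i)
            std::printf("sequential: load %d done at %3d min (latency %d min)\n",
                        i, i * (W + D), W + D);

        // With pipelining: the washer hands a finished load to the dryer as soon
        // as the dryer is free, then immediately starts washing the next load.
        int dryer_free = 0;
        int wash_start = 0;
        for (int i = 1; i <= loads; ++i) {
            int wash_done = wash_start + W;
            int dry_start = std::max(wash_done, dryer_free);  // wait for the dryer
            dryer_free = dry_start + D;
            std::printf("pipelined:  load %d done at %3d min (latency %d min)\n",
                        i, dryer_free, dryer_free - wash_start);
            wash_start = dry_start;  // the next wash begins at the handoff
        }
    }

Running it shows the pipelined loads finishing at 90, 150, 210, and 270 minutes (one every 60 minutes, with a steady-state latency of 120 minutes), while the sequential loads finish at 90, 180, 270, and 360 minutes.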

Summary

  • Latency is the time delay between a cause and its observed effect.
  • Latency is measured in units of time.
  • You need to understand the latency constants of your system when designing for low latency.
  • Latency matters because people expect a real-time experience.
  • When optimizing for latency, there are sometimes throughput and energy efficiency trade-offs.

FAQ

What is latency?
Latency is the time delay between a cause and its observed effect. Practically, it is the elapsed time from when an action is initiated to when its outcome becomes observable. The exact “cause” and “effect” depend on context, which is why measuring latency varies by scenario.

What contributes to end-to-end latency in a web request?
From pressing Enter in the browser to seeing the page, latency accumulates across many steps: DNS lookup, TCP/TLS setup, client–server network transit, server processing (including calls to databases or external services), response transfer, and browser rendering (including executing JavaScript and issuing additional requests). Each step adds to the total user-perceived delay.

How is latency measured and what are typical scales?
Latency is measured in units of time. Typical scales span nanoseconds (CPU caches and DRAM), microseconds (NVMe/SSD access), and milliseconds (network round trips). For example, DRAM access is ~100 ns, NVMe reads ~10 μs, and a New York–London round trip ~60 ms.

What physical limits constrain how low latency can go?
The speed of light sets a hard lower bound on how fast information can travel. Real systems are slower than the theoretical limit (for example, light in fiber travels slower than in a vacuum). This is why distance and co-location matter for tight latency targets.

Why does latency matter for user experience?
Human perception provides useful thresholds: responses under ~100 ms feel instantaneous; around 1 s still feels fast but noticeable; beyond ~10 s feels slow and often requires feedback like progress indicators. Even small degradations can reduce engagement and conversions, so lower latency can translate into business gains.

What is the difference between latency, bandwidth, and throughput?
Latency is the time it takes for a request to go from start to observable finish. Bandwidth is the maximum data capacity of a channel per unit time. Throughput is the actual achieved data or request rate. Bandwidth sets an upper bound on throughput, but both are distinct from latency.

Why can you add bandwidth but still be stuck with latency?
You can increase bandwidth by adding more links or capacity, which can raise throughput. High latency, however, is often inherent to the path, distance, or protocol constraints; it cannot be “scaled out” the same way and must be addressed at its sources.

What is the trade-off between latency and throughput (pipelining example)?
Pipelining can increase throughput by overlapping stages, but it may increase the time to complete a single item. In the laundry analogy, washing and drying in parallel raises loads-per-hour (throughput) but can make the time for one load longer (latency). Systems often must choose which to optimize.

How do hard and soft real-time requirements differ?
Hard real-time systems must meet strict deadlines; missing one is a system failure (for example, pacemakers or safety-critical sensors). Soft real-time systems tolerate occasional misses with quality degradation (for example, audio/video streaming), not catastrophic failure.

How does optimizing for low latency interact with energy efficiency?
Latency and energy goals can conflict. Techniques like busy polling reduce scheduling delays and improve latency but may waste power when idle. Sleep–wake strategies save energy but add wake-up latency. For frequent, predictable workloads, busy polling can be both faster and more energy-efficient overall; for sporadic traffic, sleeping often saves energy at the cost of higher latency.
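
To make that last trade-off concrete, here is a minimal C++17 sketch of ours (not the book's code) contrasting the two waiting strategies: one thread spins on an atomic flag and reacts almost immediately at the cost of a busy CPU core, while the other sleeps on a condition variable, saving energy but paying wake-up latency:

    #include <atomic>
    #include <chrono>
    #include <condition_variable>
    #include <mutex>
    #include <thread>

    std::atomic<bool> ready{false};
    std::mutex m;
    std::condition_variable cv;

    // Busy polling: occupies a CPU core the whole time it waits, but observes
    // the flag within nanoseconds of it flipping to true.
    void wait_busy() {
        while (!ready.load(std::memory_order_acquire)) {
            // spin
        }
    }

    // Sleep/wake: the thread yields the CPU and draws little power while idle,
    // but pays scheduler and signaling overhead (microseconds or more) to wake.
    void wait_sleepy() {
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [] { return ready.load(std::memory_order_acquire); });
    }

    void signal_ready() {
        {
            // Storing under the lock avoids a lost wakeup for the sleeper.
            std::lock_guard<std::mutex> lk(m);
            ready.store(true, std::memory_order_release);
        }
        cv.notify_one();  // no effect on the busy poller, which sees the store
    }

    int main() {
        std::thread waiter(wait_busy);  // swap in wait_sleepy to compare
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
        signal_ready();
        waiter.join();
    }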
