Manning Early Access Program (MEAP)
Read chapters as they are written, get the finished eBook as soon as it’s ready, and receive the pBook long before it's in bookstores.
Today's AI models demand a lot of memory, compute, and server horsepower, which quickly translates into cost. Quantization and Fast Inference shows you how to optimize AI models without architectural redesigns or task-specific compression. It teaches practical techniques for quantization, the systematic reduction of numerical precision, to achieve faster inference, lower memory usage, and cheaper deployment, all with minimal accuracy loss.
From quantization fundamentals to runtime packaging, the book covers the full quantization pipeline. It starts by deriving the quantization mapping from first principles, then builds your skills through production-tested post-training quantization (PTQ) and quantization-aware training (QAT) workflows and a fully compressed deployment. You'll learn to apply PTQ to production models, run QAT using fake quantization and straight-through estimators, and handle subtle tradeoffs such as activation outliers in LLMs, KV cache pressure, and sub-8-bit formats like NF4 and FP4.
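To make the idea of a quantization mapping concrete, here is a minimal sketch of asymmetric (affine) quantization to unsigned 8-bit integers in plain Python. It is not taken from the book: the function names `quantize_params`, `quantize`, and `dequantize`, the unsigned 8-bit range, and the sample weights are all assumptions chosen for illustration.

```python
def quantize_params(x_min, x_max, num_bits=8):
    """Compute a scale and zero point for an unsigned affine mapping (illustrative)."""
    qmin, qmax = 0, 2 ** num_bits - 1
    # Widen the range to include 0.0 so that real zero maps to an exact integer.
    x_min = min(x_min, 0.0)
    x_max = max(x_max, 0.0)
    scale = (x_max - x_min) / (qmax - qmin)
    if scale == 0.0:
        scale = 1.0  # degenerate range: every value is zero
    zero_point = round(qmin - x_min / scale)
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point


def quantize(values, scale, zero_point, num_bits=8):
    """Map floats to integers: q = clamp(round(x / scale) + zero_point)."""
    qmin, qmax = 0, 2 ** num_bits - 1
    return [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]


def dequantize(q_values, scale, zero_point):
    """Map integers back to approximate floats: x ~ scale * (q - zero_point)."""
    return [scale * (q - zero_point) for q in q_values]


# Hypothetical weights, used only to exercise the mapping.
weights = [-1.0, -0.5, 0.0, 0.25, 1.0]
scale, zero_point = quantize_params(min(weights), max(weights))
q_weights = quantize(weights, scale, zero_point)
restored = dequantize(q_weights, scale, zero_point)
```

Each restored value differs from its original by at most about one quantization step (`scale`), which is the precision traded away for storing one byte per weight instead of four.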
what's inside
Applying post-training quantization to production models
Deploying efficiently on CPUs, edge devices, and mobile
Framework-agnostic techniques and real cross-framework parity testing
Flowcharts and checklists for efficient decision making
about the reader
For ML engineers and researchers experienced in Python.
about the author
Vivek Kalyanarangan is an AI/ML architect, researcher, and educator with over twelve years of experience designing and deploying large-scale machine learning systems.
Introductory offer: save 50% for a limited time!
eBook (pdf, ePub, online): $47.99, now $23.99; you save $24.00 (50%)
print (includes eBook): $59.99, now $29.99; you save $30.00 (50%)
with subscription: free or 50% off
pro, $24.99 per month: access to all Manning books, MEAPs, liveVideos, liveProjects, and audiobooks!