Manning Early Access Program (MEAP): Read chapters as they are written, get the finished eBook as soon as it's ready, and receive the pBook long before it's in bookstores.

4 of 12 chapters available


Quantization and Fast Inference

A practitioner’s guide to efficient AI

Vivek Kalyanarangan

MEAP began May 2026
Last updated June 2026
Publication in Early 2027 (estimated)

ISBN 9781633433915
350 pages (estimated)

Included with a Manning Online subscription

printed in black & white

pro $24.99 per month

access to all Manning books, MEAPs, liveVideos, liveProjects, and audiobooks!
choose one free eBook per month to keep
exclusive 50% discount on all purchases
renews monthly, pause or cancel renewal anytime

lite $19.99 per month

access to all Manning books, including MEAPs!

team

5, 10 or 20 seats+ for your team - learn more

eBook

pdf, ePub, online

$35.99 (regular price $47.99)

you save $12.00 (25%)

Today's AI models demand a lot of memory, compute, and server horsepower, which quickly translates into cost. Quantization and Fast Inference shows you how to optimize AI models without architectural redesigns or task-specific compression. It teaches practical quantization techniques that systematically reduce numerical precision to achieve faster inference, lower memory usage, and cheaper deployment, all with minimal accuracy loss.

From quantization fundamentals to runtime packaging, the book covers the full quantization pipeline. It starts by deriving the quantization mapping from first principles, then builds your skills through production-tested PTQ and QAT workflows, ending with a fully compressed deployment. You'll learn to apply post-training quantization (PTQ) to production models, run quantization-aware training (QAT) using fake quantization and straight-through estimators, and handle subtle tradeoffs such as activation outliers in LLMs, KV cache pressure, and sub-8-bit formats like NF4 and FP4.

what's inside

Applying post-training quantization to production models
Deploying efficiently on CPUs, edge devices, and mobile
Framework-agnostic techniques and real cross-framework parity testing
Flowcharts and checklists for efficient decision making

about the reader

For ML engineers and researchers experienced in Python.

about the author

Vivek Kalyanarangan is an AI/ML architect, researcher, and educator with over twelve years of experience designing and deploying large-scale machine learning systems.

choose your plan

pro

monthly: $24.99
annual: $249.99 (only $20.83 per month)

access to all Manning books, MEAPs, liveVideos, liveProjects, and audiobooks!
choose another free product every time you renew
choose twelve free products per year
exclusive 50% discount on all purchases
renews monthly or annually, pause or cancel renewal anytime
Quantization and Fast Inference ebook for free

team

monthly: $49.99
annual: $499.99 (only $41.67 per month)

five seats for your team
access to all Manning books, MEAPs, liveVideos, liveProjects, and audiobooks!
choose another free product every time you renew
choose twelve free products per year
exclusive 50% discount on all purchases
renews monthly or annually, pause or cancel renewal anytime
Quantization and Fast Inference ebook for free
