Quantization and Fast Inference: A practitioner's guide to efficient AI
Paperback
$59.99
Premium Members save an extra 10%, and all Members collect stamps to save with Rewards. 10 stamps = $5.
Get the eBook free when you register your print book at Manning.
Today's AI models demand a lot of memory, compute, and server horsepower, which quickly translates into cost. This book shows you how to optimize AI models without architectural redesigns or task-specific compression. It reveals practical techniques for quantization, systematically reducing numerical precision to achieve faster inference, lower memory usage, and cheaper deployment, all with minimal accuracy loss.
From quantiz...






















