Next-Level Inference: Why Your Single-Node vLLM Setup Needs Prefill-Decode Disaggregation
TL;DR: Prefill and decode compete for the same GPUs, which causes inter-token latency (ITL) spikes under load. We show how to disaggregate them on a single 8-GPU MI300X node using AMD's MORI-IO connector, achieving 2.5x...