3 documents are tagged "prefill"

Prefill and Decode

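How the two inference phases differ in compute characteristics (prefill is compute-bound, decode is memory-bound), and why that motivates PD (prefill/decode) disaggregation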

Key points:

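Chapter scope: taking an LLM that has finished training and alignment and putting it to work generating tokens. The engineering essentials at this stage are: the two-phase prefill / decode characteristics → KV cache management → sampling algorithms → quantization for acceleration. This is the main engineering thread that takes a model from "runs in the lab" to "handles 100K QPS in production".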
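
To make the compute-bound vs memory-bound distinction concrete, here is a minimal sketch that estimates the arithmetic intensity (FLOPs per byte moved) of a single linear layer. Everything in it is an assumption for illustration, not taken from the tagged documents: the layer size (4096 x 4096), fp16 weights and activations, a 2048-token prompt, batch size 1 during decode, and the hypothetical hardware figures `peak_flops` and `mem_bandwidth`. It also ignores attention and KV cache traffic.

```python
def arithmetic_intensity(n_tokens, d_in=4096, d_out=4096, bytes_per_elem=2):
    """FLOPs per byte moved for one linear layer applied to n_tokens tokens."""
    # One multiply and one add per weight, per token.
    flops = 2 * n_tokens * d_in * d_out
    # Weights are read once per forward pass, regardless of token count.
    weight_bytes = d_in * d_out * bytes_per_elem
    # Input and output activations scale with the number of tokens.
    activation_bytes = n_tokens * (d_in + d_out) * bytes_per_elem
    return flops / (weight_bytes + activation_bytes)


def classify(n_tokens, peak_flops=300e12, mem_bandwidth=2e12):
    """Compare intensity with the hardware ridge point (hypothetical numbers)."""
    ridge = peak_flops / mem_bandwidth  # FLOPs per byte where the two limits meet
    if arithmetic_intensity(n_tokens) >= ridge:
        return "compute-bound"
    return "memory-bound"


# Prefill processes the whole prompt in one pass; decode processes one new token.
for phase, n_tokens in [("prefill", 2048), ("decode", 1)]:
    print(phase, round(arithmetic_intensity(n_tokens), 2), classify(n_tokens))
```

Under these assumptions prefill amortizes each weight read over thousands of tokens, while decode re-reads every weight to produce a single token. That asymmetry is the usual argument for scheduling the two phases differently, for example by batching decode requests or running the phases on separate workers.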