NVIDIA Dynamo
NVIDIA Dynamo is an open-source framework for distributed LLM inference. It can run vLLM on Kubernetes with flexible serving architectures (e.g., aggregated or disaggregated serving, with an optional router and planner).
For Kubernetes deployment instructions and examples (including vLLM), see the Deploying Dynamo on Kubernetes guide.
Background reading: InfoQ news coverage, "NVIDIA Dynamo Simplifies Kubernetes Deployment for LLM Inference".