NVIDIA Dynamo

NVIDIA Dynamo is an open-source framework for distributed LLM inference. It can run vLLM on Kubernetes in flexible serving architectures, such as aggregated or disaggregated serving, with an optional router and planner.

For Kubernetes deployment instructions and examples (including vLLM), see the Deploying Dynamo on Kubernetes guide.

Background reading: InfoQ news coverage, "NVIDIA Dynamo simplifies Kubernetes deployment for LLM inference".