vllm.distributed.kv_transfer.kv_connector_agent
A centralized entrypoint to perform distributed KV cache transfer.
This implementation is a thin shim around two APIs exposed by kv_connector:
1. send_kv_caches_and_hidden_states
2. recv_kv_caches_and_hidden_states
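The delegation pattern described above can be illustrated with a minimal, self-contained sketch. It does not import vLLM. `InMemoryConnector` and `ToyTransferAgent` are hypothetical stand-ins, plain Python objects replace tensors, and the meaning given to the returned bool (whether a cached entry was found) is an assumption made for illustration only.

```python
from typing import Any


class InMemoryConnector:
    """Hypothetical connector: keeps payloads in a dict keyed by model_input."""

    def __init__(self) -> None:
        self.store: dict[Any, tuple[list, Any]] = {}
        self.closed = False

    def send_kv_caches_and_hidden_states(
        self, model_executable, model_input, kv_caches, hidden_or_intermediate_states
    ) -> None:
        # Copy the list so later mutation by the caller does not affect the store.
        self.store[model_input] = (list(kv_caches), hidden_or_intermediate_states)

    def recv_kv_caches_and_hidden_states(
        self, model_executable, model_input, kv_caches
    ) -> tuple[Any, bool, Any]:
        entry = self.store.get(model_input)
        if entry is None:
            # Nothing stored for this input (assumed meaning of the bool: not found).
            return None, False, model_input
        saved_kv, hidden = entry
        kv_caches[:] = saved_kv  # fill the caller's buffers in place
        return hidden, True, model_input

    def close(self) -> None:
        self.closed = True


class ToyTransferAgent:
    """Hypothetical shim: forwards every call to the wrapped connector."""

    def __init__(self, connector: InMemoryConnector) -> None:
        self.connector = connector

    def send_kv_caches_and_hidden_states(
        self, model_executable, model_input, kv_caches, hidden_or_intermediate_states
    ) -> None:
        self.connector.send_kv_caches_and_hidden_states(
            model_executable, model_input, kv_caches, hidden_or_intermediate_states
        )

    def recv_kv_caches_and_hidden_states(
        self, model_executable, model_input, kv_caches
    ) -> tuple[Any, bool, Any]:
        return self.connector.recv_kv_caches_and_hidden_states(
            model_executable, model_input, kv_caches
        )

    def close(self) -> None:
        self.connector.close()
```

Keeping the agent free of transfer logic means that, in a design like this one, swapping the transport only requires a different connector object.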
KVTransferAgent
A class dedicated to distributed KV cache transfer.
Target use cases:
- Disaggregated prefill
- Remote KV cache storage
Source code in vllm/distributed/kv_transfer/kv_connector_agent.py
__init__
__init__(rank: int, local_rank: int, config: VllmConfig)
close
recv_kv_caches_and_hidden_states
recv_kv_caches_and_hidden_states(
    model_executable: Module,
    model_input: ModelInputForGPUWithSamplingMetadata,
    kv_caches: list[Tensor],
) -> tuple[
    Union[Tensor, IntermediateTensors],
    bool,
    ModelInputForGPUWithSamplingMetadata,
]
send_kv_caches_and_hidden_states
send_kv_caches_and_hidden_states(
    model_executable: Module,
    model_input: ModelInputForGPUWithSamplingMetadata,
    kv_caches: list[Tensor],
    hidden_or_intermediate_states: Union[
        Tensor, IntermediateTensors
    ],
) -> None
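
For the disaggregated prefill use case, the send and receive calls might be combined in a single model step roughly as sketched below. This is an illustration under stated assumptions, not vLLM's model runner. `run_step`, `StubAgent`, the `is_producer` / `is_consumer` flags, and the `forward` callable are all hypothetical names, and the returned bool is assumed to indicate that the forward pass can be skipped.

```python
from typing import Any, Callable


class StubAgent:
    """Hypothetical agent exposing the two documented method names."""

    def __init__(self) -> None:
        self.sent = None  # last (model_input, kv_caches, hidden) payload

    def send_kv_caches_and_hidden_states(
        self, model_executable, model_input, kv_caches, hidden_or_intermediate_states
    ) -> None:
        self.sent = (model_input, list(kv_caches), hidden_or_intermediate_states)

    def recv_kv_caches_and_hidden_states(self, model_executable, model_input, kv_caches):
        if self.sent is None or self.sent[0] != model_input:
            return None, False, model_input
        kv_caches[:] = self.sent[1]  # fill the caller's buffers in place
        return self.sent[2], True, model_input


def run_step(
    agent: Any,
    is_producer: bool,
    is_consumer: bool,
    forward: Callable[[Any, list], Any],
    model_executable: Any,
    model_input: Any,
    kv_caches: list,
) -> Any:
    """Run one step, receiving before and sending after the forward pass."""
    hidden = None
    skip_forward = False
    if is_consumer:
        # Assumed semantics: the bool says the received state makes forward redundant.
        hidden, skip_forward, model_input = agent.recv_kv_caches_and_hidden_states(
            model_executable, model_input, kv_caches
        )
    if not skip_forward:
        hidden = forward(model_input, kv_caches)
    if is_producer:
        agent.send_kv_caches_and_hidden_states(
            model_executable, model_input, kv_caches, hidden
        )
    return hidden
```

In this sketch a producer instance runs the forward pass and publishes its result, while a consumer instance first tries to receive and only falls back to computing locally when nothing was received.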