vllm.distributed.kv_transfer.kv_connector.v1.hf3fs.utils.gather_scatter_helper
CopyBufferAllocator
Memory pool for tensor buffers to avoid frequent allocation/deallocation.
Source code in vllm/distributed/kv_transfer/kv_connector/v1/hf3fs/utils/gather_scatter_helper.py
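The pooling idea can be sketched as follows. This is a minimal illustration only: the class name `SimpleBufferPool`, its `alloc`/`free` methods, and the use of `bytearray` in place of tensors are all assumptions, since the actual `CopyBufferAllocator` interface is not shown on this page.

```python
class SimpleBufferPool:
    """Hypothetical fixed-size buffer pool (not the CopyBufferAllocator API)."""

    def __init__(self, buffer_size: int, capacity: int) -> None:
        # Allocate every buffer once, up front.
        self._free = [bytearray(buffer_size) for _ in range(capacity)]

    def alloc(self):
        # Hand out a pre-allocated buffer, or None when the pool is exhausted.
        return self._free.pop() if self._free else None

    def free(self, buf: bytearray) -> None:
        # Return the buffer to the pool so a later alloc() can reuse it.
        self._free.append(buf)


pool = SimpleBufferPool(buffer_size=16, capacity=2)
first = pool.alloc()
second = pool.alloc()
exhausted = pool.alloc()   # pool is empty at this point
pool.free(first)
reused = pool.alloc()      # the same object as `first`, no new allocation
```

Reusing buffers this way trades a fixed up-front memory cost for avoiding repeated allocation on the transfer path.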
gather_kv_caches
gather_kv_caches(
    kv_caches_ptrs: Tensor,
    total_token_in_kvcache: int,
    dst_tensor: Tensor,
    token_indices: list[int],
    is_mla: bool = False,
) -> None
Gather KV cache data from KV cache storage to destination tensor.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `kv_caches_ptrs` | `Tensor` | Tensor of KV cache pointers (one per layer) | required |
| `total_token_in_kvcache` | `int` | Total number of tokens in the KV cache | required |
| `dst_tensor` | `Tensor` | Destination tensor that receives the gathered data. MHA format: `[num_layers, 2, num_tokens_in_block, hidden_size]`; MLA format: `[num_layers, num_tokens_in_block, hidden_size]` | required |
| `token_indices` | `list[int]` | List of token positions to gather | required |
| `is_mla` | `bool` | Whether the model uses the MLA format | `False` |
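The gather semantics can be illustrated with a pure-Python reference model. This is a minimal sketch, not the library's implementation: it works on nested lists rather than device pointers, and the helper name `gather_reference` and the assumed per-layer cache layout (`[2, total_tokens, hidden_size]` for MHA, `[total_tokens, hidden_size]` for MLA) are illustrative assumptions.

```python
def gather_reference(layer_caches, token_indices, is_mla=False):
    """Hypothetical reference model of a KV cache gather.

    layer_caches: one nested list per layer (assumed layout, see lead-in).
    Returns a nested list shaped like the documented destination format.
    """
    dst = []
    for cache in layer_caches:
        if is_mla:
            # MLA: one plane per layer, rows selected by token position.
            dst.append([list(cache[t]) for t in token_indices])
        else:
            # MHA: separate K and V planes, same rows selected from each.
            dst.append(
                [[list(plane[t]) for t in token_indices] for plane in cache]
            )
    return dst


# Toy MHA cache: 2 layers, 4 token slots, hidden_size 2.
# Each value encodes layer * 100 + kv * 10 + token position.
num_layers, total_tokens, hidden_size = 2, 4, 2
mha_caches = [
    [
        [[layer * 100 + kv * 10 + tok] * hidden_size for tok in range(total_tokens)]
        for kv in range(2)
    ]
    for layer in range(num_layers)
]

gathered = gather_reference(mha_caches, token_indices=[3, 1])
```

Here `gathered` has the MHA shape `[num_layers, 2, num_tokens_in_block, hidden_size]` with `num_tokens_in_block == len(token_indices)`, and rows appear in the order given by `token_indices`.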
scatter_kv_caches
scatter_kv_caches(
    kv_caches_ptrs: Tensor,
    total_token_in_kvcache: int,
    src_tensor: Tensor,
    token_indices: list[int],
    is_mla: bool = False,
) -> None
Scatter KV cache data from source tensor to KV cache storage.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `kv_caches_ptrs` | `Tensor` | Tensor of KV cache pointers (one per layer) | required |
| `total_token_in_kvcache` | `int` | Total number of tokens in the KV cache | required |
| `src_tensor` | `Tensor` | Source tensor containing the data to scatter. MHA format: `[num_layers, 2, num_tokens_in_block, hidden_size]`; MLA format: `[num_layers, num_tokens_in_block, hidden_size]` | required |
| `token_indices` | `list[int]` | List of token positions to update | required |
| `is_mla` | `bool` | Whether the model uses the MLA format | `False` |
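Scatter is the inverse of gather: rows of the source tensor are written into the cache at the listed token positions. The sketch below models this in pure Python. It is not the library's implementation; the helper name `scatter_reference` and the assumed per-layer cache layout (`[2, total_tokens, hidden_size]` for MHA, `[total_tokens, hidden_size]` for MLA) are illustrative assumptions.

```python
def scatter_reference(layer_caches, src, token_indices, is_mla=False):
    """Hypothetical reference model of a KV cache scatter (in place)."""
    for cache, layer_src in zip(layer_caches, src):
        if is_mla:
            # MLA: write each source row to its token position.
            for row, t in zip(layer_src, token_indices):
                cache[t] = list(row)
        else:
            # MHA: write K rows into the K plane and V rows into the V plane.
            for plane, plane_src in zip(cache, layer_src):
                for row, t in zip(plane_src, token_indices):
                    plane[t] = list(row)


# Toy MLA cache: 1 layer, 4 token slots, hidden_size 2, initially zero.
mla_caches = [[[0, 0] for _ in range(4)]]

# Source in MLA format [num_layers=1, num_tokens_in_block=2, hidden_size=2].
src = [[[7, 8], [9, 10]]]

scatter_reference(mla_caches, src, token_indices=[2, 0], is_mla=True)
```

Only the positions named in `token_indices` are modified; every other slot in the cache keeps its previous contents.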