vllm.attention.ops.hpu_paged_attn
HPUPagedAttention
Source code in vllm/attention/ops/hpu_paged_attn.py
copy_blocks staticmethod
get_kv_cache_shape staticmethod
get_supported_head_sizes staticmethod
split_kv_cache staticmethod
swap_blocks staticmethod
```python
swap_blocks(
    src_kv_cache: Tuple[Tensor, Tensor],
    dst_kv_cache: Tuple[Tensor, Tensor],
    src_to_dsts: Tensor,
) -> None
```
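A minimal usage sketch, assuming the per-layer KV cache is a (key, value) tensor pair with a leading block dimension and that `src_to_dsts` is an `[N, 2]` integer tensor of (source block, destination block) index pairs; the shapes, dtypes, and device placement below are illustrative assumptions, not the backend's actual layout.

```python
import torch
from vllm.attention.ops.hpu_paged_attn import HPUPagedAttention

# Illustrative cache geometry (assumption, not the real HPU layout).
num_blocks, block_size, num_kv_heads, head_size = 16, 128, 8, 64
cache_shape = (num_blocks, block_size, num_kv_heads, head_size)

# Per-layer caches as (key, value) tensor pairs, e.g. for a device -> host swap.
src_kv_cache = (torch.zeros(cache_shape), torch.zeros(cache_shape))
dst_kv_cache = (torch.zeros(cache_shape), torch.zeros(cache_shape))

# Assumed mapping format: each row is (src_block_idx, dst_block_idx).
src_to_dsts = torch.tensor([[0, 5], [1, 9]], dtype=torch.int64)

# Copies the listed source blocks of both key and value caches into the
# corresponding destination blocks; returns None.
HPUPagedAttention.swap_blocks(src_kv_cache, dst_kv_cache, src_to_dsts)
```

In vLLM this path is typically driven by the cache engine when sequences are swapped between device and host memory.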
write_to_paged_cache staticmethod
```python
write_to_paged_cache(
    key: Tensor,
    value: Tensor,
    key_cache: Tensor,
    value_cache: Tensor,
    slot_mapping: Tensor,
    kv_cache_dtype: str,
    is_prompt: bool,
) -> None
```
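A minimal sketch of scattering freshly computed key/value tensors into the paged caches. It assumes token-major key/value activations, block-major caches, and that `slot_mapping` assigns each token a flat cache slot; the `kv_cache_dtype="auto"` value and all shapes are illustrative assumptions.

```python
import torch
from vllm.attention.ops.hpu_paged_attn import HPUPagedAttention

num_tokens, num_kv_heads, head_size = 4, 8, 64
num_blocks, block_size = 16, 128

# New K/V for the current batch of tokens (assumed token-major layout).
key = torch.randn(num_tokens, num_kv_heads, head_size)
value = torch.randn(num_tokens, num_kv_heads, head_size)

# Paged caches (assumed block-major layout).
key_cache = torch.zeros(num_blocks, block_size, num_kv_heads, head_size)
value_cache = torch.zeros(num_blocks, block_size, num_kv_heads, head_size)

# Assumption: slot_mapping[i] is the flat slot (block_idx * block_size + offset)
# chosen by the block manager for token i.
slot_mapping = torch.tensor([0, 1, 2, 3], dtype=torch.int64)

HPUPagedAttention.write_to_paged_cache(
    key,
    value,
    key_cache,
    value_cache,
    slot_mapping,
    kv_cache_dtype="auto",  # assumption: "auto" keeps the model's dtype
    is_prompt=True,         # True for the prefill (prompt) phase
)
```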
HPUPagedAttentionMetadata dataclass
Metadata for PagedAttention.