vllm.attention.backends
Modules:
| Name | Description |
|---|---|
| abstract | |
| blocksparse_attn | |
| cpu_mla | |
| dual_chunk_flash_attn | Attention layer with dual chunk flash attention and sparse attention. |
| flash_attn | Attention layer with FlashAttention. |
| flashinfer | |
| flashmla | |
| hpu_attn | |
| ipex_attn | Attention layer with torch scaled_dot_product_attention. |
| mla | |
| pallas | |
| placeholder_attn | |
| rocm_aiter_mla | |
| rocm_flash_attn | Attention layer for ROCm GPUs. |
| torch_sdpa | Attention layer with torch scaled_dot_product_attention. |
| triton_mla | |
| utils | Attention backend utils. |
| xformers | Attention layer with xFormers and PagedAttention. |
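
vLLM normally picks one of these backends automatically based on the platform and model, but the choice can typically be overridden with the `VLLM_ATTENTION_BACKEND` environment variable. A minimal sketch, assuming a standard vLLM installation where that variable is honored and the backend name `FLASH_ATTN` maps to the flash_attn module above:

```python
import os

# Select the attention backend before vLLM initializes the engine.
# Other common values (e.g. XFORMERS, FLASHINFER) map to their
# respective backend modules listed in the table above.
os.environ["VLLM_ATTENTION_BACKEND"] = "FLASH_ATTN"

from vllm import LLM, SamplingParams

# Build an engine; the selected backend is used for attention computation.
llm = LLM(model="facebook/opt-125m")
outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=16))
print(outputs[0].outputs[0].text)
```

If the requested backend is unavailable on the current hardware, vLLM falls back to or errors out with a platform-appropriate alternative, so the override is best treated as a hint rather than a guarantee.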