vllm.attention.backends

Modules:

| Name | Description |
| --- | --- |
| abstract | |
| blocksparse_attn | |
| cpu_mla | |
| dual_chunk_flash_attn | Attention layer with Dual chunk flash attention and sparse attention. |
| flash_attn | Attention layer with FlashAttention. |
| flashinfer | |
| flashmla | |
| hpu_attn | |
| ipex_attn | Attention layer with torch scaled_dot_product_attention. |
| mla | |
| pallas | |
| placeholder_attn | |
| rocm_aiter_mla | |
| rocm_flash_attn | Attention layer for ROCm GPUs. |
| torch_sdpa | Attention layer with torch scaled_dot_product_attention. |
| triton_mla | |
| utils | Attention backend utils. |
| xformers | Attention layer with xFormers and PagedAttention. |
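Several of these backends (for example `torch_sdpa` and `ipex_attn`) are described as wrappers around PyTorch's `torch.nn.functional.scaled_dot_product_attention`. The sketch below is a minimal, plain-PyTorch illustration of that underlying primitive, not vLLM's backend code; tensor shapes and the causal setting are assumptions chosen for the example.

```python
# Minimal sketch of the PyTorch primitive referenced by the torch_sdpa-style
# backends above. This is plain PyTorch, not vLLM's backend implementation.
import torch
import torch.nn.functional as F

batch, heads, seq_len, head_dim = 1, 8, 128, 64  # example shapes (assumed)
q = torch.randn(batch, heads, seq_len, head_dim)
k = torch.randn(batch, heads, seq_len, head_dim)
v = torch.randn(batch, heads, seq_len, head_dim)

# Causal self-attention over the sequence; the backends listed here differ
# mainly in how this computation is scheduled and how the KV cache is managed.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([1, 8, 128, 64])
```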