vllm.v1.attention.backends.fa_utils ¶
flash_attn_supports_mla ¶
Source code in vllm/v1/attention/backends/fa_utils.py
get_flash_attn_version ¶
Source code in vllm/v1/attention/backends/fa_utils.py
is_flash_attn_varlen_func_available ¶
is_flash_attn_varlen_func_available() -> bool
Check if flash_attn_varlen_func is available.
This function determines whether the flash_attn_varlen_func imported at module level is a working implementation or a stub.
Platform-specific sources:

- CUDA: vllm.vllm_flash_attn.flash_attn_varlen_func
- XPU: ipex_ops.flash_attn_varlen_func
- ROCm: upstream flash_attn.flash_attn_varlen_func (if available)
Note: This is separate from the AITER flash attention backend (rocm_aiter_fa.py) which uses rocm_aiter_ops.flash_attn_varlen_func. The condition to use AITER is handled separately via _aiter_ops.is_aiter_found_and_supported().
Returns:
| Name | Type | Description |
|---|---|---|
| bool | bool | True if a working flash_attn_varlen_func implementation is available. |
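The sketch below is a minimal, illustrative usage example (not taken from the vLLM source) showing how a caller might guard the varlen flash-attention path on this availability check; the helper name select_attention_impl and the returned labels are hypothetical, and only the is_flash_attn_varlen_func_available import reflects the API documented above.

```python
# Minimal usage sketch, assuming only the availability check documented here.
from vllm.v1.attention.backends.fa_utils import (
    is_flash_attn_varlen_func_available,
)


def select_attention_impl() -> str:
    """Illustrative helper: pick an attention code path by availability."""
    if is_flash_attn_varlen_func_available():
        # The module-level flash_attn_varlen_func is a working implementation:
        # on CUDA it comes from vllm.vllm_flash_attn, on XPU from ipex_ops,
        # and on ROCm from upstream flash_attn (when installed).
        return "flash_attn_varlen"
    # Otherwise the module-level symbol is only a stub; fall back to another
    # backend (e.g. the AITER backend on ROCm, which is selected separately).
    return "fallback"
```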