vllm.attention.selector
_cached_get_attn_backend (cached)
_cached_get_attn_backend(
    head_size: int,
    dtype: dtype,
    kv_cache_dtype: Optional[str],
    block_size: int,
    is_attention_free: bool,
    is_blocksparse: bool = False,
    use_v1: bool = False,
    use_mla: bool = False,
) -> Type[AttentionBackend]
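The cached badge indicates that backend resolution is memoized: it runs once per unique argument tuple, and later calls with the same arguments return the stored backend class. A minimal sketch of the pattern, assuming functools.lru_cache stands in for whatever decorator vLLM's source actually uses:

```python
# Illustrative sketch only -- not vLLM's exact source. It shows why
# repeated lookups with identical arguments are cheap: the body runs
# only on a cache miss.
from functools import lru_cache

@lru_cache(maxsize=None)
def _select_backend_expensively(head_size: int, block_size: int) -> str:
    # Stands in for the real platform probing / lazy-import logic.
    print("resolving backend ...")  # runs only on a cache miss
    return "FLASH_ATTN" if head_size % 8 == 0 else "TORCH_SDPA"

_select_backend_expensively(64, 16)  # prints, resolves
_select_backend_expensively(64, 16)  # cache hit: no print, same result
```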
backend_name_to_enum
Convert a string backend name to a _Backend enum value.
Returns:
- _Backend: enum value if backend_name is a valid in-tree type
- None: otherwise (the name is not a valid in-tree type, or an out-of-tree platform is loaded)
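For example (assuming "FLASH_ATTN" is a valid in-tree backend name in your vLLM build):

```python
from vllm.attention.selector import backend_name_to_enum

# A known in-tree name maps to its _Backend enum member ...
assert backend_name_to_enum("FLASH_ATTN") is not None
# ... while an unrecognized name yields None.
assert backend_name_to_enum("NOT_A_BACKEND") is None
```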
get_attn_backend
get_attn_backend(
    head_size: int,
    dtype: dtype,
    kv_cache_dtype: Optional[str],
    block_size: int,
    is_attention_free: bool,
    is_blocksparse: bool = False,
    use_mla: bool = False,
) -> Type[AttentionBackend]
Selects which attention backend to use and lazily imports it.
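A usage sketch (the argument values are chosen for illustration only). Because resolution is routed through the cached _cached_get_attn_backend above, repeated calls with identical arguments return the same class object:

```python
import torch
from vllm.attention.selector import get_attn_backend

# Resolve a backend for fp16 attention with 64-dim heads and
# 16-token KV-cache blocks.
backend = get_attn_backend(
    head_size=64,
    dtype=torch.float16,
    kv_cache_dtype=None,
    block_size=16,
    is_attention_free=False,
)
same = get_attn_backend(
    head_size=64,
    dtype=torch.float16,
    kv_cache_dtype=None,
    block_size=16,
    is_attention_free=False,
)
assert backend is same  # served from the cache, not re-selected
```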
get_env_variable_attn_backend
Get the backend override specified by the vLLM attention backend environment variable, if set.
Returns:
- _Backend enum value if an override is specified
- None otherwise
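In current vLLM releases the variable consulted is VLLM_ATTENTION_BACKEND; a sketch:

```python
import os

# Must be set before backend selection runs; "FLASH_ATTN" is one of the
# recognized in-tree backend names.
os.environ["VLLM_ATTENTION_BACKEND"] = "FLASH_ATTN"

from vllm.attention.selector import get_env_variable_attn_backend

override = get_env_variable_attn_backend()
print(override)  # e.g. _Backend.FLASH_ATTN, or None if the variable is unset
```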
get_global_forced_attn_backend
Get the currently-forced choice of attention backend, or None if auto-selection is currently enabled.
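For example:

```python
from vllm.attention.selector import get_global_forced_attn_backend

forced = get_global_forced_attn_backend()
if forced is None:
    print("auto-selection is enabled")
else:
    print(f"attention backend is forced to {forced}")
```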
global_force_attn_backend
Force all attention operations to use a specified backend. Passing None for the argument re-enables automatic backend selection.
Arguments:
- attn_backend: backend selection (None to revert to auto)
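A sketch pairing the setter with the getter above; the import path of the _Backend enum is an assumption and has moved between vLLM versions:

```python
from vllm.attention.selector import (
    get_global_forced_attn_backend,
    global_force_attn_backend,
)
from vllm.platforms.interface import _Backend  # import path is an assumption

global_force_attn_backend(_Backend.XFORMERS)  # force XFORMERS everywhere
assert get_global_forced_attn_backend() is _Backend.XFORMERS

global_force_attn_backend(None)               # revert to auto-selection
assert get_global_forced_attn_backend() is None
```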
global_force_attn_backend_context_manager
Globally force a vLLM attention backend override within a context manager, reverting to the prior override state upon exiting the context manager.
Arguments:
- attn_backend: attention backend to force
Returns:
- Generator
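A usage sketch (again assuming _Backend's import path):

```python
from vllm.attention.selector import (
    get_global_forced_attn_backend,
    global_force_attn_backend_context_manager,
)
from vllm.platforms.interface import _Backend  # import path is an assumption

with global_force_attn_backend_context_manager(_Backend.XFORMERS):
    # Inside the block, attention ops resolve to the forced backend.
    assert get_global_forced_attn_backend() is _Backend.XFORMERS

# On exit the prior override (here: none, i.e. auto-selection) is restored.
assert get_global_forced_attn_backend() is None
```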