vllm.v1.attention.backends.mla.prefill ¶
Modules:
| Name | Description |
|---|---|
| `base` | Abstract base class for MLA prefill backends. |
| `flash_attn` | FlashAttention backend for MLA prefill. |
| `flashinfer` | FlashInfer backend for MLA prefill. |
| `registry` | Registry for MLA prefill backends. |
| `selector` | Selector for MLA prefill backends. |
| `trtllm_ragged` | TRT-LLM Ragged backend for MLA prefill. |
MLAPrefillBackend ¶
Bases: ABC
Abstract base class for MLA prefill backends.
Source code in vllm/v1/attention/backends/mla/prefill/base.py
prepare_metadata ¶
prepare_metadata(
    prefill_metadata: MLACommonPrefillMetadata,
) -> None
Prepare backend-specific metadata before the forward pass.
Called by the metadata builder after constructing the prefill metadata.
Source code in vllm/v1/attention/backends/mla/prefill/base.py
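The hook pattern described for `prepare_metadata` can be illustrated with a minimal, self-contained sketch. It does not import vLLM: `ToyPrefillMetadata`, `ToyPrefillBackend`, `ToyCumulativeBackend`, and `build_prefill_metadata` are hypothetical stand-ins, and the choice to give the hook a no-op default is an assumption, since this page does not say whether the real method is abstract.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class ToyPrefillMetadata:
    """Hypothetical stand-in for MLACommonPrefillMetadata."""

    query_lens: list[int]
    # Scratch space a backend may fill in before the forward pass.
    backend_state: dict = field(default_factory=dict)


class ToyPrefillBackend(ABC):
    """Hypothetical stand-in for MLAPrefillBackend."""

    @abstractmethod
    def name(self) -> str:
        """Return a short identifier for this backend."""

    def prepare_metadata(self, prefill_metadata: ToyPrefillMetadata) -> None:
        # Assumed default: backends with nothing to precompute do nothing.
        return None


class ToyCumulativeBackend(ToyPrefillBackend):
    """Toy backend that precomputes cumulative sequence offsets."""

    def name(self) -> str:
        return "toy_cumulative"

    def prepare_metadata(self, prefill_metadata: ToyPrefillMetadata) -> None:
        offsets = [0]
        for length in prefill_metadata.query_lens:
            offsets.append(offsets[-1] + length)
        prefill_metadata.backend_state["cu_seqlens"] = offsets


def build_prefill_metadata(
    backend: ToyPrefillBackend, query_lens: list[int]
) -> ToyPrefillMetadata:
    """Toy metadata builder: construct the metadata, then call the hook."""
    metadata = ToyPrefillMetadata(query_lens=list(query_lens))
    backend.prepare_metadata(metadata)
    return metadata
```

In this sketch the builder owns the call order: the metadata object is constructed first and the backend mutates it in place, which matches the `-> None` return type of the documented signature.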
MLAPrefillBackendEnum ¶
Bases: Enum
Enumeration of all supported MLA prefill backends.
Source code in vllm/v1/attention/backends/mla/prefill/registry.py
get_class ¶
get_class() -> type[MLAPrefillBackend]
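One common way for an enum member to hand back a class is to store a dotted path as the member value and import it on demand. The sketch below shows that pattern only; it is an assumption, not a description of how `MLAPrefillBackendEnum.get_class` is implemented. `ToyBackendEnum` and its members are hypothetical, and standard-library classes stand in for backend classes so the example runs without vLLM.

```python
import importlib
from enum import Enum


class ToyBackendEnum(Enum):
    """Hypothetical enum whose values are "module.ClassName" paths."""

    # Standard-library classes are used as placeholders for backend classes.
    ORDERED = "collections.OrderedDict"
    COUNTER = "collections.Counter"

    def get_class(self) -> type:
        # Split "pkg.module.ClassName" into module path and class name,
        # then import the module only when the class is requested.
        module_name, _, class_name = self.value.rpartition(".")
        module = importlib.import_module(module_name)
        return getattr(module, class_name)
```

Deferring the import until `get_class()` is called keeps optional dependencies from being loaded just because the enum was imported.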
get_mla_prefill_backend ¶
get_mla_prefill_backend(
    vllm_config: VllmConfig,
) -> type[MLAPrefillBackend]
Select the MLA prefill backend based on configuration and device.
This function first checks for an explicit user preference, set through `mla_prefill_backend` in `AttentionConfig`. If none is set, it falls back to automatic priority-based selection.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `vllm_config` | `VllmConfig` | The vLLM configuration. | required |
Returns:
| Type | Description |
|---|---|
| `type[MLAPrefillBackend]` | The selected prefill backend class. |
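The two-stage selection (explicit preference first, then a priority-ordered fallback) can be sketched as follows. Everything here is hypothetical: `ToyBackend`, the names `fast` and `portable`, the `min_capability` field, and the priority order are invented for illustration and say nothing about which real backend is preferred or what hardware it needs. Raising an error when an explicit preference is unsupported is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ToyBackend:
    """Hypothetical backend descriptor."""

    name: str
    min_capability: int  # invented requirement, for illustration only


# Hypothetical priority list, highest priority first.
PRIORITY = [
    ToyBackend("fast", min_capability=100),
    ToyBackend("portable", min_capability=80),
]


def select_backend(preferred: Optional[str], device_capability: int) -> ToyBackend:
    """Pick a backend: honor an explicit preference, else walk the priority list."""
    by_name = {backend.name: backend for backend in PRIORITY}

    # Stage 1: an explicit user preference wins if it is valid and supported.
    if preferred is not None:
        if preferred not in by_name:
            raise ValueError(f"unknown backend: {preferred}")
        backend = by_name[preferred]
        if backend.min_capability > device_capability:
            raise ValueError(f"backend {preferred} is not supported on this device")
        return backend

    # Stage 2: automatic selection, first supported backend in priority order.
    for backend in PRIORITY:
        if backend.min_capability <= device_capability:
            return backend
    raise ValueError("no supported backend found")
```

For example, `select_backend(None, 90)` skips `fast` and returns the `portable` descriptor, while `select_backend("portable", 100)` returns `portable` even though `fast` would rank higher in automatic selection.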