vllm.v1.worker.utils
gather_mm_placeholders
Reconstructs the embeddings from the placeholder tokens.
This is the inverse operation of [scatter_mm_placeholders][].
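Conceptually, the gather step selects the masked rows of the placeholder tensor. The sketch below is a minimal reimplementation for illustration, not the exact vLLM source; the name `gather_mm_placeholders_sketch` and the commented shapes are assumptions.

```python
from typing import Optional

import torch

def gather_mm_placeholders_sketch(
    placeholders: torch.Tensor,        # (num_placeholders, embed_dim)
    is_embed: Optional[torch.Tensor],  # (num_placeholders,) boolean mask
) -> torch.Tensor:
    # Without a mask, every placeholder position holds a real embedding.
    if is_embed is None:
        return placeholders
    # Keep only the rows that the scatter step filled with embeddings.
    return placeholders[is_embed]
```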
initialize_kv_cache_for_kv_sharing
initialize_kv_cache_for_kv_sharing(
shared_kv_cache_layers: dict[str, str],
kv_cache_groups: list[KVCacheGroupSpec],
kv_caches: dict[str, Tensor],
) -> None
Sets up KV cache sharing by reusing the KV caches already allocated in
`kv_caches` for layers that do not allocate their own KV cache, based on the
mapping in `shared_kv_cache_layers`. Also adds these layers to the
corresponding KV cache group, which is needed to ensure that attention
metadata is assigned later.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
`shared_kv_cache_layers` | `dict[str, str]` | Layer pairings for cross-layer KV sharing. If an Attention layer `layer_name` is a key of this dict, it performs attention using the keys and values from the KV cache of `shared_kv_cache_layers[layer_name]`. | required |
`kv_cache_groups` | `list[KVCacheGroupSpec]` | The KV cache groups of the model. | required |
`kv_caches` | `dict[str, Tensor]` | The allocated KV caches, with layer names as keys. Layers in `shared_kv_cache_layers.keys()` are not included, since `kv_caches` only contains layers that allocate their own KV cache. | required |
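A minimal sketch of the aliasing this function performs, assuming hypothetical layer names and cache shapes. The real function also appends the shared layers to their `KVCacheGroupSpec`, which is omitted here.

```python
import torch

# Hypothetical layer names and cache shapes, for illustration only.
kv_caches = {
    "model.layers.0.attn": torch.zeros(16, 2, 8, 64),
    "model.layers.1.attn": torch.zeros(16, 2, 8, 64),
}

# Layer 2 reuses the KV cache of layer 0, so it has no allocation of its own.
shared_kv_cache_layers = {"model.layers.2.attn": "model.layers.0.attn"}

# The core of the sharing step: alias each dependent layer to its target
# layer's tensor instead of allocating new memory.
for layer_name, target_layer_name in shared_kv_cache_layers.items():
    kv_caches[layer_name] = kv_caches[target_layer_name]

# Both names now refer to the same underlying storage.
assert kv_caches["model.layers.2.attn"] is kv_caches["model.layers.0.attn"]
```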
sanity_check_mm_encoder_outputs
sanity_check_mm_encoder_outputs(
mm_embeddings: MultiModalEmbeddings,
expected_num_items: int,
) -> None
Perform sanity checks on the result of
`vllm.model_executor.models.SupportsMultiModal.get_multimodal_embeddings`.
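The checks boil down to verifying that the encoder returned one 2-D embedding tensor per multimodal item. A sketch under the assumption that the embeddings arrive as a sequence of `torch.Tensor`s; the function name and exact messages are illustrative, not the vLLM source.

```python
from collections.abc import Sequence

import torch

def sanity_check_sketch(
    mm_embeddings: Sequence[torch.Tensor],
    expected_num_items: int,
) -> None:
    # One embedding tensor is expected per multimodal item in the request;
    # a mismatch points at a bug in the model's encoder.
    assert len(mm_embeddings) == expected_num_items, (
        f"Expected {expected_num_items} multimodal items, "
        f"got {len(mm_embeddings)} embedding tensors")
    # Each item's embeddings should be 2-D: (num_tokens, hidden_size).
    assert all(e.ndim == 2 for e in mm_embeddings), (
        "Each embedding tensor should have shape (num_tokens, hidden_size)")
```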
scatter_mm_placeholders
Scatter the multimodal embeddings into a contiguous tensor that represents the placeholder tokens, as indicated by `vllm.multimodal.processing.PromptUpdateDetails.is_embed`.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
`embeds` | `Tensor` | The multimodal embeddings. Shape: `(num_embeds, embed_dim)` | required |
`is_embed` | `Optional[Tensor]` | A boolean mask indicating which positions in the placeholder tokens need to be filled with multimodal embeddings. Shape: `(num_placeholders,)` | required |
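A minimal sketch of the scatter, together with a round trip through the gather step above. The NaN sentinel for unfilled positions and the helper name are assumptions for illustration, not the exact vLLM source.

```python
from typing import Optional

import torch

def scatter_mm_placeholders_sketch(
    embeds: torch.Tensor,              # (num_embeds, embed_dim)
    is_embed: Optional[torch.Tensor],  # (num_placeholders,) boolean mask
) -> torch.Tensor:
    # Without a mask, the embeddings already cover every placeholder position.
    if is_embed is None:
        return embeds
    # Positions not selected by the mask keep a NaN sentinel value.
    placeholders = embeds.new_full(
        (is_embed.shape[0], embeds.shape[-1]),
        fill_value=torch.nan,
    )
    placeholders[is_embed] = embeds
    return placeholders

# Round trip: gathering the masked rows recovers the original embeddings.
embeds = torch.randn(3, 8)
is_embed = torch.tensor([True, False, True, False, True])
placeholders = scatter_mm_placeholders_sketch(embeds, is_embed)
assert torch.equal(placeholders[is_embed], embeds)
```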