vllm.model_executor.models.utils

WeightsMapping (module-attribute)

If a key maps to a value of None, the corresponding weight is ignored.
_model_to_pp_missing_layer_names (module-attribute)
AutoWeightsLoader

Helper class to load weights into a torch.nn.Module. It is able to automatically detect child modules and parameters while iterating over the weights only once.

The weight loading logic for individual modules can be overridden by defining a load_weights method.

Similarly, the weight loading logic for individual parameters can be overridden by defining a weight_loader method.

Detailed weight loading information can be viewed by setting the environment variable VLLM_LOGGING_LEVEL=DEBUG.
Source code in vllm/model_executor/models/utils.py
ROTARY_EMBEDS_UNUSED_WEIGHTS (class-attribute, instance-attribute)
ROTARY_EMBEDS_UNUSED_WEIGHTS = [
"rotary_emb.inv_freq",
"rotary_emb.cos_cached",
"rotary_emb.sin_cached",
]
ignore_unexpected_prefixes (instance-attribute)

__init__
__init__(
module: Module,
*,
skip_prefixes: Optional[list[str]] = None,
skip_substrs: Optional[list[str]] = None,
ignore_unexpected_prefixes: Optional[list[str]] = None,
) -> None
Source code in vllm/model_executor/models/utils.py
_add_loadable_non_param_tensors

Add tensor names that are not model parameters but may appear in the safetensors checkpoint, e.g., batch normalization statistics.
Source code in vllm/model_executor/models/utils.py
_can_ignore_unexpected

_can_skip

_get_qualname

_groupby_prefix
_groupby_prefix(
weights: Iterable[tuple[str, Tensor]],
) -> Iterable[tuple[str, Iterable[tuple[str, Tensor]]]]
Source code in vllm/model_executor/models/utils.py
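The grouping step can be sketched in plain Python. This is an illustrative reimplementation, not the vLLM source: it splits each weight name at the first dot and groups consecutive weights that share the same top-level prefix, which is what lets the loader dispatch each group to the matching child module in a single pass.

```python
from itertools import groupby


def split_name(name: str) -> tuple[str, str]:
    # "layers.0.weight" -> ("layers", "0.weight"); "bias" -> ("bias", "")
    parts = name.split(".", 1)
    return (parts[0], parts[1] if len(parts) > 1 else "")


def groupby_prefix(weights):
    # Group consecutive (name, tensor) pairs by the name's first component,
    # yielding (prefix, [(remaining_name, tensor), ...]) pairs.
    split = ((split_name(name), tensor) for name, tensor in weights)
    for prefix, group in groupby(split, key=lambda item: item[0][0]):
        yield prefix, [(rest, tensor) for (_, rest), tensor in group]
```

The real helper yields lazy sub-iterables so the weights are still only traversed once; the list materialization here is for readability.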
_load_module
_load_module(
base_prefix: str,
module: Module,
weights: Iterable[tuple[str, Tensor]],
) -> Iterable[str]
Source code in vllm/model_executor/models/utils.py
_load_param
_load_param(
base_prefix: str,
param: Parameter,
weights: Iterable[tuple[str, Tensor]],
) -> Iterable[str]
Source code in vllm/model_executor/models/utils.py
load_weights
load_weights(
weights: Iterable[tuple[str, Tensor]],
*,
mapper: Optional[WeightsMapper] = None,
) -> set[str]
Source code in vllm/model_executor/models/utils.py
PPMissingLayer

Bases: Identity

A placeholder layer for missing layers in a pipeline parallel model.
Source code in vllm/model_executor/models/utils.py
WeightsMapper (dataclass)

Maps the name of each weight if it matches one of the following patterns.
Source code in vllm/model_executor/models/utils.py
orig_to_new_prefix (class-attribute, instance-attribute)
orig_to_new_prefix: WeightsMapping = field(
default_factory=dict
)
orig_to_new_substr (class-attribute, instance-attribute)
orig_to_new_substr: WeightsMapping = field(
default_factory=dict
)
orig_to_new_suffix (class-attribute, instance-attribute)
orig_to_new_suffix: WeightsMapping = field(
default_factory=dict
)
__init__
__init__(
orig_to_new_substr: WeightsMapping = dict(),
orig_to_new_prefix: WeightsMapping = dict(),
orig_to_new_suffix: WeightsMapping = dict(),
) -> None
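The renaming logic can be sketched as a standalone function. This is a hedged approximation of what a prefix/substring/suffix mapper does, not the dataclass itself: each mapping is a set of (original, new) pairs, and mapping to None drops the weight entirely (consistent with the WeightsMapping note above).

```python
def map_name(name, *, orig_to_new_substr=(), orig_to_new_prefix=(),
             orig_to_new_suffix=()):
    """Rename a weight by substring, prefix, and suffix rules.

    Returning None means the weight should be ignored by the loader.
    """
    for orig, new in orig_to_new_substr:
        if orig in name:
            if new is None:
                return None
            name = name.replace(orig, new)
    for orig, new in orig_to_new_prefix:
        if name.startswith(orig):
            if new is None:
                return None
            name = new + name[len(orig):]
    for orig, new in orig_to_new_suffix:
        if name.endswith(orig):
            if new is None:
                return None
            name = name[: len(name) - len(orig)] + new
    return name
```

For example, `orig_to_new_prefix=[("model.", "")]` strips an outer "model." wrapper, while mapping an unused rotary-embedding suffix to None skips it.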
_map_name
Source code in vllm/model_executor/models/utils.py
apply

apply_dict

_embedding_count_expression
_embedding_count_expression(
embeddings: NestedTensors,
) -> str
Constructs a debugging representation of the number of embeddings in the NestedTensors.
Source code in vllm/model_executor/models/utils.py
_flatten_embeddings
_flatten_embeddings(embeddings: NestedTensors) -> Tensor
Recursively flattens and concatenates NestedTensors on all but the last dimension.
Source code in vllm/model_executor/models/utils.py
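The recursion can be illustrated without torch. In this sketch a "tensor" is modeled as a list of row tuples, and any other list is a nested container whose pieces are flattened and concatenated in order, mirroring concatenation on all but the last dimension; the real function operates on torch tensors.

```python
def flatten_embeddings(embeddings):
    # Base case: a flat (num_rows, hidden) "tensor" is a list of row tuples.
    if all(isinstance(row, tuple) for row in embeddings):
        return embeddings
    # Recursive case: flatten each nested piece and concatenate row-wise.
    flat = []
    for item in embeddings:
        flat.extend(flatten_embeddings(item))
    return flat
```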
_merge_multimodal_embeddings
_merge_multimodal_embeddings(
inputs_embeds: Tensor,
is_multimodal: Tensor,
multimodal_embeddings: NestedTensors,
) -> Tensor
Merge multimodal_embeddings into inputs_embeds by overwriting the positions in inputs_embeds corresponding to placeholder tokens in input_ids.

Note

This updates inputs_embeds in place.
Source code in vllm/model_executor/models/utils.py
cast_overflow_tensors
Source code in vllm/model_executor/models/utils.py
embed_multimodal
embed_multimodal(
input_ids: Tensor,
multimodal_token_id: int,
get_text_embeds: Callable[[Tensor], Tensor],
multimodal_embeds: NestedTensors,
) -> Tensor
Embed token IDs and multimodal inputs and combine their embeddings.

multimodal_token_id is used to determine whether a token ID should be embedded using get_text_embeds or get_multimodal_embeds.

Compared to merge_multimodal_embeddings, this avoids running get_text_embeds on input_ids[input_ids == multimodal_token_id], which causes issues when the placeholder token ID exceeds the vocabulary size of the language model.
Source code in vllm/model_executor/models/utils.py
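The key difference from the merge-style helper is that the placeholder id never reaches the text embedding table. A simplified token-at-a-time sketch (the real function batches the text-embedding call over all non-placeholder positions at once):

```python
def embed_multimodal(input_ids, multimodal_token_id, get_text_embeds,
                     multimodal_embeds):
    """Combine text and multimodal embeddings positionally.

    Placeholder tokens consume the next multimodal embedding in order;
    all other token IDs go through the text embedding lookup.
    """
    mm_iter = iter(multimodal_embeds)
    out = []
    for tok in input_ids:
        if tok == multimodal_token_id:
            out.append(next(mm_iter))  # never embedded as text
        else:
            out.append(get_text_embeds(tok))
    return out
```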
extract_layer_index

Extract the layer index from the module name. Examples:

- "encoder.layers.0" -> 0
- "encoder.layers.1.self_attn" -> 1
- "2.self_attn" -> 2
- "model.encoder.layers.0.sub.1" -> ValueError
Source code in vllm/model_executor/models/utils.py
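The examples above imply the rule: exactly one dot-separated component of the name must parse as an integer. A minimal sketch consistent with that behavior:

```python
def extract_layer_index(layer_name: str) -> int:
    """Return the unique integer component of a dotted module name.

    Raises ValueError when zero or more than one component is an integer,
    matching the "model.encoder.layers.0.sub.1" -> ValueError example.
    """
    int_vals = []
    for part in layer_name.split("."):
        try:
            int_vals.append(int(part))
        except ValueError:
            continue
    if len(int_vals) != 1:
        raise ValueError(
            f"layer name {layer_name!r} should contain exactly one integer")
    return int_vals[0]
```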
fast_topk
Source code in vllm/model_executor/models/utils.py
flatten_bn
flatten_bn(
x: Union[list[Tensor], Tensor], *, concat: bool = False
) -> Union[list[Tensor], Tensor]
Flatten the B and N dimensions of batched multimodal inputs.

The input tensor should have shape (B, N, ...).
Source code in vllm/model_executor/models/utils.py
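The shape semantics can be illustrated with nested lists standing in for tensors. This is a rough sketch under that assumption, not the torch implementation: flattening merges the B and N dimensions into one list of items, and concat=True additionally joins those items along their first dimension.

```python
def flatten_bn(x, *, concat=False):
    """Flatten (B, N, ...) nested lists into B*N items.

    With concat=True, the flattened items are further concatenated
    along their first dimension into a single sequence.
    """
    flat = [item for batch in x for item in batch]
    if concat:
        return [elem for item in flat for elem in item]
    return flat
```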
get_pp_missing_layer_names
Get the names of the missing layers in a pipeline parallel model.
Source code in vllm/model_executor/models/utils.py
init_vllm_registered_model
init_vllm_registered_model(
vllm_config: VllmConfig,
*,
prefix: str = "",
hf_config: Optional[PretrainedConfig] = None,
architectures: Optional[list[str]] = None,
) -> Module
Helper function to initialize an inner model registered to vLLM, based on the arguments passed to the outer vLLM model.
Source code in vllm/model_executor/models/utils.py
is_pp_missing_parameter
Check if a parameter is missing in a pipeline parallel model.
Source code in vllm/model_executor/models/utils.py
make_empty_intermediate_tensors_factory
Source code in vllm/model_executor/models/utils.py
make_layers
make_layers(
num_hidden_layers: int, layer_fn: LayerFn, prefix: str
) -> tuple[int, int, ModuleList]
Make a list of layers with the given layer function, taking pipeline parallelism into account.
Source code in vllm/model_executor/models/utils.py
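The pipeline-parallel bookkeeping can be sketched as follows. This is an illustrative simplification: the rank and world size are passed explicitly here (the real function reads them from the distributed state, and the partitioning is configurable), layers are split as evenly as possible, and positions owned by other ranks get None placeholders (standing in for PPMissingLayer).

```python
def get_pp_indices(num_layers: int, rank: int, world_size: int):
    """Evenly split num_layers across ranks, extras going to the first ranks."""
    base = num_layers // world_size
    extra = num_layers % world_size
    start = rank * base + min(rank, extra)
    end = start + base + (1 if rank < extra else 0)
    return start, end


def make_layers(num_hidden_layers, layer_fn, rank, world_size):
    """Build only this rank's layers; other positions hold placeholders."""
    start, end = get_pp_indices(num_hidden_layers, rank, world_size)
    layers = [None] * start  # placeholders for earlier ranks' layers
    layers += [layer_fn(prefix=f"layers.{i}") for i in range(start, end)]
    layers += [None] * (num_hidden_layers - end)
    return start, end, layers
```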
maybe_offload_to_cpu
Source code in vllm/model_executor/models/utils.py
maybe_prefix
Add a prefix to a name if the prefix is non-empty.
Parameters:

Name | Type | Description | Default
---|---|---|---
prefix | str | The prefix to add. If empty, no prefix will be added. | required
name | str | The name to potentially prefix. | required

Returns:

Type | Description
---|---
str | The string "prefix.name" if prefix was non-empty, otherwise just "name".
Source code in vllm/model_executor/models/utils.py
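Per the table above, the whole behavior is a dot-join guarded on an empty prefix; a one-line sketch:

```python
def maybe_prefix(prefix: str, name: str) -> str:
    # Empty prefix: return the name unchanged, with no leading dot.
    return name if not prefix else f"{prefix}.{name}"
```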
merge_multimodal_embeddings
merge_multimodal_embeddings(
input_ids: Tensor,
inputs_embeds: Tensor,
multimodal_embeddings: NestedTensors,
placeholder_token_id: Union[int, list[int]],
) -> Tensor
Merge multimodal_embeddings into inputs_embeds by overwriting the positions in inputs_embeds corresponding to placeholder tokens in input_ids.

placeholder_token_id can be a list of token ids (e.g., the token ids of img_start, img_break, and img_end tokens) when needed: this means the order of these tokens in input_ids MUST MATCH the order of their embeddings in multimodal_embeddings, since we need to slice-merge instead of individually scattering.

For example, if input_ids is "TTTTTSIIIBIIIBIIIETTT", where:

- T is a text token
- S is the image start token
- I is an image embedding token
- B is the image break token
- E is the image end token

then the image embeddings (those corresponding to the I's) from the vision encoder must be padded with embeddings of S, B, and E, in the same order as input_ids, for a correct embedding merge.

Note

This updates inputs_embeds in place.
Source code in vllm/model_executor/models/utils.py
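The in-place overwrite can be sketched over plain lists. A hedged approximation (the real function scatters torch tensors and validates that the placeholder count matches the number of embeddings): every position whose token id is a placeholder consumes the next multimodal embedding in order, which is exactly why the ordering constraint above matters.

```python
def merge_multimodal_embeddings(input_ids, inputs_embeds,
                                multimodal_embeddings,
                                placeholder_token_id):
    """Overwrite placeholder positions in inputs_embeds, in order."""
    placeholder_ids = ([placeholder_token_id]
                       if isinstance(placeholder_token_id, int)
                       else placeholder_token_id)
    mm_iter = iter(multimodal_embeddings)
    for pos, tok in enumerate(input_ids):
        if tok in placeholder_ids:
            inputs_embeds[pos] = next(mm_iter)  # consume embeddings in order
    return inputs_embeds
```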
merge_multimodal_embeddings_from_map
merge_multimodal_embeddings_from_map(
inputs_embeds: Tensor,
multimodal_embeddings: NestedTensors,
placeholder_map: IndexMap,
) -> Tensor
Merge multimodal_embeddings into inputs_embeds using the provided placeholder map.

Note

This updates inputs_embeds in place.