vllm.model_executor.model_loader.utils
Utilities for selecting and loading models.
ParamMapping
dataclass
A class to handle parameter mapping for model weight loading. It creates a bidirectional mapping between packed parameters and their constituent parts.
Source code in vllm/model_executor/model_loader/utils.py
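The bidirectional mapping described above can be illustrated with a small standalone sketch. It is not the vLLM implementation: the class name `PackedParamIndex` is hypothetical, and the suffix-matching behavior of `get_sub_modules` is an assumption, since this page does not document that method's signature or semantics. The field names mirror the `__init__` signature listed on this page.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PackedParamIndex:
    """Illustrative stand-in (hypothetical name); not vLLM's ParamMapping."""

    # Packed parameter name -> ordered list of its constituent parameters.
    packed_mapping: dict[str, list[str]]
    # Constituent parameter name -> (packed name, position within the pack).
    inverse_packed_mapping: dict[str, tuple[str, int]] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Build the reverse direction once, right after construction.
        for packed_name, parts in self.packed_mapping.items():
            for position, part in enumerate(parts):
                self.inverse_packed_mapping[part] = (packed_name, position)

    def get_sub_modules(self, module_name: str) -> Optional[tuple[str, list[str]]]:
        # Assumption: a module matches when its dotted name ends with a packed name.
        for packed_name, parts in self.packed_mapping.items():
            if module_name.endswith(packed_name):
                return packed_name, parts
        return None


index = PackedParamIndex({"qkv_proj": ["q_proj", "k_proj", "v_proj"]})
print(index.inverse_packed_mapping["k_proj"])  # ('qkv_proj', 1)
print(index.get_sub_modules("model.layers.0.self_attn.qkv_proj"))
```

Using `field(default_factory=dict)` in the sketch gives each instance its own inverse dictionary, which avoids sharing one mutable default across instances.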
inverse_packed_mapping
class-attribute
instance-attribute
__init__
__init__(
packed_mapping: dict[str, list[str]],
inverse_packed_mapping: dict[
str, tuple[str, int]
] = dict(),
) -> None
__post_init__
get_sub_modules
configure_quant_config
configure_quant_config(
quant_config: QuantizationConfig,
model_class: type[Module],
)
Pass packed_modules_mapping by reference to quant_config so that quant_config can properly match fused modules.
Note that model attributes are passed by reference to quant_config, which allows them to be updated later by model_class.__new__ (e.g. chatglm, qwen).
Once the SupportsQuant mixin has been added to all models, this function can be removed.
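The by-reference behavior described above can be shown with plain Python objects. This is a minimal sketch, not vLLM code: `ToyQuantConfig`, `ToyModel`, and `share_mapping` are hypothetical names used only for illustration. The point is that assigning the same dictionary object, rather than a copy, makes later in-place updates on the model class visible through the config.

```python
class ToyQuantConfig:
    """Hypothetical stand-in for a quantization config."""

    def __init__(self) -> None:
        self.packed_modules_mapping: dict[str, list[str]] = {}


class ToyModel:
    """Hypothetical model class with a class-level packed-module mapping."""

    packed_modules_mapping: dict[str, list[str]] = {
        "qkv_proj": ["q_proj", "k_proj", "v_proj"],
    }


def share_mapping(quant_config: ToyQuantConfig, model_class: type) -> None:
    # Assign the same dict object (no copy), so both names refer to one mapping.
    quant_config.packed_modules_mapping = model_class.packed_modules_mapping


config = ToyQuantConfig()
share_mapping(config, ToyModel)

# An in-place update made later on the model class shows up in the config too.
ToyModel.packed_modules_mapping["gate_up_proj"] = ["gate_proj", "up_proj"]

print(config.packed_modules_mapping is ToyModel.packed_modules_mapping)  # True
print(sorted(config.packed_modules_mapping))  # ['gate_up_proj', 'qkv_proj']
```

Had `share_mapping` stored `dict(model_class.packed_modules_mapping)` instead, the config would hold a snapshot and would miss the later `gate_up_proj` entry.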
device_loading_context
get_architecture_class_name
get_architecture_class_name(
model_config: ModelConfig,
) -> str
get_model_architecture
get_model_architecture(
model_config: ModelConfig,
) -> tuple[type[Module], str]
get_model_cls
get_model_cls(model_config: ModelConfig) -> type[Module]
initialize_model
initialize_model(
vllm_config: VllmConfig,
*,
prefix: str = "",
model_class: Optional[type[Module]] = None,
model_config: Optional[ModelConfig] = None,
) -> Module
Initialize a model with the given configurations.
process_weights_after_loading
process_weights_after_loading(
model: Module,
model_config: ModelConfig,
target_device: device,
) -> None
resolve_transformers_arch
resolve_transformers_arch(
model_config: ModelConfig, architectures: list[str]
)