vllm.model_executor.model_loader.tensorizer_loader
TensorizerLoader
Bases: BaseModelLoader
Model loader using CoreWeave's tensorizer library.
Source code in vllm/model_executor/model_loader/tensorizer_loader.py
__init__
__init__(load_config: LoadConfig)
_get_weights_iterator
_load_model_serialized_cpu
_load_model_serialized_cpu(
vllm_config: VllmConfig,
) -> Module
Load a serialized model with tensorizer to the CPU.
This is only necessary when the model isn't vLLM-tensorized (see examples/others/tensorize_vllm_model.py). It should still be faster than default HuggingFace loading, but slower than loading a vLLM-tensorized model.
_patch_tensorizer_config
_patch_tensorizer_config(
model_config: ModelConfig,
) -> TensorizerConfig
_verify_config
_verify_config(
model_config: ModelConfig,
parallel_config: ParallelConfig,
)
download_model
download_model(model_config: ModelConfig) -> None
load_model
load_model(
vllm_config: VllmConfig, model_config: ModelConfig
) -> Module
load_weights
load_weights(
model: Module, model_config: ModelConfig
) -> None
Load serialized model weights with tensorizer.
Expects a vLLM-tensorized model. See the examples/others/tensorize_vllm_model.py example script for serializing vLLM models.
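The two loading paths described on this page (load_weights for vLLM-tensorized models, _load_model_serialized_cpu otherwise) can be pictured as a simple dispatch. The following is a minimal sketch only: StandInConfig, its vllm_tensorized flag, and choose_load_path are hypothetical names invented for illustration and are not part of vLLM. How the loader actually detects a vLLM-tensorized model is not stated here.

```python
from dataclasses import dataclass


@dataclass
class StandInConfig:
    """Hypothetical stand-in for illustration; NOT vLLM's TensorizerConfig."""

    tensorizer_uri: str    # placeholder location of the serialized tensors
    vllm_tensorized: bool  # assumed flag; real detection logic is not shown here


def choose_load_path(cfg: StandInConfig) -> str:
    """Return the name of the loader method that matches the model's format."""
    if cfg.vllm_tensorized:
        # Serialized from a vLLM model: weights can be loaded directly.
        return "load_weights"
    # Any other tensorizer-serialized model: fall back to the CPU path,
    # which the docstring above describes as slower.
    return "_load_model_serialized_cpu"


fast = choose_load_path(StandInConfig("/tmp/model.tensors", True))
slow = choose_load_path(StandInConfig("/tmp/model.tensors", False))
print(fast, slow)
```

The sketch returns method names rather than calling anything, so it runs without vLLM installed.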
save_model
staticmethod
save_model(
model: Module,
tensorizer_config: Union[TensorizerConfig, dict],
) -> None
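Because the signature accepts either a TensorizerConfig or a plain dict, a caller-side helper may want to normalize the argument before use. The following is a rough sketch of that pattern under stated assumptions: StandInTensorizerConfig and normalize_config are hypothetical names for illustration, and the single tensorizer_uri field is only a placeholder. It does not show what save_model itself does with a dict.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class StandInTensorizerConfig:
    """Hypothetical stand-in for illustration; NOT vLLM's TensorizerConfig."""

    tensorizer_uri: str  # placeholder destination for the serialized tensors


def normalize_config(
    cfg: Union[StandInTensorizerConfig, dict],
) -> StandInTensorizerConfig:
    """Accept either form and always hand back a config object."""
    if isinstance(cfg, dict):
        # Dict keys are assumed to match the config's field names.
        return StandInTensorizerConfig(**cfg)
    return cfg


from_dict = normalize_config({"tensorizer_uri": "/tmp/model.tensors"})
existing = StandInTensorizerConfig("/tmp/model.tensors")
from_obj = normalize_config(existing)
print(from_dict == from_obj)
```

Passing an existing config object returns that same object unchanged, so the helper can be applied unconditionally.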