vllm.model_executor.offloader.base ¶
Base classes for model parameter offloading.
logger module-attribute ¶
logger = init_logger(__name__)
class relation:
- BaseOffloader (ABC)
    - implemented by: UVAOffloader
    - implemented by: PrefetchOffloader
    - uses: _ModuleOffloader
    - uses: _BaseParamOffloader (ABC)
        - implemented by: _CpuParamOffloader
BaseOffloader ¶
Bases: ABC
Base class for model parameter offloading strategies.
Offloaders control how model parameters are stored and loaded during inference. Different strategies trade memory for compute/transfer time.
Source code in vllm/model_executor/offloader/base.py
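To show how the pieces of this interface fit together, here is a minimal illustrative mirror of the base class and a trivial subclass. The class bodies below are a sketch based on the methods documented on this page, not the actual vLLM implementation; the `Module` stand-in replaces `torch.nn.Module` to keep the example self-contained.

```python
from abc import ABC, abstractmethod
from typing import Generator, List


class Module:
    """Illustrative stand-in for torch.nn.Module."""


class BaseOffloader(ABC):
    """Sketch of the interface documented on this page (not vLLM source)."""

    @abstractmethod
    def wrap_modules(
        self, modules_generator: Generator[Module, None, None]
    ) -> List[Module]:
        """Wrap modules with offloading logic."""

    def post_init(self) -> None:
        """Called after model construction completes (no-op by default)."""

    def join_after_forward(self) -> None:
        """Hook run after a forward pass (no-op by default)."""

    def sync_prev_onload(self) -> None:
        """Hook to wait for a previous onload to finish (no-op by default)."""


class PassthroughOffloader(BaseOffloader):
    """Hypothetical subclass: installs nothing, returns modules as-is."""

    def wrap_modules(self, modules_generator):
        return list(modules_generator)


wrapped = PassthroughOffloader().wrap_modules(Module() for _ in range(3))
print(len(wrapped))  # 3
```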
join_after_forward ¶
post_init ¶
Called after model construction completes.
Offloaders can use this to:
- Finalize parameter storage
- Start initial prefetching
- Allocate shared resources
sync_prev_onload ¶
wrap_modules abstractmethod ¶
Wrap modules with offloading logic.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| modules_generator | Generator[Module, None, None] | Generator yielding modules to potentially offload. | required |
Returns:
| Type | Description |
|---|---|
| list[Module] | List of modules, potentially with offloading hooks installed. |
Source code in vllm/model_executor/offloader/base.py
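A hedged sketch of what a concrete wrap_modules might look like. The hook mechanics here are purely illustrative (vLLM's actual offloading hooks work differently); the point is the documented contract: consume the generator, optionally install hooks, and return the modules as a list.

```python
from typing import Generator, List


class Module:
    """Illustrative stand-in for torch.nn.Module."""

    def __init__(self, name: str):
        self.name = name
        self.pre_forward_hooks = []  # hypothetical hook list


class ToyOffloader:
    """Illustrative: installs an 'onload' hook on every other module."""

    def wrap_modules(
        self, modules_generator: Generator[Module, None, None]
    ) -> List[Module]:
        wrapped = []
        for i, module in enumerate(modules_generator):
            if i % 2 == 0:
                # A real offloader would register a hook that moves
                # parameters back on-device before forward runs.
                module.pre_forward_hooks.append(
                    lambda m=module: print(f"onload {m.name}")
                )
            wrapped.append(module)
        return wrapped


mods = ToyOffloader().wrap_modules(Module(f"layer{i}") for i in range(4))
print([len(m.pre_forward_hooks) for m in mods])  # [1, 0, 1, 0]
```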
NoopOffloader ¶
Bases: BaseOffloader
No-op offloader that returns modules as-is without any offloading.
Source code in vllm/model_executor/offloader/base.py
wrap_modules ¶
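The no-op behavior can be sketched in a couple of lines; this standalone mirror (not the vLLM class itself) just drains the generator into a list without installing any hooks.

```python
class NoopOffloader:
    """Illustrative mirror of the documented no-op behavior."""

    def wrap_modules(self, modules_generator):
        # Return the modules unchanged: no offloading hooks installed.
        return list(modules_generator)


mods = [object(), object(), object()]
result = NoopOffloader().wrap_modules(iter(mods))
print(result == mods)  # True
```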
create_offloader ¶
create_offloader(
offload_config: OffloadConfig,
) -> BaseOffloader
Create an offloader based on the offload configuration.
Uses the explicit offload_backend selector. When set to "auto", selects prefetch if offload_group_size > 0, UVA if cpu_offload_gb > 0, otherwise noop.
Source code in vllm/model_executor/offloader/base.py
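The "auto" selection order described above can be sketched as follows. The config dataclass is a hypothetical subset of the real OffloadConfig, and the function returns backend names rather than constructing offloader instances; only the branching order mirrors the documentation.

```python
from dataclasses import dataclass


@dataclass
class OffloadConfig:
    """Illustrative subset of the real config; field names assumed."""

    offload_backend: str = "auto"
    offload_group_size: int = 0
    cpu_offload_gb: float = 0.0


def select_backend(config: OffloadConfig) -> str:
    """Sketch of the documented selection: explicit setting wins,
    otherwise prefetch > UVA > noop."""
    if config.offload_backend != "auto":
        return config.offload_backend
    if config.offload_group_size > 0:
        return "prefetch"
    if config.cpu_offload_gb > 0:
        return "uva"
    return "noop"


print(select_backend(OffloadConfig(offload_group_size=4)))  # prefetch
print(select_backend(OffloadConfig(cpu_offload_gb=2.0)))    # uva
print(select_backend(OffloadConfig()))                      # noop
```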
get_offloader ¶
get_offloader() -> BaseOffloader
set_offloader ¶
set_offloader(instance: BaseOffloader) -> None