vllm.prompt_adapter.worker_manager
LRUCacheWorkerPromptAdapterManager
Bases: WorkerPromptAdapterManager
A WorkerPromptAdapterManager that manages prompt_adapter models on the worker side using an LRU cache. On every request, the requested prompt_adapters are loaded (unless they are already loaded), and the least recently used prompt_adapters are unloaded if the cache exceeds capacity.
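The eviction policy described above can be sketched with a toy LRU cache. This is an illustrative model only (the class name `LRUAdapterCache` is hypothetical); the real manager delegates to vLLM's LRUCachePromptAdapterModelManager.

```python
from collections import OrderedDict


class LRUAdapterCache:
    """Toy LRU cache illustrating the eviction policy described above.

    Hypothetical sketch; not the vLLM implementation.
    """

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._cache: "OrderedDict[int, str]" = OrderedDict()

    def load(self, adapter_id: int, name: str) -> None:
        if adapter_id in self._cache:
            # Already loaded: just mark it as most recently used.
            self._cache.move_to_end(adapter_id)
            return
        self._cache[adapter_id] = name
        # Evict least recently used adapters while over capacity.
        while len(self._cache) > self.capacity:
            self._cache.popitem(last=False)


cache = LRUAdapterCache(capacity=2)
cache.load(1, "adapter-a")
cache.load(2, "adapter-b")
cache.load(1, "adapter-a")  # touching 1 makes it most recently used
cache.load(3, "adapter-c")  # evicts adapter 2, the least recently used
print(list(cache._cache))   # [1, 3]
```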
Source code in vllm/prompt_adapter/worker_manager.py
_prompt_adapter_manager_cls (class-attribute, instance-attribute)
_prompt_adapter_manager_cls: Type[LRUCachePromptAdapterModelManager] = LRUCachePromptAdapterModelManager
_apply_adapters
_apply_adapters(prompt_adapter_requests: Set[PromptAdapterRequest]) -> None
add_adapter
add_adapter(prompt_adapter_request: PromptAdapterRequest) -> bool
create_prompt_adapter_manager
WorkerPromptAdapterManager
Bases: AbstractWorkerManager
A WorkerPromptAdapterManager that manages prompt_adapter models on the worker side. On every request, the requested prompt_adapters are loaded (unless they are already loaded), and every other prompt_adapter is unloaded.
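The load/unload rule above amounts to a set difference between the requested and currently loaded adapter IDs. A minimal sketch, assuming integer adapter IDs (the helper name `plan_adapter_changes` is hypothetical and not part of the vLLM API):

```python
def plan_adapter_changes(
    requested_ids: set[int], loaded_ids: set[int]
) -> tuple[set[int], set[int]]:
    """Return (to_load, to_unload) so that after applying both,
    exactly the requested adapters remain resident."""
    to_load = requested_ids - loaded_ids     # requested but not yet loaded
    to_unload = loaded_ids - requested_ids   # loaded but no longer requested
    return to_load, to_unload


to_load, to_unload = plan_adapter_changes({1, 3}, {2, 3})
print(sorted(to_load), sorted(to_unload))  # [1] [2]
```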
_manager_cls (class-attribute, instance-attribute)
_manager_cls: Type[PromptAdapterModelManager] = PromptAdapterModelManager
__init__
__init__(
    max_num_seqs: int,
    max_num_batched_tokens: int,
    device: device,
    prompt_adapter_config: PromptAdapterConfig,
    prompt_adapter_model_cls: Type[PromptAdapterModel] = PromptAdapterModel,
)
_apply_adapters
_load_adapter
_load_adapter(prompt_adapter_request: PromptAdapterRequest) -> PromptAdapterModel
add_adapter
add_dummy_prompt_adapter
add_dummy_prompt_adapter(prompt_adapter_request: PromptAdapterRequest) -> bool