vllm.engine.multiprocessing
Modules:

Name | Description
---|---
`client` |
`engine` |
REQUEST_OUTPUTS_T (module-attribute)

    REQUEST_OUTPUTS_T = Union[
        List[RequestOutput],
        RPCAdapterLoadedResponse,
        RPCIsSleepingResponse,
        RPCError,
    ]
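A minimal sketch (not vllm's actual implementation) of how a client might narrow the `REQUEST_OUTPUTS_T` union coming back from the engine process. `RequestOutput` and `RPCError` below are simplified stand-ins for the real classes:

```python
# Sketch: client-side handling of a REQUEST_OUTPUTS_T-style union.
# The classes here are simplified stand-ins, not vllm's real types.
from dataclasses import dataclass
from typing import List, Union


@dataclass
class RequestOutput:
    request_id: str
    text: str


@dataclass
class RPCError:
    request_id: str
    exception: BaseException


REQUEST_OUTPUTS_T = Union[List[RequestOutput], RPCError]


def consume(outputs: REQUEST_OUTPUTS_T) -> List[str]:
    # Errors cross the process boundary as a dedicated message type
    # rather than as raised exceptions, so the client re-raises locally.
    if isinstance(outputs, RPCError):
        raise RuntimeError(
            f"request {outputs.request_id} failed") from outputs.exception
    return [o.text for o in outputs]


print(consume([RequestOutput("req-0", "hello")]))  # ['hello']
```

The design choice here is that a single response channel carries both normal outputs and failures, so every consumer must handle the error arm of the union.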
RPC_REQUEST_T (module-attribute)

    RPC_REQUEST_T = Union[
        RPCProcessRequest,
        RPCAbortRequest,
        RPCStartupRequest,
        RPCUProfileRequest,
        RPCLoadAdapterRequest,
        RPCResetMultiModalCacheRequest,
        RPCResetPrefixCacheRequest,
        RPCSleepRequest,
        RPCWakeUpRequest,
        RPCIsSleepingRequest,
    ]
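On the engine side, a union like `RPC_REQUEST_T` is typically dispatched with `isinstance` checks. The following is an illustrative sketch with two simplified stand-in request types, not vllm's real handler:

```python
# Sketch: engine-side dispatch over an RPC_REQUEST_T-style union.
# RPCAbortRequest/RPCSleepRequest here are simplified stand-ins.
from dataclasses import dataclass
from typing import Union


@dataclass
class RPCAbortRequest:
    request_id: str


@dataclass
class RPCSleepRequest:
    level: int = 1


RPC_REQUEST_T = Union[RPCAbortRequest, RPCSleepRequest]


def handle_request(request: RPC_REQUEST_T) -> str:
    # Narrow the union and route each message to its handler.
    if isinstance(request, RPCAbortRequest):
        return f"abort:{request.request_id}"
    if isinstance(request, RPCSleepRequest):
        return f"sleep:{request.level}"
    raise TypeError(f"unknown RPC request: {request!r}")


print(handle_request(RPCAbortRequest("req-0")))  # abort:req-0
print(handle_request(RPCSleepRequest(2)))        # sleep:2
```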
MQEngineDeadError

Bases: RuntimeError
RPCAbortRequest (dataclass)
RPCAdapterLoadedResponse (dataclass)
RPCError (dataclass)
Source code in vllm/engine/multiprocessing/__init__.py
RPCIsSleepingRequest (dataclass)
RPCIsSleepingResponse (dataclass)
RPCLoadAdapterRequest (dataclass)
RPCProcessRequest (dataclass)
lora_request (class-attribute, instance-attribute)

    lora_request: Optional[LoRARequest] = lora_request

prompt_adapter_request (class-attribute, instance-attribute)

    prompt_adapter_request: Optional[PromptAdapterRequest] = (
        prompt_adapter_request
    )

trace_headers (class-attribute, instance-attribute)

    trace_headers: Optional[Mapping[str, str]] = trace_headers
__init__

    __init__(
        prompt: PromptType,
        params: Union[SamplingParams, PoolingParams],
        request_id: str,
        lora_request: Optional[LoRARequest] = None,
        trace_headers: Optional[Mapping[str, str]] = None,
        prompt_adapter_request: Optional[PromptAdapterRequest] = None,
        priority: int = 0,
    ) -> None
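A hypothetical construction example mirroring the `__init__` signature above. To keep it self-contained, `PromptType`, `SamplingParams`, and the adapter/LoRA types are replaced with plain stand-ins (`Any`); only the field names and defaults follow the documented signature:

```python
# Sketch: building an RPCProcessRequest-shaped message.
# Field names mirror the documented __init__ signature; the annotated
# types are stand-ins so this example runs without vllm installed.
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass
class RPCProcessRequest:
    prompt: Any                              # PromptType in vllm
    params: Any                              # SamplingParams or PoolingParams
    request_id: str
    lora_request: Optional[Any] = None       # LoRARequest in vllm
    trace_headers: Optional[Mapping[str, str]] = None
    prompt_adapter_request: Optional[Any] = None
    priority: int = 0


req = RPCProcessRequest(
    prompt="Hello, world!",
    params={"max_tokens": 16},  # stand-in for a SamplingParams object
    request_id="req-42",
)
print(req.priority)  # 0
```

Only `prompt`, `params`, and `request_id` are required; the optional fields default to `None` (or `0` for `priority`), so a minimal request carries just the prompt, its sampling/pooling parameters, and an identifier.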
RPCResetMultiModalCacheRequest
RPCResetPrefixCacheRequest (dataclass)
RPCSleepRequest
RPCStartupRequest
RPCStartupResponse (dataclass)
RPCUProfileRequest
RPCWakeUpRequest (dataclass)
ENGINE_DEAD_ERROR

    ENGINE_DEAD_ERROR(
        error: Optional[BaseException] = None,
    ) -> MQEngineDeadError
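A sketch of a dead-engine error factory in the spirit of the `ENGINE_DEAD_ERROR` signature above: it wraps an optional underlying exception in an `MQEngineDeadError` (a `RuntimeError` subclass, as documented). The message text and `__cause__` handling here are illustrative assumptions, not vllm's actual wording or logic:

```python
# Sketch: a factory matching ENGINE_DEAD_ERROR's documented signature.
# The message text and cause-chaining are illustrative assumptions.
from typing import Optional


class MQEngineDeadError(RuntimeError):
    pass


def ENGINE_DEAD_ERROR(
        error: Optional[BaseException] = None) -> MQEngineDeadError:
    if error is None:
        return MQEngineDeadError("Engine loop is dead.")
    # Preserve the original failure so callers can inspect __cause__.
    err = MQEngineDeadError(f"Engine loop died: {error!r}")
    err.__cause__ = error
    return err


e = ENGINE_DEAD_ERROR(ValueError("boom"))
print(isinstance(e, RuntimeError))  # True
```

Returning (rather than raising) the exception lets callers decide where to raise it, e.g. from whichever coroutine notices the engine process has died.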