vllm.v1.engine
Modules:
Name | Description
---|---
async_llm |
coordinator |
core |
core_client |
detokenizer |
exceptions |
llm_engine |
logprobs |
mm_input_cache |
output_processor |
parallel_sampling |
processor |
utils |
EngineCoreEvent

Bases: Struct

A timestamped engine core event associated with a request.

The timestamp is a monotonic timestamp used by the engine frontend to calculate intervals between engine core events. These timestamps should not be compared with timestamps from other processes.

Source code in vllm/v1/engine/__init__.py
new_event (classmethod)
new_event(
event_type: EngineCoreEventType,
timestamp: Optional[float] = None,
) -> EngineCoreEvent
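To illustrate the intended usage, here is a minimal, self-contained sketch of the `new_event` pattern. It uses a `dataclass` as a stand-in for the real `Struct` base class, and the event-type members (`QUEUED`, `SCHEDULED`) are assumptions for illustration; only the signature above is from the source.

```python
import time
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class EngineCoreEventType(IntEnum):
    """Hypothetical members; the real enum lives in vllm.v1.engine."""
    QUEUED = 1
    SCHEDULED = 2


@dataclass
class EngineCoreEvent:
    event_type: EngineCoreEventType
    timestamp: float

    @classmethod
    def new_event(
        cls,
        event_type: EngineCoreEventType,
        timestamp: Optional[float] = None,
    ) -> "EngineCoreEvent":
        # A missing timestamp defaults to the current monotonic clock,
        # which is only comparable within a single process.
        if timestamp is None:
            timestamp = time.monotonic()
        return cls(event_type, timestamp)


queued = EngineCoreEvent.new_event(EngineCoreEventType.QUEUED)
scheduled = EngineCoreEvent.new_event(EngineCoreEventType.SCHEDULED)
# The frontend can compute intervals between events from the same process.
interval = scheduled.timestamp - queued.timestamp
```

Because `time.monotonic()` never goes backwards, intervals computed this way are non-negative, but they carry no meaning across process boundaries.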
EngineCoreEventType
EngineCoreOutput

Bases: Struct
kv_transfer_params (class-attribute, instance-attribute)

new_prompt_logprobs_tensors (class-attribute, instance-attribute)

new_prompt_logprobs_tensors: Optional[LogprobsTensors] = None
EngineCoreOutputs

Bases: Struct

scheduler_stats (class-attribute, instance-attribute)

scheduler_stats: Optional[SchedulerStats] = None
EngineCoreRequest

Bases: Struct
EngineCoreRequestType

Bases: Enum

Request types are defined as hex byte strings so they can be sent over sockets without a separate encoding step.
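A brief sketch of why byte-string enum values avoid an encoding step: the value is already `bytes`, so it can be prepended to a socket frame directly and decoded back by value lookup. The member names and values below are illustrative, not the actual `EngineCoreRequestType` members.

```python
from enum import Enum


class RequestType(Enum):
    # Hypothetical values; the real EngineCoreRequestType defines its own.
    ADD = b"\x00"
    ABORT = b"\x01"


# The value is already bytes, so it can be placed on the wire
# (e.g. as a socket frame prefix) with no str.encode() step.
frame = RequestType.ADD.value + b"payload"

# The receiver recovers the type by slicing the first byte and doing
# an enum lookup by value.
kind = RequestType(frame[:1])
```

Compare this with a string-valued enum, which would need `value.encode()` on send and `.decode()` on receive before the lookup.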
FinishReason

Bases: IntEnum

Reason a request finished: stop, length, or abort.

Int rather than str for more compact serialization.

- stop: a stop string was emitted
- length: max_tokens was consumed, or max_model_len was reached
- abort: aborted for another reason

Source code in vllm/v1/engine/__init__.py
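The compactness point can be shown with a small sketch: an `IntEnum` member serializes as a bare integer, while a string name would cost several bytes per message. The specific values (`STOP = 0`, etc.) and the lowercase `__str__` are assumptions for illustration; check `vllm/v1/engine/__init__.py` for the actual definition.

```python
import json
from enum import IntEnum


class FinishReason(IntEnum):
    # Hypothetical values; the real enum assigns its own.
    STOP = 0
    LENGTH = 1
    ABORT = 2

    def __str__(self) -> str:
        # Human-readable label for logs, without affecting serialization.
        return self.name.lower()


# An IntEnum member is an int, so json.dumps emits a bare integer
# rather than a quoted string name.
payload = json.dumps({"finish_reason": FinishReason.LENGTH})
label = str(FinishReason.LENGTH)
```

Here `payload` is `{"finish_reason": 1}` (2 bytes for the value) where a string-valued enum would have produced `"length"` (8 bytes), while `label` still gives a readable `length` for logging.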
UtilityOutput

Bases: Struct