vllm.v1.executor.ray_distributed_executor
FutureWrapper

Bases: Future

A wrapper around a Ray output reference to meet the interface of .execute_model().
Source code in vllm/v1/executor/ray_distributed_executor.py
__init__
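The role of this wrapper can be sketched as follows. This is a minimal illustration, not vLLM's implementation: `FakeRef` and its `.get()` method are hypothetical stand-ins for a Ray `ObjectRef`, which in real code would be resolved with `ray.get`.

```python
from concurrent.futures import Future


class FutureWrapperSketch(Future):
    """Sketch: adapt a Ray-style output reference to the
    concurrent.futures.Future interface that callers of
    execute_model() expect."""

    def __init__(self, ref):
        super().__init__()
        # In vLLM this would hold a Ray ObjectRef; here it is
        # anything exposing .get().
        self.ref = ref

    def result(self, timeout=None):
        if timeout is not None:
            raise NotImplementedError("timeout is not supported")
        # Stand-in for resolving the reference (ray.get(self.ref)).
        return self.ref.get()


class FakeRef:
    """Hypothetical stand-in for a Ray ObjectRef."""

    def get(self):
        return "model-runner-output"


out = FutureWrapperSketch(FakeRef()).result()
print(out)  # model-runner-output
```

Because the wrapper subclasses `Future`, callers can treat a deferred Ray result and an already-computed result through one interface.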
RayDistributedExecutor

Bases: RayDistributedExecutor, Executor

Ray distributed executor using Ray Compiled Graphs.
Source code in vllm/v1/executor/ray_distributed_executor.py
max_concurrent_batches

property

max_concurrent_batches: int

The Ray distributed executor supports pipeline parallelism: up to PP-size batches can be executed concurrently.
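The relationship between the pipeline-parallel size and the number of in-flight batches can be sketched as below. The class and attribute names are assumptions for illustration, not vLLM's actual config layout.

```python
class ExecutorSketch:
    """Sketch: with pipeline parallelism, each pipeline stage can hold
    one batch, so up to PP-size batches may be in flight at once."""

    def __init__(self, pipeline_parallel_size: int):
        # Hypothetical config attribute standing in for vLLM's
        # parallel configuration.
        self.pipeline_parallel_size = pipeline_parallel_size

    @property
    def max_concurrent_batches(self) -> int:
        # One batch per pipeline stage may execute concurrently.
        return self.pipeline_parallel_size


print(ExecutorSketch(4).max_concurrent_batches)  # 4
```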
execute_model

execute_model(
    scheduler_output,
) -> Union[ModelRunnerOutput, Future[ModelRunnerOutput]]

Execute the model on the Ray workers.

Parameters:

Name | Type | Description | Default
---|---|---|---
scheduler_output | | The scheduler output to execute. | required

Returns:

Type | Description
---|---
Union[ModelRunnerOutput, Future[ModelRunnerOutput]] | The model runner output.
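Because the return type is a union, a caller has to handle both the synchronous and the deferred case. A hedged caller-side sketch, where `run_one_step` and `FakeAsyncExecutor` are hypothetical names introduced here for illustration:

```python
from concurrent.futures import Future


def run_one_step(executor, scheduler_output):
    """Normalize execute_model()'s union return: it may yield the
    output directly, or a Future that resolves to it."""
    out = executor.execute_model(scheduler_output)
    if isinstance(out, Future):
        out = out.result()  # block until the workers finish
    return out


class FakeAsyncExecutor:
    """Hypothetical executor returning a Future (the concurrent path)."""

    def execute_model(self, scheduler_output):
        fut: Future = Future()
        fut.set_result(f"output-for-{scheduler_output}")
        return fut


print(run_one_step(FakeAsyncExecutor(), "batch0"))  # output-for-batch0
```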