vllm.worker.cpu_pooling_model_runner
CPUPoolingModelRunner
Bases: CPUModelRunnerBase[ModelInputForCPUWithPoolingMetadata]
Source code in vllm/worker/cpu_pooling_model_runner.py
_builder_cls (class-attribute, instance-attribute)
_builder_cls: Type[ModelInputForCPUBuilder] = (
ModelInputForCPUBuilder
)
_model_input_cls (class-attribute, instance-attribute)
_model_input_cls: Type[
ModelInputForCPUWithPoolingMetadata
] = ModelInputForCPUWithPoolingMetadata
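The two class attributes above appear to let the generic base runner stay agnostic about which builder and input dataclass it uses. The following is a minimal sketch of that pattern, assuming the base class instantiates whatever `_model_input_cls` points to; every name in it (`RunnerBase`, `PoolingRunner`, `BaseInput`, `PoolingInput`, `make_input`) is a hypothetical stand-in, not vLLM's code.

```python
from dataclasses import dataclass
from typing import Generic, List, Optional, Type, TypeVar


@dataclass
class BaseInput:
    # Hypothetical stand-in for ModelInputForCPU.
    input_tokens: List[int]


@dataclass
class PoolingInput(BaseInput):
    # Hypothetical stand-in for ModelInputForCPUWithPoolingMetadata.
    pooling_metadata: Optional[dict] = None


T = TypeVar("T", bound=BaseInput)


class RunnerBase(Generic[T]):
    # Subclasses override this to choose which input dataclass they produce.
    _model_input_cls: Type[T]

    def make_input(self, tokens: List[int]) -> T:
        # The base class never names the concrete type; it reads the class attribute.
        return self._model_input_cls(input_tokens=tokens)


class PoolingRunner(RunnerBase[PoolingInput]):
    _model_input_cls: Type[PoolingInput] = PoolingInput


runner = PoolingRunner()
inp = runner.make_input([1, 2, 3])
print(type(inp).__name__, inp.pooling_metadata)  # PoolingInput None
```

Parameterizing the base with a type variable keeps the return type of `make_input` precise for static checkers while the runtime choice is made by a single class attribute.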
_prepare_pooling
_prepare_pooling(
seq_group_metadata_list: List[SequenceGroupMetadata],
prompt_lens: List[int],
) -> PoolingMetadata
Prepare PoolingMetadata for the sequence group metadata list.
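A simplified sketch of what such a preparation step could look like, assuming `PoolingMetadata` bundles, per sequence group, the sequence ids and pooling parameters, together with the merged per-sequence data and the prompt lengths passed in. `SeqGroupMeta`, `PoolingMeta`, and `prepare_pooling` are hypothetical stand-ins, not the vLLM classes.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class SeqGroupMeta:
    # Hypothetical stand-in for SequenceGroupMetadata:
    # seq_id -> token ids, plus per-request pooling parameters.
    seq_data: Dict[int, List[int]]
    pooling_params: Optional[dict] = None


@dataclass
class PoolingMeta:
    # Hypothetical stand-in for PoolingMetadata.
    seq_groups: List[Tuple[List[int], Optional[dict]]]
    seq_data: Dict[int, List[int]]
    prompt_lens: List[int]


def prepare_pooling(groups: List[SeqGroupMeta], prompt_lens: List[int]) -> PoolingMeta:
    seq_groups: List[Tuple[List[int], Optional[dict]]] = []
    seq_data: Dict[int, List[int]] = {}
    for g in groups:
        # One entry per group: its sequence ids and its pooling parameters.
        seq_groups.append((list(g.seq_data.keys()), g.pooling_params))
        # Flatten every group's per-sequence data into a single lookup table.
        seq_data.update(g.seq_data)
    return PoolingMeta(seq_groups=seq_groups, seq_data=seq_data, prompt_lens=prompt_lens)


groups = [SeqGroupMeta({0: [5, 6, 7]}), SeqGroupMeta({1: [8, 9]}, {"normalize": True})]
meta = prepare_pooling(groups, prompt_lens=[3, 2])
```

Here `meta.seq_groups` is `[([0], None), ([1], {"normalize": True})]` and `meta.seq_data` maps both sequence ids to their tokens, so a pooler can locate each prompt by id and length.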
execute_model
execute_model(
model_input: ModelInputForCPUWithPoolingMetadata,
kv_caches: List[Tensor],
intermediate_tensors: Optional[
IntermediateTensors
] = None,
num_steps: int = 1,
) -> Optional[
Union[List[PoolerOutput], IntermediateTensors]
]
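The return type above admits three shapes: `None`, a list of pooler outputs, or intermediate tensors. A caller therefore has to branch on which one it received. The sketch below shows that dispatch with hypothetical stand-ins (`PoolerOut`, `Intermediate`, `describe`); the comments on when each case occurs are assumptions, not documented behavior.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class PoolerOut:
    # Hypothetical stand-in for PoolerOutput.
    embeddings: List[List[float]]


@dataclass
class Intermediate:
    # Hypothetical stand-in for IntermediateTensors.
    tensors: dict


def describe(result: Optional[Union[List[PoolerOut], Intermediate]]) -> str:
    if result is None:
        # Assumed case: this worker produced nothing to hand back.
        return "no output"
    if isinstance(result, Intermediate):
        # Assumed case: hidden states destined for another pipeline stage.
        return f"intermediate: {sorted(result.tensors)}"
    # Remaining case: a list of pooler outputs.
    return f"{len(result)} pooler output(s)"


print(describe([PoolerOut([[0.1, 0.2]])]))  # 1 pooler output(s)
```

Checking `None` first and `Intermediate` second leaves the list case as the fall-through, which mirrors the order of the `Optional[Union[...]]` annotation.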
make_model_input_from_broadcasted_tensor_dict
make_model_input_from_broadcasted_tensor_dict(
tensor_dict: Dict[str, Any],
) -> ModelInputForCPUWithPoolingMetadata
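Judging by its name, this method rebuilds a model input on a worker from a dictionary broadcast by another process. A minimal sketch of that round trip, assuming unset (`None`) fields are dropped before broadcasting and restored from dataclass defaults on the receiving side; `ToyInput`, `as_tensor_dict`, and `from_tensor_dict` are hypothetical names, not vLLM's methods.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, List, Optional


@dataclass
class ToyInput:
    # Hypothetical stand-in for a model-input dataclass.
    input_tokens: Optional[List[int]] = None
    seq_lens: Optional[List[int]] = None
    virtual_engine: Optional[int] = None

    def as_tensor_dict(self) -> Dict[str, Any]:
        # Drop unset fields so only meaningful entries are broadcast.
        return {
            f.name: getattr(self, f.name)
            for f in fields(self)
            if getattr(self, f.name) is not None
        }

    @classmethod
    def from_tensor_dict(cls, tensor_dict: Dict[str, Any]) -> "ToyInput":
        # Missing keys fall back to the dataclass defaults (None).
        return cls(**tensor_dict)


driver_input = ToyInput(input_tokens=[1, 2], seq_lens=[2])
payload = driver_input.as_tensor_dict()  # {'input_tokens': [1, 2], 'seq_lens': [2]}
worker_input = ToyInput.from_tensor_dict(payload)
```

Because every field defaults to `None`, the receiver can reconstruct an equal object from a partial dictionary without knowing which fields the sender filled in.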
prepare_model_input
prepare_model_input(
seq_group_metadata_list: Optional[
List[SequenceGroupMetadata]
],
virtual_engine: int = 0,
finished_requests_ids: Optional[List[str]] = None,
) -> ModelInputForCPUWithPoolingMetadata
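One plausible shape for this step: build the base input first, derive pooling metadata from it, and return a copy with the extra fields set. The sketch assumes that flow and uses `dataclasses.replace` to produce the copy; `ToyModelInput`, `prepare_model_input_sketch`, and the `{"prompt_lens": ...}` dictionary are hypothetical simplifications.

```python
import dataclasses
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ToyModelInput:
    # Frozen here so the update below must create a new object.
    seq_lens: Optional[List[int]] = None
    virtual_engine: Optional[int] = None
    pooling_metadata: Optional[dict] = None


def prepare_model_input_sketch(seq_lens: List[int], virtual_engine: int = 0) -> ToyModelInput:
    # Step 1 (hypothetical): a builder produces the base input.
    base = ToyModelInput(seq_lens=seq_lens)
    # Step 2: assume prompt-only pooling, so each sequence length is its prompt length.
    pooling = {"prompt_lens": list(base.seq_lens)}
    # Step 3: return a copy with the remaining fields filled in.
    return dataclasses.replace(base, virtual_engine=virtual_engine, pooling_metadata=pooling)


model_input = prepare_model_input_sketch([3, 2], virtual_engine=1)
```

`dataclasses.replace` works on frozen dataclasses because it constructs a fresh instance rather than mutating the original.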
ModelInputForCPUWithPoolingMetadata (dataclass)
Bases: ModelInputForCPU
Model input used by CPUPoolingModelRunner. It extends ModelInputForCPU with an optional pooling_metadata field.
pooling_metadata (class-attribute, instance-attribute)
pooling_metadata: Optional[PoolingMetadata] = None
__init__
__init__(
input_tokens: Optional[Tensor] = None,
input_positions: Optional[Tensor] = None,
token_type_ids: Optional[Tensor] = None,
attn_metadata: Optional[AttentionMetadata] = None,
multi_modal_kwargs: Optional[
BatchedTensorInputs
] = None,
virtual_engine: Optional[int] = None,
seq_lens: Optional[List[int]] = None,
query_lens: Optional[List[int]] = None,
lora_mapping: Optional[LoRAMapping] = None,
lora_requests: Optional[Set[LoRARequest]] = None,
pooling_metadata: Optional[PoolingMetadata] = None,
) -> None
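In the `__init__` signature above, `pooling_metadata` comes last, after all inherited parameters. That ordering is what standard dataclass inheritance produces when a subclass adds a field. A small sketch with a hypothetical, trimmed base class (only two of the real fields) demonstrates it:

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class BaseCPUInput:
    # Hypothetical, trimmed stand-in for ModelInputForCPU.
    input_tokens: Optional[List[int]] = None
    seq_lens: Optional[List[int]] = None


@dataclass
class PoolingCPUInput(BaseCPUInput):
    # The subclass adds one optional field; dataclasses append it after inherited ones.
    pooling_metadata: Optional[dict] = None


print([f.name for f in fields(PoolingCPUInput)])
# ['input_tokens', 'seq_lens', 'pooling_metadata']
partial = PoolingCPUInput(input_tokens=[1, 2])
```

Since every field defaults to `None`, callers can pass only what they have by keyword, as `partial` does, and leave the rest unset.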