vllm.spec_decode.proposer_worker_base
NonLLMProposerWorkerBase
Bases: ProposerWorkerBase, ABC
Proposer worker that does not use a model with a KV cache.
Source code in vllm/spec_decode/proposer_worker_base.py
determine_num_available_blocks
execute_model
execute_model(
    execute_model_req: Optional[ExecuteModelRequest] = None,
) -> List[SamplerOutput]
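As one illustration of how a proposer can work without a KV-cache-backed model, draft tokens can be looked up in the tokens a sequence already contains. The sketch below is a standalone toy that assumes an n-gram match strategy. The function `propose_by_ngram_lookup` and its parameters are hypothetical names introduced for this example and are not part of the API documented on this page.

```python
from typing import List, Optional


def propose_by_ngram_lookup(
    token_ids: List[int], ngram_size: int, sample_len: int
) -> Optional[List[int]]:
    """Propose up to `sample_len` tokens by matching the trailing n-gram.

    Hypothetical helper for illustration only; not a vLLM function.
    """
    if ngram_size <= 0 or sample_len <= 0 or len(token_ids) <= ngram_size:
        return None

    suffix = token_ids[-ngram_size:]

    # Scan earlier positions from most recent to oldest, skipping the
    # suffix itself, and return the tokens that followed the first match.
    for start in range(len(token_ids) - ngram_size - 1, -1, -1):
        if token_ids[start:start + ngram_size] == suffix:
            end = start + ngram_size
            return token_ids[end:end + sample_len]

    # No earlier occurrence of the trailing n-gram: nothing to propose.
    return None
```

A lookup like this needs only the token history, so it needs no model forward pass and no cache blocks.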
ProposerWorkerBase
Bases: LoRANotSupportedWorkerBase, SpeculativeProposer
Interface for proposer workers
Source code in vllm/spec_decode/proposer_worker_base.py
sampler_output (abstractmethod)
sampler_output(
    execute_model_req: ExecuteModelRequest,
    sample_len: int,
    seq_ids_with_bonus_token_in_last_step: Set[int],
) -> Tuple[Optional[List[SamplerOutput]], bool]
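The shape of this abstract method can be sketched with stand-in types so that it runs without vLLM installed. In the sketch, `ToyProposerBase` and `RepeatLastTokenProposer` are hypothetical classes, a plain list of token IDs stands in for `ExecuteModelRequest`, and a list of integers stands in for `List[SamplerOutput]`. This page does not describe the meaning of the boolean in the returned tuple, so the sketch treats it as an opaque flag and always returns `False`.

```python
from abc import ABC, abstractmethod
from typing import List, Optional, Set, Tuple


class ToyProposerBase(ABC):
    """Stand-in mirroring the interface shape above; NOT the vLLM class."""

    @abstractmethod
    def sampler_output(
        self,
        token_ids: List[int],  # stand-in for ExecuteModelRequest
        sample_len: int,
        seq_ids_with_bonus_token_in_last_step: Set[int],
    ) -> Tuple[Optional[List[int]], bool]:
        raise NotImplementedError


class RepeatLastTokenProposer(ToyProposerBase):
    """Toy proposer: proposes the last seen token `sample_len` times."""

    def sampler_output(
        self,
        token_ids: List[int],
        sample_len: int,
        seq_ids_with_bonus_token_in_last_step: Set[int],
    ) -> Tuple[Optional[List[int]], bool]:
        # Return None for the proposals when there is nothing to propose.
        if not token_ids or sample_len <= 0:
            return None, False
        # One proposed token per speculative step; the flag is left False
        # because its meaning is an assumption outside this sketch.
        return [token_ids[-1]] * sample_len, False
```

Because `sampler_output` is declared with `@abstractmethod`, instantiating `ToyProposerBase` directly raises `TypeError`; only subclasses that implement the method can be created.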