vllm.spec_decode.interfaces
SpeculativeProposals
dataclass
Data structure representing the tokens proposed by a proposer. It also tracks how many speculative tokens each sequence has.
Source code in vllm/spec_decode/interfaces.py
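As a rough illustration of the idea described above (proposed tokens plus a per-sequence count of speculative tokens), the following sketch uses a hypothetical stand-in class, `ToyProposals`, built on plain Python lists. The field names `token_ids` and `lens`, the `valid_tokens` helper, and the `PAD` value are assumptions made for this example and are not taken from `vllm/spec_decode/interfaces.py`; the real dataclass may use tensors and different fields.

```python
from dataclasses import dataclass
from typing import List

PAD = -1  # assumed padding value for this sketch only


@dataclass
class ToyProposals:
    """Hypothetical stand-in, not the vLLM SpeculativeProposals class."""

    # One row per sequence, padded to a common width.
    token_ids: List[List[int]]
    # Number of valid speculative tokens in each row.
    lens: List[int]

    def valid_tokens(self, row: int) -> List[int]:
        """Return only the speculative tokens that count for this row."""
        return self.token_ids[row][: self.lens[row]]


# Three sequences: the first has 3 proposed tokens, the second 1, the third none.
proposals = ToyProposals(
    token_ids=[[11, 12, 13], [21, PAD, PAD], [PAD, PAD, PAD]],
    lens=[3, 1, 0],
)
```

Keeping the lengths next to the padded rows lets a consumer skip padding without having to know which value was used as the pad.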
SpeculativeProposer
Bases: ABC
get_spec_proposals
abstractmethod
get_spec_proposals(
    execute_model_req: ExecuteModelRequest,
    seq_ids_with_bonus_token_in_last_step: Set[int],
) -> SpeculativeProposals
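The sketch below shows how an abstract method with this shape might be implemented by a subclass. It does not import vLLM: `ToyRequest`, `ToyProposals`, `ToyProposer`, and `RepeatLastTokenProposer` are hypothetical stand-ins, and their fields (`seqs`, `num_speculative_tokens`, `token_ids`, `lens`) are assumptions for illustration only. Only the method name and its two parameter names come from the signature above.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class ToyRequest:
    """Hypothetical stand-in for ExecuteModelRequest."""
    seqs: Dict[int, List[int]]   # sequence id -> tokens generated so far
    num_speculative_tokens: int  # assumed field: tokens to propose per sequence


@dataclass
class ToyProposals:
    """Hypothetical stand-in for SpeculativeProposals."""
    token_ids: Dict[int, List[int]]  # sequence id -> proposed tokens
    lens: Dict[int, int]             # sequence id -> number of proposed tokens


class ToyProposer(ABC):
    """Mirrors only the shape of the abstract method documented above."""

    @abstractmethod
    def get_spec_proposals(
        self,
        execute_model_req: ToyRequest,
        seq_ids_with_bonus_token_in_last_step: Set[int],
    ) -> ToyProposals:
        raise NotImplementedError


class RepeatLastTokenProposer(ToyProposer):
    """Proposes k copies of each sequence's last token (purely illustrative)."""

    def get_spec_proposals(
        self,
        execute_model_req: ToyRequest,
        seq_ids_with_bonus_token_in_last_step: Set[int],
    ) -> ToyProposals:
        # This toy proposer ignores the bonus-token set entirely.
        k = execute_model_req.num_speculative_tokens
        token_ids = {
            seq_id: [tokens[-1]] * k
            for seq_id, tokens in execute_model_req.seqs.items()
        }
        lens = {seq_id: k for seq_id in token_ids}
        return ToyProposals(token_ids=token_ids, lens=lens)


req = ToyRequest(seqs={0: [5, 6, 7], 1: [9]}, num_speculative_tokens=2)
out = RepeatLastTokenProposer().get_spec_proposals(req, set())
```

Because the method is marked abstract, instantiating `ToyProposer` directly raises `TypeError`; only subclasses that override `get_spec_proposals` can be created.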
SpeculativeScorer
Bases: ABC
__init__
__init__(
    scorer_worker: WorkerBase,
    device: Union[device, str],
    vocab_size: int,
)
score_proposals
abstractmethod
score_proposals(
    execute_model_req: ExecuteModelRequest,
    proposals: SpeculativeProposals,
) -> SpeculativeScores
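A minimal sketch of a scorer with this constructor and method shape follows. It is self-contained and does not use vLLM: `ToyScores`, `ToyScorer`, `LookupScorer`, and `fake_worker` are hypothetical names, the worker is reduced to a plain callable, and storing the constructor arguments as private attributes is an assumption rather than something the signature above confirms.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ToyScores:
    """Hypothetical stand-in for SpeculativeScores."""
    # sequence id -> scoring-model probability of each proposed token
    probs: Dict[int, List[float]]


class ToyScorer(ABC):
    """Mirrors the constructor and abstract method shape documented above."""

    def __init__(self, scorer_worker: Callable[[int], float],
                 device: str, vocab_size: int):
        # Assumed: the base class simply stores its arguments.
        self._scorer_worker = scorer_worker
        self._device = device
        self._vocab_size = vocab_size

    @abstractmethod
    def score_proposals(self, execute_model_req,
                        proposals: Dict[int, List[int]]) -> ToyScores:
        raise NotImplementedError


class LookupScorer(ToyScorer):
    """Scores each proposed token by calling the worker once per token."""

    def score_proposals(self, execute_model_req,
                        proposals: Dict[int, List[int]]) -> ToyScores:
        probs = {
            seq_id: [self._scorer_worker(tok) for tok in tokens]
            for seq_id, tokens in proposals.items()
        }
        return ToyScores(probs=probs)


def fake_worker(token_id: int) -> float:
    """Pretend scoring model: favours even token ids."""
    return 0.9 if token_id % 2 == 0 else 0.1


scorer = LookupScorer(scorer_worker=fake_worker, device="cpu", vocab_size=100)
scores = scorer.score_proposals(execute_model_req=None,
                                proposals={0: [2, 3], 1: [4]})
```

Here `proposals` is a plain dictionary rather than a `SpeculativeProposals` instance, which keeps the example independent of the real dataclass layout.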
SpeculativeScores
dataclass
Data structure representing the scores that the scoring model assigns to speculative tokens.