vllm.engine.output_processor.single_step
SingleStepOutputProcessor
Bases: SequenceGroupOutputProcessor
A SequenceGroupOutputProcessor that handles "output processing" logic, which runs after the model returns generated token ids and before the next batch is scheduled. Output processing includes detokenization and determining whether a sequence is finished (e.g. via max len or eos token).
The SingleStepOutputProcessor is specialized to the case where the model emits at most a single token per invocation, which precludes configurations such as speculative decoding or multi-step decoding. This restriction enables beam search sampling, which requires forking, finishing, and freeing sequences in a way that is currently difficult to schedule multiple steps ahead of time.
Source code in vllm/engine/output_processor/single_step.py
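As an illustration of the "determine if a sequence is finished" step described above, here is a minimal sketch. It does not use vLLM's classes: `ToySequence`, `check_finished`, and `process_single_step` are hypothetical names, and the finish reasons `"stop"` and `"length"` are assumptions chosen for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToySequence:
    """Hypothetical stand-in for a sequence; not vLLM's Sequence class."""
    token_ids: List[int] = field(default_factory=list)
    finish_reason: Optional[str] = None  # None while the sequence is running


def check_finished(seq: ToySequence, eos_token_id: int, max_len: int) -> None:
    """Mark the sequence finished if it ended with EOS or hit the length cap."""
    if seq.finish_reason is not None or not seq.token_ids:
        return
    if seq.token_ids[-1] == eos_token_id:
        seq.finish_reason = "stop"
    elif len(seq.token_ids) >= max_len:
        seq.finish_reason = "length"


def process_single_step(
    seq: ToySequence, new_token_id: int, eos_token_id: int, max_len: int
) -> None:
    """One output-processing step: exactly one new token per invocation."""
    seq.token_ids.append(new_token_id)
    check_finished(seq, eos_token_id, max_len)
```

Because each invocation adds at most one token, the stop check only needs to inspect the last token and the current length.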
__init__
__init__(
    scheduler_config: SchedulerConfig,
    detokenizer: Detokenizer,
    scheduler: List[Scheduler],
    seq_counter: Counter,
    stop_checker: StopChecker,
)
_process_sequence_group_outputs
_process_sequence_group_outputs(
    seq_group: SequenceGroup,
    outputs: SequenceGroupOutput,
    is_async: bool,
) -> None
process_outputs
process_outputs(
    sequence_group: SequenceGroup,
    outputs: List[SequenceGroupOutput],
    is_async: bool,
) -> None
Append all new tokens to sequences in the sequence group. Fork any surviving beam candidates; free any unsurviving ones.
Invokes detokenizer to detokenize new tokens, and also marks sequences as finished if they meet stop conditions.
is_async - Indicates whether this postprocessor runs in parallel with the GPU forward pass and is processing tokens from the previous step. If this is true, no tokens need to be appended, since that is already done externally (before the next schedule() call).
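The effect of is_async on token appending can be sketched as follows. This is a simplified model under stated assumptions, not the library's implementation: `process_outputs_sketch` and its `detokenize` callback are hypothetical, and sequences are reduced to plain lists of token ids.

```python
from typing import Callable, List


def process_outputs_sketch(
    token_ids: List[int],
    new_token_id: int,
    is_async: bool,
    detokenize: Callable[[List[int]], str],
) -> str:
    """Toy model of the is_async branch; all names here are hypothetical."""
    if not is_async:
        # Synchronous path: the processor itself appends the sampled token.
        token_ids.append(new_token_id)
    # Asynchronous path: the token was already appended externally,
    # so only post-processing (here, detokenization) remains.
    return detokenize(token_ids)
```

In the asynchronous case the caller is assumed to have appended the token already, so appending again would duplicate it.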
process_prompt_logprob
process_prompt_logprob(
    seq_group: SequenceGroup,
    outputs: List[SequenceGroupOutput],
) -> None
Process prompt logprobs associated with one step of a single-step-scheduled computation.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
seq_group | SequenceGroup | the output is associated with this | required |
outputs | List[SequenceGroupOutput] | the | required |
single_step_process_prompt_logprob
single_step_process_prompt_logprob(
    sg_output_proc: SequenceGroupOutputProcessor,
    seq_group: SequenceGroup,
    output: CompletionSequenceGroupOutput,
) -> None
Process prompt logprobs associated with the SequenceGroupOutput for a given step.
Do nothing if the output has no prompt logprobs.
Account for the fact that transformers do not compute first-token logprobs.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
sg_output_proc | SequenceGroupOutputProcessor | | required |
seq_group | SequenceGroup | the output is associated with this | required |
output | CompletionSequenceGroupOutput | the | required |
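The two behaviors noted above (doing nothing when there are no prompt logprobs, and accounting for the missing first-token logprob) might look roughly like the sketch below. It is an assumption-laden illustration, not the function's actual body: `accumulate_prompt_logprobs` is a hypothetical helper, logprobs are modeled as plain dicts mapping token id to logprob, and padding the first position with `None` is one plausible way to represent the missing entry.

```python
from typing import Dict, List, Optional

# One entry per prompt position; None marks a position with no logprob.
PromptLogprobs = List[Optional[Dict[int, float]]]


def accumulate_prompt_logprobs(
    accumulated: PromptLogprobs,
    step_logprobs: Optional[PromptLogprobs],
) -> PromptLogprobs:
    """Return the accumulated prompt logprobs after one step (hypothetical)."""
    if step_logprobs is None:
        # The step produced no prompt logprobs: nothing to do.
        return accumulated
    if not accumulated:
        # First chunk: the model has no logprob for the first prompt token,
        # so pad that position with None to keep indices aligned.
        step_logprobs = [None] + step_logprobs
    return accumulated + step_logprobs
```

Padding only on the first chunk keeps the result aligned with prompt positions when the prompt is processed over several steps.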