vllm.engine.output_processor.stop_checker
StopChecker
LLMEngine helper class that separates out the stop-checking logic. It checks whether the EOS token was emitted, whether max_tokens has been consumed, whether a stop string was emitted, and whether the max model length has been exceeded.
Source code in vllm/engine/output_processor/stop_checker.py
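The four conditions above can be condensed into a short, self-contained sketch. The function below is illustrative only: its name and flat signature are not vLLM's API, and the real checks operate on Sequence and SamplingParams objects.

from typing import List, Optional

def stop_reason_sketch(
    output_token_ids: List[int],   # tokens generated so far
    total_len: int,                # prompt + output length
    stop_string_matched: bool,     # result of a stop-string scan
    eos_token_id: int,
    max_tokens: int,
    max_model_len: int,
) -> Optional[str]:
    """Return a human-readable stop reason, or None to keep generating."""
    if output_token_ids and output_token_ids[-1] == eos_token_id:
        return "eos token emitted"
    if stop_string_matched:
        return "stop string emitted"
    if total_len > max_model_len:
        return "max model len exceeded"
    if len(output_token_ids) >= max_tokens:
        return "max_tokens consumed"
    return None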
__init__
__init__(
max_model_len: int,
get_tokenizer_for_seq: Callable[
[Sequence], AnyTokenizer
],
)
Source code in vllm/engine/output_processor/stop_checker.py
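A minimal construction sketch, assuming a single shared Hugging Face tokenizer; in the engine, get_tokenizer_for_seq may instead resolve a LoRA-specific tokenizer per sequence.

from transformers import AutoTokenizer
from vllm.engine.output_processor.stop_checker import StopChecker

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # assumed model choice

stop_checker = StopChecker(
    max_model_len=4096,
    get_tokenizer_for_seq=lambda seq: tokenizer,
)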
_get_max_model_len
_get_max_model_len(lora_req: Optional[LoRARequest])
check_stop_strings
staticmethod
check_stop_strings(
output_text: str,
new_char_count: int,
stop: List[str],
include_in_output: bool,
) -> Optional[Tuple[str, int]]
Check whether any stop strings match and truncate the sequence's output text accordingly.
Returns a tuple (stop_string, offset) if matched, else None, where stop_string is the matched stop string and offset is the length to which output_text should be truncated, or -1 for no truncation.
Source code in vllm/engine/output_processor/stop_checker.py
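A simplified re-implementation of this contract as a sketch (check_stop_strings_sketch is illustrative and may differ from the real method in details). It scans only the newly generated tail of the text, since earlier characters were already checked on previous steps:

from typing import List, Optional, Tuple

def check_stop_strings_sketch(
    output_text: str,
    new_char_count: int,
    stop: List[str],
    include_in_output: bool,
) -> Optional[Tuple[str, int]]:
    if not new_char_count or not stop:
        return None
    for stop_str in stop:
        # A match must end within the new chars, so it can start no
        # earlier than len(stop_str) - 1 chars before them.
        stop_index = output_text.find(
            stop_str, 1 - new_char_count - len(stop_str))
        if stop_index == -1:
            continue
        if include_in_output:
            # Keep the stop string: truncate after it, or not at all
            # (-1) if it already ends the text.
            stop_index += len(stop_str)
            if stop_index >= len(output_text):
                return stop_str, -1
        return stop_str, stop_index
    return None

For example, check_stop_strings_sketch("Hello<|end|>", 7, ["<|end|>"], include_in_output=False) returns ("<|end|>", 5), i.e. truncate to "Hello"; with include_in_output=True it returns ("<|end|>", -1), because the stop string already terminates the text and nothing needs truncating.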
maybe_stop_sequence
maybe_stop_sequence(
seq: Sequence,
new_char_count: int,
sampling_params: SamplingParams,
lora_req: Optional[LoRARequest] = None,
) -> None
Stop the sequence if it has finished.
new_char_count is the number of characters added to the sequence's output text by the newly generated token.
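A hedged usage sketch, assuming a stop_checker built as above and a seq / sampling_params pair from the engine's output-processing loop; the detokenizer is assumed to have already appended the new token's text to the sequence's output text.

stop_checker.maybe_stop_sequence(
    seq,
    new_char_count=new_char_count,
    sampling_params=sampling_params,
)

if seq.is_finished():
    # seq.status now holds a finished state (e.g. stopped or
    # length-capped) and, on a stop match, seq.stop_reason records
    # the matched stop string or stop token id (assumption based on
    # Sequence's API).
    print(seq.status, seq.stop_reason)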