vllm.transformers_utils.detokenizer_utils
INITIAL_INCREMENTAL_DETOKENIZATION_OFFSET
module-attribute
¶
_convert_tokens_to_string_with_added_encoders
¶
_convert_tokens_to_string_with_added_encoders(
tokenizer: AnyTokenizer,
output_tokens: list[str],
skip_special_tokens: bool,
spaces_between_special_tokens: bool,
) -> str
Source code in vllm/transformers_utils/detokenizer_utils.py
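A minimal usage sketch of this private helper (it is normally invoked internally by the detokenization routines below; the HuggingFace model name and input text are illustrative assumptions):

```python
from transformers import AutoTokenizer

from vllm.transformers_utils.detokenizer_utils import (
    _convert_tokens_to_string_with_added_encoders)

# Assumed setup: any HuggingFace tokenizer satisfying AnyTokenizer.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode("Hello world"))

# Join the token strings, honoring added/special-token boundaries.
text = _convert_tokens_to_string_with_added_encoders(
    tokenizer,
    output_tokens=tokens,
    skip_special_tokens=True,
    spaces_between_special_tokens=True,
)
```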
_replace_none_with_empty
¶
convert_ids_list_to_tokens
¶
convert_ids_list_to_tokens(
tokenizer: AnyTokenizer, token_ids: list[int]
) -> list[str]
Detokenize the input ids individually.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| tokenizer | AnyTokenizer | tokenizer used by model under test | required |
| token_ids | list[int] | convert these tokens (Python list form) | required |

Returns:

| Type | Description |
|---|---|
| list[str] | Python list of token string representations |
Source code in vllm/transformers_utils/detokenizer_utils.py
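A hedged usage sketch (the model name and input text are illustrative; any tokenizer satisfying AnyTokenizer works):

```python
from transformers import AutoTokenizer

from vllm.transformers_utils.detokenizer_utils import convert_ids_list_to_tokens

tokenizer = AutoTokenizer.from_pretrained("gpt2")
token_ids = tokenizer.encode("Hello world")

# Each id is detokenized individually into its string representation.
tokens = convert_ids_list_to_tokens(tokenizer, token_ids)
# For a BPE tokenizer this yields per-token strings such as ['Hello', 'Ġworld'].
```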
convert_prompt_ids_to_tokens
¶
convert_prompt_ids_to_tokens(
tokenizer: AnyTokenizer,
prompt_ids: list[int],
skip_special_tokens: bool = False,
) -> tuple[list[str], int, int]
Converts the prompt ids to tokens and returns the tokens and offsets for incremental detokenization.
Note that not all tokens are converted to strings. Only the tokens that are necessary for incremental detokenization are converted to strings.
Source code in vllm/transformers_utils/detokenizer_utils.py
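A hedged sketch of seeding incremental detokenization from a prompt (the model name and prompt are illustrative assumptions):

```python
from transformers import AutoTokenizer

from vllm.transformers_utils.detokenizer_utils import convert_prompt_ids_to_tokens

tokenizer = AutoTokenizer.from_pretrained("gpt2")
prompt_ids = tokenizer.encode("The capital of France is")

# Only the trailing tokens needed for incremental detokenization are
# materialized as strings; the two offsets seed detokenize_incrementally.
prompt_tokens, prefix_offset, read_offset = convert_prompt_ids_to_tokens(
    tokenizer, prompt_ids, skip_special_tokens=False)
```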
detokenize_incrementally
¶
detokenize_incrementally(
tokenizer: AnyTokenizer,
all_input_ids: list[int],
prev_tokens: Optional[list[str]],
prefix_offset: int,
read_offset: int,
skip_special_tokens: bool = False,
spaces_between_special_tokens: bool = True,
) -> tuple[list[str], str, int, int]
Detokenizes the input ids incrementally and returns the new tokens and the new text.
If prev_tokens is None, this function first converts all of the input ids to tokens and returns those tokens together with the new text; otherwise it returns only the newly produced tokens and the new text.
This function also returns the new prefix offset and the new read offset to be used in the next iteration.
The offsets are necessary to defeat cleanup algorithms in the decode step, which decide whether or not to add a space depending on the surrounding ids.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| tokenizer | AnyTokenizer | The tokenizer to use. | required |
| all_input_ids | list[int] | The input ids. The last id is the new token id. | required |
| prev_tokens | Optional[list[str]] | The previous tokens. If None, this function will convert the input ids to tokens and return the tokens and the new text. | required |
| prefix_offset | int | The prefix offset. | required |
| read_offset | int | The read offset. | required |
| skip_special_tokens | bool | Whether to skip special tokens. | False |
| spaces_between_special_tokens | bool | Whether to add spaces between special tokens. | True |
Source code in vllm/transformers_utils/detokenizer_utils.py
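A hedged end-to-end sketch of streaming decode with this function (the model name, prompt, and continuation are illustrative; the loop mirrors a caller feeding one newly generated id per step):

```python
from transformers import AutoTokenizer

from vllm.transformers_utils.detokenizer_utils import (
    convert_prompt_ids_to_tokens, detokenize_incrementally)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
prompt_ids = tokenizer.encode("The capital of France is")
new_ids = tokenizer.encode(" Paris.", add_special_tokens=False)

# Seed the incremental state from the prompt.
tokens, prefix_offset, read_offset = convert_prompt_ids_to_tokens(
    tokenizer, prompt_ids)

all_ids = list(prompt_ids)
streamed = ""
for token_id in new_ids:
    all_ids.append(token_id)  # the last id is the newly generated token
    new_tokens, new_text, prefix_offset, read_offset = detokenize_incrementally(
        tokenizer,
        all_input_ids=all_ids,
        prev_tokens=tokens,
        prefix_offset=prefix_offset,
        read_offset=read_offset,
    )
    tokens.extend(new_tokens)
    streamed += new_text

print(streamed)  # the incrementally decoded continuation, e.g. " Paris."
```

Carrying prefix_offset and read_offset between iterations is what lets each step emit text consistent with a full re-decode, without re-detokenizing the entire sequence every time.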