vllm.tokenizers.deepseek_v4_encoding ¶
DeepSeek-V4 Encoding
A self-contained implementation for encoding/decoding DeepSeek-V4 chat messages with tool calling, thinking mode, and quick instruction task support.
_drop_thinking_messages ¶
Drop reasoning and non-essential messages before the last user message.
Behavior:

- Messages with role in `["user", "system", "tool", "latest_reminder"]` are always kept.
- Messages at or after the last user index are always kept.
- Assistant messages before the last user message have their reasoning removed.
- Developer messages before the last user message are dropped entirely.
Source code in vllm/tokenizers/deepseek_v4_encoding.py
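The filtering rules above can be sketched as a standalone function. This is a hypothetical reimplementation for illustration, not the vLLM source; the message dict shape (`"role"`, `"reasoning"` keys) is assumed from the descriptions on this page.

```python
from typing import Any, Dict, List

# Roles that are always kept, per the behavior list above.
KEEP_ROLES = {"user", "system", "tool", "latest_reminder"}

def drop_thinking_messages(messages: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    # Index of the last user message; -1 if none (then everything is kept).
    last_user = max(
        (i for i, m in enumerate(messages) if m["role"] == "user"), default=-1
    )
    out: List[Dict[str, Any]] = []
    for i, m in enumerate(messages):
        if i >= last_user or m["role"] in KEEP_ROLES:
            # Kept verbatim: at/after the last user turn, or an always-kept role.
            out.append(m)
        elif m["role"] == "assistant":
            # Earlier assistant turns lose their reasoning content.
            out.append({k: v for k, v in m.items() if k != "reasoning"})
        # Earlier developer messages are dropped entirely.
    return out
```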
_read_until_stop ¶
Read text from index until one of the stop strings is found.
Returns:
| Type | Description |
|---|---|
| Tuple[int, str, Optional[str]] | Tuple of (new_index, content_before_stop, matched_stop_string_or_None). |
Source code in vllm/tokenizers/deepseek_v4_encoding.py
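The described contract is simple enough to sketch directly. This is an illustrative reimplementation based only on the return-value description above; in particular, the assumption that `new_index` points just past the matched stop string is not confirmed by this page.

```python
from typing import Optional, Sequence, Tuple

def read_until_stop(
    text: str, index: int, stops: Sequence[str]
) -> Tuple[int, str, Optional[str]]:
    # Find the earliest occurrence of any stop string at or after index.
    best_pos, best_stop = len(text), None
    for s in stops:
        pos = text.find(s, index)
        if pos != -1 and pos < best_pos:
            best_pos, best_stop = pos, s
    content = text[index:best_pos]
    if best_stop is None:
        # No stop string found: consume the rest of the text.
        return len(text), content, None
    # Assumption: the new index points just past the matched stop string.
    return best_pos + len(best_stop), content, best_stop
```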
decode_dsml_to_arguments ¶
Decode DSML parameters back to a tool call dict.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| tool_name | str | Name of the tool. | required |
| tool_args | Dict[str, Tuple[str, str]] | Dict mapping param_name -> (value, is_string_flag). | required |
Returns:
| Type | Description |
|---|---|
| Dict[str, str] | Dict with "name" and "arguments" (JSON string) keys. |
Source code in vllm/tokenizers/deepseek_v4_encoding.py
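The output shape is fully specified above, so the decoding step can be sketched. This is a hypothetical reimplementation; the interpretation of `is_string_flag` as the literal string `"true"` is an assumption not confirmed by this page.

```python
import json
from typing import Dict, Tuple

def decode_dsml_to_arguments(
    tool_name: str, tool_args: Dict[str, Tuple[str, str]]
) -> Dict[str, str]:
    arguments = {}
    for name, (value, is_string) in tool_args.items():
        # Assumption: the flag is the literal string "true" for plain-string
        # values; anything else is parsed as a JSON literal.
        arguments[name] = value if is_string == "true" else json.loads(value)
    # Return the documented shape: "name" plus "arguments" as a JSON string.
    return {"name": tool_name, "arguments": json.dumps(arguments)}
```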
encode_arguments_to_dsml ¶
Encode tool call arguments into DSML parameter format.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| tool_call | Dict[str, Any] | Dict with "name" and "arguments" keys. | required |
Returns:
| Type | Description |
|---|---|
| str | DSML-formatted parameter string. |
Source code in vllm/tokenizers/deepseek_v4_encoding.py
encode_messages ¶
encode_messages(
messages: List[Dict[str, Any]],
thinking_mode: str,
context: Optional[List[Dict[str, Any]]] = None,
drop_thinking: bool = True,
add_default_bos_token: bool = True,
reasoning_effort: Optional[str] = None,
) -> str
Encode a list of messages into the DeepSeek-V4 prompt format.
This is the main entry point for encoding conversations. It handles:

- BOS token insertion
- Thinking mode with optional reasoning content dropping
- Tool message merging into user messages
- Multi-turn conversation context
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| messages | List[Dict[str, Any]] | List of message dicts to encode. | required |
| thinking_mode | str | Either "chat" or "thinking". | required |
| context | Optional[List[Dict[str, Any]]] | Optional preceding context messages (already encoded prefix). | None |
| drop_thinking | bool | If True, drop reasoning from earlier assistant turns (only keep reasoning for messages after the last user message). | True |
| add_default_bos_token | bool | Whether to prepend BOS token at conversation start. | True |
| reasoning_effort | Optional[str] | Optional reasoning effort level ("max", "high", or None). | None |
Returns:
| Type | Description |
|---|---|
| str | The encoded prompt string. |
Source code in vllm/tokenizers/deepseek_v4_encoding.py
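The control flow above can be sketched with a toy encoder. The delimiter tokens below (`<|bos|>`, `<|user|>`, `<think>`, etc.) are entirely hypothetical stand-ins; the real DeepSeek-V4 special tokens are model-specific and not given on this page.

```python
from typing import Any, Dict, List

BOS = "<|bos|>"  # hypothetical token; the real BOS token is model-specific

def encode_messages_sketch(
    messages: List[Dict[str, Any]],
    thinking_mode: str,
    add_default_bos_token: bool = True,
) -> str:
    # Optionally prepend the BOS token at conversation start.
    parts = [BOS] if add_default_bos_token else []
    for m in messages:
        # Each turn is rendered as a role marker followed by its content.
        parts.append(f"<|{m['role']}|>{m['content']}")
    if thinking_mode == "thinking":
        # Open the assistant turn with a thinking block for the model to fill.
        parts.append("<|assistant|><think>")
    else:
        parts.append("<|assistant|>")
    return "".join(parts)
```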
find_last_user_index ¶
Find the index of the last user/developer message.
Source code in vllm/tokenizers/deepseek_v4_encoding.py
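A minimal sketch of this helper, scanning from the end of the list. The `-1` return value for "no user/developer message found" is an assumption, not documented above.

```python
from typing import Any, Dict, List

def find_last_user_index(messages: List[Dict[str, Any]]) -> int:
    # Scan backwards for the last user or developer message.
    for i in range(len(messages) - 1, -1, -1):
        if messages[i]["role"] in ("user", "developer"):
            return i
    return -1  # assumption: sentinel when no such message exists
```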
merge_tool_messages ¶
Merge tool messages into the preceding user message using content_blocks format.
DeepSeek-V4 does not have a standalone "tool" role; instead, tool results are encoded as content blocks inside user messages.

This function converts a standard OpenAI-format conversation (with separate "tool" role messages) into V4 format, where tool results are merged into user messages.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| messages | List[Dict[str, Any]] | List of message dicts in OpenAI format. | required |
Returns:
| Type | Description |
|---|---|
| List[Dict[str, Any]] | Processed message list with tool messages merged into user messages. |
Source code in vllm/tokenizers/deepseek_v4_encoding.py
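The conversion can be sketched as follows. This is an illustrative reimplementation: the exact `content_blocks` field names (`"type"`, `"tool_call_id"`, `"content"`) are assumptions, not confirmed by this page.

```python
from typing import Any, Dict, List

def merge_tool_messages(messages: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    out: List[Dict[str, Any]] = []
    for m in messages:
        if m["role"] != "tool":
            out.append(m)
            continue
        block = {
            # Hypothetical block schema; the real field names may differ.
            "type": "tool_result",
            "tool_call_id": m.get("tool_call_id"),
            "content": m["content"],
        }
        if out and out[-1]["role"] == "user" and "content_blocks" in out[-1]:
            # Consecutive tool results accumulate in one user message.
            out[-1]["content_blocks"].append(block)
        else:
            out.append({"role": "user", "content_blocks": [block]})
    return out
```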
parse_message_from_completion_text ¶
Parse a model completion text into a structured assistant message.
This function takes the raw text output from the model (a single assistant turn) and extracts:

- reasoning (thinking block)
- content (summary/response)
- tool_calls (if any)
NOTE: This function is designed to parse only correctly formatted strings and will raise ValueError for malformed output.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| text | str | The raw completion text (including EOS token). | required |
| thinking_mode | str | Either "chat" or "thinking". | required |
Returns:
| Type | Description |
|---|---|
| Dict[str, Any] | Dict with keys: "role", "content", "reasoning", "tool_calls". tool_calls are in OpenAI format. |
Source code in vllm/tokenizers/deepseek_v4_encoding.py
parse_tool_calls ¶
Parse DSML tool calls from text starting at the given index.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| index | int | Starting position in text. | required |
| text | str | The full text to parse. | required |
Returns:
| Type | Description |
|---|---|
| Tuple[int, Optional[str], List[Dict[str, Any]]] | Tuple of (new_index, last_stop_token, list_of_tool_call_dicts). Each tool call dict has "name" and "arguments" keys. |
Source code in vllm/tokenizers/deepseek_v4_encoding.py
render_message ¶
render_message(
index: int,
messages: List[Dict[str, Any]],
thinking_mode: str,
drop_thinking: bool = True,
reasoning_effort: Optional[str] = None,
) -> str
Render a single message at the given index into its encoded string form.
This is the core function that converts each message in the conversation into the DeepSeek-V4 format.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| index | int | Index of the message to render. | required |
| messages | List[Dict[str, Any]] | Full list of messages in the conversation. | required |
| thinking_mode | str | Either "chat" or "thinking". | required |
| drop_thinking | bool | Whether to drop reasoning content from earlier turns. | True |
| reasoning_effort | Optional[str] | Optional reasoning effort level ("max", "high", or None). | None |
Returns:
| Type | Description |
|---|---|
| str | Encoded string for this message. |
Source code in vllm/tokenizers/deepseek_v4_encoding.py
render_tools ¶
Render tool schemas into the system prompt format.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| tools | List[Dict[str, Union[str, Dict[str, Any]]]] | List of tool schema dicts (each with name, description, parameters). | required |
Returns:
| Type | Description |
|---|---|
| str | Formatted tools section string. |
Source code in vllm/tokenizers/deepseek_v4_encoding.py
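A toy version of this step might serialize each schema into the system prompt. The layout below (a `## Tools` header with one JSON schema per line) is purely hypothetical; the real V4 system-prompt format is defined by the model, not by this sketch.

```python
import json
from typing import Any, Dict, List

def render_tools_sketch(tools: List[Dict[str, Any]]) -> str:
    # Hypothetical layout: a header followed by one JSON schema per line.
    lines = ["## Tools"]
    for t in tools:
        lines.append(json.dumps(t, ensure_ascii=False))
    return "\n".join(lines)
```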
sort_tool_results_by_call_order ¶
Sort tool_result blocks within user messages by the order of tool_calls in the preceding assistant message.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| messages | List[Dict[str, Any]] | Preprocessed message list (after merge_tool_messages). | required |
Returns:
| Type | Description |
|---|---|
| List[Dict[str, Any]] | Message list with sorted tool result blocks. |
Source code in vllm/tokenizers/deepseek_v4_encoding.py
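The reordering can be sketched against the assumed post-merge message shape. The `tool_calls` entries carrying an `"id"` and the blocks carrying a `"tool_call_id"` are assumptions about the internal schema, not confirmed by this page.

```python
from typing import Any, Dict, List

def sort_tool_results_by_call_order(
    messages: List[Dict[str, Any]],
) -> List[Dict[str, Any]]:
    for i, m in enumerate(messages):
        if i == 0 or m["role"] != "user" or "content_blocks" not in m:
            continue
        prev = messages[i - 1]
        if prev["role"] != "assistant" or not prev.get("tool_calls"):
            continue
        # Rank each call id by its position in the assistant's tool_calls.
        order = {tc["id"]: j for j, tc in enumerate(prev["tool_calls"])}
        # Unknown ids sort last; known ids follow the call order.
        m["content_blocks"].sort(
            key=lambda b: order.get(b.get("tool_call_id"), len(order))
        )
    return messages
```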
to_json ¶
tool_calls_from_openai_format ¶
Convert OpenAI-format tool calls to internal format.
Source code in vllm/tokenizers/deepseek_v4_encoding.py
tool_calls_to_openai_format ¶
Convert internal tool calls to OpenAI format.
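The outbound direction targets the well-known OpenAI tool-call shape (`{"type": "function", "function": {"name": ..., "arguments": ...}}`). The internal format and the `call_{i}` id scheme below are assumptions for illustration.

```python
from typing import Any, Dict, List

def tool_calls_to_openai_format(
    tool_calls: List[Dict[str, Any]],
) -> List[Dict[str, Any]]:
    # Internal format here is assumed to be {"name": ..., "arguments": <JSON string>}.
    return [
        {
            "id": f"call_{i}",  # hypothetical id scheme
            "type": "function",
            "function": {"name": tc["name"], "arguments": tc["arguments"]},
        }
        for i, tc in enumerate(tool_calls)
    ]
```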