# vllm.model_executor.guided_decoding.outlines_decoding

## `JSON_GRAMMAR` *module-attribute*

```python
JSON_GRAMMAR = """
?start: object | array

?value: object
      | array
      | UNESCAPED_STRING
      | SIGNED_NUMBER   -> number
      | "true"          -> true
      | "false"         -> false
      | "null"          -> null

array  : "[" [value ("," value)*] "]"
object : "{" [pair ("," pair)*] "}"
pair   : UNESCAPED_STRING ":" value

%import common.UNESCAPED_STRING
%import common.SIGNED_NUMBER
%import common.WS

%ignore WS
"""
```
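This Lark grammar is the generic-JSON fallback: it constrains output to syntactically valid JSON when a request asks for a JSON object without supplying a schema. A minimal sketch of selecting it, assuming the `json_object` and `grammar` fields of `GuidedDecodingParams` from `vllm.sampling_params`:

```python
# Minimal sketch; the GuidedDecodingParams field names are assumptions
# based on vllm.sampling_params, not defined in this module.
from vllm.model_executor.guided_decoding.outlines_decoding import JSON_GRAMMAR
from vllm.sampling_params import GuidedDecodingParams

# Ask for "any syntactically valid JSON"; this module then falls back
# to JSON_GRAMMAR internally.
params = GuidedDecodingParams(json_object=True)

# The same constraint, expressed by passing the Lark grammar explicitly.
params_explicit = GuidedDecodingParams(grammar=JSON_GRAMMAR)
```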
## `GuidedDecodingMode`

Bases: `Enum`
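As an illustrative sketch, the enum names the kinds of guide this module dispatches on; the member names and values below are assumptions inferred from the modes referenced by the helpers that follow:

```python
# Illustrative sketch only; member names and values are assumptions.
from enum import Enum

class GuidedDecodingMode(Enum):
    JSON = "json"        # guide is a JSON schema
    REGEX = "regex"      # guide is a regular expression
    CHOICE = "choice"    # guide is an alternation over fixed choices
    GRAMMAR = "grammar"  # guide is a Lark CFG (e.g. JSON_GRAMMAR)
```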
## `_get_guide_and_mode`

```python
_get_guide_and_mode(
    guided_params: GuidedDecodingParams,
) -> Union[tuple[str, GuidedDecodingMode], tuple[None, None]]
```
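A hedged sketch of the dispatch this helper performs: it checks which guided-decoding field is set on the request and returns a `(guide, mode)` pair, or `(None, None)` when no guidance was requested. The details of schema serialization and choice escaping are assumptions:

```python
# Illustrative sketch, not the exact vLLM implementation.
import json
import re

def _get_guide_and_mode_sketch(guided_params):
    if guided_params.json is not None:
        schema = guided_params.json
        if not isinstance(schema, str):
            schema = json.dumps(schema)  # dict/model schemas serialized to JSON
        return schema, GuidedDecodingMode.JSON
    if guided_params.regex is not None:
        return guided_params.regex, GuidedDecodingMode.REGEX
    if guided_params.choice:
        # Fixed choices become a single alternation regex.
        choices = "|".join(re.escape(str(c)) for c in guided_params.choice)
        return f"({choices})", GuidedDecodingMode.CHOICE
    if guided_params.grammar is not None:
        return guided_params.grammar, GuidedDecodingMode.GRAMMAR
    if guided_params.json_object:
        # Generic JSON falls back to the module-level Lark grammar.
        return JSON_GRAMMAR, GuidedDecodingMode.GRAMMAR
    return None, None
```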
## `_get_logits_processor`

```python
_get_logits_processor(
    guide: str,
    tokenizer: PreTrainedTokenizerBase,
    mode: GuidedDecodingMode,
    whitespace_pattern: Union[str, None],
    reasoner: Optional[ReasoningParser],
) -> Union[
    JSONLogitsProcessor,
    RegexLogitsProcessor,
    CFGLogitsProcessor,
]
```
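A hedged sketch of the mode-to-processor mapping; only the class names come from the return annotation above, and the constructor argument orders are assumptions:

```python
# Illustrative sketch; constructor signatures are assumptions.
def _get_logits_processor_sketch(guide, tokenizer, mode,
                                 whitespace_pattern, reasoner):
    if mode == GuidedDecodingMode.JSON:
        # Only the JSON processor takes a whitespace pattern.
        return JSONLogitsProcessor(guide, tokenizer, whitespace_pattern,
                                   reasoner)
    if mode in (GuidedDecodingMode.REGEX, GuidedDecodingMode.CHOICE):
        # Choices were already compiled down to a regex guide.
        return RegexLogitsProcessor(guide, tokenizer, reasoner)
    if mode == GuidedDecodingMode.GRAMMAR:
        return CFGLogitsProcessor(guide, tokenizer, reasoner)
    raise ValueError(f"Unknown guided decoding mode {mode}")
```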
## `get_local_outlines_guided_decoding_logits_processor`

```python
get_local_outlines_guided_decoding_logits_processor(
    guided_params: GuidedDecodingParams,
    tokenizer: PreTrainedTokenizerBase,
    reasoner: Optional[ReasoningParser],
) -> Union[
    JSONLogitsProcessor,
    RegexLogitsProcessor,
    CFGLogitsProcessor,
    None,
]
```
Given an OpenAI-compatible request, check for guided-decoding parameters and return the logits processor for the requested guide. We cache logits processors by `(guide, tokenizer)` and, on a cache hit, return a shallow copy so that the same underlying FSM is reused.
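A hedged usage sketch; the tokenizer setup and `GuidedDecodingParams` fields are assumptions, not part of this module:

```python
from transformers import AutoTokenizer
from vllm.model_executor.guided_decoding.outlines_decoding import (
    get_local_outlines_guided_decoding_logits_processor)
from vllm.sampling_params import GuidedDecodingParams

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # any HF tokenizer
params = GuidedDecodingParams(regex=r"\d{4}-\d{2}-\d{2}")  # ISO-style date

# Returns a RegexLogitsProcessor here, or None when no guidance is requested.
processor = get_local_outlines_guided_decoding_logits_processor(
    params, tokenizer, reasoner=None)
```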
## `get_outlines_guided_decoding_logits_processor` *async*

```python
get_outlines_guided_decoding_logits_processor(
    guided_params: GuidedDecodingParams,
    tokenizer: PreTrainedTokenizerBase,
    reasoner: Optional[ReasoningParser],
) -> Union[
    JSONLogitsProcessor,
    RegexLogitsProcessor,
    CFGLogitsProcessor,
    None,
]
```
Given an OpenAI-compatible request, check for guided-decoding parameters and return the logits processor for the requested guide. We cache logits processors by `(guide, tokenizer)` and, on a cache hit, return a shallow copy so that the same underlying FSM is reused.
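The async variant has the same contract as the local one; a hedged sketch of awaiting it, under the same assumptions as the example above:

```python
import asyncio

from transformers import AutoTokenizer
from vllm.model_executor.guided_decoding.outlines_decoding import (
    get_outlines_guided_decoding_logits_processor)
from vllm.sampling_params import GuidedDecodingParams

async def main():
    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    params = GuidedDecodingParams(choice=["yes", "no"])
    # Awaiting lets other coroutines run while the processor is built.
    return await get_outlines_guided_decoding_logits_processor(
        params, tokenizer, reasoner=None)

processor = asyncio.run(main())
```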