vllm.entrypoints.utils
VLLM_SUBCMD_PARSER_EPILOG
module-attribute
VLLM_SUBCMD_PARSER_EPILOG = "Tip: Use `vllm [serve|run-batch|bench <bench_type>] --help=<keyword>` to explore arguments from help.\n - To view a argument group: --help=ModelConfig\n - To view a single argument: --help=max-num-seqs\n - To search by keyword: --help=max\n - To list all groups: --help=listgroup"
_validate_truncation_size
_validate_truncation_size(
max_model_len: int,
truncate_prompt_tokens: Optional[int],
tokenization_kwargs: Optional[dict[str, Any]] = None,
) -> Optional[int]
Source code in vllm/entrypoints/utils.py
cli_env_setup
decrement_server_load
get_max_tokens
get_max_tokens(
max_model_len: int,
request: Union[
ChatCompletionRequest, CompletionRequest
],
input_length: int,
default_sampling_params: dict,
) -> int
listen_for_disconnect
async
Returns when a disconnect message is received.
load_aware_call
show_filtered_argument_or_group_from_help
show_filtered_argument_or_group_from_help(
parser: ArgumentParser, subcommand_name: list[str]
)
with_cancellation
Decorator that allows a route handler to be cancelled by client disconnections.
This does not use request.is_disconnected, which does not work with middleware. Instead, it follows the pattern from starlette.StreamingResponse, which awaits two tasks concurrently: one waits for an HTTP disconnect message, and the other does the work that we want done. Whichever task finishes first, the other is cancelled.
A core assumption of this method is that the body of the request has already been read. This is a safe assumption for FastAPI handlers, which have already parsed the body of the request into a Pydantic model for us. This decorator is unsafe to use elsewhere, because it consumes and discards all incoming messages for the request while it looks for a disconnect message.
In the case where a StreamingResponse is returned by the handler, this wrapper will stop listening for disconnects and instead the response object will start listening for disconnects.
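The two-task pattern described for with_cancellation can be illustrated with plain asyncio. The following is a minimal sketch, not the vLLM implementation: `cancel_on_disconnect` and `wait_for_disconnect` are hypothetical names, the wrapper takes a bare ASGI-style `receive` callable instead of a Starlette request, and the StreamingResponse hand-off is not modelled. The only detail taken from the ASGI protocol is the `http.disconnect` message type.

```python
import asyncio
import functools


async def wait_for_disconnect(receive):
    """Consume ASGI messages until an 'http.disconnect' message arrives.

    Every other message is discarded, which is why this is only safe once
    the request body has already been read.
    """
    while True:
        message = await receive()
        if message["type"] == "http.disconnect":
            return


def cancel_on_disconnect(handler):
    """Hypothetical decorator: race the handler against a disconnect listener."""

    @functools.wraps(handler)
    async def wrapper(receive, *args, **kwargs):
        handler_task = asyncio.create_task(handler(*args, **kwargs))
        disconnect_task = asyncio.create_task(wait_for_disconnect(receive))

        # Wait until either the work or the disconnect listener finishes.
        done, pending = await asyncio.wait(
            {handler_task, disconnect_task},
            return_when=asyncio.FIRST_COMPLETED,
        )

        # Cancel whichever task lost the race and let it unwind.
        for task in pending:
            task.cancel()
        await asyncio.gather(*pending, return_exceptions=True)

        if handler_task in done:
            return handler_task.result()
        # The client disconnected first, so there is nobody to respond to.
        return None

    return wrapper
```

In this sketch a decorated coroutine is called as `await wrapped(receive, ...)`. It returns the handler's result if the handler wins the race, and `None` if the disconnect message arrives first.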