vllm.entrypoints.api_server
NOTE: This API server is used only for demonstrating AsyncEngine usage and for
simple performance benchmarks. It is not intended for production use.
For production use, we recommend our OpenAI-compatible server.
We are also not going to accept PRs modifying this file; please change
vllm/entrypoints/openai/api_server.py instead.
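As a quick demo of the endpoints this server exposes, the sketch below checks /health and sends a minimal completion request once the server is running (for example, started with python -m vllm.entrypoints.api_server --model <model>). The host, port, and the shape of the response JSON (a "text" field holding the completions) are assumptions here, not guarantees from this module.

    import requests

    base = "http://localhost:8000"  # assumed host/port of a locally running demo server

    # The health endpoint is expected to return an empty 200 response when the server is up.
    assert requests.get(f"{base}/health").status_code == 200

    # Minimal non-streaming completion request; max_tokens and temperature are
    # SamplingParams fields passed through as "other fields" of the JSON body.
    resp = requests.post(
        f"{base}/generate",
        json={"prompt": "Hello, my name is", "max_tokens": 32, "temperature": 0.0},
    )
    resp.raise_for_status()
    print(resp.json()["text"])  # assumed response field holding the generated text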
_generate (async)
_generate(
    request_dict: dict, raw_request: Request
) -> Response
Source code in vllm/entrypoints/api_server.py
generate (async)
Generate a completion for the request.
The request should be a JSON object with the following fields:
- prompt: the prompt to use for the generation.
- stream: whether to stream the results or not (see the sketch below).
- other fields: the sampling parameters (see SamplingParams for details).
Source code in vllm/entrypoints/api_server.py
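When stream is true, the results come back incrementally. A minimal streaming-client sketch follows; the null-byte-delimited JSON framing of the chunks and the "text" response field are assumptions based on how this demo server has historically behaved and may differ across versions.

    import json

    import requests

    resp = requests.post(
        "http://localhost:8000/generate",
        json={"prompt": "The capital of France is", "stream": True, "max_tokens": 16},
        stream=True,
    )
    # Assumption: each streamed chunk is a JSON object terminated by a null byte.
    for chunk in resp.iter_lines(delimiter=b"\0"):
        if chunk:
            print(json.loads(chunk)["text"])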
health (async)
init_app (async)
init_app(
    args: Namespace,
    llm_engine: Optional[AsyncLLMEngine] = None,
) -> FastAPI
Source code in vllm/entrypoints/api_server.py
run_server (async)
run_server(
    args: Namespace,
    llm_engine: Optional[AsyncLLMEngine] = None,
    **uvicorn_kwargs: Any,
) -> None
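For embedding the demo server in another asyncio program, run_server can be called directly. A minimal sketch follows, assuming the Namespace carries both the engine flags (added via AsyncEngineArgs.add_cli_args) and the server flags the module's own CLI defines; the exact server flag set (log level, root path, SSL options) is an assumption here and may vary across versions.

    import argparse
    import asyncio

    from vllm.engine.arg_utils import AsyncEngineArgs
    from vllm.entrypoints.api_server import run_server

    # Approximate the module's own CLI: server flags plus the engine flags.
    parser = argparse.ArgumentParser()
    parser.add_argument("--host", type=str, default="127.0.0.1")
    parser.add_argument("--port", type=int, default=8000)
    parser.add_argument("--log-level", type=str, default="debug")
    parser.add_argument("--root-path", type=str, default=None)
    parser.add_argument("--ssl-keyfile", type=str, default=None)
    parser.add_argument("--ssl-certfile", type=str, default=None)
    parser.add_argument("--ssl-ca-certs", type=str, default=None)
    parser.add_argument("--ssl-cert-reqs", type=int, default=0)
    parser = AsyncEngineArgs.add_cli_args(parser)  # adds --model and the other engine flags
    args = parser.parse_args(["--model", "facebook/opt-125m"])

    # Builds the engine (unless llm_engine is supplied), then serves the FastAPI
    # app with uvicorn until interrupted; extra uvicorn options can be forwarded
    # through **uvicorn_kwargs.
    asyncio.run(run_server(args))

The optional llm_engine parameter presumably lets the caller supply an AsyncLLMEngine constructed elsewhere instead of having one built from args.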