vllm.inputs.llm ¶
Schema and utilities for input prompts to the LLM API.
DecoderOnlyPrompt module-attribute ¶
DecoderOnlyPrompt: TypeAlias = (
str
| TextPrompt
| list[int]
| TokensPrompt
| EmbedsPrompt
)
Schema of a prompt for a decoder-only model:

- A text prompt (string or TextPrompt)
- A tokenized prompt (list of token IDs, or TokensPrompt)
- An embeddings prompt (EmbedsPrompt)

For encoder-decoder models, passing a singleton prompt is shorthand for passing ExplicitEncoderDecoderPrompt(encoder_prompt=prompt, decoder_prompt=None).
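Since TextPrompt and TokensPrompt are TypedDicts, plain Python values conform at runtime. A minimal sketch of the prompt forms above (the token IDs are made-up values for illustration):

```python
# The prompt forms for a decoder-only model. TextPrompt and TokensPrompt
# are TypedDicts, so plain dicts conform at runtime. Token IDs below are
# made-up values for illustration.
text_form = "Hello, my name is"                 # plain string
text_prompt = {"prompt": "Hello, my name is"}   # TextPrompt
token_ids = [1, 15043, 29892]                   # list of token IDs
tokens_prompt = {"prompt_token_ids": [1, 15043, 29892]}  # TokensPrompt

prompts = [text_form, text_prompt, token_ids, tokens_prompt]
```

Any of these values can be supplied wherever a DecoderOnlyPrompt is accepted.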
DecoderPrompt module-attribute ¶
DecoderPrompt: TypeAlias = (
str | TextPrompt | list[int] | TokensPrompt
)
Schema of a prompt for the decoder part of an encoder-decoder model:

- A text prompt (string or TextPrompt)
- A tokenized prompt (list of token IDs, or TokensPrompt)
Note
Multi-modal inputs are not supported for decoder prompts.
EncoderDecoderPrompt module-attribute ¶
EncoderDecoderPrompt: TypeAlias = (
EncoderPrompt | ExplicitEncoderDecoderPrompt
)
Schema for a prompt for an encoder-decoder model.
You can pass a singleton encoder prompt, in which case the decoder prompt is considered to be None (i.e., infer automatically).
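A sketch of this equivalence using plain dicts (ExplicitEncoderDecoderPrompt is a TypedDict; the example string is illustrative):

```python
# Singleton shorthand vs. the explicit form for an encoder-decoder model.
# ExplicitEncoderDecoderPrompt is a TypedDict, so a plain dict conforms.
singleton = "translate English to German: The house is wonderful."

explicit = {
    "encoder_prompt": singleton,
    "decoder_prompt": None,  # None => the decoder prompt is inferred
}
```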
EncoderPrompt module-attribute ¶
EncoderPrompt: TypeAlias = (
str | TextPrompt | list[int] | TokensPrompt
)
Schema of a prompt for the encoder part of an encoder-decoder model:

- A text prompt (string or TextPrompt)
- A tokenized prompt (list of token IDs, or TokensPrompt)
ModalityData module-attribute ¶
Either a single data item or a list of data items. It may be None only if a UUID is provided for the item.
The number of data items allowed per modality is restricted by --limit-mm-per-prompt.
MultiModalDataDict module-attribute ¶
MultiModalDataDict: TypeAlias = Mapping[
str, ModalityData[Any]
]
A dictionary containing an entry for each modality type to input.
The built-in modalities are defined by MultiModalDataBuiltins.
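A sketch of the shape of such a dictionary, using the built-in vision_chunk modality; placeholder strings stand in for real image or video-chunk objects:

```python
# MultiModalDataDict maps a modality name to one item or a list of items.
# Placeholder strings stand in for real image/video-chunk objects here.
single_item = {"vision_chunk": "<image>"}  # one data item
multi_item = {
    # A list of items; the count is capped by --limit-mm-per-prompt.
    "vision_chunk": ["<img 1>", "<img 2>"],
}
```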
MultiModalUUIDDict module-attribute ¶
A dictionary containing user-provided UUIDs for items in each modality. If a UUID for an item is not provided, its entry will be None and MultiModalHasher will compute a hash for the item.
The UUID is used to identify the item for all caching purposes (input processing caching, embedding caching, prefix caching, etc.).
PromptType module-attribute ¶
PromptType: TypeAlias = (
DecoderOnlyPrompt | EncoderDecoderPrompt
)
Schema for any prompt, regardless of model type.
This is the input format accepted by most LLM APIs.
SingletonPrompt module-attribute ¶
SingletonPrompt: TypeAlias = (
DecoderOnlyPrompt | EncoderPrompt | DecoderPrompt
)
Schema for a single prompt. This is as opposed to a data structure which encapsulates multiple prompts, such as ExplicitEncoderDecoderPrompt.
DataPrompt ¶
Bases: _PromptOptions
Represents generic inputs that are converted to PromptType by IO processor plugins.
Source code in vllm/inputs/llm.py
EmbedsPrompt ¶
Bases: _PromptOptions
Schema for a prompt provided via token embeddings.
Source code in vllm/inputs/llm.py
prompt instance-attribute ¶
prompt: NotRequired[str]
The prompt text corresponding to the token embeddings, if available.
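A minimal sketch of an embeddings prompt. The prompt_embeds field name is an assumption (it is not shown in this section), and a nested list stands in for the real embeddings tensor:

```python
# EmbedsPrompt sketch. "prompt_embeds" is an assumed field name, and the
# nested list stands in for a real (num_tokens x hidden_size) tensor.
embeds_prompt = {
    "prompt_embeds": [[0.1, 0.2], [0.3, 0.4]],  # 2 tokens, hidden size 2
    "prompt": "hi there",  # optional: text the embeddings correspond to
}
```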
ExplicitEncoderDecoderPrompt ¶
Bases: TypedDict
Schema for a pair of encoder and decoder singleton prompts.
Note
This schema is not valid for decoder-only models.
Source code in vllm/inputs/llm.py
decoder_prompt instance-attribute ¶
decoder_prompt: DecoderPrompt | None
The prompt for the decoder part of the model.
Passing None will cause the prompt to be inferred automatically.
encoder_prompt instance-attribute ¶
encoder_prompt: EncoderPrompt
The prompt for the encoder part of the model.
MultiModalDataBuiltins ¶
Bases: TypedDict
Type annotations for modality types predefined by vLLM.
Source code in vllm/inputs/llm.py
vision_chunk instance-attribute ¶
vision_chunk: ModalityData[VisionChunk]
The input visual atom(s): the unified modality for images and video chunks.
TextPrompt ¶
Bases: _PromptOptions
Schema for a text prompt.
Source code in vllm/inputs/llm.py
TokensPrompt ¶
Bases: _PromptOptions
Schema for a tokenized prompt.
Source code in vllm/inputs/llm.py
prompt instance-attribute ¶
prompt: NotRequired[str]
The prompt text corresponding to the token IDs, if available.
prompt_token_ids instance-attribute ¶
prompt_token_ids: list[int]
A list of token IDs to pass to the model.
token_type_ids instance-attribute ¶
token_type_ids: NotRequired[list[int]]
A list of token type IDs to pass to the cross encoder model.
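A sketch of a tokenized prompt for a cross-encoder sentence pair; the token IDs and segment layout are made-up values for illustration:

```python
# TokensPrompt with token_type_ids, as a cross-encoder might use for a
# sentence pair. IDs are made-up; segment 0 = first text, 1 = second.
tokens_prompt = {
    "prompt_token_ids": [101, 2023, 102, 2003, 102],
    "token_type_ids": [0, 0, 0, 1, 1],
}
```

The two lists are positionally aligned, so they should have the same length.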
_PromptOptions ¶
Bases: TypedDict
Additional options available to all SingletonPrompt types.
Source code in vllm/inputs/llm.py
cache_salt instance-attribute ¶
cache_salt: NotRequired[str]
Optional cache salt to be used for prefix caching.
mm_processor_kwargs instance-attribute ¶
mm_processor_kwargs: NotRequired[dict[str, Any] | None]
Optional multi-modal processor kwargs forwarded to the multi-modal input mapper and processor. If multiple modalities have registered mappers or processors for the model in question, the same mm_processor_kwargs are passed to each of them.
multi_modal_data instance-attribute ¶
multi_modal_data: NotRequired[MultiModalDataDict | None]
Optional multi-modal data to pass to the model, if the model supports it.
multi_modal_uuids instance-attribute ¶
multi_modal_uuids: NotRequired[MultiModalUUIDDict]
Optional user-specified UUIDs for multimodal items, mapped by modality. Lists must match the number of items per modality and may contain None. For None entries, the hasher will compute IDs automatically; non-None entries override the default hashes for caching, and MUST be unique per multimodal item.
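A sketch combining these options on one prompt; placeholder strings stand in for real media, and the UUID values are illustrative:

```python
# One TextPrompt carrying multi-modal data, per-item UUIDs, and a cache
# salt. Placeholders stand in for real media; UUID strings are made-up.
prompt = {
    "prompt": "Describe these two inputs.",
    "multi_modal_data": {"vision_chunk": ["<img A>", "<img B>"]},
    # Each list must match the item count; None => hash computed automatically.
    "multi_modal_uuids": {"vision_chunk": ["my-img-a-v1", None]},
    "cache_salt": "session-42",  # scopes prefix-cache reuse
}
```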