vllm.model_executor.models
Modules:
| Name | Description |
|---|---|
| adapters | |
| aimv2 | |
| arctic | Inference-only Snowflake Arctic model. |
| aria | |
| aya_vision | |
| baichuan | Inference-only BaiChuan model compatible with HuggingFace weights. |
| bamba | Inference-only Bamba model. |
| bart | PyTorch BART model. |
| bert | |
| bert_with_rope | |
| blip | Minimal implementation of BlipVisionModel intended to be only used within a vision language model. |
| blip2 | |
| bloom | Inference-only BLOOM model compatible with HuggingFace weights. |
| chameleon | |
| chatglm | Inference-only ChatGLM model compatible with THUDM weights. |
| clip | Minimal implementation of CLIPVisionModel intended to be only used within a vision language model. |
| commandr | PyTorch Cohere model. |
| config | |
| constant_size_cache | |
| dbrx | |
| deepseek | Inference-only Deepseek model. |
| deepseek_mtp | |
| deepseek_v2 | Inference-only DeepseekV2/DeepseekV3 model. |
| deepseek_vl2 | Inference-only Deepseek-VL2 model compatible with HuggingFace weights. |
| dots1 | Inference-only dots1 model. |
| eagle | |
| ernie45 | Inference-only Ernie model compatible with HuggingFace weights. |
| ernie45_moe | Inference-only ErnieMoE model compatible with HuggingFace weights. |
| exaone | Inference-only Exaone model compatible with HuggingFace weights. |
| fairseq2_llama | Llama model for fairseq2 weights. |
| falcon | PyTorch Falcon model. |
| falcon_h1 | Inference-only FalconH1 model. |
| florence2 | |
| fuyu | PyTorch Fuyu model. |
| gemma | Inference-only Gemma model compatible with HuggingFace weights. |
| gemma2 | |
| gemma3 | |
| gemma3_mm | |
| gemma3n | |
| glm | Inference-only HF format GLM-4 model compatible with THUDM weights. |
| glm4 | Inference-only GLM-4-0414 model compatible with HuggingFace weights. |
| glm4_1v | Inference-only GLM-4V model compatible with HuggingFace weights. |
| glm4v | Inference-only CogAgent model compatible with THUDM weights. |
| gpt2 | Inference-only GPT-2 model compatible with HuggingFace weights. |
| gpt_bigcode | Inference-only GPTBigCode model compatible with HuggingFace weights. |
| gpt_j | Inference-only GPT-J model compatible with HuggingFace weights. |
| gpt_neox | Inference-only GPT-NeoX model compatible with HuggingFace weights. |
| granite | Inference-only IBM Granite model compatible with HuggingFace weights. |
| granite_speech | Inference-only IBM Granite speech model. |
| granitemoe | Inference-only GraniteMoe model. |
| granitemoehybrid | Inference-only GraniteMoeHybrid model. |
| granitemoeshared | Inference-only GraniteMoeShared model. |
| gritlm | |
| grok1 | Inference-only Grok1 model. |
| h2ovl | |
| hunyuan_v1_moe | Inference-only HunYuan model compatible with HuggingFace weights. |
| idefics2_vision_model | PyTorch Idefics2 model. |
| idefics3 | Inference-only Idefics3 model compatible with HuggingFace weights. |
| interfaces | |
| interfaces_base | |
| intern_vit | |
| internlm2 | |
| internlm2_ve | |
| internvl | |
| jais | Inference-only Jais model compatible with HuggingFace weights. |
| jamba | Inference-only Jamba model. |
| keye | |
| kimi_vl | |
| llama | Inference-only LLaMA model compatible with HuggingFace weights. |
| llama4 | Inference-only LLaMA model compatible with HuggingFace weights. |
| llama_eagle | |
| llama_eagle3 | |
| llava | |
| llava_next | |
| llava_next_video | |
| llava_onevision | |
| mamba | PyTorch MAMBA model. |
| mamba2 | PyTorch MAMBA2 model. |
| mamba_cache | |
| medusa | |
| mimo | Inference-only MiMo model compatible with HuggingFace weights. |
| mimo_mtp | Inference-only MiMo-MTP model. |
| minicpm | Inference-only MiniCPM model compatible with HuggingFace weights. |
| minicpm3 | Inference-only MiniCPM3 model compatible with HuggingFace weights. |
| minicpm_eagle | Inference-only EagleMiniCPM model compatible with HuggingFace weights. |
| minicpmo | Inference-only MiniCPM-O model compatible with HuggingFace weights. |
| minicpmv | Inference-only MiniCPM-V model compatible with HuggingFace weights. |
| minimax_cache | |
| minimax_text_01 | Inference-only MiniMaxText01 model. |
| minimax_vl_01 | |
| mistral3 | |
| mixtral | Inference-only Mixtral model. |
| mixtral_quant | Inference-only Mixtral model. |
| mllama | PyTorch Mllama model. |
| mllama4 | |
| mlp_speculator | |
| modernbert | |
| module_mapping | |
| molmo | |
| moonvit | |
| mpt | |
| nemotron | Inference-only Nemotron model compatible with HuggingFace weights. |
| nemotron_h | Inference-only NemotronH model. |
| nemotron_nas | Inference-only Deci model compatible with HuggingFace weights. |
| nvlm_d | |
| olmo | Inference-only OLMo model compatible with HuggingFace weights. |
| olmo2 | Inference-only OLMo2 model compatible with HuggingFace weights. |
| olmoe | Inference-only OLMoE model compatible with HuggingFace weights. |
| opt | Inference-only OPT model compatible with HuggingFace weights. |
| orion | Inference-only Orion-14B model compatible with HuggingFace weights. |
| ovis | PyTorch Ovis model. |
| paligemma | |
| persimmon | Inference-only Persimmon model compatible with HuggingFace weights. |
| phi | Inference-only Phi-1.5 model compatible with HuggingFace weights. |
| phi3 | Inference-only Phi3 model; code inherits from Llama.py. |
| phi3_small | |
| phi3v | |
| phi4mm | |
| phi4mm_audio | |
| phi4mm_utils | |
| phimoe | Inference-only PhiMoE model. |
| pixtral | |
| plamo2 | Inference-only PLaMo2 model. |
| prithvi_geospatial_mae | Inference-only IBM/NASA Prithvi Geospatial model. |
| qwen | Inference-only QWen model compatible with HuggingFace weights. |
| qwen2 | Inference-only Qwen2 model compatible with HuggingFace weights. |
| qwen2_5_omni_thinker | Inference-only Qwen2.5-Omni model (thinker part). |
| qwen2_5_vl | Inference-only Qwen2.5-VL model compatible with HuggingFace weights. |
| qwen2_audio | Inference-only Qwen2-Audio model compatible with HuggingFace weights. |
| qwen2_moe | Inference-only Qwen2MoE model compatible with HuggingFace weights. |
| qwen2_rm | Inference-only Qwen2-RM model compatible with HuggingFace weights. |
| qwen2_vl | Inference-only Qwen2-VL model compatible with HuggingFace weights. |
| qwen3 | Inference-only Qwen3 model compatible with HuggingFace weights. |
| qwen3_moe | Inference-only Qwen3MoE model compatible with HuggingFace weights. |
| qwen_vl | Inference-only Qwen-VL model compatible with HuggingFace weights. |
| registry | Whenever you add an architecture to this page, please also update |
| roberta | |
| siglip | Implementation of SiglipVisionModel intended to be only used within a vision language model. |
| skyworkr1v | |
| smolvlm | |
| solar | Inference-only Solar model compatible with HuggingFace weights. |
| stablelm | Inference-only StableLM (https://github.com/Stability-AI/StableLM) |
| starcoder2 | PyTorch Starcoder2 model. |
| tarsier | |
| telechat2 | |
| teleflm | |
| transformers | Wrapper around `transformers` models. |
| ultravox | PyTorch Ultravox model. |
| utils | |
| vision | |
| whisper | |
| zamba2 | PyTorch Zamba2 model implementation for vLLM. |
ModelRegistry (module-attribute)
```python
ModelRegistry = _ModelRegistry(
    {
        model_arch: _LazyRegisteredModel(
            module_name=f"vllm.model_executor.models.{mod_relname}",
            class_name=cls_name,
        )
        for model_arch, (mod_relname, cls_name) in _VLLM_MODELS.items()
    }
)
```
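The registry maps architecture names (as they appear in a Hugging Face config's `architectures` field) to lazily imported model classes. A minimal usage sketch, assuming the `get_supported_archs()` and `register_model()` methods documented for out-of-tree models (`MyLlavaForConditionalGeneration` and `my_package.my_llava` are hypothetical):

```python
from vllm import ModelRegistry  # also re-exported from vllm.model_executor.models

# List every architecture name the registry can resolve.
print(sorted(ModelRegistry.get_supported_archs()))

# Register an out-of-tree architecture. Passing a "module:ClassName" string
# keeps the import lazy, matching how the built-in models are registered.
ModelRegistry.register_model(
    "MyLlavaForConditionalGeneration",
    "my_package.my_llava:MyLlavaForConditionalGeneration",
)
```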
__all__ (module-attribute)
```python
__all__ = [
    "ModelRegistry",
    "VllmModelForPooling",
    "is_pooling_model",
    "VllmModelForTextGeneration",
    "is_text_generation_model",
    "HasInnerState",
    "has_inner_state",
    "SupportsLoRA",
    "supports_lora",
    "SupportsMultiModal",
    "supports_multimodal",
    "SupportsPP",
    "supports_pp",
    "SupportsTranscription",
    "supports_transcription",
    "SupportsV0Only",
    "supports_v0_only",
]
```
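Everything in `__all__` can be imported directly from `vllm.model_executor.models`. A short sketch that resolves a registered architecture and inspects it with the interface helpers (assuming `ModelRegistry.resolve_model_cls()`, which returns the model class together with the matched architecture name):

```python
from vllm.model_executor.models import (
    ModelRegistry,
    supports_multimodal,
    supports_pp,
)

# The supports_* helpers accept classes as well as instances, so models can be
# inspected before they are ever instantiated.
model_cls, arch = ModelRegistry.resolve_model_cls("LlamaForCausalLM")
print(arch, supports_multimodal(model_cls), supports_pp(model_cls))
```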
HasInnerState
Bases: Protocol
The interface required for all models that have inner state.
Source code in vllm/model_executor/models/interfaces.py
SupportsLoRA
Bases: Protocol
The interface required for all models that support LoRA.
Source code in vllm/model_executor/models/interfaces.py
SupportsMultiModal
Bases: Protocol
The interface required for all multi-modal models.
Source code in vllm/model_executor/models/interfaces.py
supports_multimodal (class-attribute)
```python
supports_multimodal: Literal[True] = True
```
A flag that indicates this model supports multi-modal inputs.
Note
There is no need to redefine this flag if this class is in the MRO of your model class.
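In practice the flag is picked up by listing the interface as a base class and implementing its methods. A minimal, hypothetical sketch (`ToyVLModel` and its internals are illustrative only, not a real vLLM model):

```python
import torch
from torch import nn

from vllm.model_executor.models.interfaces import SupportsMultiModal


class ToyVLModel(nn.Module, SupportsMultiModal):
    """Toy model: inheriting SupportsMultiModal places
    supports_multimodal = True into the MRO automatically."""

    def __init__(self) -> None:
        super().__init__()
        self.language_model = nn.Linear(16, 16)  # stand-in for a real LM

    def get_language_model(self) -> nn.Module:
        return self.language_model

    def get_multimodal_embeddings(self, **kwargs: object):
        # One embedding tensor per multimodal item, in prompt order.
        pixel_values = kwargs.get("pixel_values")
        if pixel_values is None:
            return []
        return [torch.zeros(4, 16)]


assert ToyVLModel.supports_multimodal is True
```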
get_language_model
```python
get_language_model() -> Module
```
Returns the underlying language model used for text generation. This is typically the torch.nn.Module instance responsible for processing the merged multimodal embeddings and producing hidden states.

Returns:

| Type | Description |
|---|---|
| Module | torch.nn.Module: The core language model component. |
Source code in vllm/model_executor/models/interfaces.py
get_multimodal_embeddings
```python
get_multimodal_embeddings(
    **kwargs: object,
) -> MultiModalEmbeddings
```
Returns multimodal embeddings generated from multimodal kwargs to be merged with text embeddings.
Note
The returned multimodal embeddings must be in the same order as the appearances of their corresponding multimodal data item in the input prompt.
Source code in vllm/model_executor/models/interfaces.py
get_placeholder_str (classmethod)
Get the placeholder text for the i-th modality item in the prompt.
SupportsPP
Bases: Protocol
The interface required for all models that support pipeline parallel.
Source code in vllm/model_executor/models/interfaces.py
supports_pp (class-attribute)
```python
supports_pp: Literal[True] = True
```
A flag that indicates this model supports pipeline parallel.
Note
There is no need to redefine this flag if this class is in the MRO of your model class.
forward
```python
forward(
    *, intermediate_tensors: Optional[IntermediateTensors]
) -> Union[Tensor, IntermediateTensors]
```
Accept IntermediateTensors when PP rank > 0. Return IntermediateTensors only for the last PP rank.
Source code in vllm/model_executor/models/interfaces.py
make_empty_intermediate_tensors
```python
make_empty_intermediate_tensors(
    batch_size: int, dtype: dtype, device: device
) -> IntermediateTensors
```
Called when PP rank > 0 for profiling purposes.
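Putting the two methods together, a PP-capable model's forward typically threads IntermediateTensors between ranks. A simplified sketch of the control flow (method body only; `self.embed_input`, `self.layers`, and `self.is_last_pp_rank` are illustrative placeholders):

```python
from typing import Optional, Union

import torch

from vllm.sequence import IntermediateTensors


def forward(
    self,
    *,
    intermediate_tensors: Optional[IntermediateTensors],
) -> Union[torch.Tensor, IntermediateTensors]:
    if intermediate_tensors is not None:
        # PP rank > 0: resume from the activations sent by the previous rank.
        hidden_states = intermediate_tensors["hidden_states"]
    else:
        # First rank: start from locally computed input embeddings.
        hidden_states = self.embed_input()

    hidden_states = self.layers(hidden_states)

    if not self.is_last_pp_rank:
        # Intermediate ranks hand their activations to the next rank.
        return IntermediateTensors({"hidden_states": hidden_states})
    # Only the last rank returns the final hidden states.
    return hidden_states
```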
SupportsTranscription
Bases: Protocol
The interface required for all models that support transcription.
Source code in vllm/model_executor/models/interfaces.py
SupportsV0Only
Bases: Protocol
Models with this interface are not compatible with V1 vLLM.
Source code in vllm/model_executor/models/interfaces.py
VllmModelForPooling
Bases: VllmModel[T], Protocol[T]
The interface required for all pooling models in vLLM.
Source code in vllm/model_executor/models/interfaces_base.py
pooler
```python
pooler(
    hidden_states: T, pooling_metadata: PoolingMetadata
) -> PoolerOutput
```
VllmModelForTextGeneration
has_inner_state
```python
has_inner_state(model: object) -> TypeIs[HasInnerState]
has_inner_state(
    model: type[object],
) -> TypeIs[type[HasInnerState]]
has_inner_state(
    model: Union[type[object], object],
) -> Union[
    TypeIs[type[HasInnerState]], TypeIs[HasInnerState]
]
```
Source code in vllm/model_executor/models/interfaces.py
is_pooling_model
```python
is_pooling_model(
    model: type[object],
) -> TypeIs[type[VllmModelForPooling]]
is_pooling_model(
    model: object,
) -> TypeIs[VllmModelForPooling]
is_pooling_model(
    model: Union[type[object], object],
) -> Union[
    TypeIs[type[VllmModelForPooling]],
    TypeIs[VllmModelForPooling],
]
```
Source code in vllm/model_executor/models/interfaces_base.py
is_text_generation_model
```python
is_text_generation_model(
    model: type[object],
) -> TypeIs[type[VllmModelForTextGeneration]]
is_text_generation_model(
    model: object,
) -> TypeIs[VllmModelForTextGeneration]
is_text_generation_model(
    model: Union[type[object], object],
) -> Union[
    TypeIs[type[VllmModelForTextGeneration]],
    TypeIs[VllmModelForTextGeneration],
]
```
Source code in vllm/model_executor/models/interfaces_base.py
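Since these helpers are TypeIs-based guards, a static type checker narrows the argument inside each branch (to the class form for classes and the instance form for instances). A small usage sketch:

```python
from vllm.model_executor.models import (
    is_pooling_model,
    is_text_generation_model,
)


def describe(model_cls: type) -> str:
    # Within each branch, `model_cls` is narrowed to the matching interface
    # type, e.g. type[VllmModelForTextGeneration].
    if is_text_generation_model(model_cls):
        return "text generation model"
    if is_pooling_model(model_cls):
        return "pooling model"
    return "unsupported model type"
```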
supports_lora
```python
supports_lora(
    model: type[object],
) -> TypeIs[type[SupportsLoRA]]
supports_lora(model: object) -> TypeIs[SupportsLoRA]
supports_lora(
    model: Union[type[object], object],
) -> Union[
    TypeIs[type[SupportsLoRA]], TypeIs[SupportsLoRA]
]
```
Source code in vllm/model_executor/models/interfaces.py
supports_multimodal
```python
supports_multimodal(
    model: type[object],
) -> TypeIs[type[SupportsMultiModal]]
supports_multimodal(
    model: object,
) -> TypeIs[SupportsMultiModal]
supports_multimodal(
    model: Union[type[object], object],
) -> Union[
    TypeIs[type[SupportsMultiModal]],
    TypeIs[SupportsMultiModal],
]
```
Source code in vllm/model_executor/models/interfaces.py
supports_pp
```python
supports_pp(
    model: type[object],
) -> TypeIs[type[SupportsPP]]
supports_pp(model: object) -> TypeIs[SupportsPP]
supports_pp(
    model: Union[type[object], object],
) -> Union[
    bool, TypeIs[type[SupportsPP]], TypeIs[SupportsPP]
]
```
Source code in vllm/model_executor/models/interfaces.py
supports_transcription
```python
supports_transcription(
    model: type[object],
) -> TypeIs[type[SupportsTranscription]]
supports_transcription(
    model: object,
) -> TypeIs[SupportsTranscription]
supports_transcription(
    model: Union[type[object], object],
) -> Union[
    TypeIs[type[SupportsTranscription]],
    TypeIs[SupportsTranscription],
]
```
Source code in vllm/model_executor/models/interfaces.py
supports_v0_only
```python
supports_v0_only(
    model: type[object],
) -> TypeIs[type[SupportsV0Only]]
supports_v0_only(model: object) -> TypeIs[SupportsV0Only]
supports_v0_only(
    model: Union[type[object], object],
) -> Union[
    TypeIs[type[SupportsV0Only]], TypeIs[SupportsV0Only]
]
```