vllm.model_executor.models

Modules:

Name Description
adapters
aimv2
arctic

Inference-only Snowflake Arctic model.

aria
aya_vision
baichuan

Inference-only BaiChuan model compatible with HuggingFace weights.

bamba

Inference-only Bamba model.

bart

PyTorch BART model.

bert
bert_with_rope
blip

Minimal implementation of BlipVisionModel intended to be only used

blip2
bloom

Inference-only BLOOM model compatible with HuggingFace weights.

chameleon
chatglm

Inference-only ChatGLM model compatible with THUDM weights.

clip

Minimal implementation of CLIPVisionModel intended to be only used

commandr

PyTorch Cohere model.

config
constant_size_cache
dbrx
deepseek

Inference-only Deepseek model.

deepseek_mtp
deepseek_v2

Inference-only DeepseekV2/DeepseekV3 model.

deepseek_vl2

Inference-only Deepseek-VL2 model compatible with HuggingFace weights.

dots1

Inference-only dots1 model.

eagle
ernie45

Inference-only Ernie model compatible with HuggingFace weights.

ernie45_moe

Inference-only ErnieMoE model compatible with HuggingFace weights.

exaone

Inference-only Exaone model compatible with HuggingFace weights.

fairseq2_llama

Llama model for fairseq2 weights.

falcon

PyTorch Falcon model.

falcon_h1

Inference-only FalconH1 model.

florence2
fuyu

PyTorch Fuyu model.

gemma

Inference-only Gemma model compatible with HuggingFace weights.

gemma2
gemma3
gemma3_mm
gemma3n
glm

Inference-only HF format GLM-4 model compatible with THUDM weights.

glm4

Inference-only GLM-4-0414 model compatible with HuggingFace weights.

glm4_1v

Inference-only GLM-4V model compatible with HuggingFace weights.

glm4v

Inference-only CogAgent model compatible with THUDM weights.

gpt2

Inference-only GPT-2 model compatible with HuggingFace weights.

gpt_bigcode

Inference-only GPTBigCode model compatible with HuggingFace weights.

gpt_j

Inference-only GPT-J model compatible with HuggingFace weights.

gpt_neox

Inference-only GPT-NeoX model compatible with HuggingFace weights.

granite

Inference-only IBM Granite model compatible with HuggingFace weights.

granite_speech

Inference-only IBM Granite speech model.

granitemoe

Inference-only GraniteMoe model.

granitemoehybrid

Inference-only GraniteMoeHybrid model.

granitemoeshared

Inference-only GraniteMoeShared model.

gritlm
grok1

Inference-only Grok1 model.

h2ovl
hunyuan_v1_moe

Inference-only HunYuan model compatible with HuggingFace weights.

idefics2_vision_model

PyTorch Idefics2 model.

idefics3

Inference-only Idefics3 model compatible with HuggingFace weights.

interfaces
interfaces_base
intern_vit
internlm2
internlm2_ve
internvl
jais

Inference-only Jais model compatible with HuggingFace weights.

jamba

Inference-only Jamba model.

keye
kimi_vl
llama

Inference-only LLaMA model compatible with HuggingFace weights.

llama4

Inference-only LLaMA model compatible with HuggingFace weights.

llama_eagle
llama_eagle3
llava
llava_next
llava_next_video
llava_onevision
mamba

PyTorch MAMBA model.

mamba2

PyTorch MAMBA2 model.

mamba_cache
medusa
mimo

Inference-only MiMo model compatible with HuggingFace weights.

mimo_mtp

Inference-only MiMo-MTP model.

minicpm

Inference-only MiniCPM model compatible with HuggingFace weights.

minicpm3

Inference-only MiniCPM3 model compatible with HuggingFace weights.

minicpm_eagle

Inference-only EagleMiniCPM model compatible with HuggingFace weights.

minicpmo

Inference-only MiniCPM-O model compatible with HuggingFace weights.

minicpmv

Inference-only MiniCPM-V model compatible with HuggingFace weights.

minimax_cache
minimax_text_01

Inference-only MiniMaxText01 model.

minimax_vl_01
mistral3
mixtral

Inference-only Mixtral model.

mixtral_quant

Inference-only Mixtral model.

mllama

PyTorch Mllama model.

mllama4
mlp_speculator
modernbert
module_mapping
molmo
moonvit
mpt
nemotron

Inference-only Nemotron model compatible with HuggingFace weights.

nemotron_h

Inference-only NemotronH model.

nemotron_nas

Inference-only Deci model compatible with HuggingFace weights.

nvlm_d
olmo

Inference-only OLMo model compatible with HuggingFace weights.

olmo2

Inference-only OLMo2 model compatible with HuggingFace weights.

olmoe

Inference-only OLMoE model compatible with HuggingFace weights.

opt

Inference-only OPT model compatible with HuggingFace weights.

orion

Inference-only Orion-14B model compatible with HuggingFace weights.

ovis

PyTorch Ovis model.

paligemma
persimmon

Inference-only Persimmon model compatible with HuggingFace weights.

phi

Inference-only Phi-1.5 model compatible with HuggingFace weights.

phi3

Inference-only Phi-3 model; code inherits from llama.py.

phi3_small
phi3v
phi4mm
phi4mm_audio
phi4mm_utils
phimoe

Inference-only PhiMoE model.

pixtral
plamo2

Inference-only PLaMo2 model.

prithvi_geospatial_mae

Inference-only IBM/NASA Prithvi Geospatial model.

qwen

Inference-only QWen model compatible with HuggingFace weights.

qwen2

Inference-only Qwen2 model compatible with HuggingFace weights.

qwen2_5_omni_thinker

Inference-only Qwen2.5-Omni model (thinker part).

qwen2_5_vl

Inference-only Qwen2.5-VL model compatible with HuggingFace weights.

qwen2_audio

Inference-only Qwen2-Audio model compatible with HuggingFace weights.

qwen2_moe

Inference-only Qwen2MoE model compatible with HuggingFace weights.

qwen2_rm

Inference-only Qwen2-RM model compatible with HuggingFace weights.

qwen2_vl

Inference-only Qwen2-VL model compatible with HuggingFace weights.

qwen3

Inference-only Qwen3 model compatible with HuggingFace weights.

qwen3_moe

Inference-only Qwen3MoE model compatible with HuggingFace weights.

qwen_vl

Inference-only Qwen-VL model compatible with HuggingFace weights.

registry

Whenever you add an architecture to this page, please also update

roberta
siglip

Implementation of SiglipVisionModel intended to be only used

skyworkr1v
smolvlm
solar

Inference-only Solar model compatible with HuggingFace weights.

stablelm

Inference-only StableLM (https://github.com/Stability-AI/StableLM)

starcoder2

PyTorch Starcoder2 model.

tarsier
telechat2
teleflm
transformers

Wrapper around transformers models

ultravox

PyTorch Ultravox model.

utils
vision
whisper
zamba2

PyTorch Zamba2 model implementation for vLLM.

ModelRegistry module-attribute

ModelRegistry = _ModelRegistry(
    {
        model_arch: _LazyRegisteredModel(
            module_name=f"vllm.model_executor.models.{mod_relname}",
            class_name=cls_name,
        )
        for (model_arch, (mod_relname, cls_name)) in _VLLM_MODELS.items()
    }
)
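
A minimal usage sketch for the registry, assuming the get_supported_archs and register_model helpers exposed by _ModelRegistry in recent vLLM releases; the MyCustomForCausalLM architecture and my_package module below are hypothetical.

from vllm.model_executor.models import ModelRegistry

# List the architecture names vLLM can resolve to model classes.
print(sorted(ModelRegistry.get_supported_archs())[:5])

# Lazily register an out-of-tree architecture via a "module:class" path
# (hypothetical names); the model module is only imported when resolved.
ModelRegistry.register_model(
    "MyCustomForCausalLM", "my_package.my_model:MyCustomForCausalLM")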

__all__ module-attribute

__all__ = [
    "ModelRegistry",
    "VllmModelForPooling",
    "is_pooling_model",
    "VllmModelForTextGeneration",
    "is_text_generation_model",
    "HasInnerState",
    "has_inner_state",
    "SupportsLoRA",
    "supports_lora",
    "SupportsMultiModal",
    "supports_multimodal",
    "SupportsPP",
    "supports_pp",
    "SupportsTranscription",
    "supports_transcription",
    "SupportsV0Only",
    "supports_v0_only",
]

HasInnerState

Bases: Protocol

The interface required for all models that have inner state.

Source code in vllm/model_executor/models/interfaces.py
@runtime_checkable
class HasInnerState(Protocol):
    """The interface required for all models that has inner state."""

    has_inner_state: ClassVar[Literal[True]] = True
    """
        A flag that indicates this model has inner state.
        Models that has inner state usually need access to the scheduler_config
        for max_num_seqs, etc. True for e.g. both Mamba and Jamba.
    """

has_inner_state class-attribute

has_inner_state: Literal[True] = True

A flag that indicates this model has inner state. Models that have inner state usually need access to the scheduler_config for max_num_seqs, etc. True for e.g. both Mamba and Jamba.
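
For illustration, a hedged sketch of a model class opting into this interface; MyMambaLikeModel is a hypothetical name, but real state-space models such as Mamba and Jamba declare the flag the same way.

import torch.nn as nn

from vllm.model_executor.models.interfaces import (HasInnerState,
                                                   has_inner_state)


class MyMambaLikeModel(nn.Module, HasInnerState):
    # Inheriting HasInnerState provides `has_inner_state = True`, telling
    # the engine this model needs scheduler_config (e.g. max_num_seqs).
    pass


assert has_inner_state(MyMambaLikeModel)    # narrows the class
assert has_inner_state(MyMambaLikeModel())  # narrows an instance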

SupportsLoRA

Bases: Protocol

The interface required for all models that support LoRA.

Source code in vllm/model_executor/models/interfaces.py
@runtime_checkable
class SupportsLoRA(Protocol):
    """The interface required for all models that support LoRA."""

    supports_lora: ClassVar[Literal[True]] = True
    """
    A flag that indicates this model supports LoRA.

    Note:
        There is no need to redefine this flag if this class is in the
        MRO of your model class.
    """
    # The `embedding_module` and `embedding_padding_modules`
    # are empty by default.
    embedding_modules: ClassVar[dict[str, str]] = {}
    embedding_padding_modules: ClassVar[list[str]] = []
    packed_modules_mapping: ClassVar[dict[str, list[str]]] = {}

embedding_modules class-attribute

embedding_modules: dict[str, str] = {}

embedding_padding_modules class-attribute

embedding_padding_modules: list[str] = []

packed_modules_mapping class-attribute

packed_modules_mapping: dict[str, list[str]] = {}

supports_lora class-attribute

supports_lora: Literal[True] = True

A flag that indicates this model supports LoRA.

Note

There is no need to redefine this flag if this class is in the MRO of your model class.
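
A hedged sketch of a model declaring LoRA support; MyModelForCausalLM is a hypothetical class, and the mapping below mirrors the fused q/k/v and gate/up projections used by Llama-style models.

import torch.nn as nn

from vllm.model_executor.models.interfaces import SupportsLoRA


class MyModelForCausalLM(nn.Module, SupportsLoRA):
    # Inheriting SupportsLoRA sets `supports_lora = True`. The mapping tells
    # the LoRA manager which fused vLLM projections correspond to which
    # original HuggingFace sub-modules.
    packed_modules_mapping = {
        "qkv_proj": ["q_proj", "k_proj", "v_proj"],
        "gate_up_proj": ["gate_proj", "up_proj"],
    }
    embedding_modules = {}
    embedding_padding_modules = []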

SupportsMultiModal

Bases: Protocol

The interface required for all multi-modal models.

Source code in vllm/model_executor/models/interfaces.py
@runtime_checkable
class SupportsMultiModal(Protocol):
    """The interface required for all multi-modal models."""

    supports_multimodal: ClassVar[Literal[True]] = True
    """
    A flag that indicates this model supports multi-modal inputs.

    Note:
        There is no need to redefine this flag if this class is in the
        MRO of your model class.
    """

    @classmethod
    def get_placeholder_str(cls, modality: str, i: int) -> Optional[str]:
        """
        Get the placeholder text for the `i`th `modality` item in the prompt.
        """
        ...

    def get_multimodal_embeddings(self,
                                  **kwargs: object) -> MultiModalEmbeddings:
        """
        Returns multimodal embeddings generated from multimodal kwargs 
        to be merged with text embeddings.

        Note:
            The returned multimodal embeddings must be in the same order as
            the appearances of their corresponding multimodal data item in the
            input prompt.
        """
        ...

    def get_language_model(self) -> torch.nn.Module:
        """
        Returns the underlying language model used for text generation.

        This is typically the `torch.nn.Module` instance responsible for 
        processing the merged multimodal embeddings and producing hidden states

        Returns:
            torch.nn.Module: The core language model component.
        """
        ...

    # Only for models that support v0 chunked prefill
    # TODO(ywang96): Remove this overload once v0 is deprecated
    @overload
    def get_input_embeddings(
        self,
        input_ids: Tensor,
        multimodal_embeddings: Optional[MultiModalEmbeddings] = None,
        attn_metadata: Optional["AttentionMetadata"] = None,
    ) -> Tensor:
        ...

    @overload
    def get_input_embeddings(
        self,
        input_ids: Tensor,
        multimodal_embeddings: Optional[MultiModalEmbeddings] = None,
    ) -> Tensor:
        """
        Returns the input embeddings merged from the text embeddings from 
        input_ids and the multimodal embeddings generated from multimodal 
        kwargs.
        """
        ...

supports_multimodal class-attribute

supports_multimodal: Literal[True] = True

A flag that indicates this model supports multi-modal inputs.

Note

There is no need to redefine this flag if this class is in the MRO of your model class.

get_language_model

get_language_model() -> Module

Returns the underlying language model used for text generation.

This is typically the torch.nn.Module instance responsible for processing the merged multimodal embeddings and producing hidden states.

Returns:

torch.nn.Module: The core language model component.

Source code in vllm/model_executor/models/interfaces.py
def get_language_model(self) -> torch.nn.Module:
    """
    Returns the underlying language model used for text generation.

    This is typically the `torch.nn.Module` instance responsible for 
    processing the merged multimodal embeddings and producing hidden states

    Returns:
        torch.nn.Module: The core language model component.
    """
    ...

get_multimodal_embeddings

get_multimodal_embeddings(
    **kwargs: object,
) -> MultiModalEmbeddings

Returns multimodal embeddings generated from multimodal kwargs to be merged with text embeddings.

Note

The returned multimodal embeddings must be in the same order as the appearances of their corresponding multimodal data item in the input prompt.

Source code in vllm/model_executor/models/interfaces.py
def get_multimodal_embeddings(self,
                              **kwargs: object) -> MultiModalEmbeddings:
    """
    Returns multimodal embeddings generated from multimodal kwargs 
    to be merged with text embeddings.

    Note:
        The returned multimodal embeddings must be in the same order as
        the appearances of their corresponding multimodal data item in the
        input prompt.
    """
    ...

get_placeholder_str classmethod

get_placeholder_str(modality: str, i: int) -> Optional[str]

Get the placeholder text for the i-th modality item in the prompt.

Source code in vllm/model_executor/models/interfaces.py
@classmethod
def get_placeholder_str(cls, modality: str, i: int) -> Optional[str]:
    """
    Get the placeholder text for the `i`th `modality` item in the prompt.
    """
    ...
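
Putting the pieces together, a hedged sketch of a class satisfying this protocol; MyVLModel, its <image> placeholder string, and self.language_model are illustrative only.

from typing import Optional

import torch
import torch.nn as nn

from vllm.model_executor.models.interfaces import SupportsMultiModal


class MyVLModel(nn.Module, SupportsMultiModal):

    @classmethod
    def get_placeholder_str(cls, modality: str, i: int) -> Optional[str]:
        # One placeholder per multimodal item, in prompt order.
        return "<image>" if modality == "image" else None

    def get_multimodal_embeddings(self, **kwargs: object):
        # Encode the multimodal kwargs (e.g. pixel values) with a vision
        # tower, preserving the order of items in the prompt.
        ...

    def get_language_model(self) -> torch.nn.Module:
        return self.language_model

    def get_input_embeddings(self, input_ids, multimodal_embeddings=None):
        # Merge text embeddings with multimodal embeddings at the
        # placeholder positions, then feed the language model.
        ...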

SupportsPP

Bases: Protocol

The interface required for all models that support pipeline parallel.

Source code in vllm/model_executor/models/interfaces.py
@runtime_checkable
class SupportsPP(Protocol):
    """The interface required for all models that support pipeline parallel."""

    supports_pp: ClassVar[Literal[True]] = True
    """
    A flag that indicates this model supports pipeline parallel.

    Note:
        There is no need to redefine this flag if this class is in the
        MRO of your model class.
    """

    def make_empty_intermediate_tensors(
        self,
        batch_size: int,
        dtype: torch.dtype,
        device: torch.device,
    ) -> "IntermediateTensors":
        """Called when PP rank > 0 for profiling purposes."""
        ...

    def forward(
        self,
        *,
        intermediate_tensors: Optional["IntermediateTensors"],
    ) -> Union[Tensor, "IntermediateTensors"]:
        """
        Accept [`IntermediateTensors`][vllm.sequence.IntermediateTensors] when
        PP rank > 0.

        Return [`IntermediateTensors`][vllm.sequence.IntermediateTensors] only
        for the last PP rank.
        """
        ...

supports_pp class-attribute

supports_pp: Literal[True] = True

A flag that indicates this model supports pipeline parallel.

Note

There is no need to redefine this flag if this class is in the MRO of your model class.

forward

forward(
    *, intermediate_tensors: Optional[IntermediateTensors]
) -> Union[Tensor, IntermediateTensors]

Accept IntermediateTensors when PP rank > 0.

Return IntermediateTensors only for the last PP rank.

Source code in vllm/model_executor/models/interfaces.py
def forward(
    self,
    *,
    intermediate_tensors: Optional["IntermediateTensors"],
) -> Union[Tensor, "IntermediateTensors"]:
    """
    Accept [`IntermediateTensors`][vllm.sequence.IntermediateTensors] when
    PP rank > 0.

    Return [`IntermediateTensors`][vllm.sequence.IntermediateTensors] only
    for the last PP rank.
    """
    ...

make_empty_intermediate_tensors

make_empty_intermediate_tensors(
    batch_size: int, dtype: dtype, device: device
) -> IntermediateTensors

Called when PP rank > 0 for profiling purposes.

Source code in vllm/model_executor/models/interfaces.py
def make_empty_intermediate_tensors(
    self,
    batch_size: int,
    dtype: torch.dtype,
    device: torch.device,
) -> "IntermediateTensors":
    """Called when PP rank > 0 for profiling purposes."""
    ...
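
A hedged sketch of the two pipeline-parallel hooks; MyPPModel and its self.hidden_size attribute are hypothetical.

from typing import Optional, Union

import torch
import torch.nn as nn

from vllm.model_executor.models.interfaces import SupportsPP
from vllm.sequence import IntermediateTensors


class MyPPModel(nn.Module, SupportsPP):

    def make_empty_intermediate_tensors(
        self,
        batch_size: int,
        dtype: torch.dtype,
        device: torch.device,
    ) -> IntermediateTensors:
        # Dummy hidden states used to profile PP ranks > 0.
        return IntermediateTensors({
            "hidden_states":
            torch.zeros((batch_size, self.hidden_size),
                        dtype=dtype,
                        device=device),
        })

    def forward(
        self,
        *,
        intermediate_tensors: Optional[IntermediateTensors],
    ) -> Union[torch.Tensor, IntermediateTensors]:
        # Rank 0 starts from token embeddings; later ranks continue from
        # the hidden states received in `intermediate_tensors`.
        ...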

SupportsTranscription

Bases: Protocol

The interface required for all models that support transcription.

Source code in vllm/model_executor/models/interfaces.py
@runtime_checkable
class SupportsTranscription(Protocol):
    """The interface required for all models that support transcription."""

    supports_transcription: ClassVar[Literal[True]] = True

    @classmethod
    def get_decoder_prompt(cls, language: str, task_type: str,
                           prompt: str) -> str:
        """Get the decoder prompt for the ASR model."""
        ...

    @classmethod
    def validate_language(cls, language: str) -> bool:
        """Check if the model supports a specific ISO639_1 language."""
        ...

supports_transcription class-attribute

supports_transcription: Literal[True] = True

get_decoder_prompt classmethod

get_decoder_prompt(
    language: str, task_type: str, prompt: str
) -> str

Get the decoder prompt for the ASR model.

Source code in vllm/model_executor/models/interfaces.py
@classmethod
def get_decoder_prompt(cls, language: str, task_type: str,
                       prompt: str) -> str:
    """Get the decoder prompt for the ASR model."""
    ...

validate_language classmethod

validate_language(language: str) -> bool

Check if the model supports a specific ISO 639-1 language.

Source code in vllm/model_executor/models/interfaces.py
@classmethod
def validate_language(cls, language: str) -> bool:
    """Check if the model supports a specific ISO639_1 language."""
    ...
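
A hedged sketch of an ASR model declaring transcription support; MyASRModel and its prompt template are illustrative, not the format used by any real model in this package.

from vllm.model_executor.models.interfaces import SupportsTranscription


class MyASRModel(SupportsTranscription):

    @classmethod
    def get_decoder_prompt(cls, language: str, task_type: str,
                           prompt: str) -> str:
        # Build the decoder prefix from the requested language and task.
        return f"<|{language}|><|{task_type}|>{prompt}"

    @classmethod
    def validate_language(cls, language: str) -> bool:
        # Accept only the ISO 639-1 codes this model was trained on.
        return language in {"en", "de", "fr"}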

SupportsV0Only

Bases: Protocol

Models with this interface are not compatible with V1 vLLM.

Source code in vllm/model_executor/models/interfaces.py
@runtime_checkable
class SupportsV0Only(Protocol):
    """Models with this interface are not compatible with V1 vLLM."""

    supports_v0_only: ClassVar[Literal[True]] = True

supports_v0_only class-attribute

supports_v0_only: Literal[True] = True

VllmModelForPooling

Bases: VllmModel[T], Protocol[T]

The interface required for all pooling models in vLLM.

Source code in vllm/model_executor/models/interfaces_base.py
@runtime_checkable
class VllmModelForPooling(VllmModel[T], Protocol[T]):
    """The interface required for all pooling models in vLLM."""

    def pooler(
        self,
        hidden_states: T,
        pooling_metadata: "PoolingMetadata",
    ) -> "PoolerOutput":
        """Only called on TP rank 0."""
        ...

pooler

pooler(
    hidden_states: T, pooling_metadata: PoolingMetadata
) -> PoolerOutput

Only called on TP rank 0.

Source code in vllm/model_executor/models/interfaces_base.py
def pooler(
    self,
    hidden_states: T,
    pooling_metadata: "PoolingMetadata",
) -> "PoolerOutput":
    """Only called on TP rank 0."""
    ...

VllmModelForTextGeneration

Bases: VllmModel[T], Protocol[T]

The interface required for all generative models in vLLM.

Source code in vllm/model_executor/models/interfaces_base.py
@runtime_checkable
class VllmModelForTextGeneration(VllmModel[T], Protocol[T]):
    """The interface required for all generative models in vLLM."""

    def compute_logits(
        self,
        hidden_states: T,
        sampling_metadata: "SamplingMetadata",
    ) -> Optional[T]:
        """Return `None` if TP rank > 0."""
        ...

compute_logits

compute_logits(
    hidden_states: T, sampling_metadata: SamplingMetadata
) -> Optional[T]

Return None if TP rank > 0.

Source code in vllm/model_executor/models/interfaces_base.py
def compute_logits(
    self,
    hidden_states: T,
    sampling_metadata: "SamplingMetadata",
) -> Optional[T]:
    """Return `None` if TP rank > 0."""
    ...
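
A hedged sketch of how a generative model typically fulfills this contract; MyLM, self.lm_head, and self.logits_processor are illustrative names patterned on existing models in this package.

from typing import Optional

import torch
import torch.nn as nn


class MyLM(nn.Module):

    def compute_logits(self, hidden_states: torch.Tensor,
                       sampling_metadata) -> Optional[torch.Tensor]:
        # Project hidden states onto the vocabulary; the logits processor
        # returns None on TP ranks > 0, matching the contract above.
        return self.logits_processor(self.lm_head, hidden_states,
                                     sampling_metadata)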

has_inner_state

has_inner_state(model: object) -> TypeIs[HasInnerState]
has_inner_state(
    model: type[object],
) -> TypeIs[type[HasInnerState]]
has_inner_state(
    model: Union[type[object], object],
) -> Union[
    TypeIs[type[HasInnerState]], TypeIs[HasInnerState]
]
Source code in vllm/model_executor/models/interfaces.py
def has_inner_state(
    model: Union[type[object], object]
) -> Union[TypeIs[type[HasInnerState]], TypeIs[HasInnerState]]:
    if isinstance(model, type):
        return isinstance(model, _HasInnerStateType)

    return isinstance(model, HasInnerState)

is_pooling_model

is_pooling_model(
    model: type[object],
) -> TypeIs[type[VllmModelForPooling]]
is_pooling_model(
    model: object,
) -> TypeIs[VllmModelForPooling]
Source code in vllm/model_executor/models/interfaces_base.py
def is_pooling_model(
    model: Union[type[object], object],
) -> Union[TypeIs[type[VllmModelForPooling]], TypeIs[VllmModelForPooling]]:
    if not is_vllm_model(model):
        return False

    if isinstance(model, type):
        return isinstance(model, VllmModelForPooling)

    return isinstance(model, VllmModelForPooling)

is_text_generation_model

is_text_generation_model(
    model: type[object],
) -> TypeIs[type[VllmModelForTextGeneration]]
is_text_generation_model(
    model: object,
) -> TypeIs[VllmModelForTextGeneration]
Source code in vllm/model_executor/models/interfaces_base.py
def is_text_generation_model(
    model: Union[type[object], object],
) -> Union[TypeIs[type[VllmModelForTextGeneration]],
           TypeIs[VllmModelForTextGeneration]]:
    if not is_vllm_model(model):
        return False

    if isinstance(model, type):
        return isinstance(model, VllmModelForTextGeneration)

    return isinstance(model, VllmModelForTextGeneration)

supports_lora

supports_lora(
    model: type[object],
) -> TypeIs[type[SupportsLoRA]]
supports_lora(model: object) -> TypeIs[SupportsLoRA]
supports_lora(
    model: Union[type[object], object],
) -> Union[
    TypeIs[type[SupportsLoRA]], TypeIs[SupportsLoRA]
]
Source code in vllm/model_executor/models/interfaces.py
def supports_lora(
    model: Union[type[object], object],
) -> Union[TypeIs[type[SupportsLoRA]], TypeIs[SupportsLoRA]]:
    result = _supports_lora(model)

    if not result:
        lora_attrs = (
            "packed_modules_mapping",
            "embedding_modules",
            "embedding_padding_modules",
        )
        missing_attrs = tuple(attr for attr in lora_attrs
                              if not hasattr(model, attr))

        if getattr(model, "supports_lora", False):
            if missing_attrs:
                logger.warning(
                    "The model (%s) sets `supports_lora=True`, "
                    "but is missing LoRA-specific attributes: %s",
                    model,
                    missing_attrs,
                )
        else:
            if not missing_attrs:
                logger.warning(
                    "The model (%s) contains all LoRA-specific attributes, "
                    "but does not set `supports_lora=True`.", model)

    return result

supports_multimodal

supports_multimodal(
    model: type[object],
) -> TypeIs[type[SupportsMultiModal]]
supports_multimodal(
    model: object,
) -> TypeIs[SupportsMultiModal]
supports_multimodal(
    model: Union[type[object], object],
) -> Union[
    TypeIs[type[SupportsMultiModal]],
    TypeIs[SupportsMultiModal],
]
Source code in vllm/model_executor/models/interfaces.py
def supports_multimodal(
    model: Union[type[object], object],
) -> Union[TypeIs[type[SupportsMultiModal]], TypeIs[SupportsMultiModal]]:
    if isinstance(model, type):
        return isinstance(model, _SupportsMultiModalType)

    return isinstance(model, SupportsMultiModal)

supports_pp

supports_pp(
    model: type[object],
) -> TypeIs[type[SupportsPP]]
supports_pp(model: object) -> TypeIs[SupportsPP]
supports_pp(
    model: Union[type[object], object],
) -> Union[
    bool, TypeIs[type[SupportsPP]], TypeIs[SupportsPP]
]
Source code in vllm/model_executor/models/interfaces.py
def supports_pp(
    model: Union[type[object], object],
) -> Union[bool, TypeIs[type[SupportsPP]], TypeIs[SupportsPP]]:
    supports_attributes = _supports_pp_attributes(model)
    supports_inspect = _supports_pp_inspect(model)

    if supports_attributes and not supports_inspect:
        logger.warning(
            "The model (%s) sets `supports_pp=True`, but does not accept "
            "`intermediate_tensors` in its `forward` method", model)

    if not supports_attributes:
        pp_attrs = ("make_empty_intermediate_tensors", )
        missing_attrs = tuple(attr for attr in pp_attrs
                              if not hasattr(model, attr))

        if getattr(model, "supports_pp", False):
            if missing_attrs:
                logger.warning(
                    "The model (%s) sets `supports_pp=True`, "
                    "but is missing PP-specific attributes: %s",
                    model,
                    missing_attrs,
                )
        else:
            if not missing_attrs:
                logger.warning(
                    "The model (%s) contains all PP-specific attributes, "
                    "but does not set `supports_pp=True`.", model)

    return supports_attributes and supports_inspect

supports_transcription

supports_transcription(
    model: type[object],
) -> TypeIs[type[SupportsTranscription]]
supports_transcription(
    model: object,
) -> TypeIs[SupportsTranscription]
supports_transcription(
    model: Union[type[object], object],
) -> Union[
    TypeIs[type[SupportsTranscription]],
    TypeIs[SupportsTranscription],
]
Source code in vllm/model_executor/models/interfaces.py
def supports_transcription(
    model: Union[type[object], object],
) -> Union[TypeIs[type[SupportsTranscription]], TypeIs[SupportsTranscription]]:
    if isinstance(model, type):
        return isinstance(model, SupportsTranscription)

    return isinstance(model, SupportsTranscription)

supports_v0_only

supports_v0_only(
    model: type[object],
) -> TypeIs[type[SupportsV0Only]]
supports_v0_only(model: object) -> TypeIs[SupportsV0Only]
supports_v0_only(
    model: Union[type[object], object],
) -> Union[
    TypeIs[type[SupportsV0Only]], TypeIs[SupportsV0Only]
]
Source code in vllm/model_executor/models/interfaces.py
def supports_v0_only(
    model: Union[type[object], object],
) -> Union[TypeIs[type[SupportsV0Only]], TypeIs[SupportsV0Only]]:
    if isinstance(model, type):
        return isinstance(model, SupportsV0Only)

    return isinstance(model, SupportsV0Only)
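
Finally, a hedged usage sketch of these type-guard helpers; they narrow both classes and instances, and LlamaForCausalLM is used here purely as an example of a text-only model that supports LoRA and pipeline parallelism.

from vllm.model_executor.models.interfaces import (supports_lora,
                                                   supports_multimodal,
                                                   supports_pp)
from vllm.model_executor.models.llama import LlamaForCausalLM

if supports_lora(LlamaForCausalLM) and supports_pp(LlamaForCausalLM):
    # Static checkers narrow the class to type[SupportsLoRA] /
    # type[SupportsPP] inside this branch.
    print("LoRA and pipeline parallel are available")

if not supports_multimodal(LlamaForCausalLM):
    print("text-only model")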