vllm.model_executor.models
Modules:
| Name | Description |
|---|---|
| adapters | |
| aimv2 | |
| arctic | Inference-only Snowflake Arctic model. |
| aria | |
| aya_vision | |
| baichuan | Inference-only BaiChuan model compatible with HuggingFace weights. |
| bamba | Inference-only Bamba model. |
| bart | PyTorch BART model. |
| bert | |
| bert_with_rope | |
| blip | Minimal implementation of BlipVisionModel intended to be only used within a vision language model. |
| blip2 | |
| bloom | Inference-only BLOOM model compatible with HuggingFace weights. |
| chameleon | |
| chatglm | Inference-only ChatGLM model compatible with THUDM weights. |
| clip | Minimal implementation of CLIPVisionModel intended to be only used within a vision language model. |
| commandr | PyTorch Cohere model. |
| config | |
| constant_size_cache | |
| dbrx | |
| deepseek | Inference-only Deepseek model. |
| deepseek_mtp | |
| deepseek_v2 | Inference-only DeepseekV2/DeepseekV3 model. |
| deepseek_vl2 | Inference-only Deepseek-VL2 model compatible with HuggingFace weights. |
| dots1 | Inference-only dots1 model. |
| eagle | |
| ernie45 | Inference-only Ernie model compatible with HuggingFace weights. |
| ernie45_moe | Inference-only ErnieMoE model compatible with HuggingFace weights. |
| exaone | Inference-only Exaone model compatible with HuggingFace weights. |
| fairseq2_llama | Llama model for fairseq2 weights. |
| falcon | PyTorch Falcon model. |
| falcon_h1 | Inference-only FalconH1 model. |
| florence2 | |
| fuyu | PyTorch Fuyu model. |
| gemma | Inference-only Gemma model compatible with HuggingFace weights. |
| gemma2 | |
| gemma3 | |
| gemma3_mm | |
| gemma3n | |
| glm | Inference-only HF format GLM-4 model compatible with THUDM weights. |
| glm4 | Inference-only GLM-4-0414 model compatible with HuggingFace weights. |
| glm4_1v | Inference-only GLM-4V model compatible with HuggingFace weights. |
| glm4v | Inference-only CogAgent model compatible with THUDM weights. |
| gpt2 | Inference-only GPT-2 model compatible with HuggingFace weights. |
| gpt_bigcode | Inference-only GPTBigCode model compatible with HuggingFace weights. |
| gpt_j | Inference-only GPT-J model compatible with HuggingFace weights. |
| gpt_neox | Inference-only GPT-NeoX model compatible with HuggingFace weights. |
| granite | Inference-only IBM Granite model compatible with HuggingFace weights. |
| granite_speech | Inference-only IBM Granite speech model. |
| granitemoe | Inference-only GraniteMoe model. |
| granitemoehybrid | Inference-only GraniteMoeHybrid model. |
| granitemoeshared | Inference-only GraniteMoeShared model. |
| gritlm | |
| grok1 | Inference-only Grok1 model. |
| h2ovl | |
| hunyuan_v1_moe | Inference-only HunYuan model compatible with HuggingFace weights. |
| idefics2_vision_model | PyTorch Idefics2 model. |
| idefics3 | Inference-only Idefics3 model compatible with HuggingFace weights. |
| interfaces | |
| interfaces_base | |
| intern_vit | |
| internlm2 | |
| internlm2_ve | |
| internvl | |
| jais | Inference-only Jais model compatible with HuggingFace weights. |
| jamba | Inference-only Jamba model. |
| keye | |
| kimi_vl | |
| llama | Inference-only LLaMA model compatible with HuggingFace weights. |
| llama4 | Inference-only LLaMA model compatible with HuggingFace weights. |
| llama_eagle | |
| llama_eagle3 | |
| llava | |
| llava_next | |
| llava_next_video | |
| llava_onevision | |
| mamba | PyTorch MAMBA model. |
| mamba2 | PyTorch MAMBA2 model. |
| mamba_cache | |
| medusa | |
| mimo | Inference-only MiMo model compatible with HuggingFace weights. |
| mimo_mtp | Inference-only MiMo-MTP model. |
| minicpm | Inference-only MiniCPM model compatible with HuggingFace weights. |
| minicpm3 | Inference-only MiniCPM3 model compatible with HuggingFace weights. |
| minicpm_eagle | Inference-only EagleMiniCPM model compatible with HuggingFace weights. |
| minicpmo | Inference-only MiniCPM-O model compatible with HuggingFace weights. |
| minicpmv | Inference-only MiniCPM-V model compatible with HuggingFace weights. |
| minimax_cache | |
| minimax_text_01 | Inference-only MiniMaxText01 model. |
| minimax_vl_01 | |
| mistral3 | |
| mixtral | Inference-only Mixtral model. |
| mixtral_quant | Inference-only Mixtral model. |
| mllama | PyTorch Mllama model. |
| mllama4 | |
| mlp_speculator | |
| modernbert | |
| module_mapping | |
| molmo | |
| moonvit | |
| mpt | |
| nemotron | Inference-only Nemotron model compatible with HuggingFace weights. |
| nemotron_h | Inference-only NemotronH model. |
| nemotron_nas | Inference-only Deci model compatible with HuggingFace weights. |
| nvlm_d | |
| olmo | Inference-only OLMo model compatible with HuggingFace weights. |
| olmo2 | Inference-only OLMo2 model compatible with HuggingFace weights. |
| olmoe | Inference-only OLMoE model compatible with HuggingFace weights. |
| opt | Inference-only OPT model compatible with HuggingFace weights. |
| orion | Inference-only Orion-14B model compatible with HuggingFace weights. |
| ovis | PyTorch Ovis model. |
| paligemma | |
| persimmon | Inference-only Persimmon model compatible with HuggingFace weights. |
| phi | Inference-only Phi-1.5 model compatible with HuggingFace weights. |
| phi3 | Inference-only Phi3 model; code inherits from Llama.py. |
| phi3_small | |
| phi3v | |
| phi4mm | |
| phi4mm_audio | |
| phi4mm_utils | |
| phimoe | Inference-only PhiMoE model. |
| pixtral | |
| plamo2 | Inference-only PLaMo2 model. |
| prithvi_geospatial_mae | Inference-only IBM/NASA Prithvi Geospatial model. |
| qwen | Inference-only QWen model compatible with HuggingFace weights. |
| qwen2 | Inference-only Qwen2 model compatible with HuggingFace weights. |
| qwen2_5_omni_thinker | Inference-only Qwen2.5-Omni model (thinker part). |
| qwen2_5_vl | Inference-only Qwen2.5-VL model compatible with HuggingFace weights. |
| qwen2_audio | Inference-only Qwen2-Audio model compatible with HuggingFace weights. |
| qwen2_moe | Inference-only Qwen2MoE model compatible with HuggingFace weights. |
| qwen2_rm | Inference-only Qwen2-RM model compatible with HuggingFace weights. |
| qwen2_vl | Inference-only Qwen2-VL model compatible with HuggingFace weights. |
| qwen3 | Inference-only Qwen3 model compatible with HuggingFace weights. |
| qwen3_moe | Inference-only Qwen3MoE model compatible with HuggingFace weights. |
| qwen_vl | Inference-only Qwen-VL model compatible with HuggingFace weights. |
| registry | Whenever you add an architecture to this page, please also update |
| roberta | |
| siglip | Implementation of SiglipVisionModel intended to be only used within a vision language model. |
| skyworkr1v | |
| smolvlm | |
| solar | Inference-only Solar model compatible with HuggingFace weights. |
| stablelm | Inference-only StableLM (https://github.com/Stability-AI/StableLM) |
| starcoder2 | PyTorch Starcoder2 model. |
| tarsier | |
| telechat2 | |
| teleflm | |
| transformers | Wrapper around `transformers` models. |
| ultravox | PyTorch Ultravox model. |
| utils | |
| vision | |
| whisper | |
| zamba2 | PyTorch Zamba2 model implementation for vLLM. |
ModelRegistry (module-attribute)
```python
ModelRegistry = _ModelRegistry(
    {
        model_arch: _LazyRegisteredModel(
            module_name=f"vllm.model_executor.models.{mod_relname}",
            class_name=cls_name,
        )
        for model_arch, (mod_relname, cls_name) in _VLLM_MODELS.items()
    }
)
```
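The registry maps architecture names (as they appear in a Hugging Face config's `architectures` field) to lazily imported model classes. A minimal usage sketch, assuming the `get_supported_archs()` and `register_model()` methods documented for out-of-tree models (`MyLlavaForConditionalGeneration` and `my_package.my_llava` are hypothetical):

```python
from vllm import ModelRegistry  # also re-exported from vllm.model_executor.models

# List every architecture name the registry can resolve.
print(sorted(ModelRegistry.get_supported_archs()))

# Register an out-of-tree architecture. Passing a "module:ClassName" string
# keeps the import lazy, matching how the built-in models are registered.
ModelRegistry.register_model(
    "MyLlavaForConditionalGeneration",
    "my_package.my_llava:MyLlavaForConditionalGeneration",
)
```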
__all__ (module-attribute)
```python
__all__ = [
    "ModelRegistry",
    "VllmModelForPooling",
    "is_pooling_model",
    "VllmModelForTextGeneration",
    "is_text_generation_model",
    "HasInnerState",
    "has_inner_state",
    "SupportsLoRA",
    "supports_lora",
    "SupportsMultiModal",
    "supports_multimodal",
    "SupportsPP",
    "supports_pp",
    "SupportsTranscription",
    "supports_transcription",
    "SupportsV0Only",
    "supports_v0_only",
]
```
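Everything in `__all__` can be imported directly from `vllm.model_executor.models`. A short sketch that resolves a registered architecture and inspects it with the interface helpers (assuming `ModelRegistry.resolve_model_cls()`, which returns the model class together with the matched architecture name):

```python
from vllm.model_executor.models import (
    ModelRegistry,
    supports_multimodal,
    supports_pp,
)

# The supports_* helpers accept classes as well as instances, so models can be
# inspected before they are ever instantiated.
model_cls, arch = ModelRegistry.resolve_model_cls("LlamaForCausalLM")
print(arch, supports_multimodal(model_cls), supports_pp(model_cls))
```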
HasInnerState
Bases: Protocol
The interface required for all models that have inner state.
Source code in vllm/model_executor/models/interfaces.py
SupportsLoRA
Bases: Protocol
The interface required for all models that support LoRA.
Source code in vllm/model_executor/models/interfaces.py
SupportsMultiModal
Bases: Protocol
The interface required for all multi-modal models.
Source code in vllm/model_executor/models/interfaces.py
supports_multimodal (class-attribute)
```python
supports_multimodal: Literal[True] = True
```
A flag that indicates this model supports multi-modal inputs.
Note
There is no need to redefine this flag if this class is in the MRO of your model class.
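In practice the flag is picked up by listing the interface as a base class and implementing its methods. A minimal, hypothetical sketch (`ToyVLModel` and its internals are illustrative only, not a real vLLM model):

```python
import torch
from torch import nn

from vllm.model_executor.models.interfaces import SupportsMultiModal


class ToyVLModel(nn.Module, SupportsMultiModal):
    """Toy model: inheriting SupportsMultiModal places
    supports_multimodal = True into the MRO automatically."""

    def __init__(self) -> None:
        super().__init__()
        self.language_model = nn.Linear(16, 16)  # stand-in for a real LM

    def get_language_model(self) -> nn.Module:
        return self.language_model

    def get_multimodal_embeddings(self, **kwargs: object):
        # One embedding tensor per multimodal item, in prompt order.
        pixel_values = kwargs.get("pixel_values")
        if pixel_values is None:
            return []
        return [torch.zeros(4, 16)]


assert ToyVLModel.supports_multimodal is True
```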
get_language_model
```python
get_language_model() -> Module
```
Returns the underlying language model used for text generation. This is typically the torch.nn.Module instance responsible for processing the merged multimodal embeddings and producing hidden states.

Returns:

| Type | Description |
|---|---|
| Module | torch.nn.Module: The core language model component. |
Source code in vllm/model_executor/models/interfaces.py
get_multimodal_embeddings
```python
get_multimodal_embeddings(
    **kwargs: object,
) -> MultiModalEmbeddings
```
Returns multimodal embeddings generated from multimodal kwargs to be merged with text embeddings.
Note
The returned multimodal embeddings must be in the same order as the appearances of their corresponding multimodal data item in the input prompt.
Source code in vllm/model_executor/models/interfaces.py
get_placeholder_str (classmethod)
Get the placeholder text for the i-th modality item in the prompt.
SupportsPP
Bases: Protocol
The interface required for all models that support pipeline parallel.
Source code in vllm/model_executor/models/interfaces.py
supports_pp (class-attribute)
```python
supports_pp: Literal[True] = True
```
A flag that indicates this model supports pipeline parallel.
Note
There is no need to redefine this flag if this class is in the MRO of your model class.
forward
```python
forward(
    *, intermediate_tensors: Optional[IntermediateTensors]
) -> Union[Tensor, IntermediateTensors]
```
Accept IntermediateTensors when PP rank > 0. Return IntermediateTensors only for the last PP rank.
Source code in vllm/model_executor/models/interfaces.py
make_empty_intermediate_tensors
```python
make_empty_intermediate_tensors(
    batch_size: int, dtype: dtype, device: device
) -> IntermediateTensors
```
Called when PP rank > 0 for profiling purposes.
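Putting the two methods together, a PP-capable model's forward typically threads IntermediateTensors between ranks. A simplified sketch of the control flow (method body only; `self.embed_input`, `self.layers`, and `self.is_last_pp_rank` are illustrative placeholders):

```python
from typing import Optional, Union

import torch

from vllm.sequence import IntermediateTensors


def forward(
    self,
    *,
    intermediate_tensors: Optional[IntermediateTensors],
) -> Union[torch.Tensor, IntermediateTensors]:
    if intermediate_tensors is not None:
        # PP rank > 0: resume from the activations sent by the previous rank.
        hidden_states = intermediate_tensors["hidden_states"]
    else:
        # First rank: start from locally computed input embeddings.
        hidden_states = self.embed_input()

    hidden_states = self.layers(hidden_states)

    if not self.is_last_pp_rank:
        # Intermediate ranks hand their activations to the next rank.
        return IntermediateTensors({"hidden_states": hidden_states})
    # Only the last rank returns the final hidden states.
    return hidden_states
```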
SupportsTranscription
Bases: Protocol
The interface required for all models that support transcription.
Source code in vllm/model_executor/models/interfaces.py
SupportsV0Only
Bases: Protocol
Models with this interface are not compatible with V1 vLLM.
Source code in vllm/model_executor/models/interfaces.py
VllmModelForPooling
Bases: VllmModel[T], Protocol[T]
The interface required for all pooling models in vLLM.
Source code in vllm/model_executor/models/interfaces_base.py
pooler
```python
pooler(
    hidden_states: T, pooling_metadata: PoolingMetadata
) -> PoolerOutput
```
VllmModelForTextGeneration
has_inner_state
```python
has_inner_state(model: object) -> TypeIs[HasInnerState]
has_inner_state(
    model: type[object],
) -> TypeIs[type[HasInnerState]]
has_inner_state(
    model: Union[type[object], object],
) -> Union[
    TypeIs[type[HasInnerState]], TypeIs[HasInnerState]
]
```
Source code in vllm/model_executor/models/interfaces.py
is_pooling_model
```python
is_pooling_model(
    model: type[object],
) -> TypeIs[type[VllmModelForPooling]]
is_pooling_model(
    model: object,
) -> TypeIs[VllmModelForPooling]
is_pooling_model(
    model: Union[type[object], object],
) -> Union[
    TypeIs[type[VllmModelForPooling]],
    TypeIs[VllmModelForPooling],
]
```
Source code in vllm/model_executor/models/interfaces_base.py
is_text_generation_model
```python
is_text_generation_model(
    model: type[object],
) -> TypeIs[type[VllmModelForTextGeneration]]
is_text_generation_model(
    model: object,
) -> TypeIs[VllmModelForTextGeneration]
is_text_generation_model(
    model: Union[type[object], object],
) -> Union[
    TypeIs[type[VllmModelForTextGeneration]],
    TypeIs[VllmModelForTextGeneration],
]
```
Source code in vllm/model_executor/models/interfaces_base.py
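Since these helpers are TypeIs-based guards, a static type checker narrows the argument inside each branch (to the class form for classes and the instance form for instances). A small usage sketch:

```python
from vllm.model_executor.models import (
    is_pooling_model,
    is_text_generation_model,
)


def describe(model_cls: type) -> str:
    # Within each branch, `model_cls` is narrowed to the matching interface
    # type, e.g. type[VllmModelForTextGeneration].
    if is_text_generation_model(model_cls):
        return "text generation model"
    if is_pooling_model(model_cls):
        return "pooling model"
    return "unsupported model type"
```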
supports_lora
```python
supports_lora(
    model: type[object],
) -> TypeIs[type[SupportsLoRA]]
supports_lora(model: object) -> TypeIs[SupportsLoRA]
supports_lora(
    model: Union[type[object], object],
) -> Union[
    TypeIs[type[SupportsLoRA]], TypeIs[SupportsLoRA]
]
```
Source code in vllm/model_executor/models/interfaces.py
supports_multimodal
```python
supports_multimodal(
    model: type[object],
) -> TypeIs[type[SupportsMultiModal]]
supports_multimodal(
    model: object,
) -> TypeIs[SupportsMultiModal]
supports_multimodal(
    model: Union[type[object], object],
) -> Union[
    TypeIs[type[SupportsMultiModal]],
    TypeIs[SupportsMultiModal],
]
```
Source code in vllm/model_executor/models/interfaces.py
supports_pp
```python
supports_pp(
    model: type[object],
) -> TypeIs[type[SupportsPP]]
supports_pp(model: object) -> TypeIs[SupportsPP]
supports_pp(
    model: Union[type[object], object],
) -> Union[
    bool, TypeIs[type[SupportsPP]], TypeIs[SupportsPP]
]
```
Source code in vllm/model_executor/models/interfaces.py
supports_transcription
```python
supports_transcription(
    model: type[object],
) -> TypeIs[type[SupportsTranscription]]
supports_transcription(
    model: object,
) -> TypeIs[SupportsTranscription]
supports_transcription(
    model: Union[type[object], object],
) -> Union[
    TypeIs[type[SupportsTranscription]],
    TypeIs[SupportsTranscription],
]
```
Source code in vllm/model_executor/models/interfaces.py
supports_v0_only
```python
supports_v0_only(
    model: type[object],
) -> TypeIs[type[SupportsV0Only]]
supports_v0_only(model: object) -> TypeIs[SupportsV0Only]
supports_v0_only(
    model: Union[type[object], object],
) -> Union[
    TypeIs[type[SupportsV0Only]], TypeIs[SupportsV0Only]
]
```