vllm.model_executor.models.vision
VisionEncoderInfo
VisionLanguageConfig
get_vision_encoder_info
get_vision_encoder_info(
    hf_config: VisionLanguageConfig,
) -> VisionEncoderInfo
get_vit_attn_backend
Get the available attention backend for the Vision Transformer.
resolve_visual_encoder_outputs
resolve_visual_encoder_outputs(
    encoder_outputs: Union[Tensor, list[Tensor]],
    feature_sample_layers: Optional[list[int]],
    post_layer_norm: Optional[LayerNorm],
    max_possible_layers: int,
) -> Tensor
Given the outputs of a visual encoder module, which may be either the output of the last layer or a list of hidden states to be stacked, handle post normalization and resolve them into a single output tensor.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
encoder_outputs | Union[Tensor, list[Tensor]] | Output of encoder's last layer or all hidden states. | required |
feature_sample_layers | Optional[list[int]] | Optional layer indices to grab from the encoder outputs; if provided, encoder outputs must be a list. | required |
post_layer_norm | Optional[LayerNorm] | Post norm to apply to the output of the encoder. | required |
max_possible_layers | int | Total layers in the fully loaded visual encoder. | required |
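To make the behavior above concrete, the snippet below is a minimal illustrative sketch of the resolution logic, not vLLM's actual implementation: when no sample layers are given, only the optional post norm is applied to the last-layer output; otherwise the requested hidden states are selected (negative indices are offset against max_possible_layers, since the loaded encoder may have fewer layers), the post norm is applied if the final layer is among them, and the selected states are combined into a single tensor. The function name and the concatenation along the feature dimension are assumptions made for illustration.

```python
from typing import Optional, Union

import torch
from torch import nn


def resolve_outputs_sketch(
    encoder_outputs: Union[torch.Tensor, list[torch.Tensor]],
    feature_sample_layers: Optional[list[int]],
    post_layer_norm: Optional[nn.LayerNorm],
    max_possible_layers: int,
) -> torch.Tensor:
    # Illustrative sketch only; the real helper lives in
    # vllm/model_executor/models/vision.py.
    if feature_sample_layers is None:
        # Single last-layer output: just apply the post norm if present.
        if post_layer_norm is not None:
            return post_layer_norm(encoder_outputs)
        return encoder_outputs

    # Negative indices refer to the fully loaded encoder, so offset them
    # when only a subset of its layers produced hidden states.
    offset = max_possible_layers - len(encoder_outputs)
    selected = [
        encoder_outputs[idx] if idx >= 0 else encoder_outputs[idx + offset]
        for idx in feature_sample_layers
    ]

    # Assumption: the post norm is applied only when the final layer's
    # hidden state is among the sampled layers.
    if post_layer_norm is not None and feature_sample_layers[-1] in (
        max_possible_layers - 1,
        -1,
    ):
        selected[-1] = post_layer_norm(selected[-1])

    # Combine the sampled hidden states into one output tensor.
    return torch.cat(selected, dim=-1)
```

In this picture, a model that fuses features from several intermediate layers would pass the full list of hidden states together with the desired indices in feature_sample_layers, while a model that only needs the final representation would pass the last-layer output and None.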