vllm.model_executor.models.clip
Minimal implementation of CLIPVisionModel intended to be used only within a vision language model.
CLIPAttention
Bases: Module
Multi-headed attention from the 'Attention Is All You Need' paper.
out_proj (instance-attribute)
out_proj = RowParallelLinear(
    input_size=embed_dim,
    output_size=embed_dim,
    quant_config=quant_config,
    prefix=f"{prefix}.out_proj",
)
qkv_proj (instance-attribute)
qkv_proj = QKVParallelLinear(
    hidden_size=embed_dim,
    head_size=head_dim,
    total_num_heads=num_heads,
    quant_config=quant_config,
    prefix=f"{prefix}.qkv_proj",
)
__init__
__init__(
    config: CLIPVisionConfig,
    quant_config: Optional[QuantizationConfig] = None,
    prefix: str = "",
)
forward
forward(hidden_states: Tensor)
Input shape: Batch x Time x Channel
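The fused qkv_proj plus out_proj pattern above can be sketched on a single device with plain PyTorch. This is a hedged stand-in, not the vLLM implementation: nn.Linear replaces QKVParallelLinear/RowParallelLinear, F.scaled_dot_product_attention replaces vLLM's attention path, and the 768/12 sizes are illustrative defaults.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleCLIPAttention(nn.Module):
    def __init__(self, embed_dim: int = 768, num_heads: int = 12) -> None:
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = embed_dim // num_heads
        # Fused q/k/v projection, analogous to qkv_proj above.
        self.qkv_proj = nn.Linear(embed_dim, 3 * embed_dim)
        # Output projection, analogous to out_proj above.
        self.out_proj = nn.Linear(embed_dim, embed_dim)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        b, t, c = hidden_states.shape                     # Batch x Time x Channel
        qkv = self.qkv_proj(hidden_states)                # [B, T, 3C]
        q, k, v = qkv.chunk(3, dim=-1)                    # each [B, T, C]
        # Split channels into heads: [B, T, C] -> [B, heads, T, head_dim].
        q, k, v = (x.view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
                   for x in (q, k, v))
        out = F.scaled_dot_product_attention(q, k, v)     # no causal mask for vision
        out = out.transpose(1, 2).reshape(b, t, c)
        return self.out_proj(out)


x = torch.randn(2, 50, 768)                               # Batch x Time x Channel
print(SimpleCLIPAttention()(x).shape)                     # torch.Size([2, 50, 768])
```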
CLIPEncoder
Bases: Module
Transformer encoder consisting of config.num_hidden_layers self attention layers. Each layer is a CLIPEncoderLayer.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| config | CLIPVisionConfig | CLIPConfig | required |
layers (instance-attribute)
layers = ModuleList(
    [
        CLIPEncoderLayer(
            config=config,
            quant_config=quant_config,
            prefix=f"{prefix}.layers.{layer_idx}",
        )
        for layer_idx in range(num_hidden_layers)
    ]
)
__init__
__init__(
    config: CLIPVisionConfig,
    quant_config: Optional[QuantizationConfig] = None,
    num_hidden_layers_override: Optional[int] = None,
    prefix: str = "",
) -> None
forward
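A minimal sketch of the stacking logic documented above: build num_hidden_layers layers (or an overridden count, which is how a caller can stop at an earlier hidden state) and apply them in order. The layer_factory argument is a hypothetical stand-in for constructing CLIPEncoderLayer instances.

```python
from typing import Callable, Optional

import torch
import torch.nn as nn


class SimpleCLIPEncoder(nn.Module):
    def __init__(self, layer_factory: Callable[[], nn.Module],
                 num_hidden_layers: int,
                 num_hidden_layers_override: Optional[int] = None) -> None:
        super().__init__()
        # Mirror the override semantics: the override, when given, wins.
        n = (num_hidden_layers_override
             if num_hidden_layers_override is not None else num_hidden_layers)
        self.layers = nn.ModuleList(layer_factory() for _ in range(n))

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        for layer in self.layers:
            hidden_states = layer(hidden_states)
        return hidden_states


# Identity layers stand in for CLIPEncoderLayer just to show the wiring.
encoder = SimpleCLIPEncoder(nn.Identity, num_hidden_layers=12,
                            num_hidden_layers_override=6)
print(len(encoder.layers))  # 6
```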
CLIPEncoderInfo
Bases: VisionEncoderInfo[CLIPVisionConfig]
CLIPEncoderLayer
Bases: Module
mlp (instance-attribute)
mlp = CLIPMLP(
    config,
    quant_config=quant_config,
    prefix=f"{prefix}.mlp",
)
self_attn (instance-attribute)
self_attn = CLIPAttention(
    config,
    quant_config=quant_config,
    prefix=f"{prefix}.self_attn",
)
__init__
__init__(
    config: CLIPVisionConfig,
    quant_config: Optional[QuantizationConfig] = None,
    prefix: str = "",
) -> None
forward
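Each encoder layer pairs the self_attn and mlp sub-modules shown above with two LayerNorms in a pre-norm residual layout. This is a sketch of the standard CLIP arrangement, not the vLLM source; the sub-modules are passed in rather than re-implemented.

```python
import torch
import torch.nn as nn


class SimpleCLIPEncoderLayer(nn.Module):
    def __init__(self, self_attn: nn.Module, mlp: nn.Module,
                 embed_dim: int = 768, eps: float = 1e-5) -> None:
        super().__init__()
        self.self_attn = self_attn
        self.mlp = mlp
        self.layer_norm1 = nn.LayerNorm(embed_dim, eps=eps)
        self.layer_norm2 = nn.LayerNorm(embed_dim, eps=eps)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Attention block: normalize, attend, add back as a residual.
        hidden_states = hidden_states + self.self_attn(self.layer_norm1(hidden_states))
        # MLP block: normalize, transform, add back as a residual.
        hidden_states = hidden_states + self.mlp(self.layer_norm2(hidden_states))
        return hidden_states


# Identity sub-modules just to exercise the residual wiring.
layer = SimpleCLIPEncoderLayer(nn.Identity(), nn.Identity())
print(layer(torch.randn(2, 50, 768)).shape)  # torch.Size([2, 50, 768])
```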
CLIPMLP
Bases: Module
fc1 (instance-attribute)
fc1 = ColumnParallelLinear(
    hidden_size,
    intermediate_size,
    bias=True,
    quant_config=quant_config,
    prefix=f"{prefix}.fc1",
)
fc2 (instance-attribute)
fc2 = RowParallelLinear(
    intermediate_size,
    hidden_size,
    bias=True,
    quant_config=quant_config,
    prefix=f"{prefix}.fc2",
)
__init__
__init__(
    config: CLIPVisionConfig,
    quant_config: Optional[QuantizationConfig] = None,
    prefix: str = "",
) -> None
forward
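On a single device the fc1/fc2 pair above reduces to two nn.Linear layers with an activation in between. The QuickGELU activation used here is the usual CLIP default and is an assumption, since the activation is not shown in this rendering.

```python
import torch
import torch.nn as nn


class SimpleCLIPMLP(nn.Module):
    def __init__(self, hidden_size: int = 768, intermediate_size: int = 3072) -> None:
        super().__init__()
        # Plain linears stand in for ColumnParallelLinear / RowParallelLinear.
        self.fc1 = nn.Linear(hidden_size, intermediate_size, bias=True)
        self.fc2 = nn.Linear(intermediate_size, hidden_size, bias=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.fc1(x)
        x = x * torch.sigmoid(1.702 * x)  # QuickGELU (assumed activation)
        return self.fc2(x)


print(SimpleCLIPMLP()(torch.randn(2, 50, 768)).shape)  # torch.Size([2, 50, 768])
```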
CLIPVisionEmbeddings
Bases: Module
patch_embedding (instance-attribute)
patch_embedding = Conv2d(
    in_channels=num_channels,
    out_channels=embed_dim,
    kernel_size=patch_size,
    stride=patch_size,
    bias=False,
)
__init__
forward
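The Conv2d above, with kernel_size equal to stride, slices the image into non-overlapping patches and embeds each one. The sketch below shows that step together with a class token and position embeddings; those two extras follow the standard CLIP layout and are assumptions, as they are not shown in this rendering.

```python
import torch
import torch.nn as nn


class SimplePatchEmbeddings(nn.Module):
    def __init__(self, image_size: int = 224, patch_size: int = 32,
                 num_channels: int = 3, embed_dim: int = 768) -> None:
        super().__init__()
        self.patch_embedding = nn.Conv2d(num_channels, embed_dim,
                                         kernel_size=patch_size,
                                         stride=patch_size, bias=False)
        num_patches = (image_size // patch_size) ** 2
        # Assumed extras: learned class token and position table.
        self.class_embedding = nn.Parameter(torch.zeros(embed_dim))
        self.position_embedding = nn.Embedding(num_patches + 1, embed_dim)

    def forward(self, pixel_values: torch.Tensor) -> torch.Tensor:
        b = pixel_values.shape[0]
        patches = self.patch_embedding(pixel_values)       # [B, C, H/ps, W/ps]
        patches = patches.flatten(2).transpose(1, 2)       # [B, num_patches, C]
        cls = self.class_embedding.expand(b, 1, -1)        # prepend class token
        embeddings = torch.cat([cls, patches], dim=1)
        positions = torch.arange(embeddings.shape[1], device=embeddings.device)
        return embeddings + self.position_embedding(positions)


print(SimplePatchEmbeddings()(torch.randn(1, 3, 224, 224)).shape)  # [1, 50, 768]
```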
CLIPVisionModel
Bases: Module, SupportsQuant
packed_modules_mapping (class-attribute, instance-attribute)
vision_model (instance-attribute)
vision_model = CLIPVisionTransformer(
    config=config,
    quant_config=quant_config,
    num_hidden_layers_override=num_hidden_layers_override,
    require_post_norm=require_post_norm,
    prefix=f"{prefix}.vision_model",
)
__init__
__init__(
    config: CLIPVisionConfig,
    quant_config: Optional[QuantizationConfig] = None,
    *,
    num_hidden_layers_override: Optional[int] = None,
    require_post_norm: Optional[bool] = None,
    prefix: str = "",
) -> None
forward
load_weights
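packed_modules_mapping (whose value is not shown in this rendering) and load_weights exist because CLIP checkpoints typically store separate q/k/v projections while the model above uses a single fused qkv_proj. A standalone sketch of that packing idea, using plain tensors rather than vLLM's weight-loader machinery; the names q_w/k_w/v_w are hypothetical.

```python
import torch

embed_dim = 768
# Hypothetical per-projection checkpoint weights, shape [out_features, in_features].
q_w = torch.randn(embed_dim, embed_dim)
k_w = torch.randn(embed_dim, embed_dim)
v_w = torch.randn(embed_dim, embed_dim)

# Fused weight for a single qkv linear: rows are stacked q, then k, then v.
qkv_w = torch.cat([q_w, k_w, v_w], dim=0)                 # [3 * embed_dim, embed_dim]

x = torch.randn(2, 50, embed_dim)                         # Batch x Time x Channel
q, k, v = (x @ qkv_w.t()).chunk(3, dim=-1)                # one matmul, three outputs
print(torch.allclose(q, x @ q_w.t(), atol=1e-4))          # True: same result as q alone
```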
CLIPVisionTransformer
Bases: Module
encoder (instance-attribute)
encoder = CLIPEncoder(
    config=config,
    quant_config=quant_config,
    num_hidden_layers_override=num_hidden_layers_override,
    prefix=f"{prefix}.encoder",
)
__init__
__init__(
    config: CLIPVisionConfig,
    quant_config: Optional[QuantizationConfig] = None,
    *,
    num_hidden_layers_override: Optional[int] = None,
    require_post_norm: Optional[bool] = None,
    prefix: str = "",
) -> None
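Putting the pieces together, the vision transformer composes the patch embeddings, a pre-LayerNorm, the CLIPEncoder stack, and a final LayerNorm that is presumably gated by require_post_norm. The composition sketch below is written under those assumptions, with the sub-modules passed in rather than re-implemented.

```python
from typing import Optional

import torch
import torch.nn as nn


class SimpleCLIPVisionTransformer(nn.Module):
    def __init__(self, embeddings: nn.Module, encoder: nn.Module,
                 embed_dim: int = 768, eps: float = 1e-5,
                 require_post_norm: Optional[bool] = None) -> None:
        super().__init__()
        self.embeddings = embeddings
        self.pre_layernorm = nn.LayerNorm(embed_dim, eps=eps)
        self.encoder = encoder
        # Assumption: None keeps the final norm; an explicit False drops it
        # (e.g. when only part of the layer stack is used).
        use_post_norm = True if require_post_norm is None else require_post_norm
        self.post_layernorm = (nn.LayerNorm(embed_dim, eps=eps)
                               if use_post_norm else nn.Identity())

    def forward(self, pixel_values: torch.Tensor) -> torch.Tensor:
        hidden_states = self.embeddings(pixel_values)     # tokens: [B, 1 + patches, C]
        hidden_states = self.pre_layernorm(hidden_states)
        hidden_states = self.encoder(hidden_states)
        return self.post_layernorm(hidden_states)


# Identity stand-ins for the embedding and encoder sub-modules.
model = SimpleCLIPVisionTransformer(nn.Identity(), nn.Identity())
print(model(torch.randn(2, 50, 768)).shape)               # torch.Size([2, 50, 768])
```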