vllm.model_executor.models.glm4v
Inference-only CogAgent model compatible with THUDM weights.
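For orientation, here is a minimal offline-inference sketch showing how this model is typically reached through vLLM's high-level API. The checkpoint id, the "<image>" placeholder, and the sampling settings are illustrative assumptions, and additional engine arguments may be required depending on the vLLM version.
# Hedged usage sketch (not taken from this module).
from PIL import Image
from vllm import LLM, SamplingParams

llm = LLM(model="THUDM/glm-4v-9b", trust_remote_code=True)

# The real image placeholder string is model-specific; see get_placeholder_str below.
prompt = "<image>\nDescribe the image."
image = Image.open("example.jpg").convert("RGB")

outputs = llm.generate(
    {"prompt": prompt, "multi_modal_data": {"image": image}},
    SamplingParams(max_tokens=64),
)
print(outputs[0].outputs[0].text)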
EVA2CLIPAttention
Bases: Module
Source code in vllm/model_executor/models/glm4v.py
dense instance-attribute
dense = RowParallelLinear(
    hidden_size,
    hidden_size,
    quant_config=quant_config,
    prefix=f"{prefix}.dense",
)
query_key_value instance-attribute
query_key_value = QKVParallelLinear(
    hidden_size,
    head_dim,
    num_heads,
    quant_config=quant_config,
    prefix=f"{prefix}.query_key_value",
)
__init__
__init__(
    config,
    quant_config: Optional[QuantizationConfig] = None,
    prefix: str = "",
)
Source code in vllm/model_executor/models/glm4v.py
forward
Source code in vllm/model_executor/models/glm4v.py
EVA2CLIPGLU
Bases: Module
Source code in vllm/model_executor/models/glm4v.py
dense_4h_to_h instance-attribute
dense_4h_to_h = RowParallelLinear(
    ffn_hidden_size,
    hidden_size,
    bias=False,
    quant_config=quant_config,
    prefix=f"{prefix}.dense_4h_to_h",
)
linear_proj instance-attribute
linear_proj = ReplicatedLinear(
    in_features,
    hidden_size,
    bias=False,
    quant_config=quant_config,
    prefix=f"{prefix}.linear_proj",
)
merged_proj instance-attribute
merged_proj = MergedColumnParallelLinear(
    hidden_size,
    [ffn_hidden_size] * 2,
    bias=False,
    quant_config=quant_config,
    prefix=f"{prefix}.merged_proj",
)
__init__
__init__(
    config,
    in_features,
    quant_config: Optional[QuantizationConfig] = None,
    prefix: str = "",
)
The original implementation is equivalent to:
self.dense_h_to_4h = ColumnParallelLinear(
    config.hidden_size,
    config.ffn_hidden_size,
    bias=False,
    quant_config=quant_config
)
self.gate_proj = ColumnParallelLinear(
    config.hidden_size,
    config.ffn_hidden_size,
    bias=False,
    quant_config=quant_config
)
gate_proj_output, _ = self.gate_proj(x)
dense_h_to_4h_output, _ = self.dense_h_to_4h(x)
x = torch.cat([gate_proj_output, dense_h_to_4h_output], dim=-1)
We merge the two ColumnParallelLinear layers into one MergedColumnParallelLinear:
self.merged_proj = MergedColumnParallelLinear(
    config.hidden_size,
    [config.ffn_hidden_size] * 2,
    bias=False,
    quant_config=quant_config
)
Source code in vllm/model_executor/models/glm4v.py
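To make the merge concrete, here is a minimal sketch using plain torch.nn.Linear stand-ins for the tensor-parallel layers: a single projection whose weight stacks the two original weight matrices along the output dimension reproduces the concatenated output exactly. The sizes are toy values chosen for illustration.
import torch
import torch.nn as nn

torch.manual_seed(0)
hidden_size, ffn_hidden_size = 8, 16  # toy sizes, for illustration only

# Stand-ins for the two original ColumnParallelLinear projections.
gate_proj = nn.Linear(hidden_size, ffn_hidden_size, bias=False)
dense_h_to_4h = nn.Linear(hidden_size, ffn_hidden_size, bias=False)

# One fused projection whose weight is the two weights stacked on the output dim.
merged_proj = nn.Linear(hidden_size, 2 * ffn_hidden_size, bias=False)
with torch.no_grad():
    merged_proj.weight.copy_(
        torch.cat([gate_proj.weight, dense_h_to_4h.weight], dim=0)
    )

x = torch.randn(4, hidden_size)
separate = torch.cat([gate_proj(x), dense_h_to_4h(x)], dim=-1)
fused = merged_proj(x)
assert torch.allclose(separate, fused, atol=1e-6)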
EVA2CLIPMLP
Bases: Module
Source code in vllm/model_executor/models/glm4v.py
fc1 instance-attribute
fc1 = ColumnParallelLinear(
    hidden_size,
    intermediate_size,
    quant_config=quant_config,
    prefix=f"{prefix}.fc1",
)
fc2 instance-attribute
fc2 = RowParallelLinear(
    intermediate_size,
    hidden_size,
    quant_config=quant_config,
    prefix=f"{prefix}.fc2",
)
__init__
__init__(
    config,
    quant_config: Optional[QuantizationConfig] = None,
    prefix: str = "",
)
Source code in vllm/model_executor/models/glm4v.py
EVA2CLIPModel
Bases: Module
Source code in vllm/model_executor/models/glm4v.py
conv instance-attribute
conv = Conv2d(
    in_channels=hidden_size,
    out_channels=hidden_size,
    kernel_size=2,
    stride=2,
)
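The kernel_size=2, stride=2 convolution halves the patch grid along each spatial axis, so the number of visual tokens drops by a factor of four before the GLU projection. A minimal shape check, with channel count and grid size that are illustrative assumptions:
import torch
import torch.nn as nn

hidden_size, grid = 1792, 32  # assumed channel count and 32x32 patch grid
conv = nn.Conv2d(in_channels=hidden_size, out_channels=hidden_size,
                 kernel_size=2, stride=2)

tokens = torch.randn(1, hidden_size, grid, grid)  # (B, D, H/p, W/p) layout
print(conv(tokens).shape)  # torch.Size([1, 1792, 16, 16]): 1024 tokens -> 256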
linear_proj instance-attribute
linear_proj = EVA2CLIPGLU(
    config,
    in_features=hidden_size,
    quant_config=quant_config,
    prefix=f"{prefix}.linear_proj",
)
transformer instance-attribute
transformer = EVA2CLIPTransformer(
    vision_config,
    quant_config=quant_config,
    prefix=f"{prefix}.transformer",
)
__init__
__init__(
    config,
    quant_config: Optional[QuantizationConfig] = None,
    prefix: str = "",
)
Source code in vllm/model_executor/models/glm4v.py
forward
Parameters:
    images (torch.Tensor): Input image tensor with shape (B, C, H, W).
Returns:
    torch.Tensor: Transformed tensor with shape (B, L, D).
Source code in vllm/model_executor/models/glm4v.py
EVA2CLIPPatchEmbedding
Bases: Module
Source code in vllm/model_executor/models/glm4v.py
proj instance-attribute
proj = Conv2d(
    in_channels,
    hidden_size,
    kernel_size=patch_size,
    stride=patch_size,
)
__init__
Source code in vllm/model_executor/models/glm4v.py
forward
Parameters:
    images (torch.Tensor): Input image tensor with shape (B, C, H, W).
Returns:
    torch.Tensor: Transformed tensor with shape (B, L, D).
Source code in vllm/model_executor/models/glm4v.py
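A minimal sketch of the (B, C, H, W) -> (B, L, D) reshaping that a strided-convolution patch embedding performs; it covers only the proj step above (not any extra embeddings the module may add), and the concrete sizes are assumptions chosen to make the shapes easy to follow.
import torch
import torch.nn as nn

in_channels, hidden_size = 3, 1792  # assumed values
patch_size, image_size = 14, 224

proj = nn.Conv2d(in_channels, hidden_size,
                 kernel_size=patch_size, stride=patch_size)

images = torch.randn(2, in_channels, image_size, image_size)  # (B, C, H, W)
x = proj(images)                  # (B, D, H/p, W/p)
x = x.flatten(2).transpose(1, 2)  # (B, L, D) with L = (H/p) * (W/p) = 256
print(x.shape)                    # torch.Size([2, 256, 1792])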
EVA2CLIPTransformer
Bases: Module
Source code in vllm/model_executor/models/glm4v.py
layers instance-attribute
layers = ModuleList(
    [
        EVA2CLIPTransformerLayer(
            config,
            quant_config=quant_config,
            prefix=f"{prefix}.layers.{layer_idx}",
        )
        for layer_idx in range(num_hidden_layers)
    ]
)
__init__
__init__(
    config,
    quant_config: Optional[QuantizationConfig] = None,
    prefix: str = "",
)
Source code in vllm/model_executor/models/glm4v.py
EVA2CLIPTransformerLayer
Bases: Module
Source code in vllm/model_executor/models/glm4v.py
attention instance-attribute
attention = EVA2CLIPAttention(
    config,
    quant_config=quant_config,
    prefix=f"{prefix}.attention",
)
mlp instance-attribute
mlp = EVA2CLIPMLP(
    config,
    quant_config=quant_config,
    prefix=f"{prefix}.mlp",
)
post_attention_layernorm instance-attribute
post_attention_layernorm = LayerNorm(
    hidden_size, eps=layer_norm_eps
)
__init__
__init__(
    config,
    quant_config: Optional[QuantizationConfig] = None,
    prefix: str = "",
)
Source code in vllm/model_executor/models/glm4v.py
forward
Source code in vllm/model_executor/models/glm4v.py
GLM4VDummyInputsBuilder
Bases: BaseDummyInputsBuilder[GLM4VProcessingInfo]
Source code in vllm/model_executor/models/glm4v.py
get_dummy_mm_data
get_dummy_mm_data(
    seq_len: int, mm_counts: Mapping[str, int]
) -> MultiModalDataDict
Source code in vllm/model_executor/models/glm4v.py
get_dummy_text
GLM4VForCausalLM
Bases: ChatGLMBaseModel, SupportsLoRA, SupportsPP, SupportsMultiModal
Source code in vllm/model_executor/models/glm4v.py
packed_modules_mapping class-attribute instance-attribute
packed_modules_mapping = {
    "query_key_value": ["query_key_value"],
    "dense_h_to_4h": ["dense_h_to_4h"],
    "merged_proj": ["gate_proj", "dense_h_to_4h"],
}
__init__
__init__(
    *,
    vllm_config: VllmConfig,
    prefix: str = "",
    transformer_type: type[GLM4VModel] = GLM4VModel,
) -> None
Source code in vllm/model_executor/models/glm4v.py
_parse_and_validate_image_input
_parse_and_validate_image_input(
    **kwargs: object,
) -> Optional[GLMVImagePixelInputs]
Source code in vllm/model_executor/models/glm4v.py
_process_image_input
_process_image_input(
    image_input: GLMVImagePixelInputs,
) -> Tensor
_validate_pixel_values
Source code in vllm/model_executor/models/glm4v.py
forward
forward(
    input_ids: Tensor,
    positions: Tensor,
    intermediate_tensors: Optional[IntermediateTensors] = None,
    inputs_embeds: Optional[Tensor] = None,
    **kwargs: object,
) -> Union[Tensor, IntermediateTensors]
Source code in vllm/model_executor/models/glm4v.py
get_input_embeddings
get_input_embeddings(
    input_ids: Tensor,
    multimodal_embeddings: Optional[MultiModalEmbeddings] = None,
) -> Tensor
Source code in vllm/model_executor/models/glm4v.py
get_mm_mapping
get_mm_mapping() -> MultiModelKeys
Get the module prefix in multimodal models.
Source code in vllm/model_executor/models/glm4v.py
get_multimodal_embeddings
get_multimodal_embeddings(
    **kwargs: object,
) -> MultiModalEmbeddings
Source code in vllm/model_executor/models/glm4v.py
get_placeholder_str classmethod
GLM4VModel
Bases: ChatGLMModel
Source code in vllm/model_executor/models/glm4v.py
vision instance-attribute
vision = EVA2CLIPModel(
    config, quant_config, prefix=f"{prefix}.vision"
)
__init__
__init__(*, vllm_config: VllmConfig, prefix: str = '')
Source code in vllm/model_executor/models/glm4v.py
GLM4VMultiModalProcessor
Bases: BaseMultiModalProcessor[GLM4VProcessingInfo]
Source code in vllm/model_executor/models/glm4v.py
_get_mm_fields_config
_get_prompt_updates
_get_prompt_updates(
    mm_items: MultiModalDataItems,
    hf_processor_mm_kwargs: Mapping[str, object],
    out_mm_kwargs: MultiModalKwargs,
) -> Sequence[PromptUpdate]
Source code in vllm/model_executor/models/glm4v.py
GLM4VProcessingInfo
Bases: BaseProcessingInfo
Source code in vllm/model_executor/models/glm4v.py
GLM4VProcessor
This model doesn't define its own HF processor, so we implement our own here.
Source code in vllm/model_executor/models/glm4v.py
image_transform instance-attribute
image_transform = Compose(
    [
        Resize(
            (image_size, image_size), interpolation=BICUBIC
        ),
        ToTensor(),
        Normalize(
            mean=(0.48145466, 0.4578275, 0.40821073),
            std=(0.26862954, 0.26130258, 0.27577711),
        ),
    ]
)
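The same preprocessing can be reproduced stand-alone with torchvision; image_size = 224 below is an assumption for illustration (the real processor reads it from the vision config). The mean/std values are the CLIP normalization constants used above.
from PIL import Image
from torchvision.transforms import (Compose, InterpolationMode, Normalize,
                                    Resize, ToTensor)

image_size = 224  # assumption; the processor takes this from the vision config
transform = Compose([
    Resize((image_size, image_size), interpolation=InterpolationMode.BICUBIC),
    ToTensor(),
    Normalize(mean=(0.48145466, 0.4578275, 0.40821073),
              std=(0.26862954, 0.26130258, 0.27577711)),
])

img = Image.new("RGB", (640, 480), color="gray")  # any PIL image
pixel_values = transform(img)  # float tensor of shape (3, 224, 224)
print(pixel_values.shape)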
__call__
__call__(
    text: Optional[Union[TextInput, list[TextInput]]] = None,
    images: Optional[Union[ImageInput, list[ImageInput]]] = None,
    return_tensors: Optional[Union[str, TensorType]] = None,
) -> BatchFeature
Source code in vllm/model_executor/models/glm4v.py
__init__
__init__(
    config: ChatGLMConfig, tokenizer: PreTrainedTokenizer
) -> None
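A hedged construction sketch: in normal use vLLM's multimodal machinery builds this processor, and the checkpoint id below is an assumption; loading it requires trust_remote_code for the ChatGLM config and tokenizer classes.
from PIL import Image
from transformers import AutoConfig, AutoTokenizer
from vllm.model_executor.models.glm4v import GLM4VProcessor

model_id = "THUDM/glm-4v-9b"  # assumed checkpoint
config = AutoConfig.from_pretrained(model_id, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

processor = GLM4VProcessor(config, tokenizer)
batch = processor(
    text="Describe this image.",
    images=Image.new("RGB", (448, 448)),
    return_tensors="pt",
)
# batch is a BatchFeature holding the tokenized text plus the image tensor
# produced by image_transform above.
print(batch.keys())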