vllm.model_executor.models.internlm2_ve
InternLM2VEDecoderLayer

Bases: Module

Source code in vllm/model_executor/models/internlm2_ve.py
attention (instance-attribute)
attention = InternLM2Attention(
hidden_size=hidden_size,
num_heads=num_attention_heads,
num_kv_heads=num_key_value_heads,
rope_theta=rope_theta,
rope_scaling=rope_scaling,
max_position_embeddings=max_position_embeddings,
cache_config=cache_config,
quant_config=quant_config,
prefix=f"{prefix}.attention",
)
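The `num_kv_heads` argument above configures grouped-query attention, where several query heads share a single key/value head. A minimal sketch of the grouping rule (illustrative only, not vLLM code; the function name is hypothetical):

```python
# Illustrative sketch: grouped-query attention head mapping.
# Query heads are split into num_kv_heads contiguous groups; every
# query head in a group attends using the same key/value head.

def kv_head_for_query_head(q_head: int, num_heads: int, num_kv_heads: int) -> int:
    # Size of each group of query heads sharing one KV head.
    group_size = num_heads // num_kv_heads
    return q_head // group_size

# With 8 query heads and 2 KV heads, heads 0-3 share KV head 0
# and heads 4-7 share KV head 1.
print([kv_head_for_query_head(q, 8, 2) for q in range(8)])  # [0, 0, 0, 0, 1, 1, 1, 1]
```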
feed_forward (instance-attribute)
feed_forward = InternLM2MLP(
hidden_size=hidden_size,
intermediate_size=intermediate_size,
hidden_act=hidden_act,
quant_config=quant_config,
prefix=f"{prefix}.feed_forward",
)
feed_forward_ve (instance-attribute)
feed_forward_ve = InternLM2MLP(
hidden_size=hidden_size,
intermediate_size=intermediate_size,
hidden_act=hidden_act,
quant_config=quant_config,
prefix=f"{prefix}.feed_forward_ve",
)
__init__
__init__(
config: PretrainedConfig,
cache_config: Optional[CacheConfig] = None,
quant_config: Optional[QuantizationConfig] = None,
prefix: str = "",
) -> None
forward
forward(
positions: Tensor,
hidden_states: Tensor,
residual: Optional[Tensor],
visual_token_mask: Optional[Tensor] = None,
) -> tuple[Tensor, Tensor]
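The layer carries two parallel MLPs, `feed_forward` and `feed_forward_ve`, and `forward` takes an optional `visual_token_mask`. A plausible reading is that the mask routes each token through exactly one of the two experts. The sketch below illustrates that routing pattern only; it is not the vLLM source, and `mlp_text` / `mlp_visual` are hypothetical scalar stand-ins for the two `InternLM2MLP` modules:

```python
# Illustrative sketch of per-token expert routing by a visual_token_mask.
# mlp_text / mlp_visual stand in for feed_forward / feed_forward_ve,
# reduced to scalar functions so the example runs without torch.

def mlp_text(x: float) -> float:
    # Hypothetical text-expert MLP.
    return 2.0 * x

def mlp_visual(x: float) -> float:
    # Hypothetical visual-expert MLP.
    return -1.0 * x

def route_tokens(hidden_states: list[float], visual_token_mask: list[bool]) -> list[float]:
    # Each token passes through exactly one expert, chosen by its mask bit.
    return [
        mlp_visual(h) if is_visual else mlp_text(h)
        for h, is_visual in zip(hidden_states, visual_token_mask)
    ]

print(route_tokens([1.0, 2.0, 3.0], [False, True, False]))  # [2.0, -2.0, 6.0]
```

When `visual_token_mask` is `None`, one would expect every token to take the plain `feed_forward` path, matching the text-only base class behavior.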
InternLM2VEForCausalLM

Bases: InternLM2ForCausalLM
InternLM2VEModel

Bases: InternLM2Model

__init__
__init__(*, vllm_config: VllmConfig, prefix: str = '')
forward
forward(
input_ids: Tensor,
positions: Tensor,
intermediate_tensors: Optional[
IntermediateTensors
] = None,
inputs_embeds: Optional[Tensor] = None,
visual_token_mask: Optional[Tensor] = None,
) -> Union[Tensor, IntermediateTensors]
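The model-level `forward` also accepts an optional `visual_token_mask`. A caller would typically derive it from the positions of image-placeholder tokens in `input_ids`. A minimal sketch of that derivation (an assumption, not vLLM API; `IMG_TOKEN_ID` is a hypothetical placeholder id):

```python
# Illustrative sketch: building a visual_token_mask from input_ids.
# IMG_TOKEN_ID is a hypothetical image-placeholder token id; the real
# id depends on the tokenizer and model checkpoint.

IMG_TOKEN_ID = 92546  # hypothetical value for illustration

def make_visual_token_mask(input_ids: list[int]) -> list[bool]:
    # True at positions holding image-placeholder tokens, False elsewhere.
    return [tok == IMG_TOKEN_ID for tok in input_ids]

ids = [1, 92546, 92546, 5, 7]
print(make_visual_token_mask(ids))  # [False, True, True, False, False]
```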