vllm.model_executor.models.qwen2_rm
Inference-only Qwen2-RM model compatible with HuggingFace weights.
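This module is served through vLLM's pooling path rather than text generation. As a rough usage sketch (not part of this file), a checkpoint backed by one of the classes below can be queried like this; the task="reward" argument, the Qwen/Qwen2.5-Math-PRM-7B checkpoint name, and the <extra_0> step-separator marker are assumptions that depend on your vLLM version and model:

from vllm import LLM

# Sketch only: load a process-reward checkpoint on the pooling path.
llm = LLM(model="Qwen/Qwen2.5-Math-PRM-7B", task="reward")

# encode() runs the model without sampling; each output carries the pooled
# reward tensor produced by the model's pooler (for the process-reward model,
# one row of scores per step-separator token in the prompt).
outputs = llm.encode(["1 + 1 = 2<extra_0>2 + 2 = 4<extra_0>"])
print(outputs[0].outputs.data)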
Qwen2ForProcessRewardModel
Bases: Qwen2RewardBaseModel
Source code in vllm/model_executor/models/qwen2_rm.py
_pooler  instance-attribute

_pooler = from_config_with_defaults(
    pooler_config,
    pooling_type=STEP,
    normalize=False,
    softmax=True,
    step_tag_id=151651,
)
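With pooling_type=STEP, the pooler keeps only the positions whose input token is the configured step tag (id 151651) and, because softmax=True, normalizes each kept row over the label dimension. A standalone illustration of that selection, independent of vLLM's Pooler class (the helper name and shapes are illustrative):

import torch

def step_pool(scores: torch.Tensor, token_ids: torch.Tensor,
              step_tag_id: int = 151651) -> torch.Tensor:
    # scores: (seq_len, num_labels) output of the score head; keep the rows
    # at step-tag positions and apply a softmax over the label dimension.
    return scores[token_ids == step_tag_id].softmax(dim=-1)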
__init__
Source code in vllm/model_executor/models/qwen2_rm.py
Qwen2ForRewardModel
Bases: Qwen2RewardBaseModel
Source code in vllm/model_executor/models/qwen2_rm.py
_pooler  instance-attribute

_pooler = from_config_with_defaults(
    pooler_config,
    pooling_type=ALL,
    normalize=False,
    softmax=False,
)
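Unlike the process-reward variant above, this pooler uses ALL pooling with softmax=False, so the raw score at every position is kept and no normalization is applied. A minimal stand-in for contrast (helper name and shapes illustrative):

import torch

def all_pool(scores: torch.Tensor) -> torch.Tensor:
    # scores: (seq_len, num_labels); returned unchanged for every token,
    # with no softmax and no normalization.
    return scores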
__init__
Source code in vllm/model_executor/models/qwen2_rm.py
Qwen2RewardBaseModel
Bases: Module, SupportsLoRA, SupportsPP
Source code in vllm/model_executor/models/qwen2_rm.py
make_empty_intermediate_tensors  instance-attribute
model  instance-attribute

model = Qwen2Model(
    vllm_config=vllm_config,
    prefix=maybe_prefix(prefix, "model"),
)
packed_modules_mapping  class-attribute instance-attribute

packed_modules_mapping = {
    "qkv_proj": ["q_proj", "k_proj", "v_proj"],
    "gate_up_proj": ["gate_proj", "up_proj"],
}
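packed_modules_mapping records which per-checkpoint projections are fused into a single parameter in vLLM (q/k/v into qkv_proj, gate/up into gate_up_proj); the weight loader and LoRA support consult it when mapping names. A simplified sketch of what the fusion implies, not vLLM's actual loader:

import torch

def fuse_shards(ckpt: dict[str, torch.Tensor],
                shard_names: list[str]) -> torch.Tensor:
    # e.g. fuse_shards(ckpt, ["q_proj.weight", "k_proj.weight", "v_proj.weight"])
    # concatenates along the output dimension to fill the fused qkv_proj weight.
    return torch.cat([ckpt[name] for name in shard_names], dim=0)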
score  instance-attribute

score = Sequential(
    ColumnParallelLinear(
        hidden_size,
        hidden_size,
        quant_config=quant_config,
        return_bias=False,
    ),
    ReLU(),
    RowParallelLinear(
        hidden_size,
        num_labels,
        quant_config=quant_config,
        return_bias=False,
    ),
)
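ColumnParallelLinear and RowParallelLinear are vLLM's tensor-parallel linear layers; with return_bias=False they behave like ordinary linear layers whose work is split across GPUs. For intuition, on a single device the score head is just a two-layer MLP from hidden_size to num_labels (the sizes below are illustrative, not taken from a specific checkpoint):

import torch.nn as nn

hidden_size, num_labels = 3584, 2  # illustrative sizes
score = nn.Sequential(
    nn.Linear(hidden_size, hidden_size),
    nn.ReLU(),
    nn.Linear(hidden_size, num_labels),
)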
__init__
__init__(*, vllm_config: VllmConfig, prefix: str = '')
Source code in vllm/model_executor/models/qwen2_rm.py
forward

forward(
    input_ids: Tensor,
    positions: Tensor,
    intermediate_tensors: Optional[IntermediateTensors] = None,
    inputs_embeds: Optional[Tensor] = None,
) -> Union[Tensor, IntermediateTensors]
Source code in vllm/model_executor/models/qwen2_rm.py
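forward runs the Qwen2 backbone and projects the resulting hidden states through the score head, returning IntermediateTensors instead when a pipeline-parallel rank does not hold the final layers. A hedged sketch of the data flow (not the exact source; the helper name is illustrative):

import torch

def forward_sketch(model, score, input_ids, positions,
                   intermediate_tensors=None, inputs_embeds=None):
    hidden_states = model(input_ids, positions, intermediate_tensors,
                          inputs_embeds)
    if not isinstance(hidden_states, torch.Tensor):
        # non-final pipeline-parallel rank: pass intermediates downstream
        return hidden_states
    return score(hidden_states)  # (num_tokens, num_labels)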
get_input_embeddings

load_weights

pooler

pooler(
    hidden_states: Tensor, pooling_metadata: PoolingMetadata
) -> Optional[PoolerOutput]