vllm.model_executor.layers.fused_moe.prepare_finalize
MoEPrepareAndFinalizeNoEP
Bases: FusedMoEPrepareAndFinalize
Source code in vllm/model_executor/layers/fused_moe/prepare_finalize.py
finalize
finalize(
    output: Tensor,
    fused_expert_output: Tensor,
    topk_weights: Tensor,
    topk_ids: Tensor,
    apply_router_weight_on_input: bool,
) -> None
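The signature returns None and takes an `output` tensor, which suggests the result is written into `output` in place. The following is only a conceptual sketch of a top-k weighted reduction under that assumption, not vLLM's implementation: `finalize_sketch` is a hypothetical name, nested Python lists stand in for tensors, and skipping the weights when `apply_router_weight_on_input` is true is assumed from the parameter name.

```python
def finalize_sketch(output, fused_expert_output, topk_weights,
                    topk_ids, apply_router_weight_on_input):
    """Hypothetical stand-in for finalize; plain lists replace tensors.

    output:              [num_tokens][hidden], overwritten in place
    fused_expert_output: [num_tokens][topk][hidden]
    topk_weights:        [num_tokens][topk]
    topk_ids:            unused here, kept to mirror the signature
    """
    for t, (expert_rows, weights) in enumerate(
            zip(fused_expert_output, topk_weights)):
        hidden = len(output[t])
        acc = [0.0] * hidden
        for row, w in zip(expert_rows, weights):
            # Assumption: if the router weight was already applied to the
            # input, the expert outputs are summed without reweighting.
            scale = 1.0 if apply_router_weight_on_input else w
            for i in range(hidden):
                acc[i] += scale * row[i]
        output[t][:] = acc  # in-place write, matching the None return


out = [[0.0, 0.0]]
finalize_sketch(out, [[[1.0, 2.0], [3.0, 4.0]]], [[0.25, 0.75]],
                [[0, 1]], apply_router_weight_on_input=False)
# out is now [[2.5, 3.5]]
```

Writing into a caller-provided buffer avoids allocating a new result per call, which is one plausible reason for the None return type.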
max_num_tokens_per_rank
prepare
prepare(
    a1: Tensor,
    a1_scale: Optional[Tensor],
    a2_scale: Optional[Tensor],
    topk_weights: Tensor,
    topk_ids: Tensor,
    num_experts: int,
    expert_map: Optional[Tensor],
    apply_router_weight_on_input: bool,
    quant_config: FusedMoEQuantConfig,
) -> tuple[
    Tensor,
    Optional[Tensor],
    Optional[Tensor],
    Optional[Tensor],
    Optional[Tensor],
]
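The five-element return type can be illustrated with a conceptual sketch. This is not vLLM's implementation: `prepare_sketch` is a hypothetical name, nested lists stand in for tensors, quantization driven by `quant_config` is omitted, and it is an assumption that the last three optional slots stay None when no expert-parallel dispatch takes place. The sketch also assumes that applying the router weight to the input only makes sense with a single selected expert per token.

```python
def prepare_sketch(a1, a1_scale, topk_weights, apply_router_weight_on_input):
    """Hypothetical stand-in for prepare; plain lists replace tensors.

    a1:           [num_tokens][hidden] activations
    a1_scale:     optional scale, passed through unchanged here
    topk_weights: [num_tokens][topk]
    """
    if apply_router_weight_on_input:
        # Assumption: weighting the input requires exactly one expert per
        # token, since one input row cannot carry several weights.
        assert all(len(w) == 1 for w in topk_weights), "expects topk == 1"
        a1 = [[w[0] * x for x in row] for row, w in zip(a1, topk_weights)]
    # A real implementation would quantize a1 here; this sketch does not.
    return a1, a1_scale, None, None, None


a1q, a1q_scale, *rest = prepare_sketch(
    [[1.0, 2.0], [3.0, 4.0]], None, [[0.5], [2.0]],
    apply_router_weight_on_input=True)
# a1q is [[0.5, 1.0], [6.0, 8.0]]; a1q_scale is None; rest is [None, None, None]
```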