vllm.model_executor.layers.fused_moe.moe_fused_mul_sum

moe_fused_mul_sum

```python
moe_fused_mul_sum(
    inputs: Tensor,
    topk_weights: Tensor,
    outputs: Tensor | None = None,
    topk_ids: Tensor | None = None,
    expert_map: Tensor | None = None,
) -> Tensor
```
Fused kernel for MoE (Mixture of Experts) to perform weighted summation of expert outputs.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| inputs | Tensor | The outputs from the experts. Shape: (num_tokens, top_k, hidden_size). | required |
| topk_weights | Tensor | The weight assigned to each expert for each token. Shape: (num_tokens, top_k). | required |
| outputs | Tensor \| None | Optional pre-allocated output tensor. Shape: (num_tokens, hidden_size). | None |
| topk_ids | Tensor \| None | Optional indices of the top-k experts for each token. Shape: (num_tokens, top_k). Used together with expert_map to identify invalid token/expert pairs. | None |
| expert_map | Tensor \| None | Optional mapping for Expert Parallelism. A mapped value < 0 indicates an invalid token/expert pair that will be skipped. | None |
Returns:

| Type | Description |
|---|---|
| Tensor | The fused weighted sum of expert outputs. Shape: (num_tokens, hidden_size). |
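To illustrate what the fused kernel computes, here is a minimal unfused PyTorch sketch of the same reduction: each token's top_k expert outputs are scaled by the corresponding routing weights and summed over the top_k dimension. The helper name `unfused_mul_sum` is hypothetical (not part of vLLM), and the sketch omits the `topk_ids`/`expert_map` masking path.

```python
import torch


def unfused_mul_sum(inputs: torch.Tensor, topk_weights: torch.Tensor) -> torch.Tensor:
    # Hypothetical unfused reference for moe_fused_mul_sum's core reduction:
    # broadcast weights (num_tokens, top_k) -> (num_tokens, top_k, 1),
    # scale each expert's output, then sum over the top_k dimension.
    return (inputs * topk_weights.unsqueeze(-1)).sum(dim=1)


num_tokens, top_k, hidden_size = 3, 2, 4
inputs = torch.ones(num_tokens, top_k, hidden_size)
topk_weights = torch.tensor([[0.25, 0.75]] * num_tokens)

# Each output element is 0.25 * 1.0 + 0.75 * 1.0 = 1.0.
out = unfused_mul_sum(inputs, topk_weights)
```

The fused kernel performs this multiply-and-reduce in a single pass over the expert outputs, avoiding the intermediate (num_tokens, top_k, hidden_size) product tensor that the unfused version materializes.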