vllm.model_executor.layers.quantization.utils.mxfp4_utils
_can_support_mxfp4
_can_support_mxfp4(
use_grouped_topk: bool = False,
topk_group: int | None = None,
num_expert_group: int | None = None,
expert_map: Tensor | None = None,
custom_routing_function: Callable | None = None,
e_score_correction_bias: Tensor | None = None,
apply_router_weight_on_input: bool = False,
scoring_func: str = "softmax",
activation: str = "swigluoai",
expert_load_view: Tensor | None = None,
logical_to_physical_map: Tensor | None = None,
logical_replica_count: Tensor | None = None,
)
Source code in vllm/model_executor/layers/quantization/utils/mxfp4_utils.py
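The function body is not reproduced on this page. The defaults in the signature suggest that the check is meant to pass only for the plain configuration (softmax scoring, "swigluoai" activation, no grouped top-k or custom routing). The sketch below illustrates that reading with a hypothetical stand-in, `can_support_mxfp4_sketch`, covering only a few of the parameters. It is an assumption about the behaviour, not vLLM's code.

```python
from typing import Callable, Optional


def can_support_mxfp4_sketch(
    use_grouped_topk: bool = False,
    custom_routing_function: Optional[Callable] = None,
    apply_router_weight_on_input: bool = False,
    scoring_func: str = "softmax",
    activation: str = "swigluoai",
) -> bool:
    """Hypothetical guard: True only if every option keeps its default.

    Assumption for illustration; the real helper takes more parameters
    and may apply different rules.
    """
    return not (
        use_grouped_topk
        or custom_routing_function is not None
        or apply_router_weight_on_input
        or scoring_func != "softmax"
        or activation != "swigluoai"
    )
```

Under this reading, `can_support_mxfp4_sketch()` is true, while `can_support_mxfp4_sketch(scoring_func="sigmoid")` is false.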
_dequant_mxfp4
Source code in vllm/model_executor/layers/quantization/utils/mxfp4_utils.py
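For orientation, here is a minimal pure-Python sketch of what MXFP4 dequantization means at the format level: each element is a 4-bit E2M1 code, and every block of 32 elements shares one E8M0 power-of-two scale. The function name `dequant_mxfp4_reference` and the packing order (low nibble first) are assumptions made for this illustration. It is not the vLLM implementation, whose signature and backend are not shown on this page.

```python
# E2M1 magnitudes indexed by the low 3 bits of a 4-bit code; bit 3 is the sign.
E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK_SIZE = 32  # number of elements that share one scale in MXFP4


def decode_e2m1(code: int) -> float:
    """Decode one 4-bit E2M1 code to a float."""
    sign = -1.0 if code & 0x8 else 1.0
    return sign * E2M1_MAGNITUDES[code & 0x7]


def decode_e8m0(scale_byte: int) -> float:
    """Decode an E8M0 scale: a power of two with bias 127; 0xFF is NaN."""
    if scale_byte == 0xFF:
        return float("nan")
    return 2.0 ** (scale_byte - 127)


def dequant_mxfp4_reference(packed: bytes, scales: bytes) -> list:
    """Expand packed FP4 codes (two per byte) into floats.

    Assumption: the low nibble of each byte holds the earlier element.
    """
    out = []
    for i, byte in enumerate(packed):
        # Element index 2*i selects the block, and so the shared scale.
        scale = decode_e8m0(scales[(2 * i) // BLOCK_SIZE])
        out.append(decode_e2m1(byte & 0x0F) * scale)
        out.append(decode_e2m1(byte >> 4) * scale)
    return out


# One block of 32 elements (16 bytes) with scale 2**(128 - 127) = 2.
values = dequant_mxfp4_reference(bytes([0x21, 0xF9]) + bytes(14), bytes([128]))
print(values[:4])  # [1.0, 2.0, -1.0, -12.0]
```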
_dequant_mxfp4_fake
_quant_dequant_mxfp4
Source code in vllm/model_executor/layers/quantization/utils/mxfp4_utils.py
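A quantize-then-dequantize round trip ("fake quantization") can be sketched in the same spirit. The version below follows the usual MX recipe: per block of 32 values, pick a shared power-of-two scale from the largest magnitude, snap each scaled value to the nearest E2M1 grid point, then multiply back. The name `quant_dequant_mxfp4_reference` is hypothetical, and the tie-breaking rule (the smaller grid point wins) is a simplification. Treat it as an illustration of the format, not as what vLLM runs.

```python
import math

E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]  # representable magnitudes
E2M1_EMAX = 2    # exponent of the largest power of two in E2M1 (4.0 = 2**2)
BLOCK_SIZE = 32  # elements sharing one scale


def _quant_dequant_block(block):
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return [0.0] * len(block)
    # Shared exponent, clamped to the range an E8M0 scale can express.
    shared_exp = math.floor(math.log2(amax)) - E2M1_EMAX
    shared_exp = max(-127, min(127, shared_exp))
    scale = 2.0 ** shared_exp
    out = []
    for x in block:
        mag = min(abs(x) / scale, E2M1_GRID[-1])  # saturate at 6.0
        nearest = min(E2M1_GRID, key=lambda g: abs(g - mag))
        out.append(math.copysign(nearest, x) * scale)
    return out


def quant_dequant_mxfp4_reference(values):
    """Round every value to what MXFP4 could represent, block by block."""
    out = []
    for start in range(0, len(values), BLOCK_SIZE):
        out.extend(_quant_dequant_block(values[start:start + BLOCK_SIZE]))
    return out


print(quant_dequant_mxfp4_reference([6.0, 1.0, -0.3, 0.0]))
# [6.0, 1.0, -0.5, 0.0]
```

Because the scale is derived from the block maximum, small values that share a block with a large one lose precision: here -0.3 becomes -0.5.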
_quant_dequant_mxfp4_fake
_swizzle_mxfp4
Swizzles weights for the MXFP4 MoE layer; used by the OAI MXFP4 kernel.