vllm.model_executor.layers.quantization.utils.nvfp4_emulation_utils
__all__ (module attribute)
kE2M1ToFloat (module attribute)

kE2M1ToFloat = tensor(
    [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0], dtype=float32
)
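The table above holds the eight non-negative E2M1 magnitudes; a full 4-bit FP4 code adds a sign in bit 3. A minimal sketch of decoding one code against this table (`decode_e2m1` is a hypothetical helper, not part of the module):

```python
import torch

# The eight non-negative E2M1 magnitudes, copied from the module attribute above.
kE2M1ToFloat = torch.tensor(
    [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0], dtype=torch.float32
)

def decode_e2m1(code: int) -> float:
    """Decode one 4-bit E2M1 code: bit 3 is the sign,
    bits 0-2 index the magnitude table (hypothetical helper)."""
    sign = -1.0 if code & 0b1000 else 1.0
    return sign * kE2M1ToFloat[code & 0b0111].item()
```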
break_fp4_bytes

Source code in vllm/model_executor/layers/quantization/utils/nvfp4_emulation_utils.py
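Since FP4 values are stored two to a byte, `break_fp4_bytes` presumably splits each packed byte into its two 4-bit codes. A hedged re-implementation, assuming low-nibble-first ordering and using the lookup table shown earlier (the function name and ordering are assumptions, not verified against the source):

```python
import torch

# Non-negative E2M1 magnitudes, copied from the module attribute above.
kE2M1ToFloat = torch.tensor(
    [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0], dtype=torch.float32
)

def unpack_fp4_bytes(packed: torch.Tensor) -> torch.Tensor:
    """Split each uint8 into two E2M1 codes and look up their float values.
    Low-nibble-first ordering is an assumption."""
    lo = packed & 0x0F
    hi = (packed >> 4) & 0x0F
    codes = torch.stack([lo, hi], dim=-1).flatten(-2)     # two codes per byte
    sign = 1.0 - 2.0 * ((codes & 0x8) != 0).float()       # bit 3 is the sign
    return sign * kE2M1ToFloat[(codes & 0x7).long()]      # bits 0-2 index the table
```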
cast_to_fp4
convert_swizzled_to_linear

convert_swizzled_to_linear(
    a_sf_swizzled: Tensor, m, k, block_size
)
cutlass_fp4_supported

cutlass_fp4_supported() -> bool
dequantize_to_dtype

Dequantize the fp4 tensor back to high precision.
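NVFP4 pairs each small block of FP4 values with its own scale. A sketch of the dequantization arithmetic, assuming already-decoded float values, per-block scales, a block size of 16, and division by a global scale (all parameter names and the scale convention here are assumptions, not the module's actual signature):

```python
import torch

def blockwise_dequant(
    values: torch.Tensor,        # [m, k] decoded E2M1 values, float32
    block_scales: torch.Tensor,  # [m, k // block_size] per-block scales
    global_scale: torch.Tensor,  # scalar global scale (convention assumed)
    block_size: int = 16,
) -> torch.Tensor:
    """Scale each block of decoded FP4 values; a hedged sketch only."""
    m, k = values.shape
    v = values.reshape(m, k // block_size, block_size)
    v = v * block_scales.reshape(m, k // block_size, 1).to(v.dtype)
    return v.reshape(m, k) / global_scale
```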
get_reciprocal
ref_nvfp4_quant
run_nvfp4_emulations
run_nvfp4_emulations(
    x: Tensor,
    input_global_scale: Tensor,
    weight: Tensor,
    weight_scale_swizzled: Tensor,
    weight_global_scale: Tensor,
)
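Judging by its name and parameters, run_nvfp4_emulations runs the layer with NVFP4 quantization emulated in high precision rather than in hardware. As a much-simplified illustration of the round-trip idea (ignoring the per-block and global scales entirely), one can snap values to the nearest representable E2M1 point:

```python
import torch

kE2M1ToFloat = torch.tensor(
    [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0], dtype=torch.float32
)

def fake_quant_e2m1(x: torch.Tensor) -> torch.Tensor:
    """Round each element to its nearest E2M1 value (hypothetical helper;
    the real emulation also applies per-block and global scaling)."""
    grid = torch.cat([-kE2M1ToFloat.flip(0), kE2M1ToFloat])  # all signed values
    idx = (x.unsqueeze(-1) - grid).abs().argmin(dim=-1)      # nearest grid point
    return grid[idx]
```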