vllm.model_executor.layers.mamba.ops.triton_helpers ¶

fast_exp ¶

fast_exp(x)

Faster alternative to tl.exp() using the hardware exp2 instruction.

tl.math.exp2 maps directly to a single ex2.approx.f32 PTX instruction, while tl.exp goes through libdevice __nv_expf which adds function call overhead and extra range checking.

Source code in vllm/model_executor/layers/mamba/ops/triton_helpers.py

@triton.jit
def fast_exp(x):
    """Faster alternative to tl.exp() using the hardware exp2 instruction.

    tl.math.exp2 maps directly to a single ex2.approx.f32 PTX instruction,
    while tl.exp goes through libdevice __nv_expf which adds function call
    overhead and extra range checking.
    """
    # exp(x) = exp2(x * log2(e)), where log2(e) = 1/ln(2) = 1.4426950408889634
    LOG2E = tl.constexpr(1.4426950408889634)
    return tl.math.exp2(LOG2E * x)