`vllm.triton_utils.jit_monitor`

Monitor unexpected Triton kernel JIT compilation during inference.

After server warmup completes, any Triton JIT compilation or autotuning event indicates a cache miss or an unexpected input shape, either of which causes a latency spike. This module registers hooks in the Triton runtime to detect and log such events so they can be investigated.

Set `--jit-monitor-verbose` to log every Triton JIT compile together with its dispatch key. This is intentionally opt-in because it can emit many log lines and add overhead.

Currently monitors:

- Triton `@triton.autotune` cache misses (via `knobs.autotuning.print`)
- Triton `@triton.jit` first-time compilations (via `knobs.runtime.jit_post_compile_hook`)

Functions:

- `activate`: Enable JIT compilation monitoring after warmup.
- `is_active`: Return whether the JIT compilation monitor is currently active.

`_setup_triton_autotuning_print()`

Enable TRITON_PRINT_AUTOTUNING unless the user opted out.

Source code in vllm/triton_utils/jit_monitor.py

def _setup_triton_autotuning_print() -> None:
    """Enable ``TRITON_PRINT_AUTOTUNING`` unless the user opted out."""
    if not HAS_TRITON:
        return
    from triton import knobs  # type: ignore[import-untyped]

    user_val = os.environ.get("TRITON_PRINT_AUTOTUNING")
    if user_val == "0":
        logger.debug(
            "TRITON_PRINT_AUTOTUNING=0 set by user — "
            "autotuning messages will stay suppressed."
        )
        return

    knobs.autotuning.print = True
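
The opt-out rule in this function depends only on the value of the environment variable. A minimal sketch of that decision as a pure function follows; `should_enable_autotuning_print` is a hypothetical helper written for illustration, not part of vLLM or Triton, and it assumes the same comparison against the string `"0"` that the source above uses.

```python
from typing import Mapping, Optional


def should_enable_autotuning_print(env: Mapping[str, str]) -> bool:
    """Mirror the opt-out check: only an explicit "0" keeps printing off."""
    user_val: Optional[str] = env.get("TRITON_PRINT_AUTOTUNING")
    return user_val != "0"


# Unset variable: the monitor would turn autotuning printing on.
unset_result = should_enable_autotuning_print({})
# Explicit opt-out: printing stays suppressed.
opt_out_result = should_enable_autotuning_print({"TRITON_PRINT_AUTOTUNING": "0"})
# Any other value (for example "1") does not count as an opt-out.
opt_in_result = should_enable_autotuning_print({"TRITON_PRINT_AUTOTUNING": "1"})
```

Passing the environment in as a mapping keeps the sketch testable without touching `os.environ`.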

`_setup_triton_jit_hook()`

Register a jit_post_compile_hook that warns on compilation.

Source code in vllm/triton_utils/jit_monitor.py

def _setup_triton_jit_hook() -> None:
    """Register a ``jit_post_compile_hook`` that warns on compilation."""
    if not HAS_TRITON:
        return
    from triton import knobs  # type: ignore[import-untyped]

    existing_hook = knobs.runtime.jit_post_compile_hook

    def _on_jit_compile(**kwargs):
        # `jit_post_compile_hook` is Triton internal API and its
        # signature has changed across releases (kwargs added/renamed).
        # Accept **kwargs so an upstream change cannot crash this hook
        # with TypeError, and forward the full kwarg set to any
        # pre-existing hook unchanged.
        fn = kwargs.get("fn")
        fn_name = getattr(fn, "name", "<unknown>")
        _log_jit_compile(fn_name, kwargs)
        if existing_hook is not None:
            return existing_hook(**kwargs)
        return None

    knobs.runtime.jit_post_compile_hook = _on_jit_compile
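
The hook chaining used here can be tried without Triton installed. The sketch below is an illustration under stated assumptions: `fake_runtime` is a `SimpleNamespace` standing in for `knobs.runtime`, and `install_hook`, `calls`, and `some_future_kwarg` are hypothetical names. It shows that a previously installed hook still runs, that unknown keyword arguments are forwarded unchanged, and that a missing `fn` falls back to `"<unknown>"`.

```python
from types import SimpleNamespace

# Stand-in for `triton.knobs.runtime`; the real object is not used here.
fake_runtime = SimpleNamespace(jit_post_compile_hook=None)
calls = []


def install_hook(runtime, label):
    """Wrap whatever hook is already installed and call it afterwards."""
    existing_hook = runtime.jit_post_compile_hook

    def _hook(**kwargs):
        # Accept arbitrary kwargs so a changed signature cannot raise TypeError.
        fn = kwargs.get("fn")
        calls.append((label, getattr(fn, "name", "<unknown>")))
        if existing_hook is not None:
            return existing_hook(**kwargs)
        return None

    runtime.jit_post_compile_hook = _hook


install_hook(fake_runtime, "first")
install_hook(fake_runtime, "second")

# Simulate a compile event; the extra kwarg is accepted and forwarded.
fake_kernel = SimpleNamespace(name="my_kernel")
fake_runtime.jit_post_compile_hook(fn=fake_kernel, some_future_kwarg=123)
# A call without `fn` records the "<unknown>" fallback name.
fake_runtime.jit_post_compile_hook()
```

The most recently installed hook runs first and then delegates to the one it replaced, which is the same order the source above produces.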

`activate(*, verbose=False)`

Enable JIT compilation monitoring after warmup.

Call once per worker process at the end of `compile_or_warm_up_model`. After activation, every Triton kernel compilation or autotuning benchmark that happens during inference will be logged as a warning.

Safe to call multiple times; subsequent calls are no-ops, so a `verbose` value passed on a later call is ignored.

If the user has explicitly set TRITON_PRINT_AUTOTUNING=0 in their environment, autotuning printing is left disabled; the JIT compilation hook is still registered regardless.

Source code in vllm/triton_utils/jit_monitor.py

def activate(*, verbose: bool = False) -> None:
    """Enable JIT compilation monitoring after warmup.

    Call once per worker process at the end of
    :func:`compile_or_warm_up_model`.  After activation every Triton
    kernel compilation or autotuning benchmark that happens during
    inference will be logged as a warning.

    Safe to call multiple times — subsequent calls are no-ops.

    If the user has explicitly set ``TRITON_PRINT_AUTOTUNING=0`` in
    their environment, autotuning printing is left disabled; the JIT
    compilation hook is still registered regardless.
    """
    global _active, _verbose
    if _active:
        return
    _active = True
    _verbose = verbose

    _setup_triton_autotuning_print()
    _setup_triton_jit_hook()

    logger.info(
        "Kernel JIT monitor activated — Triton JIT compilations "
        "during inference will be logged as warnings."
    )
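
The once-only behavior of `activate` can be sketched without importing vLLM. In the example below, `MonitorState` is a hypothetical stand-in for the module-level `_active` and `_verbose` flags, and `setup_calls` is a counter added only to make the effect observable; neither exists in the module.

```python
class MonitorState:
    """Hypothetical stand-in for the module-level `_active` / `_verbose` flags."""

    def __init__(self) -> None:
        self.active = False
        self.verbose = False
        self.setup_calls = 0

    def activate(self, *, verbose: bool = False) -> None:
        if self.active:
            return  # later calls are no-ops, including their `verbose` value
        self.active = True
        self.verbose = verbose
        self.setup_calls += 1  # stands in for the two _setup_* calls

    def is_active(self) -> bool:
        return self.active


state = MonitorState()
state.activate()              # first call wins: verbose stays False
state.activate(verbose=True)  # ignored because the monitor is already active
```

Because the early return happens before `verbose` is stored, verbosity has to be chosen on the first call.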

`is_active()`

Return whether the JIT compilation monitor is currently active.

Source code in vllm/triton_utils/jit_monitor.py

def is_active() -> bool:
    """Return whether the JIT compilation monitor is currently active."""
    return _active
