Incremental Compilation Workflow

When working on vLLM's C++/CUDA kernels located in the csrc/ directory, recompiling the entire project with uv pip install -e . for every change can be time-consuming. An incremental compilation workflow using CMake allows for faster iteration by only recompiling the necessary components after an initial setup. This guide details how to set up and use such a workflow, which complements your editable Python installation.

Prerequisites

Before setting up the incremental build:

  1. vLLM Editable Install: Ensure you have vLLM installed from source in editable mode. Using pre-compiled wheels for the initial editable setup can be faster, as the CMake workflow will handle subsequent kernel recompilations.

    uv venv --python 3.12 --seed
    source .venv/bin/activate
    VLLM_USE_PRECOMPILED=1 uv pip install -U -e . --torch-backend=auto
    
  2. CUDA Toolkit: Verify that the NVIDIA CUDA Toolkit is correctly installed and nvcc is accessible in your PATH. CMake relies on nvcc to compile CUDA code. You can typically find nvcc in $CUDA_HOME/bin/nvcc or by running which nvcc. If you encounter issues, refer to the official CUDA Toolkit installation guides and vLLM's main GPU installation documentation for troubleshooting. The CMAKE_CUDA_COMPILER variable in your CMakeUserPresets.json should also point to your nvcc binary.
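As a quick sanity check, nvcc can also be located programmatically. The snippet below is a minimal sketch (not part of vLLM's tooling) that checks $CUDA_HOME first and then falls back to the PATH, mirroring where nvcc is typically found:

```python
import os
import shutil

def find_nvcc():
    """Locate the nvcc binary, preferring $CUDA_HOME/bin/nvcc over the PATH.

    Returns the absolute path to nvcc, or None if it cannot be found.
    """
    cuda_home = os.environ.get("CUDA_HOME", "/usr/local/cuda")
    candidate = os.path.join(cuda_home, "bin", "nvcc")
    if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
        return candidate
    # Fall back to whatever nvcc is first on the PATH, if any.
    return shutil.which("nvcc")

print(find_nvcc() or "nvcc not found -- check your CUDA Toolkit installation")
```

If this prints a path, that is the value you would use for CMAKE_CUDA_COMPILER below.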

  3. Build Tools: It is highly recommended to install ccache for fast rebuilds by caching compilation results (e.g., sudo apt install ccache or conda install ccache). Also, ensure the core build dependencies like cmake and ninja are installed. These are installable through requirements/build.txt or your system's package manager.

    uv pip install -r requirements/build.txt --torch-backend=auto
    

Setting up the CMake Build Environment

The incremental build process is managed through CMake. You can configure your build settings using a CMakeUserPresets.json file at the root of the vLLM repository.

Generate CMakeUserPresets.json using the helper script

To simplify the setup, vLLM provides a helper script that attempts to auto-detect your system's configuration (like CUDA path, Python environment, and CPU cores) and generates the CMakeUserPresets.json file for you.

Run the script:

Navigate to the root of your vLLM clone and execute the following command:

python tools/generate_cmake_presets.py

The script will prompt you if it cannot automatically determine certain paths (e.g., nvcc or a specific Python executable for your vLLM development environment). Follow the on-screen prompts. If an existing CMakeUserPresets.json is found, the script will ask for confirmation before overwriting it.

After running the script, a CMakeUserPresets.json file will be created in the root of your vLLM repository.

Example CMakeUserPresets.json

Below is an example of what the generated CMakeUserPresets.json might look like. The script will tailor these values based on your system and any input you provide.

{
    "version": 6,
    "cmakeMinimumRequired": {
        "major": 3,
        "minor": 26,
        "patch": 1
    },
    "configurePresets": [
        {
            "name": "release",
            "generator": "Ninja",
            "binaryDir": "${sourceDir}/cmake-build-release",
            "cacheVariables": {
                "CMAKE_CUDA_COMPILER": "/usr/local/cuda/bin/nvcc",
                "CMAKE_C_COMPILER_LAUNCHER": "ccache",
                "CMAKE_CXX_COMPILER_LAUNCHER": "ccache",
                "CMAKE_CUDA_COMPILER_LAUNCHER": "ccache",
                "CMAKE_BUILD_TYPE": "Release",
                "VLLM_PYTHON_EXECUTABLE": "/home/user/venvs/vllm/bin/python",
                "CMAKE_INSTALL_PREFIX": "${sourceDir}",
                "CMAKE_CUDA_FLAGS": "",
                "NVCC_THREADS": "4",
                "CMAKE_JOB_POOLS": "compile=32"
            }
        }
    ],
    "buildPresets": [
        {
            "name": "release",
            "configurePreset": "release",
            "jobs": 32
        }
    ]
}

What do the various configurations mean?

  • CMAKE_CUDA_COMPILER: Path to your nvcc binary. The script attempts to find this automatically.
  • CMAKE_C_COMPILER_LAUNCHER, CMAKE_CXX_COMPILER_LAUNCHER, CMAKE_CUDA_COMPILER_LAUNCHER: Setting these to ccache (or sccache) significantly speeds up rebuilds by caching compilation results. Ensure ccache is installed (e.g., sudo apt install ccache or conda install ccache). The script sets these by default.
  • VLLM_PYTHON_EXECUTABLE: Path to the Python executable in your vLLM development environment. The script will prompt for this, defaulting to the current Python environment if suitable.
  • CMAKE_INSTALL_PREFIX: Set to "${sourceDir}" so that compiled components are installed back into your vLLM source directory. This is crucial for the editable install, as it makes the newly built kernels immediately available to your Python environment.
  • CMAKE_JOB_POOLS and jobs in build presets: Control the parallelism of the build. The script sets these based on the number of CPU cores detected on your system.
  • binaryDir: Specifies where the build artifacts will be stored (e.g., cmake-build-release).
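How these settings fit together can be illustrated with a short sketch of assembling such a preset programmatically. This is a simplified illustration only, not the actual tools/generate_cmake_presets.py implementation; the nvcc and Python paths passed in are placeholders:

```python
import json
import os
import shutil

def build_preset(nvcc_path, python_path, jobs=None):
    """Assemble a CMakeUserPresets.json-style structure for a 'release' preset."""
    jobs = jobs or os.cpu_count() or 4
    return {
        "version": 6,
        "cmakeMinimumRequired": {"major": 3, "minor": 26, "patch": 1},
        "configurePresets": [{
            "name": "release",
            "generator": "Ninja",
            "binaryDir": "${sourceDir}/cmake-build-release",
            "cacheVariables": {
                "CMAKE_CUDA_COMPILER": nvcc_path,
                # Only enable compiler launchers if ccache is actually installed.
                **({"CMAKE_C_COMPILER_LAUNCHER": "ccache",
                    "CMAKE_CXX_COMPILER_LAUNCHER": "ccache",
                    "CMAKE_CUDA_COMPILER_LAUNCHER": "ccache"}
                   if shutil.which("ccache") else {}),
                "CMAKE_BUILD_TYPE": "Release",
                "VLLM_PYTHON_EXECUTABLE": python_path,
                # Install back into the source tree so the editable install
                # picks up freshly built kernels.
                "CMAKE_INSTALL_PREFIX": "${sourceDir}",
                "CMAKE_CUDA_FLAGS": "",
                "NVCC_THREADS": "4",
                "CMAKE_JOB_POOLS": f"compile={jobs}",
            },
        }],
        "buildPresets": [
            {"name": "release", "configurePreset": "release", "jobs": jobs},
        ],
    }

preset = build_preset("/usr/local/cuda/bin/nvcc", "/home/user/venvs/vllm/bin/python")
print(json.dumps(preset, indent=4))
```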

Building and Installing with CMake

Once your CMakeUserPresets.json is configured:

  1. Initialize the CMake build environment: This step configures the build system according to your chosen preset (e.g., release) and creates the build directory at binaryDir.

    cmake --preset release
    
  2. Build and install the vLLM components: This command compiles the code and installs the resulting binaries into your vLLM source directory, making them available to your editable Python installation.

    cmake --build --preset release --target install
    
  3. Make changes and repeat! Now you can start using your editable install of vLLM, testing and making changes as needed. After editing kernel code, simply re-run the build command to recompile only the affected files.

    cmake --build --preset release --target install
    

Verifying the Build

After a successful build, you will find a populated build directory (e.g., cmake-build-release/ if you used the release preset and the example configuration).

> ls cmake-build-release/
bin             cmake_install.cmake      _deps                                machete_generation.log
build.ninja     CPackConfig.cmake        detect_cuda_compute_capabilities.cu  marlin_generation.log
_C.abi3.so      CPackSourceConfig.cmake  detect_cuda_version.cc               _moe_C.abi3.so
CMakeCache.txt  ctest                    _flashmla_C.abi3.so                  moe_marlin_generation.log
CMakeFiles      cumem_allocator.abi3.so  install_local_manifest.txt           vllm-flash-attn

The cmake --build ... --target install command copies the compiled shared libraries (like _C.abi3.so, _moe_C.abi3.so, etc.) into the appropriate vllm package directory within your source tree. This updates your editable installation with the newly compiled kernels.
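To confirm that the freshly built kernels are picked up by your editable install, a quick import check from your activated environment can help. The module names below correspond to the shared libraries listed above; treat this as an illustrative check rather than an official verification step:

```python
import importlib

# Compiled extension modules that should be importable after a successful
# `cmake --build --preset release --target install`.
KERNEL_MODULES = ["vllm._C", "vllm._moe_C"]

def check_kernels(modules=KERNEL_MODULES):
    """Try importing each compiled extension; map module -> error message or None."""
    results = {}
    for name in modules:
        try:
            importlib.import_module(name)
            results[name] = None
        except ImportError as exc:
            results[name] = str(exc)
    return results

for module, error in check_kernels().items():
    print(f"{module}: {'OK' if error is None else error}")
```

An ImportError here usually means the install step did not run, or your shell is using a different virtual environment than the one the kernels were installed into.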

Additional Tips

  • Adjust Parallelism: Fine-tune the CMAKE_JOB_POOLS in configurePresets and jobs in buildPresets in your CMakeUserPresets.json. Too many jobs can overload systems with limited RAM or CPU cores, leading to slower builds or system instability. Too few won't fully utilize available resources.
  • Clean Builds When Necessary: If you encounter persistent or strange build errors, especially after significant changes or switching branches, consider removing the CMake build directory (e.g., rm -rf cmake-build-release) and re-running the cmake --preset and cmake --build commands.
  • Specific Target Builds: For even faster iteration when working on a specific module, you can sometimes build a single target instead of the full install target (for example, a target corresponding to one of the shared libraries, such as _C), though install ensures all necessary components are updated in your Python environment. Refer to the CMake documentation for more advanced target management.