vllm.config.profiler ¶
ProfilerConfig ¶
Dataclass which contains profiler config for the engine.
Source code in vllm/config/profiler.py
delay_iterations class-attribute instance-attribute ¶
delay_iterations: int = Field(default=0, ge=0)
Number of engine iterations to skip before starting profiling. Defaults to 0, meaning profiling starts immediately after receiving /start_profile.
ignore_frontend class-attribute instance-attribute ¶
ignore_frontend: bool = False
If True, disables AsyncLLM's front-end profiling when using the 'torch' profiler. This reduces overhead when using the delay/limit options, since front-end profiling does not track iterations and would otherwise capture the entire range.
max_iterations class-attribute instance-attribute ¶
max_iterations: int = Field(default=0, ge=0)
Maximum number of engine iterations to profile after starting profiling. Defaults to 0, meaning no limit.
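Together, delay_iterations and max_iterations define a profiling window over engine iterations. A minimal sketch of that windowing logic, using a plain stand-in dataclass (the real class in vllm/config/profiler.py uses pydantic validation, and should_profile is a hypothetical helper for illustration):

```python
from dataclasses import dataclass


@dataclass
class ProfilerWindow:
    delay_iterations: int = 0  # iterations to skip after /start_profile
    max_iterations: int = 0    # 0 means "no limit"

    def should_profile(self, iteration: int) -> bool:
        """True if the 0-based engine iteration falls inside the window."""
        if iteration < self.delay_iterations:
            return False
        if self.max_iterations == 0:
            return True
        return iteration < self.delay_iterations + self.max_iterations


window = ProfilerWindow(delay_iterations=2, max_iterations=3)
profiled = [i for i in range(8) if window.should_profile(i)]
# iterations 2, 3 and 4 fall inside the window
```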
profiler class-attribute instance-attribute ¶
profiler: ProfilerKind | None = None
Which profiler to use. Defaults to None. Options are:

- 'torch': Use PyTorch profiler.
- 'cuda': Use CUDA profiler.
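A minimal sketch of validating the documented options (the helper name and error message are hypothetical; only 'torch', 'cuda', and None come from the docs above):

```python
VALID_PROFILERS = {"torch", "cuda"}


def resolve_profiler(kind):
    """Return the requested profiler kind, or None when profiling is off."""
    if kind is None:
        return None
    if kind not in VALID_PROFILERS:
        raise ValueError(
            f"invalid profiler {kind!r}; expected one of {sorted(VALID_PROFILERS)}")
    return kind
```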
torch_profiler_dir class-attribute instance-attribute ¶
torch_profiler_dir: str = ''
Directory to save torch profiler traces. Both AsyncLLM's CPU traces and worker's traces (CPU & GPU) will be saved under this directory. Note that it must be an absolute path.
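The absolute-path requirement could be checked as below; this is a hypothetical helper for illustration, while the real validation lives in _validate_profiler_config:

```python
import os


def check_trace_dir(path: str) -> str:
    """Reject relative paths, mirroring the documented constraint."""
    if path and not os.path.isabs(path):
        raise ValueError(
            f"torch_profiler_dir must be an absolute path, got {path!r}")
    return path
```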
torch_profiler_dump_cuda_time_total class-attribute instance-attribute ¶
torch_profiler_dump_cuda_time_total: bool = True
If True, dumps total CUDA time in torch profiler traces. Enabled by default.
torch_profiler_record_shapes class-attribute instance-attribute ¶
torch_profiler_record_shapes: bool = False
If True, records tensor shapes in the torch profiler. Disabled by default.
torch_profiler_use_gzip class-attribute instance-attribute ¶
torch_profiler_use_gzip: bool = True
If True, saves torch profiler traces in gzip format. Enabled by default.
torch_profiler_with_flops class-attribute instance-attribute ¶
torch_profiler_with_flops: bool = False
If True, enables FLOPS counting in the torch profiler. Disabled by default.
torch_profiler_with_memory class-attribute instance-attribute ¶
torch_profiler_with_memory: bool = False
If True, enables memory profiling in the torch profiler. Disabled by default.
torch_profiler_with_stack class-attribute instance-attribute ¶
torch_profiler_with_stack: bool = True
If True, enables stack tracing in the torch profiler. Enabled by default.
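The torch_profiler_* flags above correspond to keyword arguments of torch.profiler.profile (record_shapes, profile_memory, with_stack, and with_flops are real torch.profiler parameters). A sketch of how such a mapping might look, with a hypothetical helper name and a plain dict standing in for the config:

```python
def torch_profiler_kwargs(cfg: dict) -> dict:
    """Map torch_profiler_* config flags onto torch.profiler.profile kwargs."""
    return {
        "record_shapes": cfg.get("torch_profiler_record_shapes", False),
        "profile_memory": cfg.get("torch_profiler_with_memory", False),
        "with_stack": cfg.get("torch_profiler_with_stack", True),
        "with_flops": cfg.get("torch_profiler_with_flops", False),
    }


kwargs = torch_profiler_kwargs({"torch_profiler_with_memory": True})
```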
_validate_profiler_config ¶
_validate_profiler_config() -> Self
Source code in vllm/config/profiler.py
compute_hash ¶
compute_hash() -> str
WARNING: Whenever a new field is added to this config, ensure that it is included in the factors list if it affects the computation graph.
Provide a hash that uniquely identifies all the configs that affect the structure of the computation graph from input ids/embeddings to the final hidden states, excluding anything before input ids/embeddings and after the final hidden states.
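The pattern described above can be sketched generically: collect only the fields ("factors") that influence the computation graph and hash their string form. This is an illustrative sketch, not the actual implementation, and the real factor list for ProfilerConfig is not shown here:

```python
import hashlib


def compute_hash(factors: list) -> str:
    """Hash a list of graph-affecting config factors into a stable digest."""
    return hashlib.md5(str(factors).encode()).hexdigest()
```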
Source code in vllm/config/profiler.py
_is_uri_path ¶
Check if path is a URI (scheme://...), excluding Windows drive letters.
Supports custom URI schemes like gs://, s3://, hdfs://, etc. These paths should not be converted to absolute paths.
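A minimal sketch of such a check, assuming scheme-length is how drive letters are excluded (a real scheme like gs, s3, or hdfs is longer than one character, while a Windows drive letter such as C:\traces would parse as the single-letter scheme "c"); the actual implementation in vllm/config/profiler.py may differ:

```python
from urllib.parse import urlsplit


def is_uri_path(path: str) -> bool:
    """True for scheme://... URIs, False for plain and Windows drive paths."""
    return len(urlsplit(path).scheme) > 1
```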