vllm.entrypoints.openai.fingerprint ¶
Build the system_fingerprint string returned by the OpenAI-compatible server.
Four modes, configured via --fingerprint-mode:
- full (default): vllm-<version>[-<parallelism>]-<hash8>. Encodes the server version, any non-trivial parallelism degree (tp/pp/dp/ep), and an 8-char prefix of vllm_config.compute_hash() (which covers model identity, quant config, speculative decoding, attention backend, etc.).
- hash: vllm-<version>-<hash8>, with the parallelism component stripped.
- custom: a user-provided literal, set via --fingerprint-value.
- none: the field is omitted (serialized as null).
get_system_fingerprint is called only at serving-class initialization (a handful of times per server); each subclass caches the returned string on self.system_fingerprint, so the per-request cost is a single attribute read.
get_system_fingerprint ¶
Return the fingerprint for vllm_config using the mode configured by set_default_fingerprint_mode.
Source code in vllm/entrypoints/openai/fingerprint.py
set_default_fingerprint_mode ¶
set_default_fingerprint_mode(
mode: FingerprintMode, custom_value: str | None = None
) -> None
Configure the fingerprint mode for subsequent get_system_fingerprint calls. Called once at server startup.