vllm bench sweep serve_workload

JSON CLI Arguments

When passing JSON CLI arguments, the following sets of arguments are equivalent:

--json-arg '{"key1": "value1", "key2": {"key3": "value2"}}'
--json-arg.key1 value1 --json-arg.key2.key3 value2

Additionally, list elements can be passed individually using +:

--json-arg '{"key4": ["value3", "value4", "value5"]}'
--json-arg.key4+ value3 --json-arg.key4+='value4,value5'
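The equivalence above can be illustrated with a small sketch. This is not vLLM's own parser: `expand_dotted_args` is a hypothetical helper that takes the keys as they appear after the `--json-arg.` prefix and folds them into one nested structure, assuming that a trailing `+` appends to a list and that a comma-separated value contributes several items.

```python
import json


def expand_dotted_args(pairs):
    """Fold (dotted_key, value) pairs into one nested dict (illustrative only)."""
    result = {}
    for dotted_key, value in pairs:
        is_list_item = dotted_key.endswith("+")
        *parents, leaf = dotted_key.rstrip("+").split(".")
        node = result
        for part in parents:
            # Create intermediate objects on demand: key2.key3 -> {"key2": {...}}
            node = node.setdefault(part, {})
        if is_list_item:
            # Assumption: "+" appends, and commas separate several list items
            node.setdefault(leaf, []).extend(value.split(","))
        else:
            node[leaf] = value
    return result


nested = expand_dotted_args([("key1", "value1"), ("key2.key3", "value2")])
listed = expand_dotted_args([("key4+", "value3"), ("key4+", "value4,value5")])
print(json.dumps(nested))
print(json.dumps(listed))
```

Both calls reproduce the JSON objects from the single-argument forms shown above.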

Arguments

`--serve-cmd`

The command used to run the server: vllm serve ...

`--bench-cmd`

The command used to run the benchmark: vllm bench serve ...

`--after-bench-cmd`

After a benchmark run is complete, invoke this command instead of the default ServerWrapper.clear_cache().

`--show-stdout`

If set, logs the standard output of subcommands. Useful for debugging, but the output can be very verbose.

Default: False

`--server-ready-timeout`

Timeout in seconds to wait for the server to become ready.

Default: 300

`--serve-params`

Path to a JSON file containing parameter combinations for the vllm serve command. The file can contain either a list of dicts or a dict whose keys are benchmark names. If both --serve-params and --bench-params are given, this script iterates over their Cartesian product.

`--link-vars`

Comma-separated list of linked variables between serve and bench, e.g. max_num_seqs=max_concurrency,max_model_len=random_input_len

Default: ""
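As a rough illustration of the `--link-vars` format, the sketch below splits such a string into pairs. `parse_link_vars` is a hypothetical helper, not part of vLLM. It assumes, based only on the example above, that each item reads `serve_variable=bench_variable`. How the script then uses the pairs is not covered here.

```python
def parse_link_vars(spec):
    """Split 'a=b,c=d' into [('a', 'b'), ('c', 'd')]; an empty string gives []."""
    pairs = []
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate the empty default and stray commas
        serve_var, sep, bench_var = item.partition("=")
        if not sep or not serve_var or not bench_var:
            raise ValueError(f"expected serve_var=bench_var, got {item!r}")
        pairs.append((serve_var, bench_var))
    return pairs


links = parse_link_vars("max_num_seqs=max_concurrency,max_model_len=random_input_len")
print(links)
```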

`--bench-params`

Path to a JSON file containing parameter combinations for the vllm bench serve command. The file can contain either a list of dicts or a dict whose keys are benchmark names. If both --serve-params and --bench-params are given, this script iterates over their Cartesian product.
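To make the Cartesian product concrete, here is a minimal sketch using the list-of-dicts form. The parameter names are borrowed from the `--link-vars` example. The values are made up for illustration, and the way the real script loads and names combinations may differ.

```python
import itertools
import json

# Hypothetical file contents in the list-of-dicts form
serve_params = json.loads('[{"max_num_seqs": 32}, {"max_num_seqs": 64}]')
bench_params = json.loads('[{"random_input_len": 128}, {"random_input_len": 512}]')

# Every serve combination is paired with every bench combination
combinations = list(itertools.product(serve_params, bench_params))

for serve_comb, bench_comb in combinations:
    print(serve_comb, bench_comb)
```

Two serve combinations and two bench combinations yield four pairs, so the number of benchmark configurations grows multiplicatively with the size of each file.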

`-o`, `--output-dir`

The main directory to which results are written.

Default: results

`-e`, `--experiment-name`

The name of this experiment (defaults to current timestamp). Results will be stored under output_dir/experiment_name.

`--num-runs`

Number of runs per parameter combination.

Default: 3

`--dry-run`

If set, prints the commands to run, then exits without executing them.

Default: False
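One way to preview a sweep is to assemble the command line with `--dry-run` before running it for real. The sketch below only builds and prints the command string. `MODEL` and the two JSON file names are placeholders chosen for illustration, and any extra options for the serve and bench commands are left for the reader to fill in.

```python
import shlex

argv = [
    "vllm", "bench", "sweep", "serve_workload",
    "--serve-cmd", "vllm serve MODEL",       # MODEL is a placeholder
    "--bench-cmd", "vllm bench serve",       # append your own benchmark options
    "--serve-params", "serve_params.json",   # hypothetical file name
    "--bench-params", "bench_params.json",   # hypothetical file name
    "--num-runs", "3",
    "--dry-run",
]

# shlex.join quotes the multi-word sub-commands so each stays one argument
command = shlex.join(argv)
print(command)
```

Quoting matters here because `--serve-cmd` and `--bench-cmd` each take a whole command as a single argument.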

`--resume`

Resume a previous execution of this script: only run the parameter combinations that do not yet have output files under output_dir/experiment_name.

Default: False

workload options

`--workload-var`

Possible choices: request_rate, max_concurrency

The variable to adjust in each iteration.

Default: request_rate

`--workload-iters`

Number of workload levels to explore. This count includes the first two iterations, which are used to interpolate the value of workload_var for the remaining iterations.

Default: 10
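The description above does not say how the interpolation is done. As a sketch under stated assumptions, suppose the first two iterations yield a lower and an upper bound for workload_var and the remaining levels are spaced linearly between them. `interpolate_levels` and the bounds 1.0 and 10.0 are hypothetical, and the script's actual scheme may differ.

```python
def interpolate_levels(low, high, iters):
    """Return `iters` evenly spaced levels from low to high, endpoints included."""
    if iters < 2:
        raise ValueError("need at least two iterations to define the endpoints")
    step = (high - low) / (iters - 1)
    return [low + i * step for i in range(iters)]


# Assumed bounds from the first two iterations; 10 matches the default above
levels = interpolate_levels(1.0, 10.0, 10)
print(levels)
```

Under this assumption the two endpoint runs count toward the total, so the default of 10 leaves eight interpolated levels.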
