vllm.v1.kv_offload.tiering.fs.manager ¶
FileSystemTierManager: Pure-Python file system secondary tier for KV cache offloading.
Store path
Data is written to a temp file (
Load path
Data is read from the block file directly via os.readv into the provided memoryview slice.
_r//_g/.bin
(hash-based subdirectories to limit directory fan-out)
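The load path described above can be illustrated with a minimal sketch. It assumes a POSIX platform (where `os.readv` is available) and a writable, byte-oriented memoryview; the helper name `read_block_into` and the demo file name are hypothetical and are not taken from the vLLM source.

```python
import os
import tempfile


def read_block_into(path: str, dest: memoryview) -> int:
    """Read a block file straight into `dest`; return the number of bytes read.

    Hypothetical helper: `dest` is assumed to be a writable memoryview of bytes,
    for example a slice of a larger CPU buffer.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        total = 0
        # os.readv may return fewer bytes than requested, so loop until the
        # slice is full or the file ends.
        while total < len(dest):
            n = os.readv(fd, [dest[total:]])
            if n == 0:
                break  # end of file
            total += n
        return total
    finally:
        os.close(fd)


# Demo: a 64-byte "primary" buffer; load one 16-byte block into bytes 16..32.
primary = bytearray(64)
view = memoryview(primary)
with tempfile.TemporaryDirectory() as tmp_dir:
    block_path = os.path.join(tmp_dir, "block.bin")  # hypothetical file name
    with open(block_path, "wb") as f:
        f.write(bytes(range(16)))
    n_read = read_block_into(block_path, view[16:32])
```

Because the destination is a slice of the existing buffer, the bytes land in place without an intermediate `bytes` object.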
FileSystemTierManager ¶
Bases: SecondaryTierManager
Pure-Python disk-backed secondary tier.
Read-priority threads service load jobs preferentially; write-priority threads service store jobs preferentially. Both groups can drain either queue, so neither starves.
submit_store / submit_load are non-blocking: they enqueue tasks and return. get_finished() polls job completion and returns completed JobResults.
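The scheduling behaviour described above can be sketched as a toy model. This is not the vLLM implementation: the class name `TwoQueueWorkers`, the `(job_id, fn)` task shape, and the `(job_id, ok, value)` result tuples (standing in for JobResult) are all assumptions made for illustration.

```python
import threading
import time
from collections import deque


class TwoQueueWorkers:
    """Toy pool: each thread prefers one queue but can drain the other."""

    def __init__(self, n_read_threads: int = 2, n_write_threads: int = 2):
        self._cv = threading.Condition()
        self._loads = deque()
        self._stores = deque()
        self._finished = deque()
        self._stopping = False
        self._threads = []
        for _ in range(n_read_threads):   # read-priority: loads first
            self._spawn(self._loads, self._stores)
        for _ in range(n_write_threads):  # write-priority: stores first
            self._spawn(self._stores, self._loads)

    def _spawn(self, preferred, fallback):
        t = threading.Thread(target=self._run, args=(preferred, fallback), daemon=True)
        t.start()
        self._threads.append(t)

    def _run(self, preferred, fallback):
        while True:
            with self._cv:
                while not (preferred or fallback or self._stopping):
                    self._cv.wait()
                if self._stopping:
                    return
                # Take from the preferred queue; fall back if it is empty.
                job_id, fn = (preferred or fallback).popleft()
            try:
                result = (job_id, True, fn())
            except Exception as exc:  # report failures instead of killing the thread
                result = (job_id, False, exc)
            with self._cv:
                self._finished.append(result)

    def _submit(self, q, job_id, fn):
        with self._cv:
            q.append((job_id, fn))
            self._cv.notify()

    def submit_load(self, job_id, fn):
        self._submit(self._loads, job_id, fn)    # enqueue and return immediately

    def submit_store(self, job_id, fn):
        self._submit(self._stores, job_id, fn)   # enqueue and return immediately

    def get_finished(self):
        with self._cv:
            done = list(self._finished)
            self._finished.clear()
        return done

    def shutdown(self):
        with self._cv:
            self._stopping = True
            self._loads.clear()
            self._stores.clear()
            self._cv.notify_all()
        for t in self._threads:
            t.join()


pool = TwoQueueWorkers()
pool.submit_store("s1", lambda: "stored")
pool.submit_load("l1", lambda: "loaded")
results = []
deadline = time.monotonic() + 5.0
while len(results) < 2 and time.monotonic() < deadline:
    results.extend(pool.get_finished())  # poll; never blocks on I/O
    time.sleep(0.001)
pool.shutdown()
```

A single condition variable guards both queues, so a waiting thread of either group wakes for any new task; the only difference between the groups is which queue each thread checks first.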
Source code in vllm/v1/kv_offload/tiering/fs/manager.py
__init__ ¶
__init__(
offloading_spec: OffloadingSpec,
primary_kv_view: memoryview,
tier_type: str,
root_dir: str,
n_read_threads: int = 16,
n_write_threads: int = 16,
)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| offloading_spec | OffloadingSpec | Contains the vllm_config, kv_cache_config and block_size_factor. | required |
| primary_kv_view | memoryview | Memoryview of the primary tier's CPU KV cache. | required |
| tier_type | str | Tier type identifier, set by SecondaryTierFactory. | required |
| root_dir | str | Root directory for block files. | required |
| n_read_threads | int | Number of read-priority I/O threads. | 16 |
| n_write_threads | int | Number of write-priority I/O threads. | 16 |
get_finished ¶
Collect completed jobs from the finished-jobs queue.
shutdown ¶
Release resources held by this tier.
Shuts down the thread pool, clearing pending tasks and waiting for active threads to complete.
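The shutdown semantics described above (pending tasks are dropped, active ones are allowed to finish) can be illustrated with the standard library. This sketch assumes a `concurrent.futures.ThreadPoolExecutor` and Python 3.9 or later for `cancel_futures`; whether FileSystemTierManager uses this executor internally is not stated here, and the task names are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

started = threading.Event()
release = threading.Event()


def slow_task():
    started.set()            # signal that this task is now running
    release.wait(timeout=5)  # simulate in-flight I/O
    return "done"


executor = ThreadPoolExecutor(max_workers=1)
active = executor.submit(slow_task)
started.wait(timeout=5)      # make sure the single worker is busy
pending = executor.submit(lambda: "never runs")  # stays queued behind slow_task

# Let the active task finish shortly after shutdown begins.
threading.Timer(0.05, release.set).start()

# Cancel everything still queued, then block until running tasks complete.
executor.shutdown(wait=True, cancel_futures=True)

active_result = active.result()
pending_cancelled = pending.cancelled()
```

After `shutdown` returns, the task that was already running has completed normally, while the queued task has been cancelled without ever starting.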