vllm.distributed.kv_transfer.kv_connector.v1.ssm_conv_transfer_utils ¶
Mamba conv-state sub-projection decomposition for the 3-read transfer.
With the DS conv state layout (dim, state_len), the x, B, and C sub-projections are contiguous in memory. Each D (decode) rank reads its x, B, and C slices via three separate RDMA transfers, so no permutation is needed on the P (prefill) side.
MambaConvSplitInfo dataclass ¶
Per-rank byte sizes of x, B, C sub-projections in the Mamba conv state.
Used by both P and D sides for NIXL descriptor registration. All fields are LOCAL to this engine's TP (already divided by TP size).
DS memory layout within one page (contiguous in memory):
|--- x (x_local * conv_rows) ---|- B (b_local * conv_rows) -|- C -|
Source code in vllm/distributed/kv_transfer/kv_connector/v1/ssm_conv_transfer_utils.py
local_conv_offsets property ¶
(byte_offset, byte_size) of x, B, C within this engine's page.
Used by both P and D for local descriptor registration.
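The layout above implies a simple running-offset computation. The following is a minimal sketch, not the library's implementation: `ConvSplitSketch` is a hypothetical stand-in for MambaConvSplitInfo that reuses the field names mentioned in this page, and it assumes C has the same per-rank size as B (consistent with the conv dimension being intermediate_size + 2 * groups_ss).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConvSplitSketch:
    """Hypothetical stand-in for MambaConvSplitInfo (illustration only)."""

    x_local: int          # x channels held by this rank
    b_local: int          # B channels held by this rank (C assumed equal)
    conv_rows: int        # state_len of the conv state
    conv_dtype_size: int  # bytes per element

    @property
    def local_conv_offsets(self) -> list[tuple[int, int]]:
        # In DS layout each channel owns conv_rows contiguous elements.
        row_bytes = self.conv_rows * self.conv_dtype_size
        x_bytes = self.x_local * row_bytes
        b_bytes = self.b_local * row_bytes
        # x, then B, then C, laid out back to back within one page.
        return [
            (0, x_bytes),
            (x_bytes, b_bytes),
            (x_bytes + b_bytes, b_bytes),
        ]


info = ConvSplitSketch(x_local=1024, b_local=128, conv_rows=3, conv_dtype_size=2)
offsets = info.local_conv_offsets
print(offsets)
```

Each (byte_offset, byte_size) pair would correspond to one of the three descriptors registered per page.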
remote_conv_offsets ¶
(byte_offset, byte_size) of this D rank's x, B, C slice within one P page.
Used by D side only, during remote descriptor registration.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| local_rank_offset | int | Which slice this D rank reads. tp_ratio > 0: tp_rank % tp_ratio (selects a slice of P's page). tp_ratio < 0: always 0 (read P's full page). | required |
| tp_ratio | int | Effective ratio (>= 1 when D_TP > P_TP; 1 when P_TP > D_TP, since each P rank is read in full). | required |
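To make the slicing concrete, here is a hedged sketch of how a D rank's offsets within a P page could be computed. It is not the library's code. The function name and its byte-size arguments are hypothetical, it covers only the case where D_TP is a multiple of P_TP, and it assumes that each D rank's channels form a contiguous sub-range of the P rank's x, B, and C regions.

```python
def remote_conv_offsets_sketch(
    x_bytes: int, b_bytes: int, local_rank_offset: int, tp_ratio: int
) -> list[tuple[int, int]]:
    """Offsets of one D rank's x, B, C slices inside a single P page.

    x_bytes / b_bytes are this D rank's per-page byte sizes for x and B
    (C is assumed to match B). Hypothetical helper for illustration.
    """
    # The P page holds tp_ratio D-sized slices of each sub-projection.
    p_x_bytes = x_bytes * tp_ratio
    p_b_bytes = b_bytes * tp_ratio
    return [
        (local_rank_offset * x_bytes, x_bytes),
        (p_x_bytes + local_rank_offset * b_bytes, b_bytes),
        (p_x_bytes + p_b_bytes + local_rank_offset * b_bytes, b_bytes),
    ]


# D has twice the TP size of P; this D rank reads the second half of each region.
remote = remote_conv_offsets_sketch(
    x_bytes=3072, b_bytes=384, local_rank_offset=1, tp_ratio=2
)
print(remote)
```

With tp_ratio = 1 and local_rank_offset = 0 the result reduces to the local layout, which matches the "read in full" case described in the table.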
compute_physical_blocks_per_logical ¶
Derive _physical_blocks_per_logical_kv_block from remote metadata.
The remote engine's ratio is not sent directly in the handshake, so it is reconstructed as the total Mamba state size per logical block divided by block_len.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| ssm_sizes | tuple[int, ...] | (conv_state_bytes, ssm_state_bytes) from NixlAgentMetadata. | required |
| block_len | int | The engine's block_len in bytes (from block_lens[0]). | required |
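The reconstruction described above amounts to a single division. The sketch below is illustrative only: the function name is hypothetical, and rounding up when the sizes do not divide evenly is an assumption, since this page does not state the rounding rule.

```python
def physical_blocks_per_logical_sketch(
    ssm_sizes: tuple[int, ...], block_len: int
) -> int:
    """Hypothetical reconstruction of the physical-per-logical block ratio."""
    if block_len <= 0:
        raise ValueError("block_len must be positive")
    total_state_bytes = sum(ssm_sizes)  # conv_state_bytes + ssm_state_bytes
    # Ceiling division (assumption: a partial block still occupies a block).
    return -(-total_state_bytes // block_len)


ratio = physical_blocks_per_logical_sketch((8192, 57344), block_len=4096)
print(ratio)
```

The example sizes are made up and chosen to divide evenly, so the result does not depend on the rounding assumption.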
derive_mamba_conv_split ¶
derive_mamba_conv_split(
mamba_spec: MambaSpec, local_tp: int
) -> MambaConvSplitInfo
Derive per-rank x/B/C byte sizes from a MambaSpec.
Called once at init on both P and D. Decomposes the conv dimension (= intermediate_size + 2 * groups_ss) into its x, B, C parts.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| mamba_spec | MambaSpec | MambaSpec whose shapes are: shapes[0] = conv state: (conv_dim_local, conv_rows) in DS layout. shapes[1] = SSM temporal: (local_num_heads, head_dim). | required |
| local_tp | int | This engine's tensor-parallel size. | required |
Returns:
| Type | Description |
|---|---|
| MambaConvSplitInfo | MambaConvSplitInfo with per-rank x_local, b_local, conv_rows, and conv_dtype_size. |
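One way the decomposition could work from the two shapes listed above is sketched below. This is an assumption-laden illustration, not the actual function: `ConvSplit` and `derive_conv_split_sketch` are hypothetical names, plain tuples stand in for MambaSpec, the x width is assumed to equal local_num_heads * head_dim, and B and C are assumed to split the remainder evenly. Because the shapes are already per-rank, the sketch does not use local_tp.

```python
from typing import NamedTuple


class ConvSplit(NamedTuple):
    """Hypothetical result type mirroring the documented field names."""

    x_local: int
    b_local: int
    conv_rows: int
    conv_dtype_size: int


def derive_conv_split_sketch(
    conv_shape: tuple[int, int],
    ssm_shape: tuple[int, int],
    conv_dtype_size: int,
) -> ConvSplit:
    conv_dim_local, conv_rows = conv_shape  # DS layout: (dim, state_len)
    local_num_heads, head_dim = ssm_shape
    # Assumption: the per-rank x width equals heads * head_dim.
    x_local = local_num_heads * head_dim
    remainder = conv_dim_local - x_local
    if remainder < 0 or remainder % 2 != 0:
        raise ValueError("conv dim is not x + 2 * groups_ss for these shapes")
    # Assumption: B and C each take half of what is left.
    return ConvSplit(x_local, remainder // 2, conv_rows, conv_dtype_size)


split = derive_conv_split_sketch((1280, 3), (16, 64), conv_dtype_size=2)
print(split)
```

The validation step guards against shapes that do not fit the x + 2 * groups_ss decomposition, which would otherwise produce silently wrong byte offsets.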