vllm.models.deepseek_v4.common.ops.save_partial_states
save_partial_states
save_partial_states(
    kv: Tensor,
    score: Tensor,
    ape: Tensor,
    positions: Tensor,
    state_cache: Tensor,
    slot_mapping: Tensor,
    block_size: int,
    state_width: int,
    compress_ratio: int,
    pdl_kwargs: dict | None = None,
) -> None
Write packed [kv, score+ape] partial states into the compressor cache.
One program per token; padding tokens (slot_id == -1) are skipped.
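
The description above can be illustrated with a pure-Python stand-in. This is a minimal sketch of the documented behavior, not the actual kernel. Everything about the layout is an assumption: the helper name `save_partial_states_reference` is hypothetical, each `kv` and `score` row is assumed to hold `state_width` values, the cache is assumed to be indexed as `[block][offset]` with `slot_id = block * block_size + offset`, and the `ape` row is assumed to be selected by `position % compress_ratio`. Nested lists stand in for tensors, and `pdl_kwargs` is omitted.

```python
def save_partial_states_reference(kv, score, ape, positions, state_cache,
                                  slot_mapping, block_size, state_width,
                                  compress_ratio):
    """Hypothetical reference loop; layout and indexing are assumptions."""
    for token, slot_id in enumerate(slot_mapping):
        if slot_id == -1:
            continue  # padding token: nothing is written
        # ASSUMPTION: a flat slot id splits into (block, offset in block).
        block, offset = divmod(slot_id, block_size)
        # ASSUMPTION: ape is indexed by the position within a compression window.
        phase = positions[token] % compress_ratio
        biased = [s + a for s, a in zip(score[token], ape[phase])]
        # Packed row: kv first, then score + ape.
        packed = list(kv[token]) + biased
        assert len(packed) == 2 * state_width  # ASSUMPTION about row width
        state_cache[block][offset] = packed


# Tiny example: three tokens, the middle one is padding.
kv = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
score = [[0.5, 0.5], [1.0, 1.0], [2.0, 2.0]]
ape = [[0.0, 0.0], [10.0, 20.0]]          # one row per phase
positions = [1, 1, 2]
slot_mapping = [1, -1, 2]
state_cache = [[[0.0] * 4 for _ in range(2)] for _ in range(2)]

save_partial_states_reference(kv, score, ape, positions, state_cache,
                              slot_mapping, block_size=2, state_width=2,
                              compress_ratio=2)
```

In this sketch the loop body corresponds to one program in the real op, so each iteration touches only its own token and its own cache slot.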