vllm.core.placeholder_block_space_manager
PlaceholderBlockSpaceManager
Bases: BlockSpaceManager
A version of BlockSpaceManager for environments where block management is not required, for example pooling models or attention-free models such as Mamba.
This class exposes the same interface as BlockSpaceManager, but its methods either do nothing or return a simple fixed value (for example, True). It is intended for scenarios where the overhead of block management is unnecessary, such as serving embedding models.
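The idea of a manager that keeps the interface but skips the work can be illustrated with a minimal, self-contained sketch. It does not import vLLM: `AllocStatus`, `SequenceGroup`, `NoOpBlockSpaceManager`, and `schedule` below are hypothetical stand-ins written for this example, and the return values are assumptions chosen to show the pattern, not a copy of the actual implementation.

```python
from enum import Enum, auto


class AllocStatus(Enum):
    """Stand-in for an allocation status enum (not imported from vLLM)."""
    OK = auto()
    LATER = auto()
    NEVER = auto()


class SequenceGroup:
    """Hypothetical stand-in for a group of sequences."""

    def __init__(self, request_id: str) -> None:
        self.request_id = request_id


class NoOpBlockSpaceManager:
    """Illustrative manager: queries always succeed, actions do nothing."""

    def can_allocate(
        self, seq_group: SequenceGroup, num_lookahead_slots: int = 0
    ) -> AllocStatus:
        # No blocks are tracked, so allocation can never fail.
        return AllocStatus.OK

    def allocate(self, seq_group: SequenceGroup) -> None:
        # Nothing to allocate.
        pass

    def can_append_slots(
        self, seq_group: SequenceGroup, num_lookahead_slots: int
    ) -> bool:
        return True

    def can_swap_out(self, seq_group: SequenceGroup) -> bool:
        return True


def schedule(manager: NoOpBlockSpaceManager, seq_group: SequenceGroup) -> bool:
    """Caller logic that works unchanged with any manager of this shape."""
    if manager.can_allocate(seq_group) is not AllocStatus.OK:
        return False
    manager.allocate(seq_group)
    return manager.can_append_slots(seq_group, num_lookahead_slots=0)
```

Because the caller only depends on the shared method names, swapping in a no-op manager requires no changes to the scheduling logic in this sketch.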
Source code in vllm/core/placeholder_block_space_manager.py
__init__
access_all_blocks_in_seq
allocate
allocate(seq_group: SequenceGroup) -> None
append_slots
can_allocate
can_allocate(
seq_group: SequenceGroup, num_lookahead_slots: int = 0
) -> AllocStatus
can_append_slots
can_append_slots(
seq_group: SequenceGroup, num_lookahead_slots: int
) -> bool
can_swap_in
can_swap_in(
seq_group: SequenceGroup, num_lookahead_slots: int
) -> AllocStatus
can_swap_out
can_swap_out(seq_group: SequenceGroup) -> bool
fork
get_block_table
get_common_computed_block_ids
get_num_cached_tokens
get_prefix_cache_hit_rate
mark_blocks_as_computed
mark_blocks_as_computed(
seq_group: SequenceGroup, token_chunk_size: int
)
remove_seq_from_computed_blocks_tracker
remove_seq_from_computed_blocks_tracker(
seq: Sequence,
) -> None