vllm.core.block.common
BlockList
¶
This class is an optimization that allows fast access to physical block ids. It maintains a block id list that is updated in step with the block list, which avoids reconstructing the block id list on every iteration of the block manager.
Source code in vllm/core/block/common.py
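As an illustration of the idea described above, here is a minimal sketch of a list that keeps a parallel id list current on every mutation. The names `SketchBlock` and `TrackedBlockList` are hypothetical stand-ins and are not the vLLM classes; the real `BlockList` may differ in its methods and invariants.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SketchBlock:
    """Hypothetical stand-in for a block; only the id matters here."""
    block_id: int


class TrackedBlockList:
    """Hypothetical sketch: a block list with a parallel, always-current id list."""

    def __init__(self, blocks: List[SketchBlock]) -> None:
        self._blocks: List[SketchBlock] = []
        self._block_ids: List[int] = []
        for block in blocks:
            self.append(block)

    def append(self, block: SketchBlock) -> None:
        # Update both lists together so they never drift apart.
        self._blocks.append(block)
        self._block_ids.append(block.block_id)

    def __setitem__(self, index: int, block: SketchBlock) -> None:
        # Replacing a block also replaces its cached id.
        self._blocks[index] = block
        self._block_ids[index] = block.block_id

    def __getitem__(self, index: int) -> SketchBlock:
        return self._blocks[index]

    def __len__(self) -> int:
        return len(self._blocks)

    def ids(self) -> List[int]:
        # Returns the maintained list instead of rebuilding it from the blocks.
        return self._block_ids


blocks = TrackedBlockList([SketchBlock(7), SketchBlock(3)])
blocks.append(SketchBlock(9))
blocks[1] = SketchBlock(4)
print(blocks.ids())  # [7, 4, 9]
```

The trade-off is a small amount of extra bookkeeping on each write in exchange for id lookups that do not walk the block objects.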
__getitem__
¶
__init__
¶
__setitem__
¶
_add_block_id
¶
_update_block_id
¶
append_token_ids
¶
ids
¶
list
¶
reset
¶
BlockPool
¶
Used to pre-allocate block objects in order to avoid excessive Python object allocations and deallocations. The pool starts with "pool_size" objects and grows when more are needed.
Note that multiple block objects may point to the same physical block id, which is why this pool is needed: it makes it easier to support prefix caching and more complicated sharing of physical blocks.
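The pooling behaviour described above can be sketched with a generic object pool that pre-allocates its objects and doubles in size when it runs out. `ObjectPoolSketch`, `acquire`, and `release` are hypothetical names chosen for this example; they are not the `BlockPool` API, and the real class also re-initializes block state on reuse.

```python
from collections import deque
from typing import Callable, Deque, Generic, List, Tuple, TypeVar

T = TypeVar("T")


class ObjectPoolSketch(Generic[T]):
    """Hypothetical sketch of a pre-allocated pool that doubles on exhaustion."""

    def __init__(self, factory: Callable[[], T], pool_size: int) -> None:
        assert pool_size > 0
        self._factory = factory
        # Allocate every object up front so later requests reuse them.
        self._pool: List[T] = [factory() for _ in range(pool_size)]
        self._free_ids: Deque[int] = deque(range(pool_size))

    @property
    def size(self) -> int:
        return len(self._pool)

    def _increase_pool(self) -> None:
        # Double the pool: add as many new objects as currently exist.
        current = len(self._pool)
        self._pool.extend(self._factory() for _ in range(current))
        self._free_ids.extend(range(current, 2 * current))

    def acquire(self) -> Tuple[int, T]:
        if not self._free_ids:
            self._increase_pool()
        pool_id = self._free_ids.popleft()
        return pool_id, self._pool[pool_id]

    def release(self, pool_id: int) -> None:
        # Put the slot at the front so it is the next one handed out.
        self._free_ids.appendleft(pool_id)


pool = ObjectPoolSketch(dict, pool_size=2)
first_id, first_obj = pool.acquire()
pool.acquire()
pool.acquire()           # pool is exhausted here, so it doubles to 4
pool.release(first_id)
_, reused = pool.acquire()
print(pool.size, reused is first_obj)  # 4 True
```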
__init__
¶
__init__(
block_size: int,
create_block: Factory,
allocator: BlockAllocator,
pool_size: int,
)
increase_pool
¶
Doubles the internal pool size.
init_block
¶
init_block(
prev_block: Optional[Block],
token_ids: List[int],
block_size: int,
physical_block_id: Optional[int],
extra_hash: Optional[int] = None,
) -> Block
CacheMetricData
dataclass
¶
A utility dataclass for maintaining cache metrics. To avoid overflow, the hit rate is maintained at block granularity: a single hit rate is kept for the n_completed_block x block_size completed queries, and the real-time hit rate is calculated from the following quantities:
BS = the number of queries per block.
nB = the number of completed blocks.
HR = hit rate of the (nB x BS) completed queries.
Q = current number of queries in the incomplete block (< BS).
H = current number of hits in the incomplete block (< BS).
hit rate = ((HR x nB) + (H / Q) x (Q / BS)) / (nB + Q / BS)
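To make the formula concrete, the following sketch evaluates it directly as a standalone function. The function name and argument names are hypothetical and chosen for this example; the guards for Q = 0 and for an empty history are assumptions added to avoid division by zero.

```python
def block_granular_hit_rate(
    n_completed_blocks: int,    # nB
    completed_hit_rate: float,  # HR
    queries: int,               # Q, queries in the incomplete block
    hits: int,                  # H, hits in the incomplete block
    block_size: int,            # BS
) -> float:
    """Evaluate ((HR x nB) + (H / Q) x (Q / BS)) / (nB + Q / BS)."""
    incomplete_ratio = queries / block_size          # Q / BS
    total_blocks = n_completed_blocks + incomplete_ratio
    if total_blocks == 0:
        # Assumption: report 0.0 when no queries have been seen at all.
        return 0.0
    # (H / Q) x (Q / BS); skipped when Q == 0 to avoid dividing by zero.
    incomplete_part = (hits / queries) * incomplete_ratio if queries > 0 else 0.0
    completed_part = completed_hit_rate * n_completed_blocks
    return (completed_part + incomplete_part) / total_blocks


# One fully hit block of 1000 queries, then 500 further queries that all miss:
# 1000 hits out of 1500 queries overall.
rate = block_granular_hit_rate(1, 1.0, queries=500, hits=0, block_size=1000)
print(round(rate, 4))  # 0.6667
```

Because the incomplete block is weighted by Q / BS, the result matches the plain ratio of total hits to total queries while only ever storing counters bounded by the block size.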
completed_block_cache_hit_rate
class-attribute
instance-attribute
¶
completed_block_cache_hit_rate: float = 0.0
num_incompleted_block_queries
class-attribute
instance-attribute
¶
num_incompleted_block_queries: int = 0
__init__
¶
__init__(
num_completed_blocks: int = 0,
completed_block_cache_hit_rate: float = 0.0,
num_incompleted_block_queries: int = 0,
num_incompleted_block_hit: int = 0,
block_size: int = 1000,
) -> None
get_hit_rate
¶
query
¶
query(hit: bool)
CopyOnWriteTracker
¶
A class for tracking and managing copy-on-write operations for blocks.
The CopyOnWriteTracker class maintains a mapping of source block indices to their corresponding copy-on-write destination block indices. It works in conjunction with a RefCounter.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
refcounter | RefCounter | The reference counter used to track block reference counts. | required |
__init__
¶
__init__(refcounter: RefCounterProtocol)
clear_cows
¶
Clears the copy-on-write tracking information and returns the current state.
This method returns a list mapping source block indices to destination block indices for the current copy-on-write operations. It then clears the internal tracking information.
Returns:
Type | Description |
---|---|
List[Tuple[BlockId, BlockId]] | A list mapping source block indices to destination block indices for the current copy-on-write operations. |
is_appendable
¶
Checks whether the block is shared. A shared block cannot be appended to and must first be duplicated via copy-on-write.
record_cow
¶
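Records a copy-on-write operation from the source block id to the target block id.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
src_block_id | BlockId | The source block id from which to copy the data. | required |
trg_block_id | BlockId | The target block id to which the data is copied. | required |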
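Taken together, the three operations of the tracker (checking appendability, recording a copy, and draining the recorded copies) might look like the sketch below. `CowTrackerSketch` is a hypothetical name, and a plain dictionary of reference counts stands in for the reference counter; treating a reference count of at most 1 as "not shared" is an assumption of this example.

```python
from typing import Dict, List, Optional, Tuple


class CowTrackerSketch:
    """Hypothetical sketch of copy-on-write bookkeeping over block ids."""

    def __init__(self, refcounts: Dict[int, int]) -> None:
        # Assumption: a dict of block id -> reference count, only read here.
        self._refcounts = refcounts
        self._copy_on_writes: List[Tuple[int, int]] = []

    def is_appendable(self, block_id: Optional[int]) -> bool:
        # A block with no physical id yet cannot be shared.
        if block_id is None:
            return True
        # More than one reference means another sequence also reads this block.
        return self._refcounts[block_id] <= 1

    def record_cow(self, src_block_id: int, trg_block_id: int) -> None:
        self._copy_on_writes.append((src_block_id, trg_block_id))

    def clear_cows(self) -> List[Tuple[int, int]]:
        # Hand back the pending copies and start from an empty list.
        cows = self._copy_on_writes
        self._copy_on_writes = []
        return cows


tracker = CowTrackerSketch({0: 2, 1: 0})
if not tracker.is_appendable(0):
    tracker.record_cow(src_block_id=0, trg_block_id=1)
print(tracker.clear_cows())  # [(0, 1)]
print(tracker.clear_cows())  # []
```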
ReadOnlyRefCounter
¶
Bases: RefCounterProtocol
A read-only view of the RefCounter class.
The ReadOnlyRefCounter class provides a read-only interface to access the reference counts maintained by a RefCounter instance. It does not allow modifications to the reference counts.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
refcounter | RefCounter | The RefCounter instance to create a read-only view for. | required |
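A read-only view of this kind is commonly built as a thin wrapper that forwards reads and rejects writes. The sketch below assumes that approach; `TinyCounter` and `ReadOnlyViewSketch` are hypothetical names, and raising `ValueError` on a write attempt is an assumption of this example rather than documented behaviour.

```python
from typing import Dict


class TinyCounter:
    """Hypothetical minimal counter, just enough to demonstrate the view."""

    def __init__(self) -> None:
        self._counts: Dict[int, int] = {0: 0}

    def incr(self, block_id: int) -> int:
        self._counts[block_id] += 1
        return self._counts[block_id]

    def get(self, block_id: int) -> int:
        return self._counts[block_id]


class ReadOnlyViewSketch:
    """Hypothetical wrapper: reads pass through, writes are refused."""

    def __init__(self, counter: TinyCounter) -> None:
        self._counter = counter

    def get(self, block_id: int) -> int:
        # Reads are forwarded, so the view always reflects the live counts.
        return self._counter.get(block_id)

    def incr(self, block_id: int) -> int:
        raise ValueError("incr is not allowed on a read-only view")

    def decr(self, block_id: int) -> int:
        raise ValueError("decr is not allowed on a read-only view")


counter = TinyCounter()
view = ReadOnlyViewSketch(counter)
counter.incr(0)
print(view.get(0))  # 1
```

Handing out the view instead of the counter itself lets a collaborator such as a copy-on-write tracker inspect reference counts without being able to change them.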
RefCounter
¶
Bases: RefCounterProtocol
A class for managing reference counts for a set of block indices.
The RefCounter class maintains a dictionary that maps block indices to their corresponding reference counts. It provides methods to increment, decrement, and retrieve the reference count for a given block index.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
all_block_indices | Iterable[BlockId] | An iterable of block indices to initialize the reference counter with. | required |
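The dictionary-based design described above can be sketched as follows. `RefCounterSketch` is a hypothetical name, and details such as starting every count at zero and refusing to decrement below zero are assumptions of this example.

```python
from typing import Dict, Iterable


class RefCounterSketch:
    """Hypothetical sketch: block index -> reference count in a dictionary."""

    def __init__(self, all_block_indices: Iterable[int]) -> None:
        # Deduplicate the indices; every block starts unreferenced.
        self._refcounts: Dict[int, int] = {i: 0 for i in set(all_block_indices)}

    def incr(self, block_id: int) -> int:
        self._refcounts[block_id] += 1
        return self._refcounts[block_id]

    def decr(self, block_id: int) -> int:
        if self._refcounts[block_id] <= 0:
            # Assumption: decrementing an unreferenced block is a caller bug.
            raise ValueError(f"block {block_id} has no references to release")
        self._refcounts[block_id] -= 1
        return self._refcounts[block_id]

    def get(self, block_id: int) -> int:
        return self._refcounts[block_id]


refs = RefCounterSketch(range(4))
refs.incr(2)
refs.incr(2)
refs.decr(2)
print(refs.get(2))  # 1
```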
get_all_blocks_recursively
¶
Retrieves all the blocks in a sequence starting from the last block.
This function recursively traverses the sequence of blocks in reverse order, starting from the given last block, and returns a list of all the blocks in the sequence.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
last_block | Block | The last block in the sequence. | required |
Returns:
Type | Description |
---|---|
List[Block] | A list of all the blocks in the sequence, in the order they appear. |
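The traversal described above can be sketched on a simple linked chain. `ChainBlock` and `collect_blocks` are hypothetical names; the only assumption about a block is that it exposes a `prev_block` reference that is `None` for the first block.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ChainBlock:
    """Hypothetical block that only knows its predecessor."""
    block_id: int
    prev_block: Optional["ChainBlock"] = None


def collect_blocks(last_block: ChainBlock) -> List[ChainBlock]:
    """Walk back from the last block and return the chain in forward order."""

    def recurse(block: ChainBlock, acc: List[ChainBlock]) -> None:
        # Visit the predecessor first so earlier blocks are appended first.
        if block.prev_block is not None:
            recurse(block.prev_block, acc)
        acc.append(block)

    all_blocks: List[ChainBlock] = []
    recurse(last_block, all_blocks)
    return all_blocks


first = ChainBlock(0)
second = ChainBlock(1, prev_block=first)
third = ChainBlock(2, prev_block=second)
print([b.block_id for b in collect_blocks(third)])  # [0, 1, 2]
```

Since the recursion depth equals the number of blocks in the sequence, a very long chain could hit Python's recursion limit; an iterative walk followed by a reversal would avoid that.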