vllm.v1.spec_decode.ngram_proposer
NgramProposer
Source code in vllm/v1/spec_decode/ngram_proposer.py
__init__
__init__(vllm_config: VllmConfig)
load_model
propose
Proposes the next sequence of tokens based on n-gram pattern matching in the context. The function looks for an earlier occurrence of the last n tokens in the preceding context and returns up to k tokens that followed that match.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
context_token_ids | ndarray | Numpy array of token IDs representing the context sequence. | required |
Returns:
Type | Description |
---|---|
Optional[ndarray] | np.ndarray: The sequence of tokens that followed the matched n-gram in the context. |
Optional[ndarray] | None: If no matching n-gram pattern is found. |
Example
If context_token_ids = [1,2,3,4,2,3], min_n = 2, max_n = 3, and k = 4:
- The last 3 (= max_n) tokens [4,2,3] cannot find a match.
- The last 2 tokens [2,3] will be matched against the previous 4 tokens [1,2,3,4].
- Finding a match of [2,3] would return the tokens that followed that pattern. Here we will return [4,2,3] because we only have three tokens after the match.
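The example above can be reproduced with a minimal sketch. This is not the vLLM implementation: the function name `propose_ngram_sketch` is hypothetical, `min_n`, `max_n`, and `k` are passed as plain arguments rather than read from a config object, and a naive sliding-window comparison stands in for the KMP search that the module uses.

```python
from typing import Optional

import numpy as np


def propose_ngram_sketch(
    context_token_ids: np.ndarray, min_n: int, max_n: int, k: int
) -> Optional[np.ndarray]:
    """Hypothetical helper: naive n-gram lookup, longest suffix first."""
    total = len(context_token_ids)
    # Try the longest suffix (max_n) first, then fall back down to min_n.
    for n in range(max_n, min_n - 1, -1):
        if n <= 0 or n >= total:
            continue
        pattern = context_token_ids[-n:]
        # Only search the tokens that precede the suffix itself.
        searchable = total - n
        for start in range(searchable - n + 1):
            window = context_token_ids[start:start + n]
            if np.array_equal(window, pattern):
                # Return up to k tokens that followed the match.
                return context_token_ids[start + n:start + n + k]
    return None


context = np.array([1, 2, 3, 4, 2, 3])
result = propose_ngram_sketch(context, min_n=2, max_n=3, k=4)
print(result.tolist())  # [4, 2, 3]
```

The slice after the match is simply cut off at the end of the context, which is why only three tokens come back even though k = 4.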
_find_subarray_kmp
_kmp_lps_array
Build the lps (longest proper prefix which is also suffix) array for the pattern.
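As an illustration of what an lps array contains and how a KMP search uses it, here is a textbook-style sketch. The names `kmp_lps_sketch` and `find_first_sketch` are hypothetical, and the signatures and return conventions of the actual `_kmp_lps_array` and `_find_subarray_kmp` may differ.

```python
from typing import List, Sequence


def kmp_lps_sketch(pattern: Sequence[int]) -> List[int]:
    """lps[i] = length of the longest proper prefix of pattern[:i + 1]
    that is also a suffix of pattern[:i + 1]."""
    lps = [0] * len(pattern)
    prev = 0  # length of the current matched prefix
    i = 1
    while i < len(pattern):
        if pattern[i] == pattern[prev]:
            prev += 1
            lps[i] = prev
            i += 1
        elif prev != 0:
            # Fall back to the next shorter candidate prefix.
            prev = lps[prev - 1]
        else:
            lps[i] = 0
            i += 1
    return lps


def find_first_sketch(text: Sequence[int], pattern: Sequence[int]) -> int:
    """Return the start index of the first match of pattern in text, or -1."""
    if not pattern:
        return -1
    lps = kmp_lps_sketch(pattern)
    j = 0  # number of pattern tokens matched so far
    for i, tok in enumerate(text):
        while j > 0 and tok != pattern[j]:
            j = lps[j - 1]
        if tok == pattern[j]:
            j += 1
        if j == len(pattern):
            return i - j + 1
    return -1


print(kmp_lps_sketch([1, 2, 1, 2, 3]))       # [0, 0, 1, 2, 0]
print(find_first_sketch([1, 2, 3, 4], [2, 3]))  # 1
```

Because the lps array tells the search how far it can fall back after a mismatch, the text is scanned once without re-reading earlier tokens.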