vllm.transformers_utils.configs.mlp_speculator
MLPSpeculatorConfig
Bases: PretrainedConfig
Source code in vllm/transformers_utils/configs/mlp_speculator.py
__init__
__init__(
    vocab_size: int = 32000,
    emb_dim: int = 4096,
    inner_dim: int = 0,
    n_predict: int = 3,
    top_k_tokens_per_head: Optional[list[int]] = None,
    n_candidates: int = 5,
    tie_weights: bool = False,
    scale_input: bool = False,
    **kwargs,
)
Initialize an MLPSpeculatorConfig.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `vocab_size` | `int` | The model vocab size. | `32000` |
| `emb_dim` | `int` | The model embedding dimension. | `4096` |
| `inner_dim` | `int` | The inner dimension of the model. If 0, this will be `emb_dim`. | `0` |
| `n_predict` | `int` | The number of lookaheads for the speculator. | `3` |
| `top_k_tokens_per_head` | `Optional[list[int]]` | Number of tokens to consider from each head when forming the candidate tree. For each candidate branch in the tree, head `n` produces `topk[n]` additional sub-branches. NOTE: This parameter is currently unused. | `None` |
| `n_candidates` | `int` | Number of child candidates to create per sequence. | `5` |
| `tie_weights` | `bool` | If True, use a single set of weights for every model head/stage after the first. The initial projection from the base model may have a different size, so that stays separate. | `False` |
| `scale_input` | `bool` | If True, scale the initial hidden states from the base model. | `False` |
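As a minimal sketch of constructing this config, assuming vLLM is installed and the import path shown above is current (the dimension values are illustrative, mirroring a typical Llama-style base model, not a recommendation):

```python
from vllm.transformers_utils.configs.mlp_speculator import MLPSpeculatorConfig

config = MLPSpeculatorConfig(
    vocab_size=32000,   # must match the base model's vocabulary size
    emb_dim=4096,       # base model embedding dimension
    inner_dim=0,        # 0 -> the inner dimension defaults to emb_dim
    n_predict=3,        # number of lookahead tokens to speculate
    n_candidates=5,     # child candidates to create per sequence
    tie_weights=True,   # share weights across heads/stages after the first
    scale_input=False,
)

# Since MLPSpeculatorConfig subclasses transformers.PretrainedConfig,
# the standard serialization helpers are available:
print(config.to_json_string())
```

Because the class inherits from `PretrainedConfig`, it also supports the usual `save_pretrained`/`from_pretrained` round trip, and any extra `**kwargs` are forwarded to the base class.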