vllm.model_executor.layers.logits_processor
A layer that computes logits from hidden states.
_logits_processor_threadpool (module attribute)
_logits_processor_threadpool: Optional[
ThreadPoolExecutor
] = None
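`_logits_processor_threadpool` defaults to `None`, so any code that uses it has to handle both a pooled case and an inline case. The sketch below shows that pattern. It assumes the pool exists to run per-row logits-processor calls concurrently. `_pool`, `apply_rows`, and `negate_first` are hypothetical names, and plain lists stand in for tensors.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional

Row = List[float]

# Hypothetical stand-in for the module attribute; None means "run inline".
_pool: Optional[ThreadPoolExecutor] = None


def apply_rows(rows: List[Row], process_row: Callable[[Row], Row],
               pool: Optional[ThreadPoolExecutor] = None) -> List[Row]:
    """Apply process_row to every row, on the pool when one is configured."""
    if pool is None:
        # Sequential fallback when no pool exists.
        return [process_row(row) for row in rows]
    # Submit all rows first, then collect results in submission order so
    # output row i still corresponds to input row i.
    futures = [pool.submit(process_row, row) for row in rows]
    return [f.result() for f in futures]


def negate_first(row: Row) -> Row:
    # Toy per-row processor: flip the sign of the first logit.
    return [-row[0]] + row[1:]


rows = [[1.0, 2.0], [3.0, 4.0]]
print(apply_rows(rows, negate_first, _pool))  # [[-1.0, 2.0], [-3.0, 4.0]]
with ThreadPoolExecutor(max_workers=2) as pool:
    print(apply_rows(rows, negate_first, pool))  # [[-1.0, 2.0], [-3.0, 4.0]]
```

Threads pay off mainly when processors release the GIL or wait on I/O. Pure-Python processors may see little speedup.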
LogitsProcessor
Bases: Module
Process logits and apply logits processors from sampling metadata.
This layer does the following:
1. Gather logits from model hidden_states.
2. Scale logits if needed.
3. Apply logits processors (if any).
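The three steps can be sketched on toy data in pure Python. The sketch makes two assumptions. First, it runs on a single device, so "gathering" reduces to a matrix-vector product against the LM-head rows. Second, each logits processor is a callable that maps one logits row to a new row. `compute_logits` and `ban_token` are hypothetical helpers, not vLLM functions.

```python
from typing import Callable, List, Sequence

Row = List[float]


def compute_logits(hidden: Sequence[float],
                   lm_head: Sequence[Sequence[float]],
                   scale: float = 1.0,
                   processors: Sequence[Callable[[Row], Row]] = ()) -> Row:
    # 1. Logits from the hidden state: one dot product per vocabulary row.
    logits = [sum(h * w for h, w in zip(hidden, row)) for row in lm_head]
    # 2. Scale only when the factor differs from the identity value.
    if scale != 1.0:
        logits = [x * scale for x in logits]
    # 3. Apply each processor in order; each one sees the previous output.
    for proc in processors:
        logits = proc(logits)
    return logits


def ban_token(token_id: int) -> Callable[[Row], Row]:
    # Toy processor factory: makes one token impossible to sample.
    def proc(row: Row) -> Row:
        out = list(row)
        out[token_id] = float("-inf")
        return out
    return proc


hidden = [1.0, 2.0]
lm_head = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # vocab of 3, hidden size 2
print(compute_logits(hidden, lm_head, scale=0.5, processors=[ban_token(2)]))
# [0.5, 1.0, -inf]
```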
Source code in vllm/model_executor/layers/logits_processor.py
__init__
__init__(
vocab_size: int,
org_vocab_size: Optional[int] = None,
scale: float = 1.0,
logits_as_input: bool = False,
soft_cap: Optional[float] = None,
) -> None
Parameters:
Name | Type | Description | Default |
---|---|---|---|
scale | float | A scaling factor to apply to the logits. | 1.0 |
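`scale` is an element-wise multiply. The signature also exposes `soft_cap`. This sketch assumes, rather than takes from the table above, that it means tanh soft-capping, which squashes each logit into the open interval (-cap, cap). Applying the cap before the scale is also an assumption of this sketch. `postprocess` is a hypothetical name.

```python
import math
from typing import List, Optional


def postprocess(logits: List[float], scale: float = 1.0,
                soft_cap: Optional[float] = None) -> List[float]:
    # Assumed soft-capping: cap * tanh(x / cap), bounded to (-cap, cap).
    if soft_cap is not None:
        logits = [soft_cap * math.tanh(x / soft_cap) for x in logits]
    # Element-wise scaling; 1.0 is a no-op.
    if scale != 1.0:
        logits = [x * scale for x in logits]
    return logits


print(postprocess([2.0, -4.0], scale=0.5))  # [1.0, -2.0]
capped = postprocess([0.0, 100.0], soft_cap=30.0)
print(capped[0], capped[1] < 30.0)  # 0.0 True
```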
_gather_logits
Gather or all-gather the logits tensor across the model-parallel group.
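Under tensor parallelism, each rank typically holds logits for its own slice of the vocabulary. The two collectives can be compared without a distributed runtime. An all-gather gives every rank the full concatenated row. A gather delivers the row only to a destination rank and leaves the others with nothing, which fits the `Optional[Tensor]` return types of the methods below. This is a single-process simulation: `gather_shards` is hypothetical, and destination rank 0 is an assumption.

```python
from typing import List, Optional


def gather_shards(shards: List[List[float]], all_gather: bool,
                  dst: int = 0) -> List[Optional[List[float]]]:
    """Return what each rank would hold after the collective."""
    # Concatenate vocabulary shards in rank order along the last dimension.
    full = [x for shard in shards for x in shard]
    if all_gather:
        # Every rank receives its own copy of the full row.
        return [list(full) for _ in shards]
    # Plain gather: only the destination rank gets the result.
    return [list(full) if rank == dst else None for rank in range(len(shards))]


shards = [[0.1, 0.2], [0.3, 0.4]]  # 2 ranks, 2 vocab entries each
print(gather_shards(shards, all_gather=True))
# [[0.1, 0.2, 0.3, 0.4], [0.1, 0.2, 0.3, 0.4]]
print(gather_shards(shards, all_gather=False))
# [[0.1, 0.2, 0.3, 0.4], None]
```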
_get_logits
_get_logits(
hidden_states: Tensor,
lm_head: VocabParallelEmbedding,
embedding_bias: Optional[Tensor],
) -> Optional[Tensor]
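The signature suggests that `_get_logits` projects hidden states through `lm_head` and optionally adds `embedding_bias`. The sketch below shows that projection on a single rank. It also assumes the LM-head weight may be padded past the real vocabulary, for example to a size divisible by the parallel degree, so the padded entries are sliced off using `org_vocab_size`. `get_logits` and its list-based arguments are hypothetical.

```python
from typing import List, Optional, Sequence


def get_logits(hidden: Sequence[float],
               weight: Sequence[Sequence[float]],
               bias: Optional[Sequence[float]],
               org_vocab_size: int) -> List[float]:
    # Project: one dot product per (possibly padded) vocabulary row.
    logits = [sum(h * w for h, w in zip(hidden, row)) for row in weight]
    # Optional per-token bias, same length as the padded vocabulary.
    if bias is not None:
        logits = [x + b for x, b in zip(logits, bias)]
    # Drop padding entries so only real token ids remain.
    return logits[:org_vocab_size]


weight = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]]  # 2 padding rows
print(get_logits([3.0, 4.0], weight, [0.5, 0.5, 0.0, 0.0], org_vocab_size=2))
# [3.5, 4.5]
```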
forward
forward(
lm_head: VocabParallelEmbedding,
hidden_states: Tensor,
sampling_metadata: Optional[SamplingMetadata] = None,
embedding_bias: Optional[Tensor] = None,
) -> Optional[Tensor]
_apply_logits_processors
_apply_logits_processors(
logits: Tensor, sampling_metadata: SamplingMetadata
) -> Tensor
_apply_logits_processors_single_seq
_apply_logits_processors_single_seq(
logits_row,
logits_processors,
past_tokens_ids,
prompt_tokens_ids,
) -> Tensor
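The parameter names `past_tokens_ids` and `prompt_tokens_ids` hint at how each processor is called for one sequence. The sketch assumes a logits processor accepts either `(output_token_ids, logits_row)` or `(prompt_token_ids, output_token_ids, logits_row)`, and that the arity is detected with `inspect.signature`. `apply_single_seq`, `no_repeat`, and `boost_prompt` are hypothetical.

```python
import inspect
from typing import Callable, List, Sequence

Row = List[float]


def apply_single_seq(logits_row: Row, processors: Sequence[Callable],
                     past_tokens_ids: List[int],
                     prompt_tokens_ids: List[int]) -> Row:
    for proc in processors:
        n_params = len(inspect.signature(proc).parameters)
        if n_params == 3:
            # Processor also wants the prompt token ids.
            logits_row = proc(prompt_tokens_ids, past_tokens_ids, logits_row)
        else:
            logits_row = proc(past_tokens_ids, logits_row)
    return logits_row


def no_repeat(past: List[int], row: Row) -> Row:
    # Forbid tokens that were already generated.
    return [float("-inf") if i in past else x for i, x in enumerate(row)]


def boost_prompt(prompt: List[int], past: List[int], row: Row) -> Row:
    # Nudge tokens that appear in the prompt.
    return [x + 1.0 if i in prompt else x for i, x in enumerate(row)]


print(apply_single_seq([0.0, 0.0, 0.0], [no_repeat, boost_prompt],
                       past_tokens_ids=[1], prompt_tokens_ids=[2]))
# [0.0, -inf, 1.0]
```

Processors run in list order, so each one sees the row as modified by the previous one.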
_prune_hidden_states
_prune_hidden_states(
hidden_states: Tensor,
sampling_metadata: SamplingMetadata,
) -> Tensor
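Only some positions in a batch need logits. During prefill, for example, typically just the last token of each prompt is sampled. The sketch below assumes the sampling metadata carries a flat list of selected row indices, called `selected_token_indices` here as an assumption, and that `None` means every row is kept. `prune_hidden_states` is hypothetical.

```python
from typing import List, Optional


def prune_hidden_states(hidden_states: List[List[float]],
                        selected_token_indices: Optional[List[int]]
                        ) -> List[List[float]]:
    if selected_token_indices is None:
        # Nothing to prune: every row will be sampled.
        return hidden_states
    # Equivalent to index_select along dim 0: keep rows in the given order.
    return [hidden_states[i] for i in selected_token_indices]


# Two prompts packed into one batch: lengths 3 and 2 (rows 0-2, then 3-4).
hidden = [[0.0], [1.0], [2.0], [3.0], [4.0]]
print(prune_hidden_states(hidden, [2, 4]))  # [[2.0], [4.0]]
```

Pruning before the LM-head projection means that, in this example, the vocabulary-sized projection runs on two rows instead of five.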