vllm.model_executor.layers.utils
Utility methods for model layers.
apply_penalties
apply_penalties(
    logits: Tensor,
    prompt_tokens_tensor: Tensor,
    output_tokens_tensor: Tensor,
    presence_penalties: Tensor,
    frequency_penalties: Tensor,
    repetition_penalties: Tensor,
) -> Tensor
Applies penalties in place to the logits tensor.

logits: The input logits tensor of shape [num_seqs, vocab_size].
prompt_tokens_tensor: A tensor containing the prompt tokens. The prompts are padded to the maximum prompt length within the batch using vocab_size as the padding value; vocab_size is used for padding because it does not correspond to any valid token ID in the vocabulary.
output_tokens_tensor: The output tokens tensor.
presence_penalties: The presence penalties, of shape (num_seqs,).
frequency_penalties: The frequency penalties, of shape (num_seqs,).
repetition_penalties: The repetition penalties, of shape (num_seqs,).
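A minimal usage sketch (not taken from the vLLM source) that follows the parameter description above; the example shapes, token IDs, and penalty values are illustrative assumptions, and padding uses vocab_size as described:

```python
import torch

from vllm.model_executor.layers.utils import apply_penalties

num_seqs, vocab_size = 2, 8
logits = torch.randn(num_seqs, vocab_size)

# Token IDs per sequence, padded to a common length with vocab_size
# (an ID outside the vocabulary, per the description above).
prompt_tokens = torch.tensor([[1, 2, 3], [4, 5, vocab_size]])
output_tokens = torch.tensor([[3, 6], [7, vocab_size]])

# One penalty value per sequence, shape (num_seqs,).
presence = torch.full((num_seqs,), 0.1)
frequency = torch.full((num_seqs,), 0.2)
repetition = torch.full((num_seqs,), 1.1)

# Per the signature and docstring above, the penalties are applied in place
# to logits and the penalized tensor is also returned.
penalized_logits = apply_penalties(
    logits,
    prompt_tokens,
    output_tokens,
    presence,
    frequency,
    repetition,
)
```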
cpu_unquantized_gemm
default_unquantized_gemm
dispatch_unquantized_gemm
get_token_bin_counts_and_mask
get_token_bin_counts_and_mask(
    tokens: Tensor, vocab_size: int, num_seqs: int
) -> tuple[Tensor, Tensor]
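The page gives only the signature for this helper, so the sketch below is a hedged usage example. That the first returned tensor is a per-sequence count of each token ID and the second is an occurrence mask is an assumption inferred from the name, not stated on this page:

```python
import torch

from vllm.model_executor.layers.utils import get_token_bin_counts_and_mask

vocab_size, num_seqs = 8, 2
# Token IDs per sequence, padded with vocab_size as in apply_penalties above.
tokens = torch.tensor([[1, 1, 3], [4, vocab_size, vocab_size]])

bin_counts, mask = get_token_bin_counts_and_mask(tokens, vocab_size, num_seqs)

# Under the assumption above, bin_counts has shape (num_seqs, vocab_size),
# e.g. bin_counts[0, 1] == 2 because token 1 appears twice in sequence 0,
# and mask marks which token IDs occur at least once in each sequence.
```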