kernels
vllm.model_executor.layers.quantization.kernels
Modules:

| Name | Description |
| --- | --- |
| mixed_precision | |
| scaled_mm | |
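The module list above can be checked against an installed copy of the package. The following is a minimal sketch, assuming vLLM is installed and importable in the current environment. The helper name `list_submodules` is hypothetical and not part of vLLM; it relies only on the standard library's `importlib` and `pkgutil`.

```python
import importlib
import pkgutil


def list_submodules(package_name: str) -> list[str]:
    """Return the sorted names of the direct submodules of a package.

    Hypothetical helper for illustration; it is not part of vLLM.
    Returns an empty list if the package cannot be imported or if the
    name refers to a plain module rather than a package.
    """
    try:
        package = importlib.import_module(package_name)
    except ImportError:
        # The package (or one of its parents) is not installed.
        return []

    # Only packages expose __path__; plain modules have no submodules.
    search_path = getattr(package, "__path__", None)
    if search_path is None:
        return []

    return sorted(info.name for info in pkgutil.iter_modules(search_path))


if __name__ == "__main__":
    # Assumes vLLM is installed; prints an empty list otherwise.
    print(list_submodules("vllm.model_executor.layers.quantization.kernels"))
```

On an installation that matches this page, the printed list would be expected to include `mixed_precision` and `scaled_mm`, though the exact contents may differ between vLLM versions.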