vllm.model_executor.layers.quantization.kernels.mixed_precision
Modules:
Name | Description |
---|---|
MPLinearKernel | |
allspark | |
bitblas | |
exllama | |
machete | |
marlin | |
_POSSIBLE_KERNELS
module-attribute
_POSSIBLE_KERNELS: list[type[MPLinearKernel]] = [
    MacheteLinearKernel,
    AllSparkLinearKernel,
    MarlinLinearKernel,
    BitBLASLinearKernel,
    ExllamaLinearKernel,
]
choose_mp_linear_kernel
choose_mp_linear_kernel(
    config: MPLinearLayerConfig,
    compute_capability: Optional[int] = None,
) -> type[MPLinearKernel]
Choose an MPLinearKernel that can implement the given config for the given compute capability. Attempts to choose the best kernel in terms of performance.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
config | MPLinearLayerConfig | Description of the linear layer to be implemented. | required |
compute_capability | Optional[int] | The compute capability of the target device. If None, the capability of the current device is used. | None |
Raises:
Type | Description |
---|---|
ValueError | If no kernel can implement the given config. |
Returns:
Type | Description |
---|---|
type[MPLinearKernel] | The chosen kernel class. |
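The contract described above (try candidate kernels, return the first that fits, raise ValueError otherwise) can be illustrated with a minimal, self-contained sketch. It does not import vLLM and is not its implementation: `LayerConfig`, `Kernel`, `FastKernel`, `FallbackKernel`, `CANDIDATES`, `choose_kernel`, and the `min_capability` / `can_implement` members are all hypothetical stand-ins. It also assumes the candidate list is ordered by preference, which the text above does not state explicitly.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LayerConfig:
    """Hypothetical stand-in for MPLinearLayerConfig (fields are invented)."""
    weight_bits: int
    group_size: int


class Kernel:
    """Hypothetical stand-in for MPLinearKernel."""
    min_capability: int = 0  # lowest compute capability this kernel supports

    @classmethod
    def can_implement(cls, config: LayerConfig) -> tuple[bool, Optional[str]]:
        raise NotImplementedError


class FastKernel(Kernel):
    min_capability = 90

    @classmethod
    def can_implement(cls, config: LayerConfig) -> tuple[bool, Optional[str]]:
        if config.weight_bits != 4:
            return False, "only 4-bit weights supported"
        return True, None


class FallbackKernel(Kernel):
    min_capability = 60

    @classmethod
    def can_implement(cls, config: LayerConfig) -> tuple[bool, Optional[str]]:
        return True, None


# Assumed to be ordered from most to least preferred.
CANDIDATES: list[type[Kernel]] = [FastKernel, FallbackKernel]


def choose_kernel(config: LayerConfig, compute_capability: int) -> type[Kernel]:
    """Return the first candidate that supports the device and the config."""
    reasons = []
    for kernel in CANDIDATES:
        if kernel.min_capability > compute_capability:
            reasons.append(
                f"{kernel.__name__}: requires capability {kernel.min_capability}"
            )
            continue
        ok, why = kernel.can_implement(config)
        if ok:
            return kernel
        reasons.append(f"{kernel.__name__}: {why}")
    # No candidate matched: report why each one was rejected.
    raise ValueError("No kernel can implement the config:\n" + "\n".join(reasons))
```

Returning the kernel class rather than an instance mirrors the `type[MPLinearKernel]` return type above, leaving construction to the caller. Collecting a rejection reason per candidate keeps the ValueError message actionable when nothing matches.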