vllm.model_executor.layers.quantization.tpu_int8
Int8TpuConfig
Bases: QuantizationConfig
Int8 quantization config class for the TPU backend.
Source code in vllm/model_executor/layers/quantization/tpu_int8.py
__init__
__init__(activation_scheme: str = 'none') -> None
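A minimal construction sketch; the import path comes from this page's module name, and 'none' is the default activation_scheme from the signature above (i.e. weight-only int8):

from vllm.model_executor.layers.quantization.tpu_int8 import Int8TpuConfig

# Weight-only int8: activations stay in the model's dtype.
config = Int8TpuConfig(activation_scheme="none")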
from_config (classmethod)
from_config(config: dict[str, Any]) -> Int8TpuConfig
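from_config builds the same object from a parsed quantization-config dict. The key used below is an assumption inferred from the __init__ parameter; real checkpoint configs may carry additional fields:

from vllm.model_executor.layers.quantization.tpu_int8 import Int8TpuConfig

# Hypothetical config dict; the key name mirrors the __init__ argument.
config = Int8TpuConfig.from_config({"activation_scheme": "none"})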
get_config_filenames (staticmethod)
get_name
get_name() -> QuantizationMethods
get_quant_method
get_quant_method(layer: Module, prefix: str) -> Optional[TPUInt8LinearMethod]
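get_quant_method is how vLLM asks the config, per layer, which quantization method (if any) applies. A usage sketch; whether a plain torch.nn.Linear qualifies (rather than vLLM's own linear layer classes) is an assumption here, and the Optional return type means callers must handle None for unsupported layers:

import torch.nn as nn
from vllm.model_executor.layers.quantization.tpu_int8 import Int8TpuConfig

config = Int8TpuConfig(activation_scheme="none")
layer = nn.Linear(1024, 1024)  # stand-in; vLLM passes its own layer modules
# 'prefix' is the layer's dotted name inside the model (example value only).
method = config.get_quant_method(layer, prefix="model.layers.0.mlp.gate_up_proj")
# method is either a TPUInt8LinearMethod instance or None.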
TPUInt8LinearMethod
Bases: LinearMethodBase
Int8 linear method for TPU quantization.
Source code in vllm/model_executor/layers/quantization/tpu_int8.py
__init__
__init__(quant_config: Int8TpuConfig)
_quantize_weight
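No signature is expanded for _quantize_weight on this page. The sketch below shows symmetric per-output-channel int8 weight quantization, the standard technique for a weight-only int8 linear method; it is an illustration under that assumption, not a copy of vLLM's implementation:

import torch

def quantize_weight_int8(weight: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
    # weight: [out_features, in_features] in fp32/fp16/bf16.
    orig_dtype = weight.dtype
    w = weight.float()
    # One scale per output channel, so the largest magnitude maps to 127.
    max_abs = w.abs().amax(dim=-1, keepdim=True).clamp(min=1e-5)
    scale = max_abs / 127.0
    qweight = torch.clamp(torch.round(w / scale), -128, 127).to(torch.int8)
    # Return the int8 weight and a per-row scale in the original dtype.
    return qweight, scale.squeeze(-1).to(orig_dtype)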
apply
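apply's signature is likewise not expanded here; for a LinearMethodBase subclass it computes the layer's forward pass. A dequantize-then-matmul sketch under the per-channel-scale assumption above (the TPU compiler may lower the real computation differently):

from typing import Optional

import torch
import torch.nn.functional as F

def int8_linear_apply(x: torch.Tensor,
                      qweight: torch.Tensor,             # [out, in], int8
                      scale: torch.Tensor,               # [out], activation dtype
                      bias: Optional[torch.Tensor] = None) -> torch.Tensor:
    # Dequantize the weight and run an ordinary linear; mathematically
    # equivalent to quantized execution with per-output-channel scales.
    w = qweight.to(x.dtype) * scale.unsqueeze(-1)
    return F.linear(x, w, bias)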
create_weights
create_weights(
layer: Module,
input_size_per_partition: int,
output_partition_sizes: list[int],
input_size: int,
output_size: int,
params_dtype: dtype,
**extra_weight_attrs,
)
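A sketch of what this signature implies for a LinearMethodBase: allocate the still-unquantized weight covering all output partitions, in params_dtype, and register it on the layer so the checkpoint loader can fill it. Anything beyond the listed parameters is an assumption:

import torch
from torch import nn

def create_weights_sketch(layer: nn.Module,
                          input_size_per_partition: int,
                          output_partition_sizes: list[int],
                          input_size: int,
                          output_size: int,
                          params_dtype: torch.dtype,
                          **extra_weight_attrs) -> None:
    # The weight spans every output partition of this (possibly fused) layer;
    # quantization itself happens later, in process_weights_after_loading.
    output_size_per_partition = sum(output_partition_sizes)
    weight = nn.Parameter(
        torch.empty(output_size_per_partition,
                    input_size_per_partition,
                    dtype=params_dtype),
        requires_grad=False,
    )
    layer.register_parameter("weight", weight)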
process_weights_after_loading
process_weights_after_loading(layer: Module) -> None
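The post-load hook is where the loaded floating-point weight can be replaced by its int8 form. A sketch consistent with the quantization sketch above; the 'scale' attribute name is an assumption for illustration:

import torch
from torch import nn

def process_weights_after_loading_sketch(layer: nn.Module) -> None:
    # Quantize the freshly loaded fp weight (symmetric, per output channel)
    # and swap it onto the layer together with its scale.
    orig_dtype = layer.weight.dtype
    w = layer.weight.data.float()
    scale = w.abs().amax(dim=-1, keepdim=True).clamp(min=1e-5) / 127.0
    qweight = torch.clamp(torch.round(w / scale), -128, 127).to(torch.int8)
    layer.weight = nn.Parameter(qweight, requires_grad=False)
    # Attribute name 'scale' is assumed here, for use by apply() at runtime.
    layer.scale = nn.Parameter(scale.squeeze(-1).to(orig_dtype), requires_grad=False)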