vllm.model_executor.model_loader.neuron
Utilities for selecting and loading Neuron models in the transformers-neuronx framework.
TORCH_DTYPE_TO_NEURON_AMP
module-attribute
TORCH_DTYPE_TO_NEURON_AMP = {
    "auto": "f32",
    "half": "f16",
    "float16": "f16",
    "bfloat16": "bf16",
    "float": "f32",
    "float32": "f32",
    torch.float16: "f16",
    torch.bfloat16: "bf16",
    torch.float32: "f32",
}
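A minimal usage sketch (not part of the module) of how this table can map a vLLM dtype setting, given either as a string or a torch.dtype, to a Neuron amp string; resolve_amp and model_dtype are illustrative names.

import torch

def resolve_amp(model_dtype) -> str:
    # The lookup works for both string and torch.dtype keys because the
    # table above carries both forms.
    try:
        return TORCH_DTYPE_TO_NEURON_AMP[model_dtype]
    except KeyError:
        raise ValueError(f"dtype {model_dtype!r} is not supported on Neuron") from None

assert resolve_amp("bfloat16") == "bf16"
assert resolve_amp(torch.float16) == "f16"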
_NEURON_SUPPORTED_MODELS
module-attribute
_NEURON_SUPPORTED_MODELS: dict[str, tuple[str, str, str]] = {
    "LlamaForCausalLM": (
        "transformers_neuronx.llama.model",
        "LlamaForSampling",
        "LlamaForCausalLM",
    ),
    "MistralForCausalLM": (
        "transformers_neuronx.mistral.model",
        "MistralForSampling",
        "MistralForCausalLM",
    ),
}
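A hedged sketch of how an entry in this table can be resolved into a transformers-neuronx class: each value is a (module path, Neuron class name, Hugging Face class name) triple. resolve_neuron_class is an illustrative helper, not the module's API, and running it requires transformers-neuronx to be installed.

import importlib

def resolve_neuron_class(architecture: str):
    if architecture not in _NEURON_SUPPORTED_MODELS:
        raise ValueError(f"{architecture!r} is not supported on Neuron")
    module_path, neuron_cls_name, _hf_cls_name = _NEURON_SUPPORTED_MODELS[architecture]
    module = importlib.import_module(module_path)  # e.g. transformers_neuronx.llama.model
    return getattr(module, neuron_cls_name)        # e.g. LlamaForSampling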
NeuronCausalLM
Bases: Module
logits_processor
instance-attribute
logits_processor = LogitsProcessor(
    vocab_size, logits_as_input=True
)
on_device_sampling_disabled
instance-attribute
__init__
__init__(
    config: PretrainedConfig,
    on_device_sampling_disabled: bool = False,
) -> None
compute_logits
compute_logits(
    hidden_states: Tensor,
    sampling_metadata: SamplingMetadata,
) -> Tensor
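Because the logits processor is constructed with logits_as_input=True, the hidden states passed here are already logits produced on the Neuron device. A hedged per-step usage sketch, where model is a NeuronCausalLM and hidden_states / sampling_metadata are assumed to come from the model runner:

# Illustrative decode-step flow, not the method bodies themselves.
logits = model.compute_logits(hidden_states, sampling_metadata)
sampler_output = model.sample(logits, sampling_metadata)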
forward
load_weights
load_weights(model_name_or_path: str, **kwargs)
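A hedged usage sketch of the wrapper: it is constructed around the Hugging Face config, and load_weights then loads (and, if needed, compiles) the Neuron model for the given checkpoint. The keyword arguments are assumptions; they are forwarded to transformers-neuronx and their exact names depend on the installed version.

from transformers import AutoConfig

config = AutoConfig.from_pretrained("meta-llama/Llama-2-7b-hf")
model = NeuronCausalLM(config)
model.load_weights(
    "meta-llama/Llama-2-7b-hf",
    # illustrative kwargs only, e.g. tp_degree=2, amp="f16",
    # batch_size=8, n_positions=[2048], neuron_config=...,
)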
sample
sample(
    logits: Tensor, sampling_metadata: SamplingMetadata
) -> Optional[SamplerOutput]
NeuronSpeculationCausalLM
Bases: Module
A Neuron-optimized causal language model with speculative decoding.
__init__
forward
sample
sample(
    logits: Tensor, sampling_metadata: SamplingMetadata
) -> Optional[list[SamplerOutput]]
_get_buckets
_get_default_neuron_config
_get_default_neuron_config(
    model_config: ModelConfig,
    parallel_config: ParallelConfig,
    scheduler_config: SchedulerConfig,
)
Generate a neuron config based on vllm config args.
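A hedged sketch of the vLLM-side values such a default config plausibly draws on; which Neuron config fields they populate depends on the installed transformers-neuronx version, so only the vLLM attributes are shown.

tp_degree = parallel_config.tensor_parallel_size     # NeuronCore tensor parallelism
batch_size = scheduler_config.max_num_seqs           # maximum concurrent sequences
amp = TORCH_DTYPE_TO_NEURON_AMP[model_config.dtype]  # compute precision
max_len = scheduler_config.max_model_len             # sequence-length budget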
_get_default_neuron_config_for_speculation
_get_default_neuron_config_for_speculation(
    model_config: ModelConfig,
    parallel_config: ParallelConfig,
    scheduler_config: SchedulerConfig,
)
Generate a neuron config for speculative decoding based on vllm config args.
_get_model_architecture
_get_model_architecture(config: PretrainedConfig) -> str
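A hedged sketch of how the architecture string is typically resolved: the Hugging Face config's architectures list is scanned for the first entry present in _NEURON_SUPPORTED_MODELS, and anything else is rejected as unsupported. sketch_get_model_architecture is an illustrative stand-in, not the actual implementation.

def sketch_get_model_architecture(config) -> str:
    architectures = getattr(config, "architectures", []) or []
    for arch in architectures:
        if arch in _NEURON_SUPPORTED_MODELS:
            return arch
    raise ValueError(
        f"Model architectures {architectures} are not supported on Neuron. "
        f"Supported architectures: {list(_NEURON_SUPPORTED_MODELS)}"
    )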
_get_neuron_config_after_override
_get_neuron_on_device_generation_config
_get_neuron_on_device_generation_config(
    model_config: ModelConfig,
)
_is_neuron_on_device_sampling_disabled
_is_neuron_on_device_sampling_disabled(
    model_config: ModelConfig,
) -> bool
get_neuron_eagle_speculation_model
get_neuron_eagle_speculation_model(
    model_config: ModelConfig,
    parallel_config: ParallelConfig,
    scheduler_config: SchedulerConfig,
    speculation_config: SpeculativeConfig,
)
Initializes a neuron-optimized EAGLE speculation model for inference.
get_neuron_model
get_neuron_model(
    model_config: ModelConfig,
    parallel_config: ParallelConfig,
    scheduler_config: SchedulerConfig,
) -> Module
Initializes a neuron-optimized model for inference.
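A hedged sketch of how the pieces above plausibly fit together for non-speculative loading. Anything not shown on this page, such as what _get_neuron_config_after_override accepts, the override_neuron_config attribute, and the load_weights keyword names, is an assumption rather than the documented contract.

def sketch_load_flow(model_config, parallel_config, scheduler_config):
    # 1. Map the requested dtype to a Neuron amp string.
    amp = TORCH_DTYPE_TO_NEURON_AMP[model_config.dtype]

    # 2. Build a default Neuron config and apply any user override
    #    (the override helper's signature and the override attribute are assumed).
    neuron_config = _get_neuron_config_after_override(
        _get_default_neuron_config(model_config, parallel_config, scheduler_config),
        getattr(model_config, "override_neuron_config", None),
    )

    # 3. Wrap the HF config and load/compile the Neuron model.
    model = NeuronCausalLM(
        model_config.hf_config,
        on_device_sampling_disabled=_is_neuron_on_device_sampling_disabled(model_config),
    )
    model.load_weights(
        model_config.model,
        # keyword names below are illustrative
        tp_degree=parallel_config.tensor_parallel_size,
        amp=amp,
        neuron_config=neuron_config,
        batch_size=scheduler_config.max_num_seqs,
        n_positions=[scheduler_config.max_model_len],
    )
    return model.eval()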
get_neuron_speculation_model
get_neuron_speculation_model(
    model_config: ModelConfig,
    parallel_config: ParallelConfig,
    scheduler_config: SchedulerConfig,
    speculation_config: SpeculativeConfig,
)
Initializes a neuron-optimized speculation model for inference.
This method is only applicable for speculation with a standalone draft model.
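A hedged usage sketch: speculation_config (vLLM's SpeculativeConfig) identifies the standalone draft model, and the returned module is a NeuronSpeculationCausalLM whose sample() produces a list of SamplerOutput, plausibly one per speculative step.

spec_model = get_neuron_speculation_model(
    model_config,
    parallel_config,
    scheduler_config,
    speculation_config,
)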