vllm.transformers_utils.configs.ultravox
UltravoxConfig
Bases: PretrainedConfig
This class stores the configuration of an `UltravoxForConditionalGeneration` model. It is used to instantiate an Ultravox model according to the specified arguments, which define the model architecture.

Configuration objects inherit from `PretrainedConfig` and can be used to control the model outputs. See the `PretrainedConfig` documentation for more information.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `audio_config` | `Union[AutoConfig, dict]`, *optional* | Custom audio config or dict | `None` |
| `text_config` | `Union[AutoConfig, dict]`, *optional* | The config object of the text backbone. Can be any of | `None` |
| `ignore_index` | `int`, *optional*, defaults to -100 | The ignore index for the loss function. | `-100` |
| `audio_token_index` | `int`, *optional*, defaults to 32000 | The audio token index to encode the audio prompt. | `32000` |
| `stack_factor` | `int`, *optional*, defaults to 8 | Audio downsampling factor for the multimodal projector. | `8` |
| `norm_init` | `float`, *optional*, defaults to 0.4 | The initialization value for the layer normalization. | `0.4` |
| `projector_act` | `str`, *optional*, defaults to `"swiglu"` | The activation function used by the multimodal projector. | `'swiglu'` |
| `text_model_lora_config` | `LoraConfigSimplified`, *optional* | The LoRA configuration for finetuning the text model. | `None` |
| `audio_model_lora_config` | `LoraConfigSimplified`, *optional* | The LoRA configuration for finetuning the audio model. | `None` |
| `projector_ln_mid` | `bool`, *optional*, defaults to `False` | Whether to apply layer normalization at the middle of the projector or at the end. Versions v0.4.1 and below use | `False` |
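To make the `stack_factor` description concrete, here is a minimal sketch of frame stacking. It assumes the projector downsamples by concatenating `stack_factor` adjacent audio frames along the feature axis and zero-padding the tail; this page does not state that, so treat it as an illustration. The helper name `stack_frames` is hypothetical and not part of vLLM.

```python
def stack_frames(frames, stack_factor=8):
    """Concatenate every `stack_factor` consecutive frames into one wider frame.

    Assumption: this mirrors how an audio projector could downsample in time;
    it is an illustration, not vLLM's implementation.
    """
    if stack_factor < 1:
        raise ValueError("stack_factor must be >= 1")
    if not frames:
        return []
    dim = len(frames[0])
    # Zero-pad so the number of frames is a multiple of stack_factor.
    pad = (-len(frames)) % stack_factor
    padded = frames + [[0.0] * dim for _ in range(pad)]
    # Each output frame is the concatenation of stack_factor input frames.
    return [
        [x for frame in padded[i:i + stack_factor] for x in frame]
        for i in range(0, len(padded), stack_factor)
    ]


# 20 frames of width 2, stacked with the documented default factor of 8.
frames = [[float(t), float(t) + 0.5] for t in range(20)]
stacked = stack_frames(frames, stack_factor=8)
print(len(frames), "->", len(stacked))  # 20 -> 3
print(len(stacked[0]))                  # 16
```

Under this assumption, a larger `stack_factor` shortens the audio sequence seen by the text model and widens each frame by the same factor.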
Source code in vllm/transformers_utils/configs/ultravox.py
audio_config
instance-attribute
audio_config = get_config(
    audio_model_id, trust_remote_code=False
)
audio_model_lora_config
instance-attribute
__init__
__init__(
    audio_config: Optional[dict[str, Any]] = None,
    text_config: Optional[dict[str, Any]] = None,
    audio_model_id: Optional[str] = None,
    text_model_id: Optional[str] = None,
    ignore_index: int = -100,
    audio_token_index: int = 32000,
    hidden_size: int = 4096,
    stack_factor: int = 8,
    norm_init: float = 0.4,
    projector_act: str = "swiglu",
    text_model_lora_config: Optional[dict[str, Any]] = None,
    audio_model_lora_config: Optional[
        dict[str, Any]
    ] = None,
    projector_ln_mid: bool = False,
    **kwargs,
)
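As a usage sketch, the signature above suggests the config can be built by passing a few keyword overrides. This assumes vLLM is installed and that the constructor can build default sub-configs when neither model IDs nor explicit configs are given, which this page does not confirm. The names `OVERRIDES` and `build_ultravox_config` are hypothetical helpers for illustration.

```python
# Overrides for a few documented constructor parameters; values are illustrative.
OVERRIDES = {
    "stack_factor": 8,
    "projector_act": "swiglu",
    "projector_ln_mid": True,
    "audio_token_index": 32000,
}


def build_ultravox_config(overrides=None):
    """Build an UltravoxConfig from keyword overrides (requires vLLM)."""
    # Imported lazily so this snippet can be loaded without vLLM installed.
    from vllm.transformers_utils.configs.ultravox import UltravoxConfig

    if overrides is None:
        overrides = OVERRIDES
    return UltravoxConfig(**overrides)
```

Calling `build_ultravox_config()` should return a config whose attributes reflect the overrides. Any parameter left out falls back to the defaults shown in the signature.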