vllm.transformers_utils.configs.ovis
AIMv2Config
Bases: PretrainedConfig
This is the configuration class to store the configuration of an [AIMv2Model]. Instantiating a configuration with the defaults will yield a configuration similar to that of apple/aimv2-large-patch14-224.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
`hidden_size` | `int` | Dimension of the hidden representations. | `1024` |
`intermediate_size` | `int` | Dimension of the SwiGLU representations. | `2816` |
`num_hidden_layers` | `int` | Number of hidden layers in the Transformer. | `24` |
`num_attention_heads` | `int` | Number of attention heads for each attention layer in the Transformer. | `8` |
`num_channels` | `int` | Number of input channels. | `3` |
`image_size` | `int` | Image size. | `224` |
`patch_size` | `int` | Patch size. | `14` |
`rms_norm_eps` | `float` | Epsilon value used for the RMS normalization layer. | `1e-05` |
`attention_dropout` | `float` | Dropout ratio for attention probabilities. | `0.0` |
`projection_dropout` | `float` | Dropout ratio for the projection layer after the attention. | `0.0` |
`qkv_bias` | `bool` | Whether to add a bias to the queries, keys and values. | `False` |
`use_bias` | `bool` | Whether to add a bias in the feed-forward and projection layers. | `False` |
`kwargs` | `Any` | Keyword arguments for the `PretrainedConfig`. | `{}` |
Source code in vllm/transformers_utils/configs/ovis.py
__init__
```python
__init__(
    hidden_size: int = 1024,
    intermediate_size: int = 2816,
    num_hidden_layers: int = 24,
    num_attention_heads: int = 8,
    num_channels: int = 3,
    image_size: int = 224,
    patch_size: int = 14,
    rms_norm_eps: float = 1e-05,
    attention_dropout: float = 0.0,
    projection_dropout: float = 0.0,
    qkv_bias: bool = False,
    use_bias: bool = False,
    **kwargs: Any,
)
```
Source code in vllm/transformers_utils/configs/ovis.py
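The following is a minimal usage sketch, not taken from the vLLM docs: it assumes the module path above is importable and that fields can be overridden as keyword arguments, as with any `PretrainedConfig` subclass; the override values are illustrative.

```python
# Minimal sketch (assumes vLLM is installed; override values are illustrative).
from vllm.transformers_utils.configs.ovis import AIMv2Config

# Defaults approximate apple/aimv2-large-patch14-224.
config = AIMv2Config()
print(config.hidden_size, config.patch_size)  # 1024 14

# Override selected fields via keyword arguments.
small = AIMv2Config(hidden_size=512, num_hidden_layers=12)
```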
Aimv2VisualTokenizerConfig
Bases: BaseVisualTokenizerConfig
Source code in vllm/transformers_utils/configs/ovis.py
__init__
BaseVisualTokenizerConfig
Bases: PretrainedConfig
Source code in vllm/transformers_utils/configs/ovis.py
__init__
```python
__init__(
    vocab_size=16384,
    tokenize_function="softmax",
    tau=1.0,
    depths=None,
    drop_cls_token=False,
    backbone_config: Optional[Union[PretrainedConfig, dict]] = None,
    hidden_stride: int = 1,
    **kwargs,
)
```
Source code in vllm/transformers_utils/configs/ovis.py
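To show how `backbone_config` composes with the AIMv2 config above, here is a hypothetical sketch; per the signature it accepts either a `PretrainedConfig` or a plain dict, and the parameter values below are assumptions for illustration, not a tested checkpoint.

```python
# Hypothetical sketch: a visual tokenizer config wrapping an AIMv2 backbone.
from vllm.transformers_utils.configs.ovis import (
    AIMv2Config,
    Aimv2VisualTokenizerConfig,
)

vt_config = Aimv2VisualTokenizerConfig(
    vocab_size=16384,             # size of the visual vocabulary
    tokenize_function="softmax",  # maps backbone features to vocab probabilities
    tau=1.0,                      # temperature (assumed) used by the tokenize function
    backbone_config=AIMv2Config(),  # vision encoder config; a dict is also accepted
    hidden_stride=1,
)
```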
OvisConfig
Bases: PretrainedConfig
Source code in vllm/transformers_utils/configs/ovis.py
conversation_formatter_class instance-attribute
__init__
```python
__init__(
    llm_config: Optional[Union[PretrainedConfig, dict]] = None,
    visual_tokenizer_config: Optional[Union[PretrainedConfig, dict]] = None,
    multimodal_max_length=8192,
    hidden_size=None,
    conversation_formatter_class=None,
    llm_attn_implementation=None,
    disable_tie_weight=False,
    **kwargs,
)
```
Source code in vllm/transformers_utils/configs/ovis.py
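As a sketch of how the top-level config nests the two sub-configs: per the signature, `llm_config` and `visual_tokenizer_config` each accept a `PretrainedConfig` or a dict. The values below are illustrative assumptions, not a released Ovis checkpoint.

```python
# Hypothetical sketch: assembling an OvisConfig from nested sub-configs.
from transformers import PretrainedConfig

from vllm.transformers_utils.configs.ovis import (
    Aimv2VisualTokenizerConfig,
    OvisConfig,
)

ovis_config = OvisConfig(
    llm_config=PretrainedConfig(hidden_size=4096),  # stand-in for a real LLM config
    visual_tokenizer_config=Aimv2VisualTokenizerConfig(vocab_size=16384),
    multimodal_max_length=8192,
)
```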
SiglipVisualTokenizerConfig
Bases: BaseVisualTokenizerConfig