vllm.model_executor.models.voyage ¶
VoyageQwen3BidirectionalEmbedModel ¶
Bases: Qwen3Model
Qwen3Model + Voyage embedding head + bidirectional attention.
Checkpoint conventions (HF):
- MLP: gate_proj + up_proj (unfused)
- Attn: q_proj + k_proj + v_proj (unfused)
- Linear head: linear.weight
- Weights prefixed with "model." (e.g., model.layers.0...)
vLLM's Qwen3Model expects:
- mlp.gate_up_proj (fused)
- self_attn.qkv_proj (fused)
- No "model." prefix
We remap and fuse weights with a generator pipeline and load them directly, bypassing the parent's stacked_params_mapping, which would otherwise transform already-fused names a second time (e.g., qkv_proj -> qkqkv_proj).
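To make the renaming concrete, here is a minimal sketch of the rewrite involved; the FUSED_NAMES table and remap_name helper are illustrative stand-ins, not the module's actual code:

```python
# Hypothetical helper illustrating the HF -> vLLM renaming; the real
# logic lives in vllm/model_executor/models/voyage.py.
FUSED_NAMES = {
    "gate_proj": "gate_up_proj",
    "up_proj": "gate_up_proj",
    "q_proj": "qkv_proj",
    "k_proj": "qkv_proj",
    "v_proj": "qkv_proj",
}

def remap_name(hf_name: str) -> str:
    """Strip the 'model.' prefix and rewrite unfused module names."""
    name = hf_name.removeprefix("model.")
    for src, dst in FUSED_NAMES.items():
        name = name.replace(f".{src}.", f".{dst}.")
    return name

print(remap_name("model.layers.0.self_attn.q_proj.weight"))
# -> layers.0.self_attn.qkv_proj.weight
```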
Source code in vllm/model_executor/models/voyage.py
hf_to_vllm_mapper class-attribute instance-attribute ¶
hf_to_vllm_mapper = WeightsMapper(
orig_to_new_prefix={"model.": ""}
)
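As a usage illustration (assuming WeightsMapper.apply, the method vLLM models use to run a mapper over a weight stream), a mapper constructed this way simply strips the checkpoint prefix from each name:

```python
import torch
from vllm.model_executor.models.utils import WeightsMapper

mapper = WeightsMapper(orig_to_new_prefix={"model.": ""})
weights = [("model.layers.0.mlp.gate_proj.weight", torch.empty(1))]
for name, _ in mapper.apply(weights):
    print(name)  # layers.0.mlp.gate_proj.weight
```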
__init__ ¶
_fuse_gate_up_proj ¶
_fuse_gate_up_proj(
weights: Iterable[WeightItem],
) -> Iterable[WeightItem]
Fuse gate_proj and up_proj into gate_up_proj.
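A minimal sketch of this kind of fusion step, assuming WeightItem is a (name, tensor) pair and that gate_up_proj is stored as [gate; up] concatenated along the output dimension (the layout vLLM's MergedColumnParallelLinear uses):

```python
from collections.abc import Iterable

import torch

def fuse_gate_up(
    weights: Iterable[tuple[str, torch.Tensor]],
) -> Iterable[tuple[str, torch.Tensor]]:
    """Buffer gate/up pairs per layer and emit the fused tensor."""
    pending: dict[str, dict[str, torch.Tensor]] = {}
    for name, w in weights:
        if ".gate_proj." in name or ".up_proj." in name:
            part = "gate" if ".gate_proj." in name else "up"
            key = name.replace(f"{part}_proj", "gate_up_proj")
            parts = pending.setdefault(key, {})
            parts[part] = w
            if len(parts) == 2:  # both halves seen: fuse and emit
                yield key, torch.cat([parts["gate"], parts["up"]], dim=0)
                del pending[key]
        else:
            yield name, w
```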
_fuse_qkv_proj ¶
_fuse_qkv_proj(
weights: Iterable[WeightItem],
) -> Iterable[WeightItem]
Fuse q_proj, k_proj, v_proj into qkv_proj.
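And the analogous sketch for attention, under the same (name, tensor) assumption. QKVParallelLinear expects the fused weight in [q; k; v] order along the output dimension, so the concatenation order matters:

```python
from collections.abc import Iterable

import torch

def fuse_qkv(
    weights: Iterable[tuple[str, torch.Tensor]],
) -> Iterable[tuple[str, torch.Tensor]]:
    """Buffer q/k/v triples per layer and emit the fused tensor."""
    pending: dict[str, dict[str, torch.Tensor]] = {}
    for name, w in weights:
        for proj in ("q_proj", "k_proj", "v_proj"):
            if f".{proj}." in name:
                key = name.replace(proj, "qkv_proj")
                parts = pending.setdefault(key, {})
                parts[proj] = w
                if len(parts) == 3:  # all three seen: fuse in q,k,v order
                    yield key, torch.cat(
                        [parts[p] for p in ("q_proj", "k_proj", "v_proj")],
                        dim=0,
                    )
                    del pending[key]
                break
        else:
            yield name, w
```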
forward ¶
load_weights ¶
load_weights(weights: Iterable[WeightItem]) -> set[str]
Remap, fuse, and load weights using a generator pipeline.
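Putting it together, a sketch of the pipeline idea, composing the hypothetical generators from the sketches above; default_weight_loader is vLLM's fallback loader, and load_weights_sketch is illustrative, not the actual method body:

```python
import torch
from vllm.model_executor.model_loader.weight_utils import (
    default_weight_loader,
)

def load_weights_sketch(model: torch.nn.Module, weights) -> set[str]:
    """Lazily remap + fuse, then load each tensor exactly once."""
    params = dict(model.named_parameters())
    stream = fuse_qkv(
        fuse_gate_up(
            (name.removeprefix("model."), w) for name, w in weights
        )
    )
    loaded: set[str] = set()
    for name, w in stream:
        param = params[name]  # sketch: assumes every name resolves
        loader = getattr(param, "weight_loader", default_weight_loader)
        loader(param, w)
        loaded.add(name)
    return loaded
```

Because the stream already carries final vLLM names, each parameter is loaded directly with its own weight_loader; routing it through the parent's stacked_params_mapping would rewrite the fused names a second time, which is exactly the qkv_proj -> qkqkv_proj hazard noted above.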