vllm.model_executor.models.colbert ¶
ColBERT late interaction model for retrieval and reranking.
ColBERT uses per-token embeddings and late interaction (MaxSim) scoring instead of single-vector representations or cross-encoder concatenation.
Reference: https://arxiv.org/abs/2004.12832
ColBERTModel ¶
Bases: BertEmbeddingModel
ColBERT late interaction model for retrieval/reranking.
This model extends BertEmbeddingModel with a ColBERT-style linear projection layer for per-token embeddings. It supports only:
- "token_embed" task: Per-token embeddings for late interaction

ColBERT is fundamentally a per-token embedding model: the linear projection is trained for per-token representations, not for CLS pooling. Use a dedicated dense embedding model if you need single-vector representations.
The ColBERT scoring (MaxSim) is computed externally, either client-side or via the late interaction scoring path in ServingScores.
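For readers computing the score client-side, the following is a minimal pure-Python sketch of MaxSim as described in the ColBERT paper: for each query token, take the maximum dot product over all document tokens, then sum those maxima. The function names `dot` and `maxsim` are illustrative and not part of vLLM, and the sketch assumes the per-token embeddings are already L2-normalized so that the dot product equals cosine similarity.

```python
from typing import Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim(query_tokens: Sequence[Vector], doc_tokens: Sequence[Vector]) -> float:
    """Late interaction score: sum over query tokens of the best-matching
    document token similarity. Hypothetical helper, not a vLLM API."""
    if not query_tokens or not doc_tokens:
        raise ValueError("both query and document need at least one token")
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


# Toy 2-dimensional, unit-length token embeddings (made up for illustration).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
doc_b = [[0.6, 0.8]]

score_a = maxsim(query, doc_a)  # each query token finds an exact match
score_b = maxsim(query, doc_b)  # both query tokens match the same doc token
print(score_a, score_b)
```

Because each query token is matched independently, `doc_a` scores higher than `doc_b` here: it contains an exact match for both query tokens, while `doc_b` offers only a partial match for each.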
Attributes:
| Name | Type | Description |
|---|---|---|
| colbert_linear | | Linear projection from hidden_size to colbert_dim |
supports_late_interaction | Literal[True] | Flag indicating this model uses late interaction scoring |
Source code in vllm/model_executor/models/colbert.py
colbert_dim instance-attribute ¶
colbert_dim: int | None = (
getattr(config, "colbert_dim", None)
or getattr(config, "dim", None)
or getattr(config, "projection_dim", None)
)
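The fallback chain above can be tried out in isolation. The sketch below reproduces the same `getattr ... or ...` expression in a standalone function; `resolve_colbert_dim` and the `SimpleNamespace` configs are hypothetical stand-ins for illustration, not vLLM objects. Note that because the chain uses `or`, a falsy value such as `0` is treated the same as a missing attribute.

```python
from types import SimpleNamespace
from typing import Any, Optional


def resolve_colbert_dim(config: Any) -> Optional[int]:
    """Return the first truthy value among the three candidate attributes,
    in the same order as the expression shown above (illustrative helper)."""
    return (
        getattr(config, "colbert_dim", None)
        or getattr(config, "dim", None)
        or getattr(config, "projection_dim", None)
    )


# "dim" wins over "projection_dim" because it is checked earlier.
cfg_dim = SimpleNamespace(dim=128, projection_dim=96)

# colbert_dim=0 is falsy, so the chain falls through to projection_dim.
cfg_zero = SimpleNamespace(colbert_dim=0, projection_dim=96)

# No candidate attribute at all: the result is None.
cfg_empty = SimpleNamespace()

resolved = [resolve_colbert_dim(c) for c in (cfg_dim, cfg_zero, cfg_empty)]
print(resolved)
```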
__init__ ¶
__init__(*, vllm_config: VllmConfig, prefix: str = '')
_build_colbert_linear ¶
_build_colbert_linear() -> Linear
Build the ColBERT linear projection layer.
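To make the shape of this projection concrete, here is a rough pure-Python sketch of what a linear map from hidden_size to colbert_dim does to per-token hidden states. It is not the vLLM implementation: `project_tokens` is a hypothetical helper, the weights are made up, and the sketch assumes a bias-free projection, which this page does not confirm.

```python
from typing import List, Sequence


def project_tokens(
    hidden_states: Sequence[Sequence[float]],  # [num_tokens, hidden_size]
    weight: Sequence[Sequence[float]],         # [colbert_dim, hidden_size]
) -> List[List[float]]:
    """Apply the same linear map to every token independently, producing
    one colbert_dim-sized vector per token (no bias term assumed)."""
    return [
        [sum(w * h for w, h in zip(row, token)) for row in weight]
        for token in hidden_states
    ]


# Toy sizes: hidden_size = 3, colbert_dim = 2, two tokens.
weight = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 1.0],
]
hidden_states = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
]

token_embeddings = project_tokens(hidden_states, weight)
print(token_embeddings)
```

The output keeps one row per input token, which is what distinguishes this from CLS pooling: the sequence dimension is preserved and only the feature dimension shrinks.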
_build_model ¶
_build_model(
vllm_config: VllmConfig, prefix: str = ""
) -> BertModel
_build_pooler ¶
_build_pooler(pooler_config: PoolerConfig) -> Pooler