vllm.v1.worker.gpu.mm.encoder_cache
EncoderCache
Source code in vllm/v1/worker/gpu/mm/encoder_cache.py
reset_encoder_cache
Clear the GPU-side encoder cache storing vision embeddings.
Call this whenever model weights are updated, so that stale embeddings computed with the old weights are not reused.
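The reason a reset is needed after a weight update can be illustrated with a small, self-contained sketch. This is not vLLM's implementation: `ToyEncoderCache`, its `encoder_outputs` dictionary, `encode`, and `get_embedding` are hypothetical stand-ins, and the "encoder" is just a scalar multiplication. Only the method name `reset_encoder_cache` is taken from the page above.

```python
from dataclasses import dataclass, field


@dataclass
class ToyEncoderCache:
    """Hypothetical stand-in for an encoder cache (not vLLM's EncoderCache)."""

    # Maps a hash of the multimodal input to its cached encoder output.
    encoder_outputs: dict[str, list[float]] = field(default_factory=dict)

    def reset_encoder_cache(self) -> None:
        # Drop every cached embedding so the next lookup recomputes it.
        self.encoder_outputs.clear()


def encode(weight: float, pixels: list[float]) -> list[float]:
    # Toy "vision encoder": the output depends on the current weight.
    return [weight * p for p in pixels]


def get_embedding(
    cache: ToyEncoderCache, mm_hash: str, weight: float, pixels: list[float]
) -> list[float]:
    # Compute on a miss; on a hit, return whatever was stored earlier.
    if mm_hash not in cache.encoder_outputs:
        cache.encoder_outputs[mm_hash] = encode(weight, pixels)
    return cache.encoder_outputs[mm_hash]


cache = ToyEncoderCache()
pixels = [1.0, 2.0]

old = get_embedding(cache, "img-0", 2.0, pixels)    # computed with weight 2.0
stale = get_embedding(cache, "img-0", 3.0, pixels)  # cache hit: still the old values

cache.reset_encoder_cache()                          # weights changed, so invalidate
fresh = get_embedding(cache, "img-0", 3.0, pixels)  # recomputed with weight 3.0
```

In this sketch the cache key depends only on the input, not on the weights, which is why a lookup after the weight change silently returns the old embedding until the cache is cleared.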
reset_mm_cache
Clear the multi-modal cache that was used during profiling but is no longer needed during inference.