vllm.distributed.device_communicators.all2all
DeepEPAll2AllManagerBase
Bases: All2AllManagerBase
Base class for All2All communication based on DeepEP kernels.
Source code in vllm/distributed/device_communicators/all2all.py
__init__
combine
destroy
dispatch
DeepEPHTAll2AllManager
Bases: DeepEPAll2AllManagerBase
All2All communication based on DeepEP High-Throughput kernels.
__init__
_make_all2all_kwargs
get_handle
DeepEPLLAll2AllManager
Bases: DeepEPAll2AllManagerBase
All2All communication based on DeepEP Low-Latency kernels.
__init__
_make_all2all_kwargs
_make_all2all_kwargs(
max_num_tokens_per_dp_rank: int,
token_hidden_size: int,
num_ep_ranks: int,
num_global_experts: int,
num_local_experts: int,
) -> dict[Any, Any]
max_num_tokens_per_dp_rank: the maximum number of tokens a DP rank can dispatch; all ranks must hold the same value.
token_hidden_size: the hidden dimension of each token.
num_ep_ranks: the number of ranks in the EP group.
num_global_experts: the number of experts in the model.
num_local_experts: the number of experts on an EP rank.
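The constraints on these arguments can be illustrated with a minimal, self-contained sketch. It does not reproduce vLLM's implementation: the names `check_same_across_ranks` and `sketch_ll_kwargs`, the validation rules, and the returned keys are hypothetical and chosen only to mirror the documented parameters.

```python
from typing import Any


def check_same_across_ranks(values_per_rank: list[int]) -> int:
    """Return the shared value, or raise if the ranks disagree.

    Hypothetical helper: it stands in for whatever mechanism ensures that
    every rank holds the same max_num_tokens_per_dp_rank.
    """
    if not values_per_rank:
        raise ValueError("need at least one rank")
    first = values_per_rank[0]
    if any(value != first for value in values_per_rank):
        raise ValueError(f"ranks disagree: {values_per_rank}")
    return first


def sketch_ll_kwargs(
    max_num_tokens_per_dp_rank: int,
    token_hidden_size: int,
    num_ep_ranks: int,
    num_global_experts: int,
    num_local_experts: int,
) -> dict[Any, Any]:
    """Illustrative only: bundle the documented arguments into a dict.

    The keys below are assumptions, not the keys vLLM passes to DeepEP.
    """
    args = {
        "max_num_tokens_per_dp_rank": max_num_tokens_per_dp_rank,
        "token_hidden_size": token_hidden_size,
        "num_ep_ranks": num_ep_ranks,
        "num_global_experts": num_global_experts,
        "num_local_experts": num_local_experts,
    }
    # Every size must be a positive integer.
    for name, value in args.items():
        if value <= 0:
            raise ValueError(f"{name} must be positive, got {value}")
    # A single EP rank cannot host more experts than the model has.
    if num_local_experts > num_global_experts:
        raise ValueError("num_local_experts exceeds num_global_experts")
    return args
```

In this sketch, a caller would first confirm agreement across ranks, for example `check_same_across_ranks([256, 256, 256])`, and then pass the shared value as `max_num_tokens_per_dp_rank`.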
get_handle
The kwargs for DeepEPLLAll2AllManager are dictated by _make_all2all_kwargs.
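One way to picture the relationship between the kwargs and the returned handle is a cache that creates one handle per distinct set of kwargs. The sketch below is an assumption about how such a lookup could work, not a description of vLLM's code; `HandleCacheSketch` and `FakeHandle` are hypothetical names, and `FakeHandle` merely stands in for a real communication buffer.

```python
from typing import Any, Callable


class FakeHandle:
    """Hypothetical stand-in for a communication buffer object."""

    def __init__(self, **kwargs: Any) -> None:
        self.kwargs = kwargs


class HandleCacheSketch:
    """Hypothetical cache that keeps one handle per distinct kwargs dict."""

    def __init__(self) -> None:
        self._handles: dict[tuple, Any] = {}

    def get_or_create(
        self, kwargs: dict[str, Any], factory: Callable[..., Any]
    ) -> Any:
        # Sort the items so that dict insertion order does not affect the key.
        # Values must be hashable (ints and strings in this sketch).
        key = tuple(sorted(kwargs.items()))
        if key not in self._handles:
            self._handles[key] = factory(**kwargs)
        return self._handles[key]


cache = HandleCacheSketch()
first = cache.get_or_create({"num_ep_ranks": 8, "token_hidden_size": 7168}, FakeHandle)
again = cache.get_or_create({"token_hidden_size": 7168, "num_ep_ranks": 8}, FakeHandle)
other = cache.get_or_create({"num_ep_ranks": 16, "token_hidden_size": 7168}, FakeHandle)
```

Under these assumptions, `first` and `again` refer to the same object because their kwargs are equal, while `other` is a separate handle.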
NaiveAll2AllManager
Bases: All2AllManagerBase
A naive implementation of all2all communication. It uses all-reduce under the hood, which is inefficient; it is intended mainly for testing and debugging.
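The general idea of building all2all out of simpler collectives can be simulated in a single process. The sketch below is a simplification under stated assumptions, not vLLM's code: tokens are plain floats instead of tensors, each rank's experts are modelled as one function, and `simulate_naive_all2all` is a hypothetical name. A real implementation would run collective operations across devices.

```python
from typing import Callable


def simulate_naive_all2all(
    tokens_per_rank: list[list[float]],
    expert_fn_per_rank: list[Callable[[float], float]],
) -> list[list[float]]:
    """Simulate a naive dispatch/combine round trip in one process.

    tokens_per_rank: one list of scalar "tokens" per rank.
    expert_fn_per_rank: one function per rank, giving that rank's partial
        expert output for a token.
    """
    # Dispatch: every rank ends up holding every token.
    gathered = [tok for rank_tokens in tokens_per_rank for tok in rank_tokens]

    # Each rank applies only its local experts, but to the full token set.
    partials = [[fn(tok) for tok in gathered] for fn in expert_fn_per_rank]

    # Combine, step 1: an all-reduce (sum) of the partial outputs across ranks.
    reduced = [sum(column) for column in zip(*partials)]

    # Combine, step 2: each rank keeps only the rows for its own tokens.
    outputs, start = [], 0
    for rank_tokens in tokens_per_rank:
        end = start + len(rank_tokens)
        outputs.append(reduced[start:end])
        start = end
    return outputs
```

In this sketch every rank handles all tokens from all ranks, so the work per rank grows with the total token count rather than with the tokens actually routed to it, which is why such a scheme suits testing rather than production use.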
__init__
combine
destroy
dispatch
naive_multicast
PPLXAll2AllManager
Bases: All2AllManagerBase
All2All communication based on PPLX kernels.