vllm.compilation.wrapper
TorchCompileWrapperWithCustomDispatcher
A wrapper class for torch.compile with custom dispatch logic.
Subclasses should:
1. Implement the forward method.
2. Implement the dispatch logic in the __call__ method. It can use self.compiled_codes to access the compiled bytecode, and with self.dispatch_to_code(index): to dispatch to the compiled code.
3. Implement the __init__ method to determine how to call torch.compile over the forward method.
Source code in vllm/compilation/wrapper.py
use_custom_dispatcher
instance-attribute
use_custom_dispatcher: bool = (
compilation_level >= DYNAMO_ONCE
)
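The comparison above can be read as a simple threshold on an ordered compilation level. The sketch below is illustrative only: apart from DYNAMO_ONCE, the level names and all numeric values are assumptions made for the example, and the helper function is hypothetical rather than part of the wrapper.

```python
class CompilationLevel:
    # Assumed ordering for illustration; only DYNAMO_ONCE appears in the text above.
    NO_COMPILATION = 0
    DYNAMO_AS_IS = 1
    DYNAMO_ONCE = 2
    PIECEWISE = 3


def wants_custom_dispatcher(compilation_level: int) -> bool:
    """Hypothetical helper mirroring `compilation_level >= DYNAMO_ONCE`."""
    return compilation_level >= CompilationLevel.DYNAMO_ONCE


for level in (CompilationLevel.DYNAMO_AS_IS, CompilationLevel.DYNAMO_ONCE):
    print(level, wants_custom_dispatcher(level))
```

Under these assumed values, every level at or above DYNAMO_ONCE enables the custom dispatcher and every lower level leaves dispatch to torch.compile.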
__call__
Implement the dispatch logic here, beyond the torch.compile level. NOTE: this function can accept additional arguments beyond those of the forward method, which it can use to dispatch directly to the compiled code.
Source code in vllm/compilation/wrapper.py
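One plausible shape for this dispatch logic is sketched below without torch, so it runs anywhere. SketchWrapper, compiled_callable, and the log list are hypothetical stand-ins: compiled_callable imitates a torch.compile-wrapped forward that records a code object on its first call, and dispatch_to_code here only logs instead of swapping bytecode. Whether a real subclass dispatches this way is an assumption.

```python
from contextlib import contextmanager


class SketchWrapper:
    def __init__(self, use_custom_dispatcher=True):
        self.use_custom_dispatcher = use_custom_dispatcher
        self.compiled_codes = []  # filled once "compilation" has happened
        self.log = []             # records which path each call took

    def forward(self, x):
        return x + 1

    def compiled_callable(self, x):
        # Stand-in for torch.compile(self.forward): pretend to compile once.
        self.log.append("compile path")
        if not self.compiled_codes:
            self.compiled_codes.append(type(self).forward.__code__)
        return self.forward(x)

    @contextmanager
    def dispatch_to_code(self, index):
        # Placeholder: a real implementation would swap the code object here.
        self.log.append(f"direct dispatch to code {index}")
        yield

    def __call__(self, x):
        # Go through the compile path until compiled code is available.
        if not self.use_custom_dispatcher or not self.compiled_codes:
            return self.compiled_callable(x)
        # Afterwards, bypass the compile machinery and run the saved code.
        with self.dispatch_to_code(0):
            return self.forward(x)
```

In this sketch the first call takes the compile path and later calls dispatch directly; with use_custom_dispatcher set to False every call takes the compile path.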
__init__
Source code in vllm/compilation/wrapper.py
bytecode_hook
Hook to save the compiled bytecode for direct execution.
Source code in vllm/compilation/wrapper.py
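A minimal sketch of what such a hook might do, assuming it receives the original and the transformed code objects and should keep only bytecode that belongs to the wrapped function. BytecodeRecorder and its attribute names are hypothetical, and registering the hook with Dynamo is omitted; the example calls the hook by hand.

```python
class BytecodeRecorder:
    def __init__(self, fn):
        # Remember which code object we care about.
        self.original_code = fn.__code__
        self.compiled_codes = []

    def bytecode_hook(self, old_code, new_code):
        # Ignore compilations of unrelated functions.
        if old_code is not self.original_code:
            return
        # Save the transformed bytecode for later direct execution.
        self.compiled_codes.append(new_code)


def forward(x):
    return x + 1


def unrelated(x):
    return x - 1


def transformed_forward(x):
    # Stand-in for bytecode a compiler would produce for forward.
    return x + 1


recorder = BytecodeRecorder(forward)
recorder.bytecode_hook(unrelated.__code__, transformed_forward.__code__)  # skipped
recorder.bytecode_hook(forward.__code__, transformed_forward.__code__)    # saved
```

The identity check on the old code object is the design point: a globally registered hook sees every compilation, so the recorder has to filter for its own function.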
dispatch_to_code
dispatch_to_code(index: int)
Context manager to dispatch to the compiled code. Why does this work? Because Dynamo guarantees that the compiled bytecode has exactly the same arguments, cell variables, and free variables as the original code. Therefore we can directly switch the code object in the function and call it.
See https://dev-discuss.pytorch.org/t/what-is-the-relationship-requirement-among-original-bytecode-transformed-bytecode-and-bytecode-returned-by-hooks-in-dynamo/1693/7 for more details.