To run inference on a single GPU or across multiple GPUs, use the `VLLM` class from LangChain.
```python
from langchain_community.llms import VLLM

llm = VLLM(
    model="mosaicml/mpt-7b",
    trust_remote_code=True,  # mandatory for hf models
    max_new_tokens=128,
    top_k=10,
    top_p=0.95,
    temperature=0.8,
    # tensor_parallel_size=...  # for distributed inference
)

print(llm("What is the capital of France ?"))
```
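For multi-GPU (distributed) inference, the commented-out `tensor_parallel_size` argument shards the model across GPUs via vLLM's tensor parallelism. Below is a minimal sketch of that usage; the model name, GPU count, and prompt are illustrative assumptions, not requirements.

```python
from langchain_community.llms import VLLM

# Sketch: shard a larger model across 4 GPUs (assumes a node with 4 GPUs).
llm = VLLM(
    model="mosaicml/mpt-30b",   # illustrative model choice
    tensor_parallel_size=4,     # number of GPUs to shard across
    trust_remote_code=True,     # mandatory for hf models
)

print(llm("What is the capital of France ?"))
```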