This is by far my favorite library for deploying Large Language Models for inference. If you have access to a reasonable accelerator, it will get you up and running in minutes without all the fuss of building an API and writing your own inference engine.