The NLP Cypher | 11.21.21
Hey … so have you ever deployed a state-of-the-art, production-level inference server? Don’t know how to do it? Well… last week, Michael Benesty dropped a bomb when he published one of the first ever detailed blog posts on how to not only deploy a production-level inference API but also benchmark some of the most widely used serving frameworks, such as FastAPI and Triton, and runtime engines, such as ONNX Runtime (ORT) and TensorRT (TRT). Eventually, Michael recreated Hugging Face’s ability […]