Model Serving Guide#
Deployment#
Once you have created a model in the Model Registry, you can create a deployment for it. The deployment prepares a model artifact for the model and makes it accessible for predictions behind a REST or gRPC endpoint. Follow the Deployment Creation Guide to create a deployment for your model.
Predictor#
Predictors are responsible for running a model server that loads a trained model, handles inference requests, and returns predictions. See the Predictor Guide.
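As a rough illustration of the load, handle, and return cycle described above, the following sketch shows a predictor-style class in plain Python. The `Predict` class, its `predict` method, and the stand-in `MeanModel` are assumptions made for illustration only; the Predictor Guide defines the interface the model server actually expects.

```python
from typing import List


class MeanModel:
    """Hypothetical stand-in for a trained model loaded from an artifact."""

    def predict(self, rows: List[List[float]]) -> List[float]:
        # Return the mean of each input row as a dummy "prediction".
        return [sum(row) / len(row) for row in rows]


class Predict:
    """Illustrative predictor: loads a model once, then serves requests."""

    def __init__(self) -> None:
        # A real predictor would load the model artifact from disk here.
        self.model = MeanModel()

    def predict(self, inputs: List[List[float]]) -> List[float]:
        # Handle one inference request and return the predictions.
        return self.model.predict(inputs)
```

Loading the model in the constructor rather than per request keeps the expensive step out of the request path, which is the usual reason a predictor is structured as a long-lived object.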
Transformer#
Transformers apply transformations to the model inputs before they are sent to the predictor for inference. See the Transformer Guide.
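To make the idea of transforming inputs ahead of the predictor concrete, here is a minimal sketch in plain Python. The `Transformer` class, the `preprocess` method name, the `"instances"` payload key, and the scaling constant are all assumptions chosen for illustration; consult the Transformer Guide for the actual contract.

```python
from typing import Dict, List

Payload = Dict[str, List[List[float]]]


class Transformer:
    """Illustrative transformer: rescales raw features before prediction."""

    def __init__(self, scale: float = 255.0) -> None:
        # Hypothetical scaling constant, e.g. for 8-bit pixel values.
        self.scale = scale

    def preprocess(self, inputs: Payload) -> Payload:
        # Divide every value in every instance by the scaling constant.
        scaled = [
            [value / self.scale for value in row]
            for row in inputs["instances"]
        ]
        # Return a new payload; the original request is left untouched.
        return {"instances": scaled}
```

The output of `preprocess` would then be passed to the predictor in place of the raw request.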
Resource Allocation#
Configure the resources allocated to the predictor and transformer in a model deployment. See the Resource Allocation Guide.
Inference Batcher#
Configure the predictor to batch inference requests. See the Inference Batcher Guide.
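Request batching can be pictured with the sketch below, which groups incoming requests and calls the model once per group. It is a conceptual illustration, not the platform's batcher: `SimpleBatcher`, `max_batch_size`, and `predict_fn` are hypothetical names, and a production batcher would typically also flush on a timeout so that a partially filled batch is not held indefinitely.

```python
from typing import Callable, List

Row = List[float]


class SimpleBatcher:
    """Illustrative size-based batcher (no timeout handling)."""

    def __init__(self, predict_fn: Callable[[List[Row]], List[float]],
                 max_batch_size: int = 4) -> None:
        self.predict_fn = predict_fn
        self.max_batch_size = max_batch_size
        self.pending: List[Row] = []

    def submit(self, request: Row) -> List[float]:
        """Queue a request; return predictions once the batch is full."""
        self.pending.append(request)
        if len(self.pending) >= self.max_batch_size:
            return self.flush()
        # Batch not full yet: nothing to return.
        return []

    def flush(self) -> List[float]:
        """Run the model once over all queued requests and clear the queue."""
        batch, self.pending = self.pending, []
        return self.predict_fn(batch) if batch else []
```

Batching trades a little latency for throughput, since one model call over several requests usually costs less than several separate calls.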
Inference Logger#
Configure the predictor to log inference requests and predictions. See the Inference Logger Guide.
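The effect of logging requests together with predictions can be sketched with Python's standard `logging` module. This is only an illustration of the concept; the `with_inference_logging` wrapper, the `"inference"` logger name, and the JSON record layout are assumptions, and the Inference Logger Guide describes how logging is actually configured on a deployment.

```python
import json
import logging
from typing import Callable, List

logger = logging.getLogger("inference")

Rows = List[List[float]]


def with_inference_logging(
    predict_fn: Callable[[Rows], List[float]],
) -> Callable[[Rows], List[float]]:
    """Wrap a predict function so each request/prediction pair is logged."""

    def wrapped(inputs: Rows) -> List[float]:
        predictions = predict_fn(inputs)
        # Emit one JSON record containing both the inputs and the outputs.
        logger.info(json.dumps({"inputs": inputs, "predictions": predictions}))
        return predictions

    return wrapped
```

Keeping inputs and predictions in the same record makes it straightforward to audit or replay individual requests later.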
Troubleshooting#
Inspect the model server logs to troubleshoot your model deployments. See the Troubleshooting Guide.