Predictor#

Handle#

[source]

get_model_serving#

Connection.get_model_serving()

Get a reference to model serving to perform operations on. Model serving operates on top of a model registry, defaulting to the project's default model registry.

Example

import hopsworks

project = hopsworks.login()

ms = project.get_model_serving()

Returns

ModelServing. A model serving handle object to perform operations on.


Creation#

[source]

create_predictor#

ModelServing.create_predictor(
    model,
    name=None,
    artifact_version="CREATE",
    serving_tool=None,
    script_file=None,
    resources=None,
    inference_logger=None,
    inference_batcher=None,
    transformer=None,
    api_protocol="REST",
)

Create a Predictor metadata object.

Example

import hopsworks

project = hopsworks.login()

# get Hopsworks Model Registry handle
mr = project.get_model_registry()

# retrieve the trained model you want to deploy
my_model = mr.get_model("my_model", version=1)

# get Hopsworks Model Serving handle
ms = project.get_model_serving()

my_predictor = ms.create_predictor(my_model)

my_deployment = my_predictor.deploy()

Lazy

This method is lazy and does not persist any metadata or deploy any model on its own. To create a deployment using this predictor, call the deploy() method.

Arguments

  • model hsml.model.Model: Model to be deployed.
  • name str | None: Name of the predictor.
  • artifact_version str | None: Version number of the model artifact to deploy, CREATE to create a new model artifact, or MODEL-ONLY to reuse the shared artifact containing only the model files.
  • serving_tool str | None: Serving tool used to deploy the model server.
  • script_file str | None: Path to a custom predictor script implementing the Predict class.
  • resources hsml.resources.PredictorResources | dict | None: Resources to be allocated for the predictor.
  • inference_logger hsml.inference_logger.InferenceLogger | dict | str | None: Inference logger configuration.
  • inference_batcher hsml.inference_batcher.InferenceBatcher | dict | None: Inference batcher configuration.
  • transformer hsml.transformer.Transformer | dict | None: Transformer to be deployed together with the predictor.
  • api_protocol str | None: API protocol to be enabled in the deployment (i.e., 'REST' or 'GRPC'). Defaults to 'REST'.

Returns

Predictor. The predictor metadata object.
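
The optional arguments can also be passed as plain dictionaries. A minimal sketch, assuming my_model and ms from the example above; the predictor name and resource figures are illustrative, and the dictionary keys are assumed to follow the hsml.resources.PredictorResources and hsml.inference_batcher.InferenceBatcher fields:

# illustrative values only; adjust to your model's needs
my_predictor = ms.create_predictor(
    my_model,
    name="mymodelpredictor",  # hypothetical deployment name
    resources={"num_instances": 1, "requests": {"cores": 0.5, "memory": 1024}},
    inference_batcher={"enabled": True},
    api_protocol="REST",
)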


Retrieval#

deployment.predictor#

Predictors can be accessed from the deployment metadata objects.

deployment.predictor

To retrieve a deployment, see the Deployment Reference.
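
A short sketch, assuming a deployment named "mymodel" already exists in the project:

ms = project.get_model_serving()

# retrieve an existing deployment and access its predictor
deployment = ms.get_deployment("mymodel")
predictor = deployment.predictor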

Properties#

[source]

api_protocol#

API protocol enabled in the predictor (i.e., 'REST' or 'GRPC').


[source]

artifact_files_path#

Path of the artifact files deployed by the predictor.

[source]

artifact_path#

Path of the model artifact deployed by the predictor. Resolves to /Projects/{project_name}/Models/{name}/{version}/Artifacts/{artifact_version}/{name}{version}.zip


[source]

artifact_version#

Artifact version deployed by the predictor.


[source]

created_at#

Creation date of the predictor.


[source]

creator#

Creator of the predictor.


[source]

description#

Description of the predictor.


[source]

environment#

Name of the inference environment.


[source]

id#

ID of the predictor.


[source]

inference_batcher#

Configuration of the inference batcher attached to the deployment component (i.e., predictor or transformer).


[source]

inference_logger#

Configuration of the inference logger attached to this predictor.


[source]

model_framework#

Model framework of the model to be deployed by the predictor.


[source]

model_name#

Name of the model deployed by the predictor.


[source]

model_path#

Model path deployed by the predictor.


[source]

model_server#

Model server used by the predictor.


[source]

model_version#

Model version deployed by the predictor.


[source]

name#

Name of the predictor.


[source]

project_namespace#

Kubernetes project namespace.


[source]

requested_instances#

Total number of requested instances in the predictor.


[source]

resources#

Resource configuration for the deployment component (i.e., predictor or transformer).


[source]

script_file#

Script file used to load and run the model.


[source]

serving_tool#

Serving tool used to run the model server.


[source]

transformer#

Transformer configuration attached to the predictor.
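
These properties are read directly from the predictor metadata object. A quick sketch, assuming my_predictor was created as in the examples above:

# inspect the predictor metadata
print(my_predictor.name)
print(my_predictor.model_name, my_predictor.model_version)
print(my_predictor.serving_tool)
print(my_predictor.requested_instances)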


Methods#

[source]

deploy#

Predictor.deploy()

Create a deployment for this predictor and persist it in Model Serving.

Example

import hopsworks

project = hopsworks.login()

# get Hopsworks Model Registry handle
mr = project.get_model_registry()

# retrieve the trained model you want to deploy
my_model = mr.get_model("my_model", version=1)

# get Hopsworks Model Serving handle
ms = project.get_model_serving()

my_predictor = ms.create_predictor(my_model)
my_deployment = my_predictor.deploy()

print(my_deployment.get_state())

Returns

Deployment. The deployment metadata object of a new or existing deployment.
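
Note that deploy() persists the deployment; in many setups the deployment is created in a stopped state and is started with a separate call. A sketch, assuming my_deployment from the example above:

# start serving and check the resulting state
my_deployment.start()
print(my_deployment.get_state())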


[source]

describe#

Predictor.describe()

Print a description of the predictor.
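
A one-line usage sketch, assuming my_predictor from the creation example:

my_predictor.describe()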


[source]

to_dict#

Predictor.to_dict()

Return the predictor metadata as a dictionary. To be implemented by the component type.