Deployment#
Handle#
get_model_serving#
Connection.get_model_serving()
Get a reference to model serving to perform operations on. Model serving operates on top of a model registry, defaulting to the project's default model registry.
Example
import hopsworks
project = hopsworks.login()
ms = project.get_model_serving()
Returns
ModelServing: A model serving handle object to perform operations on.
Creation#
create_deployment#
ModelServing.create_deployment(predictor, name=None, environment=None)
Create a Deployment metadata object.
Example
import hopsworks
project = hopsworks.login()
# get Hopsworks Model Registry handle
mr = project.get_model_registry()
# retrieve the trained model you want to deploy
my_model = mr.get_model("my_model", version=1)
# get Hopsworks Model Serving handle
ms = project.get_model_serving()
my_predictor = ms.create_predictor(my_model)
my_deployment = ms.create_deployment(my_predictor)
my_deployment.save()
Using the model object
import hopsworks
project = hopsworks.login()
# get Hopsworks Model Registry handle
mr = project.get_model_registry()
# retrieve the trained model you want to deploy
my_model = mr.get_model("my_model", version=1)
my_deployment = my_model.deploy()
my_deployment.get_state().describe()
Using the Model Serving handle
import hopsworks
project = hopsworks.login()
# get Hopsworks Model Registry handle
mr = project.get_model_registry()
# retrieve the trained model you want to deploy
my_model = mr.get_model("my_model", version=1)
# get Hopsworks Model Serving handle
ms = project.get_model_serving()
my_predictor = ms.create_predictor(my_model)
my_deployment = my_predictor.deploy()
my_deployment.get_state().describe()
Lazy
This method is lazy and does not persist any metadata or deploy any model. To create the deployment, call the save() method.
Arguments
- predictor (hsml.predictor.Predictor): Predictor to be used in the deployment.
- name (str | None): Name of the deployment.
- environment (str | None): The inference environment to use.
Returns
Deployment: The deployment metadata object.
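The optional name argument listed above can be set at creation time. A minimal sketch, assuming the ms and my_predictor handles from the example above; the deployment name is illustrative:
# create the deployment with an explicit, illustrative name
my_deployment = ms.create_deployment(my_predictor, name="mymodeldeployment")
# persist it; create_deployment() alone is lazy
my_deployment.save()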
deploy#
Model.deploy(
name=None,
description=None,
artifact_version="CREATE",
serving_tool=None,
script_file=None,
resources=None,
inference_logger=None,
inference_batcher=None,
transformer=None,
api_protocol="REST",
environment=None,
)
Deploy the model.
Example
import hopsworks
project = hopsworks.login()
# get Hopsworks Model Registry handle
mr = project.get_model_registry()
# retrieve the trained model you want to deploy
my_model = mr.get_model("my_model", version=1)
my_deployment = my_model.deploy()
Arguments
- name (str | None): Name of the deployment.
- description (str | None): Description of the deployment.
- artifact_version (str | None): Version number of the model artifact to deploy, CREATE to create a new model artifact, or MODEL-ONLY to reuse the shared artifact containing only the model files.
- serving_tool (str | None): Serving tool used to deploy the model server.
- script_file (str | None): Path to a custom predictor script implementing the Predict class.
- resources (hsml.resources.PredictorResources | dict | None): Resources to be allocated for the predictor.
- inference_logger (hsml.inference_logger.InferenceLogger | dict | None): Inference logger configuration.
- inference_batcher (hsml.inference_batcher.InferenceBatcher | dict | None): Inference batcher configuration.
- transformer (hsml.transformer.Transformer | dict | None): Transformer to be deployed together with the predictor.
- api_protocol (str | None): API protocol to be enabled in the deployment (i.e., 'REST' or 'GRPC'). Defaults to 'REST'.
- environment (str | None): The inference environment to use.
Returns
Deployment: The deployment metadata object of a new or existing deployment.
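When the defaults are not enough, the optional arguments above can be combined. A minimal sketch, assuming my_model from the example above; the name and resource figures are illustrative values, and the dict layout for resources (num_instances plus requests) follows the PredictorResources schema as an assumption:
my_deployment = my_model.deploy(
    name="mymodeldeployment",  # illustrative name
    description="Deployment with explicit resources",
    resources={"num_instances": 1, "requests": {"cores": 0.5, "memory": 1024}},
    api_protocol="REST",
)
my_deployment.get_state().describe()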
deploy#
Predictor.deploy()
Create a deployment for this predictor and persist it in Model Serving.
Example
import hopsworks
project = hopsworks.login()
# get Hopsworks Model Registry handle
mr = project.get_model_registry()
# retrieve the trained model you want to deploy
my_model = mr.get_model("my_model", version=1)
# get Hopsworks Model Serving handle
ms = project.get_model_serving()
my_predictor = ms.create_predictor(my_model)
my_deployment = my_predictor.deploy()
print(my_deployment.get_state())
Returns
Deployment: The deployment metadata object of a new or existing deployment.
Retrieval#
get_deployment#
ModelServing.get_deployment(name=None)
Get a deployment by name from Model Serving.
Example
import hopsworks
project = hopsworks.login()
ms = project.get_model_serving()
# get a deployment by name
my_deployment = ms.get_deployment('deployment_name')
Getting a deployment from Model Serving means getting its metadata handle so you can subsequently operate on it (e.g., start or stop).
Arguments
- name (str): Name of the deployment to get.
Returns
Deployment: The deployment metadata object.
Raises
RestAPIError: If unable to retrieve the deployment from model serving.
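Once retrieved, the handle can be used to operate on the deployment, as noted above. A minimal sketch, assuming ms is a ModelServing handle and a deployment named 'deployment_name' exists:
my_deployment = ms.get_deployment("deployment_name")
# start the deployment and wait up to 120 seconds for it to come up
my_deployment.start(await_running=120)
print(my_deployment.get_state())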
get_deployment_by_id#
ModelServing.get_deployment_by_id(id)
Get a deployment by id from Model Serving. Getting a deployment from Model Serving means getting its metadata handle so you can subsequently operate on it (e.g., start or stop).
Example
import hopsworks
project = hopsworks.login()
ms = project.get_model_serving()
# get a deployment by id
my_deployment = ms.get_deployment_by_id(1)
Arguments
- id (int): Id of the deployment to get.
Returns
Deployment: The deployment metadata object.
Raises
RestAPIError: If unable to retrieve the deployment from model serving.
get_deployments#
ModelServing.get_deployments(model=None, status=None)
Get all deployments from model serving.
Example
import hopsworks
project = hopsworks.login()
# get Hopsworks Model Registry handle
mr = project.get_model_registry()
# get Hopsworks Model Serving handle
ms = project.get_model_serving()
# retrieve the trained model whose deployments you want to list
my_model = mr.get_model("my_model", version=1)
list_deployments = ms.get_deployments(model=my_model)
for deployment in list_deployments:
    print(deployment.get_state())
Arguments
- model (hsml.model.Model): Filter by model served in the deployments.
- status (str): Filter by status of the deployments.
Returns
List[Deployment]: A list of deployments.
Raises
RestAPIError: If unable to retrieve deployments from model serving.
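The two filters can also be used independently. A minimal sketch, assuming ms from the example above; "Running" is an assumed status string rather than one documented on this page:
# list only deployments in a given status (status value is an assumption)
running_deployments = ms.get_deployments(status="Running")
for deployment in running_deployments:
    print(deployment.name)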
Properties#
api_protocol#
API protocol enabled in the deployment (e.g., REST or GRPC).
artifact_files_path#
Path of the artifact files deployed by the predictor.
artifact_path#
Path of the model artifact deployed by the predictor.
artifact_version#
Artifact version deployed by the predictor.
created_at#
Created at date of the predictor.
creator#
Creator of the predictor.
description#
Description of the deployment.
environment#
Name of the inference environment.
id#
Id of the deployment.
inference_batcher#
Configuration of the inference batcher attached to this predictor.
inference_logger#
Configuration of the inference logger attached to this predictor.
model_name#
Name of the model deployed by the predictor.
model_path#
Model path deployed by the predictor.
model_registry_id#
Model Registry Id of the deployment.
model_server#
Model server run by the predictor.
model_version#
Model version deployed by the predictor.
name#
Name of the deployment.
predictor#
Predictor used in the deployment.
project_namespace#
Project namespace where the deployment runs.
requested_instances#
Total number of requested instances in the deployment.
resources#
Resource configuration for the predictor.
script_file#
Script file used by the predictor.
serving_tool#
Serving tool used to run the model server.
transformer#
Transformer configured in the predictor.
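A short sketch showing how a few of these read-only properties can be inspected; my_deployment is assumed to come from get_deployment() above:
# identity and model being served
print(my_deployment.name, my_deployment.id)
print(my_deployment.model_name, my_deployment.model_version)
# serving configuration
print(my_deployment.serving_tool, my_deployment.api_protocol)
print(my_deployment.requested_instances)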
Methods#
delete#
Deployment.delete(force=False)
Delete the deployment.
Arguments
- force: Force the deletion of the deployment. If the deployment is running, it will be stopped and deleted automatically.
Warning
A call to this method does not ask for a second confirmation.
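A one-line sketch of a forced deletion; as the warning above says, no second confirmation is asked:
# stop the deployment if it is running, then delete it immediately
my_deployment.delete(force=True)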
describe#
Deployment.describe()
Print a description of the deployment.
download_artifact_files#
Deployment.download_artifact_files(local_path=None)
Download the artifact files served by the deployment.
Arguments
- local_path: Path in the local filesystem where the artifact files will be downloaded.
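A minimal sketch, assuming my_deployment from above; the target directory is an illustrative path:
# download the served artifact files into a local directory
my_deployment.download_artifact_files(local_path="/tmp/my_deployment_artifact")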
get_logs#
Deployment.get_logs(component="predictor", tail=10)
Prints the deployment logs of the predictor or transformer.
Arguments
- component: Deployment component to get the logs from (e.g., predictor or transformer).
- tail: Number of most recent lines to retrieve from the logs.
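A short sketch; the component and tail values are illustrative:
# print the 50 most recent predictor log lines
my_deployment.get_logs(component="predictor", tail=50)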
get_model#
Deployment.get_model()
Retrieve the metadata object for the model being used by this deployment.
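A short sketch, assuming my_deployment from above; name and version are standard Model properties:
# fetch the Model metadata object backing this deployment
model = my_deployment.get_model()
print(model.name, model.version)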
get_state#
Deployment.get_state()
Get the current state of the deployment.
Returns
PredictorState: The state of the deployment.
get_url#
Deployment.get_url()
Get the URL of the deployment in Hopsworks.
is_created#
Deployment.is_created()
Check whether the deployment is created.
Returns
bool: Whether the deployment is created or not.
is_running#
Deployment.is_running(or_idle=True, or_updating=True)
Check whether the deployment is ready to handle inference requests.
Arguments
- or_idle: Whether the idle state is considered as running (default is True).
- or_updating: Whether the updating state is considered as running (default is True).
Returns
bool: Whether the deployment is ready or not.
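A short sketch using is_running() as a guard before sending a request; my_model.input_example is reused from the predict() example below:
if my_deployment.is_running():
    # send a request only when the deployment can serve it
    predictions = my_deployment.predict(inputs=my_model.input_example)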
is_stopped#
Deployment.is_stopped(or_created=True)
Check whether the deployment is stopped.
Arguments
- or_created: Whether the creating and created states are considered as stopped (default is True).
Returns
bool: Whether the deployment is stopped or not.
predict#
Deployment.predict(data=None, inputs=None)
Send inference requests to the deployment. Either the data or the inputs parameter must be set. If both are set, inputs is ignored.
Example
import hopsworks
project = hopsworks.login()
# get Hopsworks Model Serving handle
ms = project.get_model_serving()
# retrieve deployment by name
my_deployment = ms.get_deployment("my_deployment")
# (optional) retrieve model input example
my_model = project.get_model_registry().get_model(my_deployment.model_name, my_deployment.model_version)
# make predictions using model inputs (single or batch)
predictions = my_deployment.predict(inputs=my_model.input_example)
# or using more sophisticated inference request payloads
data = {"instances": [my_model.input_example], "key2": "value2"}
predictions = my_deployment.predict(data)
Arguments
- data (Dict | hopsworks_common.client.istio.utils.infer_type.InferInput | None): Payload dictionary for the inference request including the model input(s).
- inputs (List | Dict | None): Model inputs used in the inference requests.
Returns
dict: Inference response.
save#
Deployment.save(await_update=120)
Persist this deployment including the predictor and metadata to Model Serving.
Arguments
- await_update (int | None): If the deployment is running, awaiting time (seconds) for the running instances to be updated. If the running instances are not updated within this timespan, the call to this method returns while the update continues in the background.
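A one-line sketch, assuming my_deployment was built with create_deployment() above; 180 is an illustrative timeout:
# persist the deployment; if it is already running, wait up to 180 seconds
# for the running instances to pick up the update
my_deployment.save(await_update=180)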
start#
Deployment.start(await_running=120)
Start the deployment.
Arguments
- await_running (int | None): Awaiting time (seconds) for the deployment to start. If the deployment has not started within this timespan, the call to this method returns while it deploys in the background.
stop#
Deployment.stop(await_stopped=120)
Stop the deployment.
Arguments
- await_stopped (int | None): Awaiting time (seconds) for the deployment to stop. If the deployment has not stopped within this timespan, the call to this method returns while it stops in the background.
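A one-line sketch; 60 is an illustrative timeout:
# stop the deployment and wait up to 60 seconds for instances to terminate
my_deployment.stop(await_stopped=60)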
to_dict#
Deployment.to_dict()
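Return the deployment metadata as a dict. A short sketch of one way to inspect it, assuming the values are JSON-serializable (an assumption, not a documented guarantee):
import json
# dump the deployment metadata for inspection
print(json.dumps(my_deployment.to_dict(), indent=2))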