hsml.model_serving #

ModelServing #

project_id property #

Id of the project in which Model Serving is located.

project_name property #

Name of the project in which Model Serving is located.

project_path property #

Path of the project in which Model Serving is located.

create_deployment #

create_deployment(
    predictor: Predictor,
    name: str | None = None,
    environment: str | None = None,
) -> Deployment

Create a Deployment metadata object.

Example
# log in to Hopsworks using hopsworks.login()

# get Hopsworks Model Registry handle
mr = project.get_model_registry()

# retrieve the trained model you want to deploy
my_model = mr.get_model("my_model", version=1)

# get Hopsworks Model Serving handle
ms = project.get_model_serving()

my_predictor = ms.create_predictor(my_model)

my_deployment = ms.create_deployment(my_predictor)
my_deployment.save()
Using the model object
# log in to Hopsworks using hopsworks.login()

# get Hopsworks Model Registry handle
mr = project.get_model_registry()

# retrieve the trained model you want to deploy
my_model = mr.get_model("my_model", version=1)

my_deployment = my_model.deploy()

my_deployment.get_state().describe()
Using the Model Serving handle
# log in to Hopsworks using hopsworks.login()

# get Hopsworks Model Registry handle
mr = project.get_model_registry()

# retrieve the trained model you want to deploy
my_model = mr.get_model("my_model", version=1)

# get Hopsworks Model Serving handle
ms = project.get_model_serving()

my_predictor = ms.create_predictor(my_model)

my_deployment = my_predictor.deploy()

my_deployment.get_state().describe()
Lazy

This method is lazy and does not persist any metadata or deploy any model. To create a deployment, call the save() method.

PARAMETER DESCRIPTION
predictor

predictor to be used in the deployment

TYPE: Predictor

name

name of the deployment

TYPE: str | None DEFAULT: None

environment

(Deprecated) The project Python environment to use. This argument will be ignored, use the argument environment in the create_predictor() or create_endpoint() methods instead.

TYPE: str | None DEFAULT: None

RETURNS DESCRIPTION
Deployment

The deployment metadata object.

create_endpoint #

create_endpoint(
    name: str,
    script_file: str,
    description: str | None = None,
    resources: PredictorResources | dict | None = None,
    inference_logger: InferenceLogger
    | dict
    | str
    | None = None,
    inference_batcher: InferenceBatcher
    | dict
    | None = None,
    api_protocol: str | None = IE.API_PROTOCOL_REST,
    environment: str | None = None,
    scaling_configuration: PredictorScalingConfig
    | dict
    | None = None,
    env_vars: dict | None = None,
) -> Predictor

Create an Endpoint metadata object.

Example
# log in to Hopsworks using hopsworks.login()

# get Hopsworks Model Serving handle
ms = project.get_model_serving()

my_endpoint = ms.create_endpoint(name="feature_server", script_file="feature_server.py")

my_deployment = my_endpoint.deploy()
Lazy

This method is lazy and does not persist any metadata or deploy any endpoint on its own. To create a deployment using this endpoint, call the deploy() method.

PARAMETER DESCRIPTION
name

Name of the endpoint.

TYPE: str

script_file

Path to a custom script file implementing an HTTP server.

TYPE: str

description

Description of the endpoint.

TYPE: str | None DEFAULT: None

resources

Resources to be allocated for the predictor.

TYPE: PredictorResources | dict | None DEFAULT: None

inference_logger

Inference logger configuration.

TYPE: InferenceLogger | dict | str | None DEFAULT: None

inference_batcher

Inference batcher configuration.

TYPE: InferenceBatcher | dict | None DEFAULT: None

api_protocol

API protocol to be enabled in the deployment (i.e., 'REST' or 'GRPC').

TYPE: str | None DEFAULT: IE.API_PROTOCOL_REST

environment

The project Python environment to use.

TYPE: str | None DEFAULT: None

scaling_configuration

Scaling configuration for the predictor.

TYPE: PredictorScalingConfig | dict | None DEFAULT: None

env_vars

Environment variables to set on the predictor.

TYPE: dict | None DEFAULT: None

RETURNS DESCRIPTION
Predictor

The predictor metadata object.
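
The script_file parameter is described only as a custom script implementing an HTTP server. As a rough illustration of what such a script could contain, here is a minimal sketch using only the Python standard library. The handler name, the JSON response shape, the /health path and the port handling are all assumptions made for this example; this reference does not state which port or routes a deployment expects.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class FeatureHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: answers every GET with a small JSON document."""

    def do_GET(self):
        body = json.dumps({"status": "ok", "path": self.path}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Silence per-request logging to keep local runs quiet.
        pass


def make_server(port=0):
    # Port 0 lets the OS pick a free port. The port a deployment expects is
    # not stated in this reference, so any fixed value is an assumption.
    return HTTPServer(("127.0.0.1", port), FeatureHandler)


def probe(server, path="/health"):
    """Serve in a background thread, issue one GET, and return the parsed JSON."""
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
        conn.request("GET", path)
        payload = json.loads(conn.getresponse().read())
        conn.close()
        return payload
    finally:
        server.shutdown()
        server.server_close()
```

Calling probe(make_server()) exercises the handler locally before the file is uploaded and passed as script_file.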

create_predictor #

create_predictor(
    model: Model,
    name: str | None = None,
    artifact_version: str | None = None,
    serving_tool: str | None = None,
    script_file: str | None = None,
    config_file: str | None = None,
    resources: PredictorResources | dict | None = None,
    inference_logger: InferenceLogger
    | dict
    | str
    | None = None,
    inference_batcher: InferenceBatcher
    | dict
    | None = None,
    transformer: Transformer | dict | None = None,
    api_protocol: str | None = IE.API_PROTOCOL_REST,
    environment: str | None = None,
    scaling_configuration: PredictorScalingConfig
    | dict
    | None = None,
    env_vars: dict | None = None,
    vllm_variant: str | None = None,
    vllm_image_tag: str | None = None,
) -> Predictor

Create a Predictor metadata object.

Example
# log in to Hopsworks using hopsworks.login()

# get Hopsworks Model Registry handle
mr = project.get_model_registry()

# retrieve the trained model you want to deploy
my_model = mr.get_model("my_model", version=1)

# get Hopsworks Model Serving handle
ms = project.get_model_serving()

my_predictor = ms.create_predictor(my_model)

my_deployment = my_predictor.deploy()
Lazy

This method is lazy and does not persist any metadata or deploy any model on its own. To create a deployment using this predictor, call the deploy() method.

PARAMETER DESCRIPTION
model

Model to be deployed.

TYPE: Model

name

Name of the predictor.

TYPE: str | None DEFAULT: None

artifact_version

(Deprecated) Version number of the model artifact to deploy, CREATE to create a new model artifact or MODEL-ONLY to reuse the shared artifact containing only the model files.

TYPE: str | None DEFAULT: None

serving_tool

Serving tool used to deploy the model server.

TYPE: str | None DEFAULT: None

script_file

Path to a custom predictor script implementing the Predict class.

TYPE: str | None DEFAULT: None

config_file

Model server configuration file to be passed to the model deployment. It can be accessed via CONFIG_FILE_PATH environment variable from a predictor script. For LLM deployments without a predictor script, this file is used to configure the vLLM engine.

TYPE: str | None DEFAULT: None

resources

Resources to be allocated for the predictor.

TYPE: PredictorResources | dict | None DEFAULT: None

inference_logger

Inference logger configuration.

TYPE: InferenceLogger | dict | str | None DEFAULT: None

inference_batcher

Inference batcher configuration.

TYPE: InferenceBatcher | dict | None DEFAULT: None

transformer

Transformer to be deployed together with the predictor.

TYPE: Transformer | dict | None DEFAULT: None

api_protocol

API protocol to be enabled in the deployment (i.e., 'REST' or 'GRPC').

TYPE: str | None DEFAULT: IE.API_PROTOCOL_REST

environment

The project Python environment to use.

TYPE: str | None DEFAULT: None

scaling_configuration

Scaling configuration for the predictor.

TYPE: PredictorScalingConfig | dict | None DEFAULT: None

env_vars

Environment variables to set on the predictor.

TYPE: dict | None DEFAULT: None

vllm_variant

vLLM image variant for vLLM deployments. One of 'VLLM' or 'VLLM_OMNI'. Ignored for non-vLLM model servers.

TYPE: str | None DEFAULT: None

vllm_image_tag

vLLM image tag override. None uses the cluster default; if set, it should match one of the tags made available by a cluster administrator. Ignored for non-vLLM model servers.

TYPE: str | None DEFAULT: None

RETURNS DESCRIPTION
Predictor

The predictor metadata object.
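
The config_file description says the file can be read from a predictor script through the CONFIG_FILE_PATH environment variable. The sketch below shows one way a script might do that. It assumes the configuration is JSON, that the Predict class exposes a predict(inputs) method, and that a "threshold" setting exists; none of these details come from this reference and they are used for illustration only.

```python
import json
import os
import tempfile


class Predict(object):
    """Sketch of a predictor script that reads its configuration at start-up."""

    def __init__(self):
        # CONFIG_FILE_PATH is the variable named in the config_file description.
        # Treating the file as JSON is an assumption of this example.
        config_path = os.environ.get("CONFIG_FILE_PATH")
        self.config = {}
        if config_path and os.path.isfile(config_path):
            with open(config_path, "r", encoding="utf-8") as f:
                self.config = json.load(f)
        # "threshold" is a hypothetical setting, not a Hopsworks option.
        self.threshold = float(self.config.get("threshold", 0.5))

    def predict(self, inputs):
        # Hypothetical logic: label each numeric score against the threshold.
        return [1 if score >= self.threshold else 0 for score in inputs]


def run_locally(config, inputs):
    """Write config to a temporary file, point CONFIG_FILE_PATH at it, and predict."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "config.json")
        with open(path, "w", encoding="utf-8") as f:
            json.dump(config, f)
        previous = os.environ.get("CONFIG_FILE_PATH")
        os.environ["CONFIG_FILE_PATH"] = path
        try:
            return Predict().predict(inputs)
        finally:
            # Restore the environment so the helper has no lasting side effects.
            if previous is None:
                del os.environ["CONFIG_FILE_PATH"]
            else:
                os.environ["CONFIG_FILE_PATH"] = previous
```

The run_locally helper is only a convenience for trying the class outside a deployment; in a deployment the variable would already be set by the platform.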

create_transformer #

create_transformer(
    script_file: str | None = None,
    resources: PredictorResources | dict | None = None,
    scaling_configuration: TransformerScalingConfig
    | dict
    | None = None,
    env_vars: dict | None = None,
) -> Transformer

Create a Transformer metadata object.

Example
# log in to Hopsworks using hopsworks.login()

# get Dataset API instance
dataset_api = project.get_dataset_api()

# get Hopsworks Model Serving handle
ms = project.get_model_serving()

# create my_transformer.py Python script
class Transformer(object):
    def __init__(self):
        ''' Initialization code goes here '''
        pass

    def preprocess(self, inputs):
        ''' Transform the requests inputs here. The object returned by this method will be used as model input to make predictions. '''
        return inputs

    def postprocess(self, outputs):
        ''' Transform the predictions computed by the model before returning a response '''
        return outputs

import os

# upload the script and build its full path inside the project
uploaded_file_path = dataset_api.upload("my_transformer.py", "Resources", overwrite=True)
transformer_script_path = os.path.join("/Projects", project.name, uploaded_file_path)

my_transformer = ms.create_transformer(script_file=transformer_script_path)

# or

from hsml.transformer import Transformer

my_transformer = Transformer(script_file=transformer_script_path)
Create a deployment with the transformer
# my_model is the model retrieved from the Model Registry, e.g. mr.get_model("my_model", version=1)
my_predictor = ms.create_predictor(my_model, transformer=my_transformer)
my_deployment = my_predictor.deploy()

# or
my_deployment = ms.create_deployment(my_predictor)
my_deployment.save()
Lazy

This method is lazy and does not persist any metadata or deploy any transformer. To create a deployment using this transformer, set it in the predictor.transformer property.

PARAMETER DESCRIPTION
script_file

Path to a custom transformer script implementing the Transformer class.

TYPE: str | None DEFAULT: None

resources

Resources to be allocated for the transformer.

TYPE: PredictorResources | dict | None DEFAULT: None

scaling_configuration

Scaling configuration for the transformer.

TYPE: TransformerScalingConfig | dict | None DEFAULT: None

env_vars

Environment variables to set on the transformer.

TYPE: dict | None DEFAULT: None

RETURNS DESCRIPTION
Transformer

The transformer metadata object.
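
Because the Transformer class only needs preprocess and postprocess methods, it can be exercised locally before the script is uploaded. The sketch below chains a transformer around a stand-in model function. The payload shapes ({"instances": ...} and {"predictions": ...}), the scaling constants and the fake_model function are assumptions made for this example, not part of the documented interface.

```python
class Transformer(object):
    def __init__(self):
        # Hypothetical scaling constants, for illustration only.
        self.mean = 10.0
        self.std = 2.0

    def preprocess(self, inputs):
        # Assumes a request payload of the form {"instances": [[...], ...]}.
        scaled = [
            [(value - self.mean) / self.std for value in row]
            for row in inputs["instances"]
        ]
        return {"instances": scaled}

    def postprocess(self, outputs):
        # Assumes a model response of the form {"predictions": [...]}.
        return {"predictions": [round(p, 2) for p in outputs["predictions"]]}


def fake_model(model_input):
    # Stand-in for the deployed model: sums the features of each row.
    return {"predictions": [sum(row) for row in model_input["instances"]]}


def run_pipeline(transformer, request):
    """Apply preprocess, the stand-in model, then postprocess, in that order."""
    model_input = transformer.preprocess(request)
    model_output = fake_model(model_input)
    return transformer.postprocess(model_output)
```

Running run_pipeline(Transformer(), {"instances": [[12.0, 8.0]]}) checks the order of the two hooks without touching a cluster.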

deploy_agent #

deploy_agent(
    entry: str,
    name: str | None = None,
    requirements: str | None = None,
    environment: str | None = None,
    upload_dir: str = "Resources/agents",
    description: str | None = None,
    resources: PredictorResources | dict | None = None,
    inference_logger: InferenceLogger
    | dict
    | str
    | None = None,
    inference_batcher: InferenceBatcher
    | dict
    | None = None,
    api_protocol: str | None = IE.API_PROTOCOL_REST,
    scaling_configuration: PredictorScalingConfig
    | dict
    | None = None,
) -> Deployment

Deploy a Python script or package as an agent.

The agent is created on first call and updated on subsequent calls. Each call uploads the latest local code, refreshes the Python environment, and rewrites the deployment's predictor metadata to reflect the arguments passed in — including any unspecified arguments, which fall back to their defaults. The deployment's running state is left untouched; call start() after the first deploy and restart() to roll a running agent onto the new code. Works the same whether invoked from outside or inside a Hopsworks cluster.

Pass either a .py script or a directory containing a pyproject.toml. For a script, the file is uploaded and run directly. For a package, a wheel is built locally with the project's PEP 517 backend, uploaded, and installed; a small runner module invokes the package via runpy.run_module.

ms = project.get_model_serving()

agent = ms.deploy_agent(entry="my_agent.py")
agent.start() # or agent.restart()

# iterate: edit code locally, push, then roll the running agent onto it
agent = ms.deploy_agent(entry="my_agent.py")
agent.restart()
PARAMETER DESCRIPTION
entry

Local path to a .py script or to a directory containing pyproject.toml.

TYPE: str

name

Name of the deployment, also used as the default Python environment name. Defaults to the basename of entry (without the .py extension for scripts). Must match [A-Za-z0-9_-]+.

TYPE: str | None DEFAULT: None

requirements

Local path to a requirements.txt to install into the environment.

TYPE: str | None DEFAULT: None

environment

Name of the Python environment to use; defaults to name. Created if it does not exist. Must match [A-Za-z0-9_-]+.

TYPE: str | None DEFAULT: None

upload_dir

Directory in the Hopsworks Filesystem under which agent files are placed; the agent gets its own subdirectory <upload_dir>/<name>.

TYPE: str DEFAULT: 'Resources/agents'

description

Description of the deployment.

TYPE: str | None DEFAULT: None

resources

Resources to be allocated for the predictor.

TYPE: PredictorResources | dict | None DEFAULT: None

inference_logger

Inference logger configuration.

TYPE: InferenceLogger | dict | str | None DEFAULT: None

inference_batcher

Inference batcher configuration.

TYPE: InferenceBatcher | dict | None DEFAULT: None

api_protocol

API protocol to be enabled in the deployment (i.e., 'REST' or 'GRPC').

TYPE: str | None DEFAULT: IE.API_PROTOCOL_REST

scaling_configuration

Scaling configuration for the predictor.

TYPE: PredictorScalingConfig | dict | None DEFAULT: None

RETURNS DESCRIPTION
Deployment

The deployment metadata object.

RAISES DESCRIPTION
ValueError

If entry is neither a .py file nor a directory with pyproject.toml, or if name/environment contain characters outside [A-Za-z0-9_-].

hopsworks.client.exceptions.RestAPIError

If the backend encounters an error when handling the request.
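
The rules for entry, name and upload_dir can be checked locally before calling deploy_agent. The sketch below mirrors the documented rules (a .py file or a directory with pyproject.toml, names matching [A-Za-z0-9_-]+, files placed under <upload_dir>/<name>). The helper names are hypothetical and this is not the library's own validation code.

```python
import os
import posixpath
import re

_NAME_PATTERN = re.compile(r"[A-Za-z0-9_-]+")


def classify_entry(entry):
    """Return "script" or "package" following the rules described for entry."""
    if os.path.isfile(entry) and entry.endswith(".py"):
        return "script"
    if os.path.isdir(entry) and os.path.isfile(os.path.join(entry, "pyproject.toml")):
        return "package"
    raise ValueError(
        f"{entry!r} is neither a .py file nor a directory with pyproject.toml"
    )


def default_name(entry):
    """Basename of entry, without the .py extension for scripts."""
    base = os.path.basename(os.path.normpath(entry))
    if classify_entry(entry) == "script":
        base = base[: -len(".py")]
    return base


def check_name(value, label="name"):
    """Raise ValueError unless value consists only of [A-Za-z0-9_-] characters."""
    if not _NAME_PATTERN.fullmatch(value):
        raise ValueError(f"{label} {value!r} must match [A-Za-z0-9_-]+")
    return value


def agent_upload_path(name, upload_dir="Resources/agents"):
    """Subdirectory that the agent would get under upload_dir."""
    return posixpath.join(upload_dir, check_name(name))
```

A script named my_agent.py would therefore default to the name my_agent and to the directory Resources/agents/my_agent.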

get_deployment #

get_deployment(name: str = None) -> Deployment | None

Get a deployment by name from Model Serving.

Example
# log in and get Hopsworks Model Serving handle using .login() and .get_model_serving()

# get a deployment by name
my_deployment = ms.get_deployment('deployment_name')

Getting a deployment from Model Serving means getting its metadata handle so you can subsequently operate on it (e.g., start or stop).

PARAMETER DESCRIPTION
name

Name of the deployment to get.

TYPE: str DEFAULT: None

RETURNS DESCRIPTION
Deployment | None

The deployment metadata object or None if it does not exist.

RAISES DESCRIPTION
hopsworks.client.exceptions.RestAPIError

If unable to retrieve deployment from model serving.

get_deployment_by_id #

get_deployment_by_id(id: int) -> Deployment | None

Get a deployment by id from Model Serving.

Getting a deployment from Model Serving means getting its metadata handle so you can subsequently operate on it (e.g., start or stop).

Example
# log in and get Hopsworks Model Serving handle using .login() and .get_model_serving()

# get a deployment by id
my_deployment = ms.get_deployment_by_id(1)
PARAMETER DESCRIPTION
id

Id of the deployment to get.

TYPE: int

RETURNS DESCRIPTION
Deployment | None

The deployment metadata object or None if it does not exist.

RAISES DESCRIPTION
hopsworks.client.exceptions.RestAPIError

If unable to retrieve deployment from model serving.

get_deployments #

get_deployments(
    model: Model = None, status: str = None
) -> list[Deployment]

Get all deployments from model serving.

Example
# log in to Hopsworks using hopsworks.login()

# get Hopsworks Model Registry handle
mr = project.get_model_registry()

# get Hopsworks Model Serving handle
ms = project.get_model_serving()

# retrieve the model whose deployments you want to list
my_model = mr.get_model("my_model", version=1)

list_deployments = ms.get_deployments(my_model)

for deployment in list_deployments:
    print(deployment.get_state())
PARAMETER DESCRIPTION
model

Filter by the model served in the deployments.

TYPE: Model DEFAULT: None

status

Filter by the status of the deployments.

TYPE: str DEFAULT: None

RETURNS DESCRIPTION
list[Deployment]

A list of deployments.

RAISES DESCRIPTION
hopsworks.client.exceptions.RestAPIError

If unable to retrieve deployments from model serving.

get_inference_endpoints #

get_inference_endpoints() -> list[InferenceEndpoint]

Get all inference endpoints available in the current project.

RETURNS DESCRIPTION
list[InferenceEndpoint]

The inference endpoints available for model inference.