hsml.model_serving #

ModelServing #

project_id `property` #

Id of the project in which Model Serving is located.

project_name `property` #

Name of the project in which Model Serving is located.

project_path `property` #

Path of the project in which Model Serving is located.

create_deployment #

create_deployment(
    predictor: Predictor,
    name: str | None = None,
    environment: str | None = None,
) -> Deployment

Create a Deployment metadata object.

Example

import hopsworks

# log in to Hopsworks and get a project handle
project = hopsworks.login()

# get Hopsworks Model Registry handle
mr = project.get_model_registry()

# retrieve the trained model you want to deploy
my_model = mr.get_model("my_model", version=1)

# get Hopsworks Model Serving handle
ms = project.get_model_serving()

my_predictor = ms.create_predictor(my_model)

my_deployment = ms.create_deployment(my_predictor)
my_deployment.save()

Using the model object

import hopsworks

# log in to Hopsworks and get a project handle
project = hopsworks.login()

# get Hopsworks Model Registry handle
mr = project.get_model_registry()

# retrieve the trained model you want to deploy
my_model = mr.get_model("my_model", version=1)

my_deployment = my_model.deploy()

my_deployment.get_state().describe()

Using the Model Serving handle

import hopsworks

# log in to Hopsworks and get a project handle
project = hopsworks.login()

# get Hopsworks Model Registry handle
mr = project.get_model_registry()

# retrieve the trained model you want to deploy
my_model = mr.get_model("my_model", version=1)

# get Hopsworks Model Serving handle
ms = project.get_model_serving()

my_predictor = ms.create_predictor(my_model)

my_deployment = my_predictor.deploy()

my_deployment.get_state().describe()

Lazy

This method is lazy and does not persist any metadata or deploy any model. To create a deployment, call the save() method.

PARAMETER	DESCRIPTION
`predictor`	Predictor to be used in the deployment. TYPE: `Predictor`
`name`	Name of the deployment. TYPE: `str \| None` DEFAULT: `None`
`environment`	(Deprecated) The project Python environment to use. This argument is ignored; use the `environment` argument of the `create_predictor()` or `create_endpoint()` methods instead. TYPE: `str \| None` DEFAULT: `None`

RETURNS	DESCRIPTION
`Deployment`	The deployment metadata object.
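
The lazy behaviour described above can be wrapped in a small helper. This is only a sketch: `create_and_save_deployment` is a hypothetical name, not part of the library, and it assumes `ms` is a Model Serving handle and `model` a model retrieved from the Model Registry.

```python
def create_and_save_deployment(ms, model, name=None):
    """Build a deployment for `model` and persist it (hypothetical helper)."""
    # Both calls below are lazy: nothing is persisted or deployed yet.
    predictor = ms.create_predictor(model)
    deployment = ms.create_deployment(predictor, name=name)

    # save() is the call that actually creates the deployment.
    deployment.save()
    return deployment
```

Keeping `save()` as a separate, explicit step leaves room to inspect or adjust the deployment metadata object before anything is persisted.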

create_endpoint #

create_endpoint(
    name: str,
    script_file: str,
    description: str | None = None,
    resources: PredictorResources | dict | None = None,
    inference_logger: InferenceLogger
    | dict
    | str
    | None = None,
    inference_batcher: InferenceBatcher
    | dict
    | None = None,
    api_protocol: str | None = IE.API_PROTOCOL_REST,
    environment: str | None = None,
    scaling_configuration: PredictorScalingConfig
    | dict
    | None = None,
    env_vars: dict | None = None,
) -> Predictor

Create an Endpoint metadata object.

Example

import hopsworks

# log in to Hopsworks and get a project handle
project = hopsworks.login()

# get Hopsworks Model Serving handle
ms = project.get_model_serving()

my_endpoint = ms.create_endpoint(name="feature_server", script_file="feature_server.py")

my_deployment = my_endpoint.deploy()

Lazy

This method is lazy and does not persist any metadata or deploy any endpoint on its own. To create a deployment using this endpoint, call the deploy() method.

PARAMETER	DESCRIPTION
`name`	Name of the endpoint. TYPE: `str`
`script_file`	Path to a custom script file implementing an HTTP server. TYPE: `str`
`description`	Description of the endpoint. TYPE: `str \| None` DEFAULT: `None`
`resources`	Resources to be allocated for the predictor. TYPE: `PredictorResources \| dict \| None` DEFAULT: `None`
`inference_logger`	Inference logger configuration. TYPE: `InferenceLogger \| dict \| str \| None` DEFAULT: `None`
`inference_batcher`	Inference batcher configuration. TYPE: `InferenceBatcher \| dict \| None` DEFAULT: `None`
`api_protocol`	API protocol to be enabled in the deployment (i.e., 'REST' or 'GRPC'). TYPE: `str \| None` DEFAULT: `IE.API_PROTOCOL_REST`
`environment`	The project Python environment to use. TYPE: `str \| None` DEFAULT: `None`
`scaling_configuration`	Scaling configuration for the predictor. TYPE: `PredictorScalingConfig \| dict \| None` DEFAULT: `None`
`env_vars`	Environment variables to set on the predictor. TYPE: `dict \| None` DEFAULT: `None`

RETURNS	DESCRIPTION
`Predictor`	The predictor metadata object.
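
To make the `script_file` requirement concrete, here is a minimal sketch of a script that implements an HTTP server using only the Python standard library. Everything in it is an assumption made for illustration: the `make_server` helper, the JSON payload, and the host and port are placeholders, and the reference above does not state which port or routes a deployment expects, so check that against your cluster before relying on it.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Answer every GET request with a small JSON document.
        body = json.dumps({"status": "ok"}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Silence the default per-request logging to stderr.
        pass


def make_server(host="127.0.0.1", port=0):
    """Return an HTTP server bound to host:port (port 0 picks a free port)."""
    return HTTPServer((host, port), Handler)
```

A real script would end by calling `make_server(...).serve_forever()` with the host and port that the deployment requires.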

create_predictor #

create_predictor(
    model: Model,
    name: str | None = None,
    artifact_version: str | None = None,
    serving_tool: str | None = None,
    script_file: str | None = None,
    config_file: str | None = None,
    resources: PredictorResources | dict | None = None,
    inference_logger: InferenceLogger
    | dict
    | str
    | None = None,
    inference_batcher: InferenceBatcher
    | dict
    | None = None,
    transformer: Transformer | dict | None = None,
    api_protocol: str | None = IE.API_PROTOCOL_REST,
    environment: str | None = None,
    scaling_configuration: PredictorScalingConfig
    | dict
    | None = None,
    env_vars: dict | None = None,
    vllm_variant: str | None = None,
    vllm_image_tag: str | None = None,
) -> Predictor

Create a Predictor metadata object.

Example

import hopsworks

# log in to Hopsworks and get a project handle
project = hopsworks.login()

# get Hopsworks Model Registry handle
mr = project.get_model_registry()

# retrieve the trained model you want to deploy
my_model = mr.get_model("my_model", version=1)

# get Hopsworks Model Serving handle
ms = project.get_model_serving()

my_predictor = ms.create_predictor(my_model)

my_deployment = my_predictor.deploy()

Lazy

This method is lazy and does not persist any metadata or deploy any model on its own. To create a deployment using this predictor, call the deploy() method.

PARAMETER	DESCRIPTION
`model`	Model to be deployed. TYPE: `Model`
`name`	Name of the predictor. TYPE: `str \| None` DEFAULT: `None`
`artifact_version`	(Deprecated) Version number of the model artifact to deploy, `CREATE` to create a new model artifact or `MODEL-ONLY` to reuse the shared artifact containing only the model files. TYPE: `str \| None` DEFAULT: `None`
`serving_tool`	Serving tool used to deploy the model server. TYPE: `str \| None` DEFAULT: `None`
`script_file`	Path to a custom predictor script implementing the Predict class. TYPE: `str \| None` DEFAULT: `None`
`config_file`	Model server configuration file to be passed to the model deployment. It can be accessed via `CONFIG_FILE_PATH` environment variable from a predictor script. For LLM deployments without a predictor script, this file is used to configure the vLLM engine. TYPE: `str \| None` DEFAULT: `None`
`resources`	Resources to be allocated for the predictor. TYPE: `PredictorResources \| dict \| None` DEFAULT: `None`
`inference_logger`	Inference logger configuration. TYPE: `InferenceLogger \| dict \| str \| None` DEFAULT: `None`
`inference_batcher`	Inference batcher configuration. TYPE: `InferenceBatcher \| dict \| None` DEFAULT: `None`
`transformer`	Transformer to be deployed together with the predictor. TYPE: `Transformer \| dict \| None` DEFAULT: `None`
`api_protocol`	API protocol to be enabled in the deployment (i.e., 'REST' or 'GRPC'). TYPE: `str \| None` DEFAULT: `IE.API_PROTOCOL_REST`
`environment`	The project Python environment to use. TYPE: `str \| None` DEFAULT: `None`
`scaling_configuration`	Scaling configuration for the predictor. TYPE: `PredictorScalingConfig \| dict \| None` DEFAULT: `None`
`env_vars`	Environment variables to set on the predictor. TYPE: `dict \| None` DEFAULT: `None`
`vllm_variant`	vLLM image variant for vLLM deployments. One of `'VLLM'` or `'VLLM_OMNI'`. Ignored for non-vLLM model servers. TYPE: `str \| None` DEFAULT: `None`
`vllm_image_tag`	vLLM image tag override. `None` uses the cluster default; if set, it should match one of the tags made available by a cluster administrator. Ignored for non-vLLM model servers. TYPE: `str \| None` DEFAULT: `None`

RETURNS	DESCRIPTION
`Predictor`	The predictor metadata object.
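
As an illustration of the `script_file` and `config_file` parameters, the following sketch shows a predictor script that reads its configuration through the `CONFIG_FILE_PATH` environment variable. Several details are assumptions: the `predict(self, inputs)` method name follows the usual predictor template and should be checked against your Hopsworks version, the configuration file is assumed to be JSON, and the `scale` setting and the list-of-lists input format are hypothetical stand-ins for a real model.

```python
import json
import os


class Predict(object):
    def __init__(self):
        # CONFIG_FILE_PATH points at the config_file passed to the deployment.
        config_path = os.environ.get("CONFIG_FILE_PATH")
        self.config = {}
        if config_path and os.path.exists(config_path):
            with open(config_path) as f:
                self.config = json.load(f)  # assumption: the config is JSON
        # Hypothetical setting used by the placeholder "model" below.
        self.scale = self.config.get("scale", 1.0)

    def predict(self, inputs):
        # Placeholder for real inference: multiply every value by `scale`.
        return [[x * self.scale for x in row] for row in inputs]
```

Reading the configuration once in `__init__` keeps per-request work in `predict` small.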

create_transformer #

create_transformer(
    script_file: str | None = None,
    resources: PredictorResources | dict | None = None,
    scaling_configuration: TransformerScalingConfig
    | dict
    | None = None,
    env_vars: dict | None = None,
) -> Transformer

Create a Transformer metadata object.

Example

import os

import hopsworks

# log in to Hopsworks and get a project handle
project = hopsworks.login()

# get Dataset API instance
dataset_api = project.get_dataset_api()

# get Hopsworks Model Serving handle
ms = project.get_model_serving()

# contents of the my_transformer.py Python script (saved locally before upload)
class Transformer(object):
    def __init__(self):
        ''' Initialization code goes here '''
        pass

    def preprocess(self, inputs):
        ''' Transform the requests inputs here. The object returned by this method will be used as model input to make predictions. '''
        return inputs

    def postprocess(self, outputs):
        ''' Transform the predictions computed by the model before returning a response '''
        return outputs

uploaded_file_path = dataset_api.upload("my_transformer.py", "Resources", overwrite=True)
transformer_script_path = os.path.join("/Projects", project.name, uploaded_file_path)

my_transformer = ms.create_transformer(script_file=transformer_script_path)

# or

from hsml.transformer import Transformer

my_transformer = Transformer(script_file=transformer_script_path)

Create a deployment with the transformer

# my_model is a model retrieved from the Model Registry, e.g. mr.get_model("my_model", version=1)
my_predictor = ms.create_predictor(my_model, transformer=my_transformer)
my_deployment = my_predictor.deploy()

# or
my_deployment = ms.create_deployment(my_predictor)
my_deployment.save()

Lazy

This method is lazy and does not persist any metadata or deploy any transformer. To create a deployment using this transformer, set it in the predictor.transformer property.

PARAMETER	DESCRIPTION
`script_file`	Path to a custom transformer script implementing the Transformer class. TYPE: `str \| None` DEFAULT: `None`
`resources`	Resources to be allocated for the transformer. TYPE: `PredictorResources \| dict \| None` DEFAULT: `None`
`scaling_configuration`	Scaling configuration for the transformer. TYPE: `TransformerScalingConfig \| dict \| None` DEFAULT: `None`
`env_vars`	Environment variables to set on the transformer. TYPE: `dict \| None` DEFAULT: `None`

RETURNS	DESCRIPTION
`Transformer`	The transformer metadata object.
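
Before uploading a transformer script, it can help to exercise its `preprocess` and `postprocess` methods locally. The sketch below does that with a stand-in model function. The normalization constants, the `dry_run` helper, and the plain-list input format are hypothetical and chosen only to keep the example small; the payload a deployed transformer receives may be structured differently.

```python
class Transformer(object):
    def __init__(self):
        # Hypothetical normalization constants.
        self.mean = 10.0
        self.std = 2.0

    def preprocess(self, inputs):
        # Standardize the raw request values before they reach the model.
        return [(x - self.mean) / self.std for x in inputs]

    def postprocess(self, outputs):
        # Round the model predictions before returning them.
        return [round(y, 3) for y in outputs]


def dry_run(transformer, model_fn, inputs):
    """Apply preprocess, then the model, then postprocess, in that order."""
    return transformer.postprocess(model_fn(transformer.preprocess(inputs)))
```

For example, `dry_run(Transformer(), lambda xs: [2 * x for x in xs], [10.0, 12.0, 8.0])` runs the three stages with a model that doubles its inputs.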

deploy_agent #

deploy_agent(
    entry: str,
    name: str | None = None,
    requirements: str | None = None,
    environment: str | None = None,
    upload_dir: str = "Resources/agents",
    description: str | None = None,
    resources: PredictorResources | dict | None = None,
    inference_logger: InferenceLogger
    | dict
    | str
    | None = None,
    inference_batcher: InferenceBatcher
    | dict
    | None = None,
    api_protocol: str | None = IE.API_PROTOCOL_REST,
    scaling_configuration: PredictorScalingConfig
    | dict
    | None = None,
) -> Deployment

Deploy a Python script or package as an agent.

The agent is created on first call and updated on subsequent calls. Each call uploads the latest local code, refreshes the Python environment, and rewrites the deployment's predictor metadata to reflect the arguments passed in — including any unspecified arguments, which fall back to their defaults. The deployment's running state is left untouched; call start() after the first deploy and restart() to roll a running agent onto the new code. Works the same whether invoked from outside or inside a Hopsworks cluster.

Pass either a .py script or a directory containing a pyproject.toml. For a script, the file is uploaded and run directly. For a package, a wheel is built locally with the project's PEP 517 backend, uploaded, and installed; a small runner module invokes the package via runpy.run_module.

ms = project.get_model_serving()

# first deploy: create the agent, then start it
agent = ms.deploy_agent(entry="my_agent.py")
agent.start()

# iterate: edit code locally, push, then roll the running agent onto it
agent = ms.deploy_agent(entry="my_agent.py")
agent.restart()

PARAMETER	DESCRIPTION
`entry`	Local path to a `.py` script or to a directory containing `pyproject.toml`. TYPE: `str`
`name`	Name of the deployment, also used as the default Python environment name. Defaults to the basename of `entry` (without the `.py` extension for scripts). Must match `[A-Za-z0-9_-]+`. TYPE: `str \| None` DEFAULT: `None`
`requirements`	Local path to a `requirements.txt` to install into the environment. TYPE: `str \| None` DEFAULT: `None`
`environment`	Name of the Python environment to use; defaults to `name`. Created if it does not exist. Must match `[A-Za-z0-9_-]+`. TYPE: `str \| None` DEFAULT: `None`
`upload_dir`	Directory in the Hopsworks Filesystem under which agent files are placed; the agent gets its own subdirectory `<upload_dir>/<name>`. TYPE: `str` DEFAULT: `'Resources/agents'`
`description`	Description of the deployment. TYPE: `str \| None` DEFAULT: `None`
`resources`	Resources to be allocated for the predictor. TYPE: `PredictorResources \| dict \| None` DEFAULT: `None`
`inference_logger`	Inference logger configuration. TYPE: `InferenceLogger \| dict \| str \| None` DEFAULT: `None`
`inference_batcher`	Inference batcher configuration. TYPE: `InferenceBatcher \| dict \| None` DEFAULT: `None`
`api_protocol`	API protocol to be enabled in the deployment (i.e., 'REST' or 'GRPC'). TYPE: `str \| None` DEFAULT: `IE.API_PROTOCOL_REST`
`scaling_configuration`	Scaling configuration for the predictor. TYPE: `PredictorScalingConfig \| dict \| None` DEFAULT: `None`

RETURNS	DESCRIPTION
`Deployment`	The deployment metadata object.

RAISES	DESCRIPTION
`ValueError`	If `entry` is neither a `.py` file nor a directory with `pyproject.toml`, or if `name`/`environment` contain characters outside `[A-Za-z0-9_-]`.
`hopsworks.client.exceptions.RestAPIError`	If the backend encounters an error when handling the request.
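
The rules documented for `entry`, `name`, and `upload_dir` can be checked locally before calling `deploy_agent`. The helper below is a hypothetical sketch that mirrors the rules as stated in the tables above; it is not the library's own validation code and may differ from it in edge cases.

```python
import os
import re

# Names must match [A-Za-z0-9_-]+ according to the parameter table.
_NAME_RE = re.compile(r"[A-Za-z0-9_-]+")


def resolve_agent_entry(entry, name=None, upload_dir="Resources/agents"):
    """Return (kind, name, remote_dir) for an agent entry, or raise ValueError."""
    if os.path.isfile(entry) and entry.endswith(".py"):
        kind = "script"
        # Default name: basename without the .py extension.
        default_name = os.path.basename(entry)[: -len(".py")]
    elif os.path.isdir(entry) and os.path.isfile(os.path.join(entry, "pyproject.toml")):
        kind = "package"
        # normpath drops a trailing separator so basename is not empty.
        default_name = os.path.basename(os.path.normpath(entry))
    else:
        raise ValueError(
            "entry must be a .py file or a directory containing pyproject.toml"
        )

    name = name or default_name
    if not _NAME_RE.fullmatch(name):
        raise ValueError("name must match [A-Za-z0-9_-]+")

    # Each agent gets its own subdirectory <upload_dir>/<name>.
    return kind, name, f"{upload_dir.rstrip('/')}/{name}"
```

Running a check like this first surfaces a badly named entry before any code is uploaded.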

get_deployment #

get_deployment(name: str = None) -> Deployment | None

Get a deployment by name from Model Serving.

Example

import hopsworks

# log in to Hopsworks and get the Model Serving handle
project = hopsworks.login()
ms = project.get_model_serving()

# get a deployment by name
my_deployment = ms.get_deployment('deployment_name')

Getting a deployment from Model Serving means getting its metadata handle so you can subsequently operate on it (e.g., start or stop).

PARAMETER	DESCRIPTION
`name`	Name of the deployment to get. TYPE: `str` DEFAULT: `None`

RETURNS	DESCRIPTION
`Deployment \| None`	The deployment metadata object or `None` if it does not exist.

RAISES	DESCRIPTION
`hopsworks.client.exceptions.RestAPIError`	If unable to retrieve deployment from model serving.
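
Because `get_deployment` returns `None` when no deployment has the given name, callers usually need an explicit check before operating on the result. A minimal sketch, assuming `ms` is a Model Serving handle; `start_deployment_if_exists` is a hypothetical helper name.

```python
def start_deployment_if_exists(ms, name):
    """Start the named deployment and return it, or return None if missing."""
    deployment = ms.get_deployment(name)
    if deployment is None:
        # No deployment with this name exists, so there is nothing to start.
        return None
    deployment.start()
    return deployment
```

A `RestAPIError` raised by the lookup is deliberately not caught here, so backend failures stay visible to the caller.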

get_deployment_by_id #

get_deployment_by_id(id: int) -> Deployment | None

Get a deployment by id from Model Serving.

Getting a deployment from Model Serving means getting its metadata handle so you can subsequently operate on it (e.g., start or stop).

Example

import hopsworks

# log in to Hopsworks and get the Model Serving handle
project = hopsworks.login()
ms = project.get_model_serving()

# get a deployment by id
my_deployment = ms.get_deployment_by_id(1)

PARAMETER	DESCRIPTION
`id`	Id of the deployment to get. TYPE: `int`

RETURNS	DESCRIPTION
`Deployment \| None`	The deployment metadata object or `None` if it does not exist.

RAISES	DESCRIPTION
`hopsworks.client.exceptions.RestAPIError`	If unable to retrieve deployment from model serving.

get_deployments #

get_deployments(
    model: Model = None, status: str = None
) -> list[Deployment]

Get all deployments from model serving.

Example

import hopsworks

# log in to Hopsworks and get a project handle
project = hopsworks.login()

# get Hopsworks Model Registry handle
mr = project.get_model_registry()

# get Hopsworks Model Serving handle
ms = project.get_model_serving()

# retrieve the model whose deployments you want to list
my_model = mr.get_model("my_model", version=1)

list_deployments = ms.get_deployments(my_model)

for deployment in list_deployments:
    print(deployment.get_state())

PARAMETER	DESCRIPTION
`model`	Filter by the model served in the deployments. TYPE: `Model` DEFAULT: `None`
`status`	Filter by the status of the deployments. TYPE: `str` DEFAULT: `None`

RETURNS	DESCRIPTION
`list[Deployment]`	A list of deployments.

RAISES	DESCRIPTION
`hopsworks.client.exceptions.RestAPIError`	If unable to retrieve deployments from model serving.

get_inference_endpoints #

get_inference_endpoints() -> list[InferenceEndpoint]

Get all inference endpoints available in the current project.

RETURNS	DESCRIPTION
`list[InferenceEndpoint]`	Inference endpoints available for model inference.
