hsml.deployment #

Deployment #

NOT_FOUND_ERROR_CODE `class-attribute` `instance-attribute` #

NOT_FOUND_ERROR_CODE = 240000

Metadata object representing a deployment in Model Serving.

api_protocol `property` `writable` #

API protocol enabled in the deployment (e.g., HTTP or GRPC).

artifact_files_path `property` #

Path of the artifact files deployed by the predictor.

artifact_path `property` #

Path of the model artifact deployed by the predictor.

Deprecated

Artifact versions are deprecated in favor of deployment versions.

artifact_version `property` `writable` #

Artifact version deployed by the predictor.

Deprecated

Artifact versions are deprecated in favor of deployment versions.

config_file `property` `writable` #

Model server configuration file passed to the model deployment.

It can be accessed via CONFIG_FILE_PATH environment variable from a predictor or transformer script. For LLM deployments without a predictor script, this file is used to configure the vLLM engine.

created_at `property` #

Created at date of the predictor.

creator `property` #

Creator of the predictor.

description `property` `writable` #

Description of the deployment.

env_vars `property` `writable` #

Environment variables of the predictor.

environment `property` `writable` #

Name of inference environment.

has_model `property` #

Whether the deployment has a model associated.

id `property` #

Id of the deployment.

inference_batcher `property` `writable` #

Configuration of the inference batcher attached to this predictor.

inference_logger `property` `writable` #

Configuration of the inference logger attached to this predictor.

model_name `property` `writable` #

Name of the model deployed by the predictor.

model_path `property` `writable` #

Model path deployed by the predictor.

model_registry_id `property` `writable` #

Model Registry Id of the deployment.

model_server `property` `writable` #

Model server ran by the predictor.

model_version `property` `writable` #

Model version deployed by the predictor.

name `property` `writable` #

Name of the deployment.

predictor `property` `writable` #

Predictor used in the deployment.

project_name `property` `writable` #

Name of the project the deployment belongs to.

project_namespace `property` `writable` #

Name of the Kubernetes namespace the project is in.

requested_instances `property` #

Total number of requested instances in the deployment.

resources `property` `writable` #

Resource configuration for the predictor.

scaling_configuration `property` `writable` #

Scaling configuration for the deployment.

script_file `property` `writable` #

Script file used by the predictor.

serving_tool `property` `writable` #

Serving tool used to run the model server.

transformer `property` `writable` #

Transformer configured in the predictor.

version `property` #

Version of the deployment.

delete #

delete(force: bool = False)

Delete the deployment.

PARAMETER	DESCRIPTION
`force`	Force the deletion of the deployment. If the deployment is running, it will be stopped and deleted automatically. TYPE: `bool` DEFAULT: `False`

Warning

A call to this method does not ask for a second confirmation.

RAISES	DESCRIPTION
`hopsworks.client.exceptions.RestAPIError`	In case the backend encounters an issue.

describe #

describe()

Print a JSON description of the deployment.

download_artifact_files #

download_artifact_files(local_path: str | None = None)

Download the artifact files served by the deployment.

PARAMETER	DESCRIPTION
`local_path`	Path where to download the artifact files in the local filesystem. TYPE: `str \| None` DEFAULT: `None`

RAISES	DESCRIPTION
`hopsworks.client.exceptions.RestAPIError`	In case the backend encounters an issue.

get_endpoint_url #

get_endpoint_url() -> str | None

Get the base endpoint URL for this deployment.

Returns the base URL that can be used with external HTTP clients. This is the path-based routing base endpoint without any protocol-specific suffixes like :predict or /v1.

If Istio client is not available, returns None.

RETURNS	DESCRIPTION
`str \| None`	Base endpoint URL, or `None` if unavailable.

Examples:

deployment = ms.get_deployment("my_deployment")
url = deployment.get_endpoint_url()
# url = "https://host:port/v1/project/name"

get_inference_url #

get_inference_url() -> str | None

Get the KServe inference URL for standard model deployments.

Returns the full URL with :predict suffix for KServe inference protocol. This method only returns a URL for standard model deployments (non-vLLM, with a model attached).

If Istio client is not available, falls back to Hopsworks REST API path.

RETURNS	DESCRIPTION
`str \| None`	Inference URL with `:predict` suffix, or `None` if not a standard model deployment.

Examples:

deployment = ms.get_deployment("my_deployment")
url = deployment.get_inference_url()
# Use with any HTTP client
import requests
response = requests.post(url, json={"instances": [[1, 2, 3]]})

get_logs #

get_logs(component: str = 'predictor', tail: int = 10)

Prints the deployment logs of the predictor or transformer.

.. note:: Legacy: this method prints to stdout and returns None. New code (and any agent / scripted use) should call meth:read_logs for a string return value or meth:tail_logs for incremental streaming.

PARAMETER	DESCRIPTION
`component`	Deployment component to get the logs from (e.g., predictor or transformer). TYPE: `str` DEFAULT: `'predictor'`
`tail`	Number of most recent lines to retrieve from the logs. TYPE: `int` DEFAULT: `10`

RAISES	DESCRIPTION
`hopsworks.client.exceptions.RestAPIError`	In case the backend encounters an issue.

get_model #

get_model()

Retrieve the metadata object for the model being used by this deployment.

get_openai_url #

get_openai_url() -> str | None

Get the OpenAI-compatible API URL for vLLM deployments.

Returns the URL for OpenAI-compatible API endpoints (e.g., /v1/chat/completions). This method only returns a URL for LLM (vLLM) deployments.

RETURNS	DESCRIPTION
`str \| None`	OpenAI-compatible URL (base URL + "/v1"), or `None` if not a LLM deployment.

Examples:

deployment = ms.get_deployment("my_llm_deployment")
url = deployment.get_openai_url()
# url = "https://host:port/v1/project/name/v1"
# Then use: url + "/chat/completions"

get_state #

get_state() -> PredictorState

Get the current state of the deployment.

RETURNS	DESCRIPTION
`PredictorState`	The state of the deployment.

RAISES	DESCRIPTION
`hopsworks.client.exceptions.RestAPIError`	In case the backend encounters an issue.

get_url #

get_url()

Get url to the deployment in Hopsworks.

is_created #

is_created() -> bool

Check whether the deployment is created.

RETURNS	DESCRIPTION
`bool`	Whether the deployment is created or not.

RAISES	DESCRIPTION
`hopsworks.client.exceptions.RestAPIError`	In case the backend encounters an issue.

is_running #

is_running(
    or_idle: bool = True, or_updating: bool = True
) -> bool

Check whether the deployment is ready to handle inference requests.

PARAMETER	DESCRIPTION
`or_idle`	Whether the idle state is considered as running (default is True). TYPE: `bool` DEFAULT: `True`
`or_updating`	Whether the updating state is considered as running (default is True). TYPE: `bool` DEFAULT: `True`

RETURNS	DESCRIPTION
`bool`	Whether the deployment is ready or not.

RAISES	DESCRIPTION
`hopsworks.client.exceptions.RestAPIError`	In case the backend encounters an issue.

is_stopped #

is_stopped(or_created: bool = True) -> bool

Check whether the deployment is stopped.

PARAMETER	DESCRIPTION
`or_created`	Whether the creating and created state is considered as stopped (default is True). TYPE: `bool` DEFAULT: `True`

RETURNS	DESCRIPTION
`bool`	Whether the deployment is stopped or not.

RAISES	DESCRIPTION
`hopsworks.client.exceptions.RestAPIError`	In case the backend encounters an issue.

predict #

predict(
    data: dict | InferInput = None,
    inputs: list | dict = None,
) -> dict

Send inference requests to the deployment.

One of data or inputs parameters must be set. If both are set, inputs will be ignored.

PARAMETER	DESCRIPTION
`data`	Payload dictionary for the inference request including the model input(s). TYPE: `dict \| InferInput` DEFAULT: `None`
`inputs`	Model inputs used in the inference requests. TYPE: `list \| dict` DEFAULT: `None`

RETURNS	DESCRIPTION
`dict`	Inference response.

RAISES	DESCRIPTION
`hopsworks.client.exceptions.RestAPIError`	In case the backend encounters an issue.

Examples:

# login into Hopsworks using hopsworks.login()

# get Hopsworks Model Serving handle
ms = project.get_model_serving()

# retrieve deployment by name
my_deployment = ms.get_deployment("my_deployment")

# (optional) retrieve model input example
my_model = project.get_model_registry()                                .get_model(my_deployment.model_name, my_deployment.model_version)

# make predictions using model inputs (single or batch)
predictions = my_deployment.predict(inputs=my_model.input_example)

# or using more sophisticated inference request payloads
data = { "instances": [ my_model.input_example ], "key2": "value2" }
predictions = my_deployment.predict(data)

read_logs #

read_logs(
    component: str = "predictor",
    tail: int = 100,
    source: str = "opensearch",
    since: str | None = None,
    until: str | None = None,
    pod: str | None = None,
) -> str

Return deployment logs as a single plain-text string.

Programmatic counterpart to meth:get_logs. Suitable for agents and scripts: never prints, never short-circuits on deployment state. The default source="opensearch" reads the project's serving index and works for stopped or restarted deployments — meth:get_logs only reads live pod stdout and returns None when the deployment isn't running.

PARAMETER	DESCRIPTION
`component`	`predictor` or `transformer`. TYPE: `str` DEFAULT: `'predictor'`
`tail`	Most-recent lines to retrieve. Capped server-side. TYPE: `int` DEFAULT: `100`
`source`	`opensearch` (historical, default) or `kubernetes` (live pod-tailing; only works while running). TYPE: `str` DEFAULT: `'opensearch'`
`since`	ISO-8601 lower bound on log timestamp. Ignored on the Kubernetes path. TYPE: `str \| None` DEFAULT: `None`
`until`	ISO-8601 upper bound on log timestamp. Ignored on the Kubernetes path. TYPE: `str \| None` DEFAULT: `None`
`pod`	Restrict to one instance / container name. TYPE: `str \| None` DEFAULT: `None`

RETURNS	DESCRIPTION
`str`	The joined logs as plain text. Empty string when there are no
`str`	matching lines; `==> <instance> <==\\n` block headers when
`str`	multiple instances are present.

restart #

restart(
    await_stopped: int | None = 600,
    await_running: int | None = 600,
) -> None

Restart the deployment so it picks up the latest code and environment state.

If the deployment is already stopped, it is started in place.

PARAMETER	DESCRIPTION
`await_stopped`	Awaiting time (seconds) for the deployment to stop. TYPE: `int \| None` DEFAULT: `600`
`await_running`	Awaiting time (seconds) for the deployment to start again. TYPE: `int \| None` DEFAULT: `600`

RAISES	DESCRIPTION
`hopsworks.client.exceptions.RestAPIError`	In case the backend encounters an issue.

save #

save(await_update: int | None = 600)

Persist this deployment including the predictor and metadata to Model Serving.

PARAMETER	DESCRIPTION
`await_update`	If the deployment is running, awaiting time (seconds) for the running instances to be updated. If the running instances are not updated within this timespan, the call to this method returns while the update in the background. TYPE: `int \| None` DEFAULT: `600`

RAISES	DESCRIPTION
`hopsworks.client.exceptions.RestAPIError`	In case the backend encounters an issue.

start #

start(await_running: int | None = 600)

Start the deployment.

PARAMETER	DESCRIPTION
`await_running`	Awaiting time (seconds) for the deployment to start. If the deployment has not started within this timespan, the call to this method returns while it deploys in the background. TYPE: `int \| None` DEFAULT: `600`

RAISES	DESCRIPTION
`hopsworks.client.exceptions.RestAPIError`	In case the backend encounters an issue.

stop #

stop(await_stopped: int | None = 600)

Stop the deployment.

PARAMETER	DESCRIPTION
`await_stopped`	Awaiting time (seconds) for the deployment to stop. If the deployment has not stopped within this timespan, the call to this method returns while it stopping in the background. TYPE: `int \| None` DEFAULT: `600`

RAISES	DESCRIPTION
`hopsworks.client.exceptions.RestAPIError`	In case the backend encounters an issue.

tail_logs #

tail_logs(
    component: str = "predictor",
    interval: float = 2.0,
    source: str = "opensearch",
    since: str | None = "now",
    timeout: float | None = None,
    stop_on_status: str | None = None,
) -> Iterator[str]

Yield only newly observed log chunks as plain text.

Client-side polling, not server-streaming: each tick calls meth:read_logs with a moving cursor and yields the portion not already seen. Deduplication uses the OpenSearch timestamp + doc_id pair; a content-hash fallback covers the Kubernetes path.

Example::

for chunk in dep.tail_logs(timeout=120):
    print(chunk, end="")

PARAMETER	DESCRIPTION
`component`	`predictor` or `transformer`. TYPE: `str` DEFAULT: `'predictor'`
`interval`	Seconds between polls. TYPE: `float` DEFAULT: `2.0`
`source`	`opensearch` (default) or `kubernetes`. TYPE: `str` DEFAULT: `'opensearch'`
`since`	`"now"` to start from the current instant (default), or an ISO-8601 timestamp to replay from a specific point. TYPE: `str \| None` DEFAULT: `'now'`
`timeout`	Stop after this many seconds. `None` runs forever. TYPE: `float \| None` DEFAULT: `None`
`stop_on_status`	Stop when `deployment.get_state().status` matches this string (e.g. `"Stopped"`). TYPE: `str \| None` DEFAULT: `None`

YIELDS	DESCRIPTION
`str`	Plain-text log chunks containing only newly observed content.

hsml.deployment #

Deployment #

NOT_FOUND_ERROR_CODE class-attribute instance-attribute #

api_protocol property writable #

artifact_files_path property #

artifact_path property #

artifact_version property writable #

config_file property writable #

created_at property #

creator property #

description property writable #

env_vars property writable #

environment property writable #

has_model property #

id property #

inference_batcher property writable #

inference_logger property writable #

model_name property writable #

model_path property writable #

model_registry_id property writable #

model_server property writable #

model_version property writable #

name property writable #

predictor property writable #

project_name property writable #

project_namespace property writable #

requested_instances property #

resources property writable #

scaling_configuration property writable #

script_file property writable #

serving_tool property writable #

transformer property writable #

version property #

delete #

describe #

download_artifact_files #

get_endpoint_url #

get_inference_url #

get_logs #

get_model #

get_openai_url #

get_state #

get_url #

is_created #

is_running #

is_stopped #

predict #

read_logs #

restart #

save #

start #

stop #

tail_logs #

NOT_FOUND_ERROR_CODE `class-attribute` `instance-attribute` #

api_protocol `property` `writable` #

artifact_files_path `property` #

artifact_path `property` #

artifact_version `property` `writable` #

config_file `property` `writable` #

created_at `property` #

creator `property` #

description `property` `writable` #

env_vars `property` `writable` #

environment `property` `writable` #

has_model `property` #

id `property` #

inference_batcher `property` `writable` #

inference_logger `property` `writable` #

model_name `property` `writable` #

model_path `property` `writable` #

model_registry_id `property` `writable` #

model_server `property` `writable` #

model_version `property` `writable` #

name `property` `writable` #

predictor `property` `writable` #

project_name `property` `writable` #

project_namespace `property` `writable` #

requested_instances `property` #

resources `property` `writable` #

scaling_configuration `property` `writable` #

script_file `property` `writable` #

serving_tool `property` `writable` #

transformer `property` `writable` #

version `property` #