hsml.scaling_config #

ComponentScalingConfig #

Bases: ABC

Scaling configuration for a predictor or transformer.

max_instances property writable #

Maximum number of instances to scale to. Maximum allowed is configured in the cluster settings by the cluster administrator. Must be at least 1 and greater than or equal to min_instances.

min_instances property writable #

min_instances: int

Minimum number of instances to scale to. For deployments using KServe, this must be set to 0 to enable scaling to zero. Default is 0 for deployments using KServe and 1 for deployments not using KServe.
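The constraints on min_instances and max_instances described above can be sketched as a small standalone check. This is an illustrative sketch, not the library's validation code: the helper names `validate_instance_bounds` and `can_scale_to_zero` are hypothetical, the rejection of negative values is an assumption, and the cluster-wide maximum set by the administrator is not modeled.

```python
def validate_instance_bounds(min_instances: int, max_instances: int) -> None:
    """Raise ValueError if the documented bounds are violated (hypothetical helper)."""
    # Assumption: negative instance counts are never meaningful.
    if min_instances < 0:
        raise ValueError("min_instances must not be negative")
    # Documented: max_instances must be at least 1 ...
    if max_instances < 1:
        raise ValueError("max_instances must be at least 1")
    # ... and greater than or equal to min_instances.
    if max_instances < min_instances:
        raise ValueError("max_instances must be >= min_instances")


def can_scale_to_zero(min_instances: int, uses_kserve: bool) -> bool:
    """Scale-to-zero requires a KServe deployment with min_instances == 0."""
    return uses_kserve and min_instances == 0
```

For example, `validate_instance_bounds(0, 4)` passes silently, while `validate_instance_bounds(3, 2)` raises a `ValueError`.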

panic_threshold_percentage property writable #

The percentage of the scale metric threshold that, when exceeded during the panic window, will trigger a scale-up event. Min is 1. Max is 200. Default is 200.

panic_window_percentage property writable #

The percentage of the stable window to use as the panic window during high load situations. Min is 1. Max is 100. Default is 10.
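The two panic settings are both percentages, so a short worked calculation may help. The sketch below assumes that the "scale metric threshold" is the configured target value and that the percentages are applied by simple multiplication. The function names and the sample values (a 60-second stable window, a target of 100) are illustrative only.

```python
def panic_window_seconds(stable_window_seconds: float,
                         panic_window_percentage: float) -> float:
    """Length of the panic window, as a percentage of the stable window."""
    if not 1 <= panic_window_percentage <= 100:
        raise ValueError("panic_window_percentage must be between 1 and 100")
    return stable_window_seconds * panic_window_percentage / 100.0


def panic_threshold(target: float, panic_threshold_percentage: float) -> float:
    """Metric value above which a scale-up is triggered during the panic window.

    Assumption: the threshold is a percentage of the configured target.
    """
    if not 1 <= panic_threshold_percentage <= 200:
        raise ValueError("panic_threshold_percentage must be between 1 and 200")
    return target * panic_threshold_percentage / 100.0


# Illustrative values: 60 s stable window with the default 10% panic window,
# and a target of 100 with the default 200% panic threshold.
window = panic_window_seconds(60, 10)   # 6.0 seconds
threshold = panic_threshold(100, 200)   # 200.0
```

Under these assumptions, the default settings mean the autoscaler reacts when the metric averaged over a tenth of the stable window exceeds twice the target.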

scale_metric property writable #

The metric to use for scaling. Can be either 'CONCURRENCY' or 'RPS'.

scale_to_zero_retention_seconds property writable #

Time in seconds to keep the last instance running before scaling down to zero. Default is 0.

stable_window_seconds property writable #

The interval in seconds over which to calculate the average metric. Larger values result in smoother scaling but slower reaction times. Min is 1 second. Max is 3600 seconds.

target property writable #

Target value for the selected scaling metric that the autoscaler should try to maintain during the stable window. For RPS, this is the number of requests per second. For CONCURRENCY, this is the number of concurrent requests.
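To make the role of target concrete, the sketch below estimates how many instances would be needed to keep the per-instance metric at or below the target. It assumes a simplified rule (total observed load divided by target, rounded up, then clamped to the configured bounds). The actual autoscaler may behave differently, and `desired_instances` is a hypothetical helper.

```python
import math


def desired_instances(observed_total: float, target: float,
                      min_instances: int, max_instances: int) -> int:
    """Estimate the instance count that keeps per-instance load at the target.

    observed_total is the total load averaged over the stable window:
    requests per second for RPS, in-flight requests for CONCURRENCY.
    """
    if target <= 0:
        raise ValueError("target must be positive")
    # Simplified rule (assumption): enough instances so load / instances <= target.
    raw = math.ceil(observed_total / target)
    # Never go below min_instances or above max_instances.
    return max(min_instances, min(max_instances, raw))
```

With a target of 100 RPS and 450 observed requests per second, this rule yields 5 instances, or 4 if max_instances is 4. With no load, it falls back to min_instances.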

describe #

describe()

Print a JSON description of the scaling configuration.

get_default_scaling_configuration staticmethod #

get_default_scaling_configuration(
    serving_tool: str,
    min_instances: int | None,
    component_type: str = "predictor",
) -> ComponentScalingConfig

Get the default scaling configuration based on the serving tool and number of instances.

PARAMETER DESCRIPTION
serving_tool (TYPE: str): The serving tool to use (e.g. kserve).
min_instances (TYPE: int | None): Minimum number of instances, or None to use the default.
component_type (TYPE: str, DEFAULT: 'predictor'): The component type (predictor or transformer).

RETURNS DESCRIPTION
ComponentScalingConfig: The default scaling configuration for the given serving tool.
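The way min_instances falls back to a default when None is passed can be sketched without the library. The helper below only mirrors the documented defaults (0 with kserve, 1 otherwise). `resolve_min_instances` is a hypothetical name, and the exact serving-tool strings the library accepts, including their letter case, are an assumption here.

```python
from typing import Optional


def resolve_min_instances(serving_tool: str, min_instances: Optional[int]) -> int:
    """Return the effective min_instances, applying the documented defaults."""
    # An explicit value always wins over the default.
    if min_instances is not None:
        return min_instances
    # Assumption: the serving tool is compared case-insensitively.
    return 0 if serving_tool.lower() == "kserve" else 1
```

For instance, `resolve_min_instances("kserve", None)` gives 0, while passing any other serving tool name with None gives 1.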

PredictorScalingConfig #

Bases: ComponentScalingConfig

Scaling configuration for a predictor.

ScaleMetric #

Bases: Enum

Scaling metric for a predictor or transformer. Can be either 'CONCURRENCY' or 'RPS'.
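A two-valued enum like this one is often paired with a small parser for user-supplied strings. The following is a standalone sketch, not the library's class: `ScaleMetricSketch` and `parse_scale_metric` are hypothetical names, and accepting lowercase input is an assumption.

```python
from enum import Enum


class ScaleMetricSketch(Enum):
    """Stand-in mirroring the two documented metric names."""
    CONCURRENCY = "CONCURRENCY"
    RPS = "RPS"


def parse_scale_metric(value: str) -> ScaleMetricSketch:
    """Convert a string such as 'rps' into the matching enum member."""
    try:
        # Look up by member name after normalizing whitespace and case.
        return ScaleMetricSketch[value.strip().upper()]
    except KeyError:
        raise ValueError(
            f"scale metric must be 'CONCURRENCY' or 'RPS', got {value!r}"
        ) from None
```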

TransformerScalingConfig #

Bases: ComponentScalingConfig

Scaling configuration for a transformer.