hsml.scaling_config #

ComponentScalingConfig #

Bases: ABC

Scaling configuration for a predictor or transformer.

max_instances `property` `writable` #

Maximum number of instances to scale to. Maximum allowed is configured in the cluster settings by the cluster administrator. Must be at least 1 and greater than or equal to min_instances.

min_instances `property` `writable` #

min_instances: int

Minimum number of instances to scale to. For deployments using kserve, this must be set to 0 to enable scaling to zero. Default is 0 for deployments using kserve and 1 for deployments not using kserve.
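The constraints on `min_instances` and `max_instances` described above can be sketched as a small standalone check. This is only an illustration of the documented rules, not the library's own validation; the helper name `instance_bound_problems` is hypothetical, and the non-negativity check on `min_instances` is an assumption that the text does not state explicitly.

```python
from typing import List


def instance_bound_problems(min_instances: int, max_instances: int) -> List[str]:
    """Return human-readable problems with the given bounds (empty if valid).

    Hypothetical helper mirroring the documented rules; hsml may validate differently.
    """
    problems = []
    if max_instances < 1:
        problems.append("max_instances must be at least 1")
    if max_instances < min_instances:
        problems.append("max_instances must be >= min_instances")
    if min_instances < 0:
        # Assumption: negative instance counts are never meaningful.
        problems.append("min_instances must not be negative")
    return problems


print(instance_bound_problems(min_instances=0, max_instances=3))
print(instance_bound_problems(min_instances=4, max_instances=2))
```

The cluster-wide maximum set by the administrator is not modeled here, since its value is only known on the cluster.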

panic_threshold_percentage `property` `writable` #

The percentage of the scale metric threshold that, when exceeded during the panic window, will trigger a scale-up event. Min is 1. Max is 200. Default is 200.

panic_window_percentage `property` `writable` #

The percentage of the stable window to use as the panic window during high load situations. Min is 1. Max is 100. Default is 10.
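To make the two panic settings concrete, the arithmetic they imply might look like the sketch below. It assumes the panic window is simply the stated percentage of `stable_window_seconds`, and that the panic threshold is the stated percentage of `target`; both function names are hypothetical and are not part of hsml.

```python
def panic_window_seconds(stable_window_seconds: float,
                         panic_window_percentage: float) -> float:
    """Length of the panic window, as a percentage of the stable window."""
    if not 1 <= panic_window_percentage <= 100:
        raise ValueError("panic_window_percentage must be between 1 and 100")
    return stable_window_seconds * panic_window_percentage / 100.0


def panic_threshold(target: float, panic_threshold_percentage: float) -> float:
    """Metric value above which a scale-up is triggered in the panic window.

    Assumption: the 'scale metric threshold' in the description is `target`.
    """
    if not 1 <= panic_threshold_percentage <= 200:
        raise ValueError("panic_threshold_percentage must be between 1 and 200")
    return target * panic_threshold_percentage / 100.0


# A 60-second stable window with the default 10% gives a 6-second panic window.
print(panic_window_seconds(60, 10))   # 6.0
# A target of 100 with the default 200% gives a panic threshold of 200.
print(panic_threshold(100, 200))      # 200.0
```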

scale_metric `property` `writable` #

The metric to use for scaling. Can be either 'CONCURRENCY' or 'RPS'.

scale_to_zero_retention_seconds `property` `writable` #

The amount of time in seconds the last instance must be kept before being scaled down to zero. Default is 0.

stable_window_seconds `property` `writable` #

The interval in seconds over which to calculate the average metric. Larger values result in smoother scaling but slower reaction times. Min is 1 second. Max is 3600 seconds.

target `property` `writable` #

Target value for the selected scaling metric that the autoscaler should try to maintain during the stable window. For RPS, this is the number of requests per second. For CONCURRENCY, this is the number of concurrent requests.
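As a rough illustration of how `target` interacts with the instance bounds, the sketch below uses a simplified model: the desired instance count is the observed total metric divided by the target, rounded up and clamped to `[min_instances, max_instances]`. This is an assumption about the autoscaler's general behavior, not a description of its exact algorithm, and the function name `desired_instances` is hypothetical.

```python
import math


def desired_instances(observed_total: float, target: float,
                      min_instances: int, max_instances: int) -> int:
    """Simplified estimate of how many instances keep each one near `target`."""
    if target <= 0:
        raise ValueError("target must be positive")
    # Enough instances so that the average load per instance is at most `target`.
    raw = math.ceil(observed_total / target)
    # Never go below min_instances or above max_instances.
    return max(min_instances, min(max_instances, raw))


# 250 concurrent requests with a target of 100 per instance needs 3 instances.
print(desired_instances(250, 100, min_instances=1, max_instances=5))   # 3
# With no traffic and min_instances=0, the estimate drops to zero.
print(desired_instances(0, 100, min_instances=0, max_instances=5))     # 0
```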

describe #

describe()

Print a JSON description of the scaling configuration.

get_default_scaling_configuration `staticmethod` #

get_default_scaling_configuration(
    serving_tool: str,
    min_instances: int | None,
    component_type: str = "predictor",
) -> ComponentScalingConfig

Get the default scaling configuration based on the serving tool and the minimum number of instances.

PARAMETER	TYPE	DEFAULT	DESCRIPTION
`serving_tool`	`str`	required	The serving tool to use (e.g. kserve).
`min_instances`	`int | None`	required	Minimum number of instances, or None to use the default.
`component_type`	`str`	`'predictor'`	The component type (predictor or transformer).

RETURNS	DESCRIPTION
`ComponentScalingConfig`	The default scaling configuration for the given serving tool.
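The defaulting rule for `min_instances` (0 for deployments using kserve, 1 otherwise, unless a value is passed) can be sketched as follows. This is a standalone illustration, not the body of `get_default_scaling_configuration`; the helper name `resolve_min_instances`, the case-insensitive comparison, and the `"other-tool"` string are assumptions made for the example.

```python
from typing import Optional


def resolve_min_instances(serving_tool: str, min_instances: Optional[int]) -> int:
    """Pick the effective min_instances following the documented defaults."""
    if min_instances is not None:
        # An explicit value always wins over the default.
        return min_instances
    # Assumption: the serving tool name is compared case-insensitively.
    return 0 if serving_tool.lower() == "kserve" else 1


print(resolve_min_instances("kserve", None))      # 0
print(resolve_min_instances("other-tool", None))  # 1
print(resolve_min_instances("kserve", 2))         # 2
```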

PredictorScalingConfig #

Bases: ComponentScalingConfig

Scaling configuration for a predictor.

ScaleMetric #

Bases: Enum

Scaling metric for a predictor or transformer. Can be either 'CONCURRENCY' or 'RPS'.

TransformerScalingConfig #

Bases: ComponentScalingConfig

Scaling configuration for a transformer.
