
hsml.scaling_config #

ComponentScalingConfig #

Bases: ABC

Scaling configuration for a predictor or transformer.

scale_metric property writable #

The metric to use for scaling. Can be either 'CONCURRENCY' or 'RPS'.

target property writable #

Target value for the selected scaling metric that the autoscaler tries to maintain over the stable window. For RPS, this is the number of requests per second. For CONCURRENCY, this is the number of concurrent requests.
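To make the role of target concrete, here is a minimal sketch. It assumes a Knative-style autoscaler in which the target applies per instance, which this page does not state. The function name `desired_instances` is hypothetical and not part of hsml.

```python
import math


def desired_instances(observed: float, target: int) -> int:
    """Instances needed so that each one carries at most `target` units of load.

    Assumption: the target is a per-instance value, as in Knative-style autoscalers.
    """
    if target <= 0:
        raise ValueError("target must be positive")
    return math.ceil(observed / target)


# 45 concurrent requests with a per-instance target of 10
print(desired_instances(45, 10))  # 5
```

In a real deployment the result would also be bounded by min_instances and max_instances.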

min_instances property writable #

min_instances: int

Minimum number of instances to scale to. For deployments using KServe, set this to 0 to enable scaling to zero. The default is 0 for deployments using KServe and 1 otherwise.

max_instances property writable #

Maximum number of instances to scale to. The upper limit is set by the cluster administrator in the cluster settings. Must be at least 1 and greater than or equal to min_instances.
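The constraints on max_instances can be expressed as a small check. This is only a sketch of the documented rules, not hsml's own validation. The helper `check_instance_bounds` and its `cluster_max` parameter are hypothetical stand-ins for the administrator-configured limit.

```python
from typing import Optional


def check_instance_bounds(
    min_instances: int,
    max_instances: Optional[int] = None,
    cluster_max: Optional[int] = None,  # hypothetical: limit set by the cluster admin
) -> None:
    """Raise ValueError if the documented max_instances rules are violated."""
    if max_instances is None:
        return  # nothing to check when no maximum is given
    if max_instances < 1:
        raise ValueError("max_instances must be at least 1")
    if max_instances < min_instances:
        raise ValueError("max_instances must be >= min_instances")
    if cluster_max is not None and max_instances > cluster_max:
        raise ValueError("max_instances exceeds the cluster limit")


check_instance_bounds(min_instances=0, max_instances=3)  # passes silently
```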

panic_window_percentage property writable #

The percentage of the stable window to use as the panic window during high load situations. Min is 1. Max is 100. Default is 10.

panic_threshold_percentage property writable #

The percentage of the scale metric threshold that, when exceeded during the panic window, will trigger a scale-up event. Min is 1. Max is 200. Default is 200.
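The two panic percentages are easier to read once converted to absolute values. The sketch below assumes that "scale metric threshold" refers to the target value, and uses an illustrative 60-second stable window. Both helper names are hypothetical and not part of hsml.

```python
def panic_window_seconds(stable_window_seconds: int, panic_window_percentage: float) -> float:
    """Length of the panic window as a share of the stable window."""
    return stable_window_seconds * panic_window_percentage / 100.0


def panic_threshold(target: int, panic_threshold_percentage: float) -> float:
    """Metric value above which panic scaling would start.

    Assumption: the threshold is a percentage of the target value.
    """
    return target * panic_threshold_percentage / 100.0


# Illustrative 60 s stable window with the default percentages (10 and 200)
print(panic_window_seconds(60, 10))  # 6.0
print(panic_threshold(10, 200))      # 20.0
```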

stable_window_seconds property writable #

The interval in seconds over which to calculate the average metric. Larger values result in smoother scaling but slower reaction times. Min is 1 second. Max is 3600 seconds.
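The trade-off between smoothness and reaction time can be illustrated with a simple sliding-window average. This is a sketch only: the class `WindowAverage` is hypothetical and makes no claim about how the autoscaler actually aggregates samples.

```python
from collections import deque


class WindowAverage:
    """Average of the samples observed within the last `window_seconds`."""

    def __init__(self, window_seconds: int):
        self.window_seconds = window_seconds
        self._samples = deque()  # (timestamp, value) pairs, oldest first

    def add(self, timestamp: float, value: float) -> None:
        self._samples.append((timestamp, value))
        # Drop samples that have fallen out of the window.
        while self._samples[0][0] <= timestamp - self.window_seconds:
            self._samples.popleft()

    def average(self) -> float:
        if not self._samples:
            return 0.0
        return sum(v for _, v in self._samples) / len(self._samples)


# The same load spike seen through a long and a short window
long_window, short_window = WindowAverage(4), WindowAverage(1)
for t, load in enumerate([10, 10, 10, 100]):
    long_window.add(t, load)
    short_window.add(t, load)

print(long_window.average())   # 32.5
print(short_window.average())  # 100.0
```

The longer window dampens the spike, while the shorter one reflects it immediately.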

scale_to_zero_retention_seconds property writable #

The amount of time in seconds the last instance must be kept before being scaled down to zero. Default is 0.

__init__ #

__init__(
    min_instances: int,
    max_instances: int | None = None,
    scale_metric: ScaleMetric | str | Default | None = None,
    target: int | None = None,
    panic_window_percentage: float | None = None,
    panic_threshold_percentage: float | None = None,
    stable_window_seconds: int | None = None,
    scale_to_zero_retention_seconds: int | None = None,
    **kwargs,
)

Initialize a ComponentScalingConfig instance.

PARAMETER DESCRIPTION
min_instances

Minimum number of instances to scale to.

TYPE: int

max_instances

Maximum number of instances to scale to.

TYPE: int | None DEFAULT: None

scale_metric

Metric to use for scaling.

TYPE: ScaleMetric | str | Default | None DEFAULT: None

target

Target value for the selected scaling metric.

TYPE: int | None DEFAULT: None

panic_window_percentage

Percentage of the stable window to use as the panic window.

TYPE: float | None DEFAULT: None

panic_threshold_percentage

Percentage of the scale metric threshold to trigger scaling.

TYPE: float | None DEFAULT: None

stable_window_seconds

Interval in seconds for calculating the average metric.

TYPE: int | None DEFAULT: None

scale_to_zero_retention_seconds

Time in seconds to retain the last instance before scaling to zero.

TYPE: int | None DEFAULT: None

describe #

describe()

Print a JSON description of the scaling configuration.

get_default_scaling_configuration staticmethod #

get_default_scaling_configuration(
    serving_tool: str,
    min_instances: int | None,
    component_type: str = "predictor",
) -> ComponentScalingConfig

Get the default scaling configuration based on the serving tool and the minimum number of instances.

PARAMETER DESCRIPTION
serving_tool

The serving tool to use (e.g., kserve).

TYPE: str

min_instances

Minimum number of instances, or None to use the default.

TYPE: int | None

component_type

The component type (predictor or transformer).

TYPE: str DEFAULT: 'predictor'

RETURNS DESCRIPTION
ComponentScalingConfig

The default scaling configuration for the given serving tool.
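As a rough illustration of how the default could depend on the serving tool, the sketch below applies the defaults documented for min_instances (0 with kserve, 1 otherwise). The helper `resolve_min_instances` is hypothetical, and the exact serving-tool strings and their case handling are assumptions, not hsml behavior.

```python
from typing import Optional


def resolve_min_instances(serving_tool: str, min_instances: Optional[int] = None) -> int:
    """Return the given minimum, or the documented default for the serving tool."""
    if min_instances is not None:
        return min_instances
    # Assumption: the serving tool is identified by the string "kserve".
    return 0 if serving_tool.lower() == "kserve" else 1


print(resolve_min_instances("kserve"))     # 0
print(resolve_min_instances("default"))    # 1
print(resolve_min_instances("kserve", 2))  # 2
```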

PredictorScalingConfig #

Bases: ComponentScalingConfig

Scaling configuration for a predictor.

__init__ #

__init__(**kwargs)

Initialize a PredictorScalingConfig instance.

KEYWORD ARGUMENTS FOR THE PREDICTOR SCALING CONFIGURATION DESCRIPTION
min_instances

Minimum number of instances to scale to (required).

TYPE: int

max_instances

Maximum number of instances to scale to.

TYPE: int | None

scale_metric

Metric to use for scaling.

TYPE: ScaleMetric | str | Default | None

target

Target value for the selected scaling metric.

TYPE: int | None

panic_window_percentage

Percentage of the stable window to use as the panic window.

TYPE: float | None

panic_threshold_percentage

Percentage of the scale metric threshold to trigger scaling.

TYPE: float | None

stable_window_seconds

Interval in seconds for calculating the average metric.

TYPE: int | None

scale_to_zero_retention_seconds

Time in seconds to retain the last instance before scaling to zero.

TYPE: int | None

RAISES DESCRIPTION
ValueError

If min_instances is not provided.
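Because the constructor takes only keyword arguments, a missing min_instances cannot be caught by the signature and has to be checked at runtime. The stand-in class below sketches that pattern. It is not the hsml implementation, and the name `SketchPredictorScalingConfig` is hypothetical.

```python
class SketchPredictorScalingConfig:
    """Stand-in that mimics the documented 'min_instances is required' rule."""

    def __init__(self, **kwargs):
        if "min_instances" not in kwargs:
            raise ValueError("min_instances is required")
        self.min_instances = kwargs.pop("min_instances")
        self.max_instances = kwargs.pop("max_instances", None)
        self.options = kwargs  # remaining optional settings, kept as given


try:
    SketchPredictorScalingConfig(max_instances=3)
except ValueError as err:
    print(err)  # min_instances is required

config = SketchPredictorScalingConfig(min_instances=0, max_instances=3, target=10)
print(config.min_instances, config.max_instances, config.options)  # 0 3 {'target': 10}
```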

ScaleMetric #

Bases: Enum

Scaling metric for a predictor or transformer. Can be either 'CONCURRENCY' or 'RPS'.
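Since scale_metric accepts either an enum member or a string, a lookup by value is one way to normalize the input. The enum below is a stand-in with the two documented values. Its name, the helper `parse_scale_metric`, and the assumption that each member's value equals its name are all hypothetical.

```python
from enum import Enum
from typing import Union


class SketchScaleMetric(Enum):
    """Stand-in enum with the two documented metric names."""

    CONCURRENCY = "CONCURRENCY"
    RPS = "RPS"


def parse_scale_metric(value: Union[SketchScaleMetric, str]) -> SketchScaleMetric:
    """Accept an enum member or its string value; reject anything else."""
    if isinstance(value, SketchScaleMetric):
        return value
    return SketchScaleMetric(value)  # raises ValueError for unknown strings


print(parse_scale_metric("RPS"))                          # SketchScaleMetric.RPS
print(parse_scale_metric(SketchScaleMetric.CONCURRENCY))  # SketchScaleMetric.CONCURRENCY
```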

TransformerScalingConfig #

Bases: ComponentScalingConfig

Scaling configuration for a transformer.

__init__ #

__init__(**kwargs)

Initialize a TransformerScalingConfig instance.

KEYWORD ARGUMENTS FOR THE TRANSFORMER SCALING CONFIGURATION DESCRIPTION
min_instances

Minimum number of instances to scale to (required).

TYPE: int

max_instances

Maximum number of instances to scale to.

TYPE: int | None

scale_metric

Metric to use for scaling.

TYPE: ScaleMetric | str | Default | None

target

Target value for the selected scaling metric.

TYPE: int | None

panic_window_percentage

Percentage of the stable window to use as the panic window.

TYPE: float | None

panic_threshold_percentage

Percentage of the scale metric threshold to trigger scaling.

TYPE: float | None

stable_window_seconds

Interval in seconds for calculating the average metric.

TYPE: int | None

scale_to_zero_retention_seconds

Time in seconds to retain the last instance before scaling to zero.

TYPE: int | None

RAISES DESCRIPTION
ValueError

If min_instances is not provided.