hsml.scaling_config #

ComponentScalingConfig #

Bases: ABC

Scaling configuration for a predictor or transformer.

scale_metric `property` `writable` #

The metric to use for scaling. Can be either 'CONCURRENCY' or 'RPS'.

target `property` `writable` #

Target value for the selected scaling metric that the autoscaler tries to maintain during the stable window. For RPS, this is the number of requests per second. For CONCURRENCY, this is the number of concurrent requests.

min_instances `property` `writable` #

min_instances: int

Minimum number of instances to scale to. For deployments using KServe, this must be set to 0 to enable scaling to zero. The default is 0 for deployments using KServe and 1 for all other deployments.
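The default rule described above can be sketched in a few lines. This is only an illustration: `resolve_min_instances` is a hypothetical helper, not part of `hsml`, and the assumption that the serving tool is identified by the string `"kserve"` (compared case-insensitively) comes from the example in this page rather than from the library source.

```python
from typing import Optional


def resolve_min_instances(serving_tool: str, min_instances: Optional[int] = None) -> int:
    """Hypothetical helper: apply the documented default for min_instances."""
    # An explicit value always wins over the default.
    if min_instances is not None:
        return min_instances
    # Documented default: 0 for KServe deployments, 1 otherwise.
    # Assumption: the serving tool string is "kserve" in some letter case.
    return 0 if serving_tool.lower() == "kserve" else 1


kserve_default = resolve_min_instances("kserve")
other_default = resolve_min_instances("default")
explicit_value = resolve_min_instances("kserve", 3)
```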

max_instances `property` `writable` #

Maximum number of instances to scale to. Maximum allowed is configured in the cluster settings by the cluster administrator. Must be at least 1 and greater than or equal to min_instances.

panic_window_percentage `property` `writable` #

The percentage of the stable window to use as the panic window during high load situations. Min is 1. Max is 100. Default is 10.

panic_threshold_percentage `property` `writable` #

The percentage of the scale metric threshold that, when exceeded during the panic window, will trigger a scale-up event. Min is 1. Max is 200. Default is 200.

stable_window_seconds `property` `writable` #

The interval in seconds over which to calculate the average metric. Larger values result in smoother scaling but slower reaction times. Min is 1 second. Max is 3600 seconds.
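To make the relationship between the stable window, the panic window, and the panic threshold concrete, here is a small arithmetic sketch. The function names are hypothetical, and it assumes that the "scale metric threshold" referred to by `panic_threshold_percentage` is the configured `target`; the input values are purely illustrative.

```python
def panic_window_seconds(stable_window_seconds: int, panic_window_percentage: float) -> float:
    """Length of the panic window, taken as a percentage of the stable window."""
    return stable_window_seconds * panic_window_percentage / 100.0


def panic_threshold(target: float, panic_threshold_percentage: float) -> float:
    """Metric value that, when exceeded in the panic window, triggers a scale-up.

    Assumption: the threshold is a percentage of the configured target.
    """
    return target * panic_threshold_percentage / 100.0


# Illustrative inputs: a 60 s stable window with the default 10 % panic window,
# and a target of 50 with the default 200 % panic threshold.
print(panic_window_seconds(60, 10))  # 6.0
print(panic_threshold(50, 200))      # 100.0
```

Under these assumptions, a burst is detected when the metric averaged over the last 6 seconds exceeds 100, while regular scaling decisions use the 60-second average.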

scale_to_zero_retention_seconds `property` `writable` #

The amount of time in seconds the last instance must be kept before being scaled down to zero. Default is 0.
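The ranges documented for the properties above can be checked before a configuration is created. The following is a minimal sketch, not the validation performed by `hsml` itself: `check_scaling_values` and `VALID_SCALE_METRICS` are hypothetical names, and the cluster-level cap on `max_instances` is not modelled because it depends on cluster settings.

```python
from typing import List, Optional

# Metric names as documented for scale_metric.
VALID_SCALE_METRICS = {"CONCURRENCY", "RPS"}


def check_scaling_values(
    min_instances: int,
    max_instances: Optional[int] = None,
    scale_metric: Optional[str] = None,
    panic_window_percentage: Optional[float] = None,
    panic_threshold_percentage: Optional[float] = None,
    stable_window_seconds: Optional[int] = None,
) -> List[str]:
    """Hypothetical helper: return a list of problems; empty means all checks passed."""
    problems = []
    if max_instances is not None:
        if max_instances < 1:
            problems.append("max_instances must be at least 1")
        if max_instances < min_instances:
            problems.append("max_instances must be >= min_instances")
    if scale_metric is not None and scale_metric not in VALID_SCALE_METRICS:
        problems.append("scale_metric must be 'CONCURRENCY' or 'RPS'")
    if panic_window_percentage is not None and not 1 <= panic_window_percentage <= 100:
        problems.append("panic_window_percentage must be between 1 and 100")
    if panic_threshold_percentage is not None and not 1 <= panic_threshold_percentage <= 200:
        problems.append("panic_threshold_percentage must be between 1 and 200")
    if stable_window_seconds is not None and not 1 <= stable_window_seconds <= 3600:
        problems.append("stable_window_seconds must be between 1 and 3600")
    return problems


valid = check_scaling_values(0, 4, "RPS", 10, 200, 60)
invalid = check_scaling_values(2, 1, "LATENCY", 0, 250, 7200)
```

Values left as `None` are skipped, mirroring the optional parameters of the constructor.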

__init__ #

__init__(
    min_instances: int,
    max_instances: int | None = None,
    scale_metric: ScaleMetric | str | Default | None = None,
    target: int | None = None,
    panic_window_percentage: float | None = None,
    panic_threshold_percentage: float | None = None,
    stable_window_seconds: int | None = None,
    scale_to_zero_retention_seconds: int | None = None,
    **kwargs,
)

Initialize a ComponentScalingConfig instance.

PARAMETER	DESCRIPTION
`min_instances`	Minimum number of instances to scale to. TYPE: `int`
`max_instances`	Maximum number of instances to scale to. TYPE: `int \| None` DEFAULT: `None`
`scale_metric`	Metric to use for scaling. TYPE: `ScaleMetric \| str \| Default \| None` DEFAULT: `None`
`target`	Target value for the selected scaling metric. TYPE: `int \| None` DEFAULT: `None`
`panic_window_percentage`	Percentage of the stable window to use as the panic window. TYPE: `float \| None` DEFAULT: `None`
`panic_threshold_percentage`	Percentage of the scale metric threshold to trigger scaling. TYPE: `float \| None` DEFAULT: `None`
`stable_window_seconds`	Interval in seconds for calculating the average metric. TYPE: `int \| None` DEFAULT: `None`
`scale_to_zero_retention_seconds`	Time in seconds to retain the last instance before scaling to zero. TYPE: `int \| None` DEFAULT: `None`

describe #

describe()

Print a JSON description of the scaling configuration.

get_default_scaling_configuration `staticmethod` #

get_default_scaling_configuration(
    serving_tool: str,
    min_instances: int | None,
    component_type: str = "predictor",
) -> ComponentScalingConfig

Get the default scaling configuration based on the serving tool and number of instances.

PARAMETER	DESCRIPTION
`serving_tool`	Serving tool to use (e.g. kserve). TYPE: `str`
`min_instances`	Minimum number of instances, or `None` to use the default. TYPE: `int \| None`
`component_type`	Component type (predictor or transformer). TYPE: `str` DEFAULT: `'predictor'`

RETURNS	DESCRIPTION
`ComponentScalingConfig`	The default scaling configuration for the given serving tool.

PredictorScalingConfig #

Bases: ComponentScalingConfig

Scaling configuration for a predictor.

__init__ #

__init__(**kwargs)

Initialize a PredictorScalingConfig instance.

KEYWORD ARGUMENTS FOR THE PREDICTOR SCALING CONFIGURATION	DESCRIPTION
`min_instances`	Minimum number of instances to scale to (required). TYPE: `int`
`max_instances`	Maximum number of instances to scale to. TYPE: `int \| None`
`scale_metric`	Metric to use for scaling. TYPE: `ScaleMetric \| str \| Default \| None`
`target`	Target value for the selected scaling metric. TYPE: `int \| None`
`panic_window_percentage`	Percentage of the stable window to use as the panic window. TYPE: `float \| None`
`panic_threshold_percentage`	Percentage of the scale metric threshold to trigger scaling. TYPE: `float \| None`
`stable_window_seconds`	Interval in seconds for calculating the average metric. TYPE: `int \| None`
`scale_to_zero_retention_seconds`	Time in seconds to retain the last instance before scaling to zero. TYPE: `int \| None`

RAISES	DESCRIPTION
`ValueError`	If `min_instances` is not provided.
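Because the constructor takes keyword arguments only, it can be convenient to assemble them in a dictionary first. The sketch below uses a hypothetical helper, `build_scaling_kwargs`, that mirrors the documented requirement that `min_instances` be provided and drops options left as `None`. The `hsml` call itself is shown only in comments, since it needs the library installed, and the values are illustrative.

```python
from typing import Any, Dict, Optional


def build_scaling_kwargs(min_instances: Optional[int] = None, **optional: Any) -> Dict[str, Any]:
    """Hypothetical helper: collect keyword arguments for a scaling configuration."""
    # Mirror the documented behaviour: min_instances is required.
    if min_instances is None:
        raise ValueError("min_instances is required")
    kwargs: Dict[str, Any] = {"min_instances": min_instances}
    # Skip options that were left unset.
    kwargs.update({key: value for key, value in optional.items() if value is not None})
    return kwargs


predictor_kwargs = build_scaling_kwargs(
    min_instances=0,
    max_instances=4,
    scale_metric="RPS",
    target=50,
    stable_window_seconds=None,  # unset, so it is dropped from the result
)

# With hsml installed, the dictionary could then be passed on like this:
# from hsml.scaling_config import PredictorScalingConfig
# scaling_config = PredictorScalingConfig(**predictor_kwargs)
# scaling_config.describe()
```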

ScaleMetric #

Bases: Enum

Scaling metric for a predictor or transformer. Can be either 'CONCURRENCY' or 'RPS'.

TransformerScalingConfig #

Bases: ComponentScalingConfig

Scaling configuration for a transformer.

__init__ #

__init__(**kwargs)

Initialize a TransformerScalingConfig instance.

KEYWORD ARGUMENTS FOR THE TRANSFORMER SCALING CONFIGURATION	DESCRIPTION
`min_instances`	Minimum number of instances to scale to (required). TYPE: `int`
`max_instances`	Maximum number of instances to scale to. TYPE: `int \| None`
`scale_metric`	Metric to use for scaling. TYPE: `ScaleMetric \| str \| Default \| None`
`target`	Target value for the selected scaling metric. TYPE: `int \| None`
`panic_window_percentage`	Percentage of the stable window to use as the panic window. TYPE: `float \| None`
`panic_threshold_percentage`	Percentage of the scale metric threshold to trigger scaling. TYPE: `float \| None`
`stable_window_seconds`	Interval in seconds for calculating the average metric. TYPE: `int \| None`
`scale_to_zero_retention_seconds`	Time in seconds to retain the last instance before scaling to zero. TYPE: `int \| None`

RAISES	DESCRIPTION
`ValueError`	If `min_instances` is not provided.
