hsml.scaling_config #
ComponentScalingConfig #
Bases: ABC
Scaling configuration for a predictor or transformer.
scale_metric property writable #
The metric to use for scaling. Can be either 'CONCURRENCY' or 'RPS'.
target property writable #
Target value for the selected scaling metric that the autoscaler tries to maintain during the stable window. For RPS, this is the number of requests per second. For CONCURRENCY, this is the number of concurrent requests.
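As a rough illustration of how the target relates to the number of instances, the sketch below assumes the target applies per instance and that the autoscaler aims for the observed load divided by the target, rounded up. The helper name `desired_instances` and the formula are illustrative assumptions, not part of hsml.

```python
import math


def desired_instances(observed_load: float, target: int) -> int:
    """Hypothetical helper: instances needed to keep per-instance load at or below target."""
    if target <= 0:
        raise ValueError("target must be positive")
    return math.ceil(observed_load / target)


# 450 requests per second with an RPS target of 100 per instance
print(desired_instances(450, 100))  # 5
```

In practice the result would also be clamped to the range set by min_instances and max_instances.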
min_instances property writable #
min_instances: int
Minimum number of instances to scale to. For deployments using kserve, this must be set to 0 to enable scaling to zero. Default is 0 for deployments using kserve and 1 for deployments not using kserve.
max_instances property writable #
Maximum number of instances to scale to. Maximum allowed is configured in the cluster settings by the cluster administrator. Must be at least 1 and greater than or equal to min_instances.
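The constraints above can be sketched as a small check. The function `check_instance_bounds` and its `cluster_max` parameter are hypothetical names used only for illustration; the validation hsml actually performs may differ.

```python
from typing import Optional


def check_instance_bounds(
    min_instances: int,
    max_instances: int,
    cluster_max: Optional[int] = None,
) -> None:
    """Hypothetical check mirroring the documented max_instances constraints."""
    if max_instances < 1:
        raise ValueError("max_instances must be at least 1")
    if max_instances < min_instances:
        raise ValueError("max_instances must be >= min_instances")
    # cluster_max stands in for the limit set by the cluster administrator
    if cluster_max is not None and max_instances > cluster_max:
        raise ValueError("max_instances exceeds the cluster-wide maximum")


check_instance_bounds(min_instances=0, max_instances=3, cluster_max=10)  # passes silently
```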
panic_window_percentage property writable #
The percentage of the stable window to use as the panic window during high load situations. Min is 1. Max is 100. Default is 10.
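To make the percentage concrete, the sketch below converts it into a panic window length in seconds. The helper `panic_window_seconds` is a hypothetical name, and the arithmetic assumes the percentage is applied directly to stable_window_seconds.

```python
def panic_window_seconds(
    stable_window_seconds: int, panic_window_percentage: float
) -> float:
    """Hypothetical helper: length of the panic window in seconds."""
    if not 1 <= panic_window_percentage <= 100:
        raise ValueError("panic_window_percentage must be between 1 and 100")
    return stable_window_seconds * panic_window_percentage / 100


# A 60-second stable window with the default of 10 percent
print(panic_window_seconds(60, 10))  # 6.0
```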
panic_threshold_percentage property writable #
The percentage of the scale metric threshold that, when exceeded during the panic window, will trigger a scale-up event. Min is 1. Max is 200. Default is 200.
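One possible reading of this setting, sketched below, is that the threshold is relative to the target: with a target of 50 and a percentage of 200, a scale-up is triggered once the metric observed over the panic window exceeds 100. The function `should_panic` is hypothetical and this interpretation is an assumption, not a description of the autoscaler's actual logic.

```python
def should_panic(
    observed: float, target: int, panic_threshold_percentage: float
) -> bool:
    """Hypothetical check: has the observed metric exceeded the panic threshold?"""
    trigger_value = target * panic_threshold_percentage / 100
    return observed > trigger_value


print(should_panic(120, 50, 200))  # True: 120 exceeds 100
print(should_panic(100, 50, 200))  # False: 100 does not exceed 100
```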
stable_window_seconds property writable #
The interval in seconds over which to calculate the average metric. Larger values result in smoother scaling but slower reaction times. Min is 1 second. Max is 3600 seconds.
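The effect of the window length can be illustrated with a plain average over timestamped samples. The helper `stable_window_average` and the `(timestamp, value)` sample layout are assumptions made for this sketch; the autoscaler's real aggregation may differ.

```python
from typing import List, Tuple


def stable_window_average(
    samples: List[Tuple[float, float]], now: float, stable_window_seconds: int
) -> float:
    """Hypothetical helper: mean of samples taken within the stable window ending at now."""
    recent = [
        value
        for timestamp, value in samples
        if now - stable_window_seconds < timestamp <= now
    ]
    return sum(recent) / len(recent) if recent else 0.0


samples = [(0, 10.0), (30, 20.0), (50, 40.0), (60, 30.0)]
print(stable_window_average(samples, now=60, stable_window_seconds=60))  # 30.0
print(stable_window_average(samples, now=60, stable_window_seconds=20))  # 35.0
```

The shorter window follows the most recent samples more closely, which matches the trade-off between smoothness and reaction time described above.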
scale_to_zero_retention_seconds property writable #
The amount of time in seconds the last instance must be kept before being scaled down to zero. Default is 0.
__init__ #
__init__(
    min_instances: int,
    max_instances: int | None = None,
    scale_metric: ScaleMetric | str | Default | None = None,
    target: int | None = None,
    panic_window_percentage: float | None = None,
    panic_threshold_percentage: float | None = None,
    stable_window_seconds: int | None = None,
    scale_to_zero_retention_seconds: int | None = None,
    **kwargs,
)
Initialize a ComponentScalingConfig instance.
| PARAMETER | DESCRIPTION |
|---|---|
| min_instances | Minimum number of instances to scale to. |
| max_instances | Maximum number of instances to scale to. |
| scale_metric | Metric to use for scaling. |
| target | Target value for the selected scaling metric. |
| panic_window_percentage | Percentage of the stable window to use as the panic window. |
| panic_threshold_percentage | Percentage of the scale metric threshold to trigger scaling. |
| stable_window_seconds | Interval in seconds for calculating the average metric. |
| scale_to_zero_retention_seconds | Time in seconds to retain the last instance before scaling to zero. |
get_default_scaling_configuration staticmethod #
get_default_scaling_configuration(
    serving_tool: str,
    min_instances: int | None,
    component_type: str = "predictor",
) -> ComponentScalingConfig
Get the default scaling configuration based on the serving tool and number of instances.
| PARAMETER | DESCRIPTION |
|---|---|
| serving_tool | The serving tool to use (e.g. kserve). |
| min_instances | Minimum number of instances, or None to use the default. |
| component_type | The component type (predictor or transformer). |
| RETURNS | DESCRIPTION |
|---|---|
| ComponentScalingConfig | The default scaling configuration for the given serving tool. |
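The documented default for min_instances (0 for deployments using kserve, 1 otherwise) can be sketched as follows. The helper `resolve_min_instances` is hypothetical, and the case-insensitive comparison against "kserve" is an assumption; it is not the library's implementation.

```python
from typing import Optional


def resolve_min_instances(serving_tool: str, min_instances: Optional[int]) -> int:
    """Hypothetical helper: pick min_instances, falling back to the documented default."""
    if min_instances is not None:
        return min_instances
    # Documented defaults: 0 with kserve (scale to zero), 1 otherwise
    return 0 if serving_tool.lower() == "kserve" else 1


print(resolve_min_instances("kserve", None))  # 0
print(resolve_min_instances("kserve", 2))     # 2
```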
PredictorScalingConfig #
Bases: ComponentScalingConfig
Scaling configuration for a predictor.
__init__ #
__init__(**kwargs)
Initialize a PredictorScalingConfig instance.
| KEYWORD ARGUMENTS FOR THE PREDICTOR SCALING CONFIGURATION | DESCRIPTION |
|---|---|
| min_instances | Minimum number of instances to scale to (required). |
| max_instances | Maximum number of instances to scale to. |
| scale_metric | Metric to use for scaling. |
| target | Target value for the selected scaling metric. |
| panic_window_percentage | Percentage of the stable window to use as the panic window. |
| panic_threshold_percentage | Percentage of the scale metric threshold to trigger scaling. |
| stable_window_seconds | Interval in seconds for calculating the average metric. |
| scale_to_zero_retention_seconds | Time in seconds to retain the last instance before scaling to zero. |
| RAISES | DESCRIPTION |
|---|---|
| ValueError | If |
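A minimal sketch of assembling the keyword arguments listed above. The values are arbitrary examples, and the commented-out import assumes hsml is installed and that the class is importable from hsml.scaling_config as this page suggests.

```python
# Example keyword arguments; min_instances is the only required one
scaling_kwargs = {
    "min_instances": 1,
    "max_instances": 4,
    "scale_metric": "RPS",
    "target": 100,
    "stable_window_seconds": 60,
}

# With hsml installed, these could be passed to the constructor:
#   from hsml.scaling_config import PredictorScalingConfig
#   scaling_config = PredictorScalingConfig(**scaling_kwargs)
print(sorted(scaling_kwargs))
```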
ScaleMetric #
TransformerScalingConfig #
Bases: ComponentScalingConfig
Scaling configuration for a transformer.
__init__ #
__init__(**kwargs)
Initialize a TransformerScalingConfig instance.
| KEYWORD ARGUMENTS FOR THE TRANSFORMER SCALING CONFIGURATION | DESCRIPTION |
|---|---|
| min_instances | Minimum number of instances to scale to (required). |
| max_instances | Maximum number of instances to scale to. |
| scale_metric | Metric to use for scaling. |
| target | Target value for the selected scaling metric. |
| panic_window_percentage | Percentage of the stable window to use as the panic window. |
| panic_threshold_percentage | Percentage of the scale metric threshold to trigger scaling. |
| stable_window_seconds | Interval in seconds for calculating the average metric. |
| scale_to_zero_retention_seconds | Time in seconds to retain the last instance before scaling to zero. |
| RAISES | DESCRIPTION |
|---|---|
| ValueError | If |