Feature Monitoring Configuration#
FeatureMonitoringConfig#
hsfs.core.feature_monitoring_config.FeatureMonitoringConfig(
feature_store_id,
name,
feature_name=None,
feature_monitoring_type=STATISTICS_COMPUTATION,
job_name=None,
detection_window_config=None,
reference_window_config=None,
statistics_comparison_config=None,
job_schedule=None,
description=None,
id=None,
feature_group_id=None,
feature_view_name=None,
feature_view_version=None,
href=None,
**kwargs
)
Creation from Feature Group#
create_statistics_monitoring#
FeatureGroup.create_statistics_monitoring(
name,
feature_name=None,
description=None,
start_date_time=None,
end_date_time=None,
cron_expression="0 0 12 ? * * *",
)
Run a job to compute statistics on a snapshot of feature data on a schedule.
Experimental
The public API is subject to change; this feature is not suitable for production use cases.
Example
# fetch feature group
fg = fs.get_feature_group(name="my_feature_group", version=1)
# enable statistics monitoring
my_config = fg.create_statistics_monitoring(
name="my_config",
start_date_time="2021-01-01 00:00:00",
description="my description",
cron_expression="0 0 12 ? * * *",
).with_detection_window(
# Statistics computed on 10% of the last week of data
time_offset="1w",
row_percentage=0.1,
).save()
Arguments
- name
str
: Name of the feature monitoring configuration. name must be unique for all configurations attached to the feature group.
- feature_name
Optional[str]
: Name of the feature to monitor. If not specified, statistics will be computed for all features.
- description
Optional[str]
: Description of the feature monitoring configuration.
- start_date_time
Optional[Union[int, str, datetime.datetime, datetime.date, pandas._libs.tslibs.timestamps.Timestamp]]
: Start date and time from which to start computing statistics.
- end_date_time
Optional[Union[int, str, datetime.datetime, datetime.date, pandas._libs.tslibs.timestamps.Timestamp]]
: End date and time at which to stop computing statistics.
- cron_expression
Optional[str]
: Cron expression to use to schedule the job. The cron expression must be in UTC and follow the Quartz specification. Default is '0 0 12 ? * * *', every day at 12pm UTC.
Raises
hsfs.client.exceptions.FeatureStoreException
.
Return
FeatureMonitoringConfig
Configuration with minimal information about the feature monitoring.
Additional information is required before feature monitoring is enabled.
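The default cron_expression follows the Quartz layout, which has six mandatory fields and an optional seventh (year). The sketch below only labels the fields of such an expression so the default is easier to read. split_quartz_cron is a hypothetical helper written for this page; it is not part of hsfs and it does not validate the field values.

```python
# Field names of a Quartz cron expression, in order.
# The seventh field (year) is optional in Quartz.
QUARTZ_FIELDS = (
    "seconds",
    "minutes",
    "hours",
    "day_of_month",
    "month",
    "day_of_week",
    "year",
)


def split_quartz_cron(expression: str) -> dict:
    """Label each field of a Quartz cron expression (hypothetical helper, not hsfs API)."""
    parts = expression.split()
    if len(parts) not in (6, 7):
        raise ValueError(f"expected 6 or 7 fields, got {len(parts)}: {expression!r}")
    # zip stops at the shorter sequence, so a 6-field expression has no "year" key
    return dict(zip(QUARTZ_FIELDS, parts))


# The default schedule used by the create_* methods
fields = split_quartz_cron("0 0 12 ? * * *")
print(fields)
```

Read this way, the default fires at second 0, minute 0, hour 12, with no specific day of month, every month, every day of the week, every year.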
create_feature_monitoring#
FeatureGroup.create_feature_monitoring(
name,
feature_name,
description=None,
start_date_time=None,
end_date_time=None,
cron_expression="0 0 12 ? * * *",
)
Enable feature monitoring to compare statistics on snapshots of feature data over time.
Experimental
The public API is subject to change; this feature is not suitable for production use cases.
Example
# fetch feature group
fg = fs.get_feature_group(name="my_feature_group", version=1)
# enable feature monitoring
my_config = fg.create_feature_monitoring(
name="my_monitoring_config",
feature_name="my_feature",
description="my monitoring config description",
cron_expression="0 0 12 ? * * *",
).with_detection_window(
# Data inserted in the last day
time_offset="1d",
window_length="1d",
).with_reference_window(
# Data inserted last week on the same day
time_offset="1w1d",
window_length="1d",
).compare_on(
metric="mean",
threshold=0.5,
).save()
Arguments
- name
str
: Name of the feature monitoring configuration. name must be unique for all configurations attached to the feature group.
- feature_name
str
: Name of the feature to monitor.
- description
Optional[str]
: Description of the feature monitoring configuration.
- start_date_time
Optional[Union[int, str, datetime.datetime, datetime.date, pandas._libs.tslibs.timestamps.Timestamp]]
: Start date and time from which to start computing statistics.
- end_date_time
Optional[Union[int, str, datetime.datetime, datetime.date, pandas._libs.tslibs.timestamps.Timestamp]]
: End date and time at which to stop computing statistics.
- cron_expression
Optional[str]
: Cron expression to use to schedule the job. The cron expression must be in UTC and follow the Quartz specification. Default is '0 0 12 ? * * *', every day at 12pm UTC.
Raises
hsfs.client.exceptions.FeatureStoreException
.
Return
FeatureMonitoringConfig
Configuration with minimal information about the feature monitoring.
Additional information is required before feature monitoring is enabled.
Creation from Feature View#
create_statistics_monitoring#
FeatureView.create_statistics_monitoring(
name,
feature_name=None,
description=None,
start_date_time=None,
end_date_time=None,
cron_expression="0 0 12 ? * * *",
)
Run a job to compute statistics on a snapshot of feature data on a schedule.
Experimental
The public API is subject to change; this feature is not suitable for production use cases.
Example
# fetch feature view
fv = fs.get_feature_view(name="my_feature_view", version=1)
# enable statistics monitoring
my_config = fv.create_statistics_monitoring(
name="my_config",
start_date_time="2021-01-01 00:00:00",
description="my description",
cron_expression="0 0 12 ? * * *",
).with_detection_window(
# Statistics computed on 10% of the last week of data
time_offset="1w",
row_percentage=0.1,
).save()
Arguments
- name
str
: Name of the feature monitoring configuration. name must be unique for all configurations attached to the feature view.
- feature_name
Optional[str]
: Name of the feature to monitor. If not specified, statistics will be computed for all features.
- description
Optional[str]
: Description of the feature monitoring configuration.
- start_date_time
Optional[Union[int, str, datetime.datetime, datetime.date, pandas._libs.tslibs.timestamps.Timestamp]]
: Start date and time from which to start computing statistics.
- end_date_time
Optional[Union[int, str, datetime.datetime, datetime.date, pandas._libs.tslibs.timestamps.Timestamp]]
: End date and time at which to stop computing statistics.
- cron_expression
Optional[str]
: Cron expression to use to schedule the job. The cron expression must be in UTC and follow the Quartz specification. Default is '0 0 12 ? * * *', every day at 12pm UTC.
Raises
hsfs.client.exceptions.FeatureStoreException
.
Return
FeatureMonitoringConfig
Configuration with minimal information about the feature monitoring.
Additional information is required before feature monitoring is enabled.
create_feature_monitoring#
FeatureView.create_feature_monitoring(
name,
feature_name,
description=None,
start_date_time=None,
end_date_time=None,
cron_expression="0 0 12 ? * * *",
)
Enable feature monitoring to compare statistics on snapshots of feature data over time.
Experimental
The public API is subject to change; this feature is not suitable for production use cases.
Example
# fetch feature view
fv = fs.get_feature_view(name="my_feature_view", version=1)
# enable feature monitoring
my_config = fv.create_feature_monitoring(
name="my_monitoring_config",
feature_name="my_feature",
description="my monitoring config description",
cron_expression="0 0 12 ? * * *",
).with_detection_window(
# Data inserted in the last day
time_offset="1d",
window_length="1d",
).with_reference_value(
# compare to a given value
value=0.5,
).compare_on(
metric="mean",
threshold=0.5,
).save()
Arguments
- name
str
: Name of the feature monitoring configuration. name must be unique for all configurations attached to the feature view.
- feature_name
str
: Name of the feature to monitor.
- description
Optional[str]
: Description of the feature monitoring configuration.
- start_date_time
Optional[Union[int, str, datetime.datetime, datetime.date, pandas._libs.tslibs.timestamps.Timestamp]]
: Start date and time from which to start computing statistics.
- end_date_time
Optional[Union[int, str, datetime.datetime, datetime.date, pandas._libs.tslibs.timestamps.Timestamp]]
: End date and time at which to stop computing statistics.
- cron_expression
Optional[str]
: Cron expression to use to schedule the job. The cron expression must be in UTC and follow the Quartz specification. Default is '0 0 12 ? * * *', every day at 12pm UTC.
Raises
hsfs.client.exceptions.FeatureStoreException
.
Return
FeatureMonitoringConfig
Configuration with minimal information about the feature monitoring.
Additional information is required before feature monitoring is enabled.
Retrieval from Feature Group#
get_feature_monitoring_configs#
FeatureGroup.get_feature_monitoring_configs(name=None, feature_name=None, config_id=None)
Fetch all feature monitoring configs attached to the feature group, or fetch by name or feature name only. If no arguments are provided, the method returns all feature monitoring configs attached to the feature group, meaning all feature monitoring configs that are attached to a feature in the feature group. If you wish to fetch a single config, provide its name. If you wish to fetch all configs attached to a particular feature, provide the feature name.
Example
# fetch your feature group
fg = fs.get_feature_group(name="my_feature_group", version=1)
# fetch all feature monitoring configs attached to the feature group
fm_configs = fg.get_feature_monitoring_configs()
# fetch a single feature monitoring config by name
fm_config = fg.get_feature_monitoring_configs(name="my_config")
# fetch all feature monitoring configs attached to a particular feature
fm_configs = fg.get_feature_monitoring_configs(feature_name="my_feature")
# fetch a single feature monitoring config with a given id
fm_config = fg.get_feature_monitoring_configs(config_id=1)
Arguments
- name
Optional[str]
: If provided, fetch only the feature monitoring config with the given name. Defaults to None.
- feature_name
Optional[str]
: If provided, fetch only configs attached to a particular feature. Defaults to None.
- config_id
Optional[int]
: If provided, fetch only the feature monitoring config with the given id. Defaults to None.
Raises
hsfs.client.exceptions.RestAPIError
.
hsfs.client.exceptions.FeatureStoreException
.
- ValueError: if both name and feature_name are provided.
- TypeError: if name or feature_name are not string or None.
Return
Union[FeatureMonitoringConfig, List[FeatureMonitoringConfig], None]
A list of feature monitoring configs. If name is provided, returns either a single config or None if not found.
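The argument rules listed under Raises can be restated as a small standalone check. This is a minimal sketch written for illustration: check_config_filters is a hypothetical function, not the hsfs source, and the order in which the real method performs its checks is an assumption.

```python
from typing import Optional


def check_config_filters(
    name: Optional[str] = None, feature_name: Optional[str] = None
) -> None:
    """Restate the documented argument rules (hypothetical helper, not hsfs code)."""
    # TypeError: name and feature_name must each be a string or None
    for label, value in (("name", name), ("feature_name", feature_name)):
        if value is not None and not isinstance(value, str):
            raise TypeError(
                f"{label} must be a string or None, got {type(value).__name__}"
            )
    # ValueError: the two filters are mutually exclusive
    if name is not None and feature_name is not None:
        raise ValueError("provide either name or feature_name, not both")


check_config_filters()                         # no filter: fetch everything
check_config_filters(name="my_config")         # filter by config name
check_config_filters(feature_name="my_feature")  # filter by feature name
```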
Retrieval from Feature View#
get_feature_monitoring_configs#
FeatureView.get_feature_monitoring_configs(name=None, feature_name=None, config_id=None)
Fetch feature monitoring configs attached to the feature view. If no arguments are provided, the method returns all feature monitoring configs attached to the feature view, meaning all feature monitoring configs that are attached to a feature in the feature view. If you wish to fetch a single config, provide its name. If you wish to fetch all configs attached to a particular feature, provide the feature name.
Example
# fetch your feature view
fv = fs.get_feature_view(name="my_feature_view", version=1)
# fetch all feature monitoring configs attached to the feature view
fm_configs = fv.get_feature_monitoring_configs()
# fetch a single feature monitoring config by name
fm_config = fv.get_feature_monitoring_configs(name="my_config")
# fetch all feature monitoring configs attached to a particular feature
fm_configs = fv.get_feature_monitoring_configs(feature_name="my_feature")
# fetch a single feature monitoring config with a particular id
fm_config = fv.get_feature_monitoring_configs(config_id=1)
Arguments
- name
Optional[str]
: If provided, fetch only the feature monitoring config with the given name. Defaults to None.
- feature_name
Optional[str]
: If provided, fetch only configs attached to a particular feature. Defaults to None.
- config_id
Optional[int]
: If provided, fetch only the feature monitoring config with the given id. Defaults to None.
Raises
hsfs.client.exceptions.RestAPIError
.
hsfs.client.exceptions.FeatureStoreException
.
- ValueError: if both name and feature_name are provided.
- TypeError: if name or feature_name are not string or None.
Return
Union[FeatureMonitoringConfig, List[FeatureMonitoringConfig], None]
A list of feature monitoring configs. If name is provided, returns either a single config or None if not found.
Properties#
description#
Description of the feature monitoring configuration.
detection_window_config#
Configuration for the detection window.
enabled#
Controls whether or not this config is spawning new feature monitoring jobs. This field belongs to the scheduler configuration but is made transparent to the user for convenience.
feature_group_id#
Id of the Feature Group to which this feature monitoring configuration is attached.
feature_monitoring_type#
The type of feature monitoring to perform. Used for internal validation. Options are:
- STATISTICS_COMPUTATION if no reference window (and, therefore, no comparison config) is provided
- STATISTICS_COMPARISON if a reference window (and, therefore, a comparison config) is provided.
This property is read-only.
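The rule for feature_monitoring_type can be written as a one-line decision. The sketch below is illustrative only: infer_monitoring_type is a hypothetical function, and a plain dict stands in for the reference window configuration object.

```python
from typing import Optional


def infer_monitoring_type(reference_window_config: Optional[dict]) -> str:
    """Pick the monitoring type from the presence of a reference (hypothetical helper)."""
    if reference_window_config is None:
        # Detection window only: statistics are computed but not compared
        return "STATISTICS_COMPUTATION"
    # A reference is configured: detection statistics are compared against it
    return "STATISTICS_COMPARISON"


print(infer_monitoring_type(None))
print(infer_monitoring_type({"time_offset": "1w", "window_length": "1d"}))
```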
feature_name#
The name of the feature to monitor. If not set, all features of the Feature Group or Feature View are monitored; this is only available for scheduled statistics.
This property is read-only.
feature_store_id#
Id of the Feature Store.
feature_view_name#
Name of the Feature View to which this feature monitoring configuration is attached.
feature_view_version#
Version of the Feature View to which this feature monitoring configuration is attached.
id#
Id of the feature monitoring configuration.
job_name#
Name of the feature monitoring job.
job_schedule#
Schedule of the feature monitoring job. This field belongs to the job configuration but is made transparent to the user for convenience.
name#
The name of the feature monitoring config. A Feature Group or Feature View cannot have multiple feature monitoring configurations with the same name. The name of a feature monitoring configuration is limited to 63 characters.
This property is read-only once the feature monitoring configuration has been saved.
reference_window_config#
Configuration for the reference window.
statistics_comparison_config#
Configuration for the comparison of detection and reference statistics.
Methods#
compare_on#
FeatureMonitoringConfig.compare_on(metric, threshold, strict=False, relative=False)
Sets the statistics comparison criteria for feature monitoring with a reference window.
Example
# Fetch your feature group or feature view
fg = fs.get_feature_group(name="my_feature_group", version=1)
# Setup feature monitoring, a detection window and a reference window
my_monitoring_config = fg.create_feature_monitoring(
...
).with_detection_window(...).with_reference_window(...)
# Choose a metric and set a threshold for the difference
# e.g. compare the relative mean of detection and reference window
my_monitoring_config.compare_on(
metric="mean",
threshold=1.0,
relative=True,
).save()
Note
Detection window and reference window/value/training_dataset must be set prior to comparison configuration.
Arguments
- metric
Optional[str]
: The metric to use for comparison. Different metrics are available for different feature types.
- threshold
Optional[float]
: The threshold to apply to the difference to potentially trigger an alert.
- strict
Optional[bool]
: Whether to use a strict comparison (e.g. > or <) or a non-strict comparison (e.g. >= or <=).
- relative
Optional[bool]
: Whether to use a relative comparison (e.g. relative mean) or an absolute comparison (e.g. absolute mean).
Returns
FeatureMonitoringConfig
. The updated FeatureMonitoringConfig object.
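To make the interaction of threshold, strict and relative concrete, here is a minimal sketch of one plausible reading of these arguments. It is an assumption, not the hsfs implementation: shift_detected is a hypothetical function, the use of the absolute difference is assumed, and so is dividing by the reference value when relative is set.

```python
def shift_detected(
    detection_value: float,
    reference_value: float,
    threshold: float,
    strict: bool = False,
    relative: bool = False,
) -> bool:
    """Compare one metric from two windows against a threshold (illustrative only)."""
    # Assumption: the comparison uses the magnitude of the difference
    difference = abs(detection_value - reference_value)
    if relative:
        # Assumption: a relative difference is expressed as a fraction of the reference
        if reference_value == 0:
            raise ValueError("relative comparison is undefined for a reference of 0")
        difference = difference / abs(reference_value)
    # strict uses >, non-strict uses >=
    return difference > threshold if strict else difference >= threshold


# Mean moved from 2.0 to 3.0: absolute difference 1.0, relative difference 0.5
print(shift_detected(3.0, 2.0, threshold=1.0))
print(shift_detected(3.0, 2.0, threshold=1.0, strict=True))
print(shift_detected(3.0, 2.0, threshold=1.0, relative=True))
```

Under these assumptions, the same threshold of 1.0 flags the shift in absolute, non-strict mode but not in strict or relative mode, which is why the threshold should be chosen together with the other two flags.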
delete#
FeatureMonitoringConfig.delete()
Deletes the feature monitoring configuration.
Example
# Fetch your feature group or feature view
fg = fs.get_feature_group(name="my_feature_group", version=1)
# Fetch registered config by name
my_monitoring_config = fg.get_feature_monitoring_configs(name="my_monitoring_config")
# Delete the feature monitoring config
my_monitoring_config.delete()
Raises
FeatureStoreException
: If the feature monitoring config has not been saved.
disable#
FeatureMonitoringConfig.disable()
Disables the schedule of the feature monitoring job.
Example
# Fetch your feature group or feature view
fg = fs.get_feature_group(name="my_feature_group", version=1)
# Fetch registered config by name
my_monitoring_config = fg.get_feature_monitoring_configs(name="my_monitoring_config")
# Disable the feature monitoring config
my_monitoring_config.disable()
Raises
FeatureStoreException
: If the feature monitoring config has not been saved.
enable#
FeatureMonitoringConfig.enable()
Enables the schedule of the feature monitoring job.
The scheduler can be configured via the job_schedule property.
Example
# Fetch your feature group or feature view
fg = fs.get_feature_group(name="my_feature_group", version=1)
# Fetch registered config by name
my_monitoring_config = fg.get_feature_monitoring_configs(name="my_monitoring_config")
# Enable the feature monitoring config
my_monitoring_config.enable()
Raises
FeatureStoreException
: If the feature monitoring config has not been saved.
get_history#
FeatureMonitoringConfig.get_history(start_time=None, end_time=None, with_statistics=True)
Fetch the history of the computed statistics and comparison results for this configuration.
Example
# Fetch your feature group or feature view
fg = fs.get_feature_group(name="my_feature_group", version=1)
# Fetch registered config by name
my_monitoring_config = fg.get_feature_monitoring_configs(name="my_monitoring_config")
# Fetch the history of the computed statistics for this configuration
history = my_monitoring_config.get_history(
start_time="2021-01-01",
end_time="2021-01-31",
)
Arguments
- start_time
: The start time of the time range to fetch the history for.
- end_time
: The end time of the time range to fetch the history for.
- with_statistics
: Whether to include the computed statistics in the results.
Raises
FeatureStoreException
: If the feature monitoring config has not been saved.
get_job#
FeatureMonitoringConfig.get_job()
Get the feature monitoring job which computes and compares statistics on the detection and reference windows.
Example
# Fetch registered config by name via feature group or feature view
my_monitoring_config = fg.get_feature_monitoring_configs(name="my_monitoring_config")
# Get the job which computes statistics on detection and reference window
job = my_monitoring_config.get_job()
# Print job history and ongoing executions
job.executions
Raises
FeatureStoreException
: If the feature monitoring config has not been saved.
Returns
Job
. A handle for the job computing the statistics.
run_job#
FeatureMonitoringConfig.run_job()
Trigger the feature monitoring job which computes and compares statistics on the detection and reference windows.
Example
# Fetch your feature group or feature view
fg = fs.get_feature_group(name="my_feature_group", version=1)
# Fetch registered config by name
my_monitoring_config = fg.get_feature_monitoring_configs(name="my_monitoring_config")
# Trigger the feature monitoring job once
my_monitoring_config.run_job()
Info
The feature monitoring job will be triggered asynchronously and the method will return immediately. Calling this method does not affect the ongoing schedule.
Raises
FeatureStoreException
: If the feature monitoring config has not been saved.
Returns
Job
. A handle for the job computing the statistics.
save#
FeatureMonitoringConfig.save()
Saves the feature monitoring configuration.
Example
# Fetch your feature group or feature view
fg = fs.get_feature_group(name="my_feature_group", version=1)
# Setup feature monitoring and a detection window
my_monitoring_config = fg.create_statistics_monitoring(
name="my_monitoring_config",
).save()
Returns
FeatureMonitoringConfig
. The saved FeatureMonitoringConfig object.
update#
FeatureMonitoringConfig.update()
Updates allowed fields of the saved feature monitoring configuration.
Example
# Fetch your feature group or feature view
fg = fs.get_feature_group(name="my_feature_group", version=1)
# Fetch registered config by name
my_monitoring_config = fg.get_feature_monitoring_configs(name="my_monitoring_config")
# Update the percentage of rows to use when computing the statistics
my_monitoring_config.detection_window_config.row_percentage = 0.1
my_monitoring_config.update()
Returns
FeatureMonitoringConfig
. The updated FeatureMonitoringConfig object.
with_detection_window#
FeatureMonitoringConfig.with_detection_window(
time_offset=None, window_length=None, row_percentage=None
)
Sets the detection window of data to compute statistics on.
Example
# Fetch your feature group or feature view
fg = fs.get_feature_group(name="my_feature_group", version=1)
# Compute statistics on a regular basis
fg.create_statistics_monitoring(
name="regular_stats",
cron_expression="0 0 12 ? * * *",
).with_detection_window(
time_offset="1d",
window_length="1d",
row_percentage=0.1,
).save()
# Compute and compare statistics
fg.create_feature_monitoring(
name="regular_comparison",
feature_name="my_feature",
cron_expression="0 0 12 ? * * *",
).with_detection_window(
time_offset="1d",
window_length="1d",
row_percentage=0.1,
).with_reference_window(...).compare_on(...).save()
Arguments
- time_offset
Optional[str]
: The time offset from the current time to the start of the time window.
- window_length
Optional[str]
: The length of the time window.
- row_percentage
Optional[float]
: The fraction of rows to use when computing the statistics, in the range [0, 1.0].
Returns
FeatureMonitoringConfig
. The updated FeatureMonitoringConfig object.
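The relationship between time_offset and window_length can be illustrated by resolving them into concrete timestamps. The sketch below is an assumption-laden illustration, not hsfs code: parse_duration and window_bounds are hypothetical helpers, the accepted string format (integers followed by w, d or h, as in "1w1d") is inferred from the examples on this page, and treating a missing window_length as "until now" is also an assumption.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple

# Assumed unit letters, inferred from examples such as "1d", "1w" and "1w1d"
_UNITS = {"w": "weeks", "d": "days", "h": "hours"}


def parse_duration(text: str) -> timedelta:
    """Turn a string such as "1w1d" into a timedelta (hypothetical helper)."""
    if not re.fullmatch(r"(\d+[wdh])+", text):
        raise ValueError(f"unsupported duration: {text!r}")
    kwargs = {}
    for amount, unit in re.findall(r"(\d+)([wdh])", text):
        kwargs[_UNITS[unit]] = kwargs.get(_UNITS[unit], 0) + int(amount)
    return timedelta(**kwargs)


def window_bounds(
    now: datetime, time_offset: str, window_length: Optional[str] = None
) -> Tuple[datetime, datetime]:
    """Resolve a window to (start, end) timestamps relative to now."""
    # The offset is measured from now back to the start of the window
    start = now - parse_duration(time_offset)
    # Assumption: without a length, the window extends up to now
    end = now if window_length is None else start + parse_duration(window_length)
    return start, end


now = datetime(2021, 1, 8, 12, 0, tzinfo=timezone.utc)
print(window_bounds(now, "1d", "1d"))    # the last day
print(window_bounds(now, "1w1d", "1d"))  # the same day one week earlier
```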
with_reference_training_dataset#
FeatureMonitoringConfig.with_reference_training_dataset(training_dataset_version=None)
Sets the reference training dataset to compare statistics with.
See also with_reference_value(...) and with_reference_window(...) for other reference options.
Example
# Fetch your feature group or feature view
fg = fs.get_feature_group(name="my_feature_group", version=1)
# Setup feature monitoring and a detection window
my_monitoring_config = fg.create_feature_monitoring(...).with_detection_window(...)
# Only for feature views: Compare to the statistics computed for one of your training datasets
# particularly useful if it has been used to train a model currently in production
my_monitoring_config.with_reference_training_dataset(
training_dataset_version=3,
).compare_on(...).save()
Provide a comparison configuration
You must provide a comparison configuration via compare_on() before saving the feature monitoring config.
Arguments
- training_dataset_version
Optional[int]
: The version of the training dataset to use as reference.
Returns
FeatureMonitoringConfig
. The updated FeatureMonitoringConfig object.
with_reference_value#
FeatureMonitoringConfig.with_reference_value(value=None)
Sets the reference value to compare statistics with.
See also with_reference_window(...) and with_reference_training_dataset(...) for other reference options.
Example
# Fetch your feature group or feature view
fg = fs.get_feature_group(name="my_feature_group", version=1)
# Setup feature monitoring and a detection window
my_monitoring_config = fg.create_feature_monitoring(...).with_detection_window(...)
# Simplest reference window is a specific value
my_monitoring_config.with_reference_value(
value=0.0,
).compare_on(...).save()
Provide a comparison configuration
You must provide a comparison configuration via compare_on() before saving the feature monitoring config.
Arguments
- value
Optional[Union[float, int]]
: A float value to use as reference.
Returns
FeatureMonitoringConfig
. The updated FeatureMonitoringConfig object.
with_reference_window#
FeatureMonitoringConfig.with_reference_window(
time_offset=None, window_length=None, row_percentage=None
)
Sets the reference window of data to compute statistics on.
See also with_reference_value(...) and with_reference_training_dataset(...) for other reference options.
Example
# Fetch your feature group or feature view
fg = fs.get_feature_group(name="my_feature_group", version=1)
# Setup feature monitoring and a detection window
my_monitoring_config = fg.create_feature_monitoring(...).with_detection_window(...)
# Statistics computed on a rolling time window, e.g. same day last week
my_monitoring_config.with_reference_window(
time_offset="1w",
window_length="1d",
).compare_on(...).save()
Provide a comparison configuration
You must provide a comparison configuration via compare_on() before saving the feature monitoring config.
Arguments
- time_offset
Optional[str]
: The time offset from the current time to the start of the time window.
- window_length
Optional[str]
: The length of the time window.
- row_percentage
Optional[float]
: The percentage of rows to use when computing the statistics. Defaults to 20%.
Returns
FeatureMonitoringConfig
. The updated FeatureMonitoringConfig object.