Helper columns#
Hopsworks Feature Store lets you define two types of helper columns for feature views: inference_helper_columns and training_helper_columns.
Note
Both inference and training helper column name(s) must be part of the Query object. If a helper column name belongs to a feature group that is part of a Join with a prefix defined, then this prefix needs to be prepended to the original column name when defining the helper column list.
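For example, if a feature group were joined with a prefix, the prefixed names would have to be used in the helper column list. A minimal sketch with hypothetical names:
# sketch: trans_fg is joined with prefix="prev_", so helper columns carry that prefix
query = label_fg.select("fraud_label")\
    .join(trans_fg.select(["longitude", "latitude"]), prefix="prev_")

feature_view = fs.get_or_create_feature_view(
    name='fv_with_prefixed_helper_col',  # hypothetical feature view name
    version=1,
    query=query,
    labels=["fraud_label"],
    inference_helper_columns=["prev_longitude", "prev_latitude"],
)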
Inference Helper columns#
inference_helper_columns is a list of feature names that are not used for training the model itself but provide extra information during online or batch inference. An example is computing an on-demand feature such as loc_delta_t_minus_1, the distance between the previous and the current transaction location in a credit card fraud detection system. The feature loc_delta_t_minus_1 is computed from the previous transaction's coordinates longitude and latitude, which need to be fetched from the feature store and compared with the coordinates of the new transaction arriving at the inference application. In this use case longitude and latitude are inference_helper_columns: they are not used for training but are necessary for computing the on-demand feature loc_delta_t_minus_1.
Define inference helper columns for feature views.
# define query object
query = label_fg.select("fraud_label")\
    .join(trans_fg.select(["amount", "loc_delta_t_minus_1", "longitude", "latitude", "category"]))

# define feature view with helper columns
feature_view = fs.get_or_create_feature_view(
    name='fv_with_helper_col',
    version=1,
    query=query,
    labels=["fraud_label"],
    transformation_functions=transformation_functions,
    inference_helper_columns=["longitude", "latitude"],
)
Retrieval#
When retrieving data for model inference, helper columns are omitted by default. However, they can optionally be fetched along with inference or training data.
Batch inference#
Fetch inference helper column values and compute on-demand features during batch inference.
# import feature functions
from feature_functions import location_delta, time_delta

# Fetch feature view object
feature_view = fs.get_feature_view(
    name='fv_with_helper_col',
    version=1,
)

# Fetch feature data for batch inference with helper columns
df = feature_view.get_batch_data(start_time=start_time, end_time=end_time, inference_helpers=True)
df['longitude_prev'] = df['longitude'].shift(-1)
df['latitude_prev'] = df['latitude'].shift(-1)

# compute location delta
df['loc_delta_t_minus_1'] = df.apply(lambda row: location_delta(row['longitude'],
                                                                row['latitude'],
                                                                row['longitude_prev'],
                                                                row['latitude_prev']), axis=1)

# prepare dataframe for prediction
df = df[[f.name for f in feature_view.features if not (f.label or f.inference_helper_column or f.training_helper_column)]]
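The feature_functions module imported above is not shown in this guide. A minimal sketch of what location_delta might look like, assuming it returns the haversine distance (in kilometres) between two coordinate pairs:
from math import asin, cos, radians, sin, sqrt

def location_delta(long_current, lat_current, long_prev, lat_prev):
    # hypothetical implementation: haversine distance in km between two coordinates;
    # the argument order matches the calls in the examples on this page
    long_current, lat_current, long_prev, lat_prev = map(
        radians, [long_current, lat_current, long_prev, lat_prev])
    dlon = long_prev - long_current
    dlat = lat_prev - lat_current
    a = sin(dlat / 2) ** 2 + cos(lat_current) * cos(lat_prev) * sin(dlon / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))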
Online inference#
Fetch inference helper column values and compute on-demand features during online inference.
from feature_functions import location_delta, time_delta

# Fetch feature view object
feature_view = fs.get_feature_view(
    name='fv_with_helper_col',
    version=1,
)

# Fetch feature data for batch inference without helper columns
df_without_inference_helpers = feature_view.get_batch_data()

# Fetch feature data for batch inference with helper columns
df_with_inference_helpers = feature_view.get_batch_data(inference_helpers=True)

# here cc_num, longitude and latitude are provided as parameters to the application
cc_num = ...
longitude = ...
latitude = ...

# get previous transaction location of this credit card
inference_helper = feature_view.get_inference_helper({"cc_num": cc_num}, return_type="dict")

# compute location delta
loc_delta_t_minus_1 = location_delta(longitude,
                                     latitude,
                                     inference_helper['longitude'],
                                     inference_helper['latitude'])

# Now get assembled feature vector for prediction
feature_vector = feature_view.get_feature_vector({"cc_num": cc_num},
                                                 passed_features={"loc_delta_t_minus_1": loc_delta_t_minus_1}
                                                 )
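The assembled vector can then be passed to the model. A minimal sketch, where model is a hypothetical estimator loaded earlier (for example from the model registry):
# "model" is hypothetical and assumed to have been loaded earlier
prediction = model.predict([feature_vector])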
Training Helper columns#
training_helper_columns is a list of feature names that are not part of the model schema itself but provide extra information during training. For example, one might want to use a feature like the category of the purchased product to assign different sample weights.
Define training helper columns for feature views.
# define query object
query = label_fg.select("fraud_label")\
    .join(trans_fg.select(["amount", "loc_delta_t_minus_1", "longitude", "latitude", "category"]))

# define feature view with helper columns
feature_view = fs.get_or_create_feature_view(
    name='fv_with_helper_col',
    version=1,
    query=query,
    labels=["fraud_label"],
    transformation_functions=transformation_functions,
    training_helper_columns=["category"]
)
Retrieval#
When retrieving training data, helper columns are omitted by default. However, they can optionally be fetched.
Fetch training data with or without training helper column values.
# import feature functions
from feature_functions import location_delta, time_delta

# Fetch feature view object
feature_view = fs.get_feature_view(
    name='fv_with_helper_col',
    version=1,
)

# Create training data with training helper columns
TEST_SIZE = 0.2
X_train, X_test, y_train, y_test = feature_view.train_test_split(
    description='transactions fraud training dataset',
    test_size=TEST_SIZE,
    training_helper_columns=True
)

# Get existing training data with training helper columns
X_train, X_test, y_train, y_test = feature_view.get_train_test_split(
    training_dataset_version=1,
    training_helper_columns=True
)
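Once fetched, the training helper column can be used to derive per-sample information, such as weights, before the features are passed to the model. A minimal sketch, assuming an XGBoost classifier and a hypothetical weighting scheme based on category:
from xgboost import XGBClassifier

# hypothetical per-category weights; the category values are made up for illustration
category_weights = {"Cash Withdrawal": 2.0, "Grocery": 1.0}
sample_weight = X_train["category"].map(category_weights).fillna(1.0)

# drop the helper column before fitting, since it is not part of the model schema
model = XGBClassifier()
model.fit(X_train.drop(columns=["category"]), y_train, sample_weight=sample_weight)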
Note
To use helper columns with a materialized training dataset, it needs to be created with training_helper_columns=True.
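For example, a materialized train/test split that keeps the training helper column could be created as follows. This is a sketch; it assumes create_train_test_split accepts the same training_helper_columns flag as train_test_split above.
# sketch: materialize a train/test split that keeps the training helper column
version, job = feature_view.create_train_test_split(
    description='transactions fraud training dataset',
    test_size=0.2,
    data_format='parquet',
    training_helper_columns=True,
)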