Transformation Functions#
HSFS provides functionality to attach transformation functions to training datasets.
To be able to attach a transformation function to a training dataset it has to be either part of the library installed in Hopsworks or attached when starting a Jupyter notebook or Hopsworks job.
Pyspark decorators.
Don't decorate transformation functions with the Pyspark @udf or @pandas_udf decorators, and don't use any Pyspark dependencies in them.
HSFS decorates the transformation function itself, but only when it is used inside a Pyspark application.
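A transformation function that follows this rule is just a plain Python callable. The minimal sketch below shows what a function such as plus_one might look like. The function body is an assumption, since the contents of the module used in the examples on this page are not shown here.

```python
# Hypothetical transformation function: plain Python only, with no pyspark
# imports and no @udf / @pandas_udf decorators.

def plus_one(value):
    """Return the input value incremented by one."""
    return value + 1


# Quick local check before registering the function in the feature store.
print(plus_one(41))
```

Because the function has no Pyspark dependency, it can be checked locally like any other Python function.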
Examples#
Register the transformation function plus_one in the Hopsworks feature store.
from hsfs_transformers import transformers
plus_one_meta = fs.create_transformation_function(
transformation_function=transformers.plus_one,
output_type=int,
version=1)
plus_one_meta.save()
To retrieve all transformation functions from the feature store, use get_transformation_functions, which returns a list of TransformationFunction objects.
A specific transformation function can be retrieved with the get_transformation_function method by providing its name and version.
If only the name is provided, the version defaults to 1.
Retrieving transformation functions from the feature store
# get all transformation functions
fs.get_transformation_functions()
# get transformation function by name. This will default to version 1
fs.get_transformation_function(name="plus_one")
# get transformation function by name and version.
fs.get_transformation_function(name="plus_one", version=2)
To attach transformation functions to a training dataset, provide them as a dict where each key is a feature name and each value is the online transformation function name.
The training dataset must also be created from a Query object. Once attached, the transformation function is applied whenever the save, insert and get_serving_vector methods are called on the training dataset object.
Attaching transformation functions to the training dataset
plus_one_meta = fs.get_transformation_function(name="plus_one", version=1)
# keep a reference to the training dataset so that it can be saved below
td = fs.create_training_dataset(name="td_demo",
    description="Dataset to train the demo model",
    data_format="csv",
    transformation_functions={"feature_name": plus_one_meta},
    statistics_config=None,
    version=1)
td.save(join_query)
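To make the mapping from feature names to transformation functions concrete, the sketch below applies a dict of per-feature callables to a single row in plain Python. It only illustrates the idea and is not how HSFS implements it. The helper apply_transformations and the sample row are hypothetical.

```python
def plus_one(value):
    """Hypothetical transformation function: increment the value by one."""
    return value + 1


def apply_transformations(row, transformation_functions):
    """Return a copy of row with each mapped feature transformed.

    row: dict of feature name -> value.
    transformation_functions: dict of feature name -> callable.
    Features without an attached function are left unchanged.
    """
    transformed = dict(row)
    for feature_name, fn in transformation_functions.items():
        if feature_name in transformed:
            transformed[feature_name] = fn(transformed[feature_name])
    return transformed


row = {"feature_name": 10, "other_feature": 5}
result = apply_transformations(row, {"feature_name": plus_one})
print(result)
```

Only the feature named in the dict is changed. The original row is left untouched because the helper works on a copy.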
Scala support
Creating transformation functions and attaching them to training datasets is not supported in the HSFS Scala client.
If a training dataset with a transformation function was created using the Python client and the insert or getServingVector methods are later called on it from the Scala client, HSFS will throw an exception.
Transformation Function#
TransformationFunction#
hsfs.transformation_function.TransformationFunction(
featurestore_id,
transformation_fn=None,
version=None,
name=None,
source_code_content=None,
output_type=None,
id=None,
type=None,
items=None,
count=None,
href=None,
)
Properties#
id#
Transformation function id.
name#
output_type#
source_code_content#
transformation_fn#
transformer_code#
version#
Methods#
delete#
TransformationFunction.delete()
Delete transformation function from backend.
save#
TransformationFunction.save()
Persist transformation function in backend.
Creation#
create_transformation_function#
FeatureStore.create_transformation_function(transformation_function, output_type, version=None)
Create a transformation function metadata object.
Lazy
This method is lazy and does not persist the transformation function in the
feature store on its own. To materialize and save the transformation function,
call the save() method of the transformation function metadata object.
Arguments
- transformation_function callable: callable object.
- output_type Union[str, str, string, bytes, numpy.int8, int8, byte, numpy.int16, int16, short, int, int, numpy.int32, numpy.int64, int64, long, bigint, float, float, numpy.float64, float64, double, datetime.datetime, numpy.datetime64, datetime.date, bool, boolean, bool]: python or numpy output type that will be inferred as pyspark.sql.types type.
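The idea of inferring a Spark SQL type from a Python type or string alias can be sketched as a lookup table. The mapping below is an assumption made for illustration. This page does not state which pyspark.sql.types class each output_type resolves to, and infer_spark_type_name is a hypothetical helper, not part of HSFS. The sketch returns type names as strings so that it runs without Pyspark installed.

```python
import datetime

# Hypothetical lookup table: Python types and string aliases mapped to the
# names of pyspark.sql.types classes. Every entry is an assumption; the
# mapping HSFS actually uses may differ.
ASSUMED_SPARK_TYPE_NAMES = {
    str: "StringType",
    "string": "StringType",
    bytes: "BinaryType",
    int: "IntegerType",
    "long": "LongType",
    "bigint": "LongType",
    float: "FloatType",
    "double": "DoubleType",
    bool: "BooleanType",
    "boolean": "BooleanType",
    datetime.datetime: "TimestampType",
    datetime.date: "DateType",
}


def infer_spark_type_name(output_type):
    """Return the assumed Spark SQL type name for output_type."""
    try:
        return ASSUMED_SPARK_TYPE_NAMES[output_type]
    except KeyError:
        raise TypeError(f"Unsupported output type: {output_type!r}") from None


print(infer_spark_type_name(int))
print(infer_spark_type_name("double"))
```

Rejecting unknown types with an explicit error surfaces a wrong output_type when the function is registered and not later at transformation time.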
Returns:
TransformationFunction: The TransformationFunction metadata object.
Retrieval#
get_transformation_function#
FeatureStore.get_transformation_function(name, version=None)
Get transformation function metadata object.
Arguments
- name str: name of transformation function.
- version Optional[int]: version of transformation function. Optional; if not provided, all functions that match the provided name will be retrieved.
Returns:
TransformationFunction: The TransformationFunction metadata object.
get_transformation_functions#
FeatureStore.get_transformation_functions()
Get all transformation functions metadata objects.
Returns:
List[TransformationFunction]: List of transformation function instances.