hsfs.hopsworks_udf #

[source] HopsworksUdf #

Metadata for user defined functions.

Stores the metadata required to execute the user defined function in both the Spark and Python engines. The class uses this metadata to dynamically generate user defined functions for the engine it is executed in.

PARAMETER DESCRIPTION
func

The transformation function object or the source code of the transformation function.

TYPE: Callable | str

return_types

A Python type, or a list of Python types, denoting the data types of the columns output by the transformation function.

TYPE: list[type] | type | list[str] | str

name

Name of the transformation function.

TYPE: str | None DEFAULT: None

transformation_features

A list of TransformationFeature objects that map each feature used in the transformation to its corresponding statistics argument name, if any.

TYPE: list[TransformationFeature] | None DEFAULT: None

transformation_function_argument_names

The argument names of the transformation function.

TYPE: list[str] | None DEFAULT: None

dropped_argument_names

The arguments to be dropped from the final DataFrame after the transformation functions are applied.

TYPE: list[str] | None DEFAULT: None

dropped_feature_names

The feature names corresponding to the argument names that are dropped.

TYPE: list[str] | None DEFAULT: None

feature_name_prefix

Prefix, if any, used in the feature view.

TYPE: str | None DEFAULT: None

output_column_names

The names of the output columns returned from the transformation function.

TYPE: str | None DEFAULT: None

generate_output_col_names

Generate default output column names for the transformation function.

TYPE: bool DEFAULT: True

[source] return_types property #

return_types: list[str]

Get the output types of the UDF.

[source] function_name property #

function_name: str

Get the function name of the UDF.

[source] statistics_required property #

statistics_required: bool

Get whether the UDF requires statistics for any feature.

[source] transformation_statistics property writable #

transformation_statistics: TransformationStatistics | None

Feature statistics required for the defined UDF.

[source] output_column_names property writable #

output_column_names: list[str]

Output column names of the transformation function.

[source] transformation_features property #

transformation_features: list[str]

List of feature names to be used in the User Defined Function.

[source] unprefixed_transformation_features property #

unprefixed_transformation_features: list[str]

List of feature names used in the transformation function, without the feature name prefix.

[source] feature_name_prefix property #

feature_name_prefix: str | None

The feature name prefix that needs to be added to the feature names.
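
The relationship between `feature_name_prefix`, the prefixed feature names, and `unprefixed_transformation_features` can be sketched as below. This is a minimal illustration, not the library's implementation: the helpers `with_prefix` and `without_prefix` and the prefix `fg1_` are hypothetical, and the sketch assumes the prefix is simply prepended to each feature name.

```python
from __future__ import annotations


def with_prefix(feature_names: list[str], prefix: str | None) -> list[str]:
    """Hypothetical helper: prepend the prefix to every feature name."""
    if not prefix:
        # No prefix configured, so the names are returned unchanged.
        return list(feature_names)
    return [f"{prefix}{name}" for name in feature_names]


def without_prefix(feature_names: list[str], prefix: str | None) -> list[str]:
    """Hypothetical helper: strip the prefix from names that start with it."""
    if not prefix:
        return list(feature_names)
    return [
        name[len(prefix):] if name.startswith(prefix) else name
        for name in feature_names
    ]


# Assumed example data; "fg1_" is an illustrative prefix only.
unprefixed = ["amount", "category"]
prefixed = with_prefix(unprefixed, "fg1_")
```

Under these assumptions, `prefixed` plays the role of the prefixed feature names, and `without_prefix(prefixed, "fg1_")` recovers the unprefixed list.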

[source] statistics_features property #

statistics_features: list[str]

List of feature names that require statistics.

[source] dropped_features property writable #

dropped_features: list[str]

List of features that will be dropped after the UDF is applied.

[source] transformation_context property writable #

transformation_context: dict[str, Any]

Dictionary that contains the context variables required for the UDF.

These context variables are passed to the UDF during execution.

[source] alias #

alias(*args: str)

Set the names of the transformed features output by the UDF.
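
A rough sketch of the idea: each output column of the UDF gets one name. `UdfSketch` below is a hypothetical stand-in, not `HopsworksUdf`. The length check and the choice to return the object for chaining are assumptions made for illustration; this page does not state how the real method validates its arguments or what it returns.

```python
class UdfSketch:
    """Hypothetical stand-in used only to illustrate aliasing output columns."""

    def __init__(self, return_types):
        # One entry per output column, e.g. ["double", "double"].
        self.return_types = list(return_types)
        self.output_column_names = []

    def alias(self, *args: str):
        # Assumption: exactly one alias is expected per output column.
        if len(args) != len(self.return_types):
            raise ValueError(
                f"Expected {len(self.return_types)} names, got {len(args)}."
            )
        self.output_column_names = list(args)
        # Assumption: returning self allows chained calls.
        return self


sketch = UdfSketch(["double", "double"]).alias("amount_plus_one", "amount_doubled")
```

With two return types, two aliases are accepted and stored in order; passing a different number of names raises `ValueError` in this sketch.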

[source] execute #

execute(*args) -> Any

Execute the UDF directly with the provided arguments.

This is a convenience method for quick testing of simple UDFs that don't require statistics or transformation context. It executes the UDF in offline mode (batch processing).

Quick UDF testing

import pandas as pd
from hsfs.hopsworks_udf import udf

@udf(return_type=float)
def add_one(value):
    return value + 1

# Direct execution for simple tests
result = add_one.execute(pd.Series([1.0, 2.0, 3.0]))
assert result.tolist() == [2.0, 3.0, 4.0]

Note

For UDFs that require statistics or transformation context, or that need to be executed in online mode, use executor() instead:

result = my_udf.executor(statistics=stats, context=ctx).execute(data)

PARAMETER DESCRIPTION
*args

Input arguments matching the UDF's parameter signature. For batch processing, pass pandas Series or DataFrames.

DEFAULT: ()

RETURNS DESCRIPTION
Any

The transformed values:
  • pd.Series - Single output Pandas UDFs.
  • pd.DataFrame - Multi-output Pandas UDFs.
  • int | float | str | bool | datetime | time | date - Single output Python UDFs.
  • tuple[int | float | str | bool | datetime | time | date] - Multi-output Python UDFs.

[source] json #

json() -> str

Convert the class into its JSON serialized form.

RETURNS DESCRIPTION
str

JSON serialized object.

[source] from_response_json classmethod #

from_response_json(
    json_dict: dict[str, Any],
) -> HopsworksUdf

Construct the class object from its JSON serialization.

PARAMETER DESCRIPTION
json_dict

JSON serialized dictionary for the class.

TYPE: dict[str, Any]

RETURNS DESCRIPTION
HopsworksUdf

JSON deserialized class object.

[source] TransformationFeature dataclass #

Mapping of feature names to their corresponding statistics argument names in the code.

The statistic_argument_name for a feature is None if the feature does not need statistics.

PARAMETER DESCRIPTION
feature_name

Name of the feature.

TYPE: str

statistic_argument_name

Name of the statistics argument in the code for the feature specified in the feature name.

TYPE: str | None
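
The mapping described above can be sketched with a small stand-in dataclass. `TransformationFeatureSketch` is a hypothetical mirror of the two documented fields, not the library's class, and the feature and argument names are made up for illustration. The derived values show one plausible way that a list of features requiring statistics, and a flag for whether any statistics are required, could follow from such a mapping.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TransformationFeatureSketch:
    """Hypothetical stand-in with the two documented fields."""

    feature_name: str
    statistic_argument_name: str | None


# Assumed example data: "amount" needs statistics, "category" does not.
features = [
    TransformationFeatureSketch("amount", "statistics_amount"),
    TransformationFeatureSketch("category", None),
]

# Features whose statistics argument name is set are the ones needing statistics.
statistics_features = [
    f.feature_name for f in features if f.statistic_argument_name is not None
]

# Statistics are required as soon as at least one feature needs them.
statistics_required = len(statistics_features) > 0
```

In this sketch only `amount` carries a statistics argument name, so it is the only entry in `statistics_features`.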