
Feature#

Features are the most granular entity in the feature store and are logically grouped into feature groups.

The storage location of a single feature is determined by the feature group. Hence, enabling online storage for a feature group makes its features available as online features.

New features can be appended to feature groups; dropping features, however, requires creating a new feature group version. When appending a feature, it is possible to specify a default value, which is used to populate the new feature for the feature vectors that already exist in the feature group.
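Conceptually, appending a feature backfills existing feature vectors with the default value. The sketch below illustrates this semantics in plain Python; the dict-based rows and the `append_feature` helper are illustrative only and are not part of the hsfs API:

```python
def append_feature(rows, feature_name, default_value):
    """Illustrative only: backfill an appended feature on existing rows."""
    for row in rows:
        # Existing feature vectors did not have the new feature,
        # so they receive the specified default value.
        row.setdefault(feature_name, default_value)
    return rows

rows = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": 20.0}]
append_feature(rows, "country", "unknown")
# Every pre-existing row now carries the default for "country".
```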

[source]

Feature#

hsfs.feature.Feature(
    name,
    type=None,
    description=None,
    primary=None,
    partition=None,
    hudi_precombine_key=None,
    online_type=None,
    default_value=None,
    feature_group_id=None,
    feature_group=None,
)

Metadata object representing a feature in a feature group in the Feature Store.

See Training Dataset Feature for the feature representation of training dataset schemas.


Feature Types#

Each feature requires at least an offline type to be specified, as it is needed to create the feature group's metadata in the offline storage, even if the feature group is going to be a purely online feature group with no data in the offline storage.

Offline Storage#

The offline storage is based on Apache Hive and hence, any Hive Data Type can be leveraged.

Type Inference

When creating a feature group from a Spark DataFrame, without providing a schema manually, the feature store will infer the schema with feature types from the DataFrame.

Potential offline types are:

"None","TINYINT", "SMALLINT", "INT", "BIGINT", "FLOAT", "DOUBLE",
"DECIMAL", "TIMESTAMP", "DATE", "STRING",
"BOOLEAN", "BINARY",
"ARRAY <TINYINT>", "ARRAY <SMALLINT>", "ARRAY <INT>", "ARRAY <BIGINT>",
"ARRAY <FLOAT>", "ARRAY <DOUBLE>", "ARRAY <DECIMAL>", "ARRAY <TIMESTAMP>",
"ARRAY <DATE>", "ARRAY <STRING>",
"ARRAY <BOOLEAN>", "ARRAY <BINARY>", "ARRAY <ARRAY <FLOAT> >",
"ARRAY <ARRAY <INT> >", "ARRAY <ARRAY <STRING> >",
"MAP <FLOAT, FLOAT>", "MAP <FLOAT, STRING>", "MAP <FLOAT, INT>",
"MAP <FLOAT, BINARY>", "MAP <INT, INT>", "MAP <INT, STRING>",
"MAP <INT, BINARY>", "MAP <INT, FLOAT>", "MAP <INT, ARRAY <FLOAT> >",
"STRUCT < label: STRING, index: INT >", "UNIONTYPE < STRING, INT>"

Online Storage#

The online storage is based on MySQL Cluster (NDB) and hence, any MySQL Data Type can be leveraged.

Type Inference

When creating a feature group from a Spark DataFrame, without providing a schema manually, the feature store will infer the schema with feature types from the DataFrame.

Potential online types are:

"None", "INT(11)", "TINYINT(1)", "SMALLINT(5)", "MEDIUMINT(7)", "BIGINT(20)",
"FLOAT", "DOUBLE", "DECIMAL", "DATE", "DATETIME", "TIMESTAMP", "TIME", "YEAR",
"CHAR", "VARCHAR(25)", "VARCHAR(125)", "VARCHAR(225)", "VARCHAR(500)",
"VARCHAR(1000)", "VARCHAR(2000)", "VARCHAR(5000)", "VARCHAR(10000)", "BINARY",
"VARBINARY(100)", "VARBINARY(500)", "VARBINARY(1000)", "BLOB", "TEXT",
"TINYBLOB", "TINYTEXT", "MEDIUMBLOB", "MEDIUMTEXT", "LONGBLOB", "LONGTEXT",
"JSON"

Properties#

[source]

default_value#

Default value of the feature as a string, set if the feature was appended to the feature group.


[source]

description#

Description of the feature.


[source]

hudi_precombine_key#

Whether the feature is part of the Hudi precombine key of the feature group.


[source]

name#

Name of the feature.


[source]

online_type#

Data type of the feature in the online feature store.


[source]

partition#

Whether the feature is part of the partition key of the feature group.


[source]

primary#

Whether the feature is part of the primary key of the feature group.


[source]

type#

Data type of the feature in the offline feature store.

Not a Python type

This type property is not to be confused with Python types. The type property represents the actual data type of the feature in the feature store.


Methods#

[source]

contains#

Feature.contains(other)

[source]

is_complex#

Feature.is_complex()

Returns `True` if the feature has a complex type (e.g. ARRAY, MAP, or STRUCT).


[source]

json#

Feature.json()