hsfs.transformation_statistics #
[source] FeatureTransformationStatistics #
Data class that contains all the statistics parameters that can be used for transformations inside a custom transformation function.
[source] completeness property #
completeness: float | None
Fraction of non-null values in a column.
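As an illustration of the definition only, and not the library's implementation, completeness could be computed for a small in-memory column as sketched below. The helper name `completeness_of` is hypothetical, and the sketch assumes that nulls are represented by `None`.

```python
from typing import Optional, Sequence


def completeness_of(values: Sequence[Optional[object]]) -> float:
    """Return the fraction of entries that are not None.

    Hypothetical helper for illustration; it is not part of hsfs.
    """
    if not values:
        raise ValueError("completeness is undefined for an empty column")
    non_null = sum(1 for v in values if v is not None)
    return non_null / len(values)


# Three of the four values are non-null.
print(completeness_of([1.0, None, 3.0, 4.0]))  # 0.75
```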
[source] approx_num_distinct_values property #
approx_num_distinct_values: int | None
Approximate number of distinct values.
[source] distinctness property #
distinctness: float | None
Fraction of distinct values of a feature over the number of all its values. Distinct values occur at least once.
Example
[a, a, b] contains two distinct values a and b, so distinctness is 2/3.
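The example above can be reproduced with a short sketch. The helper name `distinctness_of` is hypothetical and is not the library's implementation; it only mirrors the definition given here.

```python
from collections import Counter
from typing import Hashable, Sequence


def distinctness_of(values: Sequence[Hashable]) -> float:
    """Return (number of distinct values) / (number of all values).

    Hypothetical helper for illustration; it is not part of hsfs.
    """
    if not values:
        raise ValueError("distinctness is undefined for an empty column")
    counts = Counter(values)  # one key per distinct value
    return len(counts) / len(values)


# "a" and "b" are the two distinct values among three entries.
ratio = distinctness_of(["a", "a", "b"])
print(ratio == 2 / 3)  # True
```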
[source] entropy property #
entropy: float | None
Entropy is a measure of the level of information contained in an event (feature value) when considering all possible events (all feature values).
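Entropy is estimated from the observed value counts as the negative sum of (value_count/total_count) * log(value_count/total_count) over all distinct values, where log is the natural logarithm (the base that yields the result in the example below).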
Example
[a, b, b, c, c] has three distinct values with counts [1, 2, 2].
Entropy is then (-1/5*log(1/5)-2/5*log(2/5)-2/5*log(2/5)) = 1.055.
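The arithmetic above can be checked with a minimal sketch of the estimate. The helper name `entropy_of` is hypothetical, and the natural logarithm is assumed because it is the base that reproduces the 1.055 figure; this is not the library's implementation.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy_of(values: Sequence[Hashable]) -> float:
    """Estimate entropy from observed value counts using the natural log.

    Hypothetical helper for illustration; it is not part of hsfs.
    """
    if not values:
        raise ValueError("entropy is undefined for an empty column")
    total = len(values)
    counts = Counter(values)
    # Negative sum of p * log(p), with p = value_count / total_count.
    return -sum((c / total) * math.log(c / total) for c in counts.values())


# Counts are [1, 2, 2] out of 5 values, as in the example above.
print(round(entropy_of(["a", "b", "b", "c", "c"]), 3))  # 1.055
```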
[source] uniqueness property #
uniqueness: float | None
Fraction of unique values over the number of all values of a column. Unique values occur exactly once.
Example
[a, a, b] contains one unique value b, so uniqueness is 1/3.
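A minimal sketch of this definition follows. The helper name `uniqueness_of` is hypothetical and is not the library's implementation; it counts only values that occur exactly once.

```python
from collections import Counter
from typing import Hashable, Sequence


def uniqueness_of(values: Sequence[Hashable]) -> float:
    """Return (number of values occurring exactly once) / (number of all values).

    Hypothetical helper for illustration; it is not part of hsfs.
    """
    if not values:
        raise ValueError("uniqueness is undefined for an empty column")
    counts = Counter(values)
    unique = sum(1 for c in counts.values() if c == 1)
    return unique / len(values)


# Only "b" occurs exactly once among three entries.
ratio = uniqueness_of(["a", "a", "b"])
print(ratio == 1 / 3)  # True
```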
[source] TransformationStatistics #
Class that stores feature transformation statistics of all features that require training dataset statistics in a transformation function.
All statistics for a feature are initially set to null values and are populated when the training dataset is created.
| PARAMETER | DESCRIPTION |
|---|---|
| `*features` | TYPE: |
Example
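# Import path taken from the module name documented on this page
from hsfs.transformation_statistics import TransformationStatistics
# Defining transformation statistics for two features
transformation_statistics = TransformationStatistics("feature1", "feature2")
# Accessing the feature transformation statistics of a specific feature by name
feature_transformation_statistics_feature1 = transformation_statistics.feature1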