public class FeatureGroup extends FeatureGroupBase<org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>>
Modifier and Type | Field and Description |
---|---|
protected StatisticsEngine |
statisticsEngine |
created, creator, deltaStreamerJobConf, description, eventTime, expectationsNames, featureGroupEngineBase, features, featureStore, hudiPrecombineKey, id, location, LOGGER, name, onlineEnabled, onlineTopicName, partitionKeys, primaryKeys, statisticColumns, statisticsConfig, subject, timeTravelFormat, type, utils, version
Constructor and Description |
---|
FeatureGroup() |
FeatureGroup(FeatureStore featureStore,
int id) |
FeatureGroup(FeatureStore featureStore,
Integer id) |
FeatureGroup(FeatureStore featureStore,
@NonNull String name,
Integer version,
String description,
List<String> primaryKeys,
List<String> partitionKeys,
String hudiPrecombineKey,
boolean onlineEnabled,
TimeTravelFormat timeTravelFormat,
List<Feature> features,
StatisticsConfig statisticsConfig,
String onlineTopicName,
String eventTime) |
FeatureGroup(Integer id,
String description,
List<Feature> features) |
Modifier and Type | Method and Description |
---|---|
void |
appendFeatures(Feature features)
Append a single feature to the schema of the feature group.
|
void |
appendFeatures(List<Feature> features)
Append features to the schema of the feature group.
|
Query |
asOf(String wallclockTime)
Get Query object to retrieve all features of the group at a point in the past.
|
Query |
asOf(String wallclockTime,
String excludeUntil)
Get Query object to retrieve all features of the group at a point in the past.
|
void |
commitDeleteRecord(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> featureData)
Drops the records present in the provided DataFrame and commits the deletion as an update to this feature group.
|
void |
commitDeleteRecord(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> featureData,
Map<String,String> writeOptions)
Drops the records present in the provided DataFrame and commits the deletion as an update to this feature group.
|
Map<Long,Map<String,String>> |
commitDetails()
Retrieves commit timeline for this feature group.
|
Map<Long,Map<String,String>> |
commitDetails(Integer limit)
Retrieves commit timeline for this feature group.
|
Map<Long,Map<String,String>> |
commitDetails(String wallclockTime)
Return commit details as of specific point in time.
|
Map<Long,Map<String,String>> |
commitDetails(String wallclockTime,
Integer limit)
Return commit details as of specific point in time.
|
Statistics |
computeStatistics()
Recompute the statistics for the feature group and save them to the feature store.
|
Statistics |
computeStatistics(String wallclockTime)
Recompute the statistics for the feature group and save them to the feature store.
|
Statistics |
getStatistics()
Get the last statistics commit for the feature group.
|
void |
insert(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> featureData)
Incrementally insert data to a feature group or overwrite all data contained in the feature group.
|
void |
insert(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> featureData,
boolean overwrite)
Incrementally insert data to a feature group or overwrite all data contained in the feature group.
|
void |
insert(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> featureData,
boolean overwrite,
Map<String,String> writeOptions)
Incrementally insert data to a feature group or overwrite all data contained in the feature group.
|
void |
insert(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> featureData,
boolean overwrite,
Map<String,String> writeOptions,
JobConfiguration jobConfiguration) |
void |
insert(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> featureData,
HudiOperationType operation)
Incrementally insert data to a feature group or overwrite all data contained in the feature group.
|
void |
insert(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> featureData,
JobConfiguration jobConfiguration) |
void |
insert(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> featureData,
Map<String,String> writeOptions)
Incrementally insert data to a feature group or overwrite all data contained in the feature group.
|
void |
insert(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> featureData,
Storage storage)
Incrementally insert data to a feature group or overwrite all data contained in the feature group.
|
void |
insert(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> featureData,
Storage storage,
boolean overwrite)
Incrementally insert data to a feature group or overwrite all data contained in the feature group.
|
void |
insert(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> featureData,
Storage storage,
boolean overwrite,
HudiOperationType operation,
Map<String,String> writeOptions)
Incrementally insert data to a feature group or overwrite all data contained in the feature group.
|
org.apache.spark.sql.streaming.StreamingQuery |
insertStream(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> featureData)
Deprecated.
The insertStream method is deprecated for FeatureGroups. Full insertStream capability is available for StreamFeatureGroups.
|
org.apache.spark.sql.streaming.StreamingQuery |
insertStream(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> featureData,
Map<String,String> writeOptions)
Deprecated.
|
org.apache.spark.sql.streaming.StreamingQuery |
insertStream(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> featureData,
String queryName)
Deprecated.
|
org.apache.spark.sql.streaming.StreamingQuery |
insertStream(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> featureData,
String queryName,
Map<String,String> writeOptions) |
org.apache.spark.sql.streaming.StreamingQuery |
insertStream(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> featureData,
String queryName,
String outputMode)
Deprecated.
|
org.apache.spark.sql.streaming.StreamingQuery |
insertStream(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> featureData,
String queryName,
String outputMode,
boolean awaitTermination,
Long timeout)
Deprecated.
The insertStream method is deprecated for FeatureGroups. Full insertStream capability is available for StreamFeatureGroups.
|
org.apache.spark.sql.streaming.StreamingQuery |
insertStream(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> featureData,
String queryName,
String outputMode,
boolean awaitTermination,
Long timeout,
String checkpointLocation)
Deprecated.
|
org.apache.spark.sql.streaming.StreamingQuery |
insertStream(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> featureData,
String queryName,
String outputMode,
boolean awaitTermination,
Long timeout,
String checkpointLocation,
Map<String,String> writeOptions)
Deprecated.
|
Object |
insertStream(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> featureData,
String queryName,
String outputMode,
boolean awaitTermination,
Long timeout,
String checkpointLocation,
Map<String,String> writeOptions,
JobConfiguration jobConfiguration) |
org.apache.spark.sql.streaming.StreamingQuery |
insertStream(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> featureData,
String queryName,
String outputMode,
boolean awaitTermination,
String checkpointLocation)
Deprecated.
The insertStream method is deprecated for FeatureGroups. Full insertStream capability is available for StreamFeatureGroups.
|
org.apache.spark.sql.streaming.StreamingQuery |
insertStream(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> featureData,
String queryName,
String outputMode,
String checkpointLocation)
Deprecated.
|
org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> |
read()
Reads the feature group from the offline storage as Spark DataFrame.
|
org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> |
read(boolean online)
Reads the feature group from the offline or online storage as Spark DataFrame.
|
org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> |
read(boolean online,
Map<String,String> readOptions)
Reads the feature group from the offline or online storage as Spark DataFrame.
|
org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> |
read(Map<String,String> readOptions)
Reads the feature group from the offline storage as Spark DataFrame.
|
org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> |
read(String wallclockTime)
Reads Feature group into a dataframe at a specific point in time.
|
org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> |
read(String wallclockTime,
Map<String,String> readOptions)
Reads Feature group into a dataframe at a specific point in time.
|
org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> |
readChanges(String wallclockStartTime,
String wallclockEndTime)
Deprecated.
|
org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> |
readChanges(String wallclockStartTime,
String wallclockEndTime,
Map<String,String> readOptions)
Deprecated.
|
void |
save(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> featureData)
Deprecated.
|
void |
save(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> featureData,
Map<String,String> writeOptions)
Deprecated.
|
Query |
select(List<String> features)
Select a subset of features of the feature group and return a query object.
|
Query |
selectAll()
Select all features of the feature group and return a query object.
|
Query |
selectExcept(List<String> features)
Select all features of the feature group, including the primary key and event time features, except the provided `features`,
and return a query object.
|
Query |
selectExceptFeatures(List<Feature> features)
Select all features of the feature group, including the primary key and event time features, except the provided `features`,
and return a query object.
|
Query |
selectFeatures(List<Feature> features)
Select a subset of features of the feature group and return a query object.
|
void |
show(int numRows)
Show the first `n` rows of the feature group.
|
void |
show(int numRows,
boolean online)
Show the first `n` rows of the feature group.
|
void |
updateFeatures(Feature feature)
Update the metadata of a feature.
|
void |
updateFeatures(List<Feature> features)
Update the metadata of multiple features.
|
addTag, delete, deleteTag, getAvroSchema, getComplexFeatures, getDeserializedAvroSchema, getDeserializedEncodedAvroSchema, getEncodedAvroSchema, getFeature, getFeatureAvroSchema, getPrimaryKeys, getSubject, getTag, getTags, unloadSubject, updateDescription, updateFeatureDescription, updateStatisticsConfig
protected StatisticsEngine statisticsEngine
public FeatureGroup(FeatureStore featureStore, @NonNull String name, Integer version, String description, List<String> primaryKeys, List<String> partitionKeys, String hudiPrecombineKey, boolean onlineEnabled, TimeTravelFormat timeTravelFormat, List<Feature> features, StatisticsConfig statisticsConfig, String onlineTopicName, String eventTime)
public FeatureGroup()
public FeatureGroup(FeatureStore featureStore, Integer id)
public FeatureGroup(FeatureStore featureStore, int id)
public org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> read() throws FeatureStoreException, IOException
// get feature store handle
FeatureStore fs = HopsworksConnection.builder().build().getFeatureStore();
// get feature group handle
FeatureGroup fg = fs.getFeatureGroup("electricity_prices", 1);
// read feature group
fg.read()
read
in class FeatureGroupBase<org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>>
FeatureStoreException
- In case it cannot run read query on storage and/or no commit information was found for this feature group.
IOException
- Generic IO exception.
public org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> read(boolean online) throws FeatureStoreException, IOException
// get feature store handle
FeatureStore fs = HopsworksConnection.builder().build().getFeatureStore();
// get feature group handle
FeatureGroup fg = fs.getFeatureGroup("electricity_prices", 1);
// read feature group data from online storage
fg.read(true)
// read feature group data from offline storage
fg.read(false)
read
in class FeatureGroupBase<org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>>
online
- Set `online` to `true` to read from the online storage.
FeatureStoreException
- In case it cannot run read query on storage and/or no commit information was found for this feature group.
IOException
- Generic IO exception.
public org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> read(Map<String,String> readOptions) throws FeatureStoreException, IOException
// get feature store handle
FeatureStore fs = HopsworksConnection.builder().build().getFeatureStore();
// get feature group handle
FeatureGroup fg = fs.getFeatureGroup("electricity_prices", 1);
// define additional read options (this example applies to HUDI enabled FGs)
Map<String, String> readOptions = new HashMap<String, String>() {{
put("hoodie.datasource.read.end.instanttime", "20230401211015");
}};
// read feature group data
fg.read(readOptions)
read
in class FeatureGroupBase<org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>>
readOptions
- Additional read options as key/value pairs.
FeatureStoreException
- In case it cannot run read query on storage and/or no commit information was found for this feature group.
IOException
- Generic IO exception.
public org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> read(boolean online, Map<String,String> readOptions) throws FeatureStoreException, IOException
// get feature store handle
FeatureStore fs = HopsworksConnection.builder().build().getFeatureStore();
// get feature group handle
FeatureGroup fg = fs.getFeatureGroup("electricity_prices", 1);
// define additional read options (this example applies to HUDI enabled FGs)
Map<String, String> readOptions = new HashMap<String, String>() {{
put("hoodie.datasource.read.end.instanttime", "20230401211015");
}};
// read feature group data from online storage
fg.read(true, readOptions)
// read feature group data from offline storage
fg.read(false, readOptions)
read
in class FeatureGroupBase<org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>>
online
- Set `online` to `true` to read from the online storage.
readOptions
- Additional read options as key/value pairs.
FeatureStoreException
- In case it cannot run read query on storage and/or no commit information was found for this feature group.
IOException
- Generic IO exception.
public org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> read(String wallclockTime) throws FeatureStoreException, IOException, ParseException
// get feature store handle
FeatureStore fs = HopsworksConnection.builder().build().getFeatureStore();
// get feature group handle
FeatureGroup fg = fs.getFeatureGroup("electricity_prices", 1);
// read feature group data as of specific point in time (Hudi commit timestamp).
fg.read("20230205210923")
read
in class FeatureGroupBase<org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>>
wallclockTime
- Read data as of this point in time. Datetime string. The String should be formatted in one of the following formats: `yyyyMMdd`, `yyyyMMddHH`, `yyyyMMddHHmm`, or `yyyyMMddHHmmss`.
FeatureStoreException
- In case it's unable to identify the format of the provided wallclockTime.
IOException
- Generic IO exception.
ParseException
- In case it's unable to parse the provided wallclockTime to date type.
public org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> read(String wallclockTime, Map<String,String> readOptions) throws FeatureStoreException, IOException, ParseException
// get feature store handle
FeatureStore fs = HopsworksConnection.builder().build().getFeatureStore();
// get feature group handle
FeatureGroup fg = fs.getFeatureGroup("electricity_prices", 1);
// define additional read options (this example applies to HUDI enabled FGs)
Map<String, String> readOptions = new HashMap<String, String>() {{
put("hoodie.datasource.read.end.instanttime", "20230401211015");
}};
// read feature group data as of specific point in time (Hudi commit timestamp).
fg.read("20230205210923", readOptions)
read
in class FeatureGroupBase<org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>>
wallclockTime
- Datetime string. The String should be formatted in one of the following formats: `yyyyMMdd`, `yyyyMMddHH`, `yyyyMMddHHmm`, or `yyyyMMddHHmmss`.
readOptions
- Additional read options as key/value pairs.
FeatureStoreException
- In case it's unable to identify the format of the provided wallclockTime.
IOException
- Generic IO exception.
ParseException
- In case it's unable to parse the provided wallclockTime to date type.
@Deprecated public org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> readChanges(String wallclockStartTime, String wallclockEndTime) throws FeatureStoreException, IOException, ParseException
@Deprecated public org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> readChanges(String wallclockStartTime, String wallclockEndTime, Map<String,String> readOptions) throws FeatureStoreException, IOException, ParseException
public Query asOf(String wallclockTime) throws FeatureStoreException, ParseException
// get feature store handle
FeatureStore fs = HopsworksConnection.builder().build().getFeatureStore();
// get feature group handle
FeatureGroup fg = fs.getFeatureGroup("electricity_prices", 1);
// get query object to retrieve feature group feature data as of
// specific point in time (Hudi commit timestamp).
fg.asOf("20230205210923")
asOf
in class FeatureGroupBase<org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>>
wallclockTime
- Read data as of this point in time. Datetime string. The String should be formatted in one of the following formats: `yyyyMMdd`, `yyyyMMddHH`, `yyyyMMddHHmm`, or `yyyyMMddHHmmss`.
FeatureStoreException
- In case it's unable to identify the format of the provided wallclockTime.
ParseException
- In case it's unable to parse the provided wallclockTime to date type.
public Query asOf(String wallclockTime, String excludeUntil) throws FeatureStoreException, ParseException
// get feature store handle
FeatureStore fs = HopsworksConnection.builder().build().getFeatureStore();
// get feature group handle
FeatureGroup fg = fs.getFeatureGroup("electricity_prices", 1);
// get query object to retrieve feature group feature data as of specific point in time "20230205210923"
// but exclude commits until "20230204073411" (Hudi commit timestamp).
fg.asOf("20230205210923", "20230204073411")
asOf
in class FeatureGroupBase<org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>>
wallclockTime
- Read data as of this point in time. Datetime string. The String should be formatted in one of the following formats: `yyyyMMdd`, `yyyyMMddHH`, `yyyyMMddHHmm`, or `yyyyMMddHHmmss`.
excludeUntil
- Exclude commits until this point in time. Datetime string. The String should be formatted in one of the following formats: `yyyyMMdd`, `yyyyMMddHH`, `yyyyMMddHHmm`, or `yyyyMMddHHmmss`.
FeatureStoreException
- In case it's unable to identify the format of the provided wallclockTime.
ParseException
- In case it's unable to parse the provided wallclockTime to date type.
public void show(int numRows) throws FeatureStoreException, IOException
// get feature store handle
FeatureStore fs = HopsworksConnection.builder().build().getFeatureStore();
// get feature group handle
FeatureGroup fg = fs.getFeatureGroup("electricity_prices", 1);
// show top 5 lines of feature group data.
fg.show(5);
show
in class FeatureGroupBase<org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>>
numRows
- Number of rows to show.
FeatureStoreException
- In case it cannot run read query on storage and/or no commit information was found for this feature group.
IOException
- Generic IO exception.
public void show(int numRows, boolean online) throws FeatureStoreException, IOException
// get feature store handle
FeatureStore fs = HopsworksConnection.builder().build().getFeatureStore();
// get feature group handle
FeatureGroup fg = fs.getFeatureGroup("electricity_prices", 1);
// show top 5 lines of feature data from online storage.
fg.show(5, true);
show
in class FeatureGroupBase<org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>>
numRows
- Number of rows to show.
online
- If `true`, read from the online feature store.
FeatureStoreException
- In case it cannot run read query on storage and/or no commit information was found for this feature group.
IOException
- Generic IO exception.
@Deprecated public void save(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> featureData) throws FeatureStoreException, IOException, ParseException
@Deprecated public void save(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> featureData, Map<String,String> writeOptions) throws FeatureStoreException, IOException, ParseException
public void insert(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> featureData) throws IOException, FeatureStoreException, ParseException
// get feature store handle
FeatureStore fs = HopsworksConnection.builder().build().getFeatureStore();
// get feature group handle
FeatureGroup fg = fs.getFeatureGroup("electricity_prices", 1);
//insert feature data
fg.insert(featureData);
insert
in class FeatureGroupBase<org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>>
featureData
- Spark DataFrame, RDD. Features to be saved.
IOException
- Generic IO exception.
FeatureStoreException
- If client is not connected to Hopsworks; cannot run read query on storage and/or can't reconcile HUDI schema.
ParseException
- In case it's unable to parse HUDI commit date string to date type.
public void insert(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> featureData, Map<String,String> writeOptions) throws FeatureStoreException, IOException, ParseException
// get feature store handle
FeatureStore fs = HopsworksConnection.builder().build().getFeatureStore();
// get feature group handle
FeatureGroup fg = fs.getFeatureGroup("electricity_prices", 1);
// Define additional write options (this example applies to HUDI enabled FGs)
Map<String, String> writeOptions = new HashMap<String, String>() {{
put("hoodie.bulkinsert.shuffle.parallelism", "5");
put("hoodie.insert.shuffle.parallelism", "5");
put("hoodie.upsert.shuffle.parallelism", "5");
}};
// insert feature data
fg.insert(featureData, writeOptions);
insert
in class FeatureGroupBase<org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>>
featureData
- Spark DataFrame, RDD. Features to be saved.
writeOptions
- Additional write options as key/value pairs.
IOException
- Generic IO exception.
FeatureStoreException
- If client is not connected to Hopsworks; cannot run read query on storage and/or can't reconcile HUDI schema.
ParseException
- In case it's unable to parse HUDI commit date string to date type.
public void insert(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> featureData, Storage storage) throws IOException, FeatureStoreException, ParseException
// get feature store handle
FeatureStore fs = HopsworksConnection.builder().build().getFeatureStore();
// get feature group handle
FeatureGroup fg = fs.getFeatureGroup("electricity_prices", 1);
// insert feature data in offline only
fg.insert(featureData, Storage.OFFLINE);
// Or insert feature data in online only
fg.insert(featureData, Storage.ONLINE);
insert
in class FeatureGroupBase<org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>>
featureData
- Spark DataFrame, RDD. Features to be saved.
storage
- Overwrite default behaviour; write to offline storage only with `Storage.OFFLINE` or online only with `Storage.ONLINE`.
IOException
- Generic IO exception.
FeatureStoreException
- If client is not connected to Hopsworks; cannot run read query on storage and/or can't reconcile HUDI schema.
ParseException
- In case it's unable to parse HUDI commit date string to date type.
public void insert(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> featureData, boolean overwrite) throws IOException, FeatureStoreException, ParseException
// get feature store handle
FeatureStore fs = HopsworksConnection.builder().build().getFeatureStore();
// get feature group handle
FeatureGroup fg = fs.getFeatureGroup("electricity_prices", 1);
// insert feature data and drop all data in the feature group before inserting new data
fg.insert(featureData, true);
insert
in class FeatureGroupBase<org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>>
featureData
- Spark DataFrame, RDD. Features to be saved.
overwrite
- Drop all data in the feature group before inserting new data. This does not affect metadata.
IOException
- Generic IO exception.
FeatureStoreException
- If client is not connected to Hopsworks; cannot run read query on storage and/or can't reconcile HUDI schema.
ParseException
- In case it's unable to parse HUDI commit date string to date type.
public void insert(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> featureData, Storage storage, boolean overwrite) throws IOException, FeatureStoreException, ParseException
// get feature store handle
FeatureStore fs = HopsworksConnection.builder().build().getFeatureStore();
// get feature group handle
FeatureGroup fg = fs.getFeatureGroup("electricity_prices", 1);
// insert feature data in offline only and drop all data in the feature group before inserting new data
fg.insert(featureData, Storage.OFFLINE, true);
// Or insert feature data in online only and drop all data in the feature group before inserting new data
fg.insert(featureData, Storage.ONLINE, true);
insert
in class FeatureGroupBase<org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>>
featureData
- Spark DataFrame, RDD. Features to be saved.
storage
- Overwrite default behaviour; write to offline storage only with `Storage.OFFLINE` or online only with `Storage.ONLINE`.
overwrite
- Drop all data in the feature group before inserting new data. This does not affect metadata.
IOException
- Generic IO exception.
FeatureStoreException
- If client is not connected to Hopsworks; cannot run read query on storage and/or can't reconcile HUDI schema.
ParseException
- In case it's unable to parse HUDI commit date string to date type.
public void insert(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> featureData, boolean overwrite, Map<String,String> writeOptions) throws FeatureStoreException, IOException, ParseException
// get feature store handle
FeatureStore fs = HopsworksConnection.builder().build().getFeatureStore();
// get feature group handle
FeatureGroup fg = fs.getFeatureGroup("electricity_prices", 1);
// Define additional write options (this example applies to HUDI enabled FGs)
Map<String, String> writeOptions = new HashMap<String, String>() {{
put("hoodie.bulkinsert.shuffle.parallelism", "5");
put("hoodie.insert.shuffle.parallelism", "5");
put("hoodie.upsert.shuffle.parallelism", "5");
}};
// insert feature data and drop all data in the feature group before inserting new data
fg.insert(featureData, true, writeOptions);
insert
in class FeatureGroupBase<org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>>
featureData
- Spark DataFrame, RDD. Features to be saved.
overwrite
- Drop all data in the feature group before inserting new data. This does not affect metadata.
writeOptions
- Additional write options as key/value pairs.
IOException
- Generic IO exception.
FeatureStoreException
- If client is not connected to Hopsworks; cannot run read query on storage and/or can't reconcile HUDI schema.
ParseException
- In case it's unable to parse HUDI commit date string to date type.
public void insert(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> featureData, HudiOperationType operation) throws FeatureStoreException, IOException, ParseException
// get feature store handle
FeatureStore fs = HopsworksConnection.builder().build().getFeatureStore();
// get feature group handle
FeatureGroup fg = fs.getFeatureGroup("electricity_prices", 1);
// insert feature data
fg.insert(featureData, HudiOperationType.INSERT);
// upsert feature data
fg.insert(featureData, HudiOperationType.UPSERT);
insert
in class FeatureGroupBase<org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>>
featureData
- Spark DataFrame, RDD. Features to be saved.
operation
- Commit operation type, `INSERT` or `UPSERT`.
IOException
- Generic IO exception.
FeatureStoreException
- If client is not connected to Hopsworks; cannot run read query on storage and/or can't reconcile HUDI schema.
ParseException
- In case it's unable to parse HUDI commit date string to date type.
public void insert(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> featureData, Storage storage, boolean overwrite, HudiOperationType operation, Map<String,String> writeOptions) throws FeatureStoreException, IOException, ParseException
// get feature store handle
FeatureStore fs = HopsworksConnection.builder().build().getFeatureStore();
// get feature group handle
FeatureGroup fg = fs.getFeatureGroup("electricity_prices", 1);
// Define additional write options (this example applies to HUDI enabled FGs)
Map<String, String> writeOptions = new HashMap<String, String>() {{
put("hoodie.bulkinsert.shuffle.parallelism", "5");
put("hoodie.insert.shuffle.parallelism", "5");
put("hoodie.upsert.shuffle.parallelism", "5");
}};
// insert feature data in offline only with additional write options and drop all previous data before new
// data is inserted
fg.insert(featureData, Storage.OFFLINE, true, HudiOperationType.INSERT, writeOptions);
insert
in class FeatureGroupBase<org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>>
featureData
- Spark DataFrame, RDD. Features to be saved.
storage
- Overwrite default behaviour; write to offline storage only with `Storage.OFFLINE` or online only with `Storage.ONLINE`.
overwrite
- Drop all data in the feature group before inserting new data. This does not affect metadata.
operation
- Commit operation type, `INSERT` or `UPSERT`.
writeOptions
- Additional write options as key/value pairs.
IOException
- Generic IO exception.
FeatureStoreException
- If client is not connected to Hopsworks; cannot run read query on storage and/or can't reconcile HUDI schema.
ParseException
- In case it's unable to parse HUDI commit date string to date type.
public void insert(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> featureData, JobConfiguration jobConfiguration) throws FeatureStoreException, IOException, ParseException
insert
in class FeatureGroupBase<org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>>
FeatureStoreException
IOException
ParseException
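This overload has no example in the original text. A minimal sketch in the style of the other examples, assuming an established Hopsworks connection and that `JobConfiguration` can be built with its no-arg constructor (its resource settings are not shown here; consult the `JobConfiguration` class for the actual options):

```java
// get feature store handle
FeatureStore fs = HopsworksConnection.builder().build().getFeatureStore();
// get feature group handle
FeatureGroup fg = fs.getFeatureGroup("electricity_prices", 1);
// build a configuration for the ingestion job (no-arg constructor assumed)
JobConfiguration jobConfiguration = new JobConfiguration();
// insert feature data, running the materialization with the provided job configuration
fg.insert(featureData, jobConfiguration);
```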
public void insert(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> featureData, boolean overwrite, Map<String,String> writeOptions, JobConfiguration jobConfiguration) throws FeatureStoreException, IOException, ParseException
insert
in class FeatureGroupBase<org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>>
FeatureStoreException
IOException
ParseException
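This overload has no example in the original text. A minimal sketch in the style of the other examples, assuming an established Hopsworks connection and that `JobConfiguration` can be built with its no-arg constructor (a hypothetical default; consult the `JobConfiguration` class for the actual options):

```java
// get feature store handle
FeatureStore fs = HopsworksConnection.builder().build().getFeatureStore();
// get feature group handle
FeatureGroup fg = fs.getFeatureGroup("electricity_prices", 1);
// define additional write options (this example applies to HUDI enabled FGs)
Map<String, String> writeOptions = new HashMap<String, String>() {{
put("hoodie.upsert.shuffle.parallelism", "5");
}};
// build a configuration for the ingestion job (no-arg constructor assumed)
JobConfiguration jobConfiguration = new JobConfiguration();
// overwrite the feature group, passing write options and the job configuration
fg.insert(featureData, true, writeOptions, jobConfiguration);
```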
@Deprecated public org.apache.spark.sql.streaming.StreamingQuery insertStream(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> featureData) throws org.apache.spark.sql.streaming.StreamingQueryException, IOException, FeatureStoreException, TimeoutException, ParseException
insertStream
in class FeatureGroupBase<org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>>
featureData
- Spark DataFrame containing feature data.
org.apache.spark.sql.streaming.StreamingQueryException
- StreamingQueryException
IOException
- IOException
FeatureStoreException
- FeatureStoreException
TimeoutException
- TimeoutException
ParseException
- ParseException
@Deprecated public org.apache.spark.sql.streaming.StreamingQuery insertStream(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> featureData, String queryName) throws org.apache.spark.sql.streaming.StreamingQueryException, IOException, FeatureStoreException, TimeoutException, ParseException
insertStream
in class FeatureGroupBase<org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>>
org.apache.spark.sql.streaming.StreamingQueryException
IOException
FeatureStoreException
TimeoutException
ParseException
@Deprecated public org.apache.spark.sql.streaming.StreamingQuery insertStream(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> featureData, Map<String,String> writeOptions) throws FeatureStoreException, IOException, org.apache.spark.sql.streaming.StreamingQueryException, TimeoutException, ParseException
insertStream
in class FeatureGroupBase<org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>>
FeatureStoreException
IOException
org.apache.spark.sql.streaming.StreamingQueryException
TimeoutException
ParseException
public org.apache.spark.sql.streaming.StreamingQuery insertStream(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> featureData, String queryName, Map<String,String> writeOptions) throws FeatureStoreException, IOException, org.apache.spark.sql.streaming.StreamingQueryException, TimeoutException, ParseException
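A usage sketch for this non-deprecated streaming overload; `featureData` is assumed to be a streaming `Dataset<Row>` and `writeOptions` a previously built options map:

```java
// get feature store handle
FeatureStore fs = HopsworksConnection.builder().build().getFeatureStore();
// get feature group handle
FeatureGroup fg = fs.getFeatureGroup("electricity_prices", 1);
// start a streaming insert under a named query
org.apache.spark.sql.streaming.StreamingQuery query =
    fg.insertStream(featureData, "electricity_prices_query", writeOptions);
```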
insertStream
in class FeatureGroupBase<org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>>
FeatureStoreException
IOException
org.apache.spark.sql.streaming.StreamingQueryException
TimeoutException
ParseException
@Deprecated public org.apache.spark.sql.streaming.StreamingQuery insertStream(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> featureData, String queryName, String outputMode) throws org.apache.spark.sql.streaming.StreamingQueryException, IOException, FeatureStoreException, TimeoutException, ParseException
insertStream
in class FeatureGroupBase<org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>>
org.apache.spark.sql.streaming.StreamingQueryException
IOException
FeatureStoreException
TimeoutException
ParseException
@Deprecated public org.apache.spark.sql.streaming.StreamingQuery insertStream(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> featureData, String queryName, String outputMode, String checkpointLocation) throws FeatureStoreException, IOException, org.apache.spark.sql.streaming.StreamingQueryException, TimeoutException, ParseException
insertStream
in class FeatureGroupBase<org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>>
FeatureStoreException
IOException
org.apache.spark.sql.streaming.StreamingQueryException
TimeoutException
ParseException
@Deprecated public org.apache.spark.sql.streaming.StreamingQuery insertStream(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> featureData, String queryName, String outputMode, boolean awaitTermination, Long timeout) throws org.apache.spark.sql.streaming.StreamingQueryException, IOException, FeatureStoreException, TimeoutException, ParseException
insertStream
in class FeatureGroupBase<org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>>
featureData
- Spark dataframe containing feature data
queryName
- name of spark StreamingQuery
outputMode
- outputMode
awaitTermination
- whether or not to wait for query termination
timeout
- timeout
org.apache.spark.sql.streaming.StreamingQueryException
- StreamingQueryException
IOException
- IOException
FeatureStoreException
- FeatureStoreException
TimeoutException
- TimeoutException
ParseException
- ParseException
@Deprecated public org.apache.spark.sql.streaming.StreamingQuery insertStream(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> featureData, String queryName, String outputMode, boolean awaitTermination, Long timeout, String checkpointLocation) throws FeatureStoreException, IOException, org.apache.spark.sql.streaming.StreamingQueryException, TimeoutException, ParseException
insertStream
in class FeatureGroupBase<org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>>
FeatureStoreException
IOException
org.apache.spark.sql.streaming.StreamingQueryException
TimeoutException
ParseException
@Deprecated public org.apache.spark.sql.streaming.StreamingQuery insertStream(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> featureData, String queryName, String outputMode, boolean awaitTermination, String checkpointLocation) throws org.apache.spark.sql.streaming.StreamingQueryException, IOException, FeatureStoreException, TimeoutException, ParseException
insertStream
in class FeatureGroupBase<org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>>
featureData
- Spark dataframe containing feature data
queryName
- name of spark StreamingQuery
outputMode
- outputMode
awaitTermination
- whether or not to wait for query termination
checkpointLocation
- path to checkpoint location directory
org.apache.spark.sql.streaming.StreamingQueryException
- StreamingQueryException
IOException
- IOException
FeatureStoreException
- FeatureStoreException
TimeoutException
- TimeoutException
ParseException
- ParseException
@Deprecated public org.apache.spark.sql.streaming.StreamingQuery insertStream(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> featureData, String queryName, String outputMode, boolean awaitTermination, Long timeout, String checkpointLocation, Map<String,String> writeOptions) throws FeatureStoreException, IOException, org.apache.spark.sql.streaming.StreamingQueryException, TimeoutException, ParseException
insertStream
in class FeatureGroupBase<org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>>
FeatureStoreException
IOException
org.apache.spark.sql.streaming.StreamingQueryException
TimeoutException
ParseException
public Object insertStream(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> featureData, String queryName, String outputMode, boolean awaitTermination, Long timeout, String checkpointLocation, Map<String,String> writeOptions, JobConfiguration jobConfiguration) throws FeatureStoreException, IOException, org.apache.spark.sql.streaming.StreamingQueryException, TimeoutException, ParseException
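A sketch of the fully parameterized, non-deprecated variant; the query name, timeout and checkpoint path are illustrative, and `featureData`, `writeOptions` and `jobConfiguration` are assumed to exist in scope:

```java
// get feature store handle
FeatureStore fs = HopsworksConnection.builder().build().getFeatureStore();
// get feature group handle
FeatureGroup fg = fs.getFeatureGroup("electricity_prices", 1);
// start the stream in append mode, wait up to one hour for termination,
// and checkpoint to a custom location
Object query = fg.insertStream(featureData, "electricity_prices_query", "append",
    true, 3600000L, "/Projects/my_project/Resources/checkpoint", writeOptions, jobConfiguration);
```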
insertStream
in class FeatureGroupBase<org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>>
FeatureStoreException
IOException
org.apache.spark.sql.streaming.StreamingQueryException
TimeoutException
ParseException
public void commitDeleteRecord(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> featureData) throws FeatureStoreException, IOException, ParseException
// get feature store handle
FeatureStore fs = HopsworksConnection.builder().build().getFeatureStore();
// get feature group handle
FeatureGroup fg = fs.getFeatureGroup("electricity_prices", 1);
// Drops records of feature data and commit
fg.commitDeleteRecord(featureData);
commitDeleteRecord
in class FeatureGroupBase<org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>>
featureData
- Spark DataFrame, RDD. Feature data to be deleted.
FeatureStoreException
- If Client is not connected to Hopsworks and/or no commit information was found for
this feature group.
IOException
- Generic IO exception.
ParseException
- In case it's unable to parse HUDI commit date string to date type.
public void commitDeleteRecord(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> featureData, Map<String,String> writeOptions) throws FeatureStoreException, IOException, ParseException
// get feature store handle
FeatureStore fs = HopsworksConnection.builder().build().getFeatureStore();
// get feature group handle
FeatureGroup fg = fs.getFeatureGroup("electricity_prices", 1);
// Define additional write options (this example applies to HUDI enabled FGs)
Map<String, String> writeOptions = new HashMap<String, String>() {{
put("hoodie.bulkinsert.shuffle.parallelism", "5");
put("hoodie.insert.shuffle.parallelism", "5");
put("hoodie.upsert.shuffle.parallelism", "5");}
};
// Drops records of feature data and commit
fg.commitDeleteRecord(featureData, writeOptions);
commitDeleteRecord
in class FeatureGroupBase<org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>>
featureData
- Spark DataFrame, RDD. Feature data to be deleted.
writeOptions
- Additional write options as key-value pairs.
FeatureStoreException
- If Client is not connected to Hopsworks and/or no commit information was found for
this feature group.
IOException
- Generic IO exception.
ParseException
- In case it's unable to parse HUDI commit date string to date type.
public Map<Long,Map<String,String>> commitDetails() throws IOException, FeatureStoreException, ParseException
// get feature store handle
FeatureStore fs = HopsworksConnection.builder().build().getFeatureStore();
// get feature group handle
FeatureGroup fg = fs.getFeatureGroup("electricity_prices", 1);
// get commit timeline.
fg.commitDetails();
commitDetails
in class FeatureGroupBase<org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>>
FeatureStoreException
- If Client is not connected to Hopsworks and/or no commit information was found for
this feature group.
IOException
- Generic IO exception.
ParseException
- In case it's unable to parse HUDI commit date string to date type.
public Map<Long,Map<String,String>> commitDetails(Integer limit) throws IOException, FeatureStoreException, ParseException
// get feature store handle
FeatureStore fs = HopsworksConnection.builder().build().getFeatureStore();
// get feature group handle
FeatureGroup fg = fs.getFeatureGroup("electricity_prices", 1);
// get latest 10 commit details.
fg.commitDetails(10);
commitDetails
in class FeatureGroupBase<org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>>
limit
- number of commits to return.
FeatureStoreException
- If Client is not connected to Hopsworks and/or no commit information was found for
this feature group.
IOException
- Generic IO exception.
ParseException
- In case it's unable to parse HUDI commit date string to date type.
public Map<Long,Map<String,String>> commitDetails(String wallclockTime) throws IOException, FeatureStoreException, ParseException
// get feature store handle
FeatureStore fs = HopsworksConnection.builder().build().getFeatureStore();
// get feature group handle
FeatureGroup fg = fs.getFeatureGroup("electricity_prices", 1);
// get commit details as of 20230206
fg.commitDetails("20230206");
commitDetails
in class FeatureGroupBase<org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>>
wallclockTime
- Datetime string. The String should be formatted in one of the
following formats `yyyyMMdd`, `yyyyMMddHH`, `yyyyMMddHHmm`, or `yyyyMMddHHmmss`.
FeatureStoreException
- If Client is not connected to Hopsworks, unable to identify the format of the
provided wallclockTime and/or no commit information was found for
this feature group.
IOException
- Generic IO exception.
ParseException
- In case it's unable to parse HUDI commit date string to date type.
public Map<Long,Map<String,String>> commitDetails(String wallclockTime, Integer limit) throws IOException, FeatureStoreException, ParseException
// get feature store handle
FeatureStore fs = HopsworksConnection.builder().build().getFeatureStore();
// get feature group handle
FeatureGroup fg = fs.getFeatureGroup("electricity_prices", 1);
// get top 10 commit details as of 20230206
fg.commitDetails("20230206", 10);
commitDetails
in class FeatureGroupBase<org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>>
wallclockTime
- Datetime string. The String should be formatted in one of the
following formats `yyyyMMdd`, `yyyyMMddHH`, `yyyyMMddHHmm`, or `yyyyMMddHHmmss`.
limit
- number of commits to return.
FeatureStoreException
- If Client is not connected to Hopsworks, unable to identify the format of the
provided wallclockTime and/or no commit information was found for
this feature group.
IOException
- Generic IO exception.
ParseException
- In case it's unable to parse HUDI commit date string to date type.
public Query selectFeatures(List<Feature> features)
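A usage sketch; it assumes a `getFeature` accessor returning the `Feature` metadata object for a given name, and the feature names are illustrative:

```java
// get feature store handle
FeatureStore fs = HopsworksConnection.builder().build().getFeatureStore();
// get feature group handle
FeatureGroup fg = fs.getFeatureGroup("electricity_prices", 1);
// select a subset of features by their metadata objects
Query query = fg.selectFeatures(Arrays.asList(fg.getFeature("price"), fg.getFeature("date")));
```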
selectFeatures
in class FeatureGroupBase<org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>>
features
- List of Feature metadata objects.
public Query select(List<String> features)
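A usage sketch (feature names are illustrative):

```java
// get feature store handle
FeatureStore fs = HopsworksConnection.builder().build().getFeatureStore();
// get feature group handle
FeatureGroup fg = fs.getFeatureGroup("electricity_prices", 1);
// select features by name
Query query = fg.select(Arrays.asList("price", "date"));
```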
select
in class FeatureGroupBase<org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>>
features
- List of Feature names.
public Query selectAll()
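A usage sketch, following the other examples on this page:

```java
// get feature store handle
FeatureStore fs = HopsworksConnection.builder().build().getFeatureStore();
// get feature group handle
FeatureGroup fg = fs.getFeatureGroup("electricity_prices", 1);
// build a query over all features of the feature group
Query query = fg.selectAll();
```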
selectAll
in class FeatureGroupBase<org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>>
public Query selectExceptFeatures(List<Feature> features)
selectExceptFeatures
in class FeatureGroupBase<org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>>
features
- List of Feature metadata objects.
public Query selectExcept(List<String> features)
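A usage sketch (the excluded feature name is illustrative):

```java
// get feature store handle
FeatureStore fs = HopsworksConnection.builder().build().getFeatureStore();
// get feature group handle
FeatureGroup fg = fs.getFeatureGroup("electricity_prices", 1);
// select all features except the primary key
Query query = fg.selectExcept(Arrays.asList("id"));
```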
selectExcept
in class FeatureGroupBase<org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>>
features
- List of Feature names.
public void updateFeatures(List<Feature> features) throws FeatureStoreException, IOException, ParseException
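A hedged usage sketch; the modified `Feature` objects (`updatedFeatures`) are assumed to have been retrieved and edited beforehand:

```java
// get feature store handle
FeatureStore fs = HopsworksConnection.builder().build().getFeatureStore();
// get feature group handle
FeatureGroup fg = fs.getFeatureGroup("electricity_prices", 1);
// persist metadata changes (e.g. updated descriptions) for existing features
fg.updateFeatures(updatedFeatures);
```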
updateFeatures
in class FeatureGroupBase<org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>>
features
- List of Feature metadata objects.
FeatureStoreException
- If Client is not connected to Hopsworks, unable to identify date format and/or
no commit information was found for this feature group.
IOException
- Generic IO exception.
ParseException
- In case it's unable to parse date string to date type.
public void updateFeatures(Feature feature) throws FeatureStoreException, IOException, ParseException
updateFeatures
in class FeatureGroupBase<org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>>
feature
- Feature metadata object.
FeatureStoreException
- If Client is not connected to Hopsworks, unable to identify date format and/or
no commit information was found for this feature group.
IOException
- Generic IO exception.
ParseException
- In case it's unable to parse date string to date type.
public void appendFeatures(List<Feature> features) throws FeatureStoreException, IOException, ParseException
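A hedged usage sketch; the new `Feature` objects (`newFeatures`) are assumed to have been constructed beforehand:

```java
// get feature store handle
FeatureStore fs = HopsworksConnection.builder().build().getFeatureStore();
// get feature group handle
FeatureGroup fg = fs.getFeatureGroup("electricity_prices", 1);
// append the new features to the schema of the feature group
fg.appendFeatures(newFeatures);
```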
appendFeatures
in class FeatureGroupBase<org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>>
features
- List of Feature metadata objects.
FeatureStoreException
- If Client is not connected to Hopsworks, unable to identify date format and/or
no commit information was found for this feature group.
IOException
- Generic IO exception.
ParseException
- In case it's unable to parse date string to date type.
public void appendFeatures(Feature features) throws FeatureStoreException, IOException, ParseException
appendFeatures
in class FeatureGroupBase<org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>>
features
- Feature metadata object to append to the schema.
FeatureStoreException
- If Client is not connected to Hopsworks, unable to identify date format and/or
no commit information was found for this feature group.
IOException
- Generic IO exception.
ParseException
- In case it's unable to parse date string to date type.
public Statistics computeStatistics() throws FeatureStoreException, IOException
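A usage sketch, following the other examples on this page:

```java
// get feature store handle
FeatureStore fs = HopsworksConnection.builder().build().getFeatureStore();
// get feature group handle
FeatureGroup fg = fs.getFeatureGroup("electricity_prices", 1);
// compute statistics for the current state of the feature group
Statistics statistics = fg.computeStatistics();
```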
computeStatistics
in class FeatureGroupBase<org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>>
FeatureStoreException
- If Client is not connected to Hopsworks.
IOException
- Generic IO exception.
public Statistics computeStatistics(String wallclockTime) throws FeatureStoreException, IOException, ParseException
// get feature store handle
FeatureStore fs = HopsworksConnection.builder().build().getFeatureStore();
// get feature group handle
FeatureGroup fg = fs.getFeatureGroup("electricity_prices", 1);
// compute statistics as of 20230206
fg.computeStatistics("20230206");
computeStatistics
in class FeatureGroupBase<org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>>
wallclockTime
- Datetime string. The String should be formatted in one of the
following formats `yyyyMMdd`, `yyyyMMddHH`, `yyyyMMddHHmm`, or `yyyyMMddHHmmss`.
FeatureStoreException
- In case Client is not connected to Hopsworks, unable to identify the format of the
provided wallclockTime and/or no commit information was found for
this feature group.
IOException
- Generic IO exception.
ParseException
- In case it's unable to parse HUDI and/or statistics commit date string to date type.
public Statistics getStatistics() throws FeatureStoreException, IOException
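A usage sketch, following the other examples on this page:

```java
// get feature store handle
FeatureStore fs = HopsworksConnection.builder().build().getFeatureStore();
// get feature group handle
FeatureGroup fg = fs.getFeatureGroup("electricity_prices", 1);
// retrieve the most recently computed statistics of the feature group
Statistics statistics = fg.getStatistics();
```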
getStatistics
in class FeatureGroupBase<org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>>
FeatureStoreException
- In case Client is not connected to Hopsworks and/or no statistics were found for
this feature group.
IOException
- Generic IO exception.
Copyright © 2023. All rights reserved.