public class FeatureView extends FeatureViewBase<FeatureView,FeatureStore,Query,org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>>
Modifier and Type | Class and Description |
---|---|
static class |
FeatureView.FeatureViewBuilder |
description, extraFilterVersion, features, featureStore, id, labels, LOGGER, name, query, type, vectorServer, version
Constructor and Description |
---|
FeatureView(@NonNull String name,
Integer version,
@NonNull Query query,
String description,
@NonNull FeatureStore featureStore,
List<String> labels) |
Modifier and Type | Method and Description |
---|---|
void |
addTag(String name,
Object value)
Add name/value tag to the feature view.
|
void |
addTrainingDatasetTag(Integer version,
String name,
Object value)
Add name/value tag to the training dataset.
|
void |
clean(FeatureStore featureStore,
String featureViewName,
Integer featureViewVersion)
Delete the feature view and all associated metadata and training data.
|
Integer |
createTrainingData(String startTime,
String endTime,
String description,
DataFormat dataFormat)
Create the metadata for a training dataset and save the corresponding training data into `location`.
|
Integer |
createTrainingData(String startTime,
String endTime,
String description,
DataFormat dataFormat,
Boolean coalesce,
StorageConnector storageConnector,
String location,
Long seed,
StatisticsConfig statisticsConfig,
Map<String,String> writeOptions,
FilterLogic extraFilterLogic,
Filter extraFilter)
Create the metadata for a training dataset and save the corresponding training data into `location`.
|
Integer |
createTrainTestSplit(Float testSize,
String trainStart,
String trainEnd,
String testStart,
String testEnd,
String description,
DataFormat dataFormat)
Create the metadata for a training dataset and save the corresponding training data into `location`.
|
Integer |
createTrainTestSplit(Float testSize,
String trainStart,
String trainEnd,
String testStart,
String testEnd,
String description,
DataFormat dataFormat,
Boolean coalesce,
StorageConnector storageConnector,
String location,
Long seed,
StatisticsConfig statisticsConfig,
Map<String,String> writeOptions,
FilterLogic extraFilterLogic,
Filter extraFilter)
Create the metadata for a training dataset and save the corresponding training data into `location`.
|
Integer |
createTrainValidationTestSplit(Float validationSize,
Float testSize,
String trainStart,
String trainEnd,
String validationStart,
String validationEnd,
String testStart,
String testEnd,
String description,
DataFormat dataFormat)
Create the metadata for a training dataset and save the corresponding training data into `location`.
|
Integer |
createTrainValidationTestSplit(Float validationSize,
Float testSize,
String trainStart,
String trainEnd,
String validationStart,
String validationEnd,
String testStart,
String testEnd,
String description,
DataFormat dataFormat,
Boolean coalesce,
StorageConnector storageConnector,
String location,
Long seed,
StatisticsConfig statisticsConfig,
Map<String,String> writeOptions,
FilterLogic extraFilterLogic,
Filter extraFilter)
Create the metadata for a training dataset and save the corresponding training data into `location`.
|
void |
delete()
Delete current feature view, all associated metadata and training data.
|
void |
deleteAllTrainingDatasets()
Delete all training datasets.
|
void |
deleteTag(String name)
Delete a tag of the feature view.
|
void |
deleteTrainingDataset(Integer version)
Delete a training dataset.
|
void |
deleteTrainingDatasetTag(Integer version,
String name)
Delete a tag of the training dataset.
|
org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> |
getBatchData()
Get all data from the feature view as a batch from the offline feature store.
|
org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> |
getBatchData(String startTime,
String endTime)
Get a batch of data from an event time interval from the offline feature store.
|
org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> |
getBatchData(String startTime,
String endTime,
Map<String,String> readOptions)
Get a batch of data from an event time interval from the offline feature store.
|
String |
getBatchQuery()
Get a query string of the batch query.
|
String |
getBatchQuery(String startTime,
String endTime)
Get a query string of the batch query.
|
HashSet<String> |
getPrimaryKeys()
Get the set of primary key names that are used as keys in the input map for the `getServingVector` method.
|
Object |
getTag(String name)
Get a single tag value of the feature view.
|
Map<String,Object> |
getTags()
Get all tags of the feature view.
|
List<org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>> |
getTrainingData(Integer version)
Get training data created by `featureView.createTrainingData` or `featureView.trainingData`.
|
List<org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>> |
getTrainingData(Integer version,
Map<String,String> readOptions)
Get training data created by `featureView.createTrainingData` or `featureView.trainingData`.
|
Object |
getTrainingDatasetTag(Integer version,
String name)
Get a single tag value of the training dataset.
|
Map<String,Object> |
getTrainingDatasetTags(Integer version)
Get all tags of the training dataset.
|
List<org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>> |
getTrainTestSplit(Integer version)
Get training data created by `featureView.createTrainTestSplit` or `featureView.trainTestSplit`.
|
List<org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>> |
getTrainTestSplit(Integer version,
Map<String,String> readOptions)
Get training data created by `featureView.createTrainTestSplit` or `featureView.trainTestSplit`.
|
List<org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>> |
getTrainValidationTestSplit(Integer version)
Get training data created by `featureView.createTrainValidationTestSplit` or `featureView.trainValidationTestSplit`.
|
List<org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>> |
getTrainValidationTestSplit(Integer version,
Map<String,String> readOptions)
Get training data created by `featureView.createTrainValidationTestSplit` or `featureView.trainValidationTestSplit`.
|
void |
purgeAllTrainingData()
Delete all training datasets in this feature view (data only).
|
void |
purgeTrainingData(Integer version)
Delete a training dataset (data only).
|
void |
recreateTrainingDataset(Integer version,
Map<String,String> writeOptions)
Recreate a training dataset.
|
List<org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>> |
trainingData(String startTime,
String endTime,
String description)
Create the metadata for a training dataset and get the corresponding training data from the offline feature store.
|
List<org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>> |
trainingData(String startTime,
String endTime,
String description,
Long seed,
StatisticsConfig statisticsConfig,
Map<String,String> readOptions,
FilterLogic extraFilterLogic,
Filter extraFilter)
Create the metadata for a training dataset and get the corresponding training data from the offline feature store.
|
List<org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>> |
trainTestSplit(Float testSize,
String trainStart,
String trainEnd,
String testStart,
String testEnd,
String description)
Create the metadata for a training dataset and get the corresponding training data from the offline feature store.
|
List<org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>> |
trainTestSplit(Float testSize,
String trainStart,
String trainEnd,
String testStart,
String testEnd,
String description,
Long seed,
StatisticsConfig statisticsConfig,
Map<String,String> readOptions,
FilterLogic extraFilterLogic,
Filter extraFilter)
Create the metadata for a training dataset and get the corresponding training data from the offline feature store.
|
List<org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>> |
trainValidationTestSplit(Float validationSize,
Float testSize,
String trainStart,
String trainEnd,
String validationStart,
String validationEnd,
String testStart,
String testEnd,
String description)
Create the metadata for a training dataset and get the corresponding training data from the offline feature store.
|
List<org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>> |
trainValidationTestSplit(Float validationSize,
Float testSize,
String trainStart,
String trainEnd,
String validationStart,
String validationEnd,
String testStart,
String testEnd,
String description,
Long seed,
StatisticsConfig statisticsConfig,
Map<String,String> readOptions,
FilterLogic extraFilterLogic,
Filter extraFilter)
Create the metadata for a training dataset and get the corresponding training data from the offline feature store.
|
FeatureView |
update(FeatureView other)
Update the description of the feature view.
|
getFeatureVector, getFeatureVector, getFeatureVectors, getFeatureVectors, initBatchScoring, initServing, initServing, validateTrainTestSplit, validateTrainValidationTestSplit
public void delete() throws FeatureStoreException, IOException
// get feature store handle
FeatureStore fs = HopsworksConnection.builder().build().getFeatureStore();
// get feature view handle
FeatureView fv = fs.getFeatureView("fv_name", 1);
// delete feature view
fv.delete();
delete
in class FeatureViewBase<FeatureView,FeatureStore,Query,org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>>
FeatureStoreException
- In case client is not connected to Hopsworks.
IOException
- Generic IO exception.
public void clean(FeatureStore featureStore, String featureViewName, Integer featureViewVersion) throws FeatureStoreException, IOException
// get feature store handle
FeatureStore fs = HopsworksConnection.builder().build().getFeatureStore();
// get feature view handle
FeatureView fv = fs.getFeatureView("fv_name", 1);
// delete the feature view and all associated metadata and training data
fv.clean(fs, "fv_name", 1);
clean
in class FeatureViewBase<FeatureView,FeatureStore,Query,org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>>
featureStore
- Feature store metadata object.
featureViewName
- Name of feature view.
featureViewVersion
- Version of feature view.
FeatureStoreException
- In case client is not connected to Hopsworks.
IOException
- Generic IO exception.
public FeatureView update(FeatureView other) throws FeatureStoreException, IOException
// get feature store handle
FeatureStore fs = HopsworksConnection.builder().build().getFeatureStore();
// get feature view handle
FeatureView fv = fs.getFeatureView("fv_name", 1);
// update with new description
fv.setDescription("Updated description");
// update the feature view
fv.update(fv);
update
in class FeatureViewBase<FeatureView,FeatureStore,Query,org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>>
other
- Updated FeatureView metadata object.
FeatureStoreException
- In case client is not connected to Hopsworks.
IOException
- Generic IO exception.
public String getBatchQuery() throws FeatureStoreException, IOException, ParseException
// get feature store handle
FeatureStore fs = HopsworksConnection.builder().build().getFeatureStore();
// get feature view handle
FeatureView fv = fs.getFeatureView("fv_name", 1);
// get batch query
fv.getBatchQuery();
getBatchQuery
in class FeatureViewBase<FeatureView,FeatureStore,Query,org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>>
FeatureStoreException
- If client is not connected to Hopsworks and/or unable to identify
date formats.
IOException
- Generic IO exception.
ParseException
- In case it's unable to parse date strings to date types.
public String getBatchQuery(String startTime, String endTime) throws FeatureStoreException, IOException, ParseException
// get feature store handle
FeatureStore fs = HopsworksConnection.builder().build().getFeatureStore();
// get feature view handle
FeatureView fv = fs.getFeatureView("fv_name", 1);
// get batch query that will fetch data from Jan 1, 2023 to Jan 31, 2023
fv.getBatchQuery("20230101", "20230131");
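The compact date patterns accepted by `startTime`/`endTime` can be checked locally before calling the API. The following is an illustrative sketch using only `java.time`; the `isValidEventTime` helper is hypothetical and not part of the Hopsworks API:

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class DateFormatCheck {
    // The four compact patterns accepted by the FeatureView time-range parameters.
    static final String[] PATTERNS = {"yyyyMMdd", "yyyyMMddHH", "yyyyMMddHHmm", "yyyyMMddHHmmss"};

    // Returns true if the string fully matches one of the accepted patterns.
    static boolean isValidEventTime(String s) {
        for (String p : PATTERNS) {
            try {
                DateTimeFormatter f = DateTimeFormatter.ofPattern(p);
                if (p.equals("yyyyMMdd")) {
                    LocalDate.parse(s, f);      // date-only pattern
                } else {
                    LocalDateTime.parse(s, f);  // date-time patterns
                }
                return true;
            } catch (Exception e) {
                // not this pattern; try the next one
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isValidEventTime("20230101"));       // yyyyMMdd
        System.out.println(isValidEventTime("20220606235959")); // yyyyMMddHHmmss
        System.out.println(isValidEventTime("2023-01-01"));     // not an accepted pattern
    }
}
```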
getBatchQuery
in class FeatureViewBase<FeatureView,FeatureStore,Query,org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>>
startTime
- Datetime string. The String should be formatted in one of the following formats `yyyyMMdd`,
`yyyyMMddHH`, `yyyyMMddHHmm`, or `yyyyMMddHHmmss`.
endTime
- Datetime string. The String should be formatted in one of the following formats `yyyyMMdd`,
`yyyyMMddHH`, `yyyyMMddHHmm`, or `yyyyMMddHHmmss`.
FeatureStoreException
- If client is not connected to Hopsworks and/or unable to identify the format of the
provided `startTime`/`endTime` date strings.
IOException
- Generic IO exception.
ParseException
- In case it's unable to parse the provided `startTime`/`endTime` strings to date types.
public org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> getBatchData() throws FeatureStoreException, IOException, ParseException
// get feature store handle
FeatureStore fs = HopsworksConnection.builder().build().getFeatureStore();
// get feature view handle
FeatureView fv = fs.getFeatureView("fv_name", 1);
// get batch data
fv.getBatchData();
getBatchData
in class FeatureViewBase<FeatureView,FeatureStore,Query,org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>>
Dataset<Row>
Spark dataframe of batch data.
FeatureStoreException
- If client is not connected to Hopsworks.
IOException
- Generic IO exception.
ParseException
- In case it's unable to parse date strings to date types.
public org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> getBatchData(String startTime, String endTime) throws FeatureStoreException, IOException, ParseException
// get feature store handle
FeatureStore fs = HopsworksConnection.builder().build().getFeatureStore();
// get feature view handle
FeatureView fv = fs.getFeatureView("fv_name", 1);
// get batch data from Jan 1, 2023 to Jan 31, 2023
fv.getBatchData("20230101", "20230131");
getBatchData
in class FeatureViewBase<FeatureView,FeatureStore,Query,org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>>
startTime
- Datetime string. The String should be formatted in one of the following formats `yyyyMMdd`,
`yyyyMMddHH`, `yyyyMMddHHmm`, or `yyyyMMddHHmmss`.
endTime
- Datetime string. The String should be formatted in one of the following formats `yyyyMMdd`,
`yyyyMMddHH`, `yyyyMMddHHmm`, or `yyyyMMddHHmmss`.
Dataset<Row>
Spark dataframe of batch data.
FeatureStoreException
- If client is not connected to Hopsworks and/or unable to identify the format of the
provided `startTime`/`endTime` date strings.
IOException
- Generic IO exception.
ParseException
- In case it's unable to parse the provided `startTime`/`endTime` strings to date types.
public org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> getBatchData(String startTime, String endTime, Map<String,String> readOptions) throws FeatureStoreException, IOException, ParseException
// get feature store handle
FeatureStore fs = HopsworksConnection.builder().build().getFeatureStore();
// get feature view handle
FeatureView fv = fs.getFeatureView("fv_name", 1);
Map<String, String> readOptions = new HashMap<String, String>() {{
        put("header", "true");
        put("delimiter", ",");
}};
// get batch data from Jan 1, 2023 to Jan 31, 2023 with the provided read options
fv.getBatchData("20230101", "20230131", readOptions);
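The double-brace initializer used for `readOptions` above creates an anonymous `HashMap` subclass; a plain mutable map built with explicit `put` calls is equivalent and avoids the extra class. A minimal, Hopsworks-independent sketch (the `csvReadOptions` helper name is illustrative only):

```java
import java.util.HashMap;
import java.util.Map;

public class ReadOptionsSketch {
    // Build the same key/value read options without the double-brace idiom.
    static Map<String, String> csvReadOptions() {
        Map<String, String> opts = new HashMap<>();
        opts.put("header", "true");   // first row contains column names
        opts.put("delimiter", ",");   // field separator
        return opts;
    }

    public static void main(String[] args) {
        System.out.println(csvReadOptions());
    }
}
```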
getBatchData
in class FeatureViewBase<FeatureView,FeatureStore,Query,org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>>
startTime
- Datetime string. The String should be formatted in one of the following formats `yyyyMMdd`,
`yyyyMMddHH`, `yyyyMMddHHmm`, or `yyyyMMddHHmmss`.
endTime
- Datetime string. The String should be formatted in one of the following formats `yyyyMMdd`,
`yyyyMMddHH`, `yyyyMMddHHmm`, or `yyyyMMddHHmmss`.
readOptions
- Additional read options as key/value pairs.
Dataset<Row>
Spark dataframe of batch data.
FeatureStoreException
- If client is not connected to Hopsworks and/or unable to identify the format of the
provided `startTime`/`endTime` date strings.
IOException
- Generic IO exception.
ParseException
- In case it's unable to parse the provided `startTime`/`endTime` strings to date types.
public void addTag(String name, Object value) throws FeatureStoreException, IOException
// get feature store handle
FeatureStore fs = HopsworksConnection.builder().build().getFeatureStore();
// get feature view handle
FeatureView fv = fs.getFeatureView("fv_name", 1);
// attach a tag to a feature view
JSONObject value = ...;
fv.addTag("tag_schema", value);
addTag
in class FeatureViewBase<FeatureView,FeatureStore,Query,org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>>
name
- Name of the tag.
value
- Value of the tag. The value of a tag can be any valid JSON - primitives, arrays or JSON objects.
FeatureStoreException
- If client is not connected to Hopsworks.
IOException
- Generic IO exception.
public Map<String,Object> getTags() throws FeatureStoreException, IOException
// get feature store handle
FeatureStore fs = HopsworksConnection.builder().build().getFeatureStore();
// get feature view handle
FeatureView fv = fs.getFeatureView("fv_name", 1);
// get tags
fv.getTags();
getTags
in class FeatureViewBase<FeatureView,FeatureStore,Query,org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>>
Map<String, Object>
A map of tag names and values. The value of a tag can be any valid
JSON - primitives, arrays or JSON objects.
FeatureStoreException
- If client is not connected to Hopsworks.
IOException
- Generic IO exception.
public Object getTag(String name) throws FeatureStoreException, IOException
// get feature store handle
FeatureStore fs = HopsworksConnection.builder().build().getFeatureStore();
// get feature view handle
FeatureView fv = fs.getFeatureView("fv_name", 1);
// get tag
fv.getTag("tag_name");
getTag
in class FeatureViewBase<FeatureView,FeatureStore,Query,org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>>
name
- Name of the tag.
FeatureStoreException
- If client is not connected to Hopsworks.
IOException
- Generic IO exception.
public void deleteTag(String name) throws FeatureStoreException, IOException
// get feature store handle
FeatureStore fs = HopsworksConnection.builder().build().getFeatureStore();
// get feature view handle
FeatureView fv = fs.getFeatureView("fv_name", 1);
// delete tag
fv.deleteTag("tag_name");
deleteTag
in class FeatureViewBase<FeatureView,FeatureStore,Query,org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>>
name
- Name of the tag to be deleted.
FeatureStoreException
- If client is not connected to Hopsworks.
IOException
- Generic IO exception.
public Integer createTrainingData(String startTime, String endTime, String description, DataFormat dataFormat) throws IOException, FeatureStoreException, ParseException
// get feature store handle
FeatureStore fs = HopsworksConnection.builder().build().getFeatureStore();
// get feature view handle
FeatureView fv = fs.getFeatureView("fv_name", 1);
// create training dataset
String startTime = "20220101000000";
String endTime = "20220606235959";
String description = "demo training dataset";
fv.createTrainingData(startTime, endTime, description, DataFormat.CSV);
startTime
- Datetime string. The String should be formatted in one of the following formats `yyyyMMdd`,
`yyyyMMddHH`, `yyyyMMddHHmm`, or `yyyyMMddHHmmss`.
endTime
- Datetime string. The String should be formatted in one of the following formats `yyyyMMdd`,
`yyyyMMddHH`, `yyyyMMddHHmm`, or `yyyyMMddHHmmss`.
description
- A string describing the contents of the training dataset to improve discoverability for
Data Scientists.
dataFormat
- The data format used to save the training dataset.
FeatureStoreException
- If client is not connected to Hopsworks and/or unable to identify the format of the
provided `startTime`/`endTime` date strings.
IOException
- Generic IO exception.
ParseException
- In case it's unable to parse the provided `startTime`/`endTime` strings to date types.
public Integer createTrainingData(String startTime, String endTime, String description, DataFormat dataFormat, Boolean coalesce, StorageConnector storageConnector, String location, Long seed, StatisticsConfig statisticsConfig, Map<String,String> writeOptions, FilterLogic extraFilterLogic, Filter extraFilter) throws IOException, FeatureStoreException, ParseException
// get feature store handle
FeatureStore fs = HopsworksConnection.builder().build().getFeatureStore();
// get feature view handle
FeatureView fv = fs.getFeatureView("fv_name", 1);
// create training dataset
String startTime = "20220101000000";
String endTime = "20220606235959";
String description = "demo training dataset";
String location = "";
StatisticsConfig statisticsConfig = new StatisticsConfig(true, true, true, true);
fv.createTrainingData(startTime, endTime, description, DataFormat.CSV, true, null, location, null,
    statisticsConfig, null, null, null);
startTime
- Datetime string. The String should be formatted in one of the following formats `yyyyMMdd`,
`yyyyMMddHH`, `yyyyMMddHHmm`, or `yyyyMMddHHmmss`.
endTime
- Datetime string. The String should be formatted in one of the following formats `yyyyMMdd`,
`yyyyMMddHH`, `yyyyMMddHHmm`, or `yyyyMMddHHmmss`.
description
- A string describing the contents of the training dataset to improve discoverability for
Data Scientists.
dataFormat
- The data format used to save the training dataset.
coalesce
- If true the training dataset data will be coalesced into a single partition before writing.
The resulting training dataset will be a single file per split.
storageConnector
- Storage connector defining the sink location for the training dataset. If `null` is
provided, the training dataset is materialized on HopsFS.
location
- Path to complement the sink storage connector with, e.g. if the storage connector points to an
S3 bucket, this path can be used to define a sub-directory inside the bucket to place the training
dataset. If an empty string `""` is provided, the training dataset is saved at the root defined by the
storage connector.
seed
- Define a seed to create the random splits with, in order to guarantee reproducibility.
statisticsConfig
- A configuration object to enable descriptive statistics computation for
this feature group: `"correlations"` to turn on feature correlation computation,
`"histograms"` to compute feature value frequencies and `"exact_uniqueness"` to compute
uniqueness, distinctness and entropy. The values should be booleans indicating the
setting. To fully turn off statistics computation pass `statisticsConfig=null`.
writeOptions
- Additional write options as key-value pairs.
extraFilterLogic
- Additional filters (set of Filter objects) to be attached to the training dataset.
The filters will also be applied in `getBatchData`.
extraFilter
- Additional filter to be attached to the training dataset. The filter will also be applied
in `getBatchData`.
FeatureStoreException
- If client is not connected to Hopsworks and/or unable to identify the format of the
provided `startTime`/`endTime` date strings.
IOException
- Generic IO exception.
ParseException
- In case it's unable to parse the provided `startTime`/`endTime` strings to date types.
public Integer createTrainTestSplit(Float testSize, String trainStart, String trainEnd, String testStart, String testEnd, String description, DataFormat dataFormat) throws IOException, FeatureStoreException, ParseException
// get feature store handle
FeatureStore fs = HopsworksConnection.builder().build().getFeatureStore();
// get feature view handle
FeatureView fv = fs.getFeatureView("fv_name", 1);
// create training dataset based on time split
String trainStart = "20220101000000";
String trainEnd = "20220630235959";
String testStart = "20220701000000";
String testEnd = "20220830235959";
String description = "demo training dataset";
fv.createTrainTestSplit(null, trainStart, trainEnd, testStart, testEnd, description, DataFormat.CSV);
// or based on random split with 30% of the data in the test set
fv.createTrainTestSplit(0.3f, null, null, null, null, description, DataFormat.CSV);
testSize
- Size of test set.
trainStart
- Datetime string. The String should be formatted in one of the following formats `yyyyMMdd`,
`yyyyMMddHH`, `yyyyMMddHHmm`, or `yyyyMMddHHmmss`.
trainEnd
- Datetime string. The String should be formatted in one of the following formats `yyyyMMdd`,
`yyyyMMddHH`, `yyyyMMddHHmm`, or `yyyyMMddHHmmss`.
testStart
- Datetime string. The String should be formatted in one of the following formats `yyyyMMdd`,
`yyyyMMddHH`, `yyyyMMddHHmm`, or `yyyyMMddHHmmss`.
testEnd
- Datetime string. The String should be formatted in one of the following formats `yyyyMMdd`,
`yyyyMMddHH`, `yyyyMMddHHmm`, or `yyyyMMddHHmmss`.
description
- A string describing the contents of the training dataset to improve discoverability for
Data Scientists.
dataFormat
- The data format used to save the training dataset.
FeatureStoreException
- If client is not connected to Hopsworks and/or unable to identify the format of the
provided date strings.
IOException
- Generic IO exception.
ParseException
- In case it's unable to parse the provided date strings to date types.
public Integer createTrainTestSplit(Float testSize, String trainStart, String trainEnd, String testStart, String testEnd, String description, DataFormat dataFormat, Boolean coalesce, StorageConnector storageConnector, String location, Long seed, StatisticsConfig statisticsConfig, Map<String,String> writeOptions, FilterLogic extraFilterLogic, Filter extraFilter) throws IOException, FeatureStoreException, ParseException
// get feature store handle
FeatureStore fs = HopsworksConnection.builder().build().getFeatureStore();
// get feature view handle
FeatureView fv = fs.getFeatureView("fv_name", 1);
// create training dataset based on time split
String trainStart = "20220101000000";
String trainEnd = "20220630235959";
String testStart = "20220701000000";
String testEnd = "20220830235959";
String description = "demo training dataset";
StorageConnector.S3Connector storageConnector = fs.getS3Connector("s3Connector");
String location = "";
Long seed = 1234L;
Boolean coalesce = true;
StatisticsConfig statisticsConfig = new StatisticsConfig(true, true, true, true);
Map<String, String> writeOptions = new HashMap<String, String>() {{
        put("header", "true");
        put("delimiter", ",");
}};
// define extra filters
Filter leftFtFilter = new Filter();
leftFtFilter.setFeature(new Feature("left_ft_name"));
leftFtFilter.setValue("400");
leftFtFilter.setCondition(SqlFilterCondition.EQUALS);
Filter rightFtFilter = new Filter();
rightFtFilter.setFeature(new Feature("right_ft_name"));
rightFtFilter.setValue("50");
rightFtFilter.setCondition(SqlFilterCondition.EQUALS);
FilterLogic extraFilterLogic = new FilterLogic(SqlFilterLogic.AND, leftFtFilter, rightFtFilter);
Filter extraFilter = new Filter();
extraFilter.setFeature(new Feature("ft_name"));
extraFilter.setValue("100");
extraFilter.setCondition(SqlFilterCondition.GREATER_THAN);
// create training data
fv.createTrainTestSplit(null, trainStart, trainEnd, testStart,
    testEnd, description, DataFormat.CSV, coalesce, storageConnector, location, seed, statisticsConfig,
    writeOptions, extraFilterLogic, extraFilter);
// or based on random split with 20% of the data in the test set
fv.createTrainTestSplit(0.2f, null, null, null, null, description, DataFormat.CSV, coalesce,
    storageConnector, location, seed, statisticsConfig, writeOptions, extraFilterLogic, extraFilter);
testSize
- Size of test set.
trainStart
- Datetime string. The String should be formatted in one of the following formats `yyyyMMdd`,
`yyyyMMddHH`, `yyyyMMddHHmm`, or `yyyyMMddHHmmss`.
trainEnd
- Datetime string. The String should be formatted in one of the following formats `yyyyMMdd`,
`yyyyMMddHH`, `yyyyMMddHHmm`, or `yyyyMMddHHmmss`.
testStart
- Datetime string. The String should be formatted in one of the following formats `yyyyMMdd`,
`yyyyMMddHH`, `yyyyMMddHHmm`, or `yyyyMMddHHmmss`.
testEnd
- Datetime string. The String should be formatted in one of the following formats `yyyyMMdd`,
`yyyyMMddHH`, `yyyyMMddHHmm`, or `yyyyMMddHHmmss`.
description
- A string describing the contents of the training dataset to improve discoverability for
Data Scientists.
dataFormat
- The data format used to save the training dataset.
coalesce
- If true the training dataset data will be coalesced into a single partition before writing.
The resulting training dataset will be a single file per split.
storageConnector
- Storage connector defining the sink location for the training dataset. If `null` is
provided, the training dataset is materialized on HopsFS.
location
- Path to complement the sink storage connector with, e.g. if the storage connector points to an
S3 bucket, this path can be used to define a sub-directory inside the bucket to place the training
dataset. If an empty string `""` is provided, the training dataset is saved at the root defined by the
storage connector.
seed
- Define a seed to create the random splits with, in order to guarantee reproducibility.
statisticsConfig
- A configuration object to enable descriptive statistics computation for
this feature group: `"correlations"` to turn on feature correlation computation,
`"histograms"` to compute feature value frequencies and `"exact_uniqueness"` to compute
uniqueness, distinctness and entropy. The values should be booleans indicating the
setting. To fully turn off statistics computation pass `statisticsConfig=null`.
writeOptions
- Additional write options as key-value pairs.
extraFilterLogic
- Additional filters (set of Filter objects) to be attached to the training dataset.
The filters will also be applied in `getBatchData`.
extraFilter
- Additional filter to be attached to the training dataset. The filter will also be applied
in `getBatchData`.
FeatureStoreException
- If client is not connected to Hopsworks and/or unable to identify the format of the
provided date strings.
IOException
- Generic IO exception.
ParseException
- In case it's unable to parse the provided date strings to date types.
public Integer createTrainValidationTestSplit(Float validationSize, Float testSize, String trainStart, String trainEnd, String validationStart, String validationEnd, String testStart, String testEnd, String description, DataFormat dataFormat) throws IOException, FeatureStoreException, ParseException
// get feature store handle
FeatureStore fs = HopsworksConnection.builder().build().getFeatureStore();
// get feature view handle
FeatureView fv = fs.getFeatureView("fv_name", 1);
// create training dataset based on time split
String trainStart = "20220101000000";
String trainEnd = "20220630235959";
String validationStart = "20220701000000";
String validationEnd = "20220830235959";
String testStart = "20220901000000";
String testEnd = "20220930235959";
String description = "demo training dataset";
fv.createTrainValidationTestSplit(null, null, trainStart, trainEnd, validationStart, validationEnd, testStart,
    testEnd, description, DataFormat.CSV);
// or based on random split with 20% validation and 10% test data
fv.createTrainValidationTestSplit(0.2f, 0.1f, null, null, null, null, null, null, description, DataFormat.CSV);
validationSize
- Size of validation set.
testSize
- Size of test set.
trainStart
- Datetime string. The String should be formatted in one of the following formats `yyyyMMdd`,
`yyyyMMddHH`, `yyyyMMddHHmm`, or `yyyyMMddHHmmss`.
trainEnd
- Datetime string. The String should be formatted in one of the following formats `yyyyMMdd`,
`yyyyMMddHH`, `yyyyMMddHHmm`, or `yyyyMMddHHmmss`.
validationStart
- Datetime string. The String should be formatted in one of the following formats `yyyyMMdd`,
`yyyyMMddHH`, `yyyyMMddHHmm`, or `yyyyMMddHHmmss`.
validationEnd
- Datetime string. The String should be formatted in one of the following formats `yyyyMMdd`,
`yyyyMMddHH`, `yyyyMMddHHmm`, or `yyyyMMddHHmmss`.
testStart
- Datetime string. The String should be formatted in one of the following formats `yyyyMMdd`,
`yyyyMMddHH`, `yyyyMMddHHmm`, or `yyyyMMddHHmmss`.
testEnd
- Datetime string. The String should be formatted in one of the following formats `yyyyMMdd`,
`yyyyMMddHH`, `yyyyMMddHHmm`, or `yyyyMMddHHmmss`.
description
- A string describing the contents of the training dataset to improve discoverability for
Data Scientists.
dataFormat
- The data format used to save the training dataset.
FeatureStoreException
- If client is not connected to Hopsworks and/or unable to identify the format of the
provided date strings.
IOException
- Generic IO exception.
ParseException
- In case it's unable to parse the provided date strings to date types.
public Integer createTrainValidationTestSplit(Float validationSize, Float testSize, String trainStart, String trainEnd, String validationStart, String validationEnd, String testStart, String testEnd, String description, DataFormat dataFormat, Boolean coalesce, StorageConnector storageConnector, String location, Long seed, StatisticsConfig statisticsConfig, Map<String,String> writeOptions, FilterLogic extraFilterLogic, Filter extraFilter) throws IOException, FeatureStoreException, ParseException
// get feature store handle
FeatureStore fs = HopsworksConnection.builder().build().getFeatureStore();
// get feature view handle
FeatureView fv = fs.getFeatureView("fv_name", 1);
// create training dataset based on time split
String trainStart = "20220101000000";
String trainEnd = "20220630235959";
String validationStart = "20220701000000";
String validationEnd = "20220830235959";
String testStart = "20220901000000";
String testEnd = "20220930235959";
String description = "demo training dataset";
StorageConnector.S3Connector storageConnector = fs.getS3Connector("s3Connector");
String location = "";
Long seed = 1234L;
Boolean coalesce = true;
StatisticsConfig statisticsConfig = new StatisticsConfig(true, true, true, true);
Map<String, String> writeOptions = new HashMap<String, String>() {{
put("header", "true");
    put("delimiter", ",");
}};
// define extra filters
Filter leftFtFilter = new Filter();
leftFtFilter.setFeature(new Feature("left_ft_name"));
leftFtFilter.setValue("400");
leftFtFilter.setCondition(SqlFilterCondition.EQUALS);
Filter rightFtFilter = new Filter();
rightFtFilter.setFeature(new Feature("right_ft_name"));
rightFtFilter.setValue("50");
rightFtFilter.setCondition(SqlFilterCondition.EQUALS);
FilterLogic extraFilterLogic = new FilterLogic(SqlFilterLogic.AND, leftFtFilter, rightFtFilter);
Filter extraFilter = new Filter();
extraFilter.setFeature(new Feature("ft_name"));
extraFilter.setValue("100");
extraFilter.setCondition(SqlFilterCondition.GREATER_THAN);
// create training data
fv.createTrainValidationTestSplit(null, null, trainStart, trainEnd, validationStart, validationEnd, testStart,
    testEnd, description, DataFormat.CSV, coalesce, storageConnector, location, seed, statisticsConfig,
    writeOptions, extraFilterLogic, extraFilter);
// or based on random split
fv.createTrainValidationTestSplit(20f, 10f, null, null, null, null, null, null, description, DataFormat.CSV,
    coalesce, storageConnector, location, seed, statisticsConfig, writeOptions, extraFilterLogic, extraFilter);
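A side note on the `writeOptions` map: the anonymous-subclass ("double brace") `HashMap` idiom used in these examples works, but on Java 9+ the same key-value options can be built more simply and immutably with `Map.of` (a general Java note, not Hopsworks-specific):

```java
import java.util.Map;

public class WriteOptionsDemo {
    public static void main(String[] args) {
        // Immutable map of CSV write options, equivalent to the double-brace HashMap idiom.
        Map<String, String> writeOptions = Map.of(
            "header", "true",
            "delimiter", ","
        );
        System.out.println(writeOptions.get("header"));    // prints "true"
        System.out.println(writeOptions.get("delimiter")); // prints ","
    }
}
```

`Map.of` also avoids the hidden anonymous subclass the double-brace idiom creates, which can leak a reference to the enclosing instance.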
validationSize
- Size of validation set.
testSize
- Size of test set.
trainStart
- Datetime string. The String should be formatted in one of the following formats `yyyyMMdd`, `yyyyMMddHH`, `yyyyMMddHHmm`, or `yyyyMMddHHmmss`.
trainEnd
- Datetime string. The String should be formatted in one of the following formats `yyyyMMdd`, `yyyyMMddHH`, `yyyyMMddHHmm`, or `yyyyMMddHHmmss`.
validationStart
- Datetime string. The String should be formatted in one of the following formats `yyyyMMdd`, `yyyyMMddHH`, `yyyyMMddHHmm`, or `yyyyMMddHHmmss`.
validationEnd
- Datetime string. The String should be formatted in one of the following formats `yyyyMMdd`, `yyyyMMddHH`, `yyyyMMddHHmm`, or `yyyyMMddHHmmss`.
testStart
- Datetime string. The String should be formatted in one of the following formats `yyyyMMdd`, `yyyyMMddHH`, `yyyyMMddHHmm`, or `yyyyMMddHHmmss`.
testEnd
- Datetime string. The String should be formatted in one of the following formats `yyyyMMdd`, `yyyyMMddHH`, `yyyyMMddHHmm`, or `yyyyMMddHHmmss`.
description
- A string describing the contents of the training dataset to improve discoverability for Data Scientists.
dataFormat
- The data format used to save the training dataset.
coalesce
- If true, the training dataset data will be coalesced into a single partition before writing. The resulting training dataset will be a single file per split.
storageConnector
- Storage connector defining the sink location for the training dataset. If `null` is provided, the training dataset is materialized on HopsFS.
location
- Path to complement the sink storage connector with, e.g. if the storage connector points to an S3 bucket, this path can be used to define a sub-directory inside the bucket to place the training dataset. If an empty string `""` is provided, the training dataset is saved at the root defined by the storage connector.
seed
- Define a seed to create the random splits with, in order to guarantee reproducibility.
statisticsConfig
- A configuration object to generally enable descriptive statistics computation for this feature group: `"correlations"` to turn on feature correlation computation, `"histograms"` to compute feature value frequencies, and `"exact_uniqueness"` to compute uniqueness, distinctness and entropy. The values should be booleans indicating the setting. To fully turn off statistics computation pass `statisticsConfig=null`.
writeOptions
- Additional write options as key-value pairs.
extraFilterLogic
- Additional filters (set of Filter objects) to be attached to the training dataset. The filters will also be applied in `getBatchData`.
extraFilter
- Additional filter to be attached to the training dataset. The filter will also be applied in `getBatchData`.
FeatureStoreException
- If Client is not connected to Hopsworks and/or unable to identify the format of the provided date strings.
IOException
- Generic IO exception.
ParseException
- If the provided date strings cannot be parsed into date types.

public void recreateTrainingDataset(Integer version, Map<String,String> writeOptions) throws FeatureStoreException, IOException
// get feature store handle
FeatureStore fs = HopsworksConnection.builder().build().getFeatureStore();
// get feature view handle
FeatureView fv = fs.getFeatureView("fv_name", 1);
// define write options
Map<String, String> writeOptions = new HashMap<String, String>() {{
put("header", "true");
    put("delimiter", ",");
}};
//recreate training data
fv.recreateTrainingDataset(1, writeOptions);
version
- Training dataset version.
writeOptions
- Additional write options as key-value pairs.
FeatureStoreException
- If Client is not connected to Hopsworks.
IOException
- Generic IO exception.

public List<org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>> getTrainingData(Integer version) throws IOException, FeatureStoreException, ParseException
// get feature store handle
FeatureStore fs = HopsworksConnection.builder().build().getFeatureStore();
// get feature view handle
FeatureView fv = fs.getFeatureView("fv_name", 1);
// get training data
fv.getTrainingData(1);
version
- Training dataset version.
List<Dataset<Row>>
List of DataFrames of features and labels.
FeatureStoreException
- If Client is not connected to Hopsworks and/or unable to identify date formats.
IOException
- Generic IO exception.
ParseException
- If the provided date strings cannot be parsed into date types.

public List<org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>> getTrainingData(Integer version, Map<String,String> readOptions) throws IOException, FeatureStoreException, ParseException
// get feature store handle
FeatureStore fs = HopsworksConnection.builder().build().getFeatureStore();
// get feature view handle
FeatureView fv = fs.getFeatureView("fv_name", 1);
// define read options
Map<String, String> readOptions = new HashMap<String, String>() {{
put("header", "true");
    put("delimiter", ",");
}};
// get training data
fv.getTrainingData(1, readOptions);
getTrainingData
in class FeatureViewBase<FeatureView,FeatureStore,Query,org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>>
version
- Training dataset version.
readOptions
- Additional read options as key/value pairs.
List<Dataset<Row>>
List of DataFrames of features and labels.
FeatureStoreException
- If Client is not connected to Hopsworks and/or unable to identify date formats.
IOException
- Generic IO exception.
ParseException
- If the provided date strings cannot be parsed into date types.

public List<org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>> getTrainTestSplit(Integer version) throws IOException, FeatureStoreException, ParseException
// get feature store handle
FeatureStore fs = HopsworksConnection.builder().build().getFeatureStore();
// get feature view handle
FeatureView fv = fs.getFeatureView("fv_name", 1);
// get train test split dataframe of features and labels
fv.getTrainTestSplit(1);
version
- Training dataset version.
List<Dataset<Row>>
List of DataFrames of features and labels.
FeatureStoreException
- If Client is not connected to Hopsworks and/or unable to identify date formats.
IOException
- Generic IO exception.
ParseException
- If the provided date strings cannot be parsed into date types.

public List<org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>> getTrainTestSplit(Integer version, Map<String,String> readOptions) throws IOException, FeatureStoreException, ParseException
// get feature store handle
FeatureStore fs = HopsworksConnection.builder().build().getFeatureStore();
// get feature view handle
FeatureView fv = fs.getFeatureView("fv_name", 1);
// define additional readOptions
Map<String, String> readOptions = new HashMap<String, String>() {{
put("header", "true");
    put("delimiter", ",");
}};
// get train test split dataframe of features and labels
fv.getTrainTestSplit(1, readOptions);
getTrainTestSplit
in class FeatureViewBase<FeatureView,FeatureStore,Query,org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>>
version
- Training dataset version.
readOptions
- Additional read options as key/value pairs.
List<Dataset<Row>>
List of DataFrames of features and labels.
FeatureStoreException
- If Client is not connected to Hopsworks and/or unable to identify date formats.
IOException
- Generic IO exception.
ParseException
- If the provided date strings cannot be parsed into date types.

public List<org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>> getTrainValidationTestSplit(Integer version) throws IOException, FeatureStoreException, ParseException
// get feature store handle
FeatureStore fs = HopsworksConnection.builder().build().getFeatureStore();
// get feature view handle
FeatureView fv = fs.getFeatureView("fv_name", 1);
// get train, validation, test split dataframe of features and labels
fv.getTrainValidationTestSplit(1);
version
- Training dataset version.
List<Dataset<Row>>
List of DataFrames of features and labels.
FeatureStoreException
- If Client is not connected to Hopsworks and/or unable to identify date formats.
IOException
- Generic IO exception.
ParseException
- If the provided date strings cannot be parsed into date types.

public List<org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>> getTrainValidationTestSplit(Integer version, Map<String,String> readOptions) throws IOException, FeatureStoreException, ParseException
// get feature store handle
FeatureStore fs = HopsworksConnection.builder().build().getFeatureStore();
// get feature view handle
FeatureView fv = fs.getFeatureView("fv_name", 1);
// define additional readOptions
Map<String, String> readOptions = new HashMap<String, String>() {{
put("header", "true");
    put("delimiter", ",");
}};
// get train, validation, test split dataframe of features and labels
fv.getTrainValidationTestSplit(1, readOptions);
getTrainValidationTestSplit
in class FeatureViewBase<FeatureView,FeatureStore,Query,org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>>
version
- Training dataset version.
readOptions
- Additional read options as key/value pairs.
List<Dataset<Row>>
List of DataFrames of features and labels.
FeatureStoreException
- If Client is not connected to Hopsworks and/or unable to identify date formats.
IOException
- Generic IO exception.
ParseException
- If the provided date strings cannot be parsed into date types.

public List<org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>> trainingData(String startTime, String endTime, String description) throws IOException, FeatureStoreException, ParseException
// get feature store handle
FeatureStore fs = HopsworksConnection.builder().build().getFeatureStore();
// get feature view handle
FeatureView fv = fs.getFeatureView("fv_name", 1);
// create training dataset based on time split
String startTime = "20220101000000";
String endTime = "20220630235959";
String description = "demo training dataset";
fv.trainingData(startTime, endTime, description);
startTime
- Datetime string. The String should be formatted in one of the following formats `yyyyMMdd`, `yyyyMMddHH`, `yyyyMMddHHmm`, or `yyyyMMddHHmmss`.
endTime
- Datetime string. The String should be formatted in one of the following formats `yyyyMMdd`, `yyyyMMddHH`, `yyyyMMddHHmm`, or `yyyyMMddHHmmss`.
description
- A string describing the contents of the training dataset to improve discoverability for Data Scientists.
List<Dataset<Row>>
List of DataFrames of features and labels.
FeatureStoreException
- If Client is not connected to Hopsworks and/or unable to identify the format of the provided date strings.
IOException
- Generic IO exception.
ParseException
- If the provided date strings cannot be parsed into date types.

public List<org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>> trainingData(String startTime, String endTime, String description, Long seed, StatisticsConfig statisticsConfig, Map<String,String> readOptions, FilterLogic extraFilterLogic, Filter extraFilter) throws IOException, FeatureStoreException, ParseException
// get feature store handle
FeatureStore fs = HopsworksConnection.builder().build().getFeatureStore();
// get feature view handle
FeatureView fv = fs.getFeatureView("fv_name", 1);
// create training dataset based on time split
String startTime = "20220101000000";
String endTime = "20220630235959";
String description = "demo training dataset";
Long seed = 1234L;
StatisticsConfig statisticsConfig = new StatisticsConfig(true, true, true, true);
Map<String, String> readOptions = new HashMap<String, String>() {{
    put("header", "true");
    put("delimiter", ",");
}};
// define extra filters
Filter leftFtFilter = new Filter();
leftFtFilter.setFeature(new Feature("left_ft_name"));
leftFtFilter.setValue("400");
leftFtFilter.setCondition(SqlFilterCondition.EQUALS);
Filter rightFtFilter = new Filter();
rightFtFilter.setFeature(new Feature("right_ft_name"));
rightFtFilter.setValue("50");
rightFtFilter.setCondition(SqlFilterCondition.EQUALS);
FilterLogic extraFilterLogic = new FilterLogic(SqlFilterLogic.AND, leftFtFilter, rightFtFilter);
Filter extraFilter = new Filter();
extraFilter.setFeature(new Feature("ft_name"));
extraFilter.setValue("100");
extraFilter.setCondition(SqlFilterCondition.GREATER_THAN);
// create training data
fv.trainingData(startTime, endTime, description, seed, statisticsConfig, readOptions,
    extraFilterLogic, extraFilter);
startTime
- Datetime string. The String should be formatted in one of the following formats `yyyyMMdd`, `yyyyMMddHH`, `yyyyMMddHHmm`, or `yyyyMMddHHmmss`.
endTime
- Datetime string. The String should be formatted in one of the following formats `yyyyMMdd`, `yyyyMMddHH`, `yyyyMMddHHmm`, or `yyyyMMddHHmmss`.
description
- A string describing the contents of the training dataset to improve discoverability for Data Scientists.
seed
- Define a seed to create the random splits with, in order to guarantee reproducibility.
statisticsConfig
- A configuration object to generally enable descriptive statistics computation for this feature group: `"correlations"` to turn on feature correlation computation, `"histograms"` to compute feature value frequencies, and `"exact_uniqueness"` to compute uniqueness, distinctness and entropy. The values should be booleans indicating the setting. To fully turn off statistics computation pass `statisticsConfig=null`.
readOptions
- Additional read options as key/value pairs.
extraFilterLogic
- Additional filters (set of Filter objects) to be attached to the training dataset. The filters will also be applied in `getBatchData`.
extraFilter
- Additional filter to be attached to the training dataset. The filter will also be applied in `getBatchData`.
List<Dataset<Row>>
List of DataFrames of features and labels.
FeatureStoreException
- If Client is not connected to Hopsworks and/or unable to identify the format of the provided date strings.
IOException
- Generic IO exception.
ParseException
- If the provided date strings cannot be parsed into date types.

public List<org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>> trainTestSplit(Float testSize, String trainStart, String trainEnd, String testStart, String testEnd, String description) throws IOException, FeatureStoreException, ParseException
// get feature store handle
FeatureStore fs = HopsworksConnection.builder().build().getFeatureStore();
// get feature view handle
FeatureView fv = fs.getFeatureView("fv_name", 1);
// create training dataset based on time split
String trainStart = "20220101000000";
String trainEnd = "20220630235959";
String testStart = "20220701000000";
String testEnd = "20220830235959";
String description = "demo training dataset";
// create training data
fv.trainTestSplit(null, trainStart, trainEnd, testStart, testEnd, description);
// or random split
fv.trainTestSplit(30f, null, null, null, null, description);
testSize
- Size of test set.
trainStart
- Datetime string. The String should be formatted in one of the following formats `yyyyMMdd`, `yyyyMMddHH`, `yyyyMMddHHmm`, or `yyyyMMddHHmmss`.
trainEnd
- Datetime string. The String should be formatted in one of the following formats `yyyyMMdd`, `yyyyMMddHH`, `yyyyMMddHHmm`, or `yyyyMMddHHmmss`.
testStart
- Datetime string. The String should be formatted in one of the following formats `yyyyMMdd`, `yyyyMMddHH`, `yyyyMMddHHmm`, or `yyyyMMddHHmmss`.
testEnd
- Datetime string. The String should be formatted in one of the following formats `yyyyMMdd`, `yyyyMMddHH`, `yyyyMMddHHmm`, or `yyyyMMddHHmmss`.
description
- A string describing the contents of the training dataset to improve discoverability for Data Scientists.
List<Dataset<Row>>
List of Spark DataFrames containing training dataset splits.
FeatureStoreException
- If Client is not connected to Hopsworks and/or unable to identify the format of the provided date strings.
IOException
- Generic IO exception.
ParseException
- If the provided date strings cannot be parsed into date types.

public List<org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>> trainTestSplit(Float testSize, String trainStart, String trainEnd, String testStart, String testEnd, String description, Long seed, StatisticsConfig statisticsConfig, Map<String,String> readOptions, FilterLogic extraFilterLogic, Filter extraFilter) throws IOException, FeatureStoreException, ParseException
// get feature store handle
FeatureStore fs = HopsworksConnection.builder().build().getFeatureStore();
// get feature view handle
FeatureView fv = fs.getFeatureView("fv_name", 1);
// create training dataset based on time split
String trainStart = "20220101000000";
String trainEnd = "20220630235959";
String testStart = "20220701000000";
String testEnd = "20220830235959";
String description = "demo training dataset";
Long seed = 1234L;
StatisticsConfig statisticsConfig = new StatisticsConfig(true, true, true, true);
Map<String, String> readOptions = new HashMap<String, String>() {{
    put("header", "true");
    put("delimiter", ",");
}};
// define extra filters
Filter leftFtFilter = new Filter();
leftFtFilter.setFeature(new Feature("left_ft_name"));
leftFtFilter.setValue("400");
leftFtFilter.setCondition(SqlFilterCondition.EQUALS);
Filter rightFtFilter = new Filter();
rightFtFilter.setFeature(new Feature("right_ft_name"));
rightFtFilter.setValue("50");
rightFtFilter.setCondition(SqlFilterCondition.EQUALS);
FilterLogic extraFilterLogic = new FilterLogic(SqlFilterLogic.AND, leftFtFilter, rightFtFilter);
Filter extraFilter = new Filter();
extraFilter.setFeature(new Feature("ft_name"));
extraFilter.setValue("100");
extraFilter.setCondition(SqlFilterCondition.GREATER_THAN);
// create training data
fv.trainTestSplit(null, trainStart, trainEnd, testStart, testEnd, description, seed, statisticsConfig,
    readOptions, extraFilterLogic, extraFilter);
// or random split
fv.trainTestSplit(30f, null, null, null, null, description, seed, statisticsConfig, readOptions,
    extraFilterLogic, extraFilter);
testSize
- Size of test set.
trainStart
- Datetime string. The String should be formatted in one of the following formats `yyyyMMdd`, `yyyyMMddHH`, `yyyyMMddHHmm`, or `yyyyMMddHHmmss`.
trainEnd
- Datetime string. The String should be formatted in one of the following formats `yyyyMMdd`, `yyyyMMddHH`, `yyyyMMddHHmm`, or `yyyyMMddHHmmss`.
testStart
- Datetime string. The String should be formatted in one of the following formats `yyyyMMdd`, `yyyyMMddHH`, `yyyyMMddHHmm`, or `yyyyMMddHHmmss`.
testEnd
- Datetime string. The String should be formatted in one of the following formats `yyyyMMdd`, `yyyyMMddHH`, `yyyyMMddHHmm`, or `yyyyMMddHHmmss`.
description
- A string describing the contents of the training dataset to improve discoverability for Data Scientists.
seed
- Define a seed to create the random splits with, in order to guarantee reproducibility.
statisticsConfig
- A configuration object to generally enable descriptive statistics computation for this feature group: `"correlations"` to turn on feature correlation computation, `"histograms"` to compute feature value frequencies, and `"exact_uniqueness"` to compute uniqueness, distinctness and entropy. The values should be booleans indicating the setting. To fully turn off statistics computation pass `statisticsConfig=null`.
readOptions
- Additional read options as key/value pairs.
extraFilterLogic
- Additional filters (set of Filter objects) to be attached to the training dataset. The filters will also be applied in `getBatchData`.
extraFilter
- Additional filter to be attached to the training dataset. The filter will also be applied in `getBatchData`.
List<Dataset<Row>>
List of Spark DataFrames containing training dataset splits.
FeatureStoreException
- If Client is not connected to Hopsworks and/or unable to identify the format of the provided date strings.
IOException
- Generic IO exception.
ParseException
- If the provided date strings cannot be parsed into date types.

public List<org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>> trainValidationTestSplit(Float validationSize, Float testSize, String trainStart, String trainEnd, String validationStart, String validationEnd, String testStart, String testEnd, String description) throws IOException, FeatureStoreException, ParseException
// get feature store handle
FeatureStore fs = HopsworksConnection.builder().build().getFeatureStore();
// get feature view handle
FeatureView fv = fs.getFeatureView("fv_name", 1);
// create training dataset based on time split
String trainStart = "20220101000000";
String trainEnd = "20220630235959";
String validationStart = "20220701000000";
String validationEnd = "20220830235959";
String testStart = "20220901000000";
String testEnd = "20220930235959";
String description = "demo training dataset";
fv.trainValidationTestSplit(null, null, trainStart, trainEnd, validationStart, validationEnd, testStart,
testEnd, description);
// or based on random split
fv.trainValidationTestSplit(20f, 10f, null, null, null, null, null, null, description);
validationSize
- Size of validation set.
testSize
- Size of test set.
trainStart
- Datetime string. The String should be formatted in one of the following formats `yyyyMMdd`, `yyyyMMddHH`, `yyyyMMddHHmm`, or `yyyyMMddHHmmss`.
trainEnd
- Datetime string. The String should be formatted in one of the following formats `yyyyMMdd`, `yyyyMMddHH`, `yyyyMMddHHmm`, or `yyyyMMddHHmmss`.
validationStart
- Datetime string. The String should be formatted in one of the following formats `yyyyMMdd`, `yyyyMMddHH`, `yyyyMMddHHmm`, or `yyyyMMddHHmmss`.
validationEnd
- Datetime string. The String should be formatted in one of the following formats `yyyyMMdd`, `yyyyMMddHH`, `yyyyMMddHHmm`, or `yyyyMMddHHmmss`.
testStart
- Datetime string. The String should be formatted in one of the following formats `yyyyMMdd`, `yyyyMMddHH`, `yyyyMMddHHmm`, or `yyyyMMddHHmmss`.
testEnd
- Datetime string. The String should be formatted in one of the following formats `yyyyMMdd`, `yyyyMMddHH`, `yyyyMMddHHmm`, or `yyyyMMddHHmmss`.
description
- A string describing the contents of the training dataset to improve discoverability for Data Scientists.
List<Dataset<Row>>
List of Spark DataFrames containing training dataset splits.
FeatureStoreException
- If Client is not connected to Hopsworks and/or unable to identify the format of the provided date strings.
IOException
- Generic IO exception.
ParseException
- If the provided date strings cannot be parsed into date types.

public List<org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>> trainValidationTestSplit(Float validationSize, Float testSize, String trainStart, String trainEnd, String validationStart, String validationEnd, String testStart, String testEnd, String description, Long seed, StatisticsConfig statisticsConfig, Map<String,String> readOptions, FilterLogic extraFilterLogic, Filter extraFilter) throws IOException, FeatureStoreException, ParseException
// get feature store handle
FeatureStore fs = HopsworksConnection.builder().build().getFeatureStore();
// get feature view handle
FeatureView fv = fs.getFeatureView("fv_name", 1);
// create training dataset based on time split
String trainStart = "20220101000000";
String trainEnd = "20220630235959";
String validationStart = "20220701000000";
String validationEnd = "20220830235959";
String testStart = "20220901000000";
String testEnd = "20220930235959";
String description = "demo training dataset";
Long seed = 1234L;
StatisticsConfig statisticsConfig = new StatisticsConfig(true, true, true, true);
Map<String, String> readOptions = new HashMap<String, String>() {{
    put("header", "true");
    put("delimiter", ",");
}};
// define extra filters
Filter leftFtFilter = new Filter();
leftFtFilter.setFeature(new Feature("left_ft_name"));
leftFtFilter.setValue("400");
leftFtFilter.setCondition(SqlFilterCondition.EQUALS);
Filter rightFtFilter = new Filter();
rightFtFilter.setFeature(new Feature("right_ft_name"));
rightFtFilter.setValue("50");
rightFtFilter.setCondition(SqlFilterCondition.EQUALS);
FilterLogic extraFilterLogic = new FilterLogic(SqlFilterLogic.AND, leftFtFilter, rightFtFilter);
Filter extraFilter = new Filter();
extraFilter.setFeature(new Feature("ft_name"));
extraFilter.setValue("100");
extraFilter.setCondition(SqlFilterCondition.GREATER_THAN);
// create training data
fv.trainValidationTestSplit(null, null, trainStart, trainEnd, validationStart, validationEnd, testStart,
testEnd, description, seed, statisticsConfig,
readOptions, extraFilterLogic, extraFilter);
// or based on random split
fv.trainValidationTestSplit(20f, 10f, null, null, null, null, null, null, description, seed, statisticsConfig,
    readOptions, extraFilterLogic, extraFilter);
validationSize
- Size of validation set.
testSize
- Size of test set.
trainStart
- Datetime string. The String should be formatted in one of the following formats `yyyyMMdd`, `yyyyMMddHH`, `yyyyMMddHHmm`, or `yyyyMMddHHmmss`.
trainEnd
- Datetime string. The String should be formatted in one of the following formats `yyyyMMdd`, `yyyyMMddHH`, `yyyyMMddHHmm`, or `yyyyMMddHHmmss`.
validationStart
- Datetime string. The String should be formatted in one of the following formats `yyyyMMdd`, `yyyyMMddHH`, `yyyyMMddHHmm`, or `yyyyMMddHHmmss`.
validationEnd
- Datetime string. The String should be formatted in one of the following formats `yyyyMMdd`, `yyyyMMddHH`, `yyyyMMddHHmm`, or `yyyyMMddHHmmss`.
testStart
- Datetime string. The String should be formatted in one of the following formats `yyyyMMdd`, `yyyyMMddHH`, `yyyyMMddHHmm`, or `yyyyMMddHHmmss`.
testEnd
- Datetime string. The String should be formatted in one of the following formats `yyyyMMdd`, `yyyyMMddHH`, `yyyyMMddHHmm`, or `yyyyMMddHHmmss`.
description
- A string describing the contents of the training dataset to improve discoverability for Data Scientists.
seed
- Define a seed to create the random splits with, in order to guarantee reproducibility.
statisticsConfig
- A configuration object to generally enable descriptive statistics computation for this feature group: `"correlations"` to turn on feature correlation computation, `"histograms"` to compute feature value frequencies, and `"exact_uniqueness"` to compute uniqueness, distinctness and entropy. The values should be booleans indicating the setting. To fully turn off statistics computation pass `statisticsConfig=null`.
readOptions
- Additional read options as key/value pairs.
extraFilterLogic
- Additional filters (set of Filter objects) to be attached to the training dataset. The filters will also be applied in `getBatchData`.
extraFilter
- Additional filter to be attached to the training dataset. The filter will also be applied in `getBatchData`.
List<Dataset<Row>>
List of Spark DataFrames containing training dataset splits.
FeatureStoreException
- If Client is not connected to Hopsworks and/or unable to identify the format of the provided date strings.
IOException
- Generic IO exception.
ParseException
- If the provided date strings cannot be parsed into date types.

public void purgeTrainingData(Integer version) throws FeatureStoreException, IOException
// get feature store handle
FeatureStore fs = HopsworksConnection.builder().build().getFeatureStore();
// get feature view handle
FeatureView fv = fs.getFeatureView("fv_name", 1);
// Delete a training dataset version 1
fv.purgeTrainingData(1);
purgeTrainingData
in class FeatureViewBase<FeatureView,FeatureStore,Query,org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>>
version
- Version of the training dataset to be removed.
FeatureStoreException
- If Client is not connected to Hopsworks.
IOException
- Generic IO exception.

public void purgeAllTrainingData() throws FeatureStoreException, IOException
// get feature store handle
FeatureStore fs = HopsworksConnection.builder().build().getFeatureStore();
// get feature view handle
FeatureView fv = fs.getFeatureView("fv_name", 1);
// Purge all training data in this feature view.
fv.purgeAllTrainingData();
purgeAllTrainingData
in class FeatureViewBase<FeatureView,FeatureStore,Query,org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>>
FeatureStoreException
- If Client is not connected to Hopsworks.
IOException
- Generic IO exception.

public void deleteTrainingDataset(Integer version) throws FeatureStoreException, IOException
// get feature store handle
FeatureStore fs = HopsworksConnection.builder().build().getFeatureStore();
// get feature view handle
FeatureView fv = fs.getFeatureView("fv_name", 1);
// Delete a training dataset version 1.
fv.deleteTrainingDataset(1);
deleteTrainingDataset
in class FeatureViewBase<FeatureView,FeatureStore,Query,org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>>
version
- Version of the training dataset to be removed.
FeatureStoreException
- If Client is not connected to Hopsworks.
IOException
- Generic IO exception.

public void deleteAllTrainingDatasets() throws FeatureStoreException, IOException
// get feature store handle
FeatureStore fs = HopsworksConnection.builder().build().getFeatureStore();
// get feature view handle
FeatureView fv = fs.getFeatureView("fv_name", 1);
// Delete all training datasets in this feature view.
fv.deleteAllTrainingDatasets();
deleteAllTrainingDatasets
in class FeatureViewBase<FeatureView,FeatureStore,Query,org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>>
FeatureStoreException
- If Client is not connected to Hopsworks.
IOException
- Generic IO exception.

public void addTrainingDatasetTag(Integer version, String name, Object value) throws FeatureStoreException, IOException
// get feature store handle
FeatureStore fs = HopsworksConnection.builder().build().getFeatureStore();
// get feature view handle
FeatureView fv = fs.getFeatureView("fv_name", 1);
// add tag to training dataset version 1 in this feature view.
JSONObject json = ...;
fv.addTrainingDatasetTag(1, "tag_name", json);
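As noted below, a tag value can be any valid JSON - primitives, arrays, or JSON objects. As an illustration only (this helper is hypothetical and not part of the Hopsworks API), a check of a candidate value against those shapes before tagging might look like:

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper, not a Hopsworks API: verifies that a candidate tag
// value matches the shapes the docs allow - JSON primitives, arrays
// (represented as List), or JSON objects (represented as Map with String keys).
class TagValueCheck {
    static boolean isValidTagValue(Object value) {
        if (value == null || value instanceof String
                || value instanceof Number || value instanceof Boolean) {
            return true; // JSON primitives (and null)
        }
        if (value instanceof List) { // JSON array: every element must be valid
            for (Object item : (List<?>) value) {
                if (!isValidTagValue(item)) return false;
            }
            return true;
        }
        if (value instanceof Map) { // JSON object: String keys, valid values
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                if (!(e.getKey() instanceof String) || !isValidTagValue(e.getValue())) {
                    return false;
                }
            }
            return true;
        }
        return false; // anything else is not representable as JSON
    }
}
```

The recursion mirrors the JSON grammar: containers are valid only if all of their contents are.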
addTrainingDatasetTag
in class FeatureViewBase<FeatureView,FeatureStore,Query,org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>>
version
- Training dataset version.
name
- Name of the tag.
value
- Value of the tag. The value of a tag can be any valid JSON - primitives, arrays, or JSON objects.
FeatureStoreException
- If Client is not connected to Hopsworks.
IOException
- Generic IO exception.
public Map<String,Object> getTrainingDatasetTags(Integer version) throws FeatureStoreException, IOException
// get feature store handle
FeatureStore fs = HopsworksConnection.builder().build().getFeatureStore();
// get feature view handle
FeatureView fv = fs.getFeatureView("fv_name", 1);
// get tags of training dataset version 1 in this feature view.
fv.getTrainingDatasetTags(1);
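Because the returned `Map<String,Object>` holds untyped values, callers usually inspect the runtime type of each value before using it. A minimal, self-contained sketch of that pattern, operating on a stand-in map (the helper and sample tags are illustrative, not a Hopsworks API):

```java
import java.util.Map;

// Illustrative helper, not a Hopsworks API: summarizes each tag as
// "name=RuntimeType;" so callers can see what casts will be needed.
class TagInspection {
    static String describe(Map<String, Object> tags) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Object> e : tags.entrySet()) {
            sb.append(e.getKey())
              .append('=')
              .append(e.getValue() == null ? "null" : e.getValue().getClass().getSimpleName())
              .append(';');
        }
        return sb.toString();
    }
}
```

In practice the same iteration would run over the map returned by `fv.getTrainingDatasetTags(1)`, with an `instanceof` check before casting each value.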
getTrainingDatasetTags
in class FeatureViewBase<FeatureView,FeatureStore,Query,org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>>
version
- Training dataset version.
Map<String, Object>
A map of tag names and values. The value of a tag can be any valid JSON - primitives, arrays, or JSON objects.
FeatureStoreException
- If Client is not connected to Hopsworks.
IOException
- Generic IO exception.
public Object getTrainingDatasetTag(Integer version, String name) throws FeatureStoreException, IOException
// get feature store handle
FeatureStore fs = HopsworksConnection.builder().build().getFeatureStore();
// get feature view handle
FeatureView fv = fs.getFeatureView("fv_name", 1);
// get tag with name `"demo_name"` of training dataset version 1 in this feature view.
fv.getTrainingDatasetTag(1, "demo_name");
getTrainingDatasetTag
in class FeatureViewBase<FeatureView,FeatureStore,Query,org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>>
version
- Training dataset version.
name
- Name of the tag.
FeatureStoreException
- If Client is not connected to Hopsworks.
IOException
- Generic IO exception.
public void deleteTrainingDatasetTag(Integer version, String name) throws FeatureStoreException, IOException
// get feature store handle
FeatureStore fs = HopsworksConnection.builder().build().getFeatureStore();
// get feature view handle
FeatureView fv = fs.getFeatureView("fv_name", 1);
// delete tag with name `"demo_name"` of training dataset version 1 in this feature view.
fv.deleteTrainingDatasetTag(1, "demo_name");
deleteTrainingDatasetTag
in class FeatureViewBase<FeatureView,FeatureStore,Query,org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>>
version
- Training dataset version.
name
- Name of the tag to be deleted.
FeatureStoreException
- If Client is not connected to Hopsworks.
IOException
- Generic IO exception.
public HashSet<String> getPrimaryKeys() throws SQLException, IOException, FeatureStoreException, ClassNotFoundException
// get feature store handle
FeatureStore fs = HopsworksConnection.builder().build().getFeatureStore();
// get feature view handle
FeatureView fv = fs.getFeatureView("fv_name", 1);
// get set of primary key names
fv.getPrimaryKeys();
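The serving-key names returned here are typically used to assemble the entry map for online feature vector lookups. A sketch of that assembly under stated assumptions - `buildEntry` is a hypothetical helper, not a Hopsworks API, and assumes the caller supplies a value for every serving key:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not a Hopsworks API: keeps only the serving-key
// entries from a candidate row and fails fast if any serving key is missing.
class ServingEntry {
    static Map<String, Object> buildEntry(Set<String> servingKeys,
                                          Map<String, Object> candidate) {
        Map<String, Object> entry = new HashMap<>();
        for (String key : servingKeys) {
            if (!candidate.containsKey(key)) {
                throw new IllegalArgumentException("Missing serving key: " + key);
            }
            entry.put(key, candidate.get(key));
        }
        return entry;
    }
}
```

Failing fast on a missing key surfaces configuration errors at lookup time rather than returning partial results from the online store.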
getPrimaryKeys
in class FeatureViewBase<FeatureView,FeatureStore,Query,org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>>
HashSet<String>
Set of serving keys.
FeatureStoreException
- In case Client is not connected to Hopsworks.
IOException
- Generic IO exception.
SQLException
- In case there is an online storage (RonDB) access error or other errors.
ClassNotFoundException
- In case class `com.mysql.jdbc.Driver` cannot be found.
Copyright © 2023. All rights reserved.