public class StorageConnectorUtils extends Object
| Constructor and Description |
|---|
| StorageConnectorUtils() |
| Modifier and Type | Method and Description |
|---|---|
| `org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>` | `read(StorageConnector.AdlsConnector connector, String dataFormat, Map<String,String> options, String path)` Reads a path into a Spark dataframe using the AdlsConnector. |
| `org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>` | `read(StorageConnector.BigqueryConnector connector, String query, Map<String,String> options, String path)` Reads a query or a path into a Spark dataframe using the BigqueryConnector. |
| `org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>` | `read(StorageConnector.GcsConnector connector, String dataFormat, Map<String,String> options, String path)` Reads a path into a Spark dataframe using the GcsConnector. |
| `org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>` | `read(StorageConnector.HopsFsConnector connector, String dataFormat, Map<String,String> options, String path)` Reads a path into a Spark dataframe using the HopsFsConnector. |
| `org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>` | `read(StorageConnector.JdbcConnector connector, String query)` Reads a query into a Spark dataframe using the JdbcConnector. |
| `org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>` | `read(StorageConnector.RedshiftConnector connector, String query)` Reads a query into a Spark dataframe using the RedshiftConnector. |
| `org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>` | `read(StorageConnector.S3Connector connector, String dataFormat, Map<String,String> options, String path)` Reads a path into a Spark dataframe using the S3Connector. |
| `org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>` | `read(StorageConnector.SnowflakeConnector connector, String query)` Reads a query into a Spark dataframe using the SnowflakeConnector. |
| `org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>` | `read(StorageConnector connector, String query, String dataFormat, Map<String,String> options, String path)` Reads a query or a path into a Spark dataframe using the storage connector. |
| `org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>` | `readStream(StorageConnector.KafkaConnector connector, String topic, boolean topicPattern, String messageFormat, String schema, Map<String,String> options, boolean includeMetadata)` Reads a stream into a Spark dataframe using the Kafka storage connector. |
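As an illustrative sketch of a path-based read (the feature-store handle `fs`, the connector name, the `getS3Connector` accessor, and the bucket path below are assumptions for illustration, not part of this reference), reading from an S3 connector might look like:

```java
import java.util.HashMap;
import java.util.Map;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// Hypothetical sketch: `fs`, the connector name, and the path are assumptions.
StorageConnectorUtils utils = new StorageConnectorUtils();
StorageConnector.S3Connector s3 = fs.getS3Connector("my_bucket_connector");

// Extra reader options are passed through to Spark, e.g. a CSV header flag.
Map<String, String> options = new HashMap<>();
options.put("header", "true");

// Read CSV files under the given path in the connector's bucket
// into a Spark dataframe.
Dataset<Row> df = utils.read(s3, "csv", options, "s3://my-bucket/sales/2024/");
df.show();
```

The same shape applies to the HopsFsConnector, GcsConnector, and AdlsConnector overloads, which differ only in the connector type passed in.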
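For query-based connectors, the overloads take a SQL string instead of a format and path. A minimal sketch, assuming a feature-store handle `fs` and a hypothetical `getJdbcConnector` accessor and connector name:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// Hypothetical sketch: `fs`, the connector name, and the query are assumptions.
StorageConnectorUtils utils = new StorageConnectorUtils();
StorageConnector.JdbcConnector jdbc = fs.getJdbcConnector("orders_db");

// Run a SQL query against the JDBC source and get back a Spark dataframe.
Dataset<Row> orders = utils.read(jdbc, "SELECT * FROM orders WHERE amount > 100");
```

The RedshiftConnector and SnowflakeConnector overloads follow the same `(connector, query)` pattern.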
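For streaming reads from Kafka, `readStream` takes the topic, a flag indicating whether the topic argument is a pattern, the message format, and the message schema. A sketch under assumed names (the `fs` handle, `getKafkaConnector` accessor, connector name, topic, and Avro schema are all illustrative assumptions):

```java
import java.util.HashMap;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// Hypothetical sketch: `fs`, the connector name, topic, and schema are assumptions.
StorageConnectorUtils utils = new StorageConnectorUtils();
StorageConnector.KafkaConnector kafka = fs.getKafkaConnector("events_cluster");

// Minimal Avro schema for the message value.
String avroSchema =
    "{\"type\":\"record\",\"name\":\"Event\","
        + "\"fields\":[{\"name\":\"id\",\"type\":\"string\"}]}";

// Subscribe to a single topic (topicPattern = false), parse Avro messages,
// and include Kafka metadata columns such as "key", "partition", and "offset".
Dataset<Row> stream = utils.readStream(
    kafka, "events", false, "avro", avroSchema, new HashMap<>(), true);
```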
public org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> read(StorageConnector.HopsFsConnector connector, String dataFormat, Map<String,String> options, String path) throws FeatureStoreException, IOException

Parameters:
- `connector` - HopsFsConnector object.
- `dataFormat` - Specify the file format to be read, e.g. `csv`, `parquet`.
- `options` - Any additional key/value options to be passed to the connector.
- `path` - Path to be read from within the storage connector.

Throws:
- `FeatureStoreException` - If unable to retrieve the StorageConnector from the feature store.
- `IOException` - Generic IO exception.

public org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> read(StorageConnector.S3Connector connector, String dataFormat, Map<String,String> options, String path) throws FeatureStoreException, IOException

Parameters:
- `connector` - S3Connector object.
- `dataFormat` - Specify the file format to be read, e.g. `csv`, `parquet`.
- `options` - Any additional key/value options to be passed to the connector.
- `path` - Path to be read from within the bucket.

Throws:
- `FeatureStoreException` - If unable to retrieve the StorageConnector from the feature store.
- `IOException` - Generic IO exception.

public org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> read(StorageConnector.RedshiftConnector connector, String query) throws FeatureStoreException, IOException

Parameters:
- `connector` - RedshiftConnector object.
- `query` - SQL query string.

Throws:
- `FeatureStoreException` - If unable to retrieve the StorageConnector from the feature store.
- `IOException` - Generic IO exception.

public org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> read(StorageConnector.AdlsConnector connector, String dataFormat, Map<String,String> options, String path) throws FeatureStoreException, IOException

Parameters:
- `connector` - AdlsConnector object.
- `dataFormat` - Specify the file format to be read, e.g. `csv`, `parquet`.
- `options` - Any additional key/value options to be passed to the connector.
- `path` - Path to be read from within the storage connector.

Throws:
- `FeatureStoreException` - If unable to retrieve the StorageConnector from the feature store.
- `IOException` - Generic IO exception.

public org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> read(StorageConnector.SnowflakeConnector connector, String query) throws FeatureStoreException, IOException

Parameters:
- `connector` - SnowflakeConnector object.
- `query` - SQL query string.

Throws:
- `FeatureStoreException` - If unable to retrieve the StorageConnector from the feature store.
- `IOException` - Generic IO exception.

public org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> read(StorageConnector.JdbcConnector connector, String query) throws FeatureStoreException, IOException

Parameters:
- `connector` - JdbcConnector object.
- `query` - SQL query string.

Throws:
- `FeatureStoreException` - If unable to retrieve the StorageConnector from the feature store.
- `IOException` - Generic IO exception.

public org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> read(StorageConnector.GcsConnector connector, String dataFormat, Map<String,String> options, String path) throws FeatureStoreException, IOException

Parameters:
- `connector` - GcsConnector object.
- `dataFormat` - Specify the file format to be read, e.g. `csv`, `parquet`.
- `options` - Any additional key/value options to be passed to the connector.
- `path` - Path to be read from within the storage connector.

Throws:
- `FeatureStoreException` - If unable to retrieve the StorageConnector from the feature store.
- `IOException` - Generic IO exception.

public org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> read(StorageConnector.BigqueryConnector connector, String query, Map<String,String> options, String path) throws FeatureStoreException, IOException

Parameters:
- `connector` - BigqueryConnector object.
- `query` - SQL query string.
- `options` - Any additional key/value options to be passed to the connector.
- `path` - Path to the table to be read from within the storage connector.

Throws:
- `FeatureStoreException` - If unable to retrieve the StorageConnector from the feature store.
- `IOException` - Generic IO exception.

public org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> read(StorageConnector connector, String query, String dataFormat, Map<String,String> options, String path) throws FeatureStoreException, IOException

Parameters:
- `connector` - Storage connector object.
- `query` - SQL query string.
- `dataFormat` - When reading from object stores such as S3, HopsFS, or ADLS, specify the file format to be read, e.g. `csv`, `parquet`.
- `path` - Path to be read from within the bucket of the storage connector. Not relevant for database-based connectors such as JDBC, Snowflake, or Redshift.

Throws:
- `FeatureStoreException` - If unable to retrieve the StorageConnector from the feature store.
- `IOException` - Generic IO exception.

public org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> readStream(StorageConnector.KafkaConnector connector, String topic, boolean topicPattern, String messageFormat, String schema, Map<String,String> options, boolean includeMetadata) throws FeatureStoreException, IOException

Parameters:
- `connector` - Storage connector object.
- `topic` - Name of the topic, or a topic pattern.
- `topicPattern` - If true, `topic` is treated as a pattern and all matching topics are subscribed.
- `messageFormat` - Format of the message: "avro" or "json".
- `schema` - Schema of the message.
- `options` - Any additional key/value options to be passed to the connector.
- `includeMetadata` - Whether to include metadata of the topic in the dataframe, such as "key", "topic", "partition", "offset", "timestamp", "timestampType", "value.*".

Throws:
- `FeatureStoreException` - If unable to retrieve the StorageConnector from the feature store.
- `IOException` - Generic IO exception.

Copyright © 2025. All rights reserved.