hsfs.connection.Connection( host=None, port=443, project=None, engine=None, region_name="default", secrets_store="parameterstore", hostname_verification=True, trust_store_path=None, cert_folder="/tmp", api_key_file=None, api_key_value=None, )
A feature store connection object.
The connection is project specific, so you can access the project's own feature store but also any feature store which has been shared with the project you connect to.
This class provides convenience classmethods accessible from the hsfs module.
hsfs provides a factory method, accessible from the top-level module, so you don't have to import the Connection class manually:
import hsfs
conn = hsfs.connection()
Save API Key as File
To get started quickly, without saving the Hopsworks API key in a secret storage, you can simply create a file containing the previously created Hopsworks API key and place it in the environment from which you wish to connect to the Hopsworks Feature Store.
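As a minimal sketch of creating such a key file, the snippet below writes the key with owner-only permissions using only the Python standard library. The helper name `write_api_key_file`, the placeholder key value, and the temporary directory are assumptions made for illustration; they are not part of hsfs.

```python
import os
import tempfile
from pathlib import Path


def write_api_key_file(api_key: str, path: Path) -> Path:
    """Write the API key to `path` (hypothetical helper, not an hsfs function)."""
    # Creating the file with mode 0o600 means it never exists with
    # group or world permissions (the process umask can only narrow it further).
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        # Strip surrounding whitespace so a trailing newline is not stored.
        handle.write(api_key.strip())
    return path


# Placeholder value for illustration only; use the key generated in Hopsworks.
key_dir = Path(tempfile.mkdtemp())
key_path = write_api_key_file("EXAMPLE-API-KEY\n", key_dir / "featurestore.key")
```

The resulting path is what would be passed as `api_key_file`. Note that the mode argument only applies when the file is newly created; an existing file keeps its current permissions.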
You can then connect by simply passing the path to the key file when instantiating a connection:
import hsfs
conn = hsfs.connection(
    'my_instance',                    # DNS of your Feature Store instance
    443,                              # Port to reach your Hopsworks instance, defaults to 443
    'my_project',                     # Name of your Hopsworks Feature Store project
    api_key_file='featurestore.key',  # The file containing the API key generated above
    hostname_verification=True,       # Disable for self-signed certificates
)
fs = conn.get_feature_store()         # Get the project's default feature store
Clients in external clusters need to connect to the Hopsworks Feature Store using an API key. The API key is generated inside the Hopsworks platform, and requires at least the "project" and "featurestore" scopes to be able to access a feature store. For more information, see the integration guides.
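For external clients that prefer not to keep the key in a file, one option is to read it from an environment variable and pass it as `api_key_value`. The sketch below only builds the keyword arguments, so it runs without hsfs installed. The helper name `connection_kwargs_from_env` and the variable name `HOPSWORKS_API_KEY` are hypothetical; the parameter names come from the `Connection` signature above.

```python
from typing import Any, Dict, Mapping


def connection_kwargs_from_env(
    env: Mapping[str, str],
    host: str,
    project: str,
    key_var: str = "HOPSWORKS_API_KEY",  # hypothetical variable name
) -> Dict[str, Any]:
    """Build keyword arguments for hsfs.connection() from an environment mapping."""
    api_key = env.get(key_var, "").strip()
    if not api_key:
        raise KeyError(f"environment variable {key_var} is not set or empty")
    return {
        "host": host,
        "port": 443,
        "project": project,
        "api_key_value": api_key,
    }


# Demonstrated with a plain dict; in a real job, pass os.environ instead.
kwargs = connection_kwargs_from_env(
    {"HOPSWORKS_API_KEY": "EXAMPLE-API-KEY"}, "my_instance", "my_project"
)
# With hsfs installed, the connection would then be: hsfs.connection(**kwargs)
```

Keeping the key out of the script itself matters most when the notebook or job script is shared with other parties.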
Arguments
host (Optional[str]): The hostname of the Hopsworks instance, defaults to None.
port (int): The port on which the Hopsworks instance can be reached, defaults to 443.
project (Optional[str]): The name of the project to connect to. When running on Hopsworks, this defaults to the project from which the client is run. Defaults to None.
engine (Optional[str]): Which engine to use, "spark", "hive" or "training". Defaults to None, which initializes the engine to Spark if the environment provides Spark, for example on Hopsworks and Databricks, or falls back on Hive in Python if Spark is not available, e.g. on local Python environments or AWS SageMaker. This option allows you to override this behaviour. The "training" engine is useful when only feature store metadata is needed, for example training dataset location and label information when a Hopsworks training experiment is conducted.
region_name (str): The name of the AWS region in which the required secrets are stored, defaults to "default".
secrets_store (str): The secrets storage to be used, either "secretsmanager", "parameterstore" or "local", defaults to "parameterstore".
hostname_verification (bool): Whether or not to verify Hopsworks’ certificate, defaults to True.
trust_store_path (Optional[str]): Path on the file system containing the Hopsworks certificates, defaults to None.
cert_folder (str): The directory to store retrieved HopsFS certificates, defaults to "/tmp". Only required when running without a Spark environment.
api_key_file (Optional[str]): Path to a file containing the API Key. If provided, secrets_store will be ignored. Defaults to None.
api_key_value (Optional[str]): API Key as a string. If provided, secrets_store will be ignored. However, this should be used with care, especially if the notebook or job script is accessible by multiple parties. Defaults to None.
Returns
Connection. Feature Store connection handle to perform operations on a Hopsworks project.
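The default behaviour described for the engine argument can be pictured roughly as follows. This is an illustrative sketch only, not the hsfs implementation: the function name `resolve_engine` is hypothetical, and "the environment provides Spark" is approximated here by checking whether the `pyspark` package can be imported.

```python
import importlib.util
from typing import Optional

# Engine names as used in this sketch; "spark" and "hive" are inferred from
# the description of the default behaviour.
ALLOWED_ENGINES = {"spark", "hive", "training"}


def resolve_engine(engine: Optional[str] = None) -> str:
    """Return the engine to use: an explicit value wins, else Spark if present, else Hive."""
    if engine is not None:
        if engine not in ALLOWED_ENGINES:
            raise ValueError(f"unknown engine: {engine!r}")
        return engine
    # Assumption: Spark availability is approximated by pyspark being importable.
    if importlib.util.find_spec("pyspark") is not None:
        return "spark"
    return "hive"


explicit = resolve_engine("training")  # an explicit choice overrides the default
detected = resolve_engine()            # "spark" or "hive", depending on the environment
```

Passing an explicit engine is mainly useful when the automatic choice is not the one you want, for example requesting "training" when only metadata is needed.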
Connection.close()
Close a connection gracefully.
This will clean up any materialized certificates on the local file system of external environments such as AWS SageMaker.
Usage is recommended but optional.
Connection.connect()
Instantiate the connection.
Creating a Connection object implicitly calls this method for you to instantiate the connection. However, it is possible to close the connection gracefully with the close() method, in order to clean up materialized certificates. This might be desired when working on external environments such as AWS SageMaker. Subsequently you can call connect() again to reopen the connection.
import hsfs
conn = hsfs.connection()
conn.close()
conn.connect()
Connection.connection( host=None, port=443, project=None, engine=None, region_name="default", secrets_store="parameterstore", hostname_verification=True, trust_store_path=None, cert_folder="/tmp", api_key_file=None, api_key_value=None, )
Connection factory method, accessible through hsfs.connection().
Connection.get_feature_store(name=None)
Get a reference to a feature store to perform operations on.
Defaults to the project's default feature store. To get a shared feature store, the project name of the feature store is required.
Arguments
name (str): The name of the feature store, defaults to None.
Returns
FeatureStore. A feature store handle object to perform operations on.
Connection.setup_databricks( host=None, port=443, project=None, engine=None, region_name="default", secrets_store="parameterstore", hostname_verification=True, trust_store_path=None, cert_folder="/tmp", api_key_file=None, api_key_value=None, )
Set up the HopsFS and Hive connector on a Databricks cluster.
This method sets up the HopsFS and Hive connectors to connect from a
Databricks cluster to a Hopsworks Feature Store instance. It returns a
Connection object and prints instructions on how to finalize the setup
of the Databricks cluster.
See also the Databricks integration guide.