Connection#
hsfs.connection.Connection(
host=None,
port=443,
project=None,
engine=None,
region_name="default",
secrets_store="parameterstore",
hostname_verification=True,
trust_store_path=None,
cert_folder="/tmp",
api_key_file=None,
api_key_value=None,
)
A feature store connection object.
The connection is project specific, so you can access the project's own feature store as well as any feature store that has been shared with the project you connect to.
This class provides convenience classmethods accessible from the hsfs module:
Connection factory
For convenience, hsfs provides a factory method, accessible from the top-level module, so you don't have to import the Connection class manually:
import hsfs
conn = hsfs.connection()
Save API Key as File
To get started quickly, without saving the Hopsworks API key in a secrets store, you can simply create a file containing the previously created Hopsworks API key and place it in the environment from which you wish to connect to the Hopsworks Feature Store.
You can then connect by simply passing the path to the key file when instantiating a connection:
import hsfs
conn = hsfs.connection(
    'my_instance',                    # Hostname of your Feature Store instance
    443,                              # Port to reach your Hopsworks instance, defaults to 443
    'my_project',                     # Name of your Hopsworks Feature Store project
    api_key_file='featurestore.key',  # The file containing the API key generated above
    hostname_verification=True,       # Disable for self-signed certificates
)
fs = conn.get_feature_store()         # Get the project's default feature store
# or
import hopsworks
project = hopsworks.login()
fs = project.get_feature_store()
Clients in external clusters need to connect to the Hopsworks Feature Store using an API key. The API key is generated inside the Hopsworks platform, and requires at least the "project" and "featurestore" scopes to be able to access a feature store. For more information, see the integration guides.
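As an alternative to a key file, the key can be passed directly as a string via api_key_value. The following is a minimal sketch that reads the key from an environment variable rather than hard-coding it; the variable name HOPSWORKS_API_KEY is an assumption made for illustration, not something hsfs sets or reads itself.
import os
import hsfs

# Read the API key from an environment variable instead of embedding it in the script.
# HOPSWORKS_API_KEY is a hypothetical variable name chosen for this example.
conn = hsfs.connection(
    'my_instance',                 # Hostname of your Feature Store instance
    project='my_project',          # Name of your Hopsworks Feature Store project
    api_key_value=os.environ['HOPSWORKS_API_KEY'],
)
fs = conn.get_feature_store()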
Arguments
- host (str | None): The hostname of the Hopsworks instance in the form of [UUID].cloud.hopsworks.ai, defaults to None. Do not include https:// in the URL when connecting programmatically.
- port (int): The port on which the Hopsworks instance can be reached, defaults to 443.
- project (str | None): The name of the project to connect to. When running on Hopsworks, this defaults to the project from which the client is run. Defaults to None.
- engine (str | None): Specifies the engine to use. Possible options are "spark", "python", "training", "spark-no-metastore", or "spark-delta". The default value is None, which automatically selects the engine based on the environment (see the sketch after the Returns section below):
  - "spark": Used if Spark is available, such as in Hopsworks or Databricks environments.
  - "python": Used in local Python environments or AWS SageMaker when Spark is not available.
  - "training": Used when only feature store metadata is needed, such as for obtaining training dataset locations and label information during Hopsworks training experiments.
  - "spark-no-metastore": Functions like "spark" but does not rely on the Hive metastore.
  - "spark-delta": Minimizes dependencies further by avoiding both the Hive metastore and HopsFS.
- region_name (str): The name of the AWS region in which the required secrets are stored, defaults to "default".
- secrets_store (str): The secrets storage to be used, either "secretsmanager", "parameterstore" or "local", defaults to "parameterstore".
- hostname_verification (bool): Whether or not to verify Hopsworks' certificate, defaults to True.
- trust_store_path (str | None): Path on the file system containing the Hopsworks certificates, defaults to None.
- cert_folder (str): The directory in which to store retrieved HopsFS certificates, defaults to "/tmp". Only required when running without a Spark environment.
- api_key_file (str | None): Path to a file containing the API key. If provided, secrets_store will be ignored. Defaults to None.
- api_key_value (str | None): API key as a string. If provided, secrets_store will be ignored; however, this should be used with care, especially if the notebook or job script is accessible by multiple parties. Defaults to None.
Returns
Connection. A feature store connection handle to perform operations on a Hopsworks project.
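Because engine=None selects an engine automatically, it can be useful to pin the engine explicitly when the environment is ambiguous. A minimal sketch, assuming an external Python environment without Spark and hypothetical host and project names:
import hsfs

# Force the pure-Python engine, e.g. when connecting from a local
# environment where Spark is not available.
conn = hsfs.connection(
    host='my_instance.cloud.hopsworks.ai',  # hypothetical hostname
    project='my_project',                   # hypothetical project name
    engine='python',
    api_key_file='featurestore.key',
)
fs = conn.get_feature_store()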
Properties#
api_key_file#
api_key_value#
cert_folder#
host#
hostname_verification#
port#
project#
region_name#
secrets_store#
trust_store_path#
Methods#
close#
Connection.close()
Close a connection gracefully.
This will clean up any materialized certificates on the local file system of external environments such as AWS SageMaker.
Usage is optional.
Example
import hsfs
conn = hsfs.connection()
conn.close()
connect#
Connection.connect()
Instantiate the connection.
Creating a Connection object implicitly calls this method to instantiate the connection. However, it is possible to close the connection gracefully with the close() method, in order to clean up materialized certificates. This might be desired when working in external environments such as AWS SageMaker. Subsequently, you can call connect() again to reopen the connection.
Example
import hsfs
conn = hsfs.connection()
conn.close()
conn.connect()
connection#
Connection.connection(
host=None,
port=443,
project=None,
engine=None,
region_name="default",
secrets_store="parameterstore",
hostname_verification=True,
trust_store_path=None,
cert_folder="/tmp",
api_key_file=None,
api_key_value=None,
)
Connection factory method, accessible through hsfs.connection().
get_feature_store#
Connection.get_feature_store(name=None)
Get a reference to a feature store to perform operations on.
Defaults to the project's default feature store. To get a shared feature store, the project name of the shared feature store is required.
How to get feature store instance
import hsfs
conn = hsfs.connection()
fs = conn.get_feature_store()
# or
import hopsworks
project = hopsworks.login()
fs = project.get_feature_store()
Arguments
- name (str | None): The name of the feature store, defaults to None.
Returns
FeatureStore. A feature store handle object to perform operations on.
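To access a feature store shared with your project, pass its name explicitly; as noted above, this is the project name of the shared feature store. A minimal sketch, where 'shared_project' is a hypothetical name of a project that has shared its feature store with yours:
import hsfs

conn = hsfs.connection()

# Default feature store of the project you are connected to
fs = conn.get_feature_store()

# Feature store shared by another project
# ('shared_project' is a hypothetical project name)
shared_fs = conn.get_feature_store(name='shared_project')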