How-To set up a Snowflake Data Source
Introduction
Snowflake provides a cloud-based data storage and analytics service, used as a data warehouse in many enterprises.
Data warehouses are often the source of raw data for feature engineering pipelines, and Snowflake supports scalable feature computation with SQL. However, Snowflake is not viable as an online feature store that serves features to models in production: because of its columnar database layout, its latency is too high compared to OLTP databases or key-value stores.
In this guide, you will configure a Data Source in Hopsworks that stores all the authentication information needed to connect to your Snowflake database. When you're finished, you'll be able to query the database using Spark through the HSFS APIs.
Note
Currently, it is only possible to create data sources in the Hopsworks UI. You cannot create a data source programmatically.
Prerequisites
Before you begin this guide, you'll need to retrieve the following information from your Snowflake account and database. The following options are mandatory:
- Snowflake Connection URL: Consult the documentation of your target Snowflake account to determine the correct connection URL. This is usually some form of your Snowflake account identifier, for example `<account_identifier>.snowflakecomputing.com` or `https://<orgname>-<account_name>.snowflakecomputing.com`.
  The account and organization details can be viewed in the Snowsight UI under Admin > Account, or by querying them in SQL, as explained in the Snowflake documentation. Below is an example of how to view the account and organization to get the account identifier from the Snowsight UI.
Token-based or password-based authentication
The Snowflake data source supports both username and password authentication as well as token-based authentication.
Token-based authentication is currently in beta. Users are advised to use username/password authentication and/or to create a service account for accessing Snowflake from Hopsworks.
- Username and Password: Login name and password for the Snowflake user. These are often also referred to as `sfUser` and `sfPassword`.
- Warehouse: The warehouse to use for the session after connecting.
- Database: The database to use for the session after connecting.
- Schema: The schema to use for the session after connecting.
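The mandatory settings above can be checked before you fill in the form. The following Python sketch is a minimal illustration, not part of HSFS: the helpers `snowflake_url` and `connection_options` are hypothetical, the `sf*` keys follow the option names of the Spark Snowflake connector, and expressing a token as `sfAuthenticator="oauth"` plus `sfToken` is an assumption about how token-based authentication would be passed. All example values are placeholders.

```python
from typing import Dict, Optional

# Hypothetical mapping from the mandatory settings above to the option names
# used by the Spark Snowflake connector. Authentication is handled separately.
REQUIRED_FIELDS = {
    "url": "sfURL",
    "user": "sfUser",
    "warehouse": "sfWarehouse",
    "database": "sfDatabase",
    "schema": "sfSchema",
}


def snowflake_url(
    account_identifier: Optional[str] = None,
    orgname: Optional[str] = None,
    account_name: Optional[str] = None,
) -> str:
    """Return a connection URL in one of the two formats described above."""
    if account_identifier:
        return f"{account_identifier}.snowflakecomputing.com"
    if orgname and account_name:
        return f"https://{orgname}-{account_name}.snowflakecomputing.com"
    raise ValueError("provide account_identifier, or both orgname and account_name")


def connection_options(
    settings: Dict[str, str],
    password: Optional[str] = None,
    token: Optional[str] = None,
) -> Dict[str, str]:
    """Collect the mandatory options, failing early if any are missing."""
    missing = [field for field in REQUIRED_FIELDS if not settings.get(field)]
    if missing:
        raise ValueError(f"missing mandatory settings: {', '.join(missing)}")
    if (password is None) == (token is None):
        raise ValueError("set exactly one of password or token")

    options = {REQUIRED_FIELDS[field]: settings[field] for field in REQUIRED_FIELDS}
    if password is not None:
        # Username/password: the recommended method while tokens are in beta.
        options["sfPassword"] = password
    elif token is not None:
        # Token-based (OAuth) authentication; this mapping is an assumption.
        options["sfAuthenticator"] = "oauth"
        options["sfToken"] = token
    return options


# Placeholder values for illustration only.
opts = connection_options(
    {
        "url": snowflake_url(orgname="myorg", account_name="myaccount"),
        "user": "hopsworks_svc",
        "warehouse": "COMPUTE_WH",
        "database": "ANALYTICS",
        "schema": "PUBLIC",
    },
    password="not-a-real-password",
)
print(opts["sfURL"])  # https://myorg-myaccount.snowflakecomputing.com
```

The password and token are kept mutually exclusive because the data source takes either a password or a token for the Snowflake user, not both.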
The following arguments are optional:
- Role: The role field can be used to specify which Snowflake security role to assume for the session after the connection is established.
- Application: The application field can be set to improve observability in Snowflake with regard to which application is running which query. It can be a simple string like “Hopsworks” or, for instance, the project name, to track usage and queries from each Hopsworks project.
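The sketch below includes an optional setting only when it is actually set, instead of passing an empty value; without an explicit role, Snowflake typically falls back to the user's default role. It is a minimal Python illustration, assuming `sfRole` (the Spark Snowflake connector's role option) and a hypothetical `application` key for the application field; `optional_options` is a hypothetical helper, and `fraud_detection` stands in for a Hopsworks project name.

```python
from typing import Dict, Optional


def optional_options(
    role: Optional[str] = None,
    application: Optional[str] = None,
) -> Dict[str, str]:
    """Return only the optional settings that were provided."""
    options: Dict[str, str] = {}
    if role:
        # Security role to assume for the session after connecting.
        options["sfRole"] = role
    if application:
        # Hypothetical key name for the application field.
        options["application"] = application
    return options


# Tag queries with the Hopsworks project name (example value).
print(optional_options(role="ANALYST", application="fraud_detection"))
```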
Creation in the UI
Step 1: Set up new Data Source
Head to the Data Source View on Hopsworks (1) and set up a new data source (2).
Step 2: Enter Snowflake Settings
Enter the details for your Snowflake connector. Start by giving it a name and an optional description.
- Select "Snowflake" as storage.
- Specify the hostname for your account in the following format: `<account_identifier>.snowflakecomputing.com` or `https://<orgname>-<account_name>.snowflakecomputing.com`.
- Login name for the Snowflake user.
- Password for the Snowflake user, or a token.
- The warehouse to connect to.
- The database to use for the connection.
- Add any additional optional arguments. For example, you can specify `Schema`, `Table`, `Role`, and `Application`.
- Optional additional key/value arguments.
- Click on "Save Credentials".
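The optional key/value arguments in the form are passed alongside the other fields. How Hopsworks resolves a key that duplicates one of those fields is not described in this guide, so the following Python sketch makes a conservative, hypothetical choice: `merge_arguments` rejects such clashes instead of letting an extra argument silently replace a form field. `sfTimezone` is used only as an example of an extra Spark Snowflake connector option, and all values are placeholders.

```python
from typing import Dict


def merge_arguments(core: Dict[str, str], extra: Dict[str, str]) -> Dict[str, str]:
    """Combine form fields with additional key/value arguments.

    Raises ValueError if an extra argument would replace a form field.
    """
    clashes = sorted(set(core) & set(extra))
    if clashes:
        raise ValueError(f"additional arguments clash with form fields: {clashes}")
    return {**core, **extra}


core = {"sfURL": "https://myorg-myaccount.snowflakecomputing.com", "sfUser": "hopsworks_svc"}
print(merge_arguments(core, {"sfTimezone": "UTC"}))
```

Rejecting clashes keeps the values entered in the dedicated fields authoritative; a permissive variant would simply return `{**core, **extra}` and let the extra arguments win.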
Next Steps
Move on to the usage guide for data sources to see how you can use your newly created Snowflake connector.