Getting started with Hopsworks.ai (Azure)#
Hopsworks.ai is our managed platform for running Hopsworks and the Feature Store in the cloud. It integrates with third-party platforms such as Databricks, SageMaker and KubeFlow. This guide shows how to set up Hopsworks.ai with your organization's Azure account.
Step 1: Connecting your Azure account#
Hopsworks.ai deploys Hopsworks clusters to your Azure account. To enable this, you have to create a service principal and a custom role for Hopsworks.ai granting access to either a subscription or resource group.
Step 1.0: Prerequisite#
For Hopsworks.ai to deploy a cluster, the following resource providers need to be registered on your Azure subscription. You can verify that they are registered by going to your subscription in the Azure portal and clicking on Resource providers. If one of the resource providers is not registered, select it and click on Register.
Microsoft.Network
Microsoft.Compute
Microsoft.Storage
Microsoft.ManagedIdentity
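The same check can be scripted. The following is a minimal sketch, not an official Hopsworks.ai tool: it assumes you have saved the JSON output of `az provider list --output json` and that each entry carries `namespace` and `registrationState` fields. The function name `unregistered_providers` and the sample data are illustrative.

```python
import json

# Resource providers this guide requires on the subscription.
REQUIRED_PROVIDERS = [
    "Microsoft.Network",
    "Microsoft.Compute",
    "Microsoft.Storage",
    "Microsoft.ManagedIdentity",
]


def unregistered_providers(provider_list_json: str) -> list:
    """Return the required providers that are not in the 'Registered' state."""
    providers = json.loads(provider_list_json)
    state_by_namespace = {
        p["namespace"]: p.get("registrationState") for p in providers
    }
    return [
        namespace
        for namespace in REQUIRED_PROVIDERS
        if state_by_namespace.get(namespace) != "Registered"
    ]


# Illustrative sample only, shaped like the assumed `az provider list` output.
sample_output = json.dumps([
    {"namespace": "Microsoft.Network", "registrationState": "Registered"},
    {"namespace": "Microsoft.Compute", "registrationState": "NotRegistered"},
    {"namespace": "Microsoft.Storage", "registrationState": "Registered"},
])

missing = unregistered_providers(sample_output)
for namespace in missing:
    # Each of these could then be registered with:
    #   az provider register --namespace <namespace>
    print(f"Not registered: {namespace}")
```

A provider that is absent from the list is treated the same as one that is not registered, so it also shows up in the result.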
Step 1.1: Creating a service principal for Hopsworks.ai#
On Hopsworks.ai, go to Settings/Cloud Accounts and choose to Configure Azure:
Select Add subscription key:
The Azure account configuration will show you the required steps and permissions. Ensure that you have the Azure CLI installed (see Install the Azure CLI) and are logged in (see Sign in with Azure CLI).
Copy the Azure CLI command from the first step and open a terminal:
Paste the command into the terminal and execute it:
At this point, you might get the following error message. This means that your Azure user does not have sufficient permissions to add the service principal. In this case, please ask your Azure administrator to add it for you or give you the required permissions.
Error
az ad sp create --id d4abcc44-2c40-40bd-9bba-986df591c28f
When using this permission, the backing application of the service principal being created must in the local tenant.
Step 1.2: Creating a custom role for Hopsworks.ai#
Proceed to the Azure Portal and open either a Subscription or Resource Group that you want to use for Hopsworks.ai. Click on Access control (IAM), select Add, and choose Add custom role.
Note
Granting access to a Subscription will grant access to all Resource Groups in that Subscription. If you are uncertain if that is what you want, then start with a Resource Group.
Name the role and proceed to Assignable scopes:
Ensure the scope is set to the Subscription or Resource Group you want to use. You can change it here if required. Proceed to the JSON tab:
Select Edit and replace the actions part of the JSON with the one from Hopsworks.ai Azure account configuration workflow:
Note
If the access rights provided by Hopsworks.ai Azure account configuration workflow are too permissive, you can go to Limiting Azure permissions for more details on how to limit the permissions.
Press Save, proceed to Review + create and create the role:
Step 1.3: Assigning the custom role to Hopsworks.ai#
Back in the Subscription or Resource Group, in Access control (IAM), select Add and choose Add role assignment:
Choose the custom role you just created, select User, group, or service principal to Assign access to and select the hopsworks.ai service principal. Press Save:
Go back to the Hopsworks.ai Azure account configuration workflow and proceed to the next step. Copy the CLI command shown:
Paste the CLI command into your terminal and execute it. Note that you might have multiple entries listed here. If so, ensure that you pick the subscription that you want to use.
Copy the value of id and paste it into the Subscription id field on Hopsworks.ai. Go back to the terminal and copy the value of tenantId. Make sure NOT to use the tenantId under managedByTenants. Paste the value into the Tenant ID field on Hopsworks.ai and press Finish.
Congratulations, you have successfully connected your Azure account to Hopsworks.ai.
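If several subscriptions are listed, picking the right values by hand is error-prone. The sketch below shows one way to extract them, assuming the command output is a JSON list shaped like the output of `az account list` (entries with `id`, `name`, `tenantId` and an optional `managedByTenants` list). The function name and the placeholder IDs are hypothetical.

```python
import json


def subscription_and_tenant(account_list_json: str, subscription_name: str) -> tuple:
    """Return (subscription id, tenant id) for the named subscription."""
    for account in json.loads(account_list_json):
        if account["name"] == subscription_name:
            # Read the top-level tenantId only; the tenantId values nested
            # under managedByTenants must not be used.
            return account["id"], account["tenantId"]
    raise LookupError(f"No subscription named {subscription_name!r}")


# Illustrative sample with placeholder IDs.
sample_output = json.dumps([
    {
        "id": "00000000-0000-0000-0000-000000000001",
        "name": "my-subscription",
        "tenantId": "11111111-1111-1111-1111-111111111111",
        "managedByTenants": [
            {"tenantId": "22222222-2222-2222-2222-222222222222"}
        ],
    },
    {
        "id": "00000000-0000-0000-0000-000000000002",
        "name": "other-subscription",
        "tenantId": "33333333-3333-3333-3333-333333333333",
        "managedByTenants": [],
    },
])

subscription_id, tenant_id = subscription_and_tenant(sample_output, "my-subscription")
print("Subscription id:", subscription_id)
print("Tenant ID:", tenant_id)
```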
Step 2: Creating and configuring a storage#
Note
If you prefer using Terraform, you can skip this step and the remaining steps, and instead follow this guide.
The Hopsworks clusters deployed by Hopsworks.ai store their data in a container in your Azure account. To enable this, you need to perform the following operations:
- Create a restrictive role to limit access to the storage account
- Create a User Assigned Managed Identity
- Create a storage account and give Hopsworks clusters access to the storage using the restrictive role
Step 2.1: Creating a Restrictive Role for Accessing Storage#
Similarly to Step 1.2, create a new role named Hopsworks Storage Role. Add the following permissions to the role:
"permissions": [
    {
        "actions": [
            "Microsoft.Storage/storageAccounts/blobServices/containers/write",
            "Microsoft.Storage/storageAccounts/blobServices/containers/read",
            "Microsoft.Storage/storageAccounts/blobServices/write",
            "Microsoft.Storage/storageAccounts/blobServices/read",
            "Microsoft.Storage/storageAccounts/listKeys/action"
        ],
        "notActions": [],
        "dataActions": [
            "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/delete",
            "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read",
            "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/move/action",
            "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/write"
        ],
        "notDataActions": []
    }
]
Note
Some of these permissions can be removed at the cost of Hopsworks features, see Limiting Azure permissions for more details.
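If you would rather create the role from the command line than through the portal, the permissions above can be wrapped in a role definition file. The sketch below builds such a definition, assuming the capitalized key layout (`Name`, `IsCustom`, `Actions`, `AssignableScopes`, and so on) that `az role definition create --role-definition` accepts. The description text, the function name, and the subscription and resource group values are placeholders.

```python
import json


def storage_role_definition(subscription_id: str, resource_group: str) -> dict:
    """Build a custom role definition scoped to a single resource group."""
    scope = f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
    return {
        "Name": "Hopsworks Storage Role",
        "IsCustom": True,
        # Placeholder description; adapt it to your organization.
        "Description": "Restricts Hopsworks clusters to the storage they need.",
        "Actions": [
            "Microsoft.Storage/storageAccounts/blobServices/containers/write",
            "Microsoft.Storage/storageAccounts/blobServices/containers/read",
            "Microsoft.Storage/storageAccounts/blobServices/write",
            "Microsoft.Storage/storageAccounts/blobServices/read",
            "Microsoft.Storage/storageAccounts/listKeys/action",
        ],
        "NotActions": [],
        "DataActions": [
            "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/delete",
            "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read",
            "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/move/action",
            "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/write",
        ],
        "NotDataActions": [],
        "AssignableScopes": [scope],
    }


# Placeholder subscription ID and resource group name.
role = storage_role_definition(
    "00000000-0000-0000-0000-000000000000", "my-resource-group"
)
# Save this output to a file and pass it to:
#   az role definition create --role-definition @<file>
print(json.dumps(role, indent=2))
```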
Step 2.2: Creating a User Assigned Managed Identity#
Proceed to the Azure Portal and open the Resource Group that you want to use for Hopsworks.ai. Click on Add then Marketplace.
Search for User Assigned Managed Identity and click on it.
Click on Create. Then, select the Location you want to use and name the identity. Click on Review + create. Finally click on Create.
Step 2.3: Creating a Storage account#
Proceed to the Azure Portal and open the Resource Group that you want to use for Hopsworks.ai. Click on Add then Marketplace.
Search for Storage account and click on it.
Click on Create, name your storage account, select the Location you want to use and click on Review + create. Finally click on Create.
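Azure rejects storage account names that are not 3 to 24 characters long or that contain anything other than lowercase letters and digits. A small check such as the following sketch can catch a bad name before you reach the portal; the function name and the example names are illustrative.

```python
import re

# Azure storage account names: 3 to 24 characters, lowercase letters and digits.
_STORAGE_ACCOUNT_NAME = re.compile(r"[a-z0-9]{3,24}")


def is_valid_storage_account_name(name: str) -> bool:
    """Return True if the name satisfies the length and character rules."""
    return _STORAGE_ACCOUNT_NAME.fullmatch(name) is not None


for candidate in ["hopsworksstorage01", "Hopsworks-Storage", "ab"]:
    verdict = "ok" if is_valid_storage_account_name(candidate) else "invalid"
    print(f"{candidate}: {verdict}")
```

The check only covers the naming rules. The name must also be unique across Azure, which can only be verified when the account is created.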
Step 2.4: Give the Managed Identity access to the storage#
Proceed to the Storage Account you just created and click on Access Control (IAM) (1). Click on Add (2), then click on Add role assignment (3). In Role select Hopsworks Storage Role (4). In Assign access to select User assigned managed identity (5). Select the identity you created in Step 2.2 (6). Click on Save (7).
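The portal steps above correspond to a single role assignment scoped to the storage account. As a hedged sketch, the code below assembles the equivalent `az role assignment create` invocation without running it. It assumes you already know the managed identity's principal ID (for example from `az identity show`); all IDs and names are placeholders and the helper functions are hypothetical.

```python
import shlex


def storage_account_scope(subscription_id: str, resource_group: str,
                          storage_account: str) -> str:
    """Build the Azure resource ID of a storage account."""
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Storage/storageAccounts/{storage_account}"
    )


def role_assignment_command(principal_id: str, scope: str,
                            role: str = "Hopsworks Storage Role") -> list:
    """Return the argument list for an `az role assignment create` call."""
    return [
        "az", "role", "assignment", "create",
        "--assignee", principal_id,
        "--role", role,
        "--scope", scope,
    ]


# Placeholder values for illustration only.
scope = storage_account_scope(
    "00000000-0000-0000-0000-000000000000",
    "my-resource-group",
    "hopsworksstorage01",
)
command = role_assignment_command("11111111-1111-1111-1111-111111111111", scope)

# Print the command for review instead of executing it.
print(shlex.join(command))
```

Scoping the assignment to the storage account, rather than to the whole resource group, keeps the identity's access limited to the storage that the clusters use.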
Step 3: Adding an SSH key to your resource group#
When deploying clusters, Hopsworks.ai installs an SSH key on the cluster's instances so that you can access them if necessary. For this purpose, you need to add an SSH key to your resource group.
Proceed to the Azure Portal and open the Resource Group that you want to use for Hopsworks.ai. Click on Add then Marketplace.
Search for SSH Key and click on it. Click on Create. Then, name your key pair and choose between Generate a new key pair and Upload existing public key. Click on Review + create. Finally click on Create.
Step 4: Deploying a Hopsworks cluster#
In Hopsworks.ai, select Create cluster:
Select the Resource Group (1) in which you created your storage account and user assigned managed identity (see above).
Note
If the Resource Group does not appear in the drop-down, make sure that you properly created and set the custom role for this resource group.
Name your cluster (2). Your cluster will be deployed in the Location of your Resource Group (3).
Select the Instance type (4) and Local storage (5) size for the cluster Head node.
Select the storage account (6) you created above in Azure Storage account name. The name of the container in which the data will be stored is displayed in Azure Container name (7). You can modify it if needed.
Note
You can choose to use a container already existing in your storage account by using the name of this container, but you need to first make sure that this container is empty.
Press Next:
Select the number of workers you want to start the cluster with (2). Select the Instance type (3) and Local storage size (4) for the worker nodes.
Note
It is possible to add or remove workers or to enable autoscaling once the cluster is running.
Press Next:
Select the SSH key that you want to use to access cluster instances:
Select the User assigned managed identity that you created above:
Setting the backup retention policy#
To back up the Azure blob storage data when taking a cluster backup, you need to set a retention policy for the blob storage. You can deactivate the retention policy by setting this value to 0, but this will prevent you from taking any backup of your cluster. Choose the retention period in days and click on Review and submit.
Review all information and select Create:
Note
We skipped the cluster creation steps that are not mandatory. You can find more details about these steps here.
The cluster will start. This will take a few minutes:
As soon as the cluster has started, you will be able to log in to your new Hopsworks cluster. You will also be able to stop, restart or terminate the cluster.
Step 5: Next steps#
Check out our other guides for how to get started with Hopsworks and the Feature Store:
- Make Hopsworks services accessible from outside services
- Get started with the Hopsworks Feature Store
- Get started with Machine Learning on Hopsworks: HopsML
- Get started with Hopsworks: User Guide
- Code examples and notebooks: hops-examples