Getting started with managed.hopsworks.ai (Azure)#

Managed.hopsworks.ai is our managed platform for running Hopsworks and the Feature Store in the cloud. It integrates seamlessly with third-party platforms such as Databricks, SageMaker and KubeFlow. This guide shows how to set up managed.hopsworks.ai with your organization's Azure account.

Prerequisites#

To follow the instructions on this page, you will need the following:

  • An Azure resource group in which the Hopsworks cluster will be deployed.
  • The Azure CLI installed and logged in.
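
A quick way to confirm the second prerequisite is to ask the CLI for the active subscription. The following is a minimal sketch, not part of the official setup; the function name check_az_login is hypothetical.

```shell
# Return 0 if the Azure CLI is installed and a login session is active.
# check_az_login is a hypothetical helper name used for illustration.
check_az_login() {
  if ! command -v az >/dev/null 2>&1; then
    echo "Azure CLI (az) not found on PATH" >&2
    return 1
  fi
  # 'az account show' fails when there is no active login session.
  if ! az account show --query name -o tsv >/dev/null 2>&1; then
    echo "Not logged in; run 'az login' first" >&2
    return 1
  fi
  echo "Azure CLI is installed and logged in"
}

# Usage:
#   check_az_login
```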

Permissions#

To run all the commands on this page, the user needs at least the following permissions on the Azure resource group:

Microsoft.Authorization/roleDefinitions/write
Microsoft.Authorization/roleAssignments/write
Microsoft.Compute/sshPublicKeys/generateKeyPair/action
Microsoft.Compute/sshPublicKeys/read
Microsoft.Compute/sshPublicKeys/write
Microsoft.ContainerRegistry/registries/operationStatuses/read
Microsoft.ContainerRegistry/registries/read
Microsoft.ContainerRegistry/registries/write
Microsoft.ManagedIdentity/userAssignedIdentities/write
Microsoft.Resources/subscriptions/resourcegroups/read
Microsoft.Storage/storageAccounts/write

You will also need a role such as Application Administrator in Azure Active Directory to create the hopsworks.ai service principal.

Resource providers#

For managed.hopsworks.ai to deploy a cluster, the following resource providers need to be registered on your Azure subscription:

Microsoft.Network
Microsoft.Compute
Microsoft.Storage
Microsoft.ManagedIdentity
Microsoft.ContainerRegistry

This can be done by running the following commands:

Note

To run these commands you need to have the following permission on your subscription: Microsoft.Network/register/action

az provider register --namespace 'Microsoft.Network'
az provider register --namespace 'Microsoft.Compute'
az provider register --namespace 'Microsoft.Storage'
az provider register --namespace 'Microsoft.ManagedIdentity'
az provider register --namespace 'Microsoft.ContainerRegistry'
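
Registration is asynchronous, so it can be useful to confirm that every provider has reached the Registered state before continuing. The sketch below loops over the providers listed above and queries each one with az provider show; the function name check_providers is hypothetical.

```shell
# Print the registration state of each provider required by this guide.
# Returns non-zero if any provider is not yet "Registered".
# check_providers is a hypothetical helper name used for illustration.
check_providers() {
  status=0
  for ns in Microsoft.Network Microsoft.Compute Microsoft.Storage \
            Microsoft.ManagedIdentity Microsoft.ContainerRegistry; do
    state=$(az provider show --namespace "$ns" --query registrationState -o tsv)
    echo "$ns: $state"
    [ "$state" = "Registered" ] || status=1
  done
  return "$status"
}

# Usage:
#   check_providers || echo "Some providers are still registering; retry later"
```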

Other#

All the commands have been written for a Unix system. These commands will need to be adapted to your terminal if it is not directly compatible.

All the commands use your default location. Add the --location parameter if you want to run your cluster in another location. Make sure to create the resources in the same location as you are going to run your cluster.
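
One way to keep every resource in the same location as the cluster is to read the location of the resource group and pass it explicitly. This is a minimal sketch under that assumption; the function name group_location and the LOCATION variable are hypothetical.

```shell
# Print the location of the resource group given as $1.
# group_location is a hypothetical helper name used for illustration.
group_location() {
  az group show --name "$1" --query location -o tsv
}

# Usage (the create command is the one from Step 2, with --location added):
#   LOCATION=$(group_location "$RESOURCE_GROUP")
#   az storage account create --resource-group "$RESOURCE_GROUP" \
#     --name "hopsworksstorage$RANDOM" --location "$LOCATION"
```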

Step 1: Connect your Azure account#

Managed.hopsworks.ai deploys Hopsworks clusters to your Azure account. To enable this, you have to create a service principal and a custom role for managed.hopsworks.ai granting access to your resource group.

Step 1.1: Connect your Azure account#

In managed.hopsworks.ai click on Connect to Azure or go to Settings and click on Configure next to Azure. This will direct you to a page with the instructions needed to create the service principal and set up the connection. Follow the instructions.

Note

It is possible to limit the permissions that are set up during this phase. For more details see restrictive-permissions.

Cloud account settings

Step 2: Create a storage account#

Note

If you prefer using Terraform, you can skip this step and the remaining steps and follow this guide instead.

The Hopsworks clusters deployed by managed.hopsworks.ai store their data in a storage container in your Azure account. To enable this you need to create a storage account. This is done by running the following command, replacing $RESOURCE_GROUP with the name of your resource group.

az storage account create --resource-group $RESOURCE_GROUP --name hopsworksstorage$RANDOM
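
Storage account names must be 3 to 24 characters long and contain only lowercase letters and digits, and they must be unique across Azure. If you pick your own name instead of the generated one, a sketch like the following can validate it locally first; the function name valid_storage_name is hypothetical.

```shell
# Return 0 if $1 satisfies the storage account naming rules:
# 3 to 24 characters, lowercase letters and digits only.
# valid_storage_name is a hypothetical helper name used for illustration.
valid_storage_name() {
  printf '%s' "$1" | grep -Eq '^[a-z0-9]{3,24}$'
}

# Usage: validate locally, then ask Azure whether the name is still free.
#   NAME="hopsworksstorage$RANDOM"
#   valid_storage_name "$NAME" && az storage account check-name --name "$NAME"
```

Note that $RANDOM is provided by shells such as bash and zsh but is not available in every POSIX shell.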

Step 3: Create an ACR Container Registry#

The Hopsworks clusters deployed by managed.hopsworks.ai store their Docker images in a container registry in your Azure account. To create this container registry, run the following command, replacing $RESOURCE_GROUP with the name of your resource group.

az acr create --resource-group $RESOURCE_GROUP --name hopsworksecr --sku Premium
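
Registry names are unique across Azure, so the fixed name hopsworksecr used above may already be taken. A minimal sketch for checking availability before creating the registry follows; the function name acr_name_available is hypothetical, and the lowercase normalization of the CLI output is a defensive assumption.

```shell
# Return 0 if the registry name $1 is still available.
# acr_name_available is a hypothetical helper name used for illustration.
acr_name_available() {
  available=$(az acr check-name --name "$1" --query nameAvailable -o tsv)
  # Normalize case defensively before comparing.
  [ "$(printf '%s' "$available" | tr '[:upper:]' '[:lower:]')" = "true" ]
}

# Usage:
#   acr_name_available hopsworksecr || echo "Name taken; choose another one"
```

If you choose a different name, use it consistently in the remaining commands of this step and when entering the registry name in Step 6.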

To prevent the registry from filling up with unnecessary images and artifacts you can enable a retention policy. A retention policy will automatically remove untagged manifests after a specified number of days. To enable a retention policy, run the following command, replacing $RESOURCE_GROUP with the name of your resource group.

az acr config retention update --resource-group $RESOURCE_GROUP --registry hopsworksecr --status Enabled --days 7 --type UntaggedManifests

Step 4: Create a managed identity#

To allow the Hopsworks cluster instances to access the storage account and the container registry, managed.hopsworks.ai assigns a managed identity to the cluster nodes. To enable this you need to:

  • Create a managed identity
  • Create a role with the appropriate permissions and assign it to the managed identity

Step 4.1: Create a managed identity#

You create a managed identity by running the following command, replacing $RESOURCE_GROUP with the name of your resource group.

identityId=$(az identity create --name hopsworks-instance --resource-group $RESOURCE_GROUP --query principalId -o tsv)
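
The identityId variable only lives in the current shell session. If the session is lost before Step 4.2, the principal id of the existing identity can be looked up again rather than recreated. A sketch, assuming the identity name hopsworks-instance from the command above; the function name identity_principal_id is hypothetical.

```shell
# Print the principal id of the hopsworks-instance identity in resource group $1.
# identity_principal_id is a hypothetical helper name used for illustration.
identity_principal_id() {
  az identity show --name hopsworks-instance --resource-group "$1" \
    --query principalId -o tsv
}

# Usage:
#   identityId=$(identity_principal_id "$RESOURCE_GROUP")
```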

Step 4.2: Create a role for the managed identity#

To create a new role for the managed identity, first create a file called instance-role.json with the following content. Replace SUBSCRIPTION_ID with your subscription ID and RESOURCE_GROUP with the name of your resource group.

{
  "Name": "hopsworks-instance",
  "IsCustom": true,
  "Description": "Allow the hopsworks instance to access the storage and the docker repository",
  "Actions": [
      "Microsoft.Storage/storageAccounts/blobServices/containers/write",
      "Microsoft.Storage/storageAccounts/blobServices/containers/read",
      "Microsoft.Storage/storageAccounts/blobServices/write",
      "Microsoft.Storage/storageAccounts/blobServices/read",
      "Microsoft.Storage/storageAccounts/listKeys/action",
      "Microsoft.ContainerRegistry/registries/artifacts/delete",
      "Microsoft.ContainerRegistry/registries/pull/read",
      "Microsoft.ContainerRegistry/registries/push/write"
  ],
  "NotActions": [

  ],
  "DataActions": [
      "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/delete",
      "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read",
      "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/move/action",
      "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/write"
  ],
  "AssignableScopes": [
    "/subscriptions/SUBSCRIPTION_ID/resourceGroups/RESOURCE_GROUP"
  ]
}

Then run the following command to create the new role.

az role definition create --role-definition instance-role.json
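
Editing the placeholders by hand is easy to get wrong. As an alternative, they can be substituted with sed before the role definition is created. This is a minimal sketch, assuming instance-role.json was saved exactly as shown above; the function name fill_role_placeholders is hypothetical.

```shell
# Replace the SUBSCRIPTION_ID and RESOURCE_GROUP placeholders in a role file.
# $1: path to the JSON file, $2: subscription id, $3: resource group name.
# fill_role_placeholders is a hypothetical helper name used for illustration.
fill_role_placeholders() {
  sed -e "s|/subscriptions/SUBSCRIPTION_ID/|/subscriptions/$2/|" \
      -e "s|/resourceGroups/RESOURCE_GROUP|/resourceGroups/$3|" \
      "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# Usage: run this before 'az role definition create'.
#   SUBSCRIPTION_ID=$(az account show --query id -o tsv)
#   fill_role_placeholders instance-role.json "$SUBSCRIPTION_ID" "$RESOURCE_GROUP"
```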

Finally, assign the role to the managed identity by running the following command, replacing $SUBSCRIPTION_ID with your subscription ID and $RESOURCE_GROUP with the name of your resource group.

az role assignment create --scope /subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RESOURCE_GROUP --role hopsworks-instance --assignee $identityId

Note

It can take several minutes after the managed identity is created before a role can be assigned to it. If you get an error message starting with the following, wait and retry: Cannot find user or service principal in graph database
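
The wait-and-retry advice can be automated with a small loop. The sketch below retries any command a fixed number of times; the function name retry and the attempt count and delay in the usage example are illustrative assumptions, not recommended values.

```shell
# Run a command until it succeeds.
# $1: maximum number of attempts, $2: seconds to sleep between attempts,
# remaining arguments: the command to run.
# retry is a hypothetical helper name used for illustration.
retry() {
  attempts=$1
  delay=$2
  shift 2
  n=1
  until "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      echo "giving up after $n attempts" >&2
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
}

# Usage: retry the role assignment up to 10 times, 30 seconds apart.
#   retry 10 30 az role assignment create \
#     --scope "/subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RESOURCE_GROUP" \
#     --role hopsworks-instance --assignee "$identityId"
```

The loop retries on any failure, not only on the graph database error, so check the printed error message if it gives up.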

Step 5: Add an ssh key to your resource group#

When deploying clusters, managed.hopsworks.ai installs an ssh key on the cluster's instances so that you can access them if necessary. For this purpose, you need to add an ssh key to your resource group.

To create an ssh key in your resource group run the following command, replacing $RESOURCE_GROUP with the name of your resource group.

az sshkey create --resource-group $RESOURCE_GROUP --name hopsworksKey 

Note

The command returns the path to the private and public keys associated with this ssh key. You can also create a key from an existing public key, as indicated in the Azure documentation.
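
To register an existing public key instead of generating a new pair, the public key file can be passed with --public-key. A minimal sketch follows, assuming the key name hopsworksKey from the command above; the function name import_ssh_key and the example path are hypothetical.

```shell
# Create the hopsworksKey resource from an existing public key file.
# $1: resource group, $2: path to the public key file.
# import_ssh_key is a hypothetical helper name used for illustration.
import_ssh_key() {
  if [ ! -r "$2" ]; then
    echo "cannot read public key: $2" >&2
    return 1
  fi
  # The "@" prefix tells the CLI to read the value from the file.
  az sshkey create --resource-group "$1" --name hopsworksKey --public-key "@$2"
}

# Usage (the path is an example):
#   import_ssh_key "$RESOURCE_GROUP" "$HOME/.ssh/id_rsa.pub"
```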

Step 6: Deploy a Hopsworks cluster#

In managed.hopsworks.ai, select Create cluster:

Create a Hopsworks cluster

Select the Resource Group (1) in which you created your storage account and managed identity (see above).

Note

If the Resource Group does not appear in the drop-down, make sure that the custom role you created in step 1.1 has the Microsoft.Resources/subscriptions/resourceGroups/read permission and is assigned to the hopsworks.ai user.

Name your cluster (2). Your cluster will be deployed in the Location of your Resource Group (3).

Select the Instance type (4) and Local storage (5) size for the cluster Head node.

Check the box if you want to Use customer-managed encryption key (6).

Select the storage account (7) you created above in Azure Storage account name. The name of the container in which the data will be stored is displayed in Azure Container name (8); you can modify it if needed.

Note

You can choose to use a container already existing in your storage account by using the name of this container, but you need to first make sure that this container is empty.

Enter the Azure container registry name (9) of the ACR registry created in Step 3.

Press Next:

General configuration

Select the number of workers you want to start the cluster with (2). Select the Instance type (3) and Local storage size (4) for the worker nodes.

Note

It is possible to add or remove workers or to enable autoscaling once the cluster is running.

Press Next:

Create a Hopsworks cluster, static workers configuration

Select the SSH key that you want to use to access cluster instances:

Choose SSH key

Select the User assigned managed identity that you created above:

Choose the User assigned managed identity

To back up the Azure blob storage data when taking a cluster backup, a retention policy must be set for the blob storage. You can deactivate the retention policy by setting this value to 0, but this will prevent you from taking any backup of your cluster. Choose the retention period in days and click on Review and submit.

Choose the backup retention policy

Review all information and select Create:

Review cluster information

Note

We skipped cluster creation steps that are not mandatory. You can find more details about these steps here.

The cluster will start. This will take a few minutes:

Booting Hopsworks cluster

As soon as the cluster has started, you will be able to log in to your new Hopsworks cluster. You will also be able to stop, restart or terminate the cluster.

Running Hopsworks cluster

Step 7: Next steps#

Check out our other guides for how to get started with Hopsworks and the Feature Store: