
Getting started with managed.hopsworks.ai (Azure)#

Managed.hopsworks.ai is our managed platform for running Hopsworks and the Feature Store in the cloud. It integrates with third-party platforms such as Databricks, SageMaker and KubeFlow. This guide shows how to set up managed.hopsworks.ai with your organization's Azure account.

Step 1: Connecting your Azure account#

Managed.hopsworks.ai deploys Hopsworks clusters to your Azure account. To enable this, you have to create a service principal and a custom role for managed.hopsworks.ai granting access to either a subscription or resource group.

Step 1.0: Prerequisite#

For managed.hopsworks.ai to deploy a cluster, the following resource providers need to be registered on your Azure subscription. You can verify that they are registered by going to your subscription in the Azure portal and clicking on Resource providers. If one of the resource providers is not registered, select it and click on Register.

    Microsoft.Network
    Microsoft.Compute
    Microsoft.Storage
    Microsoft.ManagedIdentity
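
If you prefer the command line to the portal, the same check can be sketched with the Azure CLI. This is a minimal sketch, assuming you are already logged in with `az login` and have the right subscription selected. The helper name `ensure_providers_registered` is hypothetical; the provider list is the one above.

```shell
# Resource providers required by managed.hopsworks.ai (from the list above).
HOPSWORKS_PROVIDERS="Microsoft.Network Microsoft.Compute Microsoft.Storage Microsoft.ManagedIdentity"

# Hypothetical helper: register every provider that is not yet "Registered".
ensure_providers_registered() {
    local ns state
    for ns in $HOPSWORKS_PROVIDERS; do
        # Ask Azure for the current registration state of this provider.
        state=$(az provider show --namespace "$ns" --query registrationState --output tsv)
        if [ "$state" = "Registered" ]; then
            echo "$ns: already registered"
        else
            echo "$ns: $state, registering"
            az provider register --namespace "$ns"
        fi
    done
}

# Usage:
#   ensure_providers_registered
```

Registration can take a few minutes to complete, so running the helper a second time is a simple way to confirm that every provider has reached the Registered state.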

Step 1.1: Creating a service principal for managed.hopsworks.ai#

On managed.hopsworks.ai, go to Settings/Cloud Accounts and choose to Configure Azure:

Cloud account settings

Select Add subscription key:

Add subscription keys

The Azure account configuration will show you the required steps and permissions. Ensure that you have the Azure CLI installed (see Install the Azure CLI) and that you are logged in (see Sign in with Azure CLI).

Copy the Azure CLI command from the first step and open a terminal:

Connect your Azure Account

Paste the command into the terminal and execute it:

Add service principal

At this point, you might get the following error message. This means that your Azure user does not have sufficient permissions to add the service principal. In this case, please ask your Azure administrator to add it for you or give you the required permissions.

Error

az ad sp create --id d4abcc44-2c40-40bd-9bba-986df591c28f

When using this permission, the backing application of the service principal being created must in the local tenant.

Step 1.2: Creating a custom role for managed.hopsworks.ai#

Proceed to the Azure Portal and open either the Subscription or the Resource Group that you want to use for managed.hopsworks.ai. Click on Access control (IAM), select Add and choose Add custom role.

Note

Granting access to a Subscription will grant access to all Resource Groups in that Subscription. If you are uncertain whether that is what you want, start with a Resource Group.

Add custom role

Name the role and proceed to Assignable scopes:

Name custom role

Ensure the scope is set to the Subscription or Resource Group you want to use. You can change it here if required. Proceed to the JSON tab:

Review assignable scope

Select Edit and replace the actions part of the JSON with the one from managed.hopsworks.ai Azure account configuration workflow:

managed.hopsworks.ai permission list

Note

If the access rights provided by managed.hopsworks.ai Azure account configuration workflow are too permissive, you can go to Limiting Azure permissions for more details on how to limit the permissions.

Press Save, proceed to Review + create and create the role:

Update permission JSON

Step 1.3: Assigning the custom role to managed.hopsworks.ai#

Back in the Subscription or Resource Group, in Access control (IAM), select Add and choose Add role assignment:

Add role assignment

Choose the custom role you just created. In Assign access to, select User, group, or service principal, then select the hopsworks.ai service principal. Press Save:

Configure managed.hopsworks.ai as role assignment

Go back to the managed.hopsworks.ai Azure account configuration workflow and proceed to the next step. Copy the CLI command shown:

Configure subscription and tenant id

Paste the CLI command into your terminal and execute it. Note that you might have multiple entries listed here. If so, ensure that you pick the subscription that you want to use.

Show subscription and tenant id
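
If the output lists several subscriptions, a compact table can make the right one easier to spot. The sketch below is an assumption and not the command shown by the managed.hopsworks.ai workflow. It uses `az account list` with a query that keeps only the name, id and top-level tenantId of each subscription, so the tenantId nested under managedByTenants is not shown at all. The helper name `list_subscriptions` is hypothetical.

```shell
# Hypothetical helper: list the subscriptions visible to the logged-in account.
# The query keeps only the top-level fields needed by managed.hopsworks.ai.
list_subscriptions() {
    az account list \
        --query "[].{name:name, id:id, tenantId:tenantId}" \
        --output table
}

# Usage:
#   list_subscriptions
```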

Copy the value of id and paste it into the Subscription id field on managed.hopsworks.ai. Go back to the terminal and copy the value of tenantId. Make sure NOT to use the tenantId listed under managedByTenants. Paste the value into the Tenant ID field on managed.hopsworks.ai and press Finish.

Congratulations, you have successfully connected your Azure account to managed.hopsworks.ai.

Store subscription and tenant id

Step 2: Creating and configuring storage#

Note

If you prefer using Terraform, you can skip this step and the remaining steps, and instead follow this guide.

The Hopsworks clusters deployed by managed.hopsworks.ai store their data in a container in your Azure account. To enable this, you need to perform the following operations:

  • Create a restrictive role to limit access to the storage account
  • Create a User Assigned Managed Identity
  • Create a storage account and give Hopsworks clusters access to the storage using the restrictive role

Step 2.1: Creating a Restrictive Role for Accessing Storage#

As in Step 1.2, create a new role named Hopsworks Storage Role. Add the following permissions to the role:

"permissions": [
    {
        "actions": [
            "Microsoft.Storage/storageAccounts/blobServices/containers/write",
            "Microsoft.Storage/storageAccounts/blobServices/containers/read",
            "Microsoft.Storage/storageAccounts/blobServices/write",
            "Microsoft.Storage/storageAccounts/blobServices/read",
            "Microsoft.Storage/storageAccounts/listKeys/action"
        ],
        "notActions": [],
        "dataActions": [
            "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/delete",
            "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read",
            "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/move/action",
            "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/write"
        ],
        "notDataActions": []
    }
]

Note

Some of these permissions can be removed at the cost of Hopsworks features, see Limiting Azure permissions for more details.
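
As an alternative to the portal, the role can also be created from the command line. The following is a minimal sketch, assuming the Azure CLI is installed and you are logged in. It wraps the permissions listed above in the role definition format accepted by `az role definition create`. The helper name `create_hopsworks_storage_role`, the description text and the scope you pass in are assumptions to adapt to your setup.

```shell
# Hypothetical helper: create the "Hopsworks Storage Role" custom role.
# $1 is the assignable scope, for example
#   /subscriptions/<subscription-id>/resourceGroups/<resource-group>
create_hopsworks_storage_role() {
    scope="$1"
    role_file=$(mktemp)
    # Write the role definition, using the permissions from this step.
    cat > "$role_file" <<EOF
{
    "Name": "Hopsworks Storage Role",
    "Description": "Restricted storage access for Hopsworks clusters",
    "Actions": [
        "Microsoft.Storage/storageAccounts/blobServices/containers/write",
        "Microsoft.Storage/storageAccounts/blobServices/containers/read",
        "Microsoft.Storage/storageAccounts/blobServices/write",
        "Microsoft.Storage/storageAccounts/blobServices/read",
        "Microsoft.Storage/storageAccounts/listKeys/action"
    ],
    "NotActions": [],
    "DataActions": [
        "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/delete",
        "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read",
        "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/move/action",
        "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/write"
    ],
    "NotDataActions": [],
    "AssignableScopes": ["$scope"]
}
EOF
    # The @ prefix tells the CLI to read the definition from the file.
    az role definition create --role-definition "@$role_file"
    rm -f "$role_file"
}

# Usage:
#   create_hopsworks_storage_role "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
```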

Step 2.2: Creating a User Assigned Managed Identity#

Proceed to the Azure Portal and open the Resource Group that you want to use for managed.hopsworks.ai. Click on Add then Marketplace.

Add to resource group

Search for User Assigned Managed Identity and click on it.

Search User Assigned Managed Identity

Click on Create. Then, select the Location you want to use and name the identity. Click on Review + create. Finally click on Create.

Create a User Assigned Managed Identity
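
The same identity can be created with the Azure CLI instead of the portal. This is a sketch under the assumption that you are logged in and the resource group already exists. The helper name `create_hopsworks_identity` and the argument values in the usage line are hypothetical.

```shell
# Hypothetical helper: create a user assigned managed identity.
# $1 = resource group, $2 = identity name, $3 = location
create_hopsworks_identity() {
    az identity create \
        --resource-group "$1" \
        --name "$2" \
        --location "$3"
}

# Usage:
#   create_hopsworks_identity my-resource-group hopsworks-identity westeurope
```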

Step 2.3: Creating a Storage account#

Proceed to the Azure Portal and open the Resource Group that you want to use for managed.hopsworks.ai. Click on Add then Marketplace.

Add to resource group

Search for Storage account and click on it.

Search Storage Account Identity

Click on Create, name your storage account, select the Location you want to use and click on Review + create. Finally click on Create.

Create a Storage Account
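
A command-line equivalent might look like the sketch below. It assumes a logged-in Azure CLI and leaves every storage option other than name and location at the CLI default, which may differ from the portal defaults. Storage account names must be globally unique and consist of 3 to 24 lowercase letters and digits, so the hypothetical helper `create_hopsworks_storage_account` checks the name before calling Azure.

```shell
# Hypothetical helper: create a storage account after validating its name.
# $1 = resource group, $2 = storage account name, $3 = location
create_hopsworks_storage_account() {
    resource_group="$1"
    name="$2"
    location="$3"
    # Reject any character that is not a lowercase letter or a digit.
    case "$name" in
        *[!a-z0-9]*)
            echo "invalid storage account name: $name" >&2
            return 1
            ;;
    esac
    # Enforce the 3 to 24 character length limit.
    if [ "${#name}" -lt 3 ] || [ "${#name}" -gt 24 ]; then
        echo "storage account name must be 3-24 characters: $name" >&2
        return 1
    fi
    az storage account create \
        --resource-group "$resource_group" \
        --name "$name" \
        --location "$location"
}

# Usage:
#   create_hopsworks_storage_account my-resource-group hopsworksdata westeurope
```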

Step 2.4: Give the Managed Identity access to the storage#

Proceed to the Storage Account you just created and click on Access Control (IAM) (1). Click on Add (2), then click on Add role assignment (3). In Role select Hopsworks Storage Role (4). In Assign access to select User assigned managed identity (5). Select the identity you created in Step 2.2 (6). Click on Save (7).

Add role assignment to storage
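
The role assignment can also be sketched with the Azure CLI. The example assumes the role is named Hopsworks Storage Role as in Step 2.1, and that the identity and storage account live in the same resource group. The helper name `grant_storage_access` is hypothetical. It looks up the principal ID of the managed identity and the resource ID of the storage account, then assigns the role at the storage account scope.

```shell
# Hypothetical helper: give the managed identity access to the storage account.
# $1 = resource group, $2 = identity name, $3 = storage account name
grant_storage_access() {
    resource_group="$1"
    identity_name="$2"
    storage_account="$3"
    # Object ID of the service principal backing the managed identity.
    principal_id=$(az identity show \
        --resource-group "$resource_group" \
        --name "$identity_name" \
        --query principalId --output tsv)
    # Full resource ID of the storage account, used as the assignment scope.
    storage_id=$(az storage account show \
        --resource-group "$resource_group" \
        --name "$storage_account" \
        --query id --output tsv)
    az role assignment create \
        --assignee-object-id "$principal_id" \
        --assignee-principal-type ServicePrincipal \
        --role "Hopsworks Storage Role" \
        --scope "$storage_id"
}

# Usage:
#   grant_storage_access my-resource-group hopsworks-identity hopsworksdata
```

Scoping the assignment to the storage account, rather than to the whole resource group, keeps the identity limited to the one account the cluster needs.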

Step 3: Adding an SSH key to your resource group#

When deploying clusters, managed.hopsworks.ai installs an SSH key on the cluster's instances so that you can access them if necessary. For this purpose, you need to add an SSH key to your resource group.

Proceed to the Azure Portal and open the Resource Group that you want to use for managed.hopsworks.ai. Click on Add then Marketplace.

Add to resource group

Search for SSH Key and click on it. Click on Create. Then, name your key pair and choose between Generate a new key pair and Upload existing public key. Click on Review + create. Finally click on Create.

Create an SSH key
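
Both options from the portal have a rough command-line counterpart in `az sshkey create`. The sketch below is an assumption about how you might script it: when a public key file is passed, the existing key is uploaded, otherwise the CLI is left to generate a new key pair. The helper name `add_ssh_key` and the values in the usage lines are hypothetical.

```shell
# Hypothetical helper: add an SSH key resource to a resource group.
# $1 = resource group, $2 = key name, $3 = optional path to a public key file
add_ssh_key() {
    resource_group="$1"
    key_name="$2"
    public_key_file="$3"
    if [ -n "$public_key_file" ]; then
        # Upload an existing public key; @ reads the value from the file.
        az sshkey create \
            --resource-group "$resource_group" \
            --name "$key_name" \
            --public-key "@$public_key_file"
    else
        # No key given: let the CLI generate a new key pair.
        az sshkey create \
            --resource-group "$resource_group" \
            --name "$key_name"
    fi
}

# Usage:
#   add_ssh_key my-resource-group hopsworks-key
#   add_ssh_key my-resource-group hopsworks-key "$HOME/.ssh/id_rsa.pub"
```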

Step 4: Deploying a Hopsworks cluster#

In managed.hopsworks.ai, select Create cluster:

Create a Hopsworks cluster

Select the Resource Group (1) in which you created your storage account and user assigned managed identity (see above).

Note

If the Resource Group does not appear in the drop-down, make sure that you properly created and set the custom role for this resource group.

Name your cluster (2). Your cluster will be deployed in the Location of your Resource Group (3).

Select the Instance type (4) and Local storage (5) size for the cluster Head node.

Select the storage account (6) you created above in Azure Storage account name. The name of the container in which the data will be stored is displayed in Azure Container name (7). You can modify it if needed.

Note

You can choose to use a container already existing in your storage account by using the name of this container, but you need to first make sure that this container is empty.

Press Next:

General configuration

Select the number of workers you want to start the cluster with (2). Select the Instance type (3) and Local storage size (4) for the worker nodes.

Note

It is possible to add or remove workers or to enable autoscaling once the cluster is running.

Press Next:

Create a Hopsworks cluster, static workers configuration

Select the SSH key that you want to use to access cluster instances:

Choose SSH key

Select the User assigned managed identity that you created above:

Choose the User assigned managed identity

Setting the backup retention policy#

To back up the Azure blob storage data when taking a cluster backup, you need to set a retention policy for the blob storage. You can deactivate the retention policy by setting this value to 0, but this will prevent you from taking any backup of your cluster. Choose the retention period in days and click on Review and submit.

Choose the backup retention policy

Review all information and select Create:

Review cluster information

Note

We skipped the cluster creation steps that are not mandatory. You can find more details about these steps here.

The cluster will start. This will take a few minutes:

Booting Hopsworks cluster

As soon as the cluster has started, you will be able to log in to your new Hopsworks cluster. You will also be able to stop, restart or terminate the cluster.

Running Hopsworks cluster

Step 5: Next steps#

Check out our other guides for how to get started with Hopsworks and the Feature Store: