
Networking

For Spark on Databricks to communicate with the Feature Store, networking must be set up correctly. This means either deploying the Hopsworks instance in the same VPC as the Databricks cluster or enabling VPC/VNet peering between the VPC/VNet of the Databricks cluster and that of the Hopsworks cluster.

AWS

Step 1: Ensure network connectivity

The DataFrame API needs to connect directly to the IP address on which the Feature Store is listening. If you deploy the Feature Store on AWS, you must therefore either deploy it in the same VPC as your Databricks cluster or set up VPC peering between your Databricks VPC and the Feature Store VPC.

Option 1: Deploy the Feature Store in the Databricks VPC

When you deploy the Feature Store Hopsworks instance, select the Databricks VPC and Availability Zone as the VPC and Availability Zone of your Feature Store cluster. Identify your Databricks VPC by searching for VPCs containing Databricks in their name in your Databricks AWS region in the AWS Management Console:

Identify the Databricks VPC

Hopsworks installer

If you are performing an installation using the Hopsworks installer script, ensure that the virtual machines you install Hopsworks on are deployed in the Databricks VPC.

managed.hopsworks.ai

If you are working on managed.hopsworks.ai, you can deploy the Hopsworks instance directly to the Databricks VPC by selecting it at the VPC selection step during cluster creation.

Option 2: Set up VPC peering

Follow the VPC Peering guide to set up VPC peering between the Feature Store cluster and Databricks. Get your Feature Store VPC ID and CIDR by searching for the Feature Store VPC in the AWS Management Console:

managed.hopsworks.ai

On managed.hopsworks.ai, the VPC is shown in the cluster details.

Identify the Feature Store VPC
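AWS cannot peer VPCs whose CIDR blocks overlap, so it can help to compare the two CIDRs before creating the peering. The following is a minimal sketch using Python's standard `ipaddress` module. The function name and the CIDR values are illustrative assumptions, not values taken from a real deployment.

```python
import ipaddress


def cidrs_overlap(cidr_a: str, cidr_b: str) -> bool:
    """Return True if the two CIDR blocks share at least one address."""
    net_a = ipaddress.ip_network(cidr_a)
    net_b = ipaddress.ip_network(cidr_b)
    return net_a.overlaps(net_b)


# Hypothetical CIDRs: replace them with the values shown for your
# Feature Store VPC and your Databricks VPC in the AWS Management Console.
feature_store_cidr = "172.31.0.0/16"
databricks_cidr = "10.0.0.0/16"

if cidrs_overlap(feature_store_cidr, databricks_cidr):
    print("CIDR blocks overlap: these VPCs cannot be peered as they are.")
else:
    print("CIDR blocks do not overlap.")
```

If the blocks overlap, deploying the Feature Store in the Databricks VPC (Option 1) avoids the problem.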

Step 2: Configure the Security Group

The Feature Store Security Group must allow inbound traffic from your Databricks clusters so that they can connect to the Feature Store.

managed.hopsworks.ai

If you deployed your Hopsworks Feature Store with managed.hopsworks.ai, you only need to enable outside access of the Feature Store, Online Feature Store, and Kafka services.

Open your Feature Store instance under EC2 in the AWS Management Console and ensure that ports 443, 3306, 9083, 9085, 8020, 50010, and 9092 are reachable from the Databricks Security Group:

Hopsworks Feature Store Security Group
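One way to confirm that these ports are reachable is to attempt a TCP connection to each of them from a Databricks notebook. The sketch below uses only the Python standard library. The `check_ports` helper and the host placeholder are assumptions for illustration. An open TCP port shows only that the network path and Security Group allow the connection, not that the service behind it is healthy.

```python
import socket

# Ports listed above for a Feature Store deployed on AWS.
FEATURE_STORE_PORTS = [443, 3306, 9083, 9085, 8020, 50010, 9092]


def check_ports(host, ports, timeout=3.0):
    """Try a TCP connection to each port; map port -> True if it succeeded."""
    results = {}
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            # Covers refused connections, timeouts and name resolution errors.
            results[port] = False
    return results


# Example usage from a notebook cell. The address is a placeholder:
# use the private IP or DNS name of your own Feature Store instance.
#
# for port, ok in check_ports("<feature-store-private-ip>", FEATURE_STORE_PORTS).items():
#     print(port, "reachable" if ok else "NOT reachable")
```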

Connectivity from the Databricks Security Group can be allowed by opening the Security Group, adding a port to the Inbound rules and searching for dbe-worker in the source field. Selecting any of the dbe-worker Security Groups will be sufficient:

Hopsworks Feature Store Security Group details
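The same inbound rules could also be added programmatically. The sketch below assumes the third-party `boto3` library is installed and that AWS credentials are configured. The helper names and the Security Group IDs are hypothetical. It builds one TCP rule per port with the Databricks Security Group as the source and passes the rules to the EC2 `authorize_security_group_ingress` call.

```python
def build_ingress_permissions(ports, source_group_id):
    """Build one inbound TCP rule per port, with a Security Group as source."""
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            "UserIdGroupPairs": [{"GroupId": source_group_id}],
        }
        for port in ports
    ]


def allow_databricks_ingress(feature_store_group_id, databricks_group_id,
                             ports, region_name):
    """Add the inbound rules to the Feature Store Security Group."""
    import boto3  # third-party; imported here so the rule builder works without it

    ec2 = boto3.client("ec2", region_name=region_name)
    return ec2.authorize_security_group_ingress(
        GroupId=feature_store_group_id,
        IpPermissions=build_ingress_permissions(ports, databricks_group_id),
    )


# Example usage with hypothetical IDs: replace them with the Feature Store
# Security Group and one of the dbe-worker Security Groups in your account.
#
# allow_databricks_ingress(
#     feature_store_group_id="sg-0123456789abcdef0",
#     databricks_group_id="sg-0fedcba9876543210",
#     ports=[443, 3306, 9083, 9085, 8020, 50010, 9092],
#     region_name="us-west-2",
# )
```

Keeping the rule construction separate from the API call makes it possible to review the rules before applying them.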

Azure

Step 1: Set up VNet peering between Hopsworks and Databricks

VNet peering between the Hopsworks and Databricks virtual networks is required to connect to the Feature Store from Databricks.

In the Azure portal, open Azure Databricks and select Virtual Network Peering:

Azure Databricks

Select Add Peering:

Add peering

Name the peering and select the virtual network used by your Hopsworks cluster. The virtual network is shown in the cluster details on managed.hopsworks.ai (see the next picture). Be sure to press the copy button at the bottom of the page and save the copied value, since it is needed when creating the peering from the Hopsworks virtual network. Press Add to create the peering:

Configure peering

The virtual network used by your cluster is shown under Details:

Check the Hopsworks virtual network

The peering connection should now be listed as initiated:

Peering connection initiated

In the Azure portal, go to Virtual networks and search for the virtual network used by your Hopsworks cluster:

Virtual networks

Open the network and select Peerings:

Select peerings

Choose to add a peering connection:

Add a peering connection

Name the peering connection and select I know my resource ID. Paste the string you copied when creating the peering from Azure Databricks. If you did not copy that string, manually select the virtual network used by Databricks instead. Press OK to create the peering:

Configure peering

The peering status should now be Updating:

Cloud account settings

Wait for the peering to show up as Connected. There should now be bi-directional network connectivity between the Feature Store and Databricks:

Cloud account settings

Step 2: Configure the Network Security Group

The virtual network peering will allow full access between the Hopsworks virtual network and the Databricks virtual network by default. However, if you have a different setup, ensure that the Network Security Group of the Feature Store is configured to allow traffic from your Databricks clusters.

Ensure that ports 443, 9083, 9085, 8020, 50010, and 9092 are reachable from the Databricks cluster Network Security Group.

managed.hopsworks.ai

If you deployed your Hopsworks Feature Store instance with managed.hopsworks.ai, it suffices to enable outside access of the Feature Store, Online Feature Store, and Kafka services.

Next Steps

Continue with the Hopsworks API key guide to set up access to a Hopsworks API key from the Databricks cluster, so that you can use the Hopsworks Feature Store.