Networking#
In order for Spark to communicate with the Feature Store from Databricks, networking needs to be set up correctly. This means either deploying the Hopsworks instance in the same VPC/VNet as the Databricks cluster or enabling VPC/VNet peering between the VPC/VNet of the Databricks cluster and that of the Hopsworks cluster.
AWS#
Step 1: Ensure network connectivity#
The DataFrame API needs to be able to connect directly to the IP on which the Feature Store is listening. This means that if you deploy the Feature Store on AWS, you need to either deploy the Feature Store in the same VPC as your Databricks cluster or set up VPC peering between your Databricks VPC and the Feature Store VPC.
Option 1: Deploy the Feature Store in the Databricks VPC
When you deploy the Feature Store Hopsworks instance, select the Databricks VPC and Availability Zone as the VPC and Availability Zone of your Feature Store cluster. Identify your Databricks VPC by searching for VPCs containing Databricks in their name in your Databricks AWS region in the AWS Management Console:
Hopsworks installer
If you are performing an installation using the Hopsworks installer script, ensure that the virtual machines you install Hopsworks on are deployed in the Databricks VPC.
managed.hopsworks.ai
If you are working on managed.hopsworks.ai, you can deploy the Hopsworks instance directly to the Databricks VPC by selecting it at the VPC selection step during cluster creation.
Option 2: Set up VPC peering
Follow the guide VPC Peering to set up VPC peering between the Feature Store cluster and Databricks. Get your Feature Store VPC ID and CIDR by searching for the Feature Store VPC in the AWS Management Console:
managed.hopsworks.ai
On managed.hopsworks.ai, the VPC is shown in the cluster details.
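Before requesting the peering connection, it can help to confirm that the two CIDR blocks do not overlap, since AWS does not allow peering between VPCs with overlapping CIDR ranges. The following is a minimal sketch using only the Python standard library; the helper name `cidrs_overlap` and the example CIDR values are hypothetical and should be replaced with the values shown in the AWS Management Console.

```python
import ipaddress


def cidrs_overlap(cidr_a: str, cidr_b: str) -> bool:
    """Return True if the two CIDR blocks share at least one address."""
    net_a = ipaddress.ip_network(cidr_a, strict=True)
    net_b = ipaddress.ip_network(cidr_b, strict=True)
    return net_a.overlaps(net_b)


# Hypothetical example values, not taken from any real deployment.
databricks_cidr = "10.0.0.0/16"
feature_store_cidr = "172.31.0.0/16"

if cidrs_overlap(databricks_cidr, feature_store_cidr):
    print("CIDR blocks overlap: choose a different range before peering")
else:
    print("CIDR blocks do not overlap")
```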
Step 2: Configure the Security Group#
The Feature Store Security Group needs to be configured to allow inbound traffic from your Databricks clusters so that they can connect to the Feature Store.
managed.hopsworks.ai
If you deployed your Hopsworks Feature Store with managed.hopsworks.ai, you only need to enable outside access of the Feature Store, Online Feature Store, and Kafka services.
Open your feature store instance under EC2 in the AWS Management Console and ensure that ports 443, 3306, 9083, 9085, 8020, 50010, and 9092 are reachable from the Databricks Security Group:
Connectivity from the Databricks Security Group can be allowed by opening the Security Group, adding a port to the Inbound rules and searching for dbe-worker in the source field. Selecting any of the dbe-worker Security Groups will be sufficient:
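If you prefer to script the inbound rules instead of adding them one by one in the console, the rules can be described as a list of permission entries. The sketch below only builds that list; it assumes the `IpPermissions` structure accepted by the boto3 EC2 call `authorize_security_group_ingress`, and the helper name `build_ingress_permissions` and the security group ID are hypothetical placeholders.

```python
from typing import Dict, List

# Ports listed in this guide for the Feature Store on AWS.
FEATURE_STORE_PORTS = [443, 3306, 9083, 9085, 8020, 50010, 9092]


def build_ingress_permissions(ports: List[int], source_group_id: str) -> List[Dict]:
    """Build one TCP inbound rule per port, with a security group as the source."""
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            "UserIdGroupPairs": [{"GroupId": source_group_id}],
        }
        for port in ports
    ]


# Hypothetical ID standing in for one of the dbe-worker Security Groups.
permissions = build_ingress_permissions(FEATURE_STORE_PORTS, "sg-0123456789abcdef0")
for rule in permissions:
    print(rule["FromPort"], rule["UserIdGroupPairs"][0]["GroupId"])
```

With boto3 installed and AWS credentials configured, a list like this could be passed as the `IpPermissions` argument of `authorize_security_group_ingress`, together with the `GroupId` of the Feature Store Security Group. Treat this as an illustration rather than the documented Hopsworks procedure.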
Azure#
Step 1: Set up VNet peering between Hopsworks and Databricks#
VNet peering between the Hopsworks and Databricks virtual networks is required to be able to connect to the Feature Store from Databricks.
In the Azure portal, go to Azure Databricks and go to Virtual Network Peering:
Select Add Peering:
Name the peering and select the virtual network used by your Hopsworks cluster. The virtual network is shown in the cluster details on managed.hopsworks.ai (see the next picture). Press the copy button at the bottom of the page and save the copied value, as it is needed when creating the peering from the Hopsworks side. Then press Add to create the peering:
The virtual network used by your cluster is shown under Details:
The peering connection should now be listed as initiated:
On the Azure portal, go to Virtual networks and search for the virtual network used by your Hopsworks cluster:
Open the network and select Peerings:
Choose to add a peering connection:
Name the peering connection and select I know my resource ID. Paste the string you copied when creating the peering from Azure Databricks. If you did not copy that string, manually select the virtual network used by Databricks instead. Press OK to create the peering:
The peering should now be Updating:
Wait for the peering to show up as Connected. There should now be bi-directional network connectivity between the Feature Store and Databricks:
Step 2: Configure the Network Security Group#
The virtual network peering will allow full access between the Hopsworks virtual network and the Databricks virtual network by default. However, if you have a different setup, ensure that the Network Security Group of the Feature Store is configured to allow traffic from your Databricks clusters.
Ensure that ports 443, 9083, 9085, 8020, 50010, and 9092 are reachable from the Databricks cluster Network Security Group.
managed.hopsworks.ai
If you deployed your Hopsworks Feature Store instance with managed.hopsworks.ai, it suffices to enable outside access of the Feature Store, Online Feature Store, and Kafka services.
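Once peering and the security rules are in place, reachability of the listed ports can be checked from a Databricks notebook. The following is a minimal sketch using only the Python standard library; the helper name `check_ports` is hypothetical, and the host passed to it is assumed to be the private IP or hostname of your Feature Store instance.

```python
import socket
from typing import Dict, Iterable


def check_ports(host: str, ports: Iterable[int], timeout: float = 3.0) -> Dict[int, bool]:
    """Try a TCP connection to each port and report whether it succeeded."""
    results = {}
    for port in ports:
        try:
            # create_connection raises OSError on refusal, timeout, or DNS failure.
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            results[port] = False
    return results


# Ports listed in this guide for Azure.
AZURE_FEATURE_STORE_PORTS = [443, 9083, 9085, 8020, 50010, 9092]

# Example call (replace the placeholder with your Feature Store address):
#   print(check_ports("<feature-store-ip>", AZURE_FEATURE_STORE_PORTS))
```

A `False` entry only shows that a TCP connection could not be opened within the timeout. It does not distinguish between a missing security rule, an incomplete peering, and a service that is not running.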
Next Steps#
Continue with the Hopsworks API key guide to set up access to a Hopsworks API key from the Databricks cluster, so that you can use the Hopsworks Feature Store.