ADLS
Azure Data Lake Storage (ADLS) Gen2 is an HDFS-compatible filesystem on Azure for data analytics. The ADLS Gen2 filesystem stores its data in Azure Blob storage, ensuring low-cost storage, high availability, and disaster recovery. In Hopsworks, you can access ADLS Gen2 by defining a Storage Connector and by creating a service principal and granting it permissions.
Requirements
- Create an Azure Data Lake Storage Gen2 account and initialize a filesystem, enabling the hierarchical namespace. Note that your storage account must belong to an Azure resource group.
- Create an Azure AD application and service principal that can access your ADLS storage account and its resource group.
- Register the service principal, granting it a role assignment such as Storage Blob Data Contributor on the Azure Data Lake Storage Gen2 account.
Info
When you specify the 'container name' in the ADLS storage connector, that container must already exist. The Hopsworks Feature Store will not create the storage container for you.
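To illustrate how the container name and the storage account name fit together, here is a minimal sketch that checks a container name against Azure's naming rules and builds the `abfss://` URI form used by the Hadoop ABFS driver. The helper names `is_valid_container_name` and `abfss_uri`, and the account and container values in the usage note, are hypothetical and not part of Hopsworks. Passing the check only means the name is well formed; it does not tell you whether the container actually exists.

```python
import re

# Azure container naming rules: 3 to 63 characters; lowercase letters, digits
# and hyphens only; must start and end with a letter or digit; no consecutive
# hyphens. The length is checked separately from the pattern.
_CONTAINER_RE = re.compile(r"[a-z0-9](?:-?[a-z0-9])*")


def is_valid_container_name(name: str) -> bool:
    """Return True if `name` is a syntactically valid Azure container name."""
    return 3 <= len(name) <= 63 and _CONTAINER_RE.fullmatch(name) is not None


def abfss_uri(account: str, container: str, path: str = "") -> str:
    """Build an abfss:// URI for a path inside an existing container.

    Hypothetical helper for illustration; it does not contact Azure.
    """
    if not is_valid_container_name(container):
        raise ValueError(f"invalid container name: {container!r}")
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"
```

For example, `abfss_uri("mystorageacct", "features", "/raw/data")` yields `abfss://features@mystorageacct.dfs.core.windows.net/raw/data`, where `features` is the container you created beforehand.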
Azure: Create an ADLS Resource
When signing in programmatically, you need to pass the tenant ID and the application ID with your authentication request. You also need a certificate or an authentication key (described in the following section). To get those values, use the following steps:
- Select Azure Active Directory.
- From App registrations in Azure AD, select your application.
- Copy the Directory (tenant) ID and store it in your application code.
- Copy the Application ID and store it in your application code.
- Create an Application Secret and copy it into the Service Credential field.
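To show where the three values collected above end up, here is a minimal sketch of a programmatic sign-in request using the OAuth2 client credentials flow of the Microsoft identity platform. It only builds the HTTP request and does not send it. The function name `build_token_request` and its parameter names are hypothetical, and this is not necessarily how Hopsworks authenticates internally.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_token_request(tenant_id: str, application_id: str,
                        service_credential: str) -> Request:
    """Build (but do not send) a client-credentials token request.

    tenant_id          -> the Directory (tenant) ID
    application_id     -> the Application ID of the app registration
    service_credential -> the Application Secret created for the app
    """
    # The tenant ID selects the directory that issues the token.
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": application_id,
        "client_secret": service_credential,
        # Scope for Azure Storage; ".default" requests the app's granted permissions.
        "scope": "https://storage.azure.com/.default",
    }).encode("ascii")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
```

Sending the request, for example with `urllib.request.urlopen`, should return a JSON document containing an access token if the three values are correct. Avoid hard-coding the secret; reading it from an environment variable or a secret store is safer.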
Common Problems:
If you get a permission denied error when reading from or writing to an ADLS container, it is often because the service principal (app) does not have the correct permissions. Check that you have added the "Storage Blob Data Owner" or "Storage Blob Data Contributor" role to the resource group for your storage account (or to the subscription that contains it, if you apply roles at the subscription level). To add the role, go to your resource group and, under "Access Control (IAM)", click the "Add" button to add a "role assignment".
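A role assignment is always made at a scope, and the levels mentioned above (subscription, resource group, storage account) map to Azure resource ID strings. The sketch below builds those strings so you can see which level a given assignment applies to. The helper name `role_assignment_scope` and the IDs in the usage note are hypothetical; the resulting string is the kind of value accepted by tools such as `az role assignment create --scope`.

```python
from typing import Optional


def role_assignment_scope(subscription_id: str,
                          resource_group: Optional[str] = None,
                          storage_account: Optional[str] = None) -> str:
    """Return the Azure resource ID for the requested role-assignment scope.

    With only a subscription ID, the scope is the whole subscription.
    Adding a resource group narrows it to that group; adding a storage
    account narrows it further to that single account.
    """
    scope = f"/subscriptions/{subscription_id}"
    if resource_group is None:
        if storage_account is not None:
            raise ValueError("a storage account scope needs its resource group")
        return scope
    scope += f"/resourceGroups/{resource_group}"
    if storage_account is not None:
        scope += f"/providers/Microsoft.Storage/storageAccounts/{storage_account}"
    return scope
```

For example, `role_assignment_scope("0000", "my-rg")` gives `/subscriptions/0000/resourceGroups/my-rg`, the resource group level described above. A role granted at a broader scope is inherited by the resources inside it.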
If you get the error "StatusCode=404 StatusDescription=The specified filesystem does not exist.", you may not have created the storage account or the storage container.
References: