Hopsworks provides an administrator with a view of the projects in a Hopsworks cluster.
A Hopsworks administrator is not automatically a member of all the projects in a cluster. However, they can see which projects exist and who owns each one, and they can limit the storage quota and compute quota for each project.
You need to be an administrator on a Hopsworks cluster.
Changing project quotas
You can find the Project management page by clicking on your name in the top right corner of the navigation bar, choosing Cluster Settings from the dropdown menu, and going to the Project tab.
This page lists all the projects in the cluster, along with each project's name, owner, and the time its quota was last updated. Clicking the edit configuration link of a project lets you edit the quotas of that project.
Storage quota represents the amount of data a project can store. The storage quota is broken down into three areas:
- Feature Store: This represents the storage quota for files and directories stored in the _featurestore.db dataset in the project. This dataset contains all the feature group offline data for the project.
- Hive DB: This represents the storage quota for files and directories stored in the [projectName].db dataset in the project. This is a general purpose Hive database for the project that can be used for analytics.
- Project: This represents the storage quota for all the data stored in any other dataset.
Each storage quota is divided into a space quota, i.e., how much space the files can consume, and a namespace quota, i.e., how many files and directories there can be. If Hopsworks is deployed on-premise using hard drives to store the data, i.e., Hopsworks is not configured to store its data in an S3-compliant storage system, the data is replicated across multiple nodes (3 by default) and the space quota takes the replication factor into account. For example, a 100MB file stored with a replication factor of 3 consumes 300MB of space quota.
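The replication arithmetic above can be sketched in a few lines of Python. This is only an illustration of the 100MB example: the function name `space_quota_consumed` is hypothetical and not part of Hopsworks, and the use of decimal megabytes is an assumption.

```python
MB = 1000 * 1000  # assumption: decimal megabytes, used only for this illustration


def space_quota_consumed(file_size_bytes: int, replication_factor: int = 3) -> int:
    """Space quota charged for one file when every replica counts against the quota."""
    if file_size_bytes < 0:
        raise ValueError("file size must not be negative")
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return file_size_bytes * replication_factor


# A 100MB file with the default replication factor of 3.
consumed = space_quota_consumed(100 * MB)
print(consumed // MB)  # 300
```

When sizing a space quota for an on-premise deployment with replicated storage, the same multiplication applies to the total amount of data the project is expected to hold.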
By default, all storage quotas are disabled and not enforced. Administrators can change this default by setting the following configuration in the Configuration UI and/or the cluster definition:
featurestore_default_quota: [default quota in bytes, -1 to disable]
hdfs_default_quota: [default quota in bytes, -1 to disable]
hive_default_quota: [default quota in bytes, -1 to disable]
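Because these values are given in bytes, it can help to compute them rather than type them by hand. The following is a minimal sketch: the helper `quota_bytes`, the example sizes (50 and 200), and the choice of binary gigabytes (1024^3 bytes) are assumptions made for illustration. The printed `key: value` lines only mirror the listing above and are not necessarily the exact syntax of your cluster definition.

```python
from typing import Optional

GIB = 1024 ** 3  # assumption: binary gigabytes; use 1000 ** 3 if you prefer decimal units


def quota_bytes(size_gib: Optional[float]) -> int:
    """Convert a size in GiB to bytes; None maps to -1, which disables the quota."""
    if size_gib is None:
        return -1
    if size_gib <= 0:
        raise ValueError("quota size must be positive, or None to disable")
    return int(size_gib * GIB)


# Illustrative defaults: 50 GiB feature store, 200 GiB project data, Hive quota disabled.
defaults = {
    "featurestore_default_quota": quota_bytes(50),
    "hdfs_default_quota": quota_bytes(200),
    "hive_default_quota": quota_bytes(None),
}

for key, value in defaults.items():
    print(f"{key}: {value}")
```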
The compute quota represents the amount of compute a project can use to run Spark and Flink applications as well as Tez queries. The quota is expressed as the number of seconds a container with 1 CPU and 1GB of RAM can run for.
If the Hopsworks cluster is connected to a Kubernetes cluster, Python jobs, Jupyter notebooks and KServe models are not subject to the compute quota. Currently, Hopsworks does not support defining quotas for compute scheduled on the connected Kubernetes cluster.
By default, the compute quota is disabled. Administrators can change this default by setting the following configuration in the Configuration UI and/or the cluster definition:
yarn_default_payment_type: [NOLIMIT to disable the quota, PREPAID to enable it]
yarn_default_quota: [default quota in seconds]
The values specified will be set during project creation and administrators will be able to customize each project using this UI.
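Since the compute quota is expressed in seconds, a budget planned in hours has to be converted before it is entered. The sketch below covers only the reference container described above (1 CPU and 1GB of RAM); how larger containers are charged against the quota is not modelled here. The helper name `compute_quota_seconds` is hypothetical.

```python
SECONDS_PER_HOUR = 3600


def compute_quota_seconds(container_hours: float) -> int:
    """Seconds of runtime for a 1 CPU / 1GB container, given a budget in hours."""
    if container_hours < 0:
        raise ValueError("budget must not be negative")
    return int(container_hours * SECONDS_PER_HOUR)


# A budget of 100 hours for the reference container.
print(compute_quota_seconds(100))  # 360000
```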
Kafka is used within Hopsworks to enable users to write data to the feature store in real time and from a variety of different frameworks. If a user creates a feature group with the stream APIs enabled, a Kafka topic is created for that feature group. By default, a project can have up to 100 Kafka topics. Administrators can increase the number of Kafka topics a project is allowed to create by increasing the quota in the project admin UI.
Force deleting a project
Administrators have the option to force delete a project. This is useful if the project was not created or deleted properly, e.g., because of an error.
Controlling who can create projects
Every user on Hopsworks can create projects. By default, each user can create up to 10 projects. For production environments, the number of projects should be limited and controlled for resource allocation purposes as well as closer control over the data. Administrators can control how many projects a user can provision by setting the following configuration in the Configuration UI and/or cluster definition:
max_num_proj_per_user: [Maximum number of projects each user can create]
This value will be set when the user is provisioned. Administrators can grant additional projects to a specific user through the User Administration UI.