
How To Run A Jupyter Notebook Job#

Introduction#

All members of a project in Hopsworks can launch the following types of applications through a project's Jobs service:

  • Python (Hopsworks Enterprise only)
  • Apache Spark

Launching a job of any type follows a very similar process; what mostly differs between job types is the set of configuration parameters each type comes with. After following this guide, you will be able to create a Jupyter Notebook job.

Kubernetes integration required

Python Jobs are only available if Hopsworks has been integrated with a Kubernetes cluster.

Hopsworks can be integrated with Amazon EKS, Azure AKS and on-premise Kubernetes clusters.

UI#

Step 1: Jobs overview#

The image below shows the Jobs overview page in Hopsworks, which you access by clicking Jobs in the sidebar.

Jobs overview

Step 2: Create new job dialog#

Click New Job and the following dialog will appear.

Create new job dialog

Step 3: Set the job type#

By default, the dialog will create a Spark job. To instead configure a Jupyter Notebook job, select PYTHON.

Select Python job type

Step 4: Set the script#

The next step is to select the Jupyter Notebook to run. You can either select From project, if the file was previously uploaded to Hopsworks, or Upload new file, which lets you select a file from your local filesystem as demonstrated below. By default, the job name is the same as the file name, but you can customize it as shown.

Configure program

Then click Create job to create the job.

Step 5 (optional): Set the Jupyter Notebook arguments#

In the job settings, you can specify arguments for your notebook. Arguments must be in the format -arg1 value1 -arg2 value2: each argument consists of the parameter name (e.g. arg1) preceded by a hyphen (-), followed by its value (e.g. value1). You do not need to handle the arguments in your notebook. Hopsworks uses Papermill to insert a new cell containing the initialized parameters.
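To make the expected argument format concrete, the sketch below splits a string of the form -arg1 value1 -arg2 value2 into name/value pairs. It is only an illustration: `parse_notebook_args` is a hypothetical helper, not the parser Hopsworks uses, and it assumes every value is kept as a string.

```python
import shlex


def parse_notebook_args(arg_string):
    """Split '-arg1 value1 -arg2 value2' into {'arg1': 'value1', 'arg2': 'value2'}.

    Hypothetical helper for illustration only; values are left as strings.
    """
    tokens = shlex.split(arg_string)
    if len(tokens) % 2 != 0:
        raise ValueError("every argument name needs exactly one value")

    params = {}
    # Walk the tokens pairwise: (name, value), (name, value), ...
    for name, value in zip(tokens[0::2], tokens[1::2]):
        if not name.startswith("-") or len(name) < 2:
            raise ValueError(f"expected '-name', got {name!r}")
        params[name[1:]] = value  # drop the leading hyphen
    return params


print(parse_notebook_args("-a 2 -b 5"))
```

A string with a missing value, such as -a 2 -b, is rejected by this sketch, which mirrors the rule that each parameter name must be followed by its value.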

Configure notebook arguments

Step 6 (optional): Additional configuration#

You can also set the following configuration options for a PYTHON job:

  • Container memory: The amount of memory in MB to be allocated to the Jupyter Notebook script
  • Container cores: The number of cores to be allocated to the Jupyter Notebook script
  • Additional files: List of files that will be locally accessible by the application

You can always modify the arguments in the job settings.
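When a job is created from code instead of the UI, settings like these would typically be applied to the job configuration dictionary before the job is created. The sketch below shows one way to do that on a stand-in dictionary. The helper `with_resources` is hypothetical, and the key names `resourceConfig`, `memory`, `cores` and `files` are assumptions; print the dictionary returned by your own cluster to confirm the actual keys before relying on them.

```python
import copy


def with_resources(job_config, memory_mb, cores, files=None):
    """Return a copy of job_config with resource settings applied.

    Assumption: the key names "resourceConfig", "memory", "cores" and "files"
    are illustrative; check them against the dictionary your cluster returns.
    """
    config = copy.deepcopy(job_config)  # leave the caller's dict untouched
    resources = config.setdefault("resourceConfig", {})
    resources["memory"] = memory_mb  # container memory in MB
    resources["cores"] = cores  # number of container cores
    if files:
        # Assumed format: a comma-separated list of project file paths
        config["files"] = ",".join(files)
    return config


# Stand-in for a default job configuration; the values here are made up
default_config = {
    "appPath": "Resources/notebook.ipynb",
    "resourceConfig": {"memory": 2048, "cores": 1},
}

tuned = with_resources(default_config, memory_mb=4096, cores=2)
print(tuned["resourceConfig"])
```

Working on a copy keeps the default configuration reusable for other jobs.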

Set the job type

Step 7: Execute the job#

Now click the Run button to start the execution of the job. You will be redirected to the Executions page where you can see the list of all executions.

Start job execution

Step 8: Visualize output notebook#

Once the execution is finished, click Logs and then notebook out to view the output notebook for the execution.

Visualize output notebook

You can directly edit and save the output notebook by clicking Open Notebook.

Code#

Step 1: Upload the Jupyter Notebook script#

This snippet assumes the Jupyter Notebook script is in the current working directory and named notebook.ipynb.

It will upload the Jupyter Notebook script to the Resources dataset in your project.

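import hopsworks

# Log in and get a handle to the project
project = hopsworks.login()

# The dataset API handles file transfers to and from the project
dataset_api = project.get_dataset_api()

# Upload the notebook to the Resources dataset and keep the resulting path
uploaded_file_path = dataset_api.upload("notebook.ipynb", "Resources")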

Step 2: Create Jupyter Notebook job#

In this snippet we get the JobsApi object to get the default job configuration for a PYTHON job, set the Jupyter Notebook script to run and create the Job object.

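jobs_api = project.get_jobs_api()

# Start from the default configuration for a PYTHON job
notebook_job_config = jobs_api.get_configuration("PYTHON")

# Point the job at the uploaded notebook
notebook_job_config['appPath'] = uploaded_file_path

# Create the job under the name "notebook_job"
job = jobs_api.create_job("notebook_job", notebook_job_config)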

Step 3: Execute the job#

In this code snippet, we execute the job with arguments and wait until it reaches a terminal state.

# Run the job
execution = job.run(args='-a 2 -b 5', await_termination=True)
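Once the call returns, you will usually want to check how the execution ended. The sketch below builds a short summary from an execution-like object. The helper `describe_execution` is hypothetical, and the attribute names `state`, `final_status` and `success` are assumptions about what the returned object exposes; consult the Executions API reference for the actual properties. A stand-in object is used so the sketch runs without a cluster.

```python
from types import SimpleNamespace


def describe_execution(execution):
    """Build a one-line summary from an execution-like object.

    Assumption: the attribute names below are guesses at what the returned
    object exposes; getattr keeps the helper safe if one is missing.
    """
    state = getattr(execution, "state", "UNKNOWN")
    final_status = getattr(execution, "final_status", "UNKNOWN")
    succeeded = getattr(execution, "success", None)
    outcome = {True: "succeeded", False: "failed"}.get(succeeded, "has no result yet")
    return f"Execution {outcome} (state={state}, final_status={final_status})"


# Stand-in object so the sketch runs without a cluster
fake_execution = SimpleNamespace(state="FINISHED", final_status="SUCCEEDED", success=True)
print(describe_execution(fake_execution))
```

With a real execution you would pass the object returned by job.run in place of the stand-in.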

API Reference#

Jobs

Executions

Conclusion#

In this guide you learned how to create and run a Jupyter Notebook job.