Datasets API#

Handle#

get_dataset_api#

Project.get_dataset_api()

Get the Datasets API handle for the project.

Returns

DatasetApi: the Datasets API handle


Methods#

chmod#

DatasetApi.chmod(remote_path, permissions)

Change permissions of a file or a directory in the Hopsworks Filesystem.

Arguments

  • remote_path str: path to change the permissions of.
  • permissions str: permissions string, for example "u+x".

Returns

dict: the updated dataset metadata

Raises

  • hopsworks.client.exceptions.RestAPIError: If the backend encounters an error when handling the request
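
Most other methods on this page come with a usage snippet, so a comparable sketch for chmod may help. The helper name `make_executable` and the path `Resources/run.sh` are hypothetical; the `"u+x"` string is the example value from the Arguments list above. The helper assumes only a handle that exposes `chmod(remote_path, permissions)` as documented.

```python
def make_executable(dataset_api, remote_path):
    """Add execute permission for the owner on a Hopsworks Filesystem path.

    Hypothetical helper: `dataset_api` is expected to be the handle
    returned by Project.get_dataset_api().
    """
    # "u+x" is the example permissions string given in the reference above.
    return dataset_api.chmod(remote_path, "u+x")


# Usage (needs a Hopsworks login, so it is shown as comments only):
#   import hopsworks
#   project = hopsworks.login()
#   dataset_api = project.get_dataset_api()
#   metadata = make_executable(dataset_api, "Resources/run.sh")
```

The return value is passed through unchanged, so the caller receives the updated dataset metadata dict described under Returns.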

copy#

DatasetApi.copy(source_path, destination_path, overwrite=False)

Copy a file or directory in the Hopsworks Filesystem.

import hopsworks

project = hopsworks.login()

dataset_api = project.get_dataset_api()

dataset_api.copy("Resources/myfile.txt", "Logs/myfile.txt")

Arguments

  • source_path str: the source path to copy
  • destination_path str: the destination path
  • overwrite bool: overwrite the destination if it exists

Raises

  • hopsworks.client.exceptions.DatasetException: If the destination path already exists and overwrite is not set to True
  • hopsworks.client.exceptions.RestAPIError: If the backend encounters an error when handling the request
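
Since copy raises DatasetException when the destination exists and overwrite is left at False, a caller may prefer to check the destination first. The following is a minimal sketch that combines copy with the exists method documented on this page. The name `copy_if_absent` is hypothetical, and the sketch assumes exists also reports True for directory paths, which this page does not state.

```python
def copy_if_absent(dataset_api, source_path, destination_path):
    """Copy source_path to destination_path unless the destination exists.

    Returns True if a copy was made, False if it was skipped.
    Hypothetical helper built only on exists() and copy().
    """
    # The check and the copy are two separate requests, so another client
    # could still create the destination in between; copy() would then
    # raise DatasetException as documented.
    if dataset_api.exists(destination_path):
        return False
    dataset_api.copy(source_path, destination_path)
    return True


# Usage (needs a Hopsworks login, so it is shown as comments only):
#   copied = copy_if_absent(dataset_api, "Resources/myfile.txt", "Logs/myfile.txt")
```

The same pattern would apply to move, which documents the same overwrite behaviour.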

download#

DatasetApi.download(path, local_path=None, overwrite=False, chunk_size=1048576)

Download a file from the Hopsworks Filesystem to the local filesystem. If local_path is not given, the file is saved to the current working directory.

import hopsworks

project = hopsworks.login()

dataset_api = project.get_dataset_api()

downloaded_file_path = dataset_api.download("Resources/my_local_file.txt")

Arguments

  • path str: path in Hopsworks filesystem to the file
  • local_path str | None: path where to download the file in the local filesystem
  • overwrite bool | None: overwrite the local file if it exists
  • chunk_size int: download chunk size in bytes. Default is 1048576 (1 MiB)

Returns

str: Path to downloaded file

Raises

  • hopsworks.client.exceptions.RestAPIError: If the backend encounters an error when handling the request
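
Because download handles one file per call and returns the local path, fetching several files is a simple loop. The sketch below is a hypothetical helper (`download_all` is not part of the API) that uses only the documented `path` and `overwrite` arguments and leaves `local_path` and `chunk_size` at their defaults.

```python
def download_all(dataset_api, remote_paths, overwrite=False):
    """Download each remote file and return {remote path: local path}.

    Hypothetical helper: files are saved wherever download() places them
    when local_path is left at its default.
    """
    local_paths = {}
    for remote_path in remote_paths:
        # download() returns the path of the downloaded file as a str.
        local_paths[remote_path] = dataset_api.download(
            remote_path, overwrite=overwrite
        )
    return local_paths


# Usage (needs a Hopsworks login, so it is shown as comments only):
#   paths = download_all(dataset_api, ["Resources/a.txt", "Resources/b.txt"])
```

A RestAPIError raised for any file stops the loop at that file, so earlier downloads remain on disk but the mapping is not returned.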

exists#

DatasetApi.exists(path)

Check if a file exists in the Hopsworks Filesystem.

Arguments

  • path str: path to check

Returns

bool: True if the path exists, otherwise False


mkdir#

DatasetApi.mkdir(path)

Create a directory in the Hopsworks Filesystem.

import hopsworks

project = hopsworks.login()

dataset_api = project.get_dataset_api()

directory_path = dataset_api.mkdir("Resources/my_dir")

Arguments

  • path str: path to directory

Returns

str: Path to created directory

Raises

  • hopsworks.client.exceptions.RestAPIError: If the backend encounters an error when handling the request

move#

DatasetApi.move(source_path, destination_path, overwrite=False)

Move a file or directory in the Hopsworks Filesystem.

import hopsworks

project = hopsworks.login()

dataset_api = project.get_dataset_api()

dataset_api.move("Resources/myfile.txt", "Logs/myfile.txt")

Arguments

  • source_path str: the source path to move
  • destination_path str: the destination path
  • overwrite bool: overwrite the destination if it exists

Raises

  • hopsworks.client.exceptions.DatasetException: If the destination path already exists and overwrite is not set to True
  • hopsworks.client.exceptions.RestAPIError: If the backend encounters an error when handling the request

read_content#

DatasetApi.read_content(path, dataset_type="DATASET")

remove#

DatasetApi.remove(path)

Remove a path in the Hopsworks Filesystem.

Arguments

  • path str: path to remove

Raises

  • hopsworks.client.exceptions.RestAPIError: If the backend encounters an error when handling the request

unzip#

DatasetApi.unzip(remote_path, block=False, timeout=120)

Unzip an archive in the dataset.

Arguments

  • remote_path str: path to file or directory to unzip.
  • block bool: if the operation should be blocking until complete, defaults to False.
  • timeout int | None: timeout in seconds for the blocking, defaults to 120; if None, the blocking is unbounded.

Returns

bool: whether the operation completed in the specified timeout; if non-blocking, always returns True.

Raises

  • hopsworks.client.exceptions.RestAPIError: If the backend encounters an error when handling the request
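
The return value only carries information when block=True: it then reports whether the unzip finished within the timeout. A caller that needs the extracted files before continuing could turn a False result into an exception, as in this minimal sketch. The helper name `unzip_or_fail` is hypothetical, and raising TimeoutError is a design choice of the sketch, not behaviour of the API.

```python
def unzip_or_fail(dataset_api, remote_path, timeout=120):
    """Unzip remote_path, blocking until done; raise if the timeout elapses.

    Hypothetical helper around the documented unzip() call.
    """
    # With block=True, unzip() returns whether the operation completed
    # within `timeout` seconds (timeout=None waits without a bound).
    completed = dataset_api.unzip(remote_path, block=True, timeout=timeout)
    if not completed:
        raise TimeoutError(
            f"unzip of {remote_path!r} did not finish within {timeout} s"
        )
    return remote_path


# Usage (needs a Hopsworks login, so it is shown as comments only):
#   unzip_or_fail(dataset_api, "Resources/archive.zip", timeout=300)
```

With the default block=False the call returns True immediately, so the same check would never fire.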

upload#

DatasetApi.upload(
    local_path,
    upload_path,
    overwrite=False,
    chunk_size=10485760,
    simultaneous_uploads=3,
    simultaneous_chunks=3,
    max_chunk_retries=1,
    chunk_retry_interval=1,
)

Upload a file or directory to the Hopsworks filesystem.

import hopsworks

project = hopsworks.login()

dataset_api = project.get_dataset_api()

# upload a file to Resources dataset
uploaded_file_path = dataset_api.upload("my_local_file.txt", "Resources")

# upload a directory to Resources dataset
uploaded_file_path = dataset_api.upload("my_dir", "Resources")

Arguments

  • local_path str: local path to file or directory to upload, can be relative or absolute
  • upload_path str: path to directory where to upload the file in Hopsworks Filesystem
  • overwrite bool: overwrite file or directory if exists
  • chunk_size int: upload chunk size in bytes. Default is 10485760 (10 MiB)
  • simultaneous_chunks int: number of simultaneous chunks to upload for each file upload. Default 3
  • simultaneous_uploads int: number of simultaneous files to be uploaded for directories. Default 3
  • max_chunk_retries int: maximum number of retries for a chunk. Default is 1
  • chunk_retry_interval int: interval between chunk retries in seconds. Default is 1

Returns

str: Path to uploaded file or directory

Raises

  • hopsworks.client.exceptions.RestAPIError: If the backend encounters an error when handling the request
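
The chunk_size default of 10485760 bytes equals 10 × 1024 × 1024. To give a feel for how chunk_size relates to file size, the sketch below counts the chunks a file would span, assuming the file is cut into consecutive pieces of chunk_size bytes with a shorter final piece. That splitting rule is an assumption and the function name `chunk_count` is hypothetical; the client's actual chunking may differ.

```python
# 10485760 bytes, the documented default for upload()'s chunk_size.
DEFAULT_UPLOAD_CHUNK_SIZE = 10 * 1024 * 1024


def chunk_count(file_size_bytes, chunk_size=DEFAULT_UPLOAD_CHUNK_SIZE):
    """Number of chunk_size-byte pieces needed to cover file_size_bytes."""
    if file_size_bytes < 0:
        raise ValueError("file_size_bytes must be non-negative")
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # Ceiling division using integers only, so large sizes stay exact.
    return (file_size_bytes + chunk_size - 1) // chunk_size


# A 25 MiB file with the default chunk size spans 3 chunks (10 + 10 + 5 MiB).
print(chunk_count(25 * 1024 * 1024))
```

Under this assumption, a file of three chunks or fewer could have all of its chunks in flight at once with the default simultaneous_chunks=3.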

upload_feature_group#

DatasetApi.upload_feature_group(feature_group, path, dataframe)

zip#

DatasetApi.zip(remote_path, destination_path=None, block=False, timeout=120)

Zip a file or directory in the dataset.

Arguments

  • remote_path str: path to file or directory to zip.
  • destination_path str | None: path to upload the zip, defaults to None.
  • block bool: if the operation should be blocking until complete, defaults to False.
  • timeout int | None: timeout in seconds for the blocking, defaults to 120; if None, the blocking is unbounded.

Returns

bool: whether the operation completed in the specified timeout; if non-blocking, always returns True.

Raises

  • hopsworks.client.exceptions.RestAPIError: If the backend encounters an error when handling the request