Datasets API#

Handle#

[source]

get_dataset_api#

Project.get_dataset_api()

Get the dataset API for the project.

Returns

DatasetApi: the datasets API handle


Methods#

[source]

add#

DatasetApi.add(path, name, value)

Attach a name/value tag to a dataset.

A tag consists of a name/value pair. Tag names are unique identifiers. The value of a tag can be any valid json - primitives, arrays or json objects.

Arguments

  • path str: path of the dataset to tag
  • name str: name of the tag to be added
  • value str: value of the tag to be added

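Since a tag value can be any valid JSON, a structured value can be serialized before attaching it. A minimal local sketch (the path and tag name in the commented call are illustrative, not from the API reference):

```python
import json

# A tag value may be any valid JSON: a primitive, an array, or an object.
tag_value = json.dumps({"owner": "data-team", "pii": False, "columns": ["id", "amount"]})

# Hypothetical usage against a live project (requires hopsworks.login()):
# dataset_api.add("Resources/transactions.csv", "schema_info", tag_value)

# The serialized string round-trips back to the original structure.
print(json.loads(tag_value)["owner"])
```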

[source]

chmod#

DatasetApi.chmod(remote_path, permissions)

Chmod operation on file or directory in datasets.

Arguments

  • remote_path str: path to chmod
  • permissions str: permissions string, for example u+x

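The permissions string follows the usual symbolic chmod notation. A local sketch of what u+x means, applying the same bit to a temporary file with the standard library (the remote call itself would just be dataset_api.chmod("Resources/run.sh", "u+x"), with an illustrative path):

```python
import os
import stat
import tempfile

# Create a throwaway local file to demonstrate the symbolic mode on.
with tempfile.NamedTemporaryFile(delete=False) as f:
    path = f.name

# "u+x" adds the execute bit for the file's owner (S_IXUSR).
os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)
owner_can_execute = bool(os.stat(path).st_mode & stat.S_IXUSR)
os.remove(path)

print(owner_can_execute)
```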

[source]

copy#

DatasetApi.copy(source_path, destination_path, overwrite=False)

Copy a file or directory in the Hopsworks Filesystem.

```python
import hopsworks

project = hopsworks.login()

dataset_api = project.get_dataset_api()

directory_path = dataset_api.copy("Resources/myfile.txt", "Logs/myfile.txt")
```

Arguments

  • source_path str: the source path to copy
  • destination_path str: the destination path
  • overwrite bool: overwrite destination if exists

Raises

  • hopsworks.client.exceptions.RestAPIError: If the backend encounters an error when handling the request

[source]

delete#

DatasetApi.delete(path, name)

Delete a tag.

Tag names are unique identifiers.

Arguments

  • path str: path to delete the tag from
  • name str: name of the tag to be removed


[source]

download#

DatasetApi.download(path, local_path=None, overwrite=False, chunk_size=1048576)

Download a file from the Hopsworks Filesystem. If no local_path is specified, the file is downloaded to the current working directory.

```python
import hopsworks

project = hopsworks.login()

dataset_api = project.get_dataset_api()

downloaded_file_path = dataset_api.download("Resources/my_local_file.txt")
```

Arguments

  • path str: path in Hopsworks filesystem to the file
  • local_path str | None: path where to download the file in the local filesystem
  • overwrite bool | None: overwrite local file if exists
  • chunk_size int: download chunk size in bytes. Default 1 MB

Returns

str: Path to downloaded file

Raises

  • hopsworks.client.exceptions.RestAPIError: If the backend encounters an error when handling the request

[source]

exists#

DatasetApi.exists(path)

Check if a file exists in the Hopsworks Filesystem.

Arguments

  • path str: path to check

Returns

bool: True if exists, otherwise False

Raises

  • hopsworks.client.exceptions.RestAPIError: If the backend encounters an error when handling the request

[source]

get#

DatasetApi.get(path)

Get the metadata of a dataset.

Arguments

  • path str: path to the dataset

Returns

dict: dataset metadata


[source]

get_tags#

DatasetApi.get_tags(path, name=None)

Get the tags.

Gets all tags if no tag name is specified.

Arguments

  • path str: path to get the tags for
  • name str | None: tag name

Returns

dict: dict of tag name/values

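The name filter can be sketched over a plain dict, assuming the backend returns tags as name/value pairs (the tag names below are illustrative, and filter_tags is a local stand-in, not part of the API):

```python
def filter_tags(all_tags, name=None):
    """Mimic get_tags: return every tag when name is None, else just that one."""
    if name is None:
        return dict(all_tags)
    return {name: all_tags[name]} if name in all_tags else {}

tags = {"owner": "data-team", "retention": "90d"}

print(filter_tags(tags))           # all tags
print(filter_tags(tags, "owner"))  # only the named tag
```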

[source]

list#

DatasetApi.list(remote_path, sort_by=None, offset=0, limit=1000)

List all files in a directory in datasets.

Arguments

  • remote_path str: path to list
  • sort_by str | None: sort string
  • offset int: pagination offset. Default 0
  • limit int: max number of returned files. Default 1000

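The offset and limit parameters support paging through large directories. A minimal local sketch of the paging loop, with a stub standing in for the remote call (the real call would be dataset_api.list(remote_path, offset=..., limit=...)):

```python
def list_stub(remote_path, offset=0, limit=1000):
    # Stand-in for DatasetApi.list: pretend the directory holds 2500 files.
    files = [f"{remote_path}/file_{i}.txt" for i in range(2500)]
    return files[offset:offset + limit]

def list_all(remote_path, page_size=1000):
    """Page through a directory until a short (or empty) page is returned."""
    results, offset = [], 0
    while True:
        page = list_stub(remote_path, offset=offset, limit=page_size)
        results.extend(page)
        if len(page) < page_size:
            return results
        offset += page_size

all_files = list_all("Resources")
print(len(all_files))
```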

[source]

list_files#

DatasetApi.list_files(path, offset, limit)

[source]

mkdir#

DatasetApi.mkdir(path)

Create a directory in the Hopsworks Filesystem.

```python
import hopsworks

project = hopsworks.login()

dataset_api = project.get_dataset_api()

directory_path = dataset_api.mkdir("Resources/my_dir")
```

Arguments

  • path str: path to directory

Returns

str: Path to created directory

Raises

  • hopsworks.client.exceptions.RestAPIError: If the backend encounters an error when handling the request

[source]

move#

DatasetApi.move(source_path, destination_path, overwrite=False)

Move a file or directory in the Hopsworks Filesystem.

```python
import hopsworks

project = hopsworks.login()

dataset_api = project.get_dataset_api()

directory_path = dataset_api.move("Resources/myfile.txt", "Logs/myfile.txt")
```

Arguments

  • source_path str: the source path to move
  • destination_path str: the destination path
  • overwrite bool: overwrite destination if exists

Raises

  • hopsworks.client.exceptions.RestAPIError: If the backend encounters an error when handling the request

[source]

path_exists#

DatasetApi.path_exists(remote_path)

Check if a path exists in datasets.

Arguments

  • remote_path str: path to check

Returns

bool: True if the path exists, otherwise False


[source]

read_content#

DatasetApi.read_content(path, dataset_type="DATASET")

[source]

remove#

DatasetApi.remove(path)

Remove a path in the Hopsworks Filesystem.

Arguments

  • path str: path to remove

Raises

  • hopsworks.client.exceptions.RestAPIError: If the backend encounters an error when handling the request

[source]

rm#

DatasetApi.rm(remote_path)

Remove a path in the Hopsworks Filesystem.

Arguments

  • remote_path str: path to remove

Raises

  • hopsworks.client.exceptions.RestAPIError: If the backend encounters an error when handling the request

[source]

unzip#

DatasetApi.unzip(remote_path, block=False, timeout=120)

Unzip an archive in the dataset.

Arguments

  • remote_path str: path to file or directory to unzip.
  • block bool: if the operation should be blocking until complete, defaults to False.
  • timeout int | None: timeout in seconds for the blocking, defaults to 120; if None, the blocking is unbounded.

Returns

bool: whether the operation completed in the specified timeout; if non-blocking, always returns True.

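The block/timeout semantics can be sketched as a polling loop: the result is True if the operation finishes within the timeout and False otherwise, while non-blocking calls return True immediately. The helper and status check below are local stand-ins, not the real backend calls:

```python
import time

def wait_until_done(is_done, block=False, timeout=120, poll_interval=0.01):
    """Sketch of the blocking semantics: non-blocking calls always return True."""
    if not block:
        return True
    # timeout=None means the blocking wait is unbounded.
    deadline = None if timeout is None else time.monotonic() + timeout
    while not is_done():
        if deadline is not None and time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
    return True

print(wait_until_done(lambda: False, block=False))                # True: no blocking requested
print(wait_until_done(lambda: True, block=True, timeout=1))       # True: finished in time
print(wait_until_done(lambda: False, block=True, timeout=0.05))   # False: timed out
```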

[source]

upload#

DatasetApi.upload(
    local_path,
    upload_path,
    overwrite=False,
    chunk_size=10485760,
    simultaneous_uploads=3,
    simultaneous_chunks=3,
    max_chunk_retries=1,
    chunk_retry_interval=1,
)

Upload a file or directory to the Hopsworks filesystem.

```python
import hopsworks

project = hopsworks.login()

dataset_api = project.get_dataset_api()

# upload a file to Resources dataset
uploaded_file_path = dataset_api.upload("my_local_file.txt", "Resources")

# upload a directory to Resources dataset
uploaded_file_path = dataset_api.upload("my_dir", "Resources")
```

Arguments

  • local_path str: local path to file or directory to upload, can be relative or absolute
  • upload_path str: path to directory where to upload the file in Hopsworks Filesystem
  • overwrite bool: overwrite file or directory if exists
  • chunk_size int: upload chunk size in bytes. Default 10 MB
  • simultaneous_chunks int: number of simultaneous chunks to upload for each file upload. Default 3
  • simultaneous_uploads int: number of simultaneous files to be uploaded for directories. Default 3
  • max_chunk_retries int: maximum number of retries per chunk. Default 1
  • chunk_retry_interval int: interval between chunk retries in seconds. Default 1

Returns

str: Path to uploaded file or directory

Raises

  • hopsworks.client.exceptions.RestAPIError: If the backend encounters an error when handling the request
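How the chunking parameters interact can be estimated up front: a file is split into ceil(size / chunk_size) chunks, of which at most simultaneous_chunks are in flight at once. Illustrative arithmetic only, with a hypothetical file size:

```python
import math

file_size = 95 * 1024 * 1024     # a hypothetical 95 MB file
chunk_size = 10 * 1024 * 1024    # the default 10 MB chunk
simultaneous_chunks = 3          # the default per-file concurrency

# Number of chunks the file is split into, and an upper bound on how many
# sequential "waves" of concurrent chunk uploads are needed.
num_chunks = math.ceil(file_size / chunk_size)
waves = math.ceil(num_chunks / simultaneous_chunks)

print(num_chunks, waves)  # 10 chunks, at most 4 concurrent waves
```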

[source]

upload_feature_group#

DatasetApi.upload_feature_group(feature_group, path, dataframe)

[source]

zip#

DatasetApi.zip(remote_path, destination_path=None, block=False, timeout=120)

Zip a file or directory in the dataset.

Arguments

  • remote_path str: path to file or directory to zip.
  • destination_path str | None: path to upload the zip, defaults to None.
  • block bool: if the operation should be blocking until complete, defaults to False.
  • timeout int | None: timeout in seconds for the blocking, defaults to 120; if None, the blocking is unbounded.

Returns

bool: whether the operation completed in the specified timeout; if non-blocking, always returns True.