Datasets API#

Handle#

[source]

get_dataset_api#

Project.get_dataset_api()

Get the Datasets API handle for the project.

Returns

DatasetApi: The Datasets API handle


Methods#

[source]

copy#

DatasetApi.copy(source_path, destination_path, overwrite=False)

Copy a file or directory in the Hopsworks Filesystem.

import hopsworks

project = hopsworks.login()

dataset_api = project.get_dataset_api()

dataset_api.copy("Resources/myfile.txt", "Logs/myfile.txt")

Arguments

  • source_path str: the source path to copy
  • destination_path str: the destination path
  • overwrite bool: overwrite the destination if it exists

Raises

  • RestAPIError: If unable to perform the copy

[source]

download#

DatasetApi.download(path, local_path=None, overwrite=False)

Download a file from the Hopsworks Filesystem to the local filesystem. If local_path is not given, the file is downloaded to the current working directory.

import hopsworks

project = hopsworks.login()

dataset_api = project.get_dataset_api()

downloaded_file_path = dataset_api.download("Resources/my_local_file.txt")

Arguments

  • path str: path to the file in the Hopsworks Filesystem
  • local_path str: path in the local filesystem to download the file to
  • overwrite bool: overwrite the local file if it exists

Returns

str: Path to downloaded file

Raises

  • RestAPIError: If unable to download the file
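
The optional arguments can be combined as in the minimal sketch below, which downloads into a chosen local directory and replaces any existing copy. The helper name `download_to` is hypothetical and not part of the library; only `download`, `local_path`, and `overwrite` come from the signature above, and `dataset_api` is assumed to be the handle returned by `project.get_dataset_api()`.

```python
import os
import posixpath


def download_to(dataset_api, remote_path, local_dir):
    """Download remote_path into local_dir, replacing an existing local copy.

    Hypothetical helper: `dataset_api` is assumed to be the handle
    returned by project.get_dataset_api().
    """
    # Make sure the local target directory exists before downloading.
    os.makedirs(local_dir, exist_ok=True)
    # Remote paths use "/" separators, so take the file name with posixpath.
    file_name = posixpath.basename(remote_path)
    target = os.path.join(local_dir, file_name)
    return dataset_api.download(remote_path, local_path=target, overwrite=True)
```

A call such as `download_to(dataset_api, "Resources/my_local_file.txt", "data")` would then request the file at `data/my_local_file.txt`.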

[source]

exists#

DatasetApi.exists(path)

Check if a file exists in the Hopsworks Filesystem.

Arguments

  • path str: path to check

Returns

bool: True if exists, otherwise False

Raises

  • RestAPIError: If unable to check existence for the path
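
One possible use of `exists` is to guard a download so that a missing file is handled without relying on an exception. This is only a sketch: the helper name `download_if_present` is an assumption for illustration, and `dataset_api` is assumed to be the handle returned by `project.get_dataset_api()`.

```python
def download_if_present(dataset_api, path):
    """Download `path` if it exists; return the local path, or None if it is absent.

    Hypothetical helper built on DatasetApi.exists and DatasetApi.download.
    """
    if not dataset_api.exists(path):
        return None
    return dataset_api.download(path)
```

Note that the check and the download are two separate requests, so the file could still disappear in between; callers that need to be robust should also be prepared for `RestAPIError`.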

[source]

list_files#

DatasetApi.list_files(path, offset, limit)

[source]

mkdir#

DatasetApi.mkdir(path)

Create a directory in the Hopsworks Filesystem.

import hopsworks

project = hopsworks.login()

dataset_api = project.get_dataset_api()

directory_path = dataset_api.mkdir("Resources/my_dir")

Arguments

  • path str: path to directory

Returns

str: Path to created directory

Raises

  • RestAPIError: If unable to create the directory

[source]

move#

DatasetApi.move(source_path, destination_path, overwrite=False)

Move a file or directory in the Hopsworks Filesystem.

import hopsworks

project = hopsworks.login()

dataset_api = project.get_dataset_api()

dataset_api.move("Resources/myfile.txt", "Logs/myfile.txt")

Arguments

  • source_path str: the source path to move
  • destination_path str: the destination path
  • overwrite bool: overwrite the destination if it exists

Raises

  • RestAPIError: If unable to perform the move
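
Since no return value is documented for `move`, a caller that needs the new location has to compute it. The sketch below does so while moving a file into another directory and replacing any file already there. The helper name `move_into` is hypothetical, and `dataset_api` is assumed to be the handle returned by `project.get_dataset_api()`.

```python
import posixpath


def move_into(dataset_api, source_path, destination_dir):
    """Move source_path into destination_dir and return the new path.

    Hypothetical helper: the destination path is built locally because
    the documentation above lists no return value for DatasetApi.move.
    """
    destination_path = posixpath.join(destination_dir, posixpath.basename(source_path))
    dataset_api.move(source_path, destination_path, overwrite=True)
    return destination_path
```

With the paths from the example above, `move_into(dataset_api, "Resources/myfile.txt", "Logs")` would move the file to `Logs/myfile.txt`.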

[source]

read_content#

DatasetApi.read_content(path, dataset_type="DATASET")

[source]

remove#

DatasetApi.remove(path)

Remove a path in the Hopsworks Filesystem.

Arguments

  • path str: path to remove

Raises

  • RestAPIError: If unable to remove the path
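
A cleanup step often needs to tolerate paths that are already gone. The following sketch combines `exists` and `remove` for that purpose; the helper name `remove_if_exists` is an assumption for illustration, and `dataset_api` is assumed to be the handle returned by `project.get_dataset_api()`.

```python
def remove_if_exists(dataset_api, path):
    """Remove `path` if present; return True if a removal was requested.

    Hypothetical helper built on DatasetApi.exists and DatasetApi.remove.
    """
    if dataset_api.exists(path):
        dataset_api.remove(path)
        return True
    return False
```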

[source]

upload#

DatasetApi.upload(
    local_path,
    upload_path,
    overwrite=False,
    chunk_size=1048576,
    simultaneous_uploads=3,
    max_chunk_retries=1,
    chunk_retry_interval=1,
)

Upload a file to the Hopsworks filesystem.

import hopsworks

project = hopsworks.login()

dataset_api = project.get_dataset_api()

uploaded_file_path = dataset_api.upload("my_local_file.txt", "Resources")

Arguments

  • local_path str: local path to the file to upload
  • upload_path str: path to the directory in the Hopsworks Filesystem to upload the file to
  • overwrite bool: overwrite the file if it exists
  • chunk_size: upload chunk size in bytes. Default is 1048576 (1 MiB)
  • simultaneous_uploads: number of chunks to upload simultaneously. Default is 3
  • max_chunk_retries: maximum number of retries per chunk. Default is 1
  • chunk_retry_interval: interval between chunk retries, in seconds. Default is 1

Returns

str: Path to uploaded file

Raises

  • RestAPIError: If unable to upload the file
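
For larger files, the chunking arguments can be tuned. The sketch below is illustrative only: the helper names `estimated_chunks` and `upload_large` are hypothetical, the chosen values are not recommendations, and the chunk count assumes the file is split into consecutive chunks of at most `chunk_size` bytes, which this page does not state explicitly.

```python
MIB = 1024 * 1024  # 1048576 bytes, the documented default chunk size


def estimated_chunks(file_size, chunk_size=MIB):
    """Estimate the number of chunks, assuming consecutive fixed-size chunks."""
    # Ceiling division using integers only.
    return -(-file_size // chunk_size)


def upload_large(dataset_api, local_path, upload_path):
    """Upload with larger chunks and more retries than the defaults.

    Hypothetical helper: `dataset_api` is assumed to be the handle
    returned by project.get_dataset_api(); the values are illustrative.
    """
    return dataset_api.upload(
        local_path,
        upload_path,
        overwrite=True,
        chunk_size=8 * MIB,        # fewer, larger chunks
        simultaneous_uploads=4,    # one more than the default of 3
        max_chunk_retries=3,       # retry each failed chunk up to 3 times
        chunk_retry_interval=2,    # wait 2 seconds between retries
    )
```

Under that assumption, a 10 MiB file corresponds to 10 chunks at the default chunk size and 2 chunks at 8 MiB.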

[source]

upload_feature_group#

DatasetApi.upload_feature_group(feature_group, path, dataframe)