Datasets API#
Handle#
get_dataset_api#
Project.get_dataset_api()
Get the Datasets API handle for the project.
Returns
DatasetApi: The Datasets API handle
Methods#
copy#
DatasetApi.copy(source_path, destination_path, overwrite=False)
Copy a file or directory in the Hopsworks Filesystem.
Example
import hopsworks
project = hopsworks.login()
dataset_api = project.get_dataset_api()
dataset_api.copy("Resources/myfile.txt", "Logs/myfile.txt")
Arguments
- source_path str: the source path to copy
- destination_path str: the destination path
- overwrite bool: overwrite the destination if it exists
Raises
- RestAPIError: If unable to perform the copy
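Copying several files into one directory follows from the signature above. The sketch below is a hypothetical helper (`copy_into` is not part of the Datasets API) that takes the handle returned by `project.get_dataset_api()` and derives each destination from the source's base name. It assumes Hopsworks paths are "/"-separated, as in the example paths.

```python
import posixpath


def copy_into(dataset_api, source_paths, destination_dir, overwrite=False):
    """Copy each source path into destination_dir, keeping its base name.

    Hypothetical helper, not part of the Datasets API.
    """
    copied = []
    for source_path in source_paths:
        # Assumption: Hopsworks paths use "/" separators, so posixpath applies.
        destination_path = posixpath.join(
            destination_dir, posixpath.basename(source_path)
        )
        dataset_api.copy(source_path, destination_path, overwrite=overwrite)
        copied.append(destination_path)
    return copied


# Usage (needs a Hopsworks login, so it is left commented out):
# import hopsworks
# project = hopsworks.login()
# dataset_api = project.get_dataset_api()
# copy_into(dataset_api, ["Resources/a.txt", "Resources/b.txt"], "Logs")
```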
download#
DatasetApi.download(path, local_path=None, overwrite=False)
Download a file from the Hopsworks Filesystem to the local filesystem. If local_path is not given, the file is saved in the current working directory.
Example
import hopsworks
project = hopsworks.login()
dataset_api = project.get_dataset_api()
downloaded_file_path = dataset_api.download("Resources/my_local_file.txt")
Arguments
- path str: path in the Hopsworks Filesystem to the file
- local_path str: path where to download the file in the local filesystem
- overwrite bool: overwrite the local file if it exists
Returns
str: Path to downloaded file
Raises
- RestAPIError: If unable to download the file
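To save a file somewhere other than the current working directory, pass `local_path`. The following sketch wraps that in a hypothetical helper, `download_to_dir`, which creates the local directory first. It assumes `local_path` accepts a full file path; the description above does not say whether a bare directory is also accepted, so the helper builds the file path explicitly.

```python
import os
import posixpath


def download_to_dir(dataset_api, remote_path, local_dir, overwrite=False):
    """Download remote_path into local_dir and return the path reported by download().

    Hypothetical helper, not part of the Datasets API.
    """
    # Create the target directory up front so the download has somewhere to land.
    os.makedirs(local_dir, exist_ok=True)
    # Assumption: local_path may be a full file path; remote paths use "/".
    local_path = os.path.join(local_dir, posixpath.basename(remote_path))
    return dataset_api.download(
        remote_path, local_path=local_path, overwrite=overwrite
    )


# Usage (needs a Hopsworks login, so it is left commented out):
# import hopsworks
# dataset_api = hopsworks.login().get_dataset_api()
# download_to_dir(dataset_api, "Resources/my_local_file.txt", "downloads")
```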
exists#
DatasetApi.exists(path)
Check if a file exists in the Hopsworks Filesystem.
Arguments
- path str: path to check
Returns
bool: True if exists, otherwise False
Raises
- RestAPIError: If unable to check existence for the path
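Because `exists` returns a plain bool, it works well as a pre-flight check before a job starts. A minimal sketch, with `missing_paths` as a hypothetical helper name:

```python
def missing_paths(dataset_api, paths):
    """Return the subset of paths that exists() reports as absent.

    Hypothetical helper, not part of the Datasets API.
    """
    return [path for path in paths if not dataset_api.exists(path)]


# Usage (needs a Hopsworks login, so it is left commented out):
# import hopsworks
# dataset_api = hopsworks.login().get_dataset_api()
# missing = missing_paths(dataset_api, ["Resources/a.txt", "Resources/b.txt"])
# if missing:
#     raise FileNotFoundError(f"Missing in Hopsworks: {missing}")
```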
list_files#
DatasetApi.list_files(path, offset, limit)
mkdir#
DatasetApi.mkdir(path)
Create a directory in the Hopsworks Filesystem.
Example
import hopsworks
project = hopsworks.login()
dataset_api = project.get_dataset_api()
directory_path = dataset_api.mkdir("Resources/my_dir")
Arguments
- path str: path to directory
Returns
str: Path to created directory
Raises
- RestAPIError: If unable to create the directory
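The description above does not say what `mkdir` does when the directory already exists. One way to sidestep the question is to check first. The sketch below uses a hypothetical helper, `ensure_dir`, and assumes `exists` reports directories as well as files.

```python
def ensure_dir(dataset_api, path):
    """Create path only if it is not already there, and return the directory path.

    Hypothetical helper, not part of the Datasets API.
    Assumption: exists() returns True for directories as well as files.
    """
    if dataset_api.exists(path):
        return path
    return dataset_api.mkdir(path)


# Usage (needs a Hopsworks login, so it is left commented out):
# import hopsworks
# dataset_api = hopsworks.login().get_dataset_api()
# ensure_dir(dataset_api, "Resources/my_dir")
```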
move#
DatasetApi.move(source_path, destination_path, overwrite=False)
Move a file or directory in the Hopsworks Filesystem.
Example
import hopsworks
project = hopsworks.login()
dataset_api = project.get_dataset_api()
dataset_api.move("Resources/myfile.txt", "Logs/myfile.txt")
Arguments
- source_path str: the source path to move
- destination_path str: the destination path
- overwrite bool: overwrite the destination if it exists
Raises
- RestAPIError: If unable to perform the move
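There is no separate rename method in this list, so renaming can be expressed as a move within the same directory. A minimal sketch, assuming "/"-separated paths; `rename` is a hypothetical helper name:

```python
import posixpath


def rename(dataset_api, path, new_name, overwrite=False):
    """Rename a file or directory in place by moving it within its parent directory.

    Hypothetical helper, not part of the Datasets API.
    """
    if "/" in new_name:
        raise ValueError("new_name must be a bare name, not a path")
    # Keep the parent directory and swap only the last path component.
    destination_path = posixpath.join(posixpath.dirname(path), new_name)
    dataset_api.move(path, destination_path, overwrite=overwrite)
    return destination_path


# Usage (needs a Hopsworks login, so it is left commented out):
# import hopsworks
# dataset_api = hopsworks.login().get_dataset_api()
# rename(dataset_api, "Resources/myfile.txt", "myfile_v2.txt")
```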
read_content#
DatasetApi.read_content(path, dataset_type="DATASET")
remove#
DatasetApi.remove(path)
Remove a path in the Hopsworks Filesystem.
Arguments
- path str: path to remove
Raises
- RestAPIError: If unable to remove the path
upload#
DatasetApi.upload(
    local_path,
    upload_path,
    overwrite=False,
    chunk_size=1048576,
    simultaneous_uploads=3,
    max_chunk_retries=1,
    chunk_retry_interval=1,
)
Upload a file to the Hopsworks Filesystem.
Example
import hopsworks
project = hopsworks.login()
dataset_api = project.get_dataset_api()
uploaded_file_path = dataset_api.upload("my_local_file.txt", "Resources")
Arguments
- local_path str: local path to the file to upload
- upload_path str: path to the directory in the Hopsworks Filesystem where the file is uploaded
- overwrite bool: overwrite the file if it exists
- chunk_size int: upload chunk size in bytes. Default is 1048576 bytes
- simultaneous_uploads int: number of chunks to upload simultaneously. Default is 3
- max_chunk_retries int: maximum number of retries for a chunk. Default is 1
- chunk_retry_interval int: interval between chunk retries in seconds. Default is 1 second
Returns
str: Path to uploaded file
Raises
- RestAPIError: If unable to upload the file
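The chunking parameters are easier to reason about with a little arithmetic. The sketch below estimates how many chunks a file would be split into, assuming the split is a simple ceiling division of the file size by `chunk_size` (the description above does not state this). It also shows a hypothetical wrapper, `upload_with_retries`, that raises the retry settings for an unreliable connection. Both helper names and the retry values are illustrative.

```python
DEFAULT_CHUNK_SIZE = 1048576  # bytes, the documented default for chunk_size


def estimate_chunks(file_size_bytes, chunk_size=DEFAULT_CHUNK_SIZE):
    """Estimate the number of upload chunks for a file of the given size.

    Assumption: the file is split by ceiling division; not confirmed by the docs.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # Integer ceiling division avoids floating-point rounding.
    return (file_size_bytes + chunk_size - 1) // chunk_size


def upload_with_retries(dataset_api, local_path, upload_dir, retries=5, interval_s=2):
    """Upload with more generous retry settings than the defaults.

    Hypothetical helper, not part of the Datasets API.
    """
    return dataset_api.upload(
        local_path,
        upload_dir,
        overwrite=True,
        max_chunk_retries=retries,
        chunk_retry_interval=interval_s,
    )


# Usage (needs a Hopsworks login, so it is left commented out):
# import hopsworks
# dataset_api = hopsworks.login().get_dataset_api()
# upload_with_retries(dataset_api, "my_local_file.txt", "Resources")
```

Under that assumption, a file of exactly 1048576 bytes fits in one chunk at the default size, and one extra byte pushes it to two.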
upload_feature_group#
DatasetApi.upload_feature_group(feature_group, path, dataframe)