Datasets API#
Handle#
get_dataset_api#
Project.get_dataset_api()
Get the Datasets API handle for the project.
Returns
DatasetApi
: The Datasets API handle
Methods#
copy#
DatasetApi.copy(source_path, destination_path, overwrite=False)
Copy a file or directory in the Hopsworks Filesystem.
import hopsworks
project = hopsworks.login()
dataset_api = project.get_dataset_api()
dataset_api.copy("Resources/myfile.txt", "Logs/myfile.txt")
Arguments
- source_path
str
: the source path to copy
- destination_path
str
: the destination path
- overwrite
bool
: overwrite destination if exists
Raises
RestAPIError
: If unable to perform the copy
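As an illustration of the `overwrite` flag, here is a minimal sketch that keeps a backup copy of a file next to the original. The helper name `backup_copy` and the `.bak` suffix are hypothetical and not part of the Hopsworks API; `dataset_api` is assumed to be the handle returned by `project.get_dataset_api()`.

```python
def backup_copy(dataset_api, path, suffix=".bak"):
    """Copy `path` to `path + suffix`, replacing any earlier backup.

    Hypothetical helper built only on DatasetApi.copy as documented above.
    """
    backup_path = path + suffix
    # overwrite=True so that a backup left by a previous run is replaced.
    dataset_api.copy(path, backup_path, overwrite=True)
    return backup_path


# Usage, assuming a logged-in project:
# dataset_api = project.get_dataset_api()
# backup_copy(dataset_api, "Resources/myfile.txt")
```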
download#
DatasetApi.download(path, local_path=None, overwrite=False)
Download a file from the Hopsworks Filesystem to the local filesystem, by default into the current working directory.
import hopsworks
project = hopsworks.login()
dataset_api = project.get_dataset_api()
downloaded_file_path = dataset_api.download("Resources/my_local_file.txt")
Arguments
- path
str
: path in Hopsworks filesystem to the file
- local_path
str
: path where to download the file in the local filesystem
- overwrite
bool
: overwrite local file if exists
Returns
str
: Path to downloaded file
Raises
RestAPIError
: If unable to download the file
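When several files are needed, the call can be wrapped in a small loop. The sketch below assumes only the `download` signature documented above; `download_all` is a hypothetical helper name, and `local_path` is left at its default.

```python
def download_all(dataset_api, paths, overwrite=False):
    """Download each remote path; map remote path -> returned local path.

    Hypothetical helper, not a Hopsworks function.
    """
    return {p: dataset_api.download(p, overwrite=overwrite) for p in paths}


# Usage, assuming a logged-in project:
# dataset_api = project.get_dataset_api()
# download_all(dataset_api, ["Resources/a.txt", "Resources/b.txt"])
```

Any `RestAPIError` raised by a single download propagates to the caller, so a failure stops the loop at the first file that cannot be fetched.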
exists#
DatasetApi.exists(path)
Check if a file exists in the Hopsworks Filesystem.
Arguments
- path
str
: path to check
Returns
bool
: True if exists, otherwise False
Raises
RestAPIError
: If unable to check existence for the path
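A short sketch of how `exists` might be used to validate a set of expected paths before starting a job. The helper name `missing_paths` is hypothetical; `dataset_api` is assumed to be the handle returned by `project.get_dataset_api()`.

```python
def missing_paths(dataset_api, paths):
    """Return the entries of `paths` that DatasetApi.exists reports as absent."""
    return [p for p in paths if not dataset_api.exists(p)]


# Usage, assuming a logged-in project:
# import hopsworks
# project = hopsworks.login()
# dataset_api = project.get_dataset_api()
# missing_paths(dataset_api, ["Resources/myfile.txt", "Logs/myfile.txt"])
```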
list_files#
DatasetApi.list_files(path, offset, limit)
mkdir#
DatasetApi.mkdir(path)
Create a directory in the Hopsworks Filesystem.
import hopsworks
project = hopsworks.login()
dataset_api = project.get_dataset_api()
directory_path = dataset_api.mkdir("Resources/my_dir")
Arguments
- path
str
: path to directory
Returns
str
: Path to created directory
Raises
RestAPIError
: If unable to create the directory
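The page does not say how `mkdir` behaves when the directory is already present, so a defensive pattern is to check first. The sketch below assumes that `exists` also reports directories, which is an assumption rather than something stated here; `ensure_dir` is a hypothetical helper name.

```python
def ensure_dir(dataset_api, path):
    """Create `path` unless it already exists; return the directory path.

    Assumes DatasetApi.exists returns True for directories as well as files.
    """
    if dataset_api.exists(path):
        return path
    return dataset_api.mkdir(path)


# Usage, assuming a logged-in project:
# dataset_api = project.get_dataset_api()
# ensure_dir(dataset_api, "Resources/my_dir")
```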
move#
DatasetApi.move(source_path, destination_path, overwrite=False)
Move a file or directory in the Hopsworks Filesystem.
import hopsworks
project = hopsworks.login()
dataset_api = project.get_dataset_api()
dataset_api.move("Resources/myfile.txt", "Logs/myfile.txt")
Arguments
- source_path
str
: the source path to move
- destination_path
str
: the destination path
- overwrite
bool
: overwrite destination if exists
Raises
RestAPIError
: If unable to perform the move
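A common use of `move` is relocating a file into another directory while keeping its name. The sketch below builds the destination with `posixpath`, on the assumption that Hopsworks Filesystem paths use forward slashes as in the examples on this page; `archive_file` is a hypothetical helper name.

```python
import posixpath


def archive_file(dataset_api, source_path, archive_dir, overwrite=False):
    """Move `source_path` into `archive_dir`, keeping its file name.

    Returns the destination path that was passed to DatasetApi.move.
    """
    destination = posixpath.join(archive_dir, posixpath.basename(source_path))
    dataset_api.move(source_path, destination, overwrite=overwrite)
    return destination


# Usage, assuming a logged-in project:
# dataset_api = project.get_dataset_api()
# archive_file(dataset_api, "Resources/myfile.txt", "Logs")
```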
read_content#
DatasetApi.read_content(path, dataset_type="DATASET")
remove#
DatasetApi.remove(path)
Remove a path in the Hopsworks Filesystem.
Arguments
- path
str
: path to remove
Raises
RestAPIError
: If unable to remove the path
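Since `remove` raises `RestAPIError` on failure, cleanup code may want to skip paths that are already gone. A minimal sketch, using only `exists` and `remove` as documented on this page; `remove_if_exists` is a hypothetical helper name.

```python
def remove_if_exists(dataset_api, path):
    """Remove `path` if present; return True if a removal was requested."""
    if not dataset_api.exists(path):
        return False
    dataset_api.remove(path)
    return True


# Usage, assuming a logged-in project:
# dataset_api = project.get_dataset_api()
# remove_if_exists(dataset_api, "Resources/my_dir")
```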
upload#
DatasetApi.upload(
local_path,
upload_path,
overwrite=False,
chunk_size=1048576,
simultaneous_uploads=3,
max_chunk_retries=1,
chunk_retry_interval=1,
)
Upload a file to the Hopsworks filesystem.
import hopsworks
project = hopsworks.login()
dataset_api = project.get_dataset_api()
uploaded_file_path = dataset_api.upload("my_local_file.txt", "Resources")
Arguments
- local_path
str
: local path to file to upload
- upload_path
str
: path to directory where to upload the file in Hopsworks Filesystem
- overwrite
bool
: overwrite file if exists
- chunk_size: upload chunk size in bytes. Default is 1048576 bytes
- simultaneous_uploads: number of chunks to upload simultaneously. Default is 3
- max_chunk_retries: maximum number of retries for a chunk. Default is 1
- chunk_retry_interval: chunk retry interval in seconds. Default is 1 second
Returns
str
: Path to uploaded file
Raises
RestAPIError
: If unable to upload the file
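The chunking parameters can be passed as keyword arguments. The sketch below uploads several local files to one directory with a larger chunk size; the 4 MiB value is purely illustrative, not a recommendation from the Hopsworks documentation, and `upload_all` is a hypothetical helper name.

```python
def upload_all(dataset_api, local_paths, upload_path, chunk_size=4 * 1048576):
    """Upload each local file to `upload_path`; return the uploaded paths.

    Existing files are replaced because overwrite=True is passed through.
    """
    return [
        dataset_api.upload(p, upload_path, overwrite=True, chunk_size=chunk_size)
        for p in local_paths
    ]


# Usage, assuming a logged-in project:
# dataset_api = project.get_dataset_api()
# upload_all(dataset_api, ["my_local_file.txt", "other_file.txt"], "Resources")
```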
upload_feature_group#
DatasetApi.upload_feature_group(feature_group, path, dataframe)