Datasets API#
Handle#
get_dataset_api#
Project.get_dataset_api()
Get the dataset api for the project.
Returns
DatasetApi
: The Datasets Api handle
Methods#
chmod#
DatasetApi.chmod(remote_path, permissions)
Change permissions of a file or a directory in the Hopsworks Filesystem.
Arguments
- remote_path
str
: path to change the permissions of.
- permissions
str
: permissions string, for example "u+x".
Returns
dict
: the updated dataset metadata
Raises
hopsworks.client.exceptions.RestAPIError
: If the backend encounters an error when handling the request
copy#
DatasetApi.copy(source_path, destination_path, overwrite=False)
Copy a file or directory in the Hopsworks Filesystem.
import hopsworks
project = hopsworks.login()
dataset_api = project.get_dataset_api()
directory_path = dataset_api.copy("Resources/myfile.txt", "Logs/myfile.txt")
Arguments
- source_path
str
: the source path to copy
- destination_path
str
: the destination path
- overwrite
bool
: overwrite destination if exists
Raises
hopsworks.client.exceptions.DatasetException
: If the destination path already exists and overwrite is not set to True
hopsworks.client.exceptions.RestAPIError
: If the backend encounters an error when handling the request
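The overwrite rule above can be made explicit by checking the destination before copying. The following is a minimal sketch, not part of the library: `copy_if_absent` and `InMemoryDatasetApi` are hypothetical names, and the in-memory class only imitates `exists` and `copy` so that the example runs without a Hopsworks cluster. With a real project, the handle returned by `project.get_dataset_api()` would be passed instead.

```python
class InMemoryDatasetApi:
    """Hypothetical stand-in that imitates DatasetApi.exists and DatasetApi.copy."""

    def __init__(self, files):
        self.files = dict(files)  # maps path -> content

    def exists(self, path):
        return path in self.files

    def copy(self, source_path, destination_path, overwrite=False):
        if destination_path in self.files and not overwrite:
            # The real client is documented to raise DatasetException here;
            # a built-in exception keeps this sketch dependency-free.
            raise FileExistsError(destination_path)
        self.files[destination_path] = self.files[source_path]


def copy_if_absent(dataset_api, source_path, destination_path):
    """Copy only when the destination is free; return True if a copy was made."""
    if dataset_api.exists(destination_path):
        return False
    dataset_api.copy(source_path, destination_path)
    return True


api = InMemoryDatasetApi({"Resources/myfile.txt": "hello"})
first = copy_if_absent(api, "Resources/myfile.txt", "Logs/myfile.txt")
second = copy_if_absent(api, "Resources/myfile.txt", "Logs/myfile.txt")
print(first, second)
```

The second call skips the copy because the destination now exists, so no exception handling is needed in the caller.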
download#
DatasetApi.download(path, local_path=None, overwrite=False, chunk_size=1048576)
Download a file from the Hopsworks Filesystem to the current working directory, or to local_path if it is given.
import hopsworks
project = hopsworks.login()
dataset_api = project.get_dataset_api()
downloaded_file_path = dataset_api.download("Resources/my_local_file.txt")
Arguments
- path
str
: path in Hopsworks filesystem to the file
- local_path
str | None
: path where to download the file in the local filesystem
- overwrite
bool | None
: overwrite local file if exists
- chunk_size
int
: download chunk size in bytes. Default 1 MB (1048576 bytes)
Returns
str
: Path to downloaded file
Raises
hopsworks.client.exceptions.RestAPIError
: If the backend encounters an error when handling the request
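To get a feel for the `chunk_size` default, the arithmetic can be sketched as below. This is only an illustration under the assumption that the file is transferred in consecutive pieces of at most `chunk_size` bytes; `chunk_count` is a hypothetical helper, not a library function, and it says nothing about how the client is implemented internally.

```python
DEFAULT_DOWNLOAD_CHUNK_SIZE = 1048576  # default chunk_size from the signature above


def chunk_count(file_size_bytes, chunk_size=DEFAULT_DOWNLOAD_CHUNK_SIZE):
    """Number of pieces of at most chunk_size bytes needed to cover the file."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if file_size_bytes < 0:
        raise ValueError("file_size_bytes must not be negative")
    # Integer ceiling division avoids floating point rounding on large files.
    return (file_size_bytes + chunk_size - 1) // chunk_size


exact = chunk_count(5 * 1024 * 1024)          # exactly five full chunks
one_extra = chunk_count(5 * 1024 * 1024 + 1)  # one trailing byte needs a sixth chunk
print(exact, one_extra)
```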
exists#
DatasetApi.exists(path)
Check if a file exists in the Hopsworks Filesystem.
Arguments
- path
str
: path to check
Returns
bool
: True if exists, otherwise False
mkdir#
DatasetApi.mkdir(path)
Create a directory in the Hopsworks Filesystem.
import hopsworks
project = hopsworks.login()
dataset_api = project.get_dataset_api()
directory_path = dataset_api.mkdir("Resources/my_dir")
Arguments
- path
str
: path to directory
Returns
str
: Path to created directory
Raises
hopsworks.client.exceptions.RestAPIError
: If the backend encounters an error when handling the request
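A common pattern is to create a directory only when it is missing, by combining `exists` and `mkdir`. The sketch below assumes that `exists` also reports directories, which the description above does not state explicitly. `ensure_dir` and `InMemoryDatasetApi` are hypothetical names; the stand-in class returns the path unchanged from `mkdir`, which is a simplification of whatever path the real client returns.

```python
class InMemoryDatasetApi:
    """Hypothetical stand-in that imitates DatasetApi.exists and DatasetApi.mkdir."""

    def __init__(self):
        self.paths = {"Resources"}

    def exists(self, path):
        return path in self.paths

    def mkdir(self, path):
        self.paths.add(path)
        return path  # simplification: the real client returns the created path


def ensure_dir(dataset_api, path):
    """Create path if it is missing; return True when a directory was created."""
    if dataset_api.exists(path):
        return False
    dataset_api.mkdir(path)
    return True


api = InMemoryDatasetApi()
created_first = ensure_dir(api, "Resources/my_dir")
created_again = ensure_dir(api, "Resources/my_dir")
print(created_first, created_again)
```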
move#
DatasetApi.move(source_path, destination_path, overwrite=False)
Move a file or directory in the Hopsworks Filesystem.
import hopsworks
project = hopsworks.login()
dataset_api = project.get_dataset_api()
directory_path = dataset_api.move("Resources/myfile.txt", "Logs/myfile.txt")
Arguments
- source_path
str
: the source path to move
- destination_path
str
: the destination path
- overwrite
bool
: overwrite destination if exists
Raises
hopsworks.client.exceptions.DatasetException
: If the destination path already exists and overwrite is not set to True
hopsworks.client.exceptions.RestAPIError
: If the backend encounters an error when handling the request
read_content#
DatasetApi.read_content(path, dataset_type="DATASET")
remove#
DatasetApi.remove(path)
Remove a path in the Hopsworks Filesystem.
Arguments
- path
str
: path to remove
Raises
hopsworks.client.exceptions.RestAPIError
: If the backend encounters an error when handling the request
unzip#
DatasetApi.unzip(remote_path, block=False, timeout=120)
Unzip an archive in the dataset.
Arguments
- remote_path
str
: path to file or directory to unzip.
- block
bool
: if the operation should be blocking until complete, defaults to False.
- timeout
int | None
: timeout in seconds for the blocking, defaults to 120; if None, the blocking is unbounded.
Returns
bool
: whether the operation completed in the specified timeout; if non-blocking, always returns True.
Raises
hopsworks.client.exceptions.RestAPIError
: If the backend encounters an error when handling the request
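Because a blocking call reports a timeout through its return value rather than an exception, callers may want to turn a `False` result into an error. The following is a minimal sketch: `unzip_or_raise` and `SlowArchiveApi` are hypothetical names, and the stand-in class merely simulates an archive that needs a fixed number of seconds so the example runs without a cluster.

```python
class SlowArchiveApi:
    """Hypothetical stand-in: unzip completes only if the timeout is long enough."""

    def __init__(self, seconds_needed):
        self.seconds_needed = seconds_needed

    def unzip(self, remote_path, block=False, timeout=120):
        if not block:
            return True  # non-blocking calls always return True, as documented
        return timeout is None or timeout >= self.seconds_needed


def unzip_or_raise(dataset_api, remote_path, timeout=120):
    """Unzip in blocking mode and raise TimeoutError if it did not finish in time."""
    done = dataset_api.unzip(remote_path, block=True, timeout=timeout)
    if not done:
        raise TimeoutError(
            f"unzip of {remote_path} did not finish within {timeout} seconds"
        )
    return remote_path


api = SlowArchiveApi(seconds_needed=30)
finished = unzip_or_raise(api, "Resources/archive.zip", timeout=60)

try:
    unzip_or_raise(api, "Resources/archive.zip", timeout=10)
    timed_out = False
except TimeoutError:
    timed_out = True
print(finished, timed_out)
```

The same wrapper shape applies to `zip`, which returns its result in the same way.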
upload#
DatasetApi.upload(
local_path,
upload_path,
overwrite=False,
chunk_size=10485760,
simultaneous_uploads=3,
simultaneous_chunks=3,
max_chunk_retries=1,
chunk_retry_interval=1,
)
Upload a file or directory to the Hopsworks filesystem.
import hopsworks
project = hopsworks.login()
dataset_api = project.get_dataset_api()
# upload a file to Resources dataset
uploaded_file_path = dataset_api.upload("my_local_file.txt", "Resources")
# upload a directory to Resources dataset
uploaded_file_path = dataset_api.upload("my_dir", "Resources")
Arguments
- local_path
str
: local path to file or directory to upload, can be relative or absolute
- upload_path
str
: path to directory where to upload the file in Hopsworks Filesystem
- overwrite
bool
: overwrite file or directory if exists
- chunk_size
int
: upload chunk size in bytes. Default 10 MB (10485760 bytes)
- simultaneous_chunks
int
: number of simultaneous chunks to upload for each file upload. Default 3
- simultaneous_uploads
int
: number of simultaneous files to be uploaded for directories. Default 3
- max_chunk_retries
int
: maximum number of retries for a chunk. Default 1
- chunk_retry_interval
int
: chunk retry interval in seconds. Default 1 second
Returns
str
: Path to uploaded file or directory
Raises
hopsworks.client.exceptions.RestAPIError
: If the backend encounters an error when handling the request
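When tuning the concurrency parameters, it can help to estimate how much data may be in flight at once. The sketch below assumes the two limits multiply, that is, each of the `simultaneous_uploads` files may have up to `simultaneous_chunks` chunks in transfer at the same time. This is an interpretation of the parameter descriptions above rather than a documented guarantee, and `max_in_flight_bytes` is a hypothetical helper.

```python
def max_in_flight_bytes(chunk_size=10485760, simultaneous_uploads=3, simultaneous_chunks=3):
    """Upper bound on bytes in transfer at once, under the stated assumption."""
    for name, value in (
        ("chunk_size", chunk_size),
        ("simultaneous_uploads", simultaneous_uploads),
        ("simultaneous_chunks", simultaneous_chunks),
    ):
        if value <= 0:
            raise ValueError(f"{name} must be positive")
    return chunk_size * simultaneous_uploads * simultaneous_chunks


# Defaults from the signature above: 3 files x 3 chunks x 10485760 bytes.
default_bound = max_in_flight_bytes()
# A single file upload: only simultaneous_chunks applies.
single_file_bound = max_in_flight_bytes(simultaneous_uploads=1)
print(default_bound, single_file_bound)
```

Under this assumption the defaults keep at most 90 chunks' worth of megabytes (9 chunks of 10485760 bytes each) in transfer for a directory upload, and a third of that for a single file.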
upload_feature_group#
DatasetApi.upload_feature_group(feature_group, path, dataframe)
zip#
DatasetApi.zip(remote_path, destination_path=None, block=False, timeout=120)
Zip a file or directory in the dataset.
Arguments
- remote_path
str
: path to file or directory to zip.
- destination_path
str | None
: path to upload the zip, defaults to None.
- block
bool
: if the operation should be blocking until complete, defaults to False.
- timeout
int | None
: timeout in seconds for the blocking, defaults to 120; if None, the blocking is unbounded.
Returns
bool
: whether the operation completed in the specified timeout; if non-blocking, always returns True.
Raises
hopsworks.client.exceptions.RestAPIError
: If the backend encounters an error when handling the request