Datasets API#
Handle#
get_dataset_api#
Project.get_dataset_api()
Get the dataset API for the project.
Returns
DatasetApi
: The Datasets API handle
Methods#
add#
DatasetApi.add(path, name, value)
Attach a name/value tag to a file or directory in the Hopsworks Filesystem.
A tag consists of a name/value pair. Tag names are unique identifiers. The value of a tag can be any valid json - primitives, arrays or json objects.
Arguments
- path
str
: path to add the tag
- name
str
: name of the tag to be added
- value
str
: value of the tag to be added
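A minimal sketch of attaching a tag (the path, tag name, and value below are illustrative, not taken from the original docs):
import hopsworks
project = hopsworks.login()
dataset_api = project.get_dataset_api()
# attach a tag named "reviewed" with value "true" to a file
dataset_api.add("Resources/myfile.txt", "reviewed", "true")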
chmod#
DatasetApi.chmod(remote_path, permissions)
Chmod operation on file or directory in datasets.
Arguments
- remote_path
str
: path to chmod
- permissions
str
: permissions string, for example u+x
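A minimal sketch, assuming an illustrative script at Resources/run.sh:
import hopsworks
project = hopsworks.login()
dataset_api = project.get_dataset_api()
# make the script executable for its owner
dataset_api.chmod("Resources/run.sh", "u+x")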
copy#
DatasetApi.copy(source_path, destination_path, overwrite=False)
Copy a file or directory in the Hopsworks Filesystem.
import hopsworks
project = hopsworks.login()
dataset_api = project.get_dataset_api()
directory_path = dataset_api.copy("Resources/myfile.txt", "Logs/myfile.txt")
Arguments
- source_path
str
: the source path to copy
- destination_path
str
: the destination path
- overwrite
bool
: overwrite destination if exists
Raises
hopsworks.client.exceptions.RestAPIError
: If the backend encounters an error when handling the request
delete#
DatasetApi.delete(path, name)
Delete a tag.
Tag names are unique identifiers.
Arguments
- path
str
: path to delete the tag from
- name
str
: name of the tag to be removed
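A minimal sketch (the path and tag name are illustrative):
import hopsworks
project = hopsworks.login()
dataset_api = project.get_dataset_api()
# remove the tag named "reviewed" from the file
dataset_api.delete("Resources/myfile.txt", "reviewed")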
download#
DatasetApi.download(path, local_path=None, overwrite=False, chunk_size=1048576)
Download a file from the Hopsworks Filesystem to the current working directory.
import hopsworks
project = hopsworks.login()
dataset_api = project.get_dataset_api()
downloaded_file_path = dataset_api.download("Resources/my_local_file.txt")
Arguments
- path
str
: path in Hopsworks filesystem to the file
- local_path
str | None
: path where to download the file in the local filesystem
- overwrite
bool | None
: overwrite local file if exists
- chunk_size
int
: download chunk size in bytes. Default 1 MB
Returns
str
: Path to downloaded file
Raises
hopsworks.client.exceptions.RestAPIError
: If the backend encounters an error when handling the request
exists#
DatasetApi.exists(path)
Check if a file exists in the Hopsworks Filesystem.
Arguments
- path
str
: path to check
Returns
bool
: True if exists, otherwise False
Raises
hopsworks.client.exceptions.RestAPIError
: If the backend encounters an error when handling the request
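A minimal sketch (the path is illustrative):
import hopsworks
project = hopsworks.login()
dataset_api = project.get_dataset_api()
if dataset_api.exists("Resources/myfile.txt"):
    print("file is present")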
get#
DatasetApi.get(path)
Get the metadata of a dataset.
Arguments
- path
str
: path to the dataset
Returns
dict
: dataset metadata
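A minimal sketch (the path is illustrative):
import hopsworks
project = hopsworks.login()
dataset_api = project.get_dataset_api()
# fetch the metadata dict for a path
metadata = dataset_api.get("Resources/myfile.txt")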
get_tags#
DatasetApi.get_tags(path, name=None)
Get the tags.
Gets all tags if no tag name is specified.
Arguments
- path
str
: path to get the tags for
- name
str | None
: tag name
Returns
dict
: dict of tag name/values
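A minimal sketch (the path is illustrative):
import hopsworks
project = hopsworks.login()
dataset_api = project.get_dataset_api()
# get all tags attached to the file as a dict of name/value pairs
tags = dataset_api.get_tags("Resources/myfile.txt")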
list#
DatasetApi.list(remote_path, sort_by=None, offset=0, limit=1000)
List all files in a directory in datasets.
Arguments
- remote_path
str
: path to list
- sort_by
str | None
: sort string
- limit
int
: max number of returned files
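A minimal sketch (the path and limit are illustrative):
import hopsworks
project = hopsworks.login()
dataset_api = project.get_dataset_api()
# list up to 100 entries under the Resources dataset
files = dataset_api.list("Resources", limit=100)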
list_files#
DatasetApi.list_files(path, offset, limit)
mkdir#
DatasetApi.mkdir(path)
Create a directory in the Hopsworks Filesystem.
import hopsworks
project = hopsworks.login()
dataset_api = project.get_dataset_api()
directory_path = dataset_api.mkdir("Resources/my_dir")
Arguments
- path
str
: path to directory
Returns
str
: Path to created directory
Raises
hopsworks.client.exceptions.RestAPIError
: If the backend encounters an error when handling the request
move#
DatasetApi.move(source_path, destination_path, overwrite=False)
Move a file or directory in the Hopsworks Filesystem.
import hopsworks
project = hopsworks.login()
dataset_api = project.get_dataset_api()
directory_path = dataset_api.move("Resources/myfile.txt", "Logs/myfile.txt")
Arguments
- source_path
str
: the source path to move
- destination_path
str
: the destination path
- overwrite
bool
: overwrite destination if exists
Raises
hopsworks.client.exceptions.RestAPIError
: If the backend encounters an error when handling the request
path_exists#
DatasetApi.path_exists(remote_path)
Check if a path exists in datasets.
Arguments
- remote_path
str
: path to check
Returns
bool
: whether the path exists
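A minimal sketch (the path is illustrative):
import hopsworks
project = hopsworks.login()
dataset_api = project.get_dataset_api()
if dataset_api.path_exists("Resources/my_dir"):
    print("path is present")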
read_content#
DatasetApi.read_content(path, dataset_type="DATASET")
remove#
DatasetApi.remove(path)
Remove a path in the Hopsworks Filesystem.
Arguments
- path
str
: path to remove
Raises
hopsworks.client.exceptions.RestAPIError
: If the backend encounters an error when handling the request
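A minimal sketch (the path is illustrative):
import hopsworks
project = hopsworks.login()
dataset_api = project.get_dataset_api()
# remove a file from the Resources dataset
dataset_api.remove("Resources/myfile.txt")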
rm#
DatasetApi.rm(remote_path)
Remove a path in the Hopsworks Filesystem.
Arguments
- remote_path
str
: path to remove
Raises
hopsworks.client.exceptions.RestAPIError
: If the backend encounters an error when handling the request
unzip#
DatasetApi.unzip(remote_path, block=False, timeout=120)
Unzip an archive in the dataset.
Arguments
Arguments
- remote_path
str
: path to file or directory to unzip.
- block
bool
: if the operation should be blocking until complete, defaults to False.
- timeout
int | None
: timeout in seconds for the blocking, defaults to 120; if None, the blocking is unbounded.
Returns
bool
: whether the operation completed in the specified timeout; if non-blocking, always returns True.
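A minimal sketch (the archive path and timeout are illustrative):
import hopsworks
project = hopsworks.login()
dataset_api = project.get_dataset_api()
# unzip the archive and block for up to 10 minutes until it completes
finished = dataset_api.unzip("Resources/my_archive.zip", block=True, timeout=600)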
upload#
DatasetApi.upload(
local_path,
upload_path,
overwrite=False,
chunk_size=10485760,
simultaneous_uploads=3,
simultaneous_chunks=3,
max_chunk_retries=1,
chunk_retry_interval=1,
)
Upload a file or directory to the Hopsworks filesystem.
import hopsworks
project = hopsworks.login()
dataset_api = project.get_dataset_api()
# upload a file to Resources dataset
uploaded_file_path = dataset_api.upload("my_local_file.txt", "Resources")
# upload a directory to Resources dataset
uploaded_file_path = dataset_api.upload("my_dir", "Resources")
Arguments
- local_path
str
: local path to the file or directory to upload, can be relative or absolute
- upload_path
str
: path to the directory in the Hopsworks Filesystem where the file will be uploaded
- overwrite
bool
: overwrite file or directory if exists
- chunk_size
int
: upload chunk size in bytes. Default 10 MB
- simultaneous_chunks
int
: number of simultaneous chunks to upload for each file upload. Default 3
- simultaneous_uploads
int
: number of simultaneous files to be uploaded for directories. Default 3
- max_chunk_retries
int
: maximum retries for a chunk. Default 1
- chunk_retry_interval
int
: chunk retry interval in seconds. Default 1
Returns
str
: Path to uploaded file or directory
Raises
hopsworks.client.exceptions.RestAPIError
: If the backend encounters an error when handling the request
upload_feature_group#
DatasetApi.upload_feature_group(feature_group, path, dataframe)
zip#
DatasetApi.zip(remote_path, destination_path=None, block=False, timeout=120)
Zip a file or directory in the dataset.
Arguments
Arguments
- remote_path
str
: path to file or directory to zip.
- destination_path
str | None
: path to upload the zip, defaults to None.
- block
bool
: if the operation should be blocking until complete, defaults to False.
- timeout
int | None
: timeout in seconds for the blocking, defaults to 120; if None, the blocking is unbounded.
Returns
bool
: whether the operation completed in the specified timeout; if non-blocking, always returns True.
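A minimal sketch (the paths are illustrative):
import hopsworks
project = hopsworks.login()
dataset_api = project.get_dataset_api()
# zip a directory and block until the archive is created
finished = dataset_api.zip("Resources/my_dir", destination_path="Resources", block=True)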