hsfs.core.data_source #

DataSource #

Metadata object used to provide data source information.

You can obtain data sources using FeatureStore.get_data_source.

The DataSource class encapsulates the details of a data source that can be used for reading or writing data. It supports various types of sources, such as SQL queries, database tables, file paths, and storage connectors.

PARAMETER DESCRIPTION
query

SQL query string for the data source, if applicable.

TYPE: str | None DEFAULT: None

database

Name of the database containing the data source.

TYPE: str | None DEFAULT: None

group

Group or schema name for the data source.

TYPE: str | None DEFAULT: None

table

Table name for the data source.

TYPE: str | None DEFAULT: None

path

File system path for the data source.

TYPE: str | None DEFAULT: None

storage_connector

Storage connector object that holds the configuration for accessing the data source.

TYPE: sc.StorageConnector | dict[str, Any] | None DEFAULT: None

metrics

List of metric column names for the data source.

TYPE: list[str] | None DEFAULT: None

dimensions

List of dimension column names for the data source.

TYPE: list[str] | None DEFAULT: None

rest_endpoint

REST endpoint configuration for the data source.

TYPE: RestEndpointConfig | dict | None DEFAULT: None

query property writable #

query: str | None

Get or set the SQL query string for the data source.

database property writable #

database: str | None

Get or set the database name for the data source.

group property writable #

group: str | None

Get or set the group/schema name for the data source.

table property writable #

table: str | None

Get or set the table name for the data source.

path property writable #

path: str | None

Get or set the file system path for the data source.

storage_connector property writable #

storage_connector: sc.StorageConnector | None

Get or set the storage connector for the data source.

get_databases #

get_databases() -> list[str]

Retrieve the list of available databases.

Example
# connect to the Feature Store
fs = ...

data_source = fs.get_data_source("test_data_source")

databases = data_source.get_databases()
RETURNS DESCRIPTION
list[str]

A list of database names available in the data source.

get_tables #

get_tables(database: str | None = None) -> list[DataSource]

Retrieve the list of tables from the specified database.

Example
# connect to the Feature Store
fs = ...

data_source = fs.get_data_source("test_data_source")

tables = data_source.get_tables()
PARAMETER DESCRIPTION
database

The name of the database to list tables from. If not provided, the default database is used.

TYPE: str | None DEFAULT: None

RETURNS DESCRIPTION
list[DataSource]

A list of DataSource objects representing the tables.

get_data #

get_data(use_cached: bool = True) -> dsd.DataSourceData

Retrieve the data from the data source.

Example
# connect to the Feature Store
fs = ...

table = fs.get_data_source("test_data_source").get_tables()[0]

data = table.get_data()
PARAMETER DESCRIPTION
use_cached

Whether to use cached data when it is available. Caching is only supported for CRM and REST connectors. Defaults to True.

TYPE: bool DEFAULT: True

RETURNS DESCRIPTION
dsd.DataSourceData

An object containing the data retrieved from the data source.
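The use_cached flag can be wrapped in a small helper when a caller wants to force a fresh read. The sketch below is one possible shape; load_data, its refresh flag, and _FakeTable are hypothetical names introduced for illustration, and the fake only records how get_data was called.

```python
def load_data(table, refresh: bool = False):
    """Fetch data from a table data source, optionally bypassing the cache.

    `refresh` is a hypothetical convenience flag; it is inverted and
    passed on as the documented `use_cached` argument.
    """
    return table.get_data(use_cached=not refresh)


class _FakeTable:
    """Hypothetical stand-in that records each use_cached value it receives."""

    def __init__(self):
        self.calls = []

    def get_data(self, use_cached=True):
        self.calls.append(use_cached)
        return "cached" if use_cached else "fresh"


fake_table = _FakeTable()
first = load_data(fake_table)                 # use_cached=True
second = load_data(fake_table, refresh=True)  # use_cached=False
print(first, second, fake_table.calls)
```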

get_metadata #

get_metadata() -> dict

Retrieve metadata information about the data source.

Example
# connect to the Feature Store
fs = ...

table = fs.get_data_source("test_data_source").get_tables()[0]

metadata = table.get_metadata()
RETURNS DESCRIPTION
dict

A dictionary containing metadata about the data source.

get_feature_groups_provenance #

get_feature_groups_provenance() -> Links | None

Get the feature groups generated from this data source, based on explicit provenance.

These feature groups can be accessible or inaccessible. Explicit provenance does not track links to deleted feature groups, so deleted will always be empty. For inaccessible feature groups, only minimal information is returned.

RETURNS DESCRIPTION
Links | None

The feature groups generated using this data source, or None if none were created.

RAISES DESCRIPTION
hopsworks.client.exceptions.RestAPIError

In case the backend encounters an issue.
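Because this method may return None, callers usually need a guard before reading the result. The sketch below shows one way to do that. It assumes the Links object exposes accessible and inaccessible collections, which matches the wording above but is not confirmed by this page; summarize_links is a hypothetical helper, and SimpleNamespace stands in for a real Links object.

```python
from types import SimpleNamespace


def summarize_links(links) -> dict:
    """Count accessible and inaccessible entries, treating None as empty.

    Assumes `links` exposes `accessible` and `inaccessible` collections;
    adjust the attribute names if your hsfs version differs.
    """
    if links is None:
        return {"accessible": 0, "inaccessible": 0}
    return {
        "accessible": len(getattr(links, "accessible", None) or []),
        "inaccessible": len(getattr(links, "inaccessible", None) or []),
    }


# Hypothetical stand-in for the object returned by the provenance call.
fake_links = SimpleNamespace(accessible=["fg_a", "fg_b"], inaccessible=["fg_c"])
summary = summarize_links(fake_links)
empty_summary = summarize_links(None)
print(summary, empty_summary)
```

With a real data source, the call would look like summarize_links(data_source.get_feature_groups_provenance()).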

get_feature_groups #

get_feature_groups() -> list[fg.FeatureGroup]

Get the feature groups using this data source, based on explicit provenance.

Only the accessible feature groups are returned. To also retrieve inaccessible feature groups, use the base method, DataSource.get_feature_groups_provenance.

RETURNS DESCRIPTION
list[fg.FeatureGroup]

List of feature groups.

get_training_datasets_provenance #

get_training_datasets_provenance() -> Links

Get the training datasets generated from this data source, based on explicit provenance.

These training datasets can be accessible or inaccessible. Explicit provenance does not track links to deleted training datasets, so deleted will always be empty. For inaccessible training datasets, only minimal information is returned.

RETURNS DESCRIPTION
Links

The training datasets generated using this data source, or None if none were created.

RAISES DESCRIPTION
hopsworks.client.exceptions.RestAPIError

In case the backend encounters an issue.

get_training_datasets #

get_training_datasets() -> list[TrainingDataset]

Get the training datasets using this data source, based on explicit provenance.

Only the accessible training datasets are returned. To also retrieve inaccessible training datasets, use the base method, DataSource.get_training_datasets_provenance.

RETURNS DESCRIPTION
list[TrainingDataset]

List of training datasets.