Hopsworks Client#
hopsworks is the Python API for interacting with a Hopsworks cluster. Don't have a Hopsworks cluster just yet? Register an account on Hopsworks Serverless and get started for free. Once connected to your project, you can:
- Insert dataframes into the online or offline store, create training datasets, or serve real-time feature vectors in the Feature Store via the Feature Store API. Already have data somewhere that you want to import? Check out our Storage Connectors documentation.
- Register ML models in the model registry and deploy them with model serving via the Machine Learning API.
- Manage environments, executions, Kafka topics, and more once you deploy your own Hopsworks cluster, either on-prem or in the cloud. Hopsworks is open-source and has its own Community Edition.
Our tutorials cover a wide range of use cases and examples of what you can build using Hopsworks.
Getting Started On Hopsworks#
Once you have created a project on Hopsworks Serverless and generated a new API key, use your favourite virtualenv and package manager to install the library:
pip install hopsworks
Fire up a notebook and connect to your project. You will be prompted to enter your newly created API key:
import hopsworks
project = hopsworks.login()
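For scripts or scheduled jobs where no interactive prompt is available, the key can be passed explicitly instead. The following is a minimal sketch, assuming the key is stored in an environment variable and that `hopsworks.login` accepts the `api_key_value` and `project` keyword arguments. The variable names `HOPSWORKS_API_KEY` and `HOPSWORKS_PROJECT` are choices made for this example, and `build_login_kwargs` and `login_from_env` are hypothetical helpers, not part of the library.

```python
import os


def build_login_kwargs(env=None):
    """Collect keyword arguments for hopsworks.login() from environment variables.

    The variable names are assumptions made for this sketch.
    """
    env = os.environ if env is None else env
    api_key = env.get("HOPSWORKS_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("Set HOPSWORKS_API_KEY before running non-interactively")
    kwargs = {"api_key_value": api_key}
    # The project name is optional; only pass it when it is set.
    project = env.get("HOPSWORKS_PROJECT", "").strip()
    if project:
        kwargs["project"] = project
    return kwargs


def login_from_env():
    """Log in without a prompt; hopsworks is imported lazily on first use."""
    import hopsworks

    return hopsworks.login(**build_login_kwargs())
```

Calling `project = login_from_env()` would then stand in for the interactive `hopsworks.login()` call above. Keeping the key in the environment rather than in the notebook avoids committing it to version control.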
Access the Feature Store of your project to use as a central repository for your feature data. Use your favourite data engineering library (pandas, polars, Spark, etc.) to insert data into the Feature Store, create training datasets, or serve real-time feature vectors. Want to predict the likelihood of e-scooter accidents in real time? Here's how you can do it:
fs = project.get_feature_store()
# Write to Feature Groups
bike_ride_fg = fs.get_or_create_feature_group(
    name="bike_rides",
    version=1,
    primary_key=["ride_id"],
    event_time="activation_time",
    online_enabled=True,
)
bike_ride_fg.insert(bike_rides_df)
# Read from Feature Views
profile_fg = fs.get_feature_group("user_profile", version=1)
bike_ride_fv = fs.get_or_create_feature_view(
    name="bike_rides_view",
    version=1,
    query=bike_ride_fg.select_except(["ride_id"]).join(
        profile_fg.select(["age", "has_license"]), on="user_id"
    )
)
bike_rides_jan_2021_df = bike_ride_fv.get_batch_data(
    start_date="2021-01-01",
    end_date="2021-01-31"
)
# Create a training dataset
version, job = bike_ride_fv.create_train_test_split(
    test_size=0.2,
    description='Description of a dataset',
    # you can have different data formats such as csv, tsv, tfrecord, parquet and others
    data_format='csv'
)
# Predict the probability of accident in real-time using new data + context data
bike_ride_fv.init_serving()
while True:
    new_ride_vector = poll_ride_queue()
    feature_vector = bike_ride_fv.get_online_feature_vector(
        {"user_id": new_ride_vector["user_id"]},
        passed_features=new_ride_vector
    )
    accident_probability = model.predict(feature_vector)
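The serving loop above calls `poll_ride_queue()` and `model.predict()`, neither of which is defined in the snippet. The sketch below shows one way to stub them for a local dry run using only the standard library. Everything in it is hypothetical: the queue, the `speed_kmh` field, the `ThresholdModel` class and its scoring rule are placeholders, and `lookup_context` only imitates the feature view lookup with an in-memory dict, so the shape of its return value may differ from what the real call returns.

```python
import queue

# Hypothetical in-process queue standing in for a real event source.
ride_queue = queue.Queue()


def poll_ride_queue(timeout=1.0):
    """Return the next ride event, or None if nothing arrives within `timeout`."""
    try:
        return ride_queue.get(timeout=timeout)
    except queue.Empty:
        return None


# Imitates context features that would be stored in the online Feature Store.
USER_PROFILES = {42: {"age": 17, "has_license": False}}


def lookup_context(user_id, passed_features):
    """Merge stored profile features with features supplied at request time."""
    features = dict(USER_PROFILES.get(user_id, {}))
    features.update({k: v for k, v in passed_features.items() if k != "user_id"})
    return features


class ThresholdModel:
    """Placeholder model with a made-up rule; swap in a trained model."""

    def predict(self, features):
        risk = 0.1
        if not features.get("has_license", True):
            risk += 0.4
        if features.get("speed_kmh", 0) > 20:
            risk += 0.3
        return min(risk, 1.0)


def drain(model, timeout=0.01):
    """Score every queued ride, then stop instead of looping forever."""
    scores = []
    while (ride := poll_ride_queue(timeout=timeout)) is not None:
        features = lookup_context(ride["user_id"], ride)
        scores.append(model.predict(features))
    return scores
```

Unlike the `while True` loop, `drain` returns once the queue is empty, which makes the flow easy to exercise in a notebook before wiring in the real feature view and model.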
Or you can use the Machine Learning API to register models and deploy them for serving:
# Model registry: register and version trained models
mr = project.get_model_registry()
# Model serving: deploy registered models
ms = project.get_model_serving()
Tutorials#
Need more inspiration or want to learn more about the Hopsworks platform? Check out our tutorials.
Documentation#
Documentation is available at Hopsworks Documentation.
Issues#
For general questions about the usage of Hopsworks and the Feature Store, please open a topic on Hopsworks Community.
Please report any issues using GitHub issue tracking.
Contributing#
If you would like to contribute to this library, please see the Contribution Guidelines.