
Hopsworks Client#


hopsworks is the Python API for interacting with a Hopsworks cluster. Don't have a Hopsworks cluster just yet? Register an account on Hopsworks Serverless and get started for free. Once connected to your project, you can:

- Insert dataframes into the online or offline Store, create training datasets or serve real-time feature vectors in the Feature Store via the Feature Store API. Already have data somewhere that you want to import? Check out our Storage Connectors documentation.
- Register ML models in the model registry and deploy them with model serving via the Machine Learning API.
- Manage environments, executions, Kafka topics and more once you deploy your own Hopsworks cluster, either on-prem or in the cloud. Hopsworks is open-source and has its own Community Edition.

Our tutorials cover a wide range of use cases and examples of what you can build using Hopsworks.

Getting Started On Hopsworks#

Once you have created a project on Hopsworks Serverless and generated a new API key, use your favourite virtualenv and package manager to install the library:

pip install hopsworks

Fire up a notebook and connect to your project. You will be prompted to enter your newly created API key:

import hopsworks

project = hopsworks.login()
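In a script or scheduled job there is no prompt to type the key into. A minimal sketch of a non-interactive login, assuming the key is stored in an environment variable and passed to `hopsworks.login` through its `api_key_value` argument. The helper names `login_kwargs_from_env` and `connect`, and the `HOPSWORKS_PROJECT` variable, are illustrative and not part of the library:

```python
import os


def login_kwargs_from_env(env=None):
    """Build keyword arguments for hopsworks.login() from environment variables.

    HOPSWORKS_API_KEY and HOPSWORKS_PROJECT are the variable names assumed
    by this sketch; adapt them to your own setup.
    """
    env = os.environ if env is None else env
    api_key = env.get("HOPSWORKS_API_KEY")
    if not api_key:
        raise RuntimeError("Set HOPSWORKS_API_KEY before running this script")
    kwargs = {"api_key_value": api_key}
    project_name = env.get("HOPSWORKS_PROJECT")
    if project_name:
        # Only pass a project name when one is configured.
        kwargs["project"] = project_name
    return kwargs


def connect():
    # Imported lazily so the helper above works without the library installed.
    import hopsworks

    return hopsworks.login(**login_kwargs_from_env())
```

Calling `project = connect()` then takes the place of the interactive `hopsworks.login()` call. Keeping the key in the environment rather than in the source file avoids committing it by accident.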

Access the Feature Store of your project to use it as a central repository for your feature data. Use your favourite data engineering library (pandas, polars, Spark, etc.) to insert data into the Feature Store, create training datasets or serve real-time feature vectors. Want to predict the likelihood of e-scooter accidents in real time? Here's how you can do it:

fs = project.get_feature_store()

# Write to Feature Groups
bike_ride_fg = fs.get_or_create_feature_group(
  name="bike_rides", 
  version=1, 
  primary_key=["ride_id"], 
  event_time="activation_time",
  online_enabled=True,
)

bike_ride_fg.insert(bike_rides_df)

# Read from Feature Views
profile_fg = fs.get_feature_group("user_profile", version=1)

bike_ride_fv = fs.get_or_create_feature_view(
  name="bike_rides_view", 
  version=1, 
  query=bike_ride_fg.select_except(["ride_id"]).join(
    profile_fg.select(["age", "has_license"]), on=["user_id"]
  ),
)

# Batch data for January 2021
bike_rides_jan_2021_df = bike_ride_fv.get_batch_data(
  start_time="2021-01-01",
  end_time="2021-01-31"
)

# Create a training dataset
version, job = bike_ride_fv.create_train_test_split(
    test_size=0.2,
    description='Description of a dataset',
    # you can have different data formats such as csv, tsv, tfrecord, parquet and others
    data_format='csv'
)

# Predict the probability of accident in real-time using new data + context data
bike_ride_fv.init_serving()

# poll_ride_queue() and model are placeholders for your own event source and trained model
while True:
    new_ride_vector = poll_ride_queue()
    feature_vector = bike_ride_fv.get_feature_vector(
      {"user_id": new_ride_vector["user_id"]},
      passed_features=new_ride_vector
    )
    accident_probability = model.predict(feature_vector)
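The serving loop above relies on two names the snippet does not define: `poll_ride_queue` and `model`. The sketch below shows one way to structure the same loop so that it can be exercised without a cluster. Everything in it is hypothetical scaffolding (`drain`, `score_rides`, the in-memory queue, the fake lookup and the toy prediction rule). In a real service, `lookup_features` would wrap the feature view's online lookup and `predict` would wrap your trained model.

```python
import queue


def drain(ride_queue):
    """Yield rides until the queue is empty (a real service would block instead)."""
    while True:
        try:
            yield ride_queue.get_nowait()
        except queue.Empty:
            return


def score_rides(rides, lookup_features, predict):
    """Return (ride_id, score) pairs for each incoming ride.

    lookup_features(ride) -> feature vector as a list.
    predict(rows) -> one score per row.
    """
    results = []
    for ride in rides:
        vector = lookup_features(ride)
        # Many model libraries expect a batch (2-D input), so wrap the single vector.
        score = predict([vector])[0]
        results.append((ride["ride_id"], score))
    return results


# Stand-ins so the sketch runs without a cluster or a trained model.
ride_queue = queue.Queue()
ride_queue.put({"ride_id": 1, "user_id": 7, "speed_kmh": 31.0})
ride_queue.put({"ride_id": 2, "user_id": 9, "speed_kmh": 12.0})

fake_profiles = {7: [19, False], 9: [42, True]}  # user_id -> [age, has_license]


def fake_lookup(ride):
    # Combine stored context features with the feature passed in at request time.
    return fake_profiles[ride["user_id"]] + [ride["speed_kmh"]]


def fake_predict(rows):
    # Toy rule, not a real model: flag fast rides by unlicensed users.
    return [1.0 if (not row[1] and row[2] > 25) else 0.0 for row in rows]


scores = score_rides(drain(ride_queue), fake_lookup, fake_predict)
```

Passing the lookup and the model in as callables keeps the loop logic testable on its own. Only the two wrappers need to change when moving from the stand-ins to a live feature view and model.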

Or you can use the Machine Learning API to register models and deploy them for serving:

# Register and version trained models
mr = project.get_model_registry()
# Deploy registered models for serving
ms = project.get_model_serving()
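As a rough sketch of what registering a model can look like, the helper below assumes the registry handle exposes `python.create_model(name=..., metrics=..., description=...)` and that the returned object has a `save(path)` method, as described in the Machine Learning API documentation. The `register_model` helper itself and the example name, path and metric values are hypothetical.

```python
def register_model(model_registry, model_dir, name, metrics, description=""):
    """Create a model entry in the registry and upload the files in model_dir.

    model_registry is the handle returned by project.get_model_registry().
    """
    model_meta = model_registry.python.create_model(
        name=name,
        metrics=metrics,
        description=description,
    )
    # Uploads the local directory (or file) containing the serialized model.
    model_meta.save(model_dir)
    return model_meta
```

With a connected project, this could be called as `register_model(mr, "accident_model_dir", "accident_model", {"auc": 0.9})`, where the directory holds the serialized model and the metric value comes from your own evaluation.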

Tutorials#

Need more inspiration or want to learn more about the Hopsworks platform? Check out our tutorials.

Documentation#

Documentation is available at Hopsworks Documentation.

Issues#

For general questions about using Hopsworks and the Feature Store, please open a topic on Hopsworks Community.

Please report any issues using GitHub issue tracking.

Contributing#

If you would like to contribute to this library, please see the Contribution Guidelines.