Home

Enterprise Data

Feature Engineering

Frameworks

Storage connectors

Enterprise Feature Store

Govern & Monitor

Serve

Share & Re-use

Create

Project Based Collaboration

Write API

Feature Groups

External Feature Groups

Read API

Feature Views

Training Data

Feature Vectors

Search, Versioning, Statistics, Monitoring

Provenance & Lineage

Enterprise AI

MLOps

Experiments
& Model Training

Hopsworks is a data platform for ML with a Python-centric Feature Store and MLOps capabilities. Hopsworks is a modular platform. You can use it as a standalone Feature Store, you can use it to manage, govern, and serve your models, and you can even use it to develop and operate feature, training and inference pipelines. Hopsworks brings collaboration for ML teams, providing a secure, governed platform for developing, managing, and sharing ML assets - features, models, training data, batch scoring data, logs, and more.

Python-Centric Feature Store#

Hopsworks is widely used as a standalone Feature Store. Hopsworks breaks the monolithic model development pipeline into separate feature and training pipelines, enabling both feature reuse and better tested ML assets. You can develop features by building feature pipelines in any Python (or Spark or Flink) environment, either inside or outside Hopsworks. You can use the Python frameworks you are familiar with to build production feature pipelines. You can compute aggregations in Pandas, validate feature data with Great Expectations, reduce your data dimensionality with embeddings and PCA, test your feature logic and features end-to-end with PyTest, and transform your categorical and numerical features with Scikit-Learn, TensorFlow, and PyTorch. You can orchestrate your feature pipelines with your Python framework of choice, including Hopsworks' own Airflow support.

The Widest Feature Store Capabilities#

Hopsworks Feature Store also supports feature pipelines in PySpark, Spark, Flink, and SQL. Offline features can either be stored in Hopsworks, as Hudi tables on object storage, or in external data lakehouses (Snowflake, Databricks, Redshift, BigQuery, any JDBC-enabled platform) via External Feature Groups. Online features are served by RonDB, developed by Hopsworks as the lowest latency, highest throughput, highest availability data store for your features.

MLOps on Hopsworks#

Hopsworks provides model serving capabilities through KServe, with additional support for feature/prediction logging to Kafka (also part of Hopsworks), and secure, low-latency model deployments via Istio. Hopsworks also has a Model Registry for KServe, with support for versioning both models and model assets (such as KServe transformers). Hopsworks also includes a vector database to provide similarity search capabilities for embeddings, based on OpenSearch.

Project-based Multi-Tenancy and Team Collaboration#

Hopsworks provides projects as a secure sandbox in which teams can collaborate and share ML assets. Hopsworks' unique multi-tenant project model even enables sensitive data to be stored in a shared cluster, while still providing fine-grained sharing capabilities for ML assets across project boundaries. Projects can be used to structure teams so that they have end-to-end responsibility from raw data to managed features and models. Projects can also be used to create development, staging, and production environments for data teams. All ML assets support versioning, lineage, and provenance provide all Hopsworks users with a complete view of the MLOps life cycle, from feature engineering through model serving.

Development and Operations#

Hopsworks provides a FTI (feature/training/inference) pipeline architecture for ML systems. Each part of the pipeline is defined in a Hopsworks job which corresponds to a Jupyter notebook, a python script or a jar. The production pipelines are then orchestrated with Airflow which is bundled in Hopsworks. Hopsworks provides several python environments that can be used and customized for each part of the FTI pipeline, for example switching between using PyTorch or TensorFlow in the training pipeline. You can train models on as many GPUs as are installed in a Hopsworks cluster and easily share them among users. You can also run Spark, Spark Streaming, or Flink programs on Hopsworks. JupyterLab is also bundled which can be used to run Python and Spark interactively.

Available on any Platform#

Hopsworks is available to be installed on a kubernetes cluster in the cloud on AWS, Azure, and GCP, and On-Prem (Ubuntu/Redhat compatible), even in air-gapped data centers. Hopsworks is also available as a serverless platform that manages and serves both your features and models.

Join the community#

Ask questions and give us feedback in the Hopsworks Community
Follow us on Twitter
Check out all our latest product releases
Join our public slack-channel

Contribute#

We are building the most complete and modular ML platform available in the market, and we count on your support to continuously improve Hopsworks. Feel free to give us suggestions, report bugs and add features to our library anytime.

Open-Source#

Hopsworks is available under the AGPL-V3 license. In plain English this means that you are free to use Hopsworks and even build paid services on it, but if you modify the source code, you should also release back your changes and any systems built around it as AGPL-V3.

We're the best at what we do, and we strive to keep the same standard for our community! Our many thanks to the contributors of Hopsworks.