Contributing
Python development setup#
- Fork and clone the repository.
- Create a new Python environment with your favourite environment manager (e.g. virtualenv or conda) and Python 3.9 (newer versions will cause a library conflict in auto_doc.py).
- Install the repository in editable mode with development dependencies:
cd python
pip install -e ".[python,dev]"
- Install pre-commit and then activate its hooks. pre-commit is a framework for managing and maintaining multi-language pre-commit hooks. The Feature Store uses pre-commit to enforce code style and code formatting through ruff. Run the following commands from the python directory:
cd python
pip install --user pre-commit
pre-commit install
Afterwards, pre-commit will run whenever you commit.
- To run formatting and code-style checks separately, you can configure your IDE, such as VSCode, to use ruff, or run it via the command line:
# linting
ruff check python --fix
# formatting
ruff format python
Python documentation#
We follow a few best practices for writing the Python documentation:
- Use the Google docstring style:
"""[One Line Summary]
[Extended Summary]
[!!! example
    import xyz
]
# Arguments
    arg1: Type[, optional]. Description[, defaults to `default`]
    arg2: Type[, optional]. Description[, defaults to `default`]
# Returns
    Type. Description.
# Raises
    Exception. Description.
"""
If Python 3 type annotations are used, they are inserted automatically.
- Feature store entity engine methods (e.g. FeatureGroupEngine) only require a single-line docstring.
- REST API implementations (e.g. FeatureGroupApi) should be fully documented with docstrings without defaults.
- Public API such as metadata objects should be fully documented with defaults.
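As an illustration of the docstring template above, here is a minimal sketch of a fully documented function. The function `select_features` and its parameters are hypothetical and not part of HSFS; only the section layout follows the template.

```python
from typing import List, Optional


def select_features(names: List[str], limit: Optional[int] = None) -> List[str]:
    """Select unique feature names, optionally truncating the result.

    Duplicates are removed while the order of first occurrence is preserved.

    !!! example
        select_features(["a", "b", "a"], limit=2)

    # Arguments
        names: List[str]. Feature names to select from.
        limit: int, optional. Maximum number of names to return, defaults to `None`.

    # Returns
        List[str]. De-duplicated feature names.

    # Raises
        ValueError. If `limit` is negative.
    """
    # Hypothetical helper used only to demonstrate the docstring layout.
    if limit is not None and limit < 0:
        raise ValueError("limit must be non-negative")
    unique = list(dict.fromkeys(names))  # dict keys keep insertion order
    return unique if limit is None else unique[:limit]
```

Because the signature carries type annotations, the types in the `# Arguments` section could be left to the tooling, as noted above; they are spelled out here only to show the full template.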
Setup and Build Documentation#
We use mkdocs together with mike (for versioning) to build the documentation and a plugin called keras-autodoc to auto generate Python API documentation from docstrings.
Background about mike: mike builds the documentation and commits it as a new directory to the gh-pages branch. Each directory corresponds to one version of the documentation. Additionally, mike maintains a JSON file in the root of gh-pages with the mappings of versions/aliases for each of the directories available. With aliases you can define extra names like dev or latest, to indicate unstable and stable releases, respectively.
- Currently we are using our own version of keras-autodoc:
pip install git+https://github.com/logicalclocks/keras-autodoc
- Install HSFS with docs extras:
pip install -e ".[python,dev,docs]"
- To build the docs, first run the auto doc script:
cd ..
python auto_doc.py
Option 1: Build only current version of docs#
- Either build the docs, or serve them dynamically:
Note: Links and pictures might not resolve properly later on when checking with this build. The docs are deployed with versioning on docs.hopsworks.ai, so another level is added to all paths, e.g. docs.hopsworks.ai/[version-or-alias]. Relative links should not be affected by this; however, building the docs with versioning (Option 2) is recommended.
mkdocs build
# or
mkdocs serve
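To see why the extra path level matters for links, the following sketch resolves a relative and a root-absolute link against a versioned page URL using the standard library. The page path and link targets are hypothetical; only the docs.hopsworks.ai/[version-or-alias] layout comes from the note above.

```python
from urllib.parse import urljoin

# Hypothetical page on the versioned site; "2.1.5" is the added version level.
PAGE_URL = "https://docs.hopsworks.ai/2.1.5/generated/feature_group/"


def resolve_link(link: str, page_url: str = PAGE_URL) -> str:
    """Resolve a link found on a page the way a browser would."""
    return urljoin(page_url, link)


# A relative link stays below the version level.
relative = resolve_link("../api/")
# A root-absolute link drops the version level and may point nowhere.
absolute = resolve_link("/generated/api/")

print(relative)
print(absolute)
```

In this sketch the relative link keeps the `2.1.5` segment while the root-absolute link loses it, which is the kind of breakage an unversioned local build will not reveal.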
Option 2 (Preferred): Build multi-version doc with mike#
Versioning on docs.hopsworks.ai#
On docs.hopsworks.ai we implement the following versioning scheme:
- current master branches (e.g. of hsfs corresponding to master of Hopsworks): rendered as current Hopsworks snapshot version, e.g. 2.2.0-SNAPSHOT [dev], where dev is an alias to indicate that this is an unstable version.
- the latest release: rendered with full current version, e.g. 2.1.5 [latest], with the latest alias to indicate that this is the latest stable release.
- previous stable releases: rendered without alias, e.g. 2.1.4.
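The scheme above can be sketched in a few lines of Python. The JSON layout below (a list of entries with `version`, `title` and `aliases` keys) is an assumption about the file mike keeps in the root of gh-pages, and the helper names are hypothetical; the version numbers are the examples from the list.

```python
import json
from typing import List, Optional

# Assumed shape of the versions file maintained by mike (not verified here).
VERSIONS_JSON = """
[
  {"version": "2.2.0-SNAPSHOT", "title": "2.2.0-SNAPSHOT", "aliases": ["dev"]},
  {"version": "2.1.5", "title": "2.1.5", "aliases": ["latest"]},
  {"version": "2.1.4", "title": "2.1.4", "aliases": []}
]
"""


def render_labels(raw: str) -> List[str]:
    """Render each version as 'version [alias]' or plain 'version'."""
    labels = []
    for entry in json.loads(raw):
        aliases = entry.get("aliases", [])
        suffix = f" [{', '.join(aliases)}]" if aliases else ""
        labels.append(entry["version"] + suffix)
    return labels


def resolve(raw: str, name: str) -> Optional[str]:
    """Map a version or alias to the version directory it refers to."""
    for entry in json.loads(raw):
        if name == entry["version"] or name in entry.get("aliases", []):
            return entry["version"]
    return None


print(render_labels(VERSIONS_JSON))
print(resolve(VERSIONS_JSON, "latest"))
```

An alias is therefore just a second name for one version directory, which is why moving `latest` to a new release does not touch the older directories.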
Build Instructions#
- For this you can either check out and make a local copy of the upstream/gh-pages branch, where mike maintains the current state of docs.hopsworks.ai, or just build documentation for the branch you are updating:
Building one branch:
Checkout your dev branch with modified docs:
git checkout [dev-branch]
Generate API docs if necessary:
python auto_doc.py
Build docs with a version and alias:
mike deploy [version] [alias] --update-alias
# for example, if you are updating documentation to be merged to master,
# which will become the new SNAPSHOT version:
mike deploy 2.2.0-SNAPSHOT dev --update-alias
# if you are updating docs of the latest stable release branch
mike deploy [version] latest --update-alias
# if you are updating docs of a previous stable release branch
mike deploy [version]
If no gh-pages branch existed in your local repository, this will have created it.
Important: If no previous docs were built, you will have to choose a default version to be loaded as the index, as follows:
mike set-default [version-or-alias]
You can now checkout the gh-pages branch and serve:
git checkout gh-pages
mike serve
You can also list all available versions/aliases:
mike list
Delete and reset your local gh-pages branch:
mike delete --all
# or delete single version
mike delete [version-or-alias]
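The deploy variants in these build instructions differ only in whether an alias is passed. A small, hypothetical helper that assembles the command line makes that rule explicit; it only builds the argument list and does not run mike. The flag spelling is copied from the commands above.

```python
from typing import List, Optional


def mike_deploy_args(version: str, alias: Optional[str] = None) -> List[str]:
    """Assemble the arguments for one `mike deploy` call (hypothetical helper)."""
    args = ["mike", "deploy", version]
    if alias is not None:
        # With an alias such as "dev" or "latest", also move the alias
        # to the version being deployed.
        args += [alias, "--update-alias"]
    return args


# Docs for master, which becomes the new SNAPSHOT version.
print(" ".join(mike_deploy_args("2.2.0-SNAPSHOT", "dev")))
# Docs for a previous stable release: no alias.
print(" ".join(mike_deploy_args("2.1.4")))
```

The resulting list could be handed to `subprocess.run` from the repository root if you prefer to script the build.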
Adding new API documentation#
To add new documentation for APIs, you need to add information about the method/class to document to the auto_doc.py script:
PAGES = {
    "connection.md": [
        "hsfs.connection.Connection.connection"
    ],
    "new_template.md": [
        "module",
        "xyz.asd"
    ]
}
Now you can add a template markdown file to the docs/templates directory with the name you specified in the auto-doc script. The new_template.md file should contain a tag to identify the place at which the API documentation should be inserted:
## The XYZ package
{{module}}
Some extra content here.
!!! example
```python
import xyz
```
{{xyz.asd}}
Finally, run the auto_doc.py script, as described above, to update the documentation.
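Before running the script, it can help to check that every element listed in PAGES has a matching `{{...}}` tag in its template. The sketch below is a hypothetical consistency check, not part of auto_doc.py or keras-autodoc; it assumes tags are written exactly as `{{element}}` with no extra whitespace.

```python
from typing import List

# Entry and template mirror the new_template.md example above.
PAGES = {"new_template.md": ["module", "xyz.asd"]}

TEMPLATE = """## The XYZ package
{{module}}
Some extra content here.
{{xyz.asd}}
"""


def missing_tags(template: str, elements: List[str]) -> List[str]:
    """Return the elements that have no `{{element}}` tag in the template."""
    return [e for e in elements if "{{" + e + "}}" not in template]


for page, elements in PAGES.items():
    # In a real check the template text would be read from docs/templates.
    missing = missing_tags(TEMPLATE, elements)
    print(page, "missing tags:", missing)
```

An empty list means each documented element has a place to be inserted; a non-empty list names the tags that still need to be added to the template.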
For information about Markdown syntax and the available admonitions, highlighting, etc., see the Material for MkDocs theme's reference documentation.