

Query objects are strictly generated by HSFS APIs called on Feature Group objects. Users will never construct a Query object using the constructor of the class. For this reason we do not provide the full documentation of the class here.





Perform time travel on the given Query.

This method returns a new Query object at the specified point in time.


The wallclock_time needs to be within the Hudi active timeline. By default, Hudi keeps the last 20 to 30 commits in the active timeline. If you need a longer active timeline, you can override the hoodie.keep.min.commits and hoodie.keep.max.commits options when calling the insert() method.

This can then either be read into a Dataframe or used further to perform joins or construct a training dataset.


  • wallclock_time: datetime.datetime, unix timestamp in seconds (int), or string. The string should be formatted in one of the following formats: %Y%m%d, %Y%m%d%H, %Y%m%d%H%M, or %Y%m%d%H%M%S.


Query. The query object with the applied time travel condition.
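The accepted wallclock_time string layouts map directly to Python's standard strftime codes, so a valid value can be produced from any datetime. A small sketch (the timestamp value is illustrative):

```python
from datetime import datetime

# The accepted string layouts from the parameter description above.
FORMATS = ["%Y%m%d", "%Y%m%d%H", "%Y%m%d%H%M", "%Y%m%d%H%M%S"]

ts = datetime(2023, 5, 17, 14, 30, 5)
for fmt in FORMATS:
    wallclock_time = ts.strftime(fmt)
    print(wallclock_time)  # e.g. "20230517143005" for the last format

# A unix timestamp in seconds (int) is equally valid:
unix_seconds = int(ts.timestamp())
```

Any of these values can then be passed as the wallclock_time argument.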




Apply filter to the feature group.

Selects all features and returns the resulting Query with the applied filter.

from hsfs.feature import Feature

query.filter(Feature("weekly_sales") > 1000)

If you are planning to join the filtered feature group later on with another feature group, make sure to select the filtered feature explicitly from the respective feature group:

query.filter(fg.feature1 == 1).show(10)

Composite filters require parenthesis:

query.filter((fg.feature1 == 1) | (fg.feature2 >= 2))
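The parentheses are needed because Python's bitwise `|` operator binds more tightly than comparison operators, so the unparenthesized expression is grouped differently than intended. This can be seen directly from Python's own parser, independent of HSFS:

```python
import ast

# How Python groups the two variants of the composite filter expression.
unparenthesized = ast.parse("feature1 == 1 | feature2 >= 2", mode="eval").body
parenthesized = ast.parse("(feature1 == 1) | (feature2 >= 2)", mode="eval").body

# Without parentheses, `1 | feature2` is evaluated first, yielding one
# chained comparison rather than the OR of two separate filters.
print(type(unparenthesized).__name__)  # Compare
print(type(parenthesized).__name__)    # BinOp (the intended BitOr)
```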


  • f Union[hsfs.constructor.filter.Filter, hsfs.constructor.filter.Logic]: Filter object.


Query. The query object with the applied filter.









Query.join(sub_query, on=[], left_on=[], right_on=[], join_type="inner", prefix=None)

Join Query with another Query.

If no join keys are specified, Hopsworks will use the maximal matching subset of the primary keys of the feature groups being joined. Only single-level joins are supported; nested joins are not.


  • sub_query hsfs.constructor.query.Query: Right-hand side query to join.
  • on Optional[List[str]]: List of feature names to join on if they are available in both feature groups. Defaults to [].
  • left_on Optional[List[str]]: List of feature names to join on from the left feature group of the join. Defaults to [].
  • right_on Optional[List[str]]: List of feature names to join on from the right feature group of the join. Defaults to [].
  • join_type Optional[str]: Type of join to perform, can be "inner", "outer", "left" or "right". Defaults to "inner".
  • prefix Optional[str]: User provided prefix to avoid feature name clash. Prefix is applied to the right feature group of the query. Defaults to None.


Query: A new Query object representing the join.
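The default join-key selection described above amounts to taking the primary keys shared by both feature groups. A simplified, standalone illustration of that rule (the helper and key names are hypothetical, not part of the HSFS API):

```python
def default_join_keys(left_primary_keys, right_primary_keys):
    """Illustration only: the join keys default to the primary keys
    present in both feature groups, preserving left-side order."""
    right = set(right_primary_keys)
    return [key for key in left_primary_keys if key in right]

# Only "cust_id" appears in both primary key sets, so it becomes
# the default join key.
print(default_join_keys(["cust_id", "store_id"], ["cust_id", "week"]))
# ['cust_id']
```

Passing `on`, or `left_on`/`right_on`, overrides this default.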



Query.pull_changes(wallclock_start_time, wallclock_end_time)


Query.read(online=False, dataframe_type="default", read_options={})

Read the specified query into a DataFrame.

It is possible to specify the storage (online/offline) to read from and the type of the output DataFrame (Spark, Pandas, Numpy, Python Lists).

External Feature Group Engine Support

Spark only

Reading a Query containing an External Feature Group directly into a Pandas DataFrame using Python/Pandas as the engine is not supported; however, you can use the Query API to create Feature Views/Training Data containing External Feature Groups.


  • online Optional[bool]: Read from online storage. Defaults to False.
  • dataframe_type Optional[str]: DataFrame type to return (Spark, Pandas, Numpy, or Python lists). Defaults to "default".
  • read_options Optional[dict]: Optional dictionary with read options for Spark. Defaults to {}.


DataFrame: DataFrame depending on the chosen type.


Query.show(n, online=False)

Show the first N rows of the Query.


  • n int: Number of rows to show.
  • online Optional[bool]: Show from online storage. Defaults to False.