Query Engine (Trino)

The Query Engine in Hopsworks is powered by Trino, a distributed SQL query engine that allows you to run interactive analytics on your data. Use it to explore feature groups, run ad-hoc queries, and analyze data across your project.

Accessing the Query Engine

Navigate to the Query Engine from your project's left sidebar. The Query Engine interface provides access to the SQL runner, cluster information, and query history.

SQL Runner

The SQL runner is where you write and execute SQL queries against your data.

To run a query:

1. Write your SQL query in the editor.
2. Select the database/catalog you want to query.
3. Click "Run" to execute the query.
4. View the results in the table below the editor.

The SQL runner supports standard SQL syntax and provides auto-completion for databases, tables, and columns.
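The write, run, and read-results flow of the SQL runner can also be sketched in code. The example below is a minimal illustration that assumes a generic Python DB-API 2.0 connection. It uses the standard library's sqlite3 as a stand-in so that it runs without a cluster: it is not a Trino connection, and the `demo_features` table and the `run_query` helper are hypothetical names invented for this sketch.

```python
import sqlite3


def run_query(conn, sql):
    """Execute sql on a DB-API connection and return (column_names, rows)."""
    cur = conn.cursor()
    cur.execute(sql)
    columns = [d[0] for d in cur.description]
    rows = cur.fetchall()
    cur.close()
    return columns, rows


# Stand-in database so the sketch runs anywhere; this is NOT a Trino connection.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo_features (id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO demo_features VALUES (?, ?)",
    [(1, 9.5), (2, 20.0)],
)

# Write the query, run it, then read the result table.
columns, rows = run_query(
    conn,
    "SELECT id, amount FROM demo_features WHERE amount > 10 LIMIT 10",
)
print(columns)
for row in rows:
    print(row)
conn.close()
```

The helper returns column names alongside the rows, mirroring the result table shown below the editor.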

SQL Statement Syntax Help

Need help with SQL syntax? Click the help icon in the SQL runner to open the complete reference for allowed SQL statement syntax, including SELECT statements, functions, data types, operators, and more. The reference is available without leaving the query interface.

Cluster Overview

The cluster overview shows the health and status of your Trino cluster. Here you can monitor:

Active workers: Number of workers currently processing queries
Running queries: Queries currently being executed
Resource utilization: CPU and memory usage across the cluster
Worker status: Health status of individual worker nodes

This information helps you understand cluster performance and capacity.

Queries

The Queries tab displays a history of all executed queries. For each query, you can see:

Query ID: Unique identifier for the query
Status: Completed, failed, or running
Duration: How long the query took to execute
User: Who submitted the query
Timestamp: When the query was run

Click on any query to view detailed execution information.
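If the query history is copied out of the UI, the fields listed above are enough to flag queries that deserve a closer look. The following is a minimal sketch in Python. The `QueryRecord` layout, the lowercase status strings, and the 60-second threshold are assumptions made for illustration, not a Hopsworks or Trino export format.

```python
from dataclasses import dataclass


@dataclass
class QueryRecord:
    # Hypothetical mirror of the columns listed above, not an official schema.
    query_id: str
    status: str        # assumed values: "completed", "failed", "running"
    duration_s: float  # execution time in seconds
    user: str
    timestamp: str


def needs_attention(records, slow_after_s=60.0):
    """Return failed queries and queries slower than the threshold."""
    return [
        r for r in records
        if r.status == "failed" or r.duration_s > slow_after_s
    ]


# Made-up sample history, for illustration only.
history = [
    QueryRecord("q1", "completed", 2.5, "alice", "2024-01-01T10:00:00"),
    QueryRecord("q2", "failed", 0.4, "bob", "2024-01-01T10:05:00"),
    QueryRecord("q3", "completed", 185.0, "alice", "2024-01-01T10:10:00"),
]
flagged = [r.query_id for r in needs_attention(history)]
print(flagged)  # ['q2', 'q3']
```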

Query Details

The query details view presents a query's execution information across several views, described below.

Overview

The overview tab shows query metadata, the execution timeline, and performance metrics, including:

Query text
Execution time
Data processed
Rows returned
Resource consumption

Live Plan

The live plan visualizes the query execution plan in real time, showing how Trino processes your query across different stages and operators.

Stages

The stages view breaks down query execution into individual stages, showing:

Stage dependencies
Data flow between stages
Resource usage per stage
Execution time for each stage

This helps identify performance bottlenecks in complex queries.
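One simple way to reason about a bottleneck is to ask which stage accounts for the largest share of the summed stage time. The sketch below assumes per-stage execution times have been read off the stages view by hand. The dictionary keys and the sample numbers are hypothetical. Because stages can run concurrently, the share is relative to the sum of stage times, not to wall-clock duration.

```python
def slowest_stage(stages):
    """Return (stage, share) for the stage with the largest execution time.

    share is that stage's time divided by the sum of all stage times.
    """
    total = sum(s["time_s"] for s in stages)
    if total <= 0:
        raise ValueError("stages must include some non-zero execution time")
    worst = max(stages, key=lambda s: s["time_s"])
    return worst["stage"], worst["time_s"] / total


# Made-up per-stage timings in seconds, for illustration only.
stages = [
    {"stage": 0, "time_s": 1.0},
    {"stage": 1, "time_s": 6.0},
    {"stage": 2, "time_s": 1.0},
]
stage, share = slowest_stage(stages)
print(f"stage {stage} accounts for {share:.0%} of summed stage time")
```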

Figure: Query details, stages view.

Splits

Splits show how Trino parallelizes query execution. Each split represents a portion of data processed by a worker. View split-level metrics to understand query parallelism and data distribution.
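Uneven data distribution shows up as some workers handling far more splits than others. The sketch below computes a simple skew ratio from a list that records which worker processed each split. The worker names and the `split_skew` helper are hypothetical, and only workers that received at least one split are counted.

```python
from collections import Counter


def split_skew(split_workers):
    """Ratio of the busiest worker's split count to the mean per-worker count.

    A value near 1.0 means splits are spread evenly; larger values mean skew.
    """
    if not split_workers:
        raise ValueError("need at least one split")
    counts = Counter(split_workers)
    mean = sum(counts.values()) / len(counts)
    return max(counts.values()) / mean


# Made-up assignment of six splits to three workers, for illustration only.
assignments = ["w1", "w1", "w1", "w1", "w2", "w3"]
skew = split_skew(assignments)
print(skew)  # 2.0
```

Here worker `w1` handles four of six splits, twice the per-worker mean of two, so the ratio is 2.0.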

Figure: Query details, split view.

References

The references tab lists all tables and data sources accessed by the query, helping you understand data dependencies.

JSON

The JSON view provides the complete query execution plan and statistics in JSON format, useful for programmatic analysis or debugging.
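As a rough illustration of programmatic analysis, the sketch below parses a small JSON document and pulls out a few summary fields. The field names (`queryId`, `state`, `queryStats`, `elapsedTime`, `totalSplits`) are assumptions modeled on Trino's query information and should be checked against the JSON your cluster actually shows. The sample document itself is made up and far smaller than real output.

```python
import json

# Made-up, heavily trimmed sample; real output contains many more fields.
raw = """
{
  "queryId": "20240101_000000_00001_abcde",
  "state": "FINISHED",
  "queryStats": {"elapsedTime": "2.50s", "totalSplits": 12}
}
"""


def summarize(raw_json):
    """Extract a few summary fields, tolerating missing keys."""
    info = json.loads(raw_json)
    stats = info.get("queryStats", {})
    return {
        "id": info.get("queryId"),
        "state": info.get("state"),
        "elapsed": stats.get("elapsedTime"),
        "splits": stats.get("totalSplits"),
    }


summary = summarize(raw)
print(summary)
```

Using `.get` keeps the sketch from failing when a field is absent, which is useful if the actual layout differs from the assumed one.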

Best Practices

Limit result sets: Use LIMIT clauses for exploratory queries to reduce resource usage
Filter early: Apply WHERE clauses to reduce data scanned
Monitor query performance: Check the Queries tab to identify slow or failed queries
Use the live plan: For complex queries, review the execution plan to optimize performance
Check cluster status: Ensure adequate resources are available before running large queries
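The first two practices can be combined in a small helper that always emits a LIMIT clause and an optional WHERE clause. This is a minimal sketch: the `exploratory_query` function, the table name, and the column name are hypothetical. It builds the SQL by plain string concatenation, so it is suitable only for trusted, hand-written inputs.

```python
def exploratory_query(table, where=None, limit=100):
    """Build a SELECT that filters early and caps the rows returned."""
    if not isinstance(limit, int) or limit <= 0:
        raise ValueError("limit must be a positive integer")
    sql = f"SELECT * FROM {table}"
    if where:
        # Filter early to reduce the data scanned.
        sql += f" WHERE {where}"
    # Cap the result set for exploratory use.
    return f"{sql} LIMIT {limit}"


# Hypothetical table and column names, for illustration only.
query = exploratory_query("transactions_1", where="amount > 1000", limit=50)
print(query)  # SELECT * FROM transactions_1 WHERE amount > 1000 LIMIT 50
```

The resulting string can be pasted into the SQL runner.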