Query Engine (Trino)#
The Query Engine in Hopsworks is powered by Trino, a distributed SQL query engine that allows you to run interactive analytics on your data. Use it to explore feature groups, run ad-hoc queries, and analyze data across your project.
Accessing the Query Engine#
Navigate to the Query Engine from your project's left sidebar. The Query Engine interface provides access to the SQL runner, cluster information, and query history.
SQL Runner#
The SQL runner is where you write and execute SQL queries against your data.
To run a query:
- Write your SQL query in the editor
- Select the database/catalog you want to query
- Click "Run" to execute the query
- View results in the table below the editor
The SQL runner supports standard SQL syntax and provides auto-completion for databases, tables, and columns.
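As a rough illustration of the kind of standard SQL you might type into the editor, the sketch below runs an aggregate query from Python. It is only a stand-in: an in-memory SQLite database replaces the Trino catalog so the snippet runs anywhere, and the `transactions` table and its columns are hypothetical names, not part of Hopsworks.

```python
import sqlite3

# Stand-in only: in-memory SQLite replaces the Trino catalog so the
# example is self-contained. Table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [(1, 20.0), (1, 35.5), (2, 12.0), (3, 100.0)],
)

# The kind of query you might write in the SQL runner editor.
query = """
SELECT customer_id, COUNT(*) AS n_tx, SUM(amount) AS total_amount
FROM transactions
GROUP BY customer_id
ORDER BY total_amount DESC
LIMIT 2
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

In the Query Engine itself you would paste only the SQL text into the editor; the surrounding Python exists solely to make the example runnable.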
SQL Statement Syntax Help#
Need help with SQL syntax? Click the help icon in the SQL runner to open the reference for the supported SQL statement syntax, including SELECT statements, functions, data types, and operators. The reference opens without leaving the query interface.
Cluster Overview#
The cluster overview shows the health and status of your Trino cluster. Here you can monitor:
- Active workers: Number of workers currently processing queries
- Running queries: Queries currently being executed
- Resource utilization: CPU and memory usage across the cluster
- Worker status: Health status of individual worker nodes
This information helps you understand cluster performance and capacity.
Queries#
The Queries tab lists the queries submitted to the cluster, including any that are still running. For each query, you can see:
- Query ID: Unique identifier for the query
- Status: Completed, failed, or running
- Duration: How long the query took to execute
- User: Who submitted the query
- Timestamp: When the query was run
Click on any query to view detailed execution information.
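The fields listed above can be pictured as a simple record. The sketch below is an assumption for illustration, not an export format or API offered by the Query Engine: the `QueryRecord` class, the `needs_attention` helper, the 30-second threshold, and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class QueryRecord:
    """Hypothetical mirror of one row in the Queries tab."""
    query_id: str
    status: str          # "completed", "failed", or "running"
    duration: timedelta
    user: str
    timestamp: datetime


def needs_attention(records, slow_after=timedelta(seconds=30)):
    """Return failed queries plus completed queries slower than the threshold."""
    return [
        r for r in records
        if r.status == "failed"
        or (r.status == "completed" and r.duration > slow_after)
    ]


# Made-up sample history, for illustration only.
history = [
    QueryRecord("q1", "completed", timedelta(seconds=2), "alice", datetime(2024, 1, 1, 9, 0)),
    QueryRecord("q2", "failed", timedelta(seconds=1), "bob", datetime(2024, 1, 1, 9, 5)),
    QueryRecord("q3", "completed", timedelta(seconds=95), "alice", datetime(2024, 1, 1, 9, 10)),
    QueryRecord("q4", "running", timedelta(seconds=40), "carol", datetime(2024, 1, 1, 9, 15)),
]

flagged = needs_attention(history)
print([r.query_id for r in flagged])
```

Running queries are deliberately left out of the result, since their duration is still changing.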
Query Details#
The detail view for a query presents its execution information in the sections described below.
Overview#
The overview tab shows query metadata, execution timeline, and performance metrics including:
- Query text
- Execution time
- Data processed
- Rows returned
- Resource consumption
Live Plan#
The live plan visualizes the query execution plan in real time, showing how Trino processes your query across different stages and operators.
Stages#
The stages view breaks down query execution into individual stages, showing:
- Stage dependencies
- Data flow between stages
- Resource usage per stage
- Execution time for each stage
This helps identify performance bottlenecks in complex queries.
Splits#
Splits show how Trino parallelizes query execution. Each split represents a portion of data processed by a worker. View split-level metrics to understand query parallelism and data distribution.
References#
The references tab lists all tables and data sources accessed by the query, helping you understand data dependencies.
JSON#
The JSON view provides the complete query execution plan and statistics in JSON format, useful for programmatic analysis or debugging.
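As one possible form of programmatic analysis, the sketch below walks a nested stage tree in a JSON document and picks the stage with the longest wall time. The sample document is made up and heavily simplified: the field names (`stages`, `subStages`, `wallSeconds`) are assumptions for illustration and should be checked against the JSON your cluster actually produces.

```python
import json

# Hypothetical, trimmed-down sample; real output has many more fields
# and may use different names.
sample = """
{
  "queryId": "example_query",
  "state": "FINISHED",
  "stages": {
    "stageId": "0", "wallSeconds": 1.2,
    "subStages": [
      {"stageId": "1", "wallSeconds": 4.8, "subStages": []},
      {"stageId": "2", "wallSeconds": 0.9, "subStages": [
        {"stageId": "3", "wallSeconds": 2.1, "subStages": []}
      ]}
    ]
  }
}
"""


def iter_stages(stage):
    """Yield a stage and all of its descendants, depth first."""
    yield stage
    for child in stage.get("subStages", []):
        yield from iter_stages(child)


info = json.loads(sample)
slowest = max(iter_stages(info["stages"]), key=lambda s: s["wallSeconds"])
print(slowest["stageId"], slowest["wallSeconds"])
```

The same traversal could be adapted to sum resource usage per stage or to list stages in dependency order.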
Best Practices#
- Limit result sets: Use LIMIT clauses for exploratory queries to reduce resource usage
- Filter early: Apply WHERE clauses to reduce data scanned
- Monitor query performance: Check the Queries tab to identify slow or failed queries
- Use the live plan: For complex queries, review the execution plan to optimize performance
- Check cluster status: Ensure adequate resources are available before running large queries
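The first practice above can be partly automated when queries are generated from scripts. The sketch below is a hypothetical helper, not a Hopsworks or Trino feature: `ensure_limit` appends a LIMIT clause to a query that does not already end with one. It uses a simple pattern match rather than a SQL parser, so it will not recognise other row-limiting forms such as FETCH FIRST, and it drops any trailing semicolon.

```python
import re

# Matches a LIMIT clause at the very end of the statement (heuristic only).
_LIMIT_AT_END = re.compile(r"\blimit\s+(\d+|all)\s*$", re.IGNORECASE)


def ensure_limit(sql: str, default_limit: int = 100) -> str:
    """Return the query with a LIMIT appended if none is present at the end."""
    body = sql.strip().rstrip(";").rstrip()
    if _LIMIT_AT_END.search(body):
        return body
    return f"{body}\nLIMIT {default_limit}"


# Hypothetical table and column names, for illustration only.
print(ensure_limit("SELECT * FROM transactions WHERE amount > 50"))
print(ensure_limit("select * from transactions limit 10;"))
```

Placing the WHERE filter in the query itself, as in the first call, also follows the "filter early" advice, since the limit is applied only after the filter.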