How To Run An Agent Job#
Introduction#
Agent Jobs are fire-and-forget AI agents that run as first-class Hopsworks jobs. You write a prompt, select a provider (Claude Code or OpenAI Codex), grant the agent a set of permissions, point it at the project resources it needs, and launch it. The agent runs to completion inside a Kubernetes pod, writes its results to HopsFS, and the pod terminates. No interactive UI, no WebSocket — the same lifecycle as a Python or Spark job.
Common use cases:
- Scheduled data-quality audits — point the agent at a Feature Group and ask it to look for null spikes, schema drift, or freshness issues each night
- Model evaluation — give it a Model + Deployment ref and have it grade recent predictions against a reference dataset
- Operational runbooks — turn the wiki page you'd hand a new on-call into a prompt the agent runs whenever a
LONG_RUNNINGalert fires on a pipeline - Ad-hoc exploration — fire one off from the CLI with
hops job startwhenever you'd otherwise open Jupyter for a one-shot question
Because Agent Jobs use the existing Jobs subsystem, you get the same scheduling, alerts, log capture, and execution history you do for any other Hopsworks job.
Prerequisites#
-
Feature flag — agent jobs are gated cluster-wide. An admin must set:
UPDATE variables SET value='true' WHERE id='agent_jobs_enabled'; -
Authentication credentials for whichever provider you choose. The agent can pick these up from three sources (see Authentication below) — the simplest is to log in to Claude or Codex once from an interactive Hopsworks Terminal session; the credentials persist into HopsFS and every subsequent Agent Job in any of your projects reuses them.
-
The
agent-jobPython environment (or a clone of it) — this is the runtime base image. It's installed by default; clone it in Project Settings → Python Environments if you need to layer additional libraries on top.
UI#
Step 1: Open Jobs and create a new Agent Job#
Click Jobs in the project sidebar, then New Job, and pick AGENT as the job type.
Step 2: Write the prompt#
The prompt is the task description — what you want the agent to do. Be explicit about the inputs (which Feature Group, which Model) and the expected output (a report file at ${AGENT_OUTPUT_PATH}/result.md, a Slack message, etc).
Inspect feature group `transactions` v3 for null spikes in the `amount` and
`merchant_id` columns over the last 7 days. Report findings as Markdown at
${AGENT_OUTPUT_PATH}/result.md. If you find a column with more than 1% nulls,
post a Slack alert.
Step 3: Pick the provider#
| Provider | When to use |
|---|---|
| Claude (default) | Most tasks; supports hooks, max-turns and max-budget controls. |
| Codex | When you want OpenAI's coding agent. Supports free-form CLI flag overrides via cliArgs. |
Codex-only fields (cliArgs) appear when you select Codex; Claude-only fields (maxTurns, maxBudgetUsd, hooks) are hidden in that case.
Step 4: Configure permissions#
Permissions are an explicit allowlist of tools the agent may invoke — they map directly to the underlying CLI's tool patterns. Pick a preset for most jobs:
| Preset | Effect |
|---|---|
READ_ONLY (default) | Inspect feature groups, feature views, models — no writes. |
OPERATOR | Run any hops CLI command + python *, plus write under /hopsfs/*. |
FULL | A curated allowlist of common shell commands (ls, cat, grep, find, curl, wget, echo, mkdir, cp, mv), plus Read(*) and Write(/hopsfs/*). Not unrestricted shell access — pick Custom if you need that. |
Custom | A list of patterns you provide explicitly (e.g. Bash(hops fg info *), Read(*), Write(/hopsfs/Resources/*)). |
The agent runs with project-scoped JWT auth, so any data access still goes through normal Hopsworks ACLs — permissions just bound what the agent is allowed to attempt.
Step 5: Attach resource references (optional)#
Refs inject Hopsworks resource metadata into the agent pod as JSON files under /context/. The agent can cat these to know which resources to operate on without you having to spell every detail out in the prompt:
| Ref type | Required version? | Context file |
|---|---|---|
feature_group | yes | /context/fg_<name>_v<version>.json |
feature_view | yes | /context/fv_<name>_v<version>.json |
model | yes | /context/model_<name>_v<version>.json |
deployment | no | /context/deployment_<name>.json |
job | no | /context/job_<name>.json |
Refs are resolved server-side at launch time by the platform — you don't need to write any glue code.
Step 6: Environment variables (optional)#
Use this card to supply credentials and other env vars the agent needs at runtime. Vars set here are merged on top of:
- Platform-injected vars (project ID, JWT path, HopsFS user home, …) — these are reserved and rejected if you try to set them
- AI-provider secrets (from your AI provider secrets table, if any)
- Your account-level env vars (visible across all your runtimes — manage them under Account → Environment variables)
So a per-job ANTHROPIC_API_KEY overrides the account-level value, which in turn overrides anything in the secrets table.
Reserved-name prefixes that you may not set: HOPS_, HOPSWORKS_, HOPSFS_, AGENT_. The platform injects several AGENT_* variables that you can read from your prompt — most usefully AGENT_OUTPUT_PATH (where your final artifact must be written). They are reserved for the platform, so you can't override them, but referencing them inside the prompt (e.g. ${AGENT_OUTPUT_PATH}/result.md) is the intended pattern.
The platform also injects SHARED_DATASETS_DIR pointing at the project's shared-datasets folder under /hopsfs/, so prompts can refer to it as ${SHARED_DATASETS_DIR}/... without hard-coding the path.
Step 7: Resources, environment, alerts#
The bottom of the form is shared with other job types:
- Environment — defaults to
agent-job. Pick a clone if you've added Python libraries on top. - Resource configuration — CPU cores, memory, GPU count. Default is 0.5 cores / 1024 MB / 0 GPU; bump this for heavier tasks.
- Schedule — same as other jobs. Cron expressions or interval runs.
- Alerts — FINISHED / FAILED / KILLED / LONG_RUNNING, routed to a Slack / webhook / email receiver. Setting
passToAgent: trueon a Slack alert injectsSLACK_WEBHOOK_URLandSLACK_CHANNELinto the agent pod's env so the agent can post during execution via the bundledslack-posthelper.
Step 8: Run it#
Click Save, then Run. The execution lands in the same Executions table as Python and Spark jobs. Click an execution to see its state, logs, and output.
Authentication#
The agent pod resolves provider credentials in this priority order:
- Reused terminal-login credentials. If you've opened a Hopsworks Terminal in any of your projects and run
claude login(orcodex login), the credentials persist in HopsFS at/hopsfs/Users/<you>/.claude/.credentials.jsonand/hopsfs/Users/<you>/.codex/auth.json. The agent pod symlinks~/.claudeand~/.codexto those paths automatically — no API key needed. ANTHROPIC_API_KEY/OPENAI_API_KEYfrom env. If terminal credentials are missing or invalid, the agent falls back to these env vars. Set them under Account → Environment variables (applies to all your jobs) or in this job's Environment variables section (applies just to this job).- AI provider secrets table. If you've configured AI providers under Project Settings, those keys are also injected. Per-job env vars override them.
Most users only need to do step 1 once.
CLI#
You can also create and run agent jobs via the hops CLI:
# Create from a JSON config
cat > my-agent.json <<EOF
{
"type": "agentJobConfiguration",
"appName": "data-quality-check",
"prompt": "Check all feature groups for nulls and report findings",
"model": "claude-sonnet-4-5",
"maxTurns": 25,
"permissionPreset": "READ_ONLY",
"refs": [
{"type": "feature_group", "name": "transactions", "version": 3}
],
"envVars": [
"PYTHONUNBUFFERED=1"
]
}
EOF
hops job create -f my-agent.json
hops job start data-quality-check
hops job logs data-quality-check --tail
The hops CLI is available inside the agent pod too, so the agent can launch or query other jobs as long as you grant it Bash(hops *).
REST API#
Agent Jobs use the existing Jobs REST API; the only difference is the configuration payload, where type is "agentJobConfiguration" and the fields are:
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
type | string | yes | — | Must be "agentJobConfiguration" |
appName | string | yes | — | Job name |
prompt | string | yes | — | The task |
provider | string | no | "claude" | "claude" or "codex" |
cliArgs | string | no | — | Extra codex CLI flags (codex only) |
model | string | no | claude-sonnet-4-5 | Provider-specific model ID |
maxTurns | int | no | 50 | Claude only |
maxBudgetUsd | float | no | — | Claude only, max 100 |
permissionPreset | string | no | READ_ONLY | READ_ONLY / OPERATOR / FULL |
permissions | string[] | no | — | Custom — overrides preset if set |
refs | AgentRef[] | no | — | Resource refs (see Step 5) |
hooks | AgentHook[] | no | — | Pre/post tool-use hooks (claude only) |
skillsRef | string | no | — | Path to a .md skills file on HopsFS |
envVars | string[] | no | — | "KEY=VALUE" strings |
environmentName | string | no | agent-job | Must be agent-job or a clone of it |
resourceConfig | object | no | default | { cores, memory, gpus } |
outputPath | string | no | auto | Where to write results in HopsFS |
Full reference: see the [HWORKS-2667] PR description and the docs/features/agent-jobs/ directory in hopsworks-ee for the developer-side spec.
Output#
By default the agent writes to:
/hopsfs/Resources/jobs/<job-name>/<execution-id>/
├── result.md # primary, human-readable result — rendered by the UI
├── metadata.json # { "exit_code": 0, "completed_at": "<ISO8601>" }
└── (any other files the agent chose to write)
The agent's stdout and stderr are captured by the standard execution-log path, visible via the Logs button on the execution row.
Limitations#
- Non-interactive only. The agent cannot ask follow-up questions — the prompt is the whole input. If you need interactive iteration, use a Terminal session instead.
maxTurnsis a hard safety net (claude only) — set it high enough that reasonable runs don't trip it. Default 50.- The image must be
agent-jobor a clone — other Python environments don't ship the Claude/Codex CLIs and will be rejected at job-create time. - Reserved env-var prefixes — see Step 6;
HOPS_,HOPSWORKS_,HOPSFS_, andAGENT_are owned by the platform.