How To Run An Agent Deployment

Introduction

Agent Deployments are server-only KServe deployments with no model attached. You provide an entrypoint Python script that starts a REST server, and Hopsworks exposes it behind an endpoint. Use Agent Deployments for interactive agents and LLM workflows, where the service has to stay up and answer requests.

If you need a fire-and-forget background run, use Agent Tasks instead.
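The only contract described above is that the entrypoint script starts a REST server. As a minimal sketch using only the standard library, the script below answers POST /query with JSON. The port 8080 and the /query route mirror the larger example further down this page. The echo logic in answer() is a hypothetical placeholder for real agent code, and this page does not say whether Hopsworks forwards requests to this exact path.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def answer(payload: dict) -> dict:
    # Hypothetical agent logic: replace with calls to your LLM or agent framework.
    prompt = payload.get("prompt", "")
    return {"answer": f"You said: {prompt}"}


class AgentHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        # Only the /query route is served. Any other path gets a 404.
        if self.path != "/query":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps(answer(payload)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    # Listen on all interfaces so that traffic from outside the process can reach it.
    HTTPServer(("0.0.0.0", 8080), AgentHandler).serve_forever()
```

If you run it locally and POST {"prompt": "hello"} to /query, the response is {"answer": "You said: hello"}.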

Common use cases:

Interactive assistants - a chat or tool-using agent that stays online
LLM workflows - a deterministic sequence of retrieval, reasoning, and generation steps that you want to expose as a service
RAG-backed services - an agent that reads Feature Store context and uses it to answer questions

Where to find it in the UI

In the project sidebar, go to Agents and then Agent Deployments. The list page shows the project's agent deployments. Each one opens the same kind of detail page that you use for model deployments.

Create and manage an agent deployment

The most common way to create one is with the CLI or the Python SDK. With the CLI:

hops agent list
hops agent create my_agent.py --name my_agent --requirements requirements.txt --environment my_agent
hops agent start my_agent
hops agent query my_agent --data '{"prompt": "hello"}'
hops agent logs my_agent
hops agent info my_agent
hops agent stop my_agent
hops agent delete my_agent --yes

Run hops agent list first to confirm that authentication works and model serving is reachable. The same create, start, and query steps with the Python SDK:

import hopsworks


project = hopsworks.login()
ms = project.get_model_serving()

deployment = ms.deploy_agent(
    entry="my_agent.py",  # .py file or a dir with pyproject.toml
    name="my_agent",
    requirements="requirements.txt",
    environment="my_agent",
    upload_dir="Resources/agents",  # default
)
deployment.start(await_running=600)
print(deployment.predict(inputs={"prompt": "hello"}))
# After editing the code: re-create, then deployment.restart()
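You can also script the CLI commands from earlier in this section, for example in a CI job. The minimal sketch below wraps them with subprocess and uses only the commands and flags shown above. The helper names and the dry_run switch are hypothetical. Passing an argument list instead of a shell string, and building --data with json.dumps, avoids the shell-quoting problems that come with a hand-written '{"prompt": ...}' string.

```python
import json
import subprocess


def agent_commands(name: str, entry: str, environment: str, prompt: str) -> list[list[str]]:
    """Build the hops CLI calls for one list, create, start, and query cycle."""
    return [
        ["hops", "agent", "list"],  # fails early if auth or serving is unreachable
        ["hops", "agent", "create", entry, "--name", name,
         "--requirements", "requirements.txt", "--environment", environment],
        ["hops", "agent", "start", name],
        ["hops", "agent", "query", name, "--data", json.dumps({"prompt": prompt})],
    ]


def run_lifecycle(name: str, entry: str, environment: str, prompt: str,
                  dry_run: bool = True) -> list[list[str]]:
    """Run the commands in order. With dry_run=True, only return them."""
    commands = agent_commands(name, entry, environment, prompt)
    if not dry_run:
        for cmd in commands:
            # check=True stops the sequence at the first command that fails.
            subprocess.run(cmd, check=True)
    return commands
```

Call run_lifecycle("my_agent", "my_agent.py", "my_agent", "hello", dry_run=False) to execute the sequence. The default dry run only returns the commands so that you can inspect them first.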

After creation, the deployment appears in the Agent Deployments list where you can inspect its status, logs, endpoints, and configuration.

Deploy from a Git repository

An agent can be served from a Git repository instead of a project file. The repository is cloned every time the deployment starts, so a restart picks up whatever the branch points at.

Supported providers are GitHub, GitLab, and BitBucket. Configure the provider credentials once in the project settings; see Configure a Git Provider.

deployment = ms.deploy_agent(
    entry="src/agent.py",  # path inside the repository
    name="my_agent",
    git_url="https://github.com/my-org/my-agent.git",
    git_provider="GitHub",
    git_branch="main",
    environment="my_agent",
)

entry is interpreted relative to the repository root rather than as a HopsFS path. If you leave git_branch unset, the clone follows the repository's default branch.

Auto-redeploy on new commits

A Git-backed agent can roll itself onto the branch HEAD whenever a new commit is pushed:

deployment = ms.deploy_agent(
    entry="src/agent.py",
    name="my_agent",
    git_url="https://github.com/my-org/my-agent.git",
    git_provider="GitHub",
    git_branch="main",
    git_auto_redeploy=True,
    environment="my_agent",
)

Hopsworks polls the remote branch and rolls the deployment when the branch HEAD changes. The running version keeps serving requests until the new one is ready.

The flag only applies to Git-backed agents, and Hopsworks rejects it for an agent deployed from a project file. A stopped deployment is not rolled; it clones the branch HEAD on its next start.
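You can check these rules on the client before calling deploy_agent. The helper below is a hypothetical sketch and is not part of the Hopsworks SDK. It mirrors only what this page states: the three supported providers, entry being interpreted relative to the repository root, and git_auto_redeploy being valid only for Git-backed agents. The path checks are an assumption about what "relative to the repository root" excludes. Hopsworks remains the authority on what it accepts.

```python
from pathlib import PurePosixPath
from typing import Optional

SUPPORTED_PROVIDERS = {"GitHub", "GitLab", "BitBucket"}


def check_git_agent_options(
    entry: str,
    git_url: Optional[str] = None,
    git_provider: Optional[str] = None,
    git_auto_redeploy: bool = False,
) -> list[str]:
    """Return a list of problems. An empty list means none were found."""
    problems = []
    if git_url is None:
        # Agent deployed from a project file: Hopsworks rejects auto-redeploy here.
        if git_auto_redeploy:
            problems.append("git_auto_redeploy requires a Git-backed agent (git_url)")
        return problems
    if git_provider not in SUPPORTED_PROVIDERS:
        problems.append(f"unsupported git_provider: {git_provider!r}")
    path = PurePosixPath(entry)
    # Assumption: entry should be a relative path that stays inside the repository.
    if path.is_absolute() or ".." in path.parts:
        problems.append(f"entry must be relative to the repository root: {entry!r}")
    return problems
```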

The deployment's Artifact files card shows the repository, the branch, the commit it is running, and whether auto-redeploy is enabled. The entrypoint links to the file in the repository at that commit.
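To find out whether a running agent is behind its branch, compare the commit shown in the Artifact files card with the current branch HEAD. The minimal sketch below assumes git is installed locally and can authenticate to the repository. It runs git ls-remote, which prints one "<sha><TAB><ref>" line per matching ref, and compares the result with the running commit that you copy from the card. This is a client-side check only. It does not describe how Hopsworks itself polls the branch.

```python
import subprocess


def parse_ls_remote(output: str, branch: str) -> str:
    """Return the commit SHA for refs/heads/<branch> from `git ls-remote` output."""
    wanted = f"refs/heads/{branch}"
    for line in output.splitlines():
        sha, _, ref = line.partition("\t")
        if ref == wanted:
            return sha
    raise LookupError(f"branch {branch!r} not found on remote")


def branch_head(git_url: str, branch: str) -> str:
    # Ask the remote for this single branch. No clone is needed.
    result = subprocess.run(
        ["git", "ls-remote", git_url, f"refs/heads/{branch}"],
        capture_output=True, text=True, check=True,
    )
    return parse_ls_remote(result.stdout, branch)


def is_behind(running_commit: str, git_url: str, branch: str) -> bool:
    # Prefix comparison accepts either a short or a full SHA copied from the UI.
    return not branch_head(git_url, branch).startswith(running_commit)
```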

Small example

The file below shows a simple agent program that uses LlamaIndex, FastAPI, and OpenTelemetry.

Set ANTHROPIC_API_KEY in the deployment environment. Hopsworks injects the OTEL_EXPORTER_OTLP_* environment variables for the deployment, so the OpenTelemetry exporter can stay configuration-free.

import os

import uvicorn
from fastapi import FastAPI
from llama_index.core.agent.workflow import ReActAgent
from llama_index.core.tools import FunctionTool
from llama_index.llms.anthropic import Anthropic
from openinference.instrumentation.llama_index import LlamaIndexInstrumentor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor
from opentelemetry.sdk import trace as trace_sdk
from opentelemetry.sdk.trace.export import SimpleSpanProcessor


# Arithmetic tools for the agent. FunctionTool uses the name, signature,
# and docstring of each function to build its tool description.
def add(a: float, b: float) -> float:
    """Adds two numbers."""
    return a + b


def subtract(a: float, b: float) -> float:
    """Subtracts two numbers."""
    return a - b


def multiply(a: float, b: float) -> float:
    """Multiplies two numbers."""
    return a * b


def divide(a: float, b: float) -> float:
    """Divides two numbers."""
    if b == 0:
        raise ValueError("Cannot divide by zero.")
    return a / b


def build_tracer_provider():
    # Hopsworks injects the OTEL_EXPORTER_OTLP_* variables; the fallback is for local runs.
    endpoint = os.environ.get(
        "OTEL_EXPORTER_OTLP_TRACES_ENDPOINT",
        "http://localhost:4318/v1/traces",
    )

    tracer_provider = trace_sdk.TracerProvider()
    # SimpleSpanProcessor exports each span as soon as it ends.
    tracer_provider.add_span_processor(
        SimpleSpanProcessor(OTLPSpanExporter(endpoint=endpoint))
    )
    return tracer_provider


class AgentPredictor:
    def __init__(self):
        self.tracer_provider = build_tracer_provider()

        # Trace LlamaIndex activity such as LLM calls, tool calls, and agent steps.
        LlamaIndexInstrumentor().instrument(tracer_provider=self.tracer_provider)

        # The Anthropic client reads ANTHROPIC_API_KEY from the environment.
        llm = Anthropic(
            model="claude-haiku-4-5-20251001",
            max_tokens=1024,
            temperature=0.0,
        )

        tools = [
            FunctionTool.from_defaults(add),
            FunctionTool.from_defaults(subtract),
            FunctionTool.from_defaults(multiply),
            FunctionTool.from_defaults(divide),
        ]

        self.agent = ReActAgent(
            tools=tools,
            llm=llm,
        )

    async def predict(self, inputs):
        # Run the agent on the server's event loop. Calling asyncio.run for each
        # request would create a fresh loop every time, which can break async
        # HTTP clients that are reused across requests.
        prompt = inputs.get("prompt", "")
        result = await self.agent.run(user_msg=prompt)
        return {"answer": str(result)}


# Build the agent once at startup so that every request reuses it.
predictor = AgentPredictor()

agent_app = FastAPI()

# Create a server span for every incoming HTTP request.
FastAPIInstrumentor.instrument_app(
    agent_app,
    tracer_provider=predictor.tracer_provider,
)


@agent_app.post("/query")
async def query(payload: dict):
    return await predictor.predict(payload)


if __name__ == "__main__":
    uvicorn.run(agent_app, host="0.0.0.0", port=8080)
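Before deploying, you can run the file locally with python my_agent.py and call the /query route it defines. The minimal standard-library client below assumes the server listens on localhost:8080, as in the uvicorn.run call above. query_local_agent and build_query_request are hypothetical helper names. For a deployed agent, use hops agent query or deployment.predict instead, because those calls go through the Hopsworks endpoint.

```python
import json
import urllib.request


def build_query_request(prompt: str, base_url: str = "http://localhost:8080") -> urllib.request.Request:
    """Build a POST /query request with the JSON body the example app expects."""
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def query_local_agent(prompt: str, base_url: str = "http://localhost:8080") -> str:
    # Send the request and return the "answer" field of the JSON response.
    with urllib.request.urlopen(build_query_request(prompt, base_url), timeout=60) as resp:
        return json.loads(resp.read())["answer"]


if __name__ == "__main__":
    print(query_local_agent("What is (3 + 4) * 2?"))
```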

Tracing

Agent Deployments can be configured with OpenTelemetry tracing.

When tracing is enabled, Hopsworks automatically provisions four online, Delta-backed feature groups in the project's Feature Store:

otel_spans - root spans and trace summary fields
otel_span_attributes - span attributes as key-value pairs
otel_events - span events
otel_event_attributes - event attributes as key-value pairs

The Traces UI reads from these feature groups. If they do not already exist, the first traced deployment in the project creates them.

Choose one of these storage modes for the traces:

online - the default; writes traces only to the online feature groups
offline - writes traces only to the offline feature groups. Trace summaries are not visible in the UI, but you can read the offline feature groups with the hopsworks-api and reconstruct the traces from them.
both - writes traces to both the online and the offline feature groups. This is the recommended option for production deployments because the traces appear in the UI and are also stored cost-effectively for long-term retention.
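In offline mode, reconstructing a trace means joining attributes onto spans and linking each span to its parent. The minimal sketch below covers that step. It assumes you have already read otel_spans and otel_span_attributes into lists of dicts, for example with the feature store's get_feature_group(...).read() followed by DataFrame.to_dict("records"). The column names trace_id, span_id, parent_span_id, name, key, and value are assumptions based on common OpenTelemetry naming, not the documented schema. Check the actual feature group schema and adjust them.

```python
from collections import defaultdict


def reconstruct_traces(spans: list[dict], attributes: list[dict]) -> dict[str, list[dict]]:
    """Group spans by trace and nest each span under its parent.

    Assumed columns: spans have trace_id, span_id, parent_span_id, and name;
    attributes have span_id, key, and value.
    """
    # Collect each span's attribute rows into one dict per span.
    attrs_by_span: dict[str, dict] = defaultdict(dict)
    for row in attributes:
        attrs_by_span[row["span_id"]][row["key"]] = row["value"]

    # Create one node per span so that parents can hold their children.
    nodes = {
        s["span_id"]: {
            "name": s["name"],
            "span_id": s["span_id"],
            "attributes": attrs_by_span.get(s["span_id"], {}),
            "children": [],
        }
        for s in spans
    }

    traces: dict[str, list[dict]] = defaultdict(list)
    for s in spans:
        node = nodes[s["span_id"]]
        parent = nodes.get(s.get("parent_span_id"))
        if parent is not None:
            parent["children"].append(node)
        else:
            # No known parent: treat the span as a root of its trace.
            traces[s["trace_id"]].append(node)
    return dict(traces)
```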

Next steps

Scheduled, non-interactive coding agent: Agent Tasks
Model-backed online predictor: use the Model Deployments guides under MLOps
Agent-serving dependencies: see the environment guides for cloning Python environments and installing requirements