
Hopsworks Model Serving REST API#

Introduction#

Hopsworks provides model serving capabilities by leveraging KServe as the model serving platform and Istio as the ingress gateway to the model deployments.

This document explains how to interact with a model deployment via REST API.

Base URL#

Deployed models are accessible through the Istio ingress gateway. The URL to interact with a model deployment is provided on the model deployment page in the Hopsworks UI.

The URL follows the format http://<ISTIO_GATEWAY_IP>/<RESOURCE_PATH>, where RESOURCE_PATH depends on the model server (e.g. vLLM, TensorFlow Serving, SKLearn ModelServer).
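
For predictive deployments, the resource path follows the KServe v1 protocol path used in the examples further below. A minimal sketch of composing the URL; the gateway IP and deployment name are placeholder values:

ISTIO_GATEWAY_IP = "10.87.42.108"  # shown on the deployment page in the Hopsworks UI
DEPLOYMENT_NAME = "fraud"          # hypothetical deployment name

# Predictive model servers (TensorFlow Serving, SKLearn, Python) expose the KServe v1 protocol path.
predict_url = f"http://{ISTIO_GATEWAY_IP}/v1/models/{DEPLOYMENT_NAME}:predict"
print(predict_url)  # http://10.87.42.108/v1/models/fraud:predict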

Endpoints: see the Deployment Endpoints reference for the full list of available endpoints.

Authentication#

All requests must include an API key for authentication. You can create an API key by following this guide.

Include the key in the Authorization header:

Authorization: ApiKey <API_KEY_VALUE>

Headers#

| Header        | Description                                     | Example Value           |
| ------------- | ----------------------------------------------- | ----------------------- |
| Host          | Model's hostname, provided in the Hopsworks UI. | fraud.test.hopsworks.ai |
| Authorization | API key for authentication.                     | ApiKey <your_api_key>   |
| Content-Type  | Request payload type (always JSON).             | application/json        |

Request Format#

The request format depends on the model server being used.

For predictive inference (i.e., TensorFlow, SKLearn, or Python Serving), the request must be sent as a JSON object containing an inputs or instances field. You can find more information on the request format here. Examples are given below.

Python example for Predictive Inference (TensorFlow, SKLearn, or Python Serving)

import requests

data = {
    "inputs": [
        [
            4641025220953719,
            4920355418495856
        ]
    ]
}

headers = {
    "Host": "fraud.test.hopsworks.ai",
    "Authorization": "ApiKey 8kDOlnRlJU4kiV1Y.RmFNJY3XKAUSqmJZ03kbUbXKMQSHveSBgMIGT84qrM5qXMjLib7hdlfGeg8fBQZp",
    "Content-Type": "application/json"
}

response = requests.post(
    "http://10.87.42.108/v1/models/fraud:predict",
    headers=headers,
    json=data
)
print(response.json())

curl example for Predictive Inference (TensorFlow, SKLearn, or Python Serving)

curl -X POST "http://10.87.42.108/v1/models/fraud:predict" \
  -H "Host: fraud.test.hopsworks.ai" \
  -H "Authorization: ApiKey 8kDOlnRlJU4kiV1Y.RmFNJY3XKAUSqmJZ03kbUbXKMQSHveSBgMIGT84qrM5qXMjLib7hdlfGeg8fBQZp" \
  -H "Content-Type: application/json" \
  -d '{
        "inputs": [
          [
            4641025220953719,
            4920355418495856
          ]
        ]
      }'

For generative inference (i.e., vLLM), the request and response follow the OpenAI API specification.
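
A hedged sketch of such a request using the OpenAI chat completions format; the hostname and model name below are illustrative placeholders, and the actual URL and resource path are shown on the deployment page in the Hopsworks UI:

import requests

headers = {
    "Host": "llm.test.hopsworks.ai",        # hypothetical deployment hostname
    "Authorization": "ApiKey <API_KEY_VALUE>",
    "Content-Type": "application/json"
}

# OpenAI-style chat completions payload
data = {
    "model": "llm",                          # hypothetical model name
    "messages": [
        {"role": "user", "content": "Explain what a fraudulent transaction is in one sentence."}
    ]
}

response = requests.post(
    "http://<ISTIO_GATEWAY_IP>/<RESOURCE_PATH>",  # OpenAI-compatible endpoint from the Hopsworks UI
    headers=headers,
    json=data
)
print(response.json())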

Response#

The model returns predictions in a JSON object. The response format depends on the model server implementation. You can find more information about specific model servers in the KServe documentation.
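
For example, for the predictive inference request above, model servers using the KServe v1 protocol typically wrap the output in a predictions field. A minimal sketch of reading it; the exact shape of each element depends on the model:

result = response.json()  # `response` from the predictive inference example above
predictions = result.get("predictions", result)
print(predictions)        # e.g. [0] or [[0.97, 0.03]], depending on the model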