Inference requests on the Distributed LLM Service can be submitted and managed through an API. This makes it possible to build user interfaces for the LLM service, such as a WebUI or notebook application, and to integrate the service with other applications and platforms.

The API endpoints are documented below.

Authorization

API endpoints require authorization. Each HTTP request must include an Authorization header carrying a valid token:

Authorization: Bearer [token]

For example, a cURL request would include the --header parameter similar to the following:

--header 'Authorization: Bearer example-token'
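
The same header can be attached from code. The following is a minimal sketch using Python's standard library; the helper name authorized_request is hypothetical, and "example-token" is the placeholder token from the cURL example, not a working credential. The request object is only constructed here, not sent.

```python
import urllib.request


def authorized_request(url: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a request carrying the Bearer token header."""
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}"},
    )


# Placeholder token for illustration only.
req = authorized_request(
    "https://api.gridrepublic.services/remotejobs/v2/jobs", "example-token"
)
print(req.get_header("Authorization"))  # prints: Bearer example-token
```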

Endpoints

Jobs on the LLM service can be submitted and managed through API requests.

Submitting a job

Inference jobs are submitted to the LLM service through the jobs API endpoint:

https://api.gridrepublic.services/remotejobs/v2/jobs

This endpoint accepts a JSON payload via POST with the following parameters:

app // Name of the application; in this case, "gridrepublic:text-inference"
commandLine // JSON-encoded string containing an object whose "inputs" array holds the prompt; because this value is itself a string, its inner double quotes must be escaped
tag // Name of the model to use for inference

The format of the JSON data is as follows, with the "inputs" and "tag" strings populated with a prompt and specific model name:

{
  "app": "gridrepublic:text-inference",
  "commandLine": "{\"inputs\": [\"When does 1+1=10?\"] }",
  "tag": "gemma:2b"
}
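
Escaping the inner quotes by hand is error-prone, so one option is to let a JSON library encode "commandLine" separately and then encode the outer payload. The sketch below assumes Python's standard library; the function names are hypothetical, the Content-Type header is an assumption not stated by this page, and the request is built but not sent.

```python
import json
import urllib.request

JOBS_URL = "https://api.gridrepublic.services/remotejobs/v2/jobs"


def build_job_payload(prompt: str, model: str) -> dict:
    """Return the submission payload as a dict.

    "commandLine" is serialized on its own first, so that encoding the
    outer payload escapes its double quotes automatically.
    """
    command_line = json.dumps({"inputs": [prompt]})
    return {
        "app": "gridrepublic:text-inference",
        "commandLine": command_line,
        "tag": model,
    }


def build_submit_request(prompt: str, model: str, token: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for a new job."""
    body = json.dumps(build_job_payload(prompt, model)).encode("utf-8")
    return urllib.request.Request(
        JOBS_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            # Assumption: the service expects a JSON content type.
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_submit_request("When does 1+1=10?", "gemma:2b", "example-token")
print(request.data.decode("utf-8"))
```

Passing the resulting request to urllib.request.urlopen would perform the actual submission.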

When the job is submitted, the API returns a "success" indicator together with either an array of job "ids" (when "success" is true) or an "error" string (when "success" is false). A successful submission returns, for example:

{
  "success":true,
  "ids":["a9ab011455bb6aeb0161b5fc08766b42"]
}

If the submission fails, the response will include an error:

{
  "success":false,
  "error":"Invalid input format"
}
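
A client can branch on the "success" field to handle both response shapes. The following sketch assumes the response body has already been read as text; the exception class and function name are hypothetical, and the fallback message covers the undocumented case of a failure response without an "error" string.

```python
import json


class JobSubmissionError(Exception):
    """Raised when the API reports success == false."""


def parse_submit_response(raw: str) -> list:
    """Return the list of job IDs, or raise with the reported error."""
    data = json.loads(raw)
    if data.get("success"):
        return data["ids"]
    raise JobSubmissionError(data.get("error", "unknown error"))


ok = '{"success":true,"ids":["a9ab011455bb6aeb0161b5fc08766b42"]}'
print(parse_submit_response(ok))

failed = '{"success":false,"error":"Invalid input format"}'
try:
    parse_submit_response(failed)
except JobSubmissionError as exc:
    print(f"submission failed: {exc}")
```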

Retrieving job status

To retrieve the current status of jobs that have been submitted to the LLM service, send a GET request to the jobs API endpoint with a comma-separated list of one or more job IDs as a path parameter:

https://api.gridrepublic.services/remotejobs/v2/jobs/{ids}

In the following example, {ids} was replaced with 9f22472031ef57c3fd517061d116ad68. The output of the inference process is contained in the "log" property, which is updated as the process runs:

{
  "success":true,
  "jobs":{
    "9f22472031ef57c3fd517061d116ad68":{
      "vmStatus":"running",
      "states":{
        "default":{
          "status":"running",
          "outputFiles":[],
          "log":"1+1=2. When you add numbers, the result",
          "commandLine":"{\"inputs\":[\"How can 1+1=10?\"]}",
          "app":"gridrepublic:text-inference"
        }
      },
      "created":"2024-05-06T21:26:00+00:00",
      "copy":0,
      "tag":"llama2-uncensored",
      "runtime":23
    }
  }
}
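
Building the status URL and reading the "log" text from the response can be sketched as follows. The function names are hypothetical. The sketch assumes every job reports its output under a state named "default", as in the example above, and that a failed status request uses the same "success"/"error" shape as a failed submission; neither assumption is confirmed by this page.

```python
import json

JOBS_URL = "https://api.gridrepublic.services/remotejobs/v2/jobs"


def status_url(job_ids: list) -> str:
    """Join one or more job IDs into the comma-separated path parameter."""
    return f"{JOBS_URL}/{','.join(job_ids)}"


def extract_logs(raw: str) -> dict:
    """Map each job ID to a (status, log) pair taken from its "default" state."""
    data = json.loads(raw)
    if not data.get("success"):
        # Assumption: failures mirror the submission error format.
        raise RuntimeError(data.get("error", "unknown error"))
    result = {}
    for job_id, job in data["jobs"].items():
        state = job["states"]["default"]
        result[job_id] = (state["status"], state["log"])
    return result


# Trimmed copy of the example response above.
sample = json.dumps({
    "success": True,
    "jobs": {
        "9f22472031ef57c3fd517061d116ad68": {
            "vmStatus": "running",
            "states": {"default": {
                "status": "running",
                "log": "1+1=2. When you add numbers, the result",
            }},
        }
    },
})

print(status_url(["9f22472031ef57c3fd517061d116ad68"]))
print(extract_logs(sample))
```

Because the "log" value grows while the job runs, a client would typically repeat the GET request at intervals and display the latest text.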
