The GridRepublic LLM Service is accessible via API, WebUI, and a notebook application.
The WebUI and notebook application both allow interaction with connected LLM servers, which power the inference service. These interfaces accept text as input for inference and support a range of models.
When using the WebUI, up to three models can be selected to run inference on a given prompt. Check the box for each desired model, enter the prompt to use as input, and click the Process button to submit the job to the LLM servers.
Inference requests can be submitted to the LLM service from a notebook by running the Python-based API client in the notebook environment and then providing the request and model details. An available LLM server will process the request and return the result.
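As a rough sketch of that notebook flow, assuming a hypothetical Python client class named LLMClient with an infer() method (the client's actual module, class, and method names may differ):

    # Hypothetical notebook usage; "llm_client", "LLMClient", and "infer" are
    # illustrative names only, not the client's documented interface.
    from llm_client import LLMClient

    client = LLMClient()                # connect to the LLM service
    result = client.infer(
        prompt="When does 1+1=10?",     # text input for inference
        model="gemma:2b",               # model tag to run
    )
    print(result)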
Submitting inference jobs to the LLM service is accomplished through the jobs API endpoint.
This endpoint accepts a JSON payload via POST with the following parameters:
    app          // Name of the application; in this case, "charityengine:text-inference"
    commandLine  // JSON string of the prompt in an array named "input"
    tag          // Name of the model to use for inference
The format of the JSON data is as follows, with the "input" and "tag" strings populated with a prompt and specific model name:
{ "app": "charityengine:text-inference", "commandLine": "{'input': ['When does 1+1=10?'] }", "tag": "gemma:2b" } |
TODO: Specify the response format from GET jobs/{id}
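Pending that specification, the following is only an assumption-laden sketch of retrieving a result, supposing the POST response carries a job id usable with GET jobs/{id}:

    # Hypothetical result retrieval; the job id source and the shape of the
    # GET jobs/{id} response are assumptions pending the documentation above.
    import requests

    JOBS_ENDPOINT = "https://example.com/jobs"  # placeholder, as above
    job_id = "JOB_ID_FROM_SUBMISSION"           # assumed to come from the POST response

    job = requests.get(f"{JOBS_ENDPOINT}/{job_id}")
    job.raise_for_status()
    print(job.json())                           # response format to be specified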
The WebUI and notebook application both use this API.