The Distributed LLM Service is accessible via a WebUI, a notebook application, and an API. Each interface supports a range of models.

Use of each of these interfaces is documented below.

WebUI

The WebUI for distributed inference allows the user to select up to three supported models, enter a prompt, and run inference.

Choose your models: select up to three models at a time using the checkboxes
Enter your prompt (plain text only)
Click the "Process" button to submit a job to the network

The submitted job will be picked up by inference servers running on the distributed network. If multiple models are selected, the job will be run in parallel on available resources, so results from the different models will print to the page concurrently.
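The parallel behavior described above could be reproduced by a custom client along the following lines. This is only a sketch, not the WebUI's actual implementation: it assumes one job is submitted per selected model, using the payload format documented in the API section of this page. The names `payloads_for_models`, `submit_all`, and the caller-supplied `submit` function are hypothetical.

```python
import json
from concurrent.futures import ThreadPoolExecutor

MAX_MODELS = 3  # the WebUI allows up to three models per prompt


def payloads_for_models(prompt, models):
    """Build one job payload per selected model (assumed: one job per model)."""
    if not 1 <= len(models) <= MAX_MODELS:
        raise ValueError(f"select between 1 and {MAX_MODELS} models")
    # The prompt is wrapped in an "inputs" array and serialized to a JSON string.
    command_line = json.dumps({"inputs": [prompt]})
    return [
        {
            "app": "gridrepublic:text-inference",
            "commandLine": command_line,
            "hours": 1,
            "tag": model,
        }
        for model in models
    ]


def submit_all(payloads, submit):
    """Call the caller-supplied submit function on every payload concurrently.

    Results are returned in the same order as the payloads.
    """
    with ThreadPoolExecutor(max_workers=len(payloads)) as pool:
        return list(pool.map(submit, payloads))
```

Here `submit` stands in for whatever function actually posts a payload to the jobs endpoint; passing it in keeps the fan-out logic independent of the transport.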

Notebook application

Inference requests can also be submitted to the LLM service via notebook by running the Python-based API client in the notebook environment and then providing the request and model details. An available LLM server will process the request and return the result.

API

The WebUI and notebook application both use a job submission API to interact with the LLM service. Other such interfaces can also be developed to integrate the LLM service with other applications and platforms.

Submit a job

Submitting inference jobs to the LLM service is accomplished through the jobs API endpoint:

https://api.gridrepublic.services/remotejobs/v2/jobs

This endpoint accepts a JSON payload via POST with the following parameters:

Code Block

language	js

app // Name of the application; in this case, "gridrepublic:text-inference"
commandLine // JSON string of the prompt in an array named "inputs"; double-quotes must be escaped
hours // Runtime limit for the job; by default, use: 1
tag // Name of the model to use for inference

The format of the JSON data is as follows, with the "inputs" and "tag" strings populated with a prompt and specific model name:

Code Block

language	text

{
  "app": "gridrepublic:text-inference",
  "commandLine": "{\"inputs\": [\"When does 1+1=10?\"] }",
  "hours": 1,
  "tag": "gemma:2b"
}
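Because "commandLine" is itself a JSON string nested inside the JSON payload, escaping the quotes by hand is error-prone. The following minimal sketch lets Python's `json` module do the escaping; the helper name `build_payload` is hypothetical and not part of the service.

```python
import json


def build_payload(prompt, model, hours=1):
    """Build the job-submission payload for a single prompt and model."""
    # Serialize the inner object first; it becomes the "commandLine" string.
    command_line = json.dumps({"inputs": [prompt]})
    return {
        "app": "gridrepublic:text-inference",
        "commandLine": command_line,
        "hours": hours,
        "tag": model,
    }


payload = build_payload("When does 1+1=10?", "gemma:2b")
# Serializing the outer payload escapes the inner double quotes automatically.
body = json.dumps(payload, indent=2)
print(body)
```

Serializing twice (inner object, then outer payload) produces the escaped form shown above without any manual backslashes.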

When the job is submitted, the API returns a success indicator and either an array of job "ids", when success is true, or a string indicating an "error", when success is false. For example:

Code Block

language	text

{
  "success":true,
  "ids":["a9ab011455bb6aeb0161b5fc08766b42"]
}

If the submission fails, the response will include an error:

Code Block

language	text

{
  "success":false,
  "error":"Invalid input format"
}
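A client might submit a job and handle both response shapes as sketched below. The function and exception names are hypothetical, and the sketch assumes the endpoint accepts a JSON body with a `Content-Type: application/json` header and requires no authentication; neither detail is confirmed on this page.

```python
import json
import urllib.request

JOBS_URL = "https://api.gridrepublic.services/remotejobs/v2/jobs"


class SubmissionError(Exception):
    """Raised when the API reports "success": false."""


def parse_submit_response(text):
    """Return the list of job IDs, or raise SubmissionError with the API's message."""
    data = json.loads(text)
    if data.get("success"):
        return data["ids"]
    raise SubmissionError(data.get("error", "unknown error"))


def submit_job(payload, url=JOBS_URL, timeout=30):
    """POST a payload to the jobs endpoint and return the resulting job IDs."""
    # Assumption: JSON request body, no authentication headers.
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_submit_response(response.read().decode("utf-8"))
```

Keeping the parsing separate from the network call makes the success and error paths easy to check against the sample responses above.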

Get job status

To retrieve the current status of a job that has been submitted to the LLM service, the jobs API endpoint accepts GET requests with a comma-separated list of one or more job IDs as a path parameter:

https://api.gridrepublic.services/remotejobs/v2/jobs/{ids}
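A status check could be sketched as follows. The helper names are hypothetical, and the field names read by `extract_logs` ("jobs", "states", "default", "status", "log") are taken from the single example response in this section; other jobs may report states under different keys.

```python
import json
import urllib.request

JOBS_URL = "https://api.gridrepublic.services/remotejobs/v2/jobs"


def status_url(job_ids, base=JOBS_URL):
    """Build the status URL for one or more job IDs (comma-separated in the path)."""
    return base + "/" + ",".join(job_ids)


def extract_logs(status):
    """Map each job ID to a (status, log) pair from a decoded status response."""
    logs = {}
    for job_id, job in status.get("jobs", {}).items():
        # Assumption: the inference output lives under the "default" state.
        state = job.get("states", {}).get("default", {})
        logs[job_id] = (state.get("status"), state.get("log", ""))
    return logs


def fetch_status(job_ids, timeout=30):
    """GET the current status of the given jobs and return the decoded JSON."""
    with urllib.request.urlopen(status_url(job_ids), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Since the "log" property is updated while the job runs, a client would typically call `fetch_status` repeatedly, pausing between calls, until the reported status changes from "running".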

In the following example, {ids} was replaced with 9f22472031ef57c3fd517061d116ad68. The output of the inference process is contained in the "log" property, which is updated as the process runs:

Code Block

language	text

{
  "success":true,
  "jobs":{
    "9f22472031ef57c3fd517061d116ad68":{
      "vmStatus":"running",
      "states":{
        "default":{
          "status":"running",
          "outputFiles":[],
          "log":"1+1=2. When you add numbers, the result",
          "commandLine":"{\"inputs\":[\"How can 1+1=10?\"]}",
          "app":"gridrepublic:text-inference"
        }
      },
      "created":"2024-05-06T21:26:00+00:00",
      "copy":0,
      "tag":"llama2-uncensored",
      "runtime":23
    }
  }
}

To use the notebook application (this is a Colab notebook; login to a Google account is required to run):

Click the step 1 "play" button to load the API client into the notebook.
When the first step has completed, enter an "inference_request" and choose a model from the drop-down list.
Click the step 2 "play" button to submit the request to the distributed network for inference.

An available LLM server will process the request and return the result within the notebook. Additional requests can be made by changing the values for "inference_request" and "model" and then clicking the step 2 "play" button again.

Info

The WebUI and notebook application rely on the LLM Inference API to communicate with the Distributed LLM Service. Through this service and API, requests are connected with back-end inference servers participating in the distributed network.
