The Distributed LLM Service is accessible via a WebUI and a notebook application. Both interfaces support a range of models.

Use of these interfaces is documented below.



WebUI

The WebUI for distributed inference allows the user to select up to three supported models, enter a prompt, and run inference.

  1. Choose your models: select up to three models at a time via the checkboxes
  2. Enter your prompt (plain text only)
  3. Click the "Process" button to submit a job to the network



The submitted job will be picked up by inference servers running on the distributed network. If multiple models are selected, the job runs in parallel on available resources, and results from the different models appear on the page concurrently.
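
Conceptually, a multi-model job fans the same prompt out to each selected model and collects results as they finish. The Python sketch below illustrates that pattern only; the submit_inference helper and model names are hypothetical placeholders, not part of the actual service interface.

    # Minimal fan-out sketch, assuming a hypothetical submit_inference()
    # helper that forwards a prompt to one model via the service.
    from concurrent.futures import ThreadPoolExecutor, as_completed

    def submit_inference(prompt: str, model: str) -> str:
        # Placeholder for a call to the LLM Inference API; the real
        # request format is not shown in this documentation.
        return f"[{model}] response to: {prompt!r}"

    prompt = "Summarize the history of distributed computing."
    models = ["model-a", "model-b", "model-c"]  # up to three selections

    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        futures = {pool.submit(submit_inference, prompt, m): m for m in models}
        for future in as_completed(futures):
            # Results arrive in completion order, mirroring how the WebUI
            # displays output from each model as it finishes.
            print(futures[future], "->", future.result())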

Notebook application

Inference requests can also be submitted to the LLM service from a notebook by running the Python-based API client in the notebook environment and then providing the request and model details. (Note: this is a Colab notebook; you must be signed in to a Google account to run it.)

  1. Click the step 1 "play" button to load the API client into the notebook.
  2. When step 1 has completed, enter an "inference_request" and choose a model from the drop-down list.
  3. Click the step 2 "play" button to submit the request to the distributed network for inference.


An available LLM server will process the request and return the result directly in the notebook. Additional requests can be made by changing the values for "inference_request" and "model" and clicking the step 2 "play" button again, as outlined in the sketch below.
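
The step 2 cell essentially reads the two values and passes them through the API client loaded in step 1. The outline below is a rough, hypothetical illustration of that workflow; the InferenceClient class, its run_inference method, and the model name are assumptions, not the notebook's actual code.

    # Hypothetical outline of the notebook workflow (illustrative only;
    # the real API client loaded in step 1 may expose a different interface).

    class InferenceClient:
        """Stand-in for the Python-based API client loaded in step 1."""

        def run_inference(self, prompt: str, model: str) -> str:
            # Placeholder: the real client forwards the request to the
            # Distributed LLM Service and waits for a result.
            return f"[{model}] response to: {prompt!r}"

    client = InferenceClient()

    # Step 2: set the request values, then run the cell ("play" button).
    inference_request = "Explain vector embeddings in two sentences."
    model = "example-model-7b"

    result = client.run_inference(prompt=inference_request, model=model)
    print(result)

    # For additional requests, change inference_request and/or model
    # and re-run the step 2 cell.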

The WebUI and notebook application rely on the LLM Inference API to communicate with the Distributed LLM Service. This service and API connect incoming requests with the back-end inference servers participating in the distributed network.
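
At the HTTP level, a client built on the LLM Inference API would submit a request along the lines of the sketch below. The endpoint URL, payload fields, and response shape are illustrative assumptions rather than the API's documented contract.

    # Rough sketch of a request to an LLM Inference API endpoint.
    # The URL and JSON fields below are assumed for illustration.
    import requests

    API_URL = "https://example.invalid/llm-inference/api/v1/jobs"  # placeholder endpoint

    payload = {
        "prompt": "Write a haiku about distributed systems.",
        "models": ["example-model-7b"],  # one or more supported models
    }

    response = requests.post(API_URL, json=payload, timeout=60)
    response.raise_for_status()

    # Assumed response shape: one result entry per requested model.
    for item in response.json().get("results", []):
        print(item.get("model"), "->", item.get("output"))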
