The Distributed LLM Service is accessible via a WebUI and a notebook application, each supporting a range of models. Both interfaces are documented below.
...
Inference requests can also be submitted to the LLM service from a notebook by running the Python-based API client in the notebook environment and then providing the request and model details. (*This is a Colab notebook; you must be signed in to a Google account to run it.) A sketch of the kind of call the client makes appears after the steps below.
- Click the step 1 "play" button to load the API client into the notebook.
- When step 1 has completed, enter an "inference_request" and choose a model from the drop-down list.
- Click the step 2 "play" button to submit the request to the distributed network for inference.
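
For illustration, the following is a minimal sketch of the kind of call such a client could make when step 2 runs. Everything here is an assumption: the endpoint URL, the JSON field names, the model identifier, and the use of `requests` are not taken from the actual notebook, whose client may work differently.

```python
import requests

# Hypothetical endpoint; the real notebook's client configures its own.
API_URL = "https://llm-service.example.com/v1/inference"

def submit_inference(inference_request: str, model: str, timeout: float = 120.0) -> str:
    """Send a prompt and model choice to the service and return the completion.

    Both the payload field names and the response shape are assumptions
    made for this sketch, not the service's documented API.
    """
    response = requests.post(
        API_URL,
        json={"inference_request": inference_request, "model": model},
        timeout=timeout,
    )
    response.raise_for_status()  # surface HTTP errors rather than parsing a bad body
    return response.json()["response"]

# Roughly what the step 2 "play" button would trigger:
print(submit_inference("Summarize distributed inference in one sentence.", "example-model-7b"))
```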
...