The Distributed LLM Service is accessible via WebUI, a notebook application, and an API. All interfaces support a range of models.
Use of each of these interfaces is documented below.
The WebUI allows the user to select up to three supported models, enter a prompt, and run inference.
Inference requests can also be submitted to the LLM service from a notebook by running the Python-based API client in the notebook environment and providing the request and model details. An available LLM server will process the request and return the result.
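A notebook call might look roughly like the sketch below. The module, class, and method names (llm_client, LLMClient, infer) are hypothetical stand-ins for the Python-based API client provided in the notebook environment; the actual interface may differ.

# Minimal sketch of submitting an inference request from a notebook cell.
# "llm_client", "LLMClient", and "infer" are hypothetical names standing in
# for the Python-based API client available in the notebook environment.
from llm_client import LLMClient

client = LLMClient()

# Provide the model tag and the prompt; an available LLM server processes
# the request and returns the result to the notebook.
result = client.infer(tag="gemma:2b", prompt="When does 1+1=10?")
print(result)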
Submitting inference jobs to the LLM service is accomplished through the jobs API endpoint.
This endpoint accepts a JSON payload via POST with the following parameters:
app          // Name of the application; in this case, "charityengine:text-inference"
commandLine  // JSON string of the prompt in an array named "input"
tag          // Name of the model to use for inference
The format of the JSON data is as follows, with "input" populated with the prompt and "tag" with the name of the model to use:
{ "app": "charityengine:text-inference", "commandLine": "{'input': ['When does 1+1=10?'] }", "tag": "gemma:2b" } |
The WebUI and notebook application both use this API.