The GridRepublic LLM Service is powered by a distributed network of LLM servers.
The core server is available as a Docker application. It serves inference requests received over the network and can also provide a local interactive inference service via a simple command-line interface. A range of models is supported.
System requirements
- Docker Engine must be installed and running on the target system
- Network connectivity is needed so that the Docker image and any specified models can be downloaded automatically
- Disk space and RAM:
  - The application container itself requires 600 MB of disk space and will use 400 MB of system memory to run.
  - In addition to the above, the selected model will have its own resource requirements. See Choosing a model for further details.
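The requirements above can be checked before launching with a small preflight script. This is a minimal sketch, assuming a Linux or macOS shell with `df` and `awk`; the function names `required_mb`, `available_disk_mb` and `docker_ready` are hypothetical and not part of llm-server. The container figures come from this page, while the model's own disk and RAM needs must be supplied by you, in MB.

```shell
#!/bin/sh
# Hypothetical preflight helpers (not part of llm-server).
# Container figures are taken from the system requirements; model figures are passed in.

CONTAINER_DISK_MB=600   # disk used by the application container
CONTAINER_RAM_MB=400    # memory used by the application container

# Print "<disk MB> <RAM MB>" needed for the container plus a model.
# $1 = model disk need in MB, $2 = model RAM need in MB
required_mb() {
    echo "$((CONTAINER_DISK_MB + $1)) $((CONTAINER_RAM_MB + $2))"
}

# Print the free space (MB) on the filesystem holding directory $1,
# using the POSIX (-P) df layout: the 4th field of line 2 is available 1K blocks.
available_disk_mb() {
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024) }'
}

# Succeed only if the docker CLI is installed and the Docker daemon responds.
docker_ready() {
    command -v docker >/dev/null 2>&1 && docker info >/dev/null 2>&1
}
```

For example, `docker_ready || echo "Docker is not running"` confirms the first requirement, and `available_disk_mb "$HOME"` reports the free space a local model cache would draw on.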
Installation and basic usage
The server application can be launched with a single command:
$ docker run -it --rm gridrepublic/llm-server
This fetches the latest version of the application, which then begins listening for inference requests and serving them as they are received.
In most cases, caching models locally provides significant benefits, since models then do not have to be downloaded again on every run; see Caching models locally for details.
For testing purposes, the application can also be launched in "task mode", which provides a local command prompt for requests rather than serving requests from the network:
$ docker run -it --rm -e TASK_MODE=1 gridrepublic/llm-server
By default, the server will download the Zephyr 7B model upon launch. Once the model has been pulled and initialized, a prompt will appear and wait for input:
>>>
Choosing a model
The Zephyr 7B model that is used by default requires a bit over 4 GB of disk space and at least 4.5 GB of RAM to run (in addition to the application container requirements outlined above). Depending on the resources available, smaller, lighter-weight models or larger, more capable models can be used instead.
A variety of supported models can be found at: https://ollama.com/library
As a general rule, a system should have more bytes of free disk space, and more bytes of available memory, than the model has parameters. For example, to run a model with 7 billion parameters, use a system with 8 GB each of disk space and RAM.
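The rule of thumb reduces to a one-line calculation. This is a minimal sketch, assuming the parameter count is given in billions (7 for a "7B" model) and that 1 GB means 10^9 bytes; `min_gb_for_params` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: rule of thumb "more bytes of disk and RAM than parameters".
# $1 = parameter count in billions (may be fractional, e.g. 2.5).
# Prints the smallest whole number of GB (10^9 bytes) strictly greater than $1.
min_gb_for_params() {
    awk -v p="$1" 'BEGIN { print int(p) + 1 }'
}
```

Running `min_gb_for_params 7` prints 8, matching the 7-billion-parameter example. Because the Zephyr 7B figures given earlier are lower than this, the result is best treated as a conservative target rather than a hard minimum.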
The model that the llm-server container uses is determined by the MODEL environment variable, which is set when the container is launched via "docker run". To use another model from the library, add the following parameter to the docker command:
-e MODEL="[name]"
For example, to use the lightweight Gemma 2B model:
$ docker run -it --rm -e TASK_MODE=1 -e MODEL="gemma:2b" gridrepublic/llm-server
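When switching between models often, the docker run command can be wrapped in a small shell function. This is a sketch only: `run_llm_server` and the `DOCKER` override (set `DOCKER=echo` to print the command instead of running it) are hypothetical conveniences, while TASK_MODE and MODEL are the variables described above.

```shell
#!/bin/sh
# Hypothetical helper (not part of llm-server): start the server in task mode,
# optionally selecting a model tag from https://ollama.com/library via MODEL.
# Set DOCKER=echo to print the command instead of running it.
run_llm_server() {
    model="$1"                            # e.g. "gemma:2b"; empty = default model
    set -- run -it --rm -e TASK_MODE=1
    if [ -n "$model" ]; then
        set -- "$@" -e "MODEL=$model"     # same effect as -e MODEL="gemma:2b"
    fi
    set -- "$@" gridrepublic/llm-server
    ${DOCKER:-docker} "$@"
}
```

With this defined, `run_llm_server gemma:2b` issues the same command as the Gemma example above, and calling it with no argument leaves MODEL unset so the default Zephyr model is used.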
Caching models locally
To avoid pulling the same models over and over on subsequent runs of the LLM server, it is worth setting up a cache directory on the local system in which models are stored. To do this, create an empty directory, e.g. ~/llm-server/cache, and mount it as a volume at /srv/gr when running the Docker image:
$ docker run -it --rm -v ~/llm-server/cache:/srv/gr gridrepublic/llm-server
The LLM server will then store all pulled models in that cache directory, where future instances launched via docker run will also find them, as long as the same volume mount option is provided.
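Creating the cache directory and mounting it can be combined into one step. This is a minimal sketch, assuming a POSIX shell; `run_with_cache`, `CACHE_DIR` (defaulting to ~/llm-server/cache) and the `DOCKER` dry-run override are hypothetical names, and any extra arguments are passed to docker run before the image name.

```shell
#!/bin/sh
# Hypothetical helper (not part of llm-server): create the model cache directory
# if needed, then start the server with it mounted at /srv/gr.
# CACHE_DIR and DOCKER (e.g. DOCKER=echo for a dry run) are our own variables.
CACHE_DIR="${CACHE_DIR:-$HOME/llm-server/cache}"

run_with_cache() {
    mkdir -p "$CACHE_DIR" || return 1     # no-op if the directory already exists
    # Extra arguments such as -e TASK_MODE=1 go before the image name.
    ${DOCKER:-docker} run -it --rm -v "$CACHE_DIR:/srv/gr" "$@" gridrepublic/llm-server
}
```

For example, `run_with_cache -e TASK_MODE=1 -e MODEL="gemma:2b"` starts the Gemma 2B model in task mode with the cache mounted, so the model is downloaded only on the first run.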
Examples
Using the default Zephyr model
To launch the application container in task mode using the Zephyr model, with local caching of the model:
$ docker run -it --rm -v ~/llm-server/cache:/srv/gr -e TASK_MODE=1 gridrepublic/llm-server
>>> What is the wavelength of blue light in nanometers?
The wavelength of blue light can vary, but it typically falls within the range of 450 to 495 nanometers (nm) in vacuum. In a medium like air or water, the wavelength is slightly longer due to refraction. This range of wavelengths corresponds to the color that we perceive as blue in visible light.
>>> /show info
Model details:
  Family              llama
  Parameter Size      7B
  Quantization Level  Q4_0
>>> /bye
Using the lightweight Gemma 2B model
To use the Gemma 2B model:
$ docker run -it --rm -v ~/llm-server/cache:/srv/gr -e TASK_MODE=1 -e MODEL="gemma:2b" gridrepublic/llm-server
pulling manifest
pulling c1864a5eb193...  26% ||||| |  432 MB/1.7 GB  10 MB/s  1m57s
...
success
>>> What is the wavelength of blue light in nanometers?
The wavelength of blue light is approximately 400-500 nanometers.
>>> /show info
Model details:
  Family              gemma
  Parameter Size      3B
  Quantization Level  Q4_0
>>> /bye