...
Prerequisites
The inference system uses M*DEL's Aurora model (aurora-m) and relies on RunPod compute resources, just as the model training process does. If the prerequisites for Training an M*DEL Expert have already been completed, continue to the next section.
Otherwise, follow the instructions in that document for setting up HuggingFace and RunPod accounts before continuing.
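Before launching the container, it can be worth verifying the HuggingFace setup. A minimal check, assuming the huggingface-cli installed during training setup is available locally:

```
# Store the read token from huggingface.co/settings/tokens, if not already done
huggingface-cli login

# Confirm which account the stored token authenticates as
huggingface-cli whoami
```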
Launch the TGI Container Instance
A template exists at RunPod to launch a text generation inference (TGI) container for the Aurora model.
...
- Start from the text-generation-inference template and select the organization account, as appropriate, from the top-right profile drop-down list.
- Click the Filters icon and change "Allowed CUDA Versions" to 12.2 only.
- Scroll down to the "Previous Generation" section and click Deploy on the 1x A100 80GB GPU, then click "Customize Deployment":
- Update "Container Start Command" to point to the expert LoRA uploaded via the huggingface-cli during training, e.g. --model-id stillerman/mtg-aurora (or the id of your own expert, such as aurora-mathematica)
- Expand "Environment Variables" and set HUGGING_FACE_HUB_TOKEN to your read token from huggingface.co/settings/tokens
- Click "Set Overrides", then Continue → Deploy
After the instance has started up, wait for the following line to appear in its Container Logs (this might take approximately 5 minutes):
```
WARN text_generation_router: router/src/main.rs:327: Invalid hostname, defaulting to 0.0.0.0
```
Close the log and click Connect → Connect to HTTP Service [Port 80]; this will open a page in a web browser. Copy the URL to use for inference requests.
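Once connected, it can help to confirm the service is responding before sending real requests. As a minimal sketch, assuming the copied URL is stored in a shell variable (the pod URL below is a hypothetical placeholder), TGI's /info endpoint returns metadata about the loaded model:

```
# Replace with the URL copied from the Connect dialog (hypothetical value)
TGI_URL="https://abc123xyz-80.proxy.runpod.net"

# GET /info returns metadata about the model loaded by TGI
curl -s "$TGI_URL/info"
```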
Run Inference Commands
HTTP requests can now be made to the running TGI system using an HTTP client or library.
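For example, a minimal curl sketch against TGI's standard /generate endpoint, reusing the TGI_URL variable set above (the prompt and parameters are illustrative):

```
# POST a JSON payload; max_new_tokens caps the length of the generated reply
curl -s "$TGI_URL/generate" \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{"inputs": "What is deep learning?", "parameters": {"max_new_tokens": 64}}'
```

TGI also exposes a /generate_stream endpoint that returns tokens incrementally as server-sent events, which is useful for interactive clients.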
...