
...



Prerequisites

The inference system uses M*DEL's Aurora model (aurora-m) and relies on RunPod compute resources, as does the model training process. If the prerequisites for Training an M*DEL Expert have already been completed, continue to the next section.

Otherwise, follow the instructions in that document for setting up HuggingFace and RunPod accounts before continuing.

Launch the TGI Container Instance

A template exists at RunPod to launch a text generation inference (TGI) container for the Aurora model.

...

  1. Start from the text-generation-inference template and, if appropriate, switch to the organization account from the top-right profile drop-down list.

  2. Click the Filters (settings) icon and change "Allowed CUDA Versions" to 12.2 only.
  3. Scroll down to the "Previous Generation" section and click Deploy on the 1x A100 80GB GPU, then click "Customize Deployment":

    • Update "Container Start Command" to point to the expert LoRA uploaded via the huggingface-cli during training; e.g.: --model-id stillerman/mtg-aurora
    • Expand "Environment Variables" and set HUGGING_FACE_HUB_TOKEN to your read token from huggingface.co/settings/tokens
    • Click "Set Overrides", then Continue → Deploy
  4. After the instance has started up, wait for the following line to appear in its Container Logs (this might take approximately 5 minutes):

    WARN text_generation_router: router/src/main.rs:327: Invalid hostname, defaulting to 0.0.0.0

    Close the log and click Connect → Connect to HTTP Service [Port 80]; this will open a page in a web browser. Copy the URL to use for inference requests.
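As a quick sanity check, the copied URL can be probed against TGI's /info route, which returns metadata about the model being served. A minimal sketch using only the standard library; the base URL below is a placeholder and must be replaced with the URL copied from Connect → Connect to HTTP Service [Port 80]:

```python
import json
from urllib.request import urlopen

# Placeholder only — substitute the URL copied from
# Connect → Connect to HTTP Service [Port 80].
BASE_URL = "https://your-pod-id-80.proxy.runpod.net"

def info_url(base_url: str) -> str:
    """Build the URL of TGI's /info endpoint from the pod's base URL."""
    return base_url.rstrip("/") + "/info"

def check_tgi(base_url: str) -> dict:
    """Fetch /info and return the server's model metadata as a dict."""
    with urlopen(info_url(base_url)) as resp:
        return json.load(resp)

if __name__ == "__main__":
    # Prints the model id the TGI instance reports it is serving.
    print(check_tgi(BASE_URL).get("model_id"))
```

If the request fails, confirm the instance finished starting and that Port 80 is exposed.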

Run Inference Commands

HTTP requests can now be made to the running TGI system using an HTTP client or library.
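For example, a completion can be requested from TGI's /generate endpoint, which accepts a JSON body with an "inputs" prompt and optional "parameters". A minimal stdlib-only sketch; the base URL is a placeholder for the pod URL copied earlier:

```python
import json
import urllib.request

# Placeholder only — substitute the pod URL copied earlier.
TGI_URL = "https://your-pod-id-80.proxy.runpod.net"

def build_generate_request(prompt: str, max_new_tokens: int = 64) -> dict:
    """Build the JSON body expected by TGI's /generate endpoint."""
    return {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}

def generate(base_url: str, prompt: str) -> str:
    """POST the prompt to /generate and return the generated text."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/generate",
        data=json.dumps(build_generate_request(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["generated_text"]
```

The same request can be issued with curl or any other HTTP client; only the JSON body shape matters.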

...