
Table of Contents


Prerequisites

Setting up the inference system uses M*DEL's Aurora model (aurora-m) and relies on RunPod compute resources, just as the model training process does. If the prerequisites for Training an M*DEL Expert have already been completed, continue to the next section.

Otherwise, follow the instructions in that document to set up HuggingFace and RunPod accounts before continuing.

Launch the TGI Container Instance

A RunPod template is available for launching a Text Generation Inference (TGI) container for the Aurora model.

...

  1. Start from the text-generation-inference template and, if deploying under an organization, select the organization Account from the top-right profile drop-down list.

  2. Click the Filters icon and change "Allowed CUDA Versions" to 12.2 only.
  3. Scroll down to the Previous Generation section and click Deploy on the 1xA100 80GB GPU. Click "Customize Deployment":

    • Update "Container Start Command" to point to the expert uploaded via the huggingface-cli during training; e.g.: --model-id stillerman/aurora-mathematica
    • Expand "Environment Variables" and set HUGGING_FACE_HUB_TOKEN to your read token from huggingface.co/settings/tokens
    • Click "Set Overrides" and then Continue → Deploy
  4. After the instance has started up, wait for the following line to appear in its Container Logs (this might take approximately 5 minutes):

    Code Block
    WARN text_generation_router: router/src/main.rs:327: Invalid hostname, defaulting to 0.0.0.0

    Close the log and click Connect → Connect to HTTP Service [Port 80]; this opens a page in a web browser. Copy the URL to use for inference requests. A quick way to confirm the server is reachable is sketched below.
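The deployment can be verified before sending any prompts. The following is a minimal sketch: the URL is a placeholder for the one copied from RunPod, and it queries TGI's /info route, which returns metadata about the loaded model:

Code Block
# Replace the placeholder URL with the one copied from RunPod
curl https://YOUR-POD-URL.proxy.runpod.net/info

A JSON response naming the expert (matching the --model-id set earlier) indicates the server is up. TGI also exposes a /health route, which returns an empty 200 response once the model is ready to serve requests.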

Run Inference Commands

HTTP requests can now be made to the running TGI server using any HTTP client or library.
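As a minimal example, again using a placeholder for the URL copied from RunPod, a completion can be requested from TGI's /generate endpoint with curl:

Code Block
# POST a prompt to the TGI /generate endpoint (placeholder URL)
curl https://YOUR-POD-URL.proxy.runpod.net/generate \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{"inputs": "What is the integral of x^2?", "parameters": {"max_new_tokens": 128}}'

The response is a JSON object whose generated_text field contains the completion. For token-by-token output, the same request body can be sent to the /generate_stream route, which returns server-sent events.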

...