...
This process involves branching from a base model and training the branch on specific domain data in order to establish an expert, which routing logic is able to activate in serving inference requests. The result is a framework for domain expertise that is easily-extensible, modular, and efficient.
Prerequisites
M*DEL does not rely specifically on a particular base model or computing service for training, but for the purpose of instruction, the Aurora model from HuggingFace 's Aurora model (aurora-m) will be used as the base and RunPod compute resources will be used for running the training. Also, Weights and Biases will collect information during training for monitoring status, evaluating progress, and to allow comparing subsequent training runs for performance.
...