Supported models for fine-tuning
Effective August 1, 2024, the Anyscale Endpoints API will be available exclusively through the fully Hosted Anyscale Platform. Multi-tenant access to LLM models will be removed.
With the Hosted Anyscale Platform, you can access the latest GPUs, billed by the second, and deploy models on your own dedicated instances. Enjoy full customization to build your end-to-end applications with Anyscale. Get started today.
The following table lists the models available for fine-tuning. Each model has a context length: the maximum number of tokens the model can process in a single input. The context length depends on whether you're running inference on the base model, fine-tuning the base model, or running inference on the fine-tuned model.
Model | Size | Base context length | Fine-tuning context length | Inference context length |
---|---|---|---|---|
meta-llama/Meta-Llama-3-8B-Instruct | 8B | 8192 tokens | Up to 32768 tokens | Greater of the base and fine-tuning context lengths |
meta-llama/Meta-Llama-3-70B-Instruct | 70B | 8192 tokens | Up to 16384 tokens | Greater of the base and fine-tuning context lengths |
mistralai/Mistral-7B-Instruct-v0.1 | 7B | 8192 tokens | Up to 8192 tokens | Greater of the base and fine-tuning context lengths |
mistralai/Mixtral-8x7B-Instruct-v0.1 | 8x7B | 32768 tokens | Up to 32768 tokens | Greater of the base and fine-tuning context lengths |
Fine-tuning a model with a context length longer than its base context length may increase training time and result in reduced model quality.
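One way to avoid exceeding these limits is to count the tokens of each training example before submitting a fine-tuning job. The sketch below is illustrative, not an official workflow: it assumes the Hugging Face transformers library and access to the gated meta-llama/Meta-Llama-3-8B-Instruct tokenizer, and the 8192-token limit is the base context length from the table above.

```python
# Minimal sketch: count the tokens in a chat-formatted training example
# to check it fits within a model's fine-tuning context length.
from transformers import AutoTokenizer

MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"  # gated repo; requires HF access
CONTEXT_LIMIT = 8192  # base context length from the table above

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# One training example in the common chat-messages format.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize the quarterly report."},
    {"role": "assistant", "content": "Revenue grew 12% quarter over quarter."},
]

# apply_chat_template renders the conversation with the model's chat
# formatting, so the count reflects what the model actually sees.
token_ids = tokenizer.apply_chat_template(messages, tokenize=True)
print(f"{len(token_ids)} tokens (limit: {CONTEXT_LIMIT})")
if len(token_ids) > CONTEXT_LIMIT:
    print("Example exceeds the fine-tuning context length; truncate or split it.")
```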
meta-llama/Llama-2-7b-chat-hf, meta-llama/Llama-2-13b-chat-hf, and meta-llama/Llama-2-70b-chat-hf have transitioned to the legacy models list. Moving forward, to fine-tune a similar or better model, use meta-llama/Meta-Llama-3-8B-Instruct or meta-llama/Meta-Llama-3-70B-Instruct.
Serving support for fine-tuned Llama-2 models expires on June 15, 2024.
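For teams migrating from Llama-2 fine-tunes, the sketch below shows how launching a job against a recommended replacement model might look through an OpenAI-compatible client. It is a hypothetical example, not the documented Anyscale workflow: the base URL, environment variable, and file ID are placeholder assumptions, and only the model name comes from the table above.

```python
# Hypothetical sketch: start a fine-tuning job for a Llama-3 replacement
# model via an OpenAI-compatible API. URL and credentials are assumptions.
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.endpoints.anyscale.com/v1",  # assumed endpoint URL
    api_key=os.environ["ANYSCALE_API_KEY"],            # assumed env var name
)

# training_file references a previously uploaded JSONL dataset of
# chat-formatted examples; "file-abc123" is a placeholder ID.
job = client.fine_tuning.jobs.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    training_file="file-abc123",
)
print(job.id, job.status)
```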