Supported models for fine-tuning

The following table lists the models available for fine-tuning. Each model has a context length: the maximum number of tokens the model can process in a single input. The context length depends on whether you're running inference on the base model, fine-tuning the base model, or running inference on the fine-tuned model.

| Model | Size | Base context length | Fine-tuning context length | Inference context length |
| --- | --- | --- | --- | --- |
| meta-llama/Meta-Llama-3-8B-Instruct | 8B | 8192 tokens | Up to 32768 tokens | The greater of the base context length and the fine-tuning context length used. |
| meta-llama/Meta-Llama-3-70B-Instruct | 70B | 8192 tokens | Up to 16384 tokens | The greater of the base context length and the fine-tuning context length used. |
| mistralai/Mistral-7B-Instruct-v0.1 | 7B | 8192 tokens | Up to 8192 tokens | The greater of the base context length and the fine-tuning context length used. |
| mistralai/Mixtral-8x7B-Instruct-v0.1 | 8x7B | 32768 tokens | Up to 32768 tokens | The greater of the base context length and the fine-tuning context length used. |
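
To make the inference context length rule concrete, the following sketch encodes the table above and computes the serving context length as the greater of the base and fine-tuning context lengths. This is an illustrative helper only, not part of any Anyscale API; the dictionary values are copied from the table.

```python
# Illustrative sketch only: context lengths copied from the table above.
# The helper is hypothetical and not an Anyscale API call.
FINE_TUNING_MODELS = {
    "meta-llama/Meta-Llama-3-8B-Instruct": {"base": 8192, "fine_tune_max": 32768},
    "meta-llama/Meta-Llama-3-70B-Instruct": {"base": 8192, "fine_tune_max": 16384},
    "mistralai/Mistral-7B-Instruct-v0.1": {"base": 8192, "fine_tune_max": 8192},
    "mistralai/Mixtral-8x7B-Instruct-v0.1": {"base": 32768, "fine_tune_max": 32768},
}


def inference_context_length(model: str, fine_tune_context: int) -> int:
    """Return the context length available when serving the fine-tuned model.

    Per the table above, this is the greater of the base context length and
    the context length used during fine-tuning.
    """
    limits = FINE_TUNING_MODELS[model]
    if fine_tune_context > limits["fine_tune_max"]:
        raise ValueError(
            f"{model} supports fine-tuning context lengths "
            f"up to {limits['fine_tune_max']} tokens"
        )
    return max(limits["base"], fine_tune_context)


# Example: fine-tuning Llama-3-8B-Instruct with a 16384-token context
# yields a 16384-token context at inference time (greater than the 8192 base).
print(inference_context_length("meta-llama/Meta-Llama-3-8B-Instruct", 16384))
```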
Context length considerations

Fine-tuning a model with a context length longer than its base context length may increase training time and reduce model quality.

Deprecation of Llama-2 models

meta-llama/Llama-2-7b-chat-hf, meta-llama/Llama-2-13b-chat-hf, and meta-llama/Llama-2-70b-chat-hf have transitioned to the legacy models list. Moving forward, to fine-tune a similar or better model, use meta-llama/Meta-Llama-3-8B-Instruct or meta-llama/Meta-Llama-3-70B-Instruct.

Serving support for fine-tuned Llama-2 models expires on June 15, 2024.