Version: 1.0.0

Fine-tuning guide


Fine-tuning involves taking a pre-trained model and further refining it with a domain-specific dataset. Whether to fine-tune involves subtle trade-offs. It's useful when techniques such as prompt engineering, document retrieval, and tool use don't achieve a particular style, format, or tone. For niche tasks, fine-tuning smaller open source models shows promising cost savings without sacrificing quality.

Follow this guide for running fine-tuning on your data.


This guide outlines how to initiate fine-tuning with Anyscale Private Endpoints. For foundational fine-tuning information, consult the OpenAI documentation, as the formats and core steps remain the same.

Step 0: Prerequisites

Set up your account

Sign up for Anyscale Private Endpoints to receive an invite code. Then, create an account or sign in through the Anyscale Console.

Anyscale authentication

Install the Anyscale command-line tool and authenticate your account.

Satisfy cloud prerequisites

Deploy your Anyscale Cloud. Then, ensure that your cloud has sufficient quota to fine-tune your LLMs.

Step 1: Select your cloud and model

In the New fine-tuning job page, choose the following:

  • The cloud to run the fine-tuning job on.
  • The model to use as a base.

Step 2: Prepare training data

To guide the base model, prepare a diverse set of conversations in a training dataset for fine-tuning and an optional validation dataset to provide a check against over-fitting. Both files should be in .jsonl (JSON Lines), which is the same format OpenAI's chat completion API uses.

The .jsonl format

Each line is a JSON object with a messages array that represents one conversation. Each conversation uses three roles:

  • system - Description of the behavior or personality of the model. As a best practice, this should be the same for all examples in the fine-tuning dataset, and should remain the system prompt when the model moves to production.
  • user - User message, or "prompt," that provides a request for the model to respond to.
  • assistant - Stores previous responses but can also contain examples of intended responses for the LLM to return.
{"messages": [{"role": "system", "content": "Answer in fewer than 25 words."}, {"role": "user", "content": "What's the difference between yogurt and labneh?"}, {"role": "assistant", "content": "Labneh is strained yogurt, thicker in texture with a tangier taste, often used as a dip or spread."}]}
{"messages": [{"role": "system", "content": "Answer in fewer than 25 words."}, {"role": "user", "content": "What is a floating point operation?"}, {"role": "assistant", "content": "A floating point operation is a calculation involving real numbers with decimal points, allowing for a wide range of values with fractional precision."}]}

Conversations can consist of a single query and reply or contain multiple "query and reply" exchanges.
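For instance, a multi-turn conversation still occupies a single .jsonl line, with user and assistant messages alternating after the system message. The following sketch illustrates this and checks that a line parses and uses only the three expected roles; the `validate_jsonl_line` helper is illustrative, not part of the Anyscale tooling:

```python
import json

# One multi-turn conversation still occupies a single .jsonl line.
line = json.dumps({"messages": [
    {"role": "system", "content": "Answer in fewer than 25 words."},
    {"role": "user", "content": "What's labneh?"},
    {"role": "assistant", "content": "Strained yogurt with a thicker texture."},
    {"role": "user", "content": "Is it savory?"},
    {"role": "assistant", "content": "Usually, yes; it's often served with olive oil and herbs."},
]})

def validate_jsonl_line(raw: str) -> bool:
    """Check that a line parses and every message uses an expected role."""
    allowed = {"system", "user", "assistant"}
    record = json.loads(raw)
    return all(m["role"] in allowed for m in record["messages"])

print(validate_jsonl_line(line))  # True
```

Running a check like this over every line before submitting a job catches malformed records early, when they're cheapest to fix.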

Data preparation example


You can download the following prepared training and validation datasets to try out fine-tuning, or continue with the complete walkthrough below.

curl -o train.jsonl

curl -o valid.jsonl

This section is an example of converting raw data into the expected format for fine-tuning. To begin, download the text-to-SQL SPIDER dataset, and unzip the contents.

Each entry in this dataset looks like the following:

{
"db_id": "department_management",
"question": "How many heads of the departments are older than 56 ?",
"query": "SELECT count(*) FROM head WHERE age > 56",
}, ...

To convert this into the expected input format, run the following code to generate rows of examples to fine-tune the base model:

import json

train_spider = json.load(open('spider/train_spider.json', 'r'))
dev_spider = json.load(open('spider/dev.json', 'r'))

# Define the prompts. 
system_prompt = "You are a helpful assistant that helps people convert natural language to SQL. You are given the database format using a CREATE command, and your goal is to convert this to a SQL query that is a SELECT command."
user_prompt_template = "The database is {db_id}. Convert the following to a SQL command: {question}"
assistant_prompt_template = "{query}"

# Take a row from SPIDER and construct an example from it. 
# Uses kwargs to fill in the template from the row directly.
def convert_to_msg(row):
    return { 'messages': [
                {'role': 'system', 
                 'content': system_prompt},
                {'role': 'user', 
                 'content': user_prompt_template.format(**row)},
                {'role': 'assistant',
                 'content': assistant_prompt_template.format(**row)}]}

# Save this to a file. Write one JSON object per line,
# followed by a newline, as the .jsonl format requires.
with open('train.jsonl', 'w') as f:
    for row in train_spider:
        f.write(json.dumps(convert_to_msg(row)) + '\n')

# Repeat for the development data. Use the dev data
# for validation. Validation determines whether the fine-tuning
# has converged and which model performs best.
with open('valid.jsonl', 'w') as f:
    for row in dev_spider:
        f.write(json.dumps(convert_to_msg(row)) + '\n')
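As a quick sanity check, each output file should load back cleanly line by line. A small self-contained sketch of that round trip, using inline sample rows in place of the SPIDER files:

```python
import json

# Hypothetical sample rows standing in for SPIDER entries.
rows = [
    {"db_id": "department_management",
     "question": "How many heads of the departments are older than 56 ?",
     "query": "SELECT count(*) FROM head WHERE age > 56"},
    {"db_id": "farm",
     "question": "How many farms are there?",
     "query": "SELECT count(*) FROM farm"},
]

# Write one JSON object per line, each terminated by a newline.
with open('sample.jsonl', 'w') as f:
    for row in rows:
        f.write(json.dumps(row) + '\n')

# Read it back: every line must parse as a standalone JSON object.
with open('sample.jsonl') as f:
    loaded = [json.loads(line) for line in f]

print(len(loaded))  # 2
```

If any line fails to parse here, the fine-tuning job would reject the file, so this check is worth running before upload.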

Step 3: Run the fine-tuning job

After preparing your training dataset and optional validation dataset, initiate the fine-tuning job and upload files to your cloud with the following Anyscale command-line command:

anyscale fine-tuning submit BASE_MODEL_NAME --train-file TRAIN_FILE_JSONL --valid-file VALIDATION_FILE_JSONL --cloud-id CLOUD_ID

Remember to replace these placeholder variables with your values:

  • BASE_MODEL_NAME - The API name of the base pre-trained model to fine-tune. See here for a list of supported base models.
  • TRAIN_FILE_JSONL - The filename for the training data in a .jsonl format.
  • VALIDATION_FILE_JSONL - The filename for the validation data in a .jsonl format.
  • CLOUD_ID - The ID of the Anyscale cloud you're using.

Here are the optional arguments:

  • instance-type - The instance type to run the fine-tuning job on. The available options for AWS are: g5.4xlarge, g5.12xlarge, p4d.24xlarge, and p4de.24xlarge. Support for custom instance types on GCP is planned.
  • version - The version of the fine-tuning library to use. Defaults to the latest version.

Datasets, models, and other files save to the default storage bucket of your Anyscale cloud. Navigate to your cloud service provider console to manage them.

Step 4: Try the fine-tuned model

Once the fine-tuning job completes, Anyscale sends an email containing the results of the run. To send queries, deploy the fine-tuned model as you would a pre-trained model. Retrieve the FINE_TUNED_MODEL_NAME from the Fine-tuning tab or from the list shown when creating a New endpoint.

Note: Endpoints deployed using version 0.4.1 or greater can serve any fine-tuned model.

Remember to set up your Anyscale API base and key.

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ['OPENAI_API_KEY'],
    base_url=os.environ['OPENAI_BASE_URL'],
)

response = client.chat.completions.create(
    model="FINE_TUNED_MODEL_NAME",  # Replace with your fine-tuned model's name.
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
)
print(response.choices[0].message.content)