Fine tuning: end-to-end example

Check your docs version

These docs are for the new Anyscale design. If you started using Anyscale before April 2024, use Version 1.0.0 of the docs. If you're transitioning to Anyscale Preview, see the guide for how to migrate.

Changes to Anyscale Endpoints API

Effective August 1, 2024, the Anyscale Endpoints API will be available exclusively through the fully hosted Anyscale Platform. Multi-tenant access to LLM models will be removed.

With the Hosted Anyscale Platform, you can access the latest GPUs billed by the second, and deploy models on your own dedicated instances. Enjoy full customization to build your end-to-end applications with Anyscale. Get started today.

Goals

By the end of this example you should be able to:

  • Convert your domain-specific data into the appropriate JSON format.
  • Fine tune Llama 2 7B, 13B, or 70B on Anyscale Endpoints.
  • Query a fine tuned model using the standard Anyscale Endpoints APIs.

Prerequisites

Before starting this example, you should have:

  • An Anyscale Endpoints API key
  • A set of examples, in any format, on which you want to improve the model's performance

Previous experience with Anyscale Endpoints is helpful.

This guide uses the OpenAI SDK in Python. If you've fine tuned models on OpenAI before, the following steps should look familiar.

Anyscale Endpoints fine-tuning design principles

One design principle of Anyscale Endpoints is to be as compatible as possible with the OpenAI approach, to make it easier for users transitioning from OpenAI. This approach includes:

  • API calls that are similar. You can use the OpenAI Python SDK and only change a few environment variables.
  • The file format for fine tuning is the same.

Outline of stages

The general process for fine tuning is as follows:

  1. Collect a set of system prompt, user prompt, and assistant responses that represent your desired patterns.
  2. Convert these responses into training data in the OpenAI prompt format. Optionally, also convert validation data.
  3. Upload the training data to Anyscale Endpoints. Uploading the validation data is also an option.
  4. Start a fine tuning job that refers to the base model and the training data.
  5. Receive notification of a completed fine tuning job.
  6. Use the fine tuned model.

0. Download training and validation files

Ready-to-use training and validation files are available. Download them directly and skip steps 1 and 2.

curl -o train.jsonl https://gist.githubusercontent.com/robertnishihara/ac613ff0404487fbf115cf9c0224080e/raw/80a1af0b1c2940a03830c477f7f722323b77472e/train.jsonl

curl -o valid.jsonl https://gist.githubusercontent.com/robertnishihara/ac613ff0404487fbf115cf9c0224080e/raw/80a1af0b1c2940a03830c477f7f722323b77472e/valid.jsonl

1. Collecting the desired patterns

This example fine tunes on the SPIDER dataset, which maps natural language to SQL. You can download SPIDER from https://yale-lily.github.io/spider; see "Spider Dataset" under "Getting Started" on that page.

Examine the input data from SPIDER:

    {
      "db_id": "department_management",
      "question": "How many heads of the departments are older than 56 ?",
      "query": "SELECT count(*) FROM head WHERE age > 56",
      ...
    }, ...

You want to convert this into the form used for fine tuning. Each training example is a triple of messages that looks like the following:

    [{"role": "system",
      "content": "You are a helpful assistant that helps people convert natural language to SQL. You are given the database format using a CREATE command, and your goal is to convert this to a SQL query that is a SELECT command."
     },
     {"role": "user",
      "content": "The database is department_management. Convert the following to a SQL command: How many heads of the departments are older than 56?"
     },
     {"role": "assistant",
      "content": "SELECT count(*) FROM head WHERE age > 56"
    }]

The first message is the system prompt. This prompt defines the overall behavior of the model. For fine tuning, it should generally be the same for every example, and you should use the same prompt when querying the model in production.

The second message is the user prompt. This prompt is the request, and it varies with every example you give to the fine tuner.

The third message is the assistant response. This response is what you want the LLM to return.

Next, get it into the right format.

Unzip the SPIDER into the spider/ directory and get started.

2. Convert your inputs into training data format

The following simple code converts the previous input into the format that Anyscale Endpoints expects for fine tuning. Each row of the output is one example that Anyscale Endpoints uses to fine tune.


import json

train_spider = json.load(open('spider/train_spider.json', 'r'))
dev_spider = json.load(open('spider/dev.json', 'r'))

# Define the prompts. 
system_prompt = "You are a helpful assistant that helps people convert natural language to SQL. You are given the database format using a CREATE command, and your goal is to convert this to a SQL query that is a SELECT command."
user_prompt_template = "The database is {db_id}. Convert the following to a SQL command: {question}"
assistant_prompt_template = "{query}"

# Takes a row from SPIDER and constructs an example from it.
# Passes the row as keyword arguments (**row) to fill in the templates directly.
def convert_to_msg(row):
    return { 'messages': [
                {'role': 'system', 
                 'content': system_prompt},
                {'role': 'user', 
                 'content': user_prompt_template.format(**row)},
                {'role': 'assistant',
                 'content': assistant_prompt_template.format(**row)}]}

# Save this to a file: 
f = open('train.jsonl', 'w')
for row in train_spider:
    json.dump(convert_to_msg(row), f)
    f.write('\n')
f.close() # Don't forget this or you might get corrupted output files. 

# Now do the same for development data. Use the dev
# data for validation. Validation is used to determine if the fine tuning
# has converged and what the best model produced in fine tuning was. 
 
f = open('valid.jsonl', 'w')
for row in dev_spider: 
    json.dump(convert_to_msg(row), f)
    f.write('\n')
f.close()
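Before uploading, it can help to sanity-check the generated JSONL. The following is a minimal sketch, not an official validator; the per-line schema check (three messages in system/user/assistant order) is an assumption based on the triple format described above:

```python
import json

EXPECTED_ROLES = ["system", "user", "assistant"]

def validate_jsonl(text):
    """Return the number of well-formed examples, raising on the first bad line."""
    count = 0
    for i, line in enumerate(text.splitlines()):
        if not line.strip():
            continue
        example = json.loads(line)  # raises if the line isn't valid JSON
        roles = [m["role"] for m in example["messages"]]
        if roles != EXPECTED_ROLES:
            raise ValueError(f"line {i}: expected roles {EXPECTED_ROLES}, got {roles}")
        count += 1
    return count

# Usage, after step 2 has written the files:
# with open("train.jsonl") as f:
#     print(validate_jsonl(f.read()), "training examples look well formed")
```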

3. Upload the training and validation data

Next, upload the data. First, install the OpenAI SDK, then set some environment variables.

    % pip install --upgrade openai
    % export ANYSCALE_API_KEY='your_secret'
    % export ANYSCALE_BASE_URL='https://api.endpoints.anyscale.com/v1'

Reusing the OpenAI SDK may seem unusual, but the point is to make transitions easy. If you already have OpenAI fine tuning working, changing the base URL, API key, and model name is all you have to do.

Next upload:

import os
import openai

client = openai.OpenAI(
    base_url="https://api.endpoints.anyscale.com/v1",
    api_key=os.environ["ANYSCALE_API_KEY"],
)

training_file_id = client.files.create(
    file=open('train.jsonl', 'rb'),
    purpose="fine-tune",
).id

valid_file_id = client.files.create(
    file=open('valid.jsonl', 'rb'),
    purpose="fine-tune",
).id

4. Start the fine tuning

After you’ve uploaded the data, you can start the fine tuning using a single call. You need to specify the “base model” that you plan to fine tune.

    model = "meta-llama/Llama-2-7b-chat-hf"
    finetuning_job_id = client.fine_tuning.jobs.create(
        training_file=training_file_id,
        validation_file=valid_file_id,
        model=model,
    ).id
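Rather than checking by hand, you can poll until the job reaches a terminal state. A sketch follows; the terminal status strings ("succeeded", "failed", "cancelled") are an assumption based on the "running" value shown in the output below, so adjust them to whatever your jobs actually report:

```python
import time

TERMINAL_STATES = {"succeeded", "failed", "cancelled"}  # assumed terminal values

def is_terminal(status):
    """True once a fine-tuning job has stopped, whatever the outcome."""
    return status in TERMINAL_STATES

# Polling sketch; requires the `client` and `finetuning_job_id` from above:
# while True:
#     job = client.fine_tuning.jobs.retrieve(finetuning_job_id)
#     if is_terminal(job.status):
#         print("final status:", job.status)
#         break
#     time.sleep(60)  # jobs can take many minutes; poll sparingly
```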

You’ve kicked off fine tuning.

At any time you can see how the job is progressing:

    client.fine_tuning.jobs.retrieve(finetuning_job_id)

The output looks like the following:

{
  "result_files": [],
  "trained_tokens": null,
  "hyperparameters": {
    "n_epochs": null,
    "context_length": null
  },
  "training_file": "file_2r26ae1rhg3ubsj2ayjvq4b5mg",
  "validation_file": "file_2dyejttkie1hzqj8fsc1diaby8",
  "model": "meta-llama/Llama-2-7b-chat-hf",
  "id": "eftjob_umknzu786e7sm2vm4kux8j17ji",
  "created_at": "2023-09-15T00:58:54.350899+00:00",
  "finished_at": null,
  "fine_tuned_model": "meta-llama/Llama-2-7b-chat-hf:m:ppRIkyW",
  "status": "running",
  "error": null,
  "creator_id": "euser_lnxi3pyjnq7texigw4zl7ntd5a"
}

Once the job kicks off, the job status retrieval results include a file ID in result_files, which you can use to query and monitor the training progress of the fine-tuning job. You can also see the hyperparameters, such as n_epochs and context_length.

    status = client.fine_tuning.jobs.retrieve(finetuning_job_id)
# The result now includes a file ID in result_files:
{
  "result_files": ["file_cr39xfzm5hmpq5izw33829s7jk"],
  "trained_tokens": null,
  "hyperparameters": {
    "n_epochs": null,
    "context_length": null
  },
  "training_file": "file_2r26ae1rhg3ubsj2ayjvq4b5mg",
  "validation_file": "file_2dyejttkie1hzqj8fsc1diaby8",
  "model": "meta-llama/Llama-2-7b-chat-hf",
  "id": "eftjob_umknzu786e7sm2vm4kux8j17ji",
  "created_at": "2023-09-15T00:58:54.350899+00:00",
  "finished_at": null,
  "fine_tuned_model": "meta-llama/Llama-2-7b-chat-hf:m:ppRIkyW",
  "status": "running",
  "error": null,
  "creator_id": "euser_lnxi3pyjnq7texigw4zl7ntd5a"
}

# Use the OpenAI file API to download the content and look at the results.
file_id = status.result_files[0]
content = client.files.content(file_id).read()
jsonl_lines = content.decode("utf-8").split("\n")
results = [json.loads(line) for line in jsonl_lines if line != ""]

# Example output.
[{'epoch': 0,
  'hyperparameters': {'context_length': 512, 'n_epochs': 1},
  'iteration': 1,
  'perplexity': None,
  'time_since_job_start': 184.51690673828125,
  'train_loss': 3.176703453063965,
  'trained_tokens': 88480,
  'valid_loss': None},...]
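The result rows can then be reduced to, for example, the checkpoint with the lowest loss. The helper below is a sketch: preferring valid_loss when present and falling back to train_loss is an assumption about how you want to rank iterations, not part of the API.

```python
def best_result(results):
    """Return the result row with the lowest validation (or training) loss."""
    def loss(row):
        v = row.get("valid_loss")
        return v if v is not None else row["train_loss"]
    return min(results, key=loss)

# Usage with the `results` list parsed above:
# print(best_result(results))
```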

Next, wait for an email.

5. Receiving notification via email

The following is an example of the email you should receive.

Your fine-tuning job eftjob_lldktwx7plwc2ihb9uak2zkgbe has succeeded. The model name is meta-llama/Llama-2-7b-chat-hf:m:G135zDF.

Full job details:
{
  "result_files": [
    "file_6ifp326lrv2uf3ham2b27mr8p4"
  ],
  "trained_tokens": null,
  "hyperparameters": {
    "n_epochs": null,
    "context_length": null
  },
  "training_file": "file_jgr8hv15j3vsnx5gl8atk7bmst",
  "validation_file": "file_ewb6n8pj7fj4tbdnfrxrdc8qiw",
  "model": "meta-llama/Llama-2-7b-chat-hf",
  "id": "eftjob_lldktwx7plwc2ihb9uak2zkgbe",
  "created_at": "2023-09-15 03:52:41.780129+00:00",
  "finished_at": "2023-09-15 04:06:04.569118+00:00",
  "fine_tuned_model": "meta-llama/Llama-2-7b-chat-hf:m:G135zDF",
  "status": "running",
  "error": null,
  "creator_id": "euser_lnxi3pyjnq7texigw4zl7ntd5a"
}

6. Using the fine tuned model

First, try the base model to see how it does.

Pick the third example from the validation file.

test_msg = convert_to_msg(dev_spider[2])

# Output:
{'messages': [{'role': 'system',
   'content': 'You are a helpful assistant that helps people convert natural language to SQL. You are given the database format using a CREATE command, and your goal is to convert this to a SQL query that is a SELECT command.'},
  {'role': 'user',
   'content': 'The database is concert_singer. Convert the following to a SQL command: Show name, country, age for all singers ordered by age from the oldest to the youngest.'},
  {'role': 'assistant',
   'content': 'SELECT name ,  country ,  age FROM singer ORDER BY age DESC'}]}

In this case you don’t want to include the third message because it's the expected output. Trim it and feed the rest to the base model.

model = "meta-llama/Llama-2-7b-chat-hf"
# The client created earlier already carries the base URL and API key.
print(client.chat.completions.create(
    model=model,
    messages=test_msg['messages'][:-1],  # drop the assistant message
    temperature=0,
).choices[0].message.content)

# Output:
Sure, I'd be happy to help! Based on the information provided in the `CREATE` command, I can infer that the database is called `concert_singer` and it has the following tables:
* `singer` (id, name, country, age)
To convert the natural language query into a SQL query, we can use the following steps:
1. Identify the entities mentioned in the query: `name`, `country`, `age`
2. Determine the relationship between the entities: `name` and `age` are related, as `age` is a measure of the `age` of the `name`.
3. Use the `SELECT` clause to specify the columns we want to retrieve: `name`, `country`, `age`
4. Use the `ORDER BY` clause to sort the results by `age`: `ordered by age from the oldest to the youngest`
5. Use the `FROM` clause to specify the table(s) we want to retrieve data from: `from the singer table`
Here is the SQL query that corresponds to the natural language query:

SELECT name, country, age
FROM singer
ORDER BY age
FROM OLDSTO THE OLDEST TO THE YOUNGEST;

I hope this helps! Let me know if you have any questions or if you need further assistance.

Note the verbosity, and that the response presents `FROM OLDSTO THE OLDEST TO THE YOUNGEST` as if it were valid SQL.

Next, try the fine-tuned version. To use it, change the model value to the name you received in the email, like the following:

model = "meta-llama/Llama-2-7b-chat-hf:m:G135zDF"
print(client.chat.completions.create(
    model=model,
    messages=test_msg['messages'][:-1],  # drop the assistant message
    temperature=0,
).choices[0].message.content)

# Output:
SELECT T1.name , T1.country , T1.age FROM singer AS T1 ORDER BY T1.age DESC

Notice the improvement. This response is concise, to the point, and very similar to the desired output; the only difference is the alias T1 mapping to the name of the actual table. Fine tuning has also reduced cost: the first output was 398 tokens total and wrong, while this output is 131 tokens total, approximately one third of the cost, and correct.
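The cost comparison works out as follows. The token counts come from the two responses above; per-token pricing varies by model and plan, so this sketch only computes the relative cost:

```python
def cost_ratio(baseline_tokens, finetuned_tokens):
    """Fraction of the baseline token count used by the fine-tuned response."""
    return finetuned_tokens / baseline_tokens

# 131 fine-tuned tokens vs. 398 baseline tokens:
ratio = cost_ratio(398, 131)
print(f"{ratio:.2f}")  # roughly one third of the baseline cost
```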