Fine-tuning Llama-2 models with DeepSpeed, Accelerate, and Ray Train

💻 Try it out

Run this example in the Anyscale Console or view it on GitHub.

This template shows you how to fine-tune Llama-2 models with DeepSpeed, Accelerate, and Ray Train.

Step 1: Fine-tune the Llama-2-7B model

The --as-test flag is for testing purposes: it runs only one forward and backward pass and checkpoints the model. A test run takes ~7 minutes. Without this flag, a full run takes ~42 minutes (3 epochs for optimal model quality).

Model checkpoints are stored under {user's first name}/ft_llms_with_deepspeed/ in the cloud storage bucket created for your Anyscale account. The full path is printed in the output after training completes.

python train.py --size=7b --as-test
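
Under the hood, train.py wires Ray Train, Accelerate, and DeepSpeed together. The sketch below only illustrates that pattern and is not the template's actual code; the model name, hyperparameters, and worker count are placeholder assumptions.

# Hypothetical sketch of the Ray Train + Accelerate + DeepSpeed pattern.
# This is NOT the template's train.py; names and values are illustrative.
import torch
from accelerate import Accelerator
from accelerate.utils import DeepSpeedPlugin
from ray.train import ScalingConfig
from ray.train.torch import TorchTrainer
from transformers import AutoModelForCausalLM, AutoTokenizer


def train_loop_per_worker(config):
    # Each Ray Train worker runs this function; Accelerate hands the heavy
    # lifting (ZeRO sharding, optimizer state partitioning) to DeepSpeed.
    accelerator = Accelerator(
        deepspeed_plugin=DeepSpeedPlugin(zero_stage=3, gradient_accumulation_steps=1)
    )
    tokenizer = AutoTokenizer.from_pretrained(config["model_name"])
    model = AutoModelForCausalLM.from_pretrained(config["model_name"])
    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
    model, optimizer = accelerator.prepare(model, optimizer)

    # Toy single batch; the real template iterates over the tokenized JSONL data.
    batch = tokenizer(["Hello, Llama!"], return_tensors="pt").to(accelerator.device)
    loss = model(**batch, labels=batch["input_ids"]).loss
    accelerator.backward(loss)
    optimizer.step()


trainer = TorchTrainer(
    train_loop_per_worker,
    train_loop_config={"model_name": "meta-llama/Llama-2-7b-hf"},
    scaling_config=ScalingConfig(num_workers=4, use_gpu=True),  # one worker per GPU
)
trainer.fit()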

(Optional) Step 2: Switch to a different model size

  • Change the model size (7B, 13B, or 70B) with the --size option in the previous command.
  • Change the worker node type accordingly to fit the model (edit the worker nodes in your workspace configuration):
    • use g5.12xlarge for 13B
    • use g5.48xlarge for 70B
  • Run the command below to kick off fine-tuning with the new model size and worker nodes.
    • 13B: ~9 minutes for a test run, ~60 minutes for a full run (3 epochs)
    • 70B: ~35 minutes for a test run, ~400 minutes for a full run (3 epochs)
python train.py --size=13b --as-test
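
For the 70B model, the command is the same with the size option changed:

python train.py --size=70b --as-test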

Step 3: Use your custom data

  • Replace the contents of ./data/train.jsonl with your own training data.
  • (Optional) Replace the contents of ./data/test.jsonl with your own test data, if any.
  • (Optional) Add special tokens in ./data/tokens.json, if any.

Use the same command to train with your own data.
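
If you generate the JSONL files programmatically, a minimal sketch looks like the following. The "text" field and prompt layout here are assumptions; mirror whatever schema the template's existing ./data/train.jsonl uses.

# Hypothetical helper for writing ./data/train.jsonl.
# The "text" field and prompt format are assumptions, not the template's schema.
import json

examples = [
    {"text": "### Instruction:\nSummarize Ray Train.\n\n### Response:\nRay Train scales model training across a cluster."},
    {"text": "### Instruction:\nWhat is DeepSpeed ZeRO?\n\n### Response:\nA set of memory optimizations for large-model training."},
]

with open("./data/train.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")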


What's next?

You have fine-tuned your own Llama-2 models. Want to go further? See the advanced tutorials below:

  • Walkthrough of this template: navigate to tutorials/walkthrough.md
  • Fine-tune Llama-2 with LoRA adapters: navigate to tutorials/lora.md
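
As a preview of the LoRA tutorial, a typical adapter setup with the Hugging Face peft library looks roughly like this; the rank, scaling factor, and target modules are illustrative assumptions, not the tutorial's exact settings.

# Illustrative LoRA setup with peft; values are examples only.
# See tutorials/lora.md for the template's actual configuration.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
lora_config = LoraConfig(
    r=8,                                  # adapter rank
    lora_alpha=16,                        # scaling factor
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA weights are trainable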