Migration to LLaMA-Factory
Motivation
We recommend LLaMA-Factory for fine-tuning on Anyscale.
- Documentation for LLaMA-Factory can be found here: https://github.com/hiyouga/LLaMA-Factory/
- Anyscale documentation can be found here.
The open source community has consolidated around a couple of key projects for fine-tuning. These projects are not only quick to adopt the newest models and features, but have also integrated Ray natively for scaling. We believe LLaMA-Factory and Axolotl provide the best functionality for production fine-tuning use cases.
As such, we're starting to deprecate LLMForge and recommend that users scale these open source libraries with Ray on Anyscale.
Feature matrix comparison
| Feature | LLaMA-Factory | LLMForge |
| --- | --- | --- |
| Model support (Llama, Mistral, Qwen, etc.) | ✅ | ✅ |
| Kernel integrations (FlashAttention-2, Liger Kernel) | ✅ | ✅ |
| Ray support | ✅ | ✅ |
| Quantization support (2/3/4/5/6/8-bit QLoRA) | ✅ | ❌ |
| More algorithms (GaLore, LoRA+, Mixture of Depths, etc.) | ✅ | ❌ |
Migration guide
Here's a migration guide for moving common LLMForge workloads to LLaMA-Factory.
Tasks and data formatting
Pre-training
Before
Previously in LLMForge, pre-training (also known as causal language modeling) required the following config change for a pass-through prompt format, with the training text provided in an OpenAI-compatible chat format:
model_id: meta-llama/Meta-Llama-3-8B-Instruct # Any HF model ID.
task: "causal_lm"
generation_config:
  prompt_format: # Does nothing but concatenation.
    system: "{instruction}"
    user: "{instruction}"
    assistant: "{instruction}"
    system_in_user: False
...
{
  "messages": [
    {"role": "user", "content": "Once upon a time ..."}
  ]
}
After
In LLaMA-Factory, specify pre-training data as JSON objects with a "text" field, as shown below:
{
  "text": "Once upon a time ..."
}
Find details for LLaMA-Factory pre-training here.
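LLaMA-Factory also expects each local dataset to be registered in a dataset_info.json file in the dataset directory (the dataset_dir in the training config). The entry below is a minimal sketch, assuming the pre-training data is saved as pretrain_data.jsonl; the dataset and file names are placeholders:
{
  "my_pretrain_dataset": {
    "file_name": "pretrain_data.jsonl",
    "columns": {
      "prompt": "text"
    }
  }
}
You can then reference it from the training config with dataset: my_pretrain_dataset.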
Instruction tuning
Before
{
  "messages": [
    {"role": "user", "content": "Describe a process of making crepes."},
    {"role": "assistant", "content": "Making crepes is an easy process!..."}
  ]
}
After
{
  "instruction": "Describe a process of making crepes.",
  "output": "Making crepes is an easy process!..."
}
Find details for LLaMA-Factory instruction tuning here.
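As with pre-training, register the instruction-tuning dataset in dataset_info.json. A minimal sketch, assuming the data is saved as instruction_data.json (the dataset and file names are placeholders); the column mapping below mirrors LLaMA-Factory's default alpaca format, so it can typically be omitted if you use the default column names:
{
  "my_instruction_dataset": {
    "file_name": "instruction_data.json",
    "columns": {
      "prompt": "instruction",
      "query": "input",
      "response": "output"
    }
  }
}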
Example: tuning LoRA for Llama3
Before
# config.yaml
# Change this to the model you want to fine-tune
model_id: meta-llama/Meta-Llama-3-8B-Instruct
# Change this to the path to your training data
train_path: s3://air-example-data/gsm8k/train.jsonl
# Change this to the path to your validation data. This is optional
valid_path: s3://air-example-data/gsm8k/test.jsonl
# Change this to the context length you want to use. Examples with longer
# context length will be truncated.
context_length: 512
# Change this to total number of GPUs that you want to use
num_devices: 4
# Change this to the number of epochs that you want to train for
num_epochs: 3
# Change this to the batch size that you want to use
train_batch_size_per_device: 2
eval_batch_size_per_device: 4
gradient_accumulation_steps: 2
# Change this to the learning rate that you want to use
learning_rate: 1e-4
# This will pad batches to the longest sequence. Use "max_length" when profiling to profile the worst case.
padding: "longest"
# DeepSpeed configuration; you can provide your own DeepSpeed setup
deepspeed:
  config_path: deepspeed_configs/zero_2.json
# Accelerator type. The value of 0.001 is not important, as long as it is
# between 0 and 1. This ensures that the accelerator type is used per trainer
# worker.
worker_resources:
  anyscale/accelerator_shape:4xA10G: 0.001
# LoRA configuration
lora_config:
  r: 8
  lora_alpha: 16
  lora_dropout: 0.05
  target_modules:
    - q_proj
    - v_proj
    - k_proj
    - o_proj
    - gate_proj
    - up_proj
    - down_proj
    - embed_tokens
    - lm_head
  task_type: "CAUSAL_LM"
  bias: "none"
  modules_to_save: []
llmforge anyscale finetune config.yaml
After
# config.yaml
### model
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
trust_remote_code: true
### method
stage: pt
do_train: true
finetuning_type: lora
lora_rank: 8
lora_target: all
### dataset
dataset: c4_demo
dataset_dir: /mnt/cluster_storage/ # Shared storage so every Ray worker can read the dataset.
cutoff_len: 2048
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16
### output
output_dir: saves/llama3-8b/lora/pretrain
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true
### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000
### ray
ray_run_name: llama3_8b_pretrain_lora
ray_storage_path: /mnt/cluster_storage/
ray_num_workers: 4 # number of GPUs to use
resources_per_worker:
  GPU: 1
  anyscale/accelerator_shape:4xA10G: 0.001 # Use this to specify a specific node shape,
  # accelerator_type:A10G: 0.001 # or use this to simply specify a GPU type.
  # See https://docs.ray.io/en/master/ray-core/accelerator-types.html#accelerator-types for a full list of accelerator types.
USE_RAY=1 llamafactory-cli train config.yaml
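Because dataset_dir points at shared cluster storage, every Ray worker reads the same copy of the data. For the config above, /mnt/cluster_storage/ would need to contain a dataset_info.json entry for c4_demo alongside the data file itself; the layout below is a sketch and the file name is illustrative:
/mnt/cluster_storage/dataset_info.json   # registers c4_demo and maps its "text" column
/mnt/cluster_storage/c4_demo.jsonl       # pre-training data, one {"text": ...} object per line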
Example: Continuous pre-training
Assuming the model's checkpoint is stored at /mnt/cluster_storage/checkpoint/, where the checkpoint folder contains a Hugging Face-compatible checkpoint (adapter_model.safetensors, adapter_config.json, config.json, tokenizer.json, etc.), you can continue pre-training as follows:
Before
Given the preceding LLMForge config, simply add:
...
initial_adapter_model_ckpt_path: /mnt/cluster_storage/checkpoint/
...
After
Given the preceding LLaMA-Factory config, add:
...
resume_from_checkpoint: /mnt/cluster_storage/checkpoint/
...
Conclusion
If there are any feature gaps in LLaMA-Factory that block your migration, contact the Anyscale team.