Migration to LLaMA-Factory
Motivation
We recommend LLaMA-Factory for fine-tuning on Anyscale.
- Documentation for LLaMA-Factory can be found here: https://github.com/hiyouga/LLaMA-Factory/
- Anyscale documentation can be found here.
The open source community has consolidated around a couple of key projects for fine-tuning. These projects are not only quick to adopt the newest models and features, but have also integrated Ray natively for scaling. We believe LLaMA-Factory and Axolotl provide the best functionality for production fine-tuning use cases.
As such, we're starting to deprecate LLMForge and recommend that users scale these open source libraries with Ray on Anyscale.
Feature matrix comparison
| Feature | LLaMA-Factory | LLMForge |
| --- | --- | --- |
| Model support (Llama, Mistral, Qwen, etc.) | ✅ | ✅ |
| Kernel integrations (FlashAttention-2, Liger Kernel) | ✅ | ✅ |
| Ray support | ✅ | ✅ |
| Quantization support (2/3/4/5/6/8-bit QLoRA) | ✅ | ❌ |
| More algorithms (GaLore, LoRA+, Mixture of Depths, etc.) | ✅ | ❌ |
Migration guide
Here's a migration guide for moving common LLMForge workloads to LLaMA-Factory.
Tasks and data formatting
Pre-training
Before
Previously in LLMForge, pre-training (also known as causal language modeling) required the following config change for a pass-through prompt format, with the training text provided in an OpenAI-compatible chat format:
model_id: meta-llama/Meta-Llama-3-8B-Instruct # Any HF model ID.
task: "causal_lm"
generation_config:
  prompt_format: # Does nothing but concatenation.
    system: "{instruction}"
    user: "{instruction}"
    assistant: "{instruction}"
    system_in_user: False
...
{
  "messages": [
    {"role": "user", "content": "Once upon a time ..."}
  ]
}
After
In LLaMA-Factory, specify pre-training data as JSON objects with a "text" field, as shown below:
{
  "text": "Once upon a time ..."
}
Find details for LLaMA-Factory pre-training here.
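LLaMA-Factory also expects each local dataset to be registered in a dataset_info.json file in the dataset directory (the dataset_dir in the training config). The entry below is a minimal sketch, assuming the pre-training data is saved as pretrain_data.jsonl; the dataset and file names are placeholders:
{
  "my_pretrain_dataset": {
    "file_name": "pretrain_data.jsonl",
    "columns": {
      "prompt": "text"
    }
  }
}
You can then reference it from the training config with dataset: my_pretrain_dataset.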
Instruction tuning
Before
{
  "messages": [
    {"role": "user", "content": "Describe a process of making crepes."},
    {"role": "assistant", "content": "Making crepes is an easy process!..."}
  ]
}
After
{
  "instruction": "Describe a process of making crepes.",
  "output": "Making crepes is an easy process!..."
}
Find details for LLaMA-Factory instruction tuning here.
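As with pre-training, register the instruction-tuning dataset in dataset_info.json. A minimal sketch, assuming the data is saved as instruction_data.json (the dataset and file names are placeholders); the column mapping below mirrors LLaMA-Factory's default alpaca format, so it can typically be omitted if you use the default column names:
{
  "my_instruction_dataset": {
    "file_name": "instruction_data.json",
    "columns": {
      "prompt": "instruction",
      "query": "input",
      "response": "output"
    }
  }
}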
Example: tuning LoRA for Llama3
Before
# config.yaml
# Change this to the model you want to fine-tune
model_id: meta-llama/Meta-Llama-3-8B-Instruct
# Change this to the path to your training data
train_path: s3://air-example-data/gsm8k/train.jsonl
# Change this to the path to your validation data. This is optional
valid_path: s3://air-example-data/gsm8k/test.jsonl
# Change this to the context length you want to use. Examples with longer
# context length will be truncated.
context_length: 512
# Change this to total number of GPUs that you want to use
num_devices: 4
# Change this to the number of epochs that you want to train for
num_epochs: 3
# Change this to the batch size that you want to use
train_batch_size_per_device: 2
eval_batch_size_per_device: 4
gradient_accumulation_steps: 2
# Change this to the learning rate that you want to use
learning_rate: 1e-4
# This will pad batches to the longest sequence. Use "max_length" when profiling to profile the worst case.
padding: "longest"
# DeepSpeed configuration; you can provide your own DeepSpeed setup
deepspeed:
  config_path: deepspeed_configs/zero_2.json
# Accelerator type. The value of 0.001 is not important, as long as it is
# between 0 and 1. This ensures that the accelerator type is used per trainer
# worker.
worker_resources:
  anyscale/accelerator_shape:4xA10G: 0.001
# LoRA configuration
lora_config:
  r: 8
  lora_alpha: 16
  lora_dropout: 0.05
  target_modules:
    - q_proj
    - v_proj
    - k_proj
    - o_proj
    - gate_proj
    - up_proj
    - down_proj
    - embed_tokens
    - lm_head
  task_type: "CAUSAL_LM"
  bias: "none"
  modules_to_save: []
llmforge anyscale finetune config.yaml
After
# config.yaml
### model
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
trust_remote_code: true
### method
stage: pt
do_train: true
finetuning_type: lora
lora_rank: 8
lora_target: all
### dataset
dataset: c4_demo
dataset_dir: /mnt/cluster_storage/ # Shared storage so every Ray worker can read the dataset.
cutoff_len: 2048
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16
### output
output_dir: saves/llama3-8b/lora/pretrain
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true
### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000
### ray
ray_run_name: llama3_8b_pretrain_lora
ray_storage_path: /mnt/cluster_storage/
ray_num_workers: 4 # number of GPUs to use
resources_per_worker:
  GPU: 1
  anyscale/accelerator_shape:4xA10G: 0.001 # Use this to specify a specific node shape,
  # accelerator_type:A10G: 0.001 # or use this to simply specify a GPU type.
  # See https://docs.ray.io/en/master/ray-core/accelerator-types.html#accelerator-types for a full list of accelerator types.
USE_RAY=1 llamafactory-cli train config.yaml
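Because dataset_dir points at shared cluster storage, every Ray worker reads the same copy of the data. For the config above, /mnt/cluster_storage/ would need to contain a dataset_info.json entry for c4_demo alongside the data file itself; the layout below is a sketch and the file name is illustrative:
/mnt/cluster_storage/dataset_info.json   # registers c4_demo and maps its "text" column
/mnt/cluster_storage/c4_demo.jsonl       # pre-training data, one {"text": ...} object per line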
Example: Continuous pre-training
Assuming the model's checkpoint is stored at /mnt/cluster_storage/checkpoint/, where the checkpoint folder contains a Hugging Face-compatible checkpoint (adapter_model.safetensors, adapter_config.json, config.json, tokenizer.json, etc.), you can continue pre-training as follows:
Before
Given the preceding LLMForge config, simply add:
...
initial_adapter_model_ckpt_path: /mnt/cluster_storage/checkpoint/
...
After
Given the preceding LLaMA-Factory config, add:
...
resume_from_checkpoint: /mnt/cluster_storage/checkpoint/
...
Conclusion
If there are any feature gaps in LLaMA-Factory that block your migration, contact the Anyscale team.