LLMForge APIs are in Beta.
Fine-tuning Open-weight LLMs with Anyscale
Fine-tuning LLMs is an easy and cost-effective way to tailor their capabilities toward niche applications with high accuracy. While Ray and Ray Train offer generic primitives for building such workloads, at Anyscale we have created a higher-level library called LLMForge that builds on top of Ray and other open-source libraries to provide an easy-to-use interface for fine-tuning and training LLMs.
What is LLMForge?
LLMForge is a library that implements a collection of design patterns that use Ray, Ray Train, and Ray Data in combination with other open-source libraries (for example, DeepSpeed, 🤗 Hugging Face Accelerate, Transformers, etc.) to provide an easy-to-use library for fine-tuning LLMs. In addition to these design patterns, it offers tight integrations with the Anyscale platform, such as the model registry, streamlined deployment, observability, and Anyscale's job submission.
Configurations
LLMForge workloads are specified using YAML configurations (documentation here). The library offers two main modes: `default` and `custom`.
Similar to OpenAI's fine-tuning experience, the `default` mode provides a minimal and efficient setup. It allows you to quickly start a fine-tuning job by setting just a few parameters (`model_id` and `train_path`). All other settings are optional, and Anyscale automatically selects them based on dataset statistics and predefined configurations.
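For example, a complete default-mode config can be as small as the following (this mirrors the default-mode example at the end of this page):

```yaml
model_id: meta-llama/Meta-Llama-3-8B-Instruct  # one of the models supported in default mode
train_path: s3://...                           # path to your training data
```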
The `custom` mode offers more flexibility and control over the fine-tuning process, allowing for advanced optimizations and customizations. You need to provide more configuration to set up this mode (for example, prompt format, hardware, batch size, etc.).
Here's a comparison of the two modes:
| Feature | Default Mode | Custom Mode |
|---|---|---|
| Ideal For | Prototyping what's possible; focusing on the dataset cleaning, fine-tuning, and evaluation pipeline | Optimizing model quality by controlling more parameters; hardware control |
| Command | `llmforge anyscale finetune config.yaml --default` | `llmforge anyscale finetune config.yaml` |
| Model Support | Popular models with their prompt format (for example, `meta-llama/Meta-Llama-3-8B-Instruct`)* | Any Hugging Face model, any prompt format (for example, `meta-llama/Meta-Llama-Guard-2-8B`) |
| Task Support | Instruction tuning for multi-turn chat | Causal language modeling, instruction tuning, classification |
| Data Format | Chat-style datasets with a fixed prompt format per model | Chat-style datasets with a flexible prompt format |
| Hardware | Automatically selected (limited by availability) | User-configurable |
| Fine-tuning type | Only supports LoRA (rank 8, all linear layers) | User-defined LoRA and full-parameter |
*NOTE: Older models may be deprecated over time.
Choose the mode that best fits your project's requirements and the level of customization you need.
Note for default mode:
- Cluster type for all models: 8xA100-80G
- Supported context lengths: 512 up to each model's maximum context length, in powers of 2
Models Supported in Default Mode
Default mode supports a select list of models, with a fixed cluster type of 8xA100-80G. For each model, only context lengths from 512 up to the model's maximum context length are supported, in increments of 2x (that is, 512, 1024, and so on). Here are the supported models and their configurations:
| Model family | model_id | Max. context length |
|---|---|---|
| Llama-3.1 | `meta-llama/Meta-Llama-3.1-8B-Instruct` | 4096 |
| Llama-3.1 | `meta-llama/Meta-Llama-3.1-70B-Instruct` | 4096 |
| Llama-3 | `meta-llama/Meta-Llama-3-8B-Instruct` | 4096 |
| Llama-3 | `meta-llama/Meta-Llama-3-70B-Instruct` | 4096 |
| Mistral | `mistralai/Mistral-Nemo-Instruct-2407` | 4096 |
| Mistral | `mistralai/Mistral-7B-Instruct-v0.3` | 4096 |
| Mixtral | `mistralai/Mixtral-8x7B-Instruct-v0.1` | 4096 |
Summary of Features in Custom Mode
✅ Supports both full-parameter and LoRA fine-tuning
- LoRA with different configurations, ranks, layers, etc. (anything supported by Hugging Face Transformers)
- Full-parameter fine-tuning with multi-node training support
✅ State-of-the-art performance features (see the config sketch after this list):
- Gradient checkpointing
- Mixed-precision training
- Flash Attention v2
- DeepSpeed support (ZeRO sharding)
- Liger Kernel integration
- `torch.compile` support
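As a rough sketch of how some of these features surface in a config: `liger_kernel` and `deepspeed.config_path` appear in the example configs later on this page, while the remaining key names below are illustrative assumptions that may differ across llmforge versions:

```yaml
# Sketch only. liger_kernel and deepspeed appear in the examples below;
# the other keys are assumed names, not confirmed llmforge options.
deepspeed:
  config_path: deepspeed_configs/zero_3_offload_optim+param.json  # ZeRO sharding config
liger_kernel: True
flash_attention_2: True        # assumed key name
gradient_checkpointing: True   # assumed key name
torch_compile: True            # assumed key name
```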
✅ Unified chat data format with flexible prompt format support, enabling fine-tuning for:

Use case: Multi-turn chat, instruction tuning, classification

Data format (JSON):
```json
{
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Howdy!"},
    {"role": "user", "content": "What is the type of this model?"},
    {"role": "assistant", "content": "[[1]]"}
  ]
}
```
Prompt format for `llama-3-instruct`:
system: "<|start_header_id|>system<|end_header_id|>\n\n{instruction}<|eot_id|>"
user: "<|start_header_id|>user<|end_header_id|>\n\n{instruction}<|eot_id|>"
assistant: "<|start_header_id|>assistant<|end_header_id|>\n\n{instruction}<|eot_id|>"
system_in_user: False
Use case: Causal language modeling (aka continued pre-training), custom prompt formats (for example, Llama Guard)
Example continued pre-training data (one JSON object per line):

```json
{"messages": [{"role": "user", "content": "Once upon a time ..."}]}
{"messages": [{"role": "user", "content": "..."}]}
```
Prompt format for doing nothing except concatenation:

```yaml
system: "{instruction}"
user: "{instruction}"
assistant: "{instruction}"
system_in_user: False
```
✅ Flexible task support:
- Causal language modeling: each token is predicted based on all past tokens.
- Instruction tuning: only assistant tokens are predicted based on past tokens.
- Classification: only special tokens in the assistant message are predicted based on past tokens.
- Preference tuning: uses the contrast between chosen and rejected messages to improve the model (a data sketch follows below).
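For illustration, a preference-tuning row typically pairs a chosen response with a rejected one. The sketch below assumes a chosen/rejected schema; check the llmforge docs for the exact format:

```json
{
  "chosen": [
    {"role": "user", "content": "What is 2 + 2?"},
    {"role": "assistant", "content": "4"}
  ],
  "rejected": [
    {"role": "user", "content": "What is 2 + 2?"},
    {"role": "assistant", "content": "5"}
  ]
}
```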
✅ Support for multi-stage continuous fine-tuning
- Fine-tune on one dataset, then continue fine-tuning on another dataset, for iterative improvements.
- Do continued pre-training on one dataset, then chat-style fine-tuning on another dataset.
- Do continued pre-training on one dataset, followed by iterations of supervised fine-tuning and preference tuning on independent datasets.
✅ Support for context length extension
- Extend the context length of the model using methods like RoPE scaling (a hypothetical sketch follows below).
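As a purely hypothetical sketch: RoPE scaling in Hugging Face models is configured with a scaling type and factor, but the key names below, and whether llmforge exposes them at this level, are assumptions:

```yaml
# Hypothetical sketch: key names are assumptions, not confirmed llmforge options.
context_length: 8192  # target (extended) context length
rope_scaling:
  type: linear  # Hugging Face RoPE scaling strategy
  factor: 2.0   # e.g., 4096 -> 8192
```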
✅ Configurability of hyperparameters
- Full control over learning hyperparameters such as learning rate, number of epochs, batch size, etc. (an example follows below).
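For instance (a sketch: `num_epochs` and `learning_rate` also appear in the default-mode example below, while the batch-size key name is an assumption):

```yaml
num_epochs: 3
learning_rate: 1e-4
train_batch_size_per_device: 4  # assumed key name; may differ by llmforge version
```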
✅ Anyscale and third-party integrations
- (Coming soon) Model registry:
  - SDK for accessing fine-tuned models for creating automated pipelines
  - More streamlined deployment flow when you fine-tune on Anyscale
- Monitoring and observability:
  - Take advantage of standard logging frameworks such as Weights & Biases and MLflow
  - Use the Ray dashboard and Anyscale loggers for debugging and monitoring the training process
- Anyscale jobs integration: use Anyscale's job submission API to programmatically submit long-running LLMForge jobs (a sketch follows below)
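As a sketch, a long-running fine-tune could be wrapped in an Anyscale job config and submitted with the Anyscale CLI (for example, `anyscale job submit`; check your CLI version for the exact invocation). The fields below follow the Anyscale jobs schema, and the entrypoint path is taken from the examples that follow:

```yaml
# job.yaml: a minimal sketch of an Anyscale job running an LLMForge fine-tune.
name: llmforge-finetune
entrypoint: llmforge anyscale finetune training_configs/default/meta-llama/Meta-Llama-3-8B-Instruct-simple.yaml --default
```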
Example Configs
Here are some examples for default mode and custom mode. To run these examples, open up the fine-tuning template as a workspace on Anyscale and run the commands in the terminal. The example configs can be found under `./training_configs`. Outside of a workspace, you can also find them here.
Fine-tune llama-3-8b-instruct in default mode (LoRA rank 8), providing just the dataset.
Command:
```bash
llmforge anyscale finetune training_configs/default/meta-llama/Meta-Llama-3-8B-Instruct-simple.yaml --default
```
Config:
```yaml
model_id: meta-llama/Meta-Llama-3-8B-Instruct
train_path: s3://...
```
Fine-tune llama-3-8b-instruct in default mode while also controlling parameters like `learning_rate` and `num_epochs`.
Command:

```bash
llmforge anyscale finetune config.yaml --default
```

Config:

```yaml
model_id: meta-llama/Meta-Llama-3-8B-Instruct
train_path: s3://...
valid_path: s3://...
num_epochs: 3        # illustrative value
learning_rate: 1e-4  # illustrative value
```
Fine-tune llama-3-8b-instruct in custom mode (a model that's also supported in default mode) on 16xA10G GPUs (default mode uses 8xA100-80G) with a context length of 512.
Command:
```bash
llmforge anyscale finetune training_configs/custom/meta-llama--Meta-Llama-3-8B-Instruct/lora/16xA10-512.yaml
```
Config:

Note: the `liger_kernel` flag requires `llmforge` >=0.5.6.

```yaml
model_id: meta-llama/Meta-Llama-3-8B-Instruct
train_path: s3://...
valid_path: s3://...
context_length: 512
num_devices: 16  # 16 x A10G, per the description above
deepspeed:
  config_path: deepspeed_configs/zero_3_offload_optim+param.json
liger_kernel: True
worker_resources:
  accelerator_type:A10G: 0.001
```
Fine-tune gemma-2-27b in custom mode (a model that's not supported in default mode) on 8xA100-80G.
Command:
```bash
llmforge anyscale finetune training_configs/custom/google--gemma-2-27b-it/lora/8xA100-80G-512.yaml
```
Config:

Note: the `liger_kernel` flag requires `llmforge` >=0.5.6.

```yaml
model_id: google/gemma-2-27b-it
train_path: s3://...
valid_path: s3://...
num_devices: 8
worker_resources:
  accelerator_type:A100-80G: 0.001
liger_kernel: True
generation_config:
  prompt_format:
    system: "{instruction} + "
    assistant: "<start_of_turn>model\n{instruction}<end_of_turn>\n"
    trailing_assistant: "<start_of_turn>model\n"
    user: "<start_of_turn>user\n{system}{instruction}<end_of_turn>\n"
    system_in_user: True
    bos: "<bos>"
    default_system_message: ""
  stopping_sequences: ["<end_of_turn>"]
```