LLMForge APIs are in Beta.
Fine-tuning Open-weight LLMs with Anyscale
Fine-tuning LLMs is an easy and cost-effective way to tailor their capabilities toward niche applications with high accuracy. While Ray and Ray Train offer generic primitives for building such workloads, Anyscale created a higher-level library called LLMForge that builds on top of Ray and other open source libraries to provide an easy-to-use interface for fine-tuning and training LLMs.
What is LLMForge?
LLMForge is a library that implements a collection of design patterns that use Ray, Ray Train, and Ray Data in combination with other open source libraries (for example, DeepSpeed, 🤗 Hugging Face Accelerate, Transformers, etc.) to provide an easy-to-use interface for fine-tuning LLMs. In addition to these design patterns, it offers tight integrations with the Anyscale platform, such as the model registry, streamlined deployment, observability, and Anyscale's job submission.
Configurations
Specify LLMForge workloads using YAML configurations. You can launch them as Anyscale jobs or from the command line in a workspace as follows:
llmforge anyscale finetune <LLMFORGE_CONFIG>.yaml
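For example, to launch the same command as an Anyscale job, you can wrap it in a job config and submit it with the Anyscale CLI. The fields below (name, working_dir, entrypoint) are a minimal sketch and may differ slightly across Anyscale CLI versions:
# job.yaml -- minimal sketch of an Anyscale job config (field names assumed)
name: llmforge-finetune
working_dir: .
entrypoint: llmforge anyscale finetune <LLMFORGE_CONFIG>.yaml
# Submit with: anyscale job submit job.yaml (exact syntax depends on your Anyscale CLI version)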
Example configs
To run these examples, open the fine-tuning template as a workspace on Anyscale and run the commands in the terminal. Find the example configs under ./training_configs. Outside of a workspace, you can also find them on GitHub.
Fine-tune Llama-3-8B Instruct on 16xA10s with context length 512.
Command:
llmforge anyscale finetune training_configs/custom/meta-llama--Meta-Llama-3-8B-Instruct/lora/16xA10-512.yaml
Config:
Note: the liger_kernel flag requires llmforge >= 0.5.7
model_id: meta-llama/Meta-Llama-3-8B-Instruct
train_path: s3://...
valid_path: s3://...
context_length: 512
deepspeed:
  config_path: deepspeed_configs/zero_3_offload_optim+param.json
liger_kernel:
  enabled: True
worker_resources:
  accelerator_type:A10G: 0.001
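The deepspeed.config_path entry points at a standard DeepSpeed JSON file. As a rough sketch (not the exact file shipped with the template), a ZeRO stage-3 configuration with optimizer and parameter offloading typically looks like this, with "auto" placeholders filled in by the training integration:
{
  "zero_optimization": {
    "stage": 3,
    "offload_optimizer": {"device": "cpu", "pin_memory": true},
    "offload_param": {"device": "cpu", "pin_memory": true}
  },
  "bf16": {"enabled": "auto"},
  "gradient_accumulation_steps": "auto",
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto"
}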
Fine-tune Gemma-2-27b on 8xA100-80G.
Command:
llmforge anyscale finetune training_configs/custom/google--gemma-2-27b-it/lora/8xA100-80G-512.yaml
Config:
Note: the liger_kernel flag requires llmforge >= 0.5.7
model_id: google/gemma-2-27b-it
train_path: s3://...
valid_path: s3://...
num_devices: 8
worker_resources:
  accelerator_type:A100-80G: 0.001
liger_kernel:
  enabled: True
generation_config:
  prompt_format:
    system: "{instruction} + "
    assistant: "<start_of_turn>model\n{instruction}<end_of_turn>\n"
    trailing_assistant: "<start_of_turn>model\n"
    user: "<start_of_turn>user\n{system}{instruction}<end_of_turn>\n"
    system_in_user: True
    bos: "<bos>"
    default_system_message: ""
  stopping_sequences: ["<end_of_turn>"]
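To see how this prompt format is applied, consider a made-up single-turn example with the system message "You are a helpful assistant.", the user message "What is 2 + 2?", and the assistant reply "4". Because system_in_user is True, the rendered system template fills the {system} slot inside the user template, so (assuming plain string substitution) the training sequence looks roughly like:
<bos><start_of_turn>user
You are a helpful assistant. + What is 2 + 2?<end_of_turn>
<start_of_turn>model
4<end_of_turn>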
Summary of features in LLMForge
Full-parameter and LoRA fine-tuning
- LoRA with different configurations, ranks, layers, etc., fully configurable through a Hugging Face PEFT or transformers compatible interface (see the sketch after this list).
- Full-parameter fine-tuning with multi-node training support.
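A minimal sketch of what such a LoRA block could look like in the training YAML; the lora_config name and its keys mirror Hugging Face PEFT's LoraConfig, so treat the exact field names as an assumption and confirm them against the LLMForge config reference for your version:
# Sketch: LoRA settings (keys assumed to mirror PEFT's LoraConfig)
lora_config:
  r: 8
  lora_alpha: 16
  lora_dropout: 0.05
  target_modules:
    - q_proj
    - k_proj
    - v_proj
    - o_proj
  task_type: "CAUSAL_LM"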
State-of-the-art performance optimizations
- Gradient checkpointing
- Mixed precision training
- Flash attention v2
- DeepSpeed support (ZeRO-DP sharding)
- Liger Kernel integration
- torch.compile support
Unified chat data format with flexible prompt formatting
Use-case: multi-turn chat, instruction tuning, classification
Data format:
{
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Howdy!"},
    {"role": "user", "content": "What is the type of this model?"},
    {"role": "assistant", "content": "[[1]]"}
  ]
}
Prompt format for llama-3-instruct:
system: "<|start_header_id|>system<|end_header_id|>\n\n{instruction}<|eot_id|>"
user: "<|start_header_id|>user<|end_header_id|>\n\n{instruction}<|eot_id|>"
assistant: "<|start_header_id|>assistant<|end_header_id|>\n\n{instruction}<|eot_id|>"
system_in_user: False
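Putting the two together: with system_in_user: False, each message is rendered with its own template and the pieces are concatenated in order. For the first exchange of the example chat above, the rendered sequence looks roughly like this (any configured bos token and loss masking are omitted):
<|start_header_id|>system<|end_header_id|>

You are a helpful assistant.<|eot_id|><|start_header_id|>user<|end_header_id|>

Hi<|eot_id|><|start_header_id|>assistant<|end_header_id|>

Howdy!<|eot_id|>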
Use-case: causal language modeling (aka continued pre-training), custom prompt formats (for example, Llama Guard)
Example continued pre-training data (JSON):
{
  "messages": [
    {"role": "user", "content": "Once upon a time ..."}
  ]
}
{
  "messages": [
    {"role": "user", "content": "..."}
  ]
}
Prompt format that does nothing except concatenate the message contents:
system: "{instruction}"
user: "{instruction}"
assistant: "{instruction}"
system_in_user: False
Flexible task types
- Causal language modeling: Loss considers predictions for all the tokens.
- Instruction tuning: Considers only "assistant" tokens in the loss.
- Classification: Predicts only a user-defined set of labels based on past tokens.
- Preference tuning: Uses the contrast between chosen and rejected messages to improve the model.
- Vision-language instruction tuning: Predicts assistant tokens based on a mix of past image and text tokens.
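How a run selects one of the task types above comes down to a single config choice. The sketch below is illustrative only; the task key and its literal values are assumptions, so check the accepted values in the LLMForge config reference for your version:
# Sketch only: field name and values are assumptions for illustration
task: "instruction_tuning"   # other assumed values: "causal_lm", "classification", "preference_tuning", "vision_language"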
Multi-stage continuous fine-tuning
- Fine-tune on one dataset, then continue fine-tuning on another dataset, for iterative improvements.
- Do continued pre-training on one dataset, then chat-style fine-tuning on another dataset.
- Do continued pre-training on one dataset followed by iterations of supervised fine-tuning and preference tuning on independent datasets (see the sketch after this list).
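A minimal sketch of such a two-stage flow, run as consecutive LLMForge jobs. The config file names are placeholders, and the way the second config points at the first stage's checkpoint depends on the fields available in your llmforge version:
# Stage 1: continued pre-training on raw text (config name is a placeholder)
llmforge anyscale finetune training_configs/stage_1_continued_pretraining.yaml
# Stage 2: instruction tuning; its YAML should reference the checkpoint
# produced by stage 1 as the starting model (exact field name not shown here)
llmforge anyscale finetune training_configs/stage_2_instruction_tuning.yaml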
Context length extension
- Extend the context length of the model using methods like RoPE scaling.
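As a rough illustration of the idea: training at 16384 tokens on a model pre-trained with an 8192-token window stretches the rotary position embeddings by a factor of 16384 / 8192 = 2. The block below is only a sketch; the rope_scaling key and its sub-fields follow the Hugging Face transformers convention, and whether LLMForge exposes them directly in the YAML is an assumption to verify against the config reference:
# Sketch: context length extension via linear RoPE scaling (key names assumed)
context_length: 16384
rope_scaling:
  type: "linear"
  factor: 2.0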
Configurable hyper-parameters
- Full control over learning hyperparameters such as learning rate, number of epochs, batch size, etc.
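For example, these knobs typically sit at the top level of the training YAML. The field names below are an assumption for illustration; confirm them against the LLMForge config reference for your version:
# Sketch: common training hyperparameters (field names assumed)
num_epochs: 3
learning_rate: 1.0e-4
train_batch_size_per_device: 4
eval_batch_size_per_device: 4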
Anyscale and third-party integrations
- (Coming soon) Model registry:
- SDK for accessing fine-tuned models for creating automated pipelines
- More streamlined deployment flow when you fine-tune on Anyscale
- Monitoring and observability:
- Take advantage of standard logging frameworks such as Weights & Biases and MLflow
- Use the Ray dashboard and Anyscale loggers to debug and monitor the training process
- Anyscale jobs integration: Use Anyscale's job submission API to programmatically submit long-running jobs through LLMForge