Note: LLMForge APIs are in Beta.

Fine-tuning Open-weight LLMs with Anyscale

Fine-tuning LLMs is an easy and cost-effective way to tailor their capabilities to niche applications with high accuracy. While Ray and Ray Train offer generic primitives for building such workloads, at Anyscale we have created a higher-level library called LLMForge that builds on top of Ray and other open-source libraries to provide an easy-to-use interface for fine-tuning and training LLMs.

What is LLMForge?

LLMForge is a library that implements a collection of design patterns that use Ray, Ray Train, and Ray Data in combination with other open-source libraries (for example DeepSpeed, 🤗 Hugging Face Accelerate, Transformers, and so on) to provide an easy-to-use library for fine-tuning LLMs. In addition to these design patterns, it offers tight integrations with the Anyscale platform, such as the model registry, streamlined deployment, observability, and Anyscale's job submission.

Configurations

LLMForge workloads are specified using YAML configurations (documentation here). The library offers two main modes: default and custom.

Similar to OpenAI's fine-tuning experience, the default mode provides a minimal and efficient setup. It allows you to quickly start a fine-tuning job by setting just a few parameters (model_id and train_path). All other settings are optional and Anyscale automatically selects them based on dataset statistics and predefined configurations.

Here's a comparison of the two modes:

| Feature | Default Mode | Custom Mode |
| --- | --- | --- |
| Ideal for | Prototyping what's possible; focusing on the dataset cleaning, fine-tuning, and evaluation pipeline | Optimizing model quality by controlling more parameters; hardware control |
| Command | llmforge anyscale finetune config.yaml --default | llmforge anyscale finetune config.yaml |
| Model support | Popular models with their prompt format (for example meta-llama/Meta-Llama-3-8B-Instruct)* | Any Hugging Face model, any prompt format (for example meta-llama/Meta-Llama-Guard-2-8B) |
| Task support | Instruction tuning for multi-turn chat | Causal language modeling, instruction tuning, classification |
| Data format | Chat-style datasets, with a fixed prompt format per model | Chat-style datasets, with flexible prompt formats |
| Hardware | Automatically selected (limited by availability) | User-configurable |
| Fine-tuning type | LoRA only (rank 8, all linear layers) | User-defined LoRA and full-parameter |

*Note: older models may be deprecated over time.

Choose the mode that best fits your project requirements and level of customization needed.

Note:

  • Cluster type for all models: 8xA100-80G
  • Supported context lengths per model: from 512 up to the model's maximum context length, in powers of 2.
Models Supported in Default Mode

Default mode supports a select list of models, with a fixed cluster type of 8xA100-80G. For each model, we only support context lengths from 512 up to the model's maximum context length, doubling at each step (that is, 512, 1024, ...). Here are the supported models and their configurations:
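As a quick illustration of the doubling rule described above, the valid context lengths for a model can be enumerated like this (a sketch; the function name is illustrative, not part of LLMForge):

```python
def supported_context_lengths(max_context_length: int, start: int = 512) -> list[int]:
    """Enumerate context lengths starting at `start`, doubling each step,
    up to and including the model's maximum context length."""
    lengths = []
    length = start
    while length <= max_context_length:
        lengths.append(length)
        length *= 2
    return lengths

# For the models below, all of which cap out at 4096:
print(supported_context_lengths(4096))  # [512, 1024, 2048, 4096]
```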

| Model family | model_id | Max. context length |
| --- | --- | --- |
| Llama-3.1 | meta-llama/Meta-Llama-3.1-8B-Instruct | 4096 |
| Llama-3.1 | meta-llama/Meta-Llama-3.1-70B-Instruct | 4096 |
| Llama-3 | meta-llama/Meta-Llama-3-8B-Instruct | 4096 |
| Llama-3 | meta-llama/Meta-Llama-3-70B-Instruct | 4096 |
| Mistral | mistralai/Mistral-Nemo-Instruct-2407 | 4096 |
| Mistral | mistralai/Mistral-7B-Instruct-v0.3 | 4096 |
| Mixtral | mistralai/Mixtral-8x7B-Instruct-v0.1 | 4096 |

Summary of Features in Custom Mode

✅ Support for both full-parameter and LoRA fine-tuning

  • LoRA with different configurations, ranks, layers, etc. (Anything supported by Hugging Face transformers)
  • Full-parameter with multi-node training support
  • Gradient checkpointing
  • Mixed precision training
  • Flash attention v2
  • DeepSpeed support (ZeRO data-parallel sharding)

✅ Unified chat data format with flexible prompt format support, enabling fine-tuning for the following use cases:

Use-case: Multi-turn chat, instruction tuning, classification

{
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Howdy!"},
    {"role": "user", "content": "What is the type of this model?"},
    {"role": "assistant", "content": "[[1]]"}
  ]
}
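Datasets in this chat format are typically stored as JSON Lines, with one conversation object per line. A minimal sketch for producing and sanity-checking such a file (the file name and validation logic are illustrative, not an LLMForge API):

```python
import json

conversation = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Howdy!"},
    ]
}

# Write one JSON object per line (JSON Lines format).
with open("train.jsonl", "w") as f:
    f.write(json.dumps(conversation) + "\n")

# Sanity-check: every message needs a known role and string content.
with open("train.jsonl") as f:
    for line in f:
        record = json.loads(line)
        for msg in record["messages"]:
            assert msg["role"] in {"system", "user", "assistant"}
            assert isinstance(msg["content"], str)
```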
Use-case: Causal language modeling (aka continued pre-training), custom prompt formats (for example Llama-guard)

Example: continued pre-training, one JSON object per line:

{
  "messages": [
    {"role": "user", "content": "Once upon a time ..."}
  ]
}
{
  "messages": [
    {"role": "user", "content": "..."}
  ]
}

A prompt format that does nothing except concatenate message contents:

system: "{instruction}"
user: "{instruction}"
assistant: "{instruction}"
system_in_user: False
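To make concrete what this pass-through format does, here is a hypothetical sketch of applying such a template: each message's content is substituted into its role's template, and the rendered pieces are concatenated. (The function is illustrative, not LLMForge's implementation.)

```python
def apply_prompt_format(messages, templates):
    """Render each message with its role's template and concatenate.
    With identity templates ("{instruction}"), this is plain concatenation."""
    return "".join(
        templates[m["role"]].format(instruction=m["content"]) for m in messages
    )

# Identity templates, matching the format above.
identity = {"system": "{instruction}", "user": "{instruction}", "assistant": "{instruction}"}

messages = [
    {"role": "user", "content": "Once upon a time "},
    {"role": "user", "content": "there was a model."},
]
print(apply_prompt_format(messages, identity))
# Once upon a time there was a model.
```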

✅ Flexible task support:

  • Causal language modeling: each token is predicted based on all past tokens.
  • Instruction tuning: only assistant tokens are predicted based on past tokens.
  • Classification: only special tokens in the assistant message are predicted based on past tokens.
  • Preference tuning: use the contrast between chosen and rejected messages to improve the model.
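The difference between these tasks comes down to which token positions contribute to the loss. A simplified sketch of instruction-tuning-style label masking (the -100 ignore index follows the common PyTorch/Hugging Face convention; the token values are toy placeholders, not a real tokenization):

```python
IGNORE_INDEX = -100  # positions with this label are excluded from the loss

def mask_labels(token_ids, is_assistant):
    """For instruction tuning, only assistant tokens are predicted:
    labels for all other positions are set to the ignore index."""
    return [tok if assistant else IGNORE_INDEX
            for tok, assistant in zip(token_ids, is_assistant)]

token_ids    = [11, 12, 13, 14, 15]
is_assistant = [False, False, True, True, False]
print(mask_labels(token_ids, is_assistant))  # [-100, -100, 13, 14, -100]
```

Causal language modeling corresponds to masking nothing, and classification to keeping only the special label tokens in the assistant message.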

✅ Support for multi-stage continuous fine-tuning

  • Fine-tune on one dataset, then continue fine-tuning on another dataset, for iterative improvements.
  • Do continued pre-training on one dataset, then chat-style fine-tuning on another dataset.
  • Do continued pre-training on one dataset followed by iterations of supervised-finetuning and preference tuning on independent datasets.

✅ Support for context length extension

  • Extend the context length of the model using methods like RoPE scaling.
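As a rough illustration of the idea behind linear RoPE scaling (position interpolation) — not LLMForge's actual implementation — positions beyond the original training window are compressed back into it by dividing by a scaling factor:

```python
def rope_scaled_positions(seq_len, original_max_len):
    """Linear RoPE scaling sketch: divide positions by
    seq_len / original_max_len so they stay within the trained range."""
    factor = max(1.0, seq_len / original_max_len)
    return [pos / factor for pos in range(seq_len)]

# Extending a model trained on 4 positions to 8 squeezes positions into [0, 4).
print(rope_scaled_positions(8, 4))  # [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
```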

✅ Configurability of hyper-parameters

  • Full control over learning hyperparameters such as learning rate, number of epochs, batch size, etc.

✅ Anyscale and third-party integrations

  • (Coming soon) Model registry:
    • SDK for accessing fine-tuned models for creating automated pipelines
    • More streamlined deployment flow when you fine-tune on Anyscale
  • Monitoring and observability:
    • Take advantage of standard logging frameworks such as Weights and Biases
    • Use of Ray dashboard and Anyscale loggers for debugging and monitoring the training process
  • Anyscale jobs integration: Use Anyscale's job submission API to programmatically submit long-running jobs through LLMForge

Example Configs

Here are some examples for default mode and custom mode. To run these examples, open up the fine-tuning template as a workspace on Anyscale and run the commands in the terminal. The example configs can be found under ./training_configs. Outside of a workspace, you can also find them here.

Fine-tune llama-3-8b-instruct in default mode (LoRA rank 8), providing only the dataset.

Command:

llmforge anyscale finetune training_configs/default/meta-llama/Meta-Llama-3-8B-Instruct-simple.yaml --default

Config:

model_id: meta-llama/Meta-Llama-3-8B-Instruct
train_path: s3://...
Fine-tune llama-3-8b-instruct in custom mode to control additional parameters, such as context_length, DeepSpeed settings, and worker resources.

Command:

llmforge anyscale finetune training_configs/custom/meta-llama--Meta-Llama-3-8B-Instruct/lora/16xA10-512.yaml

Config:

model_id: meta-llama/Meta-Llama-3-8B-Instruct
train_path: s3://...
valid_path: s3://...
context_length: 512
deepspeed:
  config_path: deepspeed_configs/zero_3_offload_optim+param.json
worker_resources:
  accelerator_type:A10G: 0.001