Fine-tuning LLMs with LLaMA-Factory on Anyscale
⏱️ Time to complete: ~20 mins, which includes the time to train the model
Looking to get the most out of your LLM workloads? Fine-tuning pretrained LLMs can often be the most cost-effective way to improve model performance on the tasks you care about. This template walks through how to use the popular open-source library LLaMA-Factory to fine-tune LLMs on Anyscale. LLaMA-Factory comes with a Ray integration for scaling training to multiple GPUs and nodes.
Getting started with LLaMA-Factory
First, you need to install the LLaMA-Factory code. You can view the latest changes from the LLaMA-Factory GitHub.
git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory
# Install extras separately so Anyscale can track the dependencies on worker nodes.
pip install torch jieba nltk rouge-chinese
pip install -e .
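To confirm the installation, you can print the CLI version information; the version subcommand is available in recent LLaMA-Factory releases.
llamafactory-cli version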
Next, you need to set your HF_TOKEN and WANDB_API_KEY environment variables. It's recommended to set them in the environment variables section of the Dependencies tab in Workspaces. Simply running export HF_TOKEN=<HF_TOKEN_HERE> or huggingface-cli login works for single-node compute configurations, but doesn't propagate the environment variables to worker nodes.
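For reference, the single-node shortcut is sketched below with placeholder values; for multi-node training, set the same variables in the Dependencies tab instead so every worker node receives them.
# Only takes effect where you run it; Ray worker nodes won't inherit these.
export HF_TOKEN=<HF_TOKEN_HERE>
export WANDB_API_KEY=<WANDB_API_KEY_HERE>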
Additionally, it's recommended to use hf_transfer for fast model downloading by running pip install hf_transfer and setting HF_HUB_ENABLE_HF_TRANSFER=1 in the Dependencies tab. Setting HF_HUB_ENABLE_HF_TRANSFER=1 without first installing hf_transfer causes errors. An example of setting the relevant environment variables in the Dependencies tab is shown below.

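Installing hf_transfer itself is a single command from a workspace terminal; the HF_HUB_ENABLE_HF_TRANSFER=1 variable still belongs in the Dependencies tab so that worker nodes pick it up.
pip install hf_transfer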
You can find some example configs that use Ray with LLaMA-Factory for pretraining and SFT (instruction tuning) in the llamafactory_configs directory. Find a full set of examples with various models, tasks, and datasets on the LLaMA-Factory GitHub.
LLaMA-Factory provides a CLI, llamafactory-cli, that allows you to launch training with a config YAML file. For example, you can launch a supervised fine-tuning job with Ray by running the following command:
cd .. # Return to top level directory.
USE_RAY=1 llamafactory-cli train llamafactory_configs/llama3_lora_sft_ray.yaml
This command runs an instruction tuning job with Meta-Llama-3-8B-Instruct on an example subset of the alpaca dataset. By default, training statistics are logged to all installed logging libraries, such as W&B, MLflow, Comet, and TensorBoard, because LLaMA-Factory relies on Hugging Face's Trainer integration for logging. To log with a single library, set report_to: <LIBRARY_NAME> in the config YAML.
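For reference, the Ray-specific settings live in the same config file. The trimmed sketch below shows what the Ray and logging sections of llamafactory_configs/llama3_lora_sft_ray.yaml might look like; the key names follow the upstream LLaMA-Factory Ray example, so check them against the version you cloned.
### ray
ray_run_name: llama3_8b_sft_lora
ray_num_workers: 4  # number of training workers, each with the resources below
resources_per_worker:
  GPU: 1
placement_strategy: PACK

### logging
report_to: wandb  # log only to Weights & Biases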
Preparing your dataset
The above config (llamafactory_configs/llama3_lora_sft_ray.yaml) runs out of the box because it assumes that the dataset and dataset information are located on the Hugging Face Hub, and specifies them as follows:
...
### dataset
dataset: identity,alpaca_en_demo
dataset_dir: REMOTE:llamafactory/demo_data # or use local absolute path
...
However, by default LLaMA-Factory reads dataset information from a local file, dataset_info.json, which you can find in LLaMA-Factory/data/. To specify new datasets that are accessible across Ray worker nodes, you must first add dataset_info.json and any data files to shared storage like /mnt/cluster_storage.
For example, if you wanted to run pretraining on the c4_demo dataset, first go through the following setup steps:
# Create a copy of the data in /mnt/cluster_storage
cp LLaMA-Factory/data/c4_demo.json /mnt/cluster_storage/
# Create `/mnt/cluster_storage/dataset_info.json` containing:
# {
#   "c4_demo": {
#     "file_name": "/mnt/cluster_storage/c4_demo.json",
#     "columns": {
#       "prompt": "text"
#     }
#   }
# }
echo '{"c4_demo":{"file_name":"/mnt/cluster_storage/c4_demo.json","columns":{"prompt":"text"}}}' > /mnt/cluster_storage/dataset_info.json
In llamafactory_configs/llama3_lora_pretrain_ray.yaml, the relevant dataset_dir argument is already specified:
...
### dataset
dataset: c4_demo
dataset_dir: /mnt/cluster_storage # shared storage reachable from all worker nodes
...
To launch training with this local dataset, run:
USE_RAY=1 llamafactory-cli train llamafactory_configs/llama3_lora_pretrain_ray.yaml
Running LLaMA-Factory in an Anyscale job
To run LLaMA-Factory as an Anyscale job, create a custom container image that comes with LLaMA-Factory installed. You can specify the relevant packages in your Dockerfile. For example, you could create an image using the anyscale/ray-ml:2.42.0-py310-gpu base image as follows.
# Start with an Anyscale base image.
FROM anyscale/ray-ml:2.42.0-py310-gpu
WORKDIR /app
RUN git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git && \
    cd LLaMA-Factory && \
    pip install --no-cache-dir torch jieba nltk rouge-chinese && \
    pip install -e .
You can then use this image to run your job with the Anyscale jobs CLI locally or from a workspace. For example, from a workspace configured with the image built above, you could run the following command to launch fine-tuning as a job.
anyscale job submit --wait --env USE_RAY=1 -- llamafactory-cli train llamafactory_configs/llama3_lora_sft_ray.yaml
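If you prefer a declarative setup, you can also describe the job in a config file and submit it with the jobs CLI. The sketch below is illustrative only: the field names (name, image_uri, entrypoint, working_dir, env_vars) follow the Anyscale job config schema at the time of writing, and <YOUR_IMAGE_URI> is a placeholder for the image you built above; check anyscale job submit --help for how to pass a config file in your CLI version.
# job.yaml -- illustrative sketch; replace the placeholders with your own values.
name: llamafactory-lora-sft
image_uri: <YOUR_IMAGE_URI>  # image built from the Dockerfile above
working_dir: .
env_vars:
  USE_RAY: "1"
entrypoint: llamafactory-cli train llamafactory_configs/llama3_lora_sft_ray.yaml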