Fine-Tuning with LoRA vs. Full-Parameter Fine-Tuning
In this guide, we explain the nuances of fine-tuning with LoRA (Low-Rank Adaptation) versus full-parameter fine-tuning.
Comparison
Full-parameter fine-tuning takes the LLM "as is" and trains all of its weights on the given dataset. In principle, this is regular supervised training with the same next-token objective used in the pre-training stage of the LLM. You can expect full-parameter fine-tuning to result in slightly higher model quality.
LoRA (Low-Rank Adaptation) is a fine-tuning technique that freezes all the weights of your LLM and adds a small number of new parameters that get fine-tuned instead. These additional parameters make up a LoRA checkpoint. There are three important things to take away from this (a minimal sketch of the mechanism follows the list):
- Since all the original weights are frozen, they don't have to be optimized: no gradients or optimizer states need to be kept for them, so fine-tuning takes up far fewer resources. In practice, you can fine-tune on a smaller cluster.
- Since the checkpoint consists only of the few additional parameters, it is very small. With the original model loaded into memory, the fine-tuned weights can be swapped out quickly. This makes for an efficient scheme for serving many fine-tuned models alongside each other.
- Optimizing only a few parameters has a regularizing effect: "it learns less and forgets less"
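To make the mechanism concrete, here is a minimal, self-contained sketch of a LoRA layer. It illustrates the idea rather than the exact implementation of any particular library: the original weight matrix stays frozen, and only two small low-rank matrices are trained. The 4096x4096 projection size and rank 8 are just example numbers.

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """A frozen linear layer plus a trainable low-rank update (simplified sketch)."""

    def __init__(self, base: nn.Linear, r: int = 8, lora_alpha: int = 16):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)  # freeze the original weights
        # Only these two small matrices are trained; together they form the LoRA checkpoint.
        # B starts at zero so the layer initially behaves exactly like the frozen base layer.
        self.lora_a = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = lora_alpha / r

    def forward(self, x):
        # Output = frozen base projection + scaled low-rank update.
        return self.base(x) + (x @ self.lora_a.T @ self.lora_b.T) * self.scaling


layer = LoRALinear(nn.Linear(4096, 4096, bias=False), r=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable: {trainable:,} of {total:,} parameters")  # ~65k of ~16.8M for this one layer
```

For this single layer, the trainable LoRA parameters amount to well under 1% of the frozen weights, which is why LoRA checkpoints stay small and optimizer memory stays low.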
You can find a more in-depth analysis of this topic here. The domain also has an effect on LoRA's performance: depending on the domain, it may match full-parameter fine-tuning or fall slightly behind it.
How to configure LoRA vs full-parameter fine-tuning jobs
To configure a run for LoRA, you must specify the lora_config in the YAML file. Here is an example of a LoRA configuration:
lora_config:
  # Determines the rank of the matrices that we fine-tune. A higher rank means more parameters to fine-tune, but increasing the rank gives diminishing returns.
  r: 8
  # Scales the learnt LoRA weights. A value of 16 is common practice; tuning it further is not advised.
  lora_alpha: 16
  # Rate at which LoRA weights are dropped out. Can act as a regularizer.
  lora_dropout: 0.05
  # The modules of the LLM that we want to fine-tune with LoRA.
  target_modules:
    - q_proj
    - v_proj
    - k_proj
    - o_proj
    - gate_proj
    - up_proj
    - down_proj
    - embed_tokens
    - lm_head
  # This should be explicitly set to an empty list.
  modules_to_save: []
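If you want to sanity-check the same settings locally, they map roughly one-to-one onto Hugging Face peft's LoraConfig. This is a sketch under the assumption that you are prototyping with transformers and peft; the YAML above is what the fine-tuning job itself consumes, and the model name below is only a placeholder.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Placeholder base model; substitute whichever checkpoint you are fine-tuning.
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj",
                    "embed_tokens", "lm_head"],
    modules_to_save=[],  # keep explicitly empty, as in the YAML above
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # shows how few parameters are actually trained
```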
This blog post has more information on what the parameters mean. You generally don't need to change any of them; doing so tends to yield negligible benefit.
Fine-tune LoRA with a learning rate of about 1e-4. You can increase it slightly if training is stable enough, but LoRA is rather sensitive to the learning rate. For optimal performance, target all possible layers with LoRA; choosing a higher rank gives only very minor improvements. See this paper for more details.
If lora_config is not specified, the run will be a full-parameter one. We advise using a small learning rate of about 1e-5 here. You can increase it slightly if training is stable enough.
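As a rough local illustration of the learning-rate advice, here is how the two cases might look in a Hugging Face transformers setup; this is an assumption about your prototyping stack rather than the YAML-driven job itself.

```python
from transformers import TrainingArguments

# Full-parameter fine-tuning: all weights are trainable, so use a small learning rate.
full_args = TrainingArguments(output_dir="ft-full", learning_rate=1e-5, num_train_epochs=3)

# LoRA fine-tuning: only the adapter weights are trainable, so a ~10x larger rate works well.
lora_args = TrainingArguments(output_dir="ft-lora", learning_rate=1e-4, num_train_epochs=3)
```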
When to use LoRA vs full-parameter fine-tuning?
There is no general answer to this, but here are some things to consider:
- The quality of the fine-tuned models will, in most cases, be comparable if not the same
- LoRA shines if:
- You want to serve many fine-tuned models at once yourself (see the adapter-swapping sketch after this list)
- You want to rapidly experiment (because fine-tuning, downloading and serving the model take less time)
- Full-parameter shines if:
- You want to make sure that your fine-tuned model has the maximum quality
- You want to serve only one fine-tuned version of the model
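To illustrate the multi-model serving point, here is a rough sketch of swapping LoRA adapters on top of a single base model with Hugging Face peft. The adapter paths and names are hypothetical, and a production serving stack would typically manage this for you.

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the (large) base model once.
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

# Attach a first (tiny) LoRA checkpoint and register a second one.
model = PeftModel.from_pretrained(base, "path/to/customer-a-adapter", adapter_name="customer_a")
model.load_adapter("path/to/customer-b-adapter", adapter_name="customer_b")

# Switching between fine-tuned variants only changes which small adapter is active,
# instead of reloading a multi-gigabyte model.
model.set_adapter("customer_a")
# ... serve requests for customer A ...
model.set_adapter("customer_b")
# ... serve requests for customer B ...
```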