Fine-Tuning with LoRA vs Full-parameter

In this guide, we explain the nuances of fine-tuning with LoRA (Low-Rank Adaptation) versus full-parameter fine-tuning.

Comparison

Full-parameter fine-tuning takes the LLM "as is" and trains all of its weights on the given dataset. In principle, this is the same supervised training procedure used during the pre-training stage of the LLM. You can expect full-parameter fine-tuning to result in slightly higher model quality.

LoRA (Low-Rank Adaptation) is a fine-tuning technique that freezes all the weights of your LLM and instead adds a small number of new parameters that get fine-tuned. These additional parameters make up a LoRA checkpoint. There are three important things to take away from this:

  1. Since all the original weights are frozen, no optimizer state needs to be kept for them, which reduces the resources required during fine-tuning. In practice, you can fine-tune on a smaller cluster (see the sketch after this list).
  2. Since the checkpoint consists only of the few additional parameters, it is very small. With the original model already loaded in memory, fine-tuned LoRA weights can be swapped in and out quickly. This makes for an efficient scheme for serving many fine-tuned models alongside each other.
  3. Optimizing only a few parameters has a regularization effect: LoRA "learns less and forgets less."
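To make the first two points concrete, here is a minimal sketch using the Hugging Face transformers and peft libraries (these are illustrative assumptions, not part of the fine-tuning product itself; the model name is only an example). It wraps a base model with a LoRA adapter and reports how small the trainable fraction is:

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Example base model; substitute whichever model you are fine-tuning.
base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

lora_config = LoraConfig(
    r=8,                # rank of the low-rank update matrices
    lora_alpha=16,      # scaling factor applied to the learned LoRA weights
    lora_dropout=0.05,  # dropout on the LoRA path; acts as a regularizer
    # A subset of the linear layers listed in the YAML example further below.
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)

# Freezes the base weights and injects the trainable low-rank matrices.
model = get_peft_model(base_model, lora_config)
# Reports trainable vs. total parameters; the trainable fraction is well below 1%.
model.print_trainable_parameters()

The resulting LoRA checkpoint stores only those injected matrices, which is why it is small enough to swap in and out of an already-loaded base model at serving time.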

You can find a more in-depth analysis of this topic here. The domain also affects LoRA's performance: depending on the domain, it may perform on par with or slightly worse than full-parameter fine-tuning.

How to configure LoRA vs full-parameter fine-tuning jobs

To configure a run for LoRA, you must specify lora_config in the YAML file. Here is an example LoRA configuration:

lora_config:
  # Rank of the matrices that we fine-tune. A higher rank means more parameters to
  # fine-tune, but increasing the rank gives diminishing returns.
  r: 8
  # Scales the learned LoRA weights. A value of 16 is common practice and usually
  # does not need tuning.
  lora_alpha: 16
  # Rate at which LoRA weights are dropped out. Can act as a regularizer.
  lora_dropout: 0.05
  # The modules of the LLM that we want to fine-tune with LoRA.
  target_modules:
    - q_proj
    - v_proj
    - k_proj
    - o_proj
    - gate_proj
    - up_proj
    - down_proj
    - embed_tokens
    - lm_head
  # This should be explicitly set to an empty list.
  modules_to_save: []

This blog post explains what the parameters mean. You generally don't need to change any of them; tuning them yields negligible benefit.

Fine-tune LoRA with a learning rate of about 1e-4. You can increase it slightly if training is stable enough, but note that LoRA is rather sensitive to the learning rate. For optimal performance, target all possible layers with LoRA; choosing a higher rank gives only very minor improvements. See this paper for more details.

If lora_config is not specified, the run will be a full-parameter one. We advise using a small learning rate of about 1e-5 here; you can increase it slightly if training is stable enough.
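As a rough illustration of these defaults, continuing the hedged sketch from earlier (and using PyTorch's AdamW directly, rather than any particular training framework):

import torch

is_lora_run = True                             # False corresponds to a full-parameter run
learning_rate = 1e-4 if is_lora_run else 1e-5  # LoRA tolerates a higher learning rate

# Only parameters that require gradients are optimized; in the LoRA case the
# frozen base weights are skipped automatically.
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad),
    lr=learning_rate,
)

In practice, the learning rate is typically set in the job configuration rather than by constructing an optimizer yourself; the snippet only illustrates which parameters each regime actually updates.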

When to use LoRA vs full-parameter fine-tuning?

There is no general answer to this, but here are some things to consider:

  • The quality of the fine-tuned models will, in most cases, be comparable, if not identical
  • LoRA shines if:
    • You want to serve many fine-tuned models at once yourself
    • You want to rapidly experiment (because fine-tuning, downloading and serving the model take less time)
  • Full-parameter shines if:
    • You want to make sure that your fine-tuned model has the maximum quality
    • You want to serve only one fine-tuned version of the model