Supported tasks

LLMForge supports the following tasks out-of-the-box:

  • Causal language modeling: Loss considers predictions for all the tokens.
  • Instruction tuning: Only "assistant" tokens are considered in the loss.
  • Classification: Only a user-defined set of labels is predicted, based on the preceding tokens.
  • Preference tuning: Use the contrast between chosen and rejected messages to improve the model.

These tasks are enabled by the following hyperparameters: task, classifier_config (specific to classification), and preference_tuning_config (specific to preference tuning). Note that task defaults to "causal_lm" unless you specify a task-specific config such as classifier_config or preference_tuning_config.

Dataset format

The dataset format for all tasks is the OpenAI chat data format. This means that whether the task is continued pre-training on plain text (the causal_lm task) or classifying messages as safe/unsafe, you need to format the dataset in the OpenAI format. Example dataset and config snippets for each task are given below.
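
For continued pre-training on raw text, each document is simply wrapped as a single "user" message. Below is a minimal sketch of that conversion; the file names are hypothetical:

import json

# Hypothetical input: one raw text document per line in corpus.txt.
# Wrap each document in the OpenAI chat format expected by LLMForge.
with open("corpus.txt") as src, open("train.jsonl", "w") as dst:
    for line in src:
        text = line.strip()
        if not text:
            continue
        record = {"messages": [{"role": "user", "content": text}]}
        dst.write(json.dumps(record) + "\n")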

Task examples

Causal Language Modeling

Example Config:

model_id: meta-llama/Meta-Llama-3-8B-Instruct  # any HF model ID
task: "causal_lm"
generation_config:
  prompt_format: # does nothing but concatenation
    system: "{instruction}"
    user: "{instruction}"
    assistant: "{instruction}"
    system_in_user: False
...

Example dataset:

{
  "messages": [
    {"role": "user", "content": "Once upon a time ..."}
  ]
}

Instruction Tuning

Here, only the "assistant" tokens are considered in the loss function. This masking is especially helpful when you have conversational data with large user/system messages (for example, few-shot examples), because it lets you train only on the tokens that matter.
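
Conceptually, the masking sets the label of every non-assistant token to the ignore index so that cross-entropy skips it. The following is a minimal sketch of the idea, not LLMForge's internal implementation; the helper name is illustrative:

# Illustrative only: mask out every token that does not belong to an
# "assistant" message so cross-entropy ignores it (-100 is the ignore index).
IGNORE_INDEX = -100

def build_labels(token_ids, token_roles):
    """token_roles[i] is the role ("system"/"user"/"assistant") of token_ids[i]."""
    return [
        tok if role == "assistant" else IGNORE_INDEX
        for tok, role in zip(token_ids, token_roles)
    ]

# Example: only the last two tokens (the assistant reply) keep their labels.
print(build_labels([11, 22, 33, 44], ["user", "user", "assistant", "assistant"]))
# -> [-100, -100, 33, 44]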

Example Config:

model_id: meta-llama/Meta-Llama-3-8B-Instruct # any HF model ID
task: instruction_tuning
...

Example Dataset:

{
  "messages": [
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hi! How can I help you?"}
  ]
}

Classification

Example Config:

model_id: meta-llama/Meta-Llama-3-8B-Instruct # any HF model ID
task: classification # this is optional - can be inferred when classifier_config is provided
classifier_config:
  label_tokens:
    - "[[1]]"
    - "[[2]]"
    - "[[3]]"
    - "[[4]]"
    - "[[5]]"
  eval_metrics:
    - "prauc"
    - "accuracy"
    - "f1"
...

In the classifier config, there are two configurable parameters:

  • label_tokens: The set of strings representing the classification labels.
  • eval_metrics: A list of evaluation metrics to be used for the classifier. See the API reference for supported metrics.

Internally, each label is added as a special token to the tokenizer. Model training happens as usual with standard cross-entropy loss, but considers only the final assistant message (with the ground truth label).
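
For intuition, this is roughly what adding the labels as special tokens looks like with the Hugging Face tokenizers API; a sketch, not LLMForge's internal code (in practice the model's embedding matrix must also be resized to match the new vocabulary):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

# Each classification label becomes a single special token in the vocabulary.
label_tokens = ["[[1]]", "[[2]]", "[[3]]", "[[4]]", "[[5]]"]
tokenizer.add_special_tokens({"additional_special_tokens": label_tokens})

# The final assistant message now tokenizes to exactly one label token;
# standard cross-entropy on that position is what trains the classifier.
print(tokenizer.encode("[[3]]", add_special_tokens=False))  # a single token id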

Example Dataset:

{
  "messages": [
    {"role": "system", "content": "[Instruction]\nBased on the question provided below, predict the score an expert evaluator would give to an AI assistant's response......"},
    {"role": "user", "content": "[Question]\nI'll give you a review, can you......[Answer]...."},
    {"role": "assistant", "content": "[[3]]"}
  ]
}

For a detailed end-to-end example of fine-tuning an LLM to route responses, see the LLM router template.

Preference Tuning

LLMForge supports preference tuning for language models through Direct Preference Optimization (DPO).

Example Config:

model_id: meta-llama/Meta-Llama-3-8B-Instruct # any HF model ID
task: preference_tuning # this is optional, can be inferred when preference_tuning_config is provided
preference_tuning_config:
  # beta parameter in DPO, controlling the amount of regularization
  beta: 0.01
  logprob_processor_scaling_config:
    custom_resources:
      accelerator_type:A10G: 0.001 # custom resource per worker
    # Runs reference model logp calculation on 4 GPUs
    concurrency: 4
    # Batch size per worker
    batch_size: 2
...
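
For intuition about the beta parameter: the DPO objective contrasts the policy's log-probabilities on the chosen and rejected responses against a frozen reference model, and a larger beta keeps the policy closer to that reference. A minimal sketch of the standard per-example DPO loss (not LLMForge's internal code):

import math

def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.01):
    # Reward margin: difference of policy-vs-reference log-prob ratios
    # between the chosen and rejected responses.
    chosen_ratio = policy_logp_chosen - ref_logp_chosen
    rejected_ratio = policy_logp_rejected - ref_logp_rejected
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(margin)): small when the chosen response is clearly preferred.
    return -math.log(1 / (1 + math.exp(-margin)))

# Example: the policy prefers the chosen response more strongly than the reference does.
print(dpo_loss(-10.0, -14.0, -11.0, -12.0, beta=0.1))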

You can find more details in the dedicated preference tuning guide.