Task types

LLMForge supports the following tasks out of the box:

  • Causal language modeling: The loss considers predictions for all tokens.
  • Instruction tuning: Computes the loss only over "assistant" tokens (see the sketch after this list).
  • Classification: Predicts a label from a user-defined set based on the past tokens.
  • Preference tuning: Uses the contrast between chosen and rejected messages to improve the model.
  • Vision-language instruction tuning: Predicts assistant tokens based on a mix of past image and text tokens.

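To make the loss-masking difference between the first two tasks concrete, here is a minimal sketch. The helper name, the assistant_mask input, and the -100 ignore index are illustrative assumptions (the -100 convention follows Hugging Face-style cross-entropy label masking), not LLMForge internals.

```python
import torch

IGNORE_INDEX = -100  # conventional "ignore this position" label for cross-entropy loss


def build_labels(input_ids: torch.Tensor, assistant_mask: torch.Tensor,
                 mask_non_assistant: bool) -> torch.Tensor:
    """Illustrative label construction, not LLMForge's actual implementation.

    input_ids:          (seq_len,) token ids of the rendered conversation
    assistant_mask:     (seq_len,) True where a token belongs to an "assistant" message
    mask_non_assistant: True for instruction tuning, False for causal language modeling
    """
    labels = input_ids.clone()
    if mask_non_assistant:
        # Instruction tuning: only assistant tokens contribute to the loss.
        labels[~assistant_mask] = IGNORE_INDEX
    # Causal LM: every token contributes to the loss, so labels stay untouched.
    return labels
```
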
The following hyperparameters select and configure tasks:

  • task
  • classifier_config, which is specific to classification
  • preference_tuning_config, which is specific to preference tuning
  • vision_language_config, which is specific to vision-language instruction tuning

Note that task defaults to "causal_lm" unless you specify a task-specific config such as classifier_config, preference_tuning_config, or vision_language_config.
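For illustration, a minimal sketch of how a task-specific section might sit alongside the task field is shown below, written as a Python dict. Apart from the field names task and classifier_config and the default "causal_lm", everything here (the task string and the contents of the section) is a placeholder assumption; the real schema is documented under Data formats and task configs.

```python
# Hypothetical config fragment for illustration only; field contents are placeholders.
config = {
    "task": "classification",  # assumed task string; "causal_lm" is the documented default
    "classifier_config": {
        # user-defined label set; the actual field names are defined in
        # "Data formats and task configs"
        "labels": ["safe", "unsafe"],
    },
}
```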

Dataset format

You must format the dataset in the OpenAI format for all tasks, whether you're continuing pre-training on plain text, running the causal_lm task, or classifying messages as safe or unsafe. See Data formats and task configs for details on how to format data for each task type.
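
For reference, a single training record in the OpenAI format is a list of role/content messages, along these lines. The example only shows the basic message structure; any task-specific fields (labels, chosen/rejected messages, images) are defined in Data formats and task configs.

```python
# One training record in the OpenAI messages format (illustrative content).
example_record = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize the plot of Hamlet in one sentence."},
        {"role": "assistant", "content": "A Danish prince avenges his father's murder at great cost."},
    ]
}
```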