Running LLM fine-tuning template as an Anyscale Job
LLMForge is being deprecated: The Ray Team is consolidating around open source fine-tuning solutions. Llama Factory and Axolotl provide enhanced functionality (quantization, advanced algorithms) and native Ray support for scaling. See the migration guide for transitioning your workflows.
For developer velocity, use workspaces to run Python scripts. For automation, or to launch jobs from your laptop without spinning up a workspace, run the fine-tuning workloads as isolated Anyscale Jobs. Jobs are also a better fit for long-running production workloads, because workspace setups are often ephemeral.
To specify a job, provide the command to run, for example [COMMAND] [ARGS], along with its requirements, for example the Docker image and any additional pip packages, in a job YAML, and then call anyscale job submit to launch the job on Anyscale.
Assume the following files in your local setup:
.
├── config
│   ├── llama-3-8b.yaml
│   └── zero_3_offload_optim+param.json
└── job_config.yaml
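Here, config/llama-3-8b.yaml is the LLMForge fine-tuning config and config/zero_3_offload_optim+param.json is the DeepSpeed config it points to. As a rough sketch of how the two relate (a minimal, illustrative config; the values are placeholders and the exact schema is in the LLMForge docs):

# config/llama-3-8b.yaml -- illustrative sketch, not a complete config
model_id: meta-llama/Meta-Llama-3-8B-Instruct  # base model to fine-tune
train_path: s3://your-bucket/train.jsonl       # placeholder: your training data
num_epochs: 3
deepspeed:
  config_path: config/zero_3_offload_optim+param.json  # resolved relative to working_dir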
Here is example content for job_config.yaml, depending on where you submit the job from.

From a workspace:

name: "llmforge-job"
entrypoint: "llmforge anyscale finetune config/llama-3-8b.yaml"
max_retries: 0

From a laptop:

name: "llmforge-job"
entrypoint: "llmforge anyscale finetune config/llama-3-8b.yaml"
image_uri: <replace_with_llmforge_image_uri_value>
max_retries: 0
working_dir: "."
Executing an Anyscale Job within a workspace ensures that files in the current working directory are available to the job (unless excluded with --exclude). You can also load files from remote sources, for example a GitHub repo or S3, which lets you launch the job from anywhere.
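For example, to launch from outside a workspace without uploading local files, working_dir can point at a remote archive instead of the current directory. A sketch, assuming the project layout above is zipped and uploaded to S3 (the bucket path is a placeholder):

name: "llmforge-job"
entrypoint: "llmforge anyscale finetune config/llama-3-8b.yaml"
image_uri: <replace_with_llmforge_image_uri_value>
max_retries: 0
working_dir: "s3://your-bucket/path/to/project.zip"  # remote zip containing the config/ directory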
You can find all available settings in the Anyscale Jobs API docs. A few notes:
- entrypoint is the command to run. Pay attention to the relative file location (config/llama-3-8b.yaml) and the working_dir. Inside llama-3-8b.yaml we also reference a relative path to config/zero_3_offload_optim+param.json. This works because we set working_dir to the current directory when submitting the job from the client side. If submitting from a workspace, the ~/default directory is treated as the working_dir.
- image_uri refers to the image that has LLMForge installed. The fine-tuning template automatically lists the latest released image. For the full list of versions and their URIs, see LLMForge releases. If you run this job from a workspace, the job inherits the image_uri from the workspace image.
- max_retries: setting this to zero ensures we don't keep retrying if the job fails. Retry only when failures are transient, for example due to resource constraints.
- working_dir: setting working_dir to the current directory is necessary when you submit this job from outside a workspace, for example from your laptop or CI/CD pipelines.
Launch the job with:

anyscale job submit --config-file job_config.yaml
As the job runs, you can go to the provided URL (console.anyscale.com/jobs/prod_job...) to monitor the job's logs and metrics.
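If you prefer to stay in the terminal, the Anyscale CLI can also tail the logs, for example (flags may vary by CLI version; check anyscale job logs --help):

anyscale job logs --name llmforge-job --follow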
To provide a WANDB_API_KEY, you can use env_vars in the job specification YAML.
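For example, appended to the job config above (the key value is a placeholder):

name: "llmforge-job"
entrypoint: "llmforge anyscale finetune config/llama-3-8b.yaml"
env_vars:
  WANDB_API_KEY: <your_wandb_api_key>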