GRPO with SkyRL
This example uses SkyRL to run GRPO training on the GSM8K dataset.
SkyRL is a modular and extensible reinforcement learning library for training large language models. It supports RL algorithms like PPO, GRPO, and DAPO, tool-use tasks, and multi-turn agentic workflows.
Install the Anyscale CLI
pip install -U anyscale
anyscale login
Submit the job
Clone the example from GitHub.
git clone https://github.com/anyscale/examples.git
cd examples/skyrl
Submit the job.
anyscale job submit -f job.yaml
Understanding the example
- The entrypoint defined in job.yaml first runs a script that downloads the GSM8K dataset and stores it under /mnt/cluster_storage/data/gsm8k. The /mnt/cluster_storage/ directory is an ephemeral shared filesystem attached to the cluster for the duration of the job, so all workers have access to the data.
- The main entrypoint, skyrl_train.entrypoints.main_base, is run using uv, which picks up the relevant pyproject.toml file in the SkyRL repository. That file specifies a Ray version, but the job should use the Ray version already installed on the existing Ray cluster on Anyscale, which is why the uv run command includes the flag --with ray@http://localhost:9478/ray/ray-2.48.0-cp312-cp312-manylinux2014_x86_64.whl.
- This example cannot set the working_dir argument in the job YAML file, because uv would then look for the pyproject.toml file in that working directory (and not find it) instead of in the correct directory, $HOME/SkyRL/skyrl-train.
- To store checkpoints in a persistent location, pass ckpt_path to the entrypoint. Read more about Anyscale storage options. This example saves checkpoints to a mounted shared filesystem via ckpt_path=/mnt/shared_storage/skyrl_checkpoints. To save checkpoints to blob storage instead, set ckpt_path=$ANYSCALE_ARTIFACT_STORAGE/skyrl_checkpoints. On AWS you also need to add --with s3fs to the uv run command in the main entrypoint; on GCP, add --with gcsfs.
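The checkpoint options above can be sketched as a small shell helper that picks the extra filesystem package from the checkpoint path. This is only an illustration, not the contents of job.yaml: the helper name fs_package_for is hypothetical, the mapping assumes that $ANYSCALE_ARTIFACT_STORAGE starts with s3:// on AWS and gs:// on GCP, and the real entrypoint takes further training arguments that are omitted here. The script prints the command rather than running it.

```shell
#!/bin/sh
# Sketch only: fs_package_for is a hypothetical helper, not part of SkyRL or Anyscale.

# Print the filesystem package uv would need for a given checkpoint path.
# Assumption: blob-storage paths use the s3:// (AWS) or gs:// (GCP) scheme.
fs_package_for() {
  case "$1" in
    s3://*) echo "s3fs" ;;   # AWS blob storage
    gs://*) echo "gcsfs" ;;  # GCP blob storage
    *) ;;                    # mounted filesystem: no extra package
  esac
}

# Use blob storage when the variable is set, else the mounted shared filesystem.
CKPT_PATH="${ANYSCALE_ARTIFACT_STORAGE:-/mnt/shared_storage}/skyrl_checkpoints"

# Ray wheel served by the cluster, as used in the example's uv run command.
RAY_WHEEL="ray@http://localhost:9478/ray/ray-2.48.0-cp312-cp312-manylinux2014_x86_64.whl"

# Add the filesystem package only when one is needed.
extra_pkg="$(fs_package_for "$CKPT_PATH")"
with_fs=""
if [ -n "$extra_pkg" ]; then
  with_fs=" --with $extra_pkg"
fi

# Print the assembled command for inspection instead of executing it.
echo "uv run --with ${RAY_WHEEL}${with_fs} -m skyrl_train.entrypoints.main_base ckpt_path=${CKPT_PATH}"
```

Printing the command keeps the sketch safe to run anywhere; in the example itself, the equivalent command lives in the entrypoint field of job.yaml.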
View the job
View the job in the jobs tab of the Anyscale console.