Choose a framework for LLM post-training
This page helps you choose the right framework for post-training large language models (LLMs) on Anyscale.
Recommended approaches
Anyscale supports post-training techniques including Supervised Fine-Tuning (SFT), Reinforcement Learning from Human Feedback (RLHF), Reinforcement Learning from Verifiable Rewards (RLVR), and agentic tuning. For more details, see Post-training for LLMs on Anyscale. The following table compares three frameworks for LLM post-training and agentic tuning on Anyscale:
| Framework | Use cases and key features | Setup effort | Integration with Anyscale and Ray |
| --- | --- | --- | --- |
| LLaMA-Factory | Config-driven SFT and preference tuning, including LoRA and QLoRA, across many open models. | Easy: official guides and examples; minimal code when using provided configs. | Uses Ray Train for orchestration; official Anyscale guides; runs on Ray clusters on Anyscale for scale-out training. |
| SkyRL | RL post-training such as RLHF and RLVR, including multi-turn and agentic workloads. | Medium: powerful and flexible; requires environment setup and Ray cluster familiarity. | Uses Ray for orchestration; deployable on Anyscale Ray clusters. |
| Ray Train | General-purpose distributed training with full control over the training loop; integrates with PyTorch and Hugging Face libraries. | Medium: you write the training code (see the sketch after this table), but many end-to-end examples exist. | Core Ray training library; first-class on Anyscale; scales across nodes natively. |
Other LLM post-training and RL frameworks
While the recommendations above cover most practical needs, many other open-source RL libraries built on Ray, such as verl and NeMo-RL, are also easy to run on Anyscale. Because these libraries are built on Ray, they can run on any Ray cluster, including the ones Anyscale provides. For an example of running verl on Anyscale, see Train LLMs with reinforcement learning using verl. For a detailed overview and comparison of these and other libraries, see the Open Source RL Libraries for LLMs blog post.
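Because these libraries ship as ordinary Ray programs, one way to run them on a cluster is the Ray Jobs SDK. The following is a minimal sketch; the dashboard address and the `train_ppo.py` entrypoint are hypothetical placeholders for your own cluster and script, not part of any specific library.

```python
from ray.job_submission import JobSubmissionClient

# Point the client at the Ray dashboard of a running cluster (address is a
# placeholder; on Anyscale, use your cluster's address or the Anyscale CLI).
client = JobSubmissionClient("http://127.0.0.1:8265")

job_id = client.submit_job(
    # Hypothetical entrypoint: any script that drives a Ray-based trainer.
    entrypoint="python train_ppo.py",
    runtime_env={"working_dir": "."},
)
print(f"Submitted job: {job_id}")
```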
This comparison reflects current practices and may evolve with new releases. For the latest information, consult Anyscale documentation or the linked resources.