RAG quickstart on Anyscale
This guide shows you how to build and scale a production-ready Retrieval-Augmented Generation (RAG) pipeline on Anyscale, from data ingestion and embedding to LLM deployment and evaluation.
Deploy scalable RAG on Anyscale
Use the Anyscale RAG template to deploy production-ready RAG systems that require high availability and scalability. The template provides a comprehensive tutorial that takes you from a basic prototype to a production-grade system.
Use the Anyscale template
To deploy RAG on Anyscale, follow these steps:
- Sign up for an Anyscale account.
- Navigate to the Templates or Examples section in the Anyscale console.
- Find the Distributed RAG pipeline template.
- Click Launch to deploy the workspace.
- Follow the template instructions and customize the deployment for your use case.
The template includes a notebook tutorial series covering the following topics:
| # | Notebook | Description |
|---|---|---|
| 1 | A simple RAG data ingestion pipeline | Build a basic RAG ingestion pipeline using standard libraries such as LangChain and Chroma DB. Learn about fundamental RAG components and identify performance bottlenecks (see the ingestion sketch after this table). |
| 2 | Scaling RAG data ingestion with Ray | Rebuild the data ingestion pipeline using Ray Data to parallelize document parsing, chunking, and embedding across a multi-node, heterogeneous (CPU and GPU) cluster (see the Ray Data sketch after this table). |
| 3 | Deploying LLMs with Ray Serve | Deploy large open-source models as scalable, OpenAI-compatible API endpoints using Ray Serve LLM (see the Ray Serve sketch after this table). |
| 4 | Building the RAG query pipeline | Connect all components into a query pipeline that embeds user queries, searches the vector store, and passes the retrieved context to your deployed LLM (see the query-pipeline sketch after this table). |
| 5 | Advanced prompt engineering for RAG | Refine your system to handle real-world complexity, including chat history, citations, and safety filters for ambiguous or malicious queries. |
| 6 | Evaluating RAG | Run a simple evaluation loop using online inference to see why that approach is slow, costly, and liable to destabilize production services. |
| 7 | Scalable evaluation with Ray Data | Use Ray Data LLM to run batch-inference evaluations that generate embeddings, retrieve context, and collect LLM responses for thousands of test questions in parallel (see the batch-evaluation sketch after this table). |
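The sketches that follow are minimal approximations of what the notebooks build, not the template code itself. This first one mirrors the notebook 1 pattern of chunk, embed, and store. It uses the chromadb client directly with its default embedding function and a two-document placeholder corpus, whereas the notebook works with LangChain and Chroma DB.

```python
# Minimal single-process ingestion: chunk documents, embed them, and store
# the chunks in a local Chroma collection.
import chromadb

client = chromadb.Client()  # in-memory; use PersistentClient(path=...) to persist
collection = client.create_collection("rag_docs")

# Placeholder corpus; the template ingests real document files.
documents = [
    "Ray Data parallelizes data processing across a cluster.",
    "Ray Serve deploys models behind scalable HTTP endpoints.",
]

def chunk(text: str, size: int = 512, overlap: int = 64) -> list[str]:
    """Naive fixed-size character chunking with overlap."""
    return [text[i : i + size] for i in range(0, len(text), size - overlap)]

chunks, ids = [], []
for doc_idx, doc in enumerate(documents):
    for chunk_idx, piece in enumerate(chunk(doc)):
        chunks.append(piece)
        ids.append(f"doc{doc_idx}-chunk{chunk_idx}")

# Chroma embeds the chunks with its default embedding function on add().
collection.add(documents=chunks, ids=ids)

results = collection.query(query_texts=["How do I scale data processing?"], n_results=2)
print(results["documents"][0])
```

Everything above runs in one Python process, which is exactly the bottleneck the next notebook addresses.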
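Notebook 2's central idea is stage-level parallelism: chunking runs on CPU workers while embedding runs on GPU actors. A rough Ray Data sketch, assuming a sentence-transformers embedding model and an in-memory stand-in corpus (the real pipeline parses document files), follows; the `concurrency`, `num_gpus`, and `batch_size` values are illustrative, and the embedding stage requires GPU nodes.

```python
# Sketch of a heterogeneous Ray Data pipeline: CPU chunking, GPU embedding.
import ray

# Stand-in corpus; the template reads and parses real document files.
ds = ray.data.from_items([{"text": "long document text ..."} for _ in range(1_000)])

def chunk_batch(batch):
    # CPU stage: naive fixed-size chunking; swap in your real splitter.
    chunks = []
    for text in batch["text"]:
        chunks.extend(text[i : i + 512] for i in range(0, len(text), 448))
    return {"chunk": chunks}

class Embedder:
    # GPU stage: load the model once per actor and reuse it across batches.
    def __init__(self):
        from sentence_transformers import SentenceTransformer
        self.model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

    def __call__(self, batch):
        batch["embedding"] = self.model.encode(list(batch["chunk"]))
        return batch

embedded = (
    ds.map_batches(chunk_batch)  # fans out across CPU workers
    .map_batches(Embedder, concurrency=2, num_gpus=1, batch_size=64)
)
embedded.show(1)
```

The class-based UDF matters here: Ray Data turns it into long-lived actors, so the model loads once per GPU worker rather than once per batch.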
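Notebook 3 deploys the generation model with Ray Serve LLM, which exposes an OpenAI-compatible API. A minimal sketch, with an illustrative model, accelerator type, and autoscaling bounds:

```python
# Sketch: serve an open-source model behind an OpenAI-compatible endpoint.
from ray import serve
from ray.serve.llm import LLMConfig, build_openai_app

llm_config = LLMConfig(
    model_loading_config={
        "model_id": "qwen-0.5b",                       # name clients pass as `model`
        "model_source": "Qwen/Qwen2.5-0.5B-Instruct",  # Hugging Face source (illustrative)
    },
    deployment_config={
        "autoscaling_config": {"min_replicas": 1, "max_replicas": 2},
    },
    accelerator_type="A10G",  # illustrative; match your cluster's GPUs
)

app = build_openai_app({"llm_configs": [llm_config]})
serve.run(app)  # exposes /v1/chat/completions on the cluster
```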
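Notebook 4 connects the pieces: embed the query, retrieve context from the vector store, and prompt the deployed LLM. The sketch below assumes the Chroma collection from the ingestion sketch lives in the same process (use a `PersistentClient` otherwise) and that the Serve endpoint above is reachable at `http://localhost:8000/v1` under the model name `qwen-0.5b`; both are assumptions carried over from the earlier sketches.

```python
# Sketch of the query path: retrieve context, then ask the deployed LLM.
import chromadb
from openai import OpenAI

chroma = chromadb.Client()  # same-process client; PersistentClient(path=...) to share
collection = chroma.get_or_create_collection("rag_docs")

llm = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

def answer(query: str) -> str:
    # Chroma embeds the query with the same default function used at ingest.
    hits = collection.query(query_texts=[query], n_results=3)
    context = "\n\n".join(hits["documents"][0])
    response = llm.chat.completions.create(
        model="qwen-0.5b",
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"},
        ],
    )
    return response.choices[0].message.content

print(answer("How do I scale data processing?"))
```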
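Notebook 7 swaps the per-request evaluation loop for batch inference with Ray Data LLM. The sketch below covers only the answer-generation stage (the full pipeline also embeds questions and retrieves context); the model, concurrency, and sampling settings are illustrative.

```python
# Sketch: batch answer generation over many test questions with Ray Data LLM.
import ray
from ray.data.llm import build_llm_processor, vLLMEngineProcessorConfig

# Stand-in evaluation set; real runs load curated question/answer pairs.
questions = ray.data.from_items([{"question": f"Test question {i}"} for i in range(1_000)])

config = vLLMEngineProcessorConfig(
    model_source="Qwen/Qwen2.5-0.5B-Instruct",  # illustrative model
    concurrency=2,   # number of vLLM engine replicas
    batch_size=64,
)

processor = build_llm_processor(
    config,
    preprocess=lambda row: dict(
        messages=[
            {"role": "system", "content": "Answer concisely."},
            {"role": "user", "content": row["question"]},
        ],
        sampling_params=dict(temperature=0, max_tokens=256),
    ),
    # Original columns are still available on the row alongside the output.
    postprocess=lambda row: dict(
        question=row["question"],
        answer=row["generated_text"],
    ),
)

answers = processor(questions)
answers.show(3)
```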
Additional resources
The following are additional resources for learning about RAG on Anyscale: