
RAG quickstart on Anyscale

This guide shows you how to build and scale a production-ready Retrieval-Augmented Generation (RAG) pipeline on Anyscale, from data ingestion and embedding to LLM deployment and evaluation.
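At query time, a RAG system embeds the user's question, retrieves the most similar documents from a vector store, and passes them to the LLM as context. The following pure-Python sketch illustrates that flow with a toy bag-of-words "embedding"; the `embed`, `retrieve`, and `build_prompt` helpers are illustrative stand-ins, not part of the Anyscale template, and a real pipeline uses a learned embedding model and a vector database:

```python
from collections import Counter
from math import sqrt

def embed(text: str) -> Counter:
    # Toy "embedding": a bag-of-words vector keyed by word.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[w] * b[w] for w in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    # Rank documents by similarity to the query and keep the top k.
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    # Stuff the retrieved context into the LLM prompt.
    context = "\n".join(retrieve(query, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "Ray Serve deploys models as scalable endpoints.",
    "Chroma DB stores document embeddings.",
    "Anyscale runs Ray clusters in the cloud.",
]
print(build_prompt("deploy models", docs))
```

The template replaces each toy piece with a production component: an embedding model for `embed`, a vector store query for `retrieve`, and a Ray Serve LLM endpoint for the final generation step.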

Deploy scalable RAG on Anyscale

Use the Anyscale RAG template to deploy production-ready RAG systems that require high availability and scalability. The template provides a comprehensive tutorial that takes you from a basic prototype to a production-grade system.

Use the Anyscale template

To deploy RAG on Anyscale, follow these steps:

  1. Sign up for an Anyscale account.
  2. Navigate to the Templates or Examples section in the Anyscale console.
  3. Find the Distributed RAG pipeline template.
  4. Click Launch to deploy the workspace.
  5. Follow the template instructions and customize the deployment for your use case.

The template includes a notebook tutorial series covering the following topics:

| # | Notebook | Description |
|---|----------|-------------|
| 1 | A simple RAG data ingestion pipeline | Build a basic RAG ingestion pipeline using standard libraries such as LangChain and Chroma DB. Learn about fundamental RAG components and identify performance bottlenecks. |
| 2 | Scaling RAG data ingestion with Ray | Rebuild the data ingestion pipeline using Ray Data to parallelize document parsing, chunking, and embedding across a multi-node, heterogeneous (CPU and GPU) cluster. |
| 3 | Deploying LLMs with Ray Serve | Deploy large open-source models as scalable, OpenAI-compatible API endpoints using Ray Serve LLM. |
| 4 | Building the RAG query pipeline | Connect all components into a query pipeline that takes user queries, embeds them, queries the vector store, and passes context to your deployed LLM. |
| 5 | Advanced prompt engineering for RAG | Refine your system to handle real-world complexity, including chat history, citations, and safety filters for ambiguous or malicious queries. |
| 6 | Evaluating RAG | Run a simple evaluation loop using online inference to understand why this approach is slow, costly, and can destabilize production services. |
| 7 | Scalable evaluation with Ray Data | Use Ray Data LLM to run batch-inference evaluations that generate embeddings, retrieve context, and get LLM responses for thousands of test questions in parallel. |
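The chunking stage that notebooks 1 and 2 build (and that Ray Data later parallelizes across the cluster) can be sketched as a sliding character window with overlap; real pipelines often split on token or sentence boundaries instead. The `chunk` helper below is an illustrative stand-in, not the template's implementation:

```python
def chunk(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    # Slide a fixed-size window over the text; consecutive chunks share
    # `overlap` characters so a sentence cut at one boundary still appears
    # whole in the neighboring chunk.
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

doc = " ".join(f"This is sentence number {n}." for n in range(50))
chunks = chunk(doc)
print(f"{len(chunks)} chunks, first ends with: ...{chunks[0][-25:]}")
```

Each chunk is then embedded and written to the vector store; because chunks are independent, this map-style work is exactly what Ray Data distributes across CPU and GPU workers.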
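The batch-evaluation pattern in notebooks 6 and 7 (map one scoring step over many test questions, then aggregate a metric) can be sketched with the standard library; here a thread pool stands in for Ray Data, which scales the same map-style computation across a cluster. The function names, test set, and hit-rate metric below are hypothetical examples, not the template's evaluation suite:

```python
from concurrent.futures import ThreadPoolExecutor

def evaluate_question(item: dict) -> dict:
    # Stand-in for one evaluation step: check whether the retriever
    # returned the document that answers this test question. In the
    # template this per-question work is distributed with Ray Data.
    hit = item["expected"] in item["retrieved"]
    return {"question": item["question"], "hit": hit}

test_set = [
    {"question": "q1", "retrieved": ["doc_a", "doc_b"], "expected": "doc_a"},
    {"question": "q2", "retrieved": ["doc_c"], "expected": "doc_a"},
    {"question": "q3", "retrieved": ["doc_a"], "expected": "doc_a"},
]

# Map the scoring step over all questions in parallel, then aggregate.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(evaluate_question, test_set))

hit_rate = sum(r["hit"] for r in results) / len(results)
print(f"retrieval hit rate: {hit_rate:.2f}")  # → retrieval hit rate: 0.67
```

Running this loop with online inference against a live endpoint is what notebook 6 shows to be slow and costly; notebook 7 replaces it with offline batch inference over the same map-and-aggregate structure.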

Additional resources

The following are additional resources for learning about RAG on Anyscale: