LlamaIndex integration guide

LlamaIndex is a data framework designed to support developers in building applications powered by Large Language Models (LLMs). It offers tools for data ingestion, structuring, retrieval, and easy integration with Anyscale Private Endpoints.

This example shows how to build a query engine: a chatbot that can answer questions based on a collection of documents.

Step 0: Install dependencies

pip install "llama-index>=0.9.8"
pip install "langchain>=0.0.341"

Step 1: Load documents

For the model to answer questions and generate insights from private data, you need to provide a folder of documents from which to build a vector index. LlamaIndex's SimpleDirectoryReader uses file extensions to select the best file reader for loading each document.

from llama_index import SimpleDirectoryReader, VectorStoreIndex

# Specify the path to your data folder.
data_folder = 'DATA_FOLDER_PATH'

# Load documents from the directory.
documents = SimpleDirectoryReader(data_folder).load_data()
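As a rough illustration of what SimpleDirectoryReader does internally (this is a simplified sketch, not LlamaIndex's actual implementation), the reader dispatches each file to a loader based on its extension:

```python
from pathlib import Path

def load_text(path: Path) -> str:
    # Plain-text loader used here for both .txt and .md files.
    return path.read_text(encoding="utf-8")

# Hypothetical extension-to-loader table; the real reader also covers
# formats such as PDF and DOCX via dedicated parser classes.
READERS = {
    ".txt": load_text,
    ".md": load_text,
}

def select_reader(path: Path):
    """Return the loader for this file's extension, or None if unsupported."""
    return READERS.get(path.suffix.lower())
```

For example, `select_reader(Path("notes.md"))` returns the text loader, while an unregistered extension yields `None`, which is why pointing the reader at a folder of supported file types "just works".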

Step 2: Set up a ServiceContext with Anyscale

A ServiceContext is a utility container in LlamaIndex for index and query classes. It lets you specify which supported open source model from Anyscale Private Endpoints to use as the LLM, and which OpenAI text embedding model to use for generating vectors.

For a quick demo, you can paste your API key directly into the code, but for development, follow best practices for setting your API base and key (for example, via environment variables).

# Your Anyscale and OpenAI API tokens
import os

os.environ['ANYSCALE_API_KEY'] = 'ANYSCALE_API_KEY'
os.environ['OPENAI_API_KEY'] = 'OPENAI_API_KEY'

Option 1: LlamaIndex Anyscale

You can set a ServiceContext using LlamaIndex's integration with Anyscale.

from llama_index import ServiceContext, VectorStoreIndex
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.anyscale import Anyscale

# Initialize the service context with LLM and embedding model.
# The model name below is an example; use any model supported by your
# endpoint. API keys are read from the ANYSCALE_API_KEY and
# OPENAI_API_KEY environment variables.
service_context = ServiceContext.from_defaults(
    llm=Anyscale(model='meta-llama/Llama-2-70b-chat-hf'),
    embed_model=OpenAIEmbedding(),
)

Option 2: LangChain ChatAnyscale

Alternatively, you can set a ServiceContext using LangChain's integration with Anyscale.

from langchain.chat_models import ChatAnyscale
from llama_index import LLMPredictor, ServiceContext, VectorStoreIndex
from llama_index.embeddings.openai import OpenAIEmbedding

# Initialize the service context with LLM and embedding model.
# The model name below is an example; API keys are read from the
# ANYSCALE_API_KEY and OPENAI_API_KEY environment variables.
service_context = ServiceContext.from_defaults(
    llm_predictor=LLMPredictor(
        llm=ChatAnyscale(model_name='meta-llama/Llama-2-70b-chat-hf')
    ),
    embed_model=OpenAIEmbedding(),
)

Step 3: Run queries

With the documents and ServiceContext prepared, you can create an index for the documents with VectorStoreIndex and execute queries against it.

# Create the index from documents using the service context
index = VectorStoreIndex.from_documents(documents, service_context=service_context)

# Convert the index into a query engine
query_engine = index.as_query_engine()

# Run a query against the index and print the generated answer.
response = query_engine.query("Sample query message.")
print(response)
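Under the hood, the query engine embeds the query, retrieves the stored document vectors most similar to it, and passes those documents to the LLM. The retrieval step can be sketched with hand-written stand-in vectors (a toy illustration, not the actual VectorStoreIndex code):

```python
import math

def cosine(a, b):
    # Cosine similarity: dot product divided by the product of norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Hypothetical document embeddings; real embeddings have hundreds of
# dimensions and come from the configured embedding model.
doc_vectors = {
    "refund policy": [0.9, 0.1, 0.0],
    "shipping times": [0.1, 0.8, 0.3],
}

def top_match(query_vector):
    """Return the document whose vector is most similar to the query."""
    return max(doc_vectors, key=lambda name: cosine(query_vector, doc_vectors[name]))
```

A query vector pointing mostly along the first dimension retrieves "refund policy"; one along the second retrieves "shipping times". The retrieved text, not the vectors, is what the LLM sees when composing its answer.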