LlamaIndex integration


Changes to Anyscale Endpoints API

Effective August 1, 2024, the Anyscale Endpoints API will be available exclusively through the fully Hosted Anyscale Platform. Multi-tenant access to LLM models will be removed.

With the Hosted Anyscale Platform, you can access the latest GPUs billed by the second, and deploy models on your own dedicated instances. Enjoy full customization to build your end-to-end applications with Anyscale. Get started today.

LlamaIndex is a data framework that helps developers build LLM applications by providing essential tools for data ingestion, structuring, retrieval, and integration with various application frameworks.

The following is an example of embedding, indexing, and querying with LlamaIndex.

Install the llama_index>=0.9.30 and langchain>=0.0.336 packages for this demo.
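For example, with pip (quoting keeps the shell from interpreting the >= version pins):

pip install "llama_index>=0.9.30" "langchain>=0.0.336"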

First, load the documents from data_folder:

from llama_index import SimpleDirectoryReader

# Read every file in data_folder into a list of Document objects.
documents = SimpleDirectoryReader('data_folder').load_data()

Use the Anyscale support in LlamaIndex to create a ServiceContext. It includes two modules: Anyscale, which initializes an LLM to work with Anyscale, and AnyscaleEmbedding, which initializes an embedding model to work with Anyscale.

from llama_index import ServiceContext, VectorStoreIndex
from llama_index.llms import Anyscale
from llama_index.embeddings import AnyscaleEmbedding

ANYSCALE_ENDPOINT_TOKEN = "YOUR_ANYSCALE_TOKEN"

service_context = ServiceContext.from_defaults(
    llm=Anyscale(
        model="meta-llama/Llama-2-70b-chat-hf",
        api_key=ANYSCALE_ENDPOINT_TOKEN,
    ),
    embed_model=AnyscaleEmbedding(
        model="thenlper/gte-large",
        api_key=ANYSCALE_ENDPOINT_TOKEN,
    ),
    chunk_size=500,
)
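As a quick sanity check, you can call the Anyscale-hosted LLM directly before building an index. This is a minimal sketch; it assumes your token is valid and uses the same model name configured above.

# Optional: verify the LLM connection with a one-off completion.
llm = Anyscale(model="meta-llama/Llama-2-70b-chat-hf", api_key=ANYSCALE_ENDPOINT_TOKEN)
print(llm.complete("In one sentence, what is LlamaIndex?"))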

You can also use ChatAnyscale from LangChain as the LLM for the ServiceContext.

from langchain.chat_models import ChatAnyscale

service_context = ServiceContext.from_defaults(
    llm=ChatAnyscale(
        anyscale_api_key=ANYSCALE_ENDPOINT_TOKEN,
        model_name="meta-llama/Llama-2-70b-chat-hf",
    ),
    embed_model=AnyscaleEmbedding(
        model="thenlper/gte-large",
        api_key=ANYSCALE_ENDPOINT_TOKEN,
    ),
    chunk_size=500,
)

Next, create the index for the provided documents with VectorStoreIndex, and run a query against it.

# Embed and index the documents, then query the resulting index.
index = VectorStoreIndex.from_documents(documents, service_context=service_context)
query_engine = index.as_query_engine()
response = query_engine.query("Sample Query Texts")
print(response)
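Embedding a large document set is slow and costs tokens, so you may want to persist the index and reload it in later sessions instead of rebuilding it. The following is a minimal sketch using LlamaIndex's storage utilities; the ./storage directory is an arbitrary choice.

from llama_index import StorageContext, load_index_from_storage

# Persist the vector index to disk so documents aren't re-embedded on every run.
index.storage_context.persist(persist_dir="./storage")

# Later: rebuild the index from disk with the same service context.
storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context, service_context=service_context)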