
Introduction to Endpoints

Changes to Anyscale Endpoints API

Effective August 1, 2024, the Anyscale Endpoints API will be available exclusively through the fully Hosted Anyscale Platform. Multi-tenant access to LLMs will be removed.

With the Hosted Anyscale Platform, you can access the latest GPUs billed by the second, and deploy models on your own dedicated instances. Enjoy full customization to build your end-to-end applications with Anyscale. Get started today.

Anyscale Endpoints offers the best unmodified open-source large language models (LLMs) as fully managed API endpoints. The Anyscale platform provides simple APIs to:

  • Query text generation models
  • Fine-tune LLMs
  • Generate embeddings

Get started

  1. Register an account.
  2. Generate an API key.
  3. Run your first query:
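import openai

query = "Write a program to load data from S3 with Ray and train using PyTorch."

client = openai.OpenAI(
    base_url="",  # Set this to the Anyscale Endpoints base URL.
    api_key="esecret_ANYSCALE_API_KEY",
)

# Note: Anyscale doesn't support all arguments and ignores some of them in the backend.
chat_completion = client.chat.completions.create(
    model="MODEL_NAME",  # Placeholder: replace with a model ID available to your account.
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": query},
    ],
    stream=True,  # Stream the response so the loop below receives chunks.
)

for message in chat_completion:
    # A chunk's content can be None (for example, on the final chunk), so fall back to "".
    print(message.choices[0].delta.content or "", end="", flush=True)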

See Query a model for more model query details.
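
The first query hardcodes the API key and prints the streamed chunks as they arrive. The following is a minimal sketch, not part of the official Anyscale client, of two small helpers you might pair with it: one reads the key from an environment variable, and one joins streamed chunks into a single string. The variable name ANYSCALE_API_KEY and the helper names load_api_key, collect_stream, and fake_chunk are assumptions made for illustration. The chunk shape (choices[0].delta.content) follows the loop in the query example.

```python
import os
from types import SimpleNamespace
from typing import Iterable


def load_api_key(env_var: str = "ANYSCALE_API_KEY") -> str:
    """Return the API key stored in env_var (a hypothetical variable name)."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key.")
    return key


def collect_stream(chunks: Iterable) -> str:
    """Join the text deltas of a streamed chat completion into one string."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:
            continue  # Skip chunks that carry no choices.
        content = chunk.choices[0].delta.content
        if content:  # The content can be None, for example on the final chunk.
            parts.append(content)
    return "".join(parts)


def fake_chunk(text):
    """Build a stand-in object shaped like a streamed chunk, for offline testing."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


if __name__ == "__main__":
    # Offline demo: no network call is made here.
    demo = [fake_chunk("Hello"), fake_chunk(", "), fake_chunk("Ray"), fake_chunk(None)]
    print(collect_stream(demo))
```

With a real client, you could pass api_key=load_api_key() when you construct openai.OpenAI and pass the streamed chat_completion object to collect_stream instead of printing each chunk.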

Data retention

Anyscale may securely retain your queries or results for up to 30 days to help prevent security or technical problems.

Anyscale treats your queries and results as “Customer Data” under our terms of service. Anyscale doesn't use your queries or results to train or fine-tune models unless you use the fine-tuning service, in which case Anyscale tunes the model only for you. If you are interested in fine-tuning a model, reach out to Anyscale.

If you use the fine-tuning feature, Anyscale stores both the fine-tuning data and the fine-tuned model (including parameters) within the Anyscale account to provide the feature. You may delete the fine-tuning data with the API. See the Endpoints docs for more information. Don't rely on this storage as a backup for the fine-tuning data; keep your own copy of any fine-tuning data you wish to keep. If you wish to delete the fine-tuned model, reach out to Anyscale.

Compute resources

Endpoints may use shared resources such as GPUs and compute nodes. If you require dedicated GPUs, reach out to Anyscale.

Anyscale periodically updates the list of available Cloud Service Providers. If you require the ability to choose which Cloud Service Provider Anyscale uses to host the endpoints, reach out to Anyscale.

Account deletion

To close your account, reach out to Anyscale. Anyscale can help you delete your account if you no longer wish to use it.