
Generate an embedding


Set up your environment

Create an API key on the Credentials page under your account.

Set the following environment variables.

export ANYSCALE_BASE_URL="https://api.endpoints.anyscale.com/v1"
export ANYSCALE_API_KEY="esecret_YOUR_API_KEY"

You can find more details about authentication here.
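If you prefer to keep credentials out of source code, you can read the variables you just exported when constructing the client. A minimal sketch in Python, assuming both environment variables from the previous step are set:

import os

import openai

# Build a client from the environment variables exported above.
client = openai.OpenAI(
    base_url=os.environ["ANYSCALE_BASE_URL"],
    api_key=os.environ["ANYSCALE_API_KEY"],
)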

Select a model


Anyscale supports embedding models such as thenlper/gte-large, which the example below queries.

Query an embedding model

tip

If you're starting a project from scratch, use the OpenAI Python SDK rather than raw cURL calls or hand-rolled Python HTTP requests.

Embedding models

The following is an example of querying the thenlper/gte-large embedding model.

import openai

# Create a client pointed at the Anyscale Endpoints API.
client = openai.OpenAI(
    base_url="https://api.endpoints.anyscale.com/v1",
    api_key="esecret_YOUR_API_KEY",
)

# Note: not all arguments are supported yet; unsupported arguments are ignored by the backend.
embedding = client.embeddings.create(
    model="thenlper/gte-large",
    input="Your text string goes here",
)
print(embedding.model_dump())

The output looks like the following:

{
    'data': [
        {
            'embedding': [...],
            'index': 0,
            'object': 'embedding'
        }
    ],
    'model': 'thenlper/gte-large',
    'object': 'list',
    'usage': {
        'prompt_tokens': 7,
        'total_tokens': 7
    },
    'id': 'thenlper/gte-large-UEpQEaduAoaC6rq5n1yxkYNalVukLBhMzkG7IV_GPgU',
    'created': 1701325873
}
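Each entry in data carries the embedding vector itself, accessible as embedding.data[0].embedding. As a usage sketch, the following embeds two strings in one request (the OpenAI API accepts a list of strings for input; assuming the Anyscale backend does too) and compares them with cosine similarity. The cosine helper is illustrative, not part of the SDK:

import math

# Embed two strings in a single request; client is the one constructed earlier.
response = client.embeddings.create(
    model="thenlper/gte-large",
    input=["How do I reset my password?", "password reset instructions"],
)
a = response.data[0].embedding
b = response.data[1].embedding

def cosine(u, v):
    # Cosine similarity: higher means the texts are semantically closer.
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

print(cosine(a, b))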

Rate limiting

Anyscale Endpoints rate limits work differently from comparable platforms: the limits are based on the number of concurrent requests in flight, not on tokens or requests per second. In other words, you aren't limited by how many requests you send in total, but by how many you have in flight at once.

The current default limit is 30 concurrent requests. Reach out to endpoints-help@anyscale.com if you have a use case that needs more.
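Because the limit counts in-flight requests rather than requests per second, a simple way to stay under it is to cap concurrency on the client side. A minimal sketch using a thread pool sized to the default limit of 30; the texts list and embed helper are illustrative, and client is the one constructed earlier:

from concurrent.futures import ThreadPoolExecutor

texts = [f"document {i}" for i in range(100)]  # illustrative inputs

def embed(text):
    return client.embeddings.create(
        model="thenlper/gte-large",
        input=text,
    ).data[0].embedding

# At most 30 requests are in flight at any time, matching the default limit.
with ThreadPoolExecutor(max_workers=30) as pool:
    vectors = list(pool.map(embed, texts))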