Output log probabilities
When using logprobs, the LLM outputs the log probabilities of each output token during generation. There are two relevant parameters for this mode:
- logprobs (default=False) - When set to True, the LLM outputs the log probabilities of each output token during generation.
- top_logprobs (default=None) - When set to an integer value, the LLM outputs the log probabilities of the top top_logprobs most likely tokens at each token position during generation. top_logprobs must be between 0 and 5.
warning
Anyscale doesn't support logprobs for meta-llama/Llama-2-70b-chat-hf and meta-llama/Llama-2-13b-chat-hf.
Example
cURL
curl "$ANYSCALE_BASE_URL/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $ANYSCALE_API_KEY" \
-d '{
"model": "mistralai/Mixtral-8x7B-Instruct-v0.1",
"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Say 'Test'."}],
"temperature": 0.7,
"logprobs": true,
"top_logprobs": 1
}'
Python
import os

import requests

# Read the endpoint and credentials from the environment.
api_base = os.getenv("ANYSCALE_BASE_URL")
token = os.getenv("ANYSCALE_API_KEY")
url = f"{api_base}/chat/completions"

s = requests.Session()

body = {
    "model": "mistralai/Mixtral-8x7B-Instruct-v0.1",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Say 'Test'."},
    ],
    "temperature": 0.7,
    # Return the log probability of each generated token, plus the
    # single most likely token at each position.
    "logprobs": True,
    "top_logprobs": 1,
}

with s.post(url, headers={"Authorization": f"Bearer {token}"}, json=body) as resp:
    print(resp.json())
OpenAI Python SDK
import openai

client = openai.OpenAI(
    base_url="https://api.endpoints.anyscale.com/v1",
    api_key="esecret_YOUR_API_KEY",
)

# Note: not all arguments are currently supported and will be ignored by the backend.
chat_completion = client.chat.completions.create(
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Say 'Test'."},
    ],
    temperature=0.7,
    logprobs=True,
    top_logprobs=1,
)
print(chat_completion.model_dump())
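With the SDK, the same fields are available as typed attributes on the response object. As a minimal sketch, assuming the chat_completion object from the example above and the v1 OpenAI Python SDK response types, you can iterate over the per-token entries like this:

# Minimal sketch: read per-token log probabilities from the SDK
# response object created in the example above.
for entry in chat_completion.choices[0].logprobs.content:
    # entry.token is the generated token; entry.logprob is its log probability.
    alternatives = [(alt.token, alt.logprob) for alt in entry.top_logprobs]
    print(entry.token, entry.logprob, alternatives)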
Example output:
{"id":"mistralai/Mixtral-8x7B-Instruct-v0.1",
"object":"text_completion",
"created":1705450393,
"model":"mistralai/Mixtral-8x7B-Instruct-v0.1",
"choices": [{"message":
{"role":"assistant",
"content":"Test.",
"tool_calls":null,
"tool_call_id":null},
"index":0,
"finish_reason":"stop",
"logprobs":{"content":
[
{"token":"Test",
"logprob":-0.12771208584308624,
"bytes":[84,101,115,116],
"top_logprobs": [
{"logprob":-0.12771208584308624,
"token":"Test",
"bytes":[84,101,115,116]
}
]
},
{"token":".",
"logprob":-0.0008685392094776034,
"bytes":[46],
"top_logprobs": [
{"logprob":-0.0008685392094776034,
"token":".",
"bytes":[46]
}
]
},
{"token":"",
"logprob":0.0,
"bytes":[],
"top_logprobs":[
{"logprob":0.0,
"token":"",
"bytes":[]
}
]
}
]
}
}],
"usage": {"prompt_tokens": 26,
"completion_tokens": 3,
"total_tokens":29
}
}
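Each logprob is the natural logarithm of the token's probability, so a logprob of 0.0 corresponds to probability 1.0. As a minimal sketch, assuming the parsed response above is stored in a variable named data (for example, data = resp.json() in the Python example), you can recover per-token probabilities and the probability of the full completion:

import math

# Minimal sketch: convert log probabilities back to probabilities.
# `data` is assumed to hold the parsed JSON response shown above.
tokens = data["choices"][0]["logprobs"]["content"]
for t in tokens:
    print(f"{t['token']!r}: p = {math.exp(t['logprob']):.4f}")

# By the chain rule, per-token log probabilities sum to the log
# probability of the whole completion, so exp(total) is its probability.
total = sum(t["logprob"] for t in tokens)
print(f"sequence probability: {math.exp(total):.4f}")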