OpenAI migration guide
Anyscale Private Endpoints is an interface compatible with OpenAI's Chat Completions API. It ensures seamless integration and functionality that matches the capabilities of the original API, all through Anyscale's secure and scalable infrastructure.
Anyscale Private Endpoints makes it straightforward to move from OpenAI's proprietary models to leading open source LLMs. The OpenAI Python library allows developers to integrate OpenAI-compatible models into Python applications, and this guide covers the minimal changes needed to switch over to Anyscale.
Step 0: Prerequisites
This guide assumes that you have already installed the OpenAI Python library.
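If you haven't installed it yet, you can do so with pip:
pip install openai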
Step 1: Change your API base and key
Refer to the "Get started" page for instructions on how to set your Anyscale API base URL and key.
Update your API key and API base URL to point to your Anyscale Private Endpoint rather than OpenAI's default settings. Explicitly set the environment variables and pass them to the client as shown below:
import os
from openai import OpenAI

# Read the Anyscale credentials from the environment.
client = OpenAI(
    api_key=os.environ["OPENAI_API_KEY"],
    base_url=os.environ["OPENAI_BASE_URL"],
)
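For example, you could set these variables from within Python before constructing the client. The values below are placeholders only; substitute the API key and base URL from your own Anyscale deployment:
import os

# Placeholders: use the key and URL from your Anyscale Private Endpoints deployment.
os.environ["OPENAI_API_KEY"] = "<your-anyscale-api-key>"
os.environ["OPENAI_BASE_URL"] = "<your-anyscale-endpoint-url>"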
Step 2: Change the model name
The primary entry point to GPT models is OpenAI's Chat Completions API, which takes a list of messages and returns a generated message. To use Anyscale Private Endpoints, change the model name to the name of your deployed open source model.
response = client.chat.completions.create(
    model="meta-llama/Llama-2-70b-chat-hf",  # Choose any supported model.
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who won the world series in 2020?"},
    ],
)
print(response.choices[0].message)
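If you want to confirm which models your endpoint serves, the OpenAI Python client's model-listing call generally works against OpenAI-compatible endpoints. Treat this as a sketch to verify against your deployment:
# List the models available behind the configured endpoint.
models = client.models.list()
for model in models.data:
    print(model.id)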
Step 3: Check parameter compatibility
The Anyscale Private Endpoints implementation of the Chat Completions API encompasses three categories of parameters:
- Supported parameters
  - messages (Required; array): A list of messages comprising the conversation so far.
  - model (Required; string): The ID of the model to use. Select from the list of supported models.
  - frequency_penalty (number): A number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood of repeating the same line verbatim.
  - max_tokens (integer): The maximum number of tokens to generate in the chat completion.
  - presence_penalty (number): A number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood of talking about new topics.
  - stop (string/array): Up to 4 sequences where the API stops generating further tokens.
  - stream (boolean): If true, the API sends the response token by token; if false, it sends the complete message in a single response. See the streaming sketch after this list.
  - temperature (number): A number between 0 and 2. Higher values produce more random responses; lower values produce more deterministic ones.
  - top_p (number): The cumulative probability mass of tokens to consider. For example, 0.1 means only tokens comprising the top 10% probability mass become candidates.
- Unsupported parameters
  The Chat Completions API may offer additional parameters beyond the preceding ones, but Anyscale Private Endpoints doesn't support them. Based on demand and compatibility, future updates may include them.
- Extended parameters
  In addition to the native API parameters, Anyscale Private Endpoints introduces a set of extended parameters designed to enhance the base capability.
  - top_k (integer): The number of highest-probability vocabulary tokens to keep for top-k filtering.
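As an illustration of the stream parameter, the following sketch prints a response token by token as chunks arrive. The model name is an example, and the chunk-handling pattern follows the OpenAI Python library's streaming interface:
response = client.chat.completions.create(
    model="meta-llama/Llama-2-70b-chat-hf",  # Example model name.
    messages=[{"role": "user", "content": "Write a haiku about the ocean."}],
    stream=True,
)
for chunk in response:
    delta = chunk.choices[0].delta
    if delta.content:  # Some chunks carry only role metadata, no text.
        print(delta.content, end="", flush=True)
For extended parameters such as top_k, one option is the OpenAI Python client's extra_body argument, which passes provider-specific fields through to the endpoint, for example extra_body={"top_k": 5}. Treat this as a sketch to verify against your deployment.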
Optional: Using Anyscale Private Endpoints and OpenAI together
It's common for an LLM app to use a combination of models to serve requests to balance cost and quality. For example, you may use a collection of smaller open source fine-tuned models to address specialized queries and a foundation model like GPT-4 to field everything else. A minimal routing sketch follows the code examples below.
To use Anyscale Private Endpoints in an app alongside OpenAI endpoints, pass your Anyscale api_key and base_url into the OpenAI client explicitly. Before proceeding, make sure that your OPENAI_BASE_URL and OPENAI_API_KEY environment variables point to OpenAI's API rather than Anyscale's, so that the default client doesn't call the Anyscale API.
- Anyscale Private Endpoints
- OpenAI API Endpoints
import os
from openai import OpenAI

# Pass the Anyscale credentials explicitly so this client targets
# your private endpoint instead of the OpenAI defaults.
client = OpenAI(
    api_key=os.environ["ANYSCALE_API_KEY"],
    base_url=os.environ["ANYSCALE_API_BASE"],
)
response = client.chat.completions.create(
    model="meta-llama/Llama-2-70b-chat-hf",  # Fill in your deployed model.
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who won the world series in 2020?"},
    ],
)
from openai import OpenAI

# With no arguments, the client falls back to the OPENAI_API_KEY and
# OPENAI_BASE_URL environment variables and calls the OpenAI API.
client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who won the world series in 2020?"},
    ],
)
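To tie the two together, here is a minimal sketch of the routing pattern described above. The route_query function and its keyword-based routing rule are hypothetical illustrations, not part of either API; substitute your own routing logic and model names:
import os
from openai import OpenAI

# Client for your Anyscale Private Endpoint (explicit credentials).
anyscale_client = OpenAI(
    api_key=os.environ["ANYSCALE_API_KEY"],
    base_url=os.environ["ANYSCALE_API_BASE"],
)
# Default client for the OpenAI API (reads the OPENAI_* environment variables).
openai_client = OpenAI()

def route_query(question: str) -> str:
    """Hypothetical router: send specialized queries to the fine-tuned
    open source model and everything else to GPT-4."""
    if "medical" in question.lower():  # Example routing rule only.
        client, model = anyscale_client, "meta-llama/Llama-2-70b-chat-hf"
    else:
        client, model = openai_client, "gpt-4"
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content

print(route_query("Who won the world series in 2020?"))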