Migrate from OpenAI
Introduction
Effective August 1, 2024, the Anyscale Endpoints API will be available exclusively through the fully Hosted Anyscale Platform. Multi-tenant access to LLM models will be removed.
With the Hosted Anyscale Platform, you can access the latest GPUs, billed by the second, and deploy models on your own dedicated instances. Enjoy full customization to build your end-to-end applications with Anyscale. Get started today.
To make migration from OpenAI to Anyscale as easy as possible, the two workflows share a number of similarities. However, a small number of things must change to complete the migration.
These notes cover migrating Python code that uses the official OpenAI Python library. Although untested in other languages, these changes should also work in any language that respects the standard environment variables or has a way to set these parameters directly in code.
The four steps, in order, are:
- Setting the `OPENAI_BASE_URL` environment variable
- Setting the `OPENAI_API_KEY` environment variable
- Changing the model name in the code
- Adjusting any parameters to the API calls
With these four steps, migration should take you just a few minutes.
Setting the OPENAI_BASE_URL environment variable
The OpenAI Python library supports an environment variable that specifies the base URL for API calls. Set it to point at Anyscale Endpoints. For example, in bash:

```bash
export OPENAI_BASE_URL='https://api.endpoints.anyscale.com/v1'
```
How you set environment variables varies based on the deployment environment.
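If environment variables are awkward to set in your deployment environment, the OpenAI Python library also lets you pass the base URL and key directly to the client constructor. A minimal sketch, assuming the v1 OpenAI Python library and a key you have already generated:

```python
import openai

# Passing base_url and api_key directly overrides the
# OPENAI_BASE_URL and OPENAI_API_KEY environment variables.
client = openai.OpenAI(
    base_url="https://api.endpoints.anyscale.com/v1",
    api_key="esecret_...",  # your Anyscale Endpoints key
)
```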
Setting the OPENAI_API_KEY environment variable
You also need a key generated from Anyscale Endpoints. Once you log in, you can create a key at https://app.endpoints.anyscale.com/credentials.
Once you have the key, add it to your environment. For example, in bash:

```bash
export OPENAI_API_KEY=esecret_...
```
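With both variables set, you can make a quick test call to confirm the client picks them up. A minimal sketch, assuming Anyscale Endpoints serves the standard `/v1/models` route:

```python
import openai

# With OPENAI_BASE_URL and OPENAI_API_KEY set in the environment,
# the client needs no constructor arguments.
client = openai.OpenAI()

# List the models available through your endpoint as a sanity check.
for model in client.models.list():
    print(model.id)
```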
Changing the model you are using
Use the ChatCompletion API. In the code that calls it, specify a different model name. For example, in the code below, replace `gpt-3.5-turbo` with `meta-llama/Llama-2-70b-chat-hf`.
```python
import openai

client = openai.OpenAI()
client.chat.completions.create(
    model="gpt-3.5-turbo",  # Note: this is optional and may not be declared
    messages=message_history,
    stream=True,
)

# Change that to:
client = openai.OpenAI()
client.chat.completions.create(
    # Here we use the 70b model (recommended), but you can also use 7b and 13b
    model="meta-llama/Llama-2-70b-chat-hf",
    messages=message_history,
    stream=True,
)
```
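Because `stream=True` is set, the call returns an iterator of chunks rather than a single response. Here is a minimal sketch of consuming it, assuming `message_history` is a list of chat messages you have built up (the example message is purely illustrative):

```python
import openai

client = openai.OpenAI()

# Hypothetical message history for illustration.
message_history = [{"role": "user", "content": "Say hello in one sentence."}]

stream = client.chat.completions.create(
    model="meta-llama/Llama-2-70b-chat-hf",
    messages=message_history,
    stream=True,
)

# Each chunk carries an incremental delta of the assistant's reply.
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta is not None:
        print(delta, end="", flush=True)
print()
```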
Check any parameters to the create call that you might need to change
In the preceding code, you modified the create call. For that call, Anyscale supports most, but not all, of the optional parameters listed in the API Reference. The most commonly used parameters, such as `stream`, `top_p`, and `temperature`, are supported.
Here is a list of the parameters to the create call and their support status.
| Parameter | Endpoints support status |
|---|---|
| model | Supported |
| messages | Supported |
| temperature | Supported |
| top_p | Supported |
| stream | Supported |
| max_tokens | Supported |
| stop | Supported |
| frequency_penalty | Supported |
| presence_penalty | Supported |
| n | Supported |
| logprobs | Supported* |
| top_logprobs | Supported |
| response_format | Supported |
| tools | Supported |
| tool_choice | Supported |
| logit_bias | Supported |
| functions | Deprecated by OpenAI |
| function_call | Deprecated by OpenAI |
| user | Not supported* |
*: Not supported for `meta-llama/Llama-2-70b-chat-hf` and `meta-llama/Llama-2-13b-chat-hf`.
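As an illustration of the supported parameters, here is a minimal sketch of a non-streaming call combining several of them; the model choice, prompt, and values are placeholders:

```python
import openai

client = openai.OpenAI()

response = client.chat.completions.create(
    model="meta-llama/Llama-2-70b-chat-hf",
    messages=[{"role": "user", "content": "Name three Ray libraries."}],
    temperature=0.7,  # sampling temperature
    top_p=0.95,       # nucleus sampling cutoff
    max_tokens=256,   # cap on the number of generated tokens
    stop=["\n\n"],    # stop generation at a blank line
)
print(response.choices[0].message.content)
```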
Anyscale also supports the following additional parameters:
| Parameter | Description |
|---|---|
| schema | Define the JSON schema for the JSON mode. |
| top_k | The number of highest-probability vocabulary tokens to keep for top-k filtering. |
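Because these parameters aren't part of the OpenAI client's typed signature, one way to send them is the client's `extra_body` escape hatch. A minimal sketch, assuming Anyscale's JSON mode accepts a `schema` alongside `response_format` and that the schema below is purely illustrative:

```python
import openai

client = openai.OpenAI()

# Purely illustrative schema; adjust it to your own output shape.
answer_schema = {
    "type": "object",
    "properties": {"answer": {"type": "string"}},
    "required": ["answer"],
}

response = client.chat.completions.create(
    model="meta-llama/Llama-2-70b-chat-hf",
    messages=[{"role": "user", "content": "What is Ray? Reply in JSON."}],
    response_format={"type": "json_object"},
    # extra_body forwards fields the OpenAI client doesn't model natively.
    extra_body={"schema": answer_schema, "top_k": 40},
)
print(response.choices[0].message.content)
```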