# Migrate from OpenAI to Open Models

## Introduction
RayLLM provides an OpenAI-compatible REST API that you can use to query open-weight models served on Anyscale. This guide covers the features that you can migrate from OpenAI to RayLLM-deployed models with minimal changes, as well as features that RayLLM supports but OpenAI does not.
For basic use cases, migrating your application should be as simple as following the steps below:

- Setting the `OPENAI_BASE_URL` environment variable
- Setting the `OPENAI_API_KEY` environment variable
- Changing the model name in the code
- Adjusting any parameters to the API calls

With these four steps, migration should take only a few minutes.
## Setting the `OPENAI_BASE_URL` environment variable

The OpenAI Python library reads the base URL for API calls from an environment variable. Set it to point at your Anyscale service. For example, in bash:

```bash
export OPENAI_BASE_URL='<Anyscale_Service_URL>'
```
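If you prefer to configure the client in code rather than through the environment, the OpenAI Python client also accepts the base URL directly. A minimal sketch, assuming `OPENAI_API_KEY` is set as described in the next step (`<Anyscale_Service_URL>` remains a placeholder):

```python
import openai

# Equivalent to setting OPENAI_BASE_URL in the environment.
# "<Anyscale_Service_URL>" is a placeholder for your service's URL;
# the API key is read from OPENAI_API_KEY (see the next step).
client = openai.OpenAI(base_url="<Anyscale_Service_URL>")
```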
## Setting the `OPENAI_API_KEY` environment variable

If authentication is enabled when you deploy the service, use the token generated at deployment time; otherwise, any non-empty token works. For example, in bash:

```bash
export OPENAI_API_KEY=some_secret_key
```
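With both variables set, a quick sanity check is to list the models the service exposes. This sketch assumes the service implements the standard OpenAI-compatible models listing route:

```python
import openai

# List the models the service exposes. The client picks up
# OPENAI_BASE_URL and OPENAI_API_KEY from the environment.
client = openai.OpenAI()
for model in client.models.list():
    print(model.id)
```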
## Changing the model you are using

Use the Chat Completions API, and specify a different model name in the code that calls it. For example, in the snippet below, you replace `gpt-3.5-turbo` with `meta-llama/Llama-2-70b-chat-hf`:

```python
import openai

client = openai.OpenAI()
client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=message_history,
    stream=True,
)

# Now change that to:
client = openai.OpenAI()
client.chat.completions.create(
    model="meta-llama/Llama-2-70b-chat-hf",
    messages=message_history,
    stream=True,
)
```
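For completeness, here is a minimal end-to-end sketch that also consumes the streamed response; the message history used here is illustrative:

```python
import openai

# Stream a chat completion from the migrated model and print tokens
# as they arrive. The prompt below is just an example.
client = openai.OpenAI()
message_history = [{"role": "user", "content": "What is Ray Serve?"}]
stream = client.chat.completions.create(
    model="meta-llama/Llama-2-70b-chat-hf",
    messages=message_history,
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```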
## Features supported by OpenAI and RayLLM

Most of the optional parameters listed in OpenAI's API Reference are supported by RayLLM services. The table below lists their support status.
| Parameter | Endpoints support status |
|---|---|
| `model` | Supported |
| `messages` | Supported |
| `temperature` | Supported |
| `top_p` | Supported |
| `stream` | Supported |
| `max_tokens` | Supported |
| `stop` | Supported |
| `frequency_penalty` | Supported |
| `presence_penalty` | Supported |
| `n` | Supported |
| `logprobs` | Supported |
| `top_logprobs` | Supported |
| `response_format` | Supported |
| `tools` | Supported |
| `tool_choice` | Supported |
| `logit_bias` | Supported |
| `functions` | Deprecated by OpenAI |
| `function_call` | Deprecated by OpenAI |
| `user` | Not supported* |
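As a quick illustration, the sketch below exercises several of the supported parameters from the table; the parameter values and the prompt are arbitrary examples, not recommendations:

```python
import openai

# Exercise several supported parameters from the table above.
# The values here are arbitrary; tune them for your workload.
client = openai.OpenAI()
response = client.chat.completions.create(
    model="meta-llama/Llama-2-70b-chat-hf",
    messages=[{"role": "user", "content": "Summarize Ray Serve in one sentence."}],
    temperature=0.7,
    top_p=0.9,
    max_tokens=128,
    stop=["\n\n"],
)
print(response.choices[0].message.content)
```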
In addition, Anyscale supports some parameters that OpenAI does not:
| Parameter | Description |
|---|---|
| `schema` | Define the expected JSON schema for the JSON mode. |
| `top_k` | The number of highest probability vocabulary tokens to keep for top-k-filtering. |
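Here is a hedged sketch of how you might pass these extra parameters with the OpenAI Python client: because `schema` and `top_k` are not part of the client's method signature, they go through `extra_body`, which the client forwards verbatim. The exact field placement may vary with your RayLLM version, and the schema below is purely illustrative:

```python
import openai

# `schema` and `top_k` are Anyscale-specific, so they are passed via
# `extra_body`. Field placement may differ across RayLLM versions;
# the JSON schema here is an illustrative example.
client = openai.OpenAI()
response = client.chat.completions.create(
    model="meta-llama/Llama-2-70b-chat-hf",
    messages=[{"role": "user", "content": "Name a city and its country as JSON."}],
    response_format={"type": "json_object"},
    extra_body={
        "schema": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "country": {"type": "string"},
            },
            "required": ["city", "country"],
        },
        "top_k": 40,
    },
)
print(response.choices[0].message.content)
```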