
Migrate from OpenAI to Open Models

Introduction

RayLLM provides an OpenAI-compatible REST API for querying open-weight models served on Anyscale. This guide covers the features that can be migrated from OpenAI to RayLLM-deployed models with minimal changes, as well as additional features that RayLLM supports but OpenAI does not.

For basic use cases, migrating your application should be as simple as following these steps:

  1. Setting the OPENAI_BASE_URL environment variable
  2. Setting the OPENAI_API_KEY environment variable
  3. Changing the model name in the code
  4. Adjusting any parameters in your API calls

With these four steps, migrating should take only a few minutes.

Setting the OPENAI_BASE_URL environment variable

The OpenAI Python library supports an environment variable that specifies the base URL for API calls. Set it to point at your Anyscale service URL. For example, in bash:

export OPENAI_BASE_URL='<Anyscale_Service_URL>'
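
If you prefer to configure the client in code rather than through the environment, the openai Python library (v1.x) also accepts the base URL as a constructor argument; a minimal sketch, with <Anyscale_Service_URL> as a placeholder for your deployed service's URL:

import openai

# Equivalent to exporting OPENAI_BASE_URL; assumes OPENAI_API_KEY is
# set in the environment (see the next section).
client = openai.OpenAI(base_url='<Anyscale_Service_URL>')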

Setting the OPENAI_API_KEY environment variable

If authentication was enabled when deploying the service, use the token generated there; otherwise, any non-empty token works. For example, in bash:

export OPENAI_API_KEY=some_secret_key
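
The API key can likewise be passed directly to the client instead of through the environment; a minimal sketch, where 'some_secret_key' stands in for the token generated for your service:

import openai

# Both values can be supplied together in code instead of via
# environment variables.
client = openai.OpenAI(
  base_url='<Anyscale_Service_URL>',
  api_key='some_secret_key'
)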

Changing the model you are using

Use the chat completions API and specify a different model name in the calling code. For example, in the snippet below, replace gpt-3.5-turbo with meta-llama/Llama-2-70b-chat-hf:

import openai

client = openai.OpenAI()
client.chat.completions.create(
  model='gpt-3.5-turbo',
  messages=message_history,
  stream=True
)

# Now change that to:

client = openai.OpenAI()
client.chat.completions.create(
  model='meta-llama/Llama-2-70b-chat-hf',
  messages=message_history,
  stream=True
)
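
Note that with stream=True the call returns an iterator of chunks rather than a single response object, so the surrounding code needs a small loop to consume it; a minimal sketch, assuming the openai v1.x client and an illustrative message_history:

import openai

client = openai.OpenAI()
message_history = [{'role': 'user', 'content': 'What is Ray Serve?'}]

stream = client.chat.completions.create(
  model='meta-llama/Llama-2-70b-chat-hf',
  messages=message_history,
  stream=True
)

# Each chunk carries an incremental delta of the assistant's reply;
# the delta content can be None on the final chunk.
for chunk in stream:
  delta = chunk.choices[0].delta.content
  if delta is not None:
    print(delta, end='', flush=True)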

Features supported by OpenAI and RayLLM

Most of the optional parameters listed in OpenAI's API Reference are supported by RayLLM services. The following table lists the supported parameters.

Parameter           Endpoints support status
model               Supported
messages            Supported
temperature         Supported
top_p               Supported
stream              Supported
max_tokens          Supported
stop                Supported
frequency_penalty   Supported
presence_penalty    Supported
n                   Supported
logprobs            Supported
top_logprobs        Supported
response_format     Supported
tools               Supported
tool_choice         Supported
logit_bias          Supported
functions           Deprecated by OpenAI
function_call       Deprecated by OpenAI
user                Not supported*
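
These parameters are passed exactly as they would be against OpenAI; a minimal sketch combining a few of them (the sampling values shown are illustrative, not recommendations):

import openai

client = openai.OpenAI()
response = client.chat.completions.create(
  model='meta-llama/Llama-2-70b-chat-hf',
  messages=[{'role': 'user', 'content': 'Summarize Ray in one sentence.'}],
  temperature=0.7,
  top_p=0.9,
  max_tokens=128,
  stop=['\n\n']
)
print(response.choices[0].message.content)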

In addition, Anyscale supports some parameters that OpenAI does not:

Parameter   Description
schema      Define the expected JSON schema for JSON mode.
top_k       The number of highest-probability vocabulary tokens to keep for top-k filtering.
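
The OpenAI client has no named keyword arguments for these extensions, but the v1.x Python library can forward arbitrary request fields through its extra_body argument. A sketch under the assumption that schema and top_k are accepted as top-level request parameters, as the table above suggests; the JSON schema itself is illustrative:

import openai

client = openai.OpenAI()
response = client.chat.completions.create(
  model='meta-llama/Llama-2-70b-chat-hf',
  messages=[{'role': 'user', 'content': 'Name a color, as JSON.'}],
  response_format={'type': 'json_object'},
  # schema and top_k are RayLLM extensions, so they are forwarded
  # via extra_body rather than named keyword arguments.
  extra_body={
    'schema': {
      'type': 'object',
      'properties': {'color': {'type': 'string'}}
    },
    'top_k': 40
  }
)
print(response.choices[0].message.content)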