
Migrate from OpenAI to Open Models

Introduction

RayLLM provides an OpenAI-compatible REST API for querying open-weight models served on Anyscale. This guide covers the features that can be migrated from OpenAI to RayLLM-deployed models with minimal changes, as well as additional features that RayLLM supports but OpenAI does not.

For basic use cases, migrating your application should be as simple as following these steps:

  1. Setting the OPENAI_BASE_URL environment variable
  2. Setting the OPENAI_API_KEY environment variable
  3. Changing the model name in the code
  4. Adjusting any parameters in your API calls

With these four steps, migrating should take only a few minutes.

Setting the OPENAI_BASE_URL environment variable

The OpenAI Python library supports an environment variable that specifies the base URL for API calls. Set it to point at your Anyscale service URL. For example, in bash:

export OPENAI_BASE_URL='<Anyscale_Service_URL>'
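
If you prefer to configure the client in code rather than through the environment, the openai Python library (v1.x) also accepts the base URL as a constructor argument; a minimal sketch, with <Anyscale_Service_URL> as a placeholder for your deployed service's URL:

import openai

# Equivalent to exporting OPENAI_BASE_URL; assumes OPENAI_API_KEY is
# set in the environment (see the next section).
client = openai.OpenAI(base_url='<Anyscale_Service_URL>')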

Setting the OPENAI_API_KEY environment variable

If authentication was enabled when deploying the service, use the token generated there; otherwise, any non-empty token works. For example, in bash:

export OPENAI_API_KEY=some_secret_key
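
The API key can likewise be passed directly to the client instead of through the environment; a minimal sketch, where 'some_secret_key' stands in for the token generated for your service:

import openai

# Both values can be supplied together in code instead of via
# environment variables.
client = openai.OpenAI(
  base_url='<Anyscale_Service_URL>',
  api_key='some_secret_key'
)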

Changing the model you are using

Use the chat completions API and specify a different model name in the calling code. For example, in the snippet below, replace gpt-3.5-turbo with meta-llama/Llama-2-70b-chat-hf:

import openai

client = openai.OpenAI()
client.chat.completions.create(
  model='gpt-3.5-turbo',
  messages=message_history,
  stream=True
)

# Now change that to:

client = openai.OpenAI()
client.chat.completions.create(
  model='meta-llama/Llama-2-70b-chat-hf',
  messages=message_history,
  stream=True
)
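
Note that with stream=True the call returns an iterator of chunks rather than a single response object, so the surrounding code needs a small loop to consume it; a minimal sketch, assuming the openai v1.x client and an illustrative message_history:

import openai

client = openai.OpenAI()
message_history = [{'role': 'user', 'content': 'What is Ray Serve?'}]

stream = client.chat.completions.create(
  model='meta-llama/Llama-2-70b-chat-hf',
  messages=message_history,
  stream=True
)

# Each chunk carries an incremental delta of the assistant's reply;
# the delta content can be None on the final chunk.
for chunk in stream:
  delta = chunk.choices[0].delta.content
  if delta is not None:
    print(delta, end='', flush=True)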

Features supported by OpenAI and RayLLM

Most of the optional parameters listed in OpenAI's API Reference are supported by RayLLM services. The following table lists the supported parameters.

Parameter           Endpoints support status
model               Supported
messages            Supported
temperature         Supported
top_p               Supported
stream              Supported
max_tokens          Supported
stop                Supported
frequency_penalty   Supported
presence_penalty    Supported
n                   Supported
logprobs            Supported
top_logprobs        Supported
response_format     Supported
tools               Supported
tool_choice         Supported
logit_bias          Supported
functions           Deprecated by OpenAI
function_call       Deprecated by OpenAI
user                Not supported*
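
These parameters are passed exactly as they would be against OpenAI; a minimal sketch combining a few of them (the sampling values shown are illustrative, not recommendations):

import openai

client = openai.OpenAI()
response = client.chat.completions.create(
  model='meta-llama/Llama-2-70b-chat-hf',
  messages=[{'role': 'user', 'content': 'Summarize Ray in one sentence.'}],
  temperature=0.7,
  top_p=0.9,
  max_tokens=128,
  stop=['\n\n']
)
print(response.choices[0].message.content)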

In addition, Anyscale supports some parameters that OpenAI does not:

Parameter   Description
schema      Define the expected JSON schema for JSON mode.
top_k       The number of highest-probability vocabulary tokens to keep for top-k filtering.
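
The OpenAI client has no named keyword arguments for these extensions, but the v1.x Python library can forward arbitrary request fields through its extra_body argument. A sketch under the assumption that schema and top_k are accepted as top-level request parameters, as the table above suggests; the JSON schema itself is illustrative:

import openai

client = openai.OpenAI()
response = client.chat.completions.create(
  model='meta-llama/Llama-2-70b-chat-hf',
  messages=[{'role': 'user', 'content': 'Name a color, as JSON.'}],
  response_format={'type': 'json_object'},
  # schema and top_k are RayLLM extensions, so they are forwarded
  # via extra_body rather than named keyword arguments.
  extra_body={
    'schema': {
      'type': 'object',
      'properties': {'color': {'type': 'string'}}
    },
    'top_k': 40
  }
)
print(response.choices[0].message.content)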