OpenAI migration guide

Anyscale Private Endpoints is an interface compatible with OpenAI's Chat Completions API. It ensures seamless integration and functionality that matches the capabilities of the original API, all through Anyscale's secure and scalable infrastructure.

Transitioning from OpenAI's proprietary models to the best open source LLMs with Anyscale Private Endpoints is a straightforward process. The OpenAI Python library allows developers to integrate OpenAI models into Python applications. This guide covers the minimal changes needed to switch over to using Anyscale.

Step 0: Prerequisites

This guide assumes that you have already installed the OpenAI Python library.

Step 1: Change your API base and key


Refer to the "Get started" page for instructions on how to set your Anyscale API base URL and key.

Update your API key and API base URL to direct to your Anyscale Private Endpoints rather than OpenAI's default settings. Explicitly set the environment variables as shown below:

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ['OPENAI_API_KEY'],
    base_url=os.environ['OPENAI_BASE_URL'],
)

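For example, you might export the variables in your shell before launching the application. The values below are placeholders; substitute the API key and base URL from your Anyscale console:

```shell
export OPENAI_API_KEY="<your-anyscale-api-key>"
export OPENAI_BASE_URL="<your-anyscale-base-url>"
```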
Step 2: Change the model name

The primary entrypoint to GPT models is OpenAI's Chat Completion API, which takes a list of messages and returns a generated message. To use Anyscale Private Endpoints, change the name of the model to the name of your deployed open source model.

response = client.chat.completions.create(
    model="meta-llama/Llama-2-70b-chat-hf",  # Choose any supported model.
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who won the world series in 2020?"},
    ],
)
print(response.choices[0].message.content)


Step 3: Check parameter compatibility

The Anyscale Private Endpoints implementation of the Chat Completions API encompasses three categories of parameters:

  1. Supported parameters
  • messages - (Required; array) A list of messages comprising the conversation so far.
  • model - (Required; string) The ID of the model to use. Select from the list of supported models.
  • frequency_penalty - (number) A number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
  • max_tokens - (integer) The maximum number of tokens to generate in the chat completion.
  • presence_penalty - (number) A number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
  • stop - (string/array) Up to 4 sequences where the API stops generating further tokens.
  • stream - (boolean) If true, partial message deltas are sent as tokens become available. If false, the full response is sent in one message.
  • temperature - (number) A number between 0 and 2. Higher values produce more random responses; lower values produce more deterministic responses.
  • top_p - (number) An alternative to temperature sampling: the model considers only the tokens comprising the top_p probability mass. For example, 0.1 means only tokens comprising the top 10% probability mass become candidates.
  2. Unsupported parameters

    The Chat Completions API may offer additional parameters beyond the preceding ones, but Anyscale Private Endpoints doesn't support them. Based on demand and compatibility, future updates may include them.

  3. Extended parameters

    In addition to the native API parameters, Anyscale Private Endpoints introduces a set of extended parameters designed to enhance the base capability.

  • top_k - (integer) The number of highest probability vocabulary tokens to keep for top-k-filtering.
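The OpenAI Python client doesn't expose top_k as a named argument, but recent versions of the library let you forward arbitrary fields in the request payload through extra_body. As a sketch (the build_request helper is illustrative, not part of either API), you might assemble the call like this:

```python
def build_request(model, messages, top_k=None):
    """Assemble keyword arguments for client.chat.completions.create().

    Standard parameters are passed as-is; the extended `top_k` parameter
    is forwarded through `extra_body`, which the OpenAI client merges
    into the JSON request body.
    """
    kwargs = {"model": model, "messages": messages}
    if top_k is not None:
        kwargs["extra_body"] = {"top_k": top_k}
    return kwargs

kwargs = build_request(
    "meta-llama/Llama-2-70b-chat-hf",
    [{"role": "user", "content": "Hello"}],
    top_k=40,
)
# response = client.chat.completions.create(**kwargs)
```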

Optional: Using Anyscale Private Endpoints and OpenAI together

It's common for an LLM app to use a combination of models to serve requests to balance cost and quality. For example, you may use a collection of smaller open source fine-tuned models to address specialized queries and a foundational model like GPT-4 to field everything else.

To use Anyscale Private Endpoints in an app with OpenAI endpoints, pass your Anyscale api_key and base_url into the OpenAI client.


Before proceeding, make sure that you set your OPENAI_BASE_URL and OPENAI_API_KEY environment variables back to your OpenAI values to avoid calling the Anyscale API by default.

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ['ANYSCALE_API_KEY'],
    base_url=os.environ['ANYSCALE_API_BASE'],
)

response = client.chat.completions.create(
    model="meta-llama/Llama-2-70b-chat-hf",  # Fill in your deployed model.
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who won the world series in 2020?"},
    ],
)
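The cost/quality routing described above can be sketched as a small dispatch function. The keyword check below is a placeholder for a real routing policy or classifier, and the model names are illustrative:

```python
def pick_backend(query: str):
    """Return a (provider, model) pair for a query.

    The substring check stands in for a real classifier that decides
    whether a query belongs to a specialized domain.
    """
    if "sql" in query.lower():  # hypothetical specialized domain
        return "anyscale", "meta-llama/Llama-2-70b-chat-hf"
    return "openai", "gpt-4"

provider, model = pick_backend("Translate this question into SQL")
# Dispatch to the client configured for that provider, e.g.:
# client = anyscale_client if provider == "anyscale" else openai_client
# response = client.chat.completions.create(
#     model=model,
#     messages=[{"role": "user", "content": query}],
# )
```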