

The Anyscale Private Endpoints interface ensures compatibility with OpenAI's Chat Completions API, enabling a smooth transition of your existing applications to open source models deployed privately in your cloud.

To interact with a model, the messages parameter takes in an array of message objects, each tagged as one of the following three roles:

  • System Message (Optional): This parameter defines the AI's tone, behavior, or personality. It complements, but doesn't replace, the user message that describes the task.
  • User Message: What most people refer to as the "prompt," the user message provides a request for the model to respond to. The LLM requires at least one user message to respond.
  • Assistant Message: These messages store previous responses but can also provide examples of intended responses. Each user message is paired with an assistant message, except for the most recent one, which is the query for the model to answer.

Below is an example of all three roles interacting in a single query:

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ['OPENAI_API_KEY'],
    base_url=os.environ['OPENAI_BASE_URL'],
)

response = client.chat.completions.create(
    model="meta-llama/Llama-2-70b-chat-hf",
    messages=[
        {"role": "system", "content": "Explain things briefly."},  # Optional
        # 0 or more user-assistant pairs
        # {"role": "user", "content": "How do you make pasta?"},
        # {"role": "assistant", "content": "Mix flour and water, knead, shape, and boil."},
        {"role": "user", "content": "How to pick a good tomato?"}
    ]
)

Understanding and using these prompts effectively is key to eliciting the desired behavior from the AI model. The following is a step-by-step guide to setting up simple querying with the Anyscale Private Endpoints API.


Before proceeding, make sure you have set your Anyscale API base URL and key and installed the OpenAI Python library.

Step 0: Setup

Set the client to authenticate with your Anyscale API key, point it to the URL of your Anyscale Private Endpoint, and fill in the name of the deployed model to query.

pip install openai==1.3.2

import os
from openai import OpenAI

# Use your Anyscale API key, and ensure it's set securely.
client = OpenAI(
    api_key=os.environ['OPENAI_API_KEY'],
    base_url=os.environ['OPENAI_BASE_URL'],
)

# Fill in the name of your deployed model.
model = "meta-llama/Llama-2-70b-chat-hf"

Step 1: Define system and user messages

Next, define the system and user prompts. The system prompt establishes the AI's role, while the user prompt contains the question or request.

# System message sets the assistant's context
system_msg = 'You are a helpful assistant.'

# User message is what you want to ask the model
user_msg = 'Where should I go on a third date in San Francisco?'

Step 2: Send a query

Create a helper function for chatting

def quick_chat(system, user, temp=1.0):
    response = client.chat.completions.create(
        model=model,
        temperature=temp,
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": user}
        ]
    )
    # Return just the text of the model's reply.
    return response.choices[0].message.content

Send a prompt

# Send a query and print the response.
print(quick_chat(system_msg, user_msg))

Adjust the temperature parameter for creativity control

The temperature setting influences the creativity of the AI's responses. It ranges from 0 to 2: higher values lead to more random responses, while lower values yield more deterministic replies.

# For a more conservative response
print(quick_chat(system_msg, user_msg, temp=0.1))

# For a more creative response
print(quick_chat(system_msg, user_msg, temp=1.1))