Get started
Anyscale Private Endpoints offers a streamlined interface for developers to leverage state-of-the-art open source large language models (LLMs) to power AI applications. Deploying in a private cloud environment allows teams to meet their specific privacy, control, and customization requirements.
LLM applications that you build with Private Endpoints are backed by Ray and Anyscale and inherit robust, production-ready features like zero-downtime upgrades, high availability, and enhanced observability. When you're ready for enterprise-level solutions and support, the transition to the expansive capabilities of the Anyscale Platform for machine learning workloads is seamless.
Set up your account
- Sign up for Anyscale Private Endpoints to receive an invite code.
- Create an account or sign in through the Anyscale Console.
Cloud prerequisites
To use Anyscale Private Endpoints, you must satisfy the following requirements:
- Deploy your Anyscale Cloud.
- Ensure that your cloud has sufficient quota to deploy your LLMs.
Availability of instance types can vary by region and availability zone, so before making adjustments to your cloud quota, confirm with your cloud service provider that your selected region and zone can accommodate your instance needs.
How to update cloud quotas
AWS (EC2)
- Navigate to the AWS Management Console and sign in.
- Open the Services dropdown menu and, under the Management & Governance section, open Service Quotas.
- Request a quota increase for the following instance types. Remember that quotas are region-specific, so update the quota in the relevant region.
Spot instance quotas
- All G and VT Spot Instance Requests: Default is `0`. Set to at least `512`, which supports 8 `g5.12xlarge` and 8 `g5.4xlarge` spot instances.
- All Standard (A, C, D, H, I, M, R, T, Z) Spot Instance Requests: Default is `5`. Set to at least `512`, which supports 16 `m5.8xlarge` spot instances.
- All P4, P3, and P2 Spot Instance Requests: Default is `64`. Set to at least `224`, which supports 4 `p3.8xlarge` instances and 1 `p4de.24xlarge` instance.
Standard instance quotas
- Running On-Demand G and VT instances: Default is `0`. Set to at least `512`, which supports 8 `g5.12xlarge` instances and 8 `g5.4xlarge` instances.
- Running On-Demand Standard (A, C, D, H, I, M, R, T, Z) instances: Default is `5`. Set to at least `544`, which supports 17 `m5.8xlarge` instances.
- Running On-Demand P instances: Default is `64`. Set to at least `224`, which supports 4 `p3.8xlarge` instances and 1 `p4de.24xlarge` instance.
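EC2 quotas count vCPUs rather than instances, so each target above is the planned instance count multiplied by that type's vCPU count. The arithmetic can be sketched as follows; the per-type vCPU figures are assumptions taken from AWS's published instance specifications, not from this guide:

```python
# vCPUs per instance type (assumed values from AWS's instance specifications).
VCPUS = {
    "g5.12xlarge": 48,
    "g5.4xlarge": 16,
    "m5.8xlarge": 32,
    "p3.8xlarge": 32,
    "p4de.24xlarge": 96,
}

def required_vcpu_quota(plan):
    """Return the vCPU quota needed for a planned fleet of {instance_type: count}."""
    return sum(VCPUS[itype] * count for itype, count in plan.items())

# 8 g5.12xlarge + 8 g5.4xlarge -> the 512 recommended above.
print(required_vcpu_quota({"g5.12xlarge": 8, "g5.4xlarge": 8}))  # 512
```

The same check reproduces the other recommendations: 17 `m5.8xlarge` needs 544, and 4 `p3.8xlarge` plus 1 `p4de.24xlarge` needs 224.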
GCP (GCE)
- Go to the Google Cloud Console and sign in.
- Access the IAM & Admin section and select "Quotas."
- Filter by the service Compute Engine API from the dropdown. Searching for the following metrics shows quotas for every region; also be aware of the All Regions quota that you may want to review.
- Select the quotas you want to increase, adjust the numbers according to the recommendations in the following list, and click Edit Quotas at the top of the page.
Pre-emptible instance quotas
- `compute.googleapis.com/preemptible_cpus`: Set this to at least `256`.
- `compute.googleapis.com/preemptible_nvidia_t4_gpus`: Set this to at least `32`.
- `compute.googleapis.com/preemptible_nvidia_v100_gpus`: Set this to at least `16`.
- `compute.googleapis.com/preemptible_nvidia_a100_gpus` (optional): Set this to at least `2`.
Standard instance quotas
- `compute.googleapis.com/CPUs`: Set this to at least `256`.
- `compute.googleapis.com/n2_cpus`: Set this to at least `128`.
- `compute.googleapis.com/nvidia_t4_gpus`: Set this to at least `32`.
- `compute.googleapis.com/nvidia_v100_gpus`: Set this to at least `16`.
- `compute.googleapis.com/nvidia_a100_gpus` (optional): Set this to at least `2`.
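Before requesting increases, it can help to compare a planned deployment against the targets above. A minimal sketch of that comparison, using a hypothetical helper (the metric names mirror the GCP quota metrics listed above, with the `compute.googleapis.com/` prefix dropped for brevity):

```python
# Quota targets from the list above (per region; illustrative values).
GCP_QUOTAS = {
    "preemptible_cpus": 256,
    "preemptible_nvidia_t4_gpus": 32,
    "preemptible_nvidia_v100_gpus": 16,
    "preemptible_nvidia_a100_gpus": 2,
}

def exceeded_quotas(request, quotas):
    """Return the metrics whose requested amount exceeds the quota (empty list = fits)."""
    return [metric for metric, amount in request.items()
            if amount > quotas.get(metric, 0)]

# 128 preemptible CPUs fit, but 64 T4 GPUs exceed the 32-GPU target.
print(exceeded_quotas(
    {"preemptible_cpus": 128, "preemptible_nvidia_t4_gpus": 64},
    GCP_QUOTAS,
))  # ['preemptible_nvidia_t4_gpus']
```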
Deploy an Anyscale Private Endpoint
Step 1: Create a new Endpoint
Click Endpoints server, and then click Create.
Step 2: Configure the deployment
Customize your settings:
- Endpoint name: Fill in a unique name; the name is immutable after deployment.
- Endpoint version: Select the latest.
- Cloud name: Choose the cloud that you set up with the adjusted quotas to run your Private Endpoint in.
- Select models to deploy: Choose which models you would like to deploy. You can update this selection after deployment. See here for an advanced configuration guide.
- Click Create Endpoint.
Step 3: Set your API base and key
The status page displays your unique API base and key under Setup. How you set the corresponding environment variables depends on your development platform or environment.
For a single project
- Create a file named `.env` in your project's root directory. The environment variable names start with `OPENAI_` to ensure seamless compatibility with existing applications written with OpenAI APIs, but the values should be your Anyscale API base and key. Add the following lines, replacing `ANYSCALE_API_BASE` and `ANYSCALE_API_KEY` with your API base and key copied from the Setup section on the About page for an endpoint:

  ```
  OPENAI_BASE_URL=ANYSCALE_API_BASE
  OPENAI_API_KEY=ANYSCALE_API_KEY
  ```

- Add `.env` to your `.gitignore` file. Protect your API key and sensitive information by ensuring that you never accidentally commit this file to a Git repository.
- Load the environment variables. Use one of the following two options:
  - Load into bash: run the following command, which loads all the variables into the current session, allowing scripts and commands run in that session to access them:

    ```
    source .env
    ```

  - Use `python-dotenv` to load `.env` files in Python: with this library, you can use these lines of code in a Python program to load the environment variables:

    ```python
    from dotenv import load_dotenv
    load_dotenv()
    ```
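Conceptually, loading a `.env` file just means parsing `KEY=VALUE` lines and placing them in the process environment. A minimal sketch of that behavior (an illustrative stand-in, not the actual `python-dotenv` implementation):

```python
import os

def load_env_text(text, env=os.environ):
    """Parse KEY=VALUE lines, skipping blanks and # comments, into env (sketch)."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()

# Load into a plain dict instead of os.environ, to show the effect:
fake_env = {}
load_env_text(
    "# example .env contents\n"
    "OPENAI_BASE_URL=https://example.endpoint/v1\n"
    "OPENAI_API_KEY=esecret_example",
    fake_env,
)
print(fake_env["OPENAI_BASE_URL"])  # https://example.endpoint/v1
```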
macOS
- Open the Terminal application.
- Edit the shell profile. To determine which shell you're using, type the following:

  ```
  echo $SHELL
  ```

  If you're using `bash`, note that macOS versions prior to Catalina (10.15) typically use `.bash_profile`:

  ```
  nano ~/.bash_profile
  ```

  If you're using `zsh`, edit `.zshrc`:

  ```
  nano ~/.zshrc
  ```

- Add the environment variables. The environment variable names start with `OPENAI_` to ensure seamless compatibility with existing applications written with OpenAI APIs, but the values should be your Anyscale API base and key. Add the following lines, replacing `ANYSCALE_API_BASE` and `ANYSCALE_API_KEY` with your API base and key copied from the Setup section on the About page for an endpoint:

  ```
  export OPENAI_BASE_URL=ANYSCALE_API_BASE
  export OPENAI_API_KEY=ANYSCALE_API_KEY
  ```

- Save and exit. Press Ctrl+O to write, Enter to confirm the filename, and Ctrl+X to close the editor.
- Apply the changes. Run `source ~/.bash_profile` or `source ~/.zshrc` to load the updated profile.
- Verify the setup. Paste this command in the terminal to display your API key:

  ```
  echo $OPENAI_API_KEY
  ```
Windows
- Open the Command Prompt application.
- Set permanent environment variables. The environment variable names start with `OPENAI_` to ensure seamless compatibility with existing applications written with OpenAI APIs, but the values should be your Anyscale API base and key. Replace `ANYSCALE_API_BASE` and `ANYSCALE_API_KEY` with your API base and key copied from the Setup section on the About page for an endpoint:

  ```
  setx OPENAI_BASE_URL ANYSCALE_API_BASE
  setx OPENAI_API_KEY ANYSCALE_API_KEY
  ```

- Verify the setup. Reopen the Command Prompt and paste in the following command to confirm your API key:

  ```
  echo %OPENAI_API_KEY%
  ```
Step 4: Query the model
cURL

```shell
curl "$OPENAI_BASE_URL/chat/completions" \
  -X POST \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "meta-llama/Llama-2-7b-chat-hf",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Who won the Australian Open 2012 final and how many sets were played?"}
    ]
  }'
```

Python

```python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["OPENAI_API_KEY"],
    base_url=os.environ["OPENAI_BASE_URL"],
)

response = client.chat.completions.create(
    model="meta-llama/Llama-2-7b-chat-hf",  # Replace with your model name.
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who won the Australian Open 2012 final and how many sets were played?"},
    ],
)
print(response.choices[0].message)
```
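Network calls to a remote endpoint can fail transiently, so production callers often wrap the request in a retry loop. A generic retry-with-backoff sketch (a hypothetical helper, not part of the Anyscale or OpenAI SDKs):

```python
import time

def call_with_retries(fn, attempts=3, base_delay=0.5):
    """Call fn(); on exception, retry with exponential backoff (generic sketch)."""
    for i in range(attempts):
        try:
            return fn()
        except Exception:
            if i == attempts - 1:
                raise  # Out of attempts: surface the last error.
            time.sleep(base_delay * (2 ** i))

# Hypothetical usage with the client from the example above:
# response = call_with_retries(
#     lambda: client.chat.completions.create(model=..., messages=...)
# )
```

The OpenAI Python client also retries some failures on its own; a wrapper like this is only needed when you want explicit control over attempts and backoff.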
Next steps
- Check out the OpenAI Migration Guide to transition existing applications over from the OpenAI API to Anyscale Private Endpoints.
- Further customize your model to meet your deployment, autoscaling, and text generation needs.
- Use the observability tooling built into the Endpoints Server to monitor the health of deployed models and set up alerts for notable events.