Get started
This version of the Anyscale docs is deprecated. Go to the latest version for up to date information.
Anyscale Private Endpoints offers a streamlined interface for developers to leverage state-of-the-art open source large language models (LLMs) to power AI applications. Deploying in a private cloud environment allows teams to meet their specific privacy, control, and customization requirements.
LLM applications that you build with Private Endpoints are backed by Ray and Anyscale and inherit robust, production-ready features like zero-downtime upgrades, high availability, and enhanced observability. When you're ready for enterprise-level solutions and support, the transition to the expansive capabilities of the Anyscale Platform for machine learning workloads is seamless.
Set up your account
- Sign up for Anyscale Private Endpoints to receive an invite code.
- Create an account or sign in through the Anyscale Console.
Cloud prerequisites
To use Anyscale Private Endpoints, you must satisfy the following requirements:
- Deploy an Anyscale Cloud.
- Ensure that this cloud has sufficient quota to deploy your LLMs.
For Anyscale Private Endpoints to serve your models, you must modify the default resource quotas set by your cloud service provider.
Note: The availability of instances can vary by region and zone, so confirm with your cloud service provider that your selection can accommodate your instance needs.
How to update AWS quotas
For Amazon EC2, follow these steps.
- Navigate to the AWS Management Console and sign in.
- Open the Services dropdown menu and, under the Management & Governance section, open Service Quotas.
- Request a quota increase for the following instances. Remember that quotas are region-specific, so update the quota in the relevant region.
Spot instance quotas
- All G and VT Spot Instance Requests: Default is 0. Set to at least 512, which supports 8 g5.12xlarge and 8 g5.4xlarge spot instances.
- All Standard (A, C, D, H, I, M, R, T, Z) Spot Instance Requests: Default is 5. Set to at least 512, which supports 16 m5.8xlarge spot instances.
- All P4, P3, and P2 Spot Instance Requests: Default is 64. Set to at least 224, which supports 4 p3.8xlarge instances and 1 p4de.24xlarge instance.
Standard instance quotas
- Running On-Demand G and VT instances: Default is 0. Set to at least 512, which supports 8 g5.12xlarge instances and 8 g5.4xlarge instances.
- Running On-Demand Standard (A, C, D, H, I, M, R, T, Z) instances: Default is 5. Set to at least 544, which supports 17 m5.8xlarge instances.
- Running On-Demand P instances: Default is 64. Set to at least 224, which supports 4 p3.8xlarge instances and 1 p4de.24xlarge instance.
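These quota minimums are vCPU counts, so the sizing above can be sanity-checked with simple arithmetic. A quick sketch (the vCPUs-per-instance figures are AWS's published sizes for these instance types):

```python
# Quota limits for these EC2 families are measured in vCPUs; the counts
# below are AWS's published vCPU sizes for each instance type.
vcpus = {
    "g5.12xlarge": 48,
    "g5.4xlarge": 16,
    "m5.8xlarge": 32,
    "p3.8xlarge": 32,
    "p4de.24xlarge": 96,
}

# 8 g5.12xlarge + 8 g5.4xlarge
g_vt_quota = 8 * vcpus["g5.12xlarge"] + 8 * vcpus["g5.4xlarge"]
# 16 m5.8xlarge (spot) and 17 m5.8xlarge (on-demand)
standard_spot = 16 * vcpus["m5.8xlarge"]
standard_ondemand = 17 * vcpus["m5.8xlarge"]
# 4 p3.8xlarge + 1 p4de.24xlarge
p_quota = 4 * vcpus["p3.8xlarge"] + 1 * vcpus["p4de.24xlarge"]

print(g_vt_quota, standard_spot, standard_ondemand, p_quota)  # 512 512 544 224
```

This is why the G and VT limit of 512, the Standard limits of 512 and 544, and the P limit of 224 support exactly the instance mixes listed above.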
How to update GCP quotas
- Navigate to the Google Cloud Quotas page.
- Filter the quotas by each quota metric listed below.
- Select the checkbox for the quota in the same region as your Anyscale Cloud.
- Select EDIT QUOTAS to enter a new limit.
| Quota Metric | Minimum Recommended Quota |
| --- | --- |
| compute.googleapis.com/preemptible_cpus | 256 CPUs |
| compute.googleapis.com/preemptible_nvidia_t4_gpus | 32 GPUs |
| compute.googleapis.com/preemptible_nvidia_v100_gpus | 16 GPUs |
| compute.googleapis.com/preemptible_nvidia_a100_gpus | 8 GPUs |
| compute.googleapis.com/cpus | 256 CPUs |
| compute.googleapis.com/n2_cpus | 128 CPUs |
| compute.googleapis.com/nvidia_t4_gpus | 32 GPUs |
| compute.googleapis.com/nvidia_v100_gpus | 16 GPUs |
| compute.googleapis.com/nvidia_a100_gpus | 8 GPUs |
| Persistent Disk SSD (GB) | 1000 GB |
Deploy an Anyscale Private Endpoint
Step 1: Create a new Endpoint
Click on Endpoints server, and then Create.
Step 2: Configure the deployment
Customize your settings:
- Endpoint name: Fill in a unique name; the name is immutable after deployment.
- Endpoint version: Select the latest.
- Cloud name: Choose the cloud that you set up with the adjusted quotas to run your Private Endpoint in.
- Select models to deploy: Choose which models you would like to deploy. You can update this selection after deployment. See here for an advanced configuration guide.
- Click Create Endpoint.
Step 3: Set your API base and key
The status page displays your unique API base and key under Setup. How you set the environment variables for the cURL command depends on your development platform or environment.

For a single project

This approach works across macOS, Windows, and Linux and allows you to specify environment variables for each project you're working on.

- Create a file named .env in your project's root directory. The names of the environment variables use the OPENAI_ prefix to ensure seamless compatibility with existing applications written with OpenAI APIs, but the values should be your Anyscale API base and key. Add the following lines, replacing ANYSCALE_API_BASE and ANYSCALE_API_KEY with your API base and key copied from the Setup section on the About page for an endpoint:

OPENAI_BASE_URL=ANYSCALE_API_BASE
OPENAI_API_KEY=ANYSCALE_API_KEY

- Add .env to your .gitignore file. Protect your API key and sensitive information by ensuring that you never accidentally commit this file to a Git repository.
- Load the environment variables. Use one of the following two options:
  - Load into bash. Run the following command, which loads all the variables into the current session, allowing scripts and commands run in that session to access them:

source .env

  - Use python-dotenv to load .env files in Python. With this library, you can use these lines of code in a Python program to load environment variables:

from dotenv import load_dotenv
load_dotenv()
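Under the hood, load_dotenv() just parses KEY=VALUE lines from the file and copies them into the process environment. A minimal stand-in sketch (the load_env_file helper is hypothetical, shown only to illustrate the mechanism; the values written are placeholders, not real credentials):

```python
import os
import tempfile

def load_env_file(path):
    """Minimal sketch of what python-dotenv's load_dotenv() does:
    read KEY=VALUE lines and copy them into os.environ."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            # Skip blank lines, comments, and malformed lines.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Like load_dotenv(), don't overwrite variables already set.
            os.environ.setdefault(key.strip(), value.strip())

# Demo with a throwaway .env file so nothing in your project is touched.
with tempfile.TemporaryDirectory() as tmp:
    env_path = os.path.join(tmp, ".env")
    with open(env_path, "w") as f:
        f.write("OPENAI_BASE_URL=ANYSCALE_API_BASE\n")
        f.write("OPENAI_API_KEY=ANYSCALE_API_KEY\n")
    load_env_file(env_path)

print("OPENAI_API_KEY" in os.environ)
```

In a real project you would install the library (`pip install python-dotenv`) and call load_dotenv() instead of writing your own parser.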
macOS

To make your API key permanent and accessible to all projects without needing to add it to your application code, use the following method for macOS:

- Open the Terminal application.
- Edit the shell profile. To determine what shell you are using, type the following:

echo $SHELL

If you are using bash, macOS versions prior to Catalina (10.15) typically use .bash_profile:

nano ~/.bash_profile

For Catalina and later versions, or if you are using zsh, edit .zshrc:

nano ~/.zshrc

- Add environment variables. The names of the environment variables use the OPENAI_ prefix to ensure seamless compatibility with existing applications written with OpenAI APIs, but the values should be your Anyscale API base and key. Add the following lines, replacing ANYSCALE_API_BASE and ANYSCALE_API_KEY with your API base and key copied from the Setup section on the About page for an endpoint:

export OPENAI_BASE_URL=ANYSCALE_API_BASE
export OPENAI_API_KEY=ANYSCALE_API_KEY

- Save and exit. Press Ctrl+O to write, Enter to confirm the filename, and Ctrl+X to close the editor.
- Apply the changes. Run source ~/.bash_profile or source ~/.zshrc to load the updated profile.
- Verify the setup. Paste this command in the terminal to display your API key:

echo $OPENAI_API_KEY
Windows

To make your API key permanent and accessible to all projects without needing to add it to your application code, use the following method for Windows:

- Open the Command Prompt application.
- Set permanent environment variables. The names of the environment variables use the OPENAI_ prefix to ensure seamless compatibility with existing applications written with OpenAI APIs, but the values should be your Anyscale API base and key. Run the following commands, replacing ANYSCALE_API_BASE and ANYSCALE_API_KEY with your API base and key copied from the Setup section on the About page for an endpoint:

setx OPENAI_BASE_URL ANYSCALE_API_BASE
setx OPENAI_API_KEY ANYSCALE_API_KEY

- Verify the setup. Reopen the Command Prompt and paste in the following command to confirm your API key:

echo %OPENAI_API_KEY%
Step 4: Query the model
- cURL
- Python
cURL is a command-line tool that developers commonly use for making HTTP requests. After you've set up your API key in your terminal or command prompt, send a sample request to the API with the following command:
curl -X 'POST' "$OPENAI_BASE_URL/chat/completions" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
"model": "meta-llama/Llama-2-70b-chat-hf",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Who won the Australian open 2012 final, and how many sets were played?"}
]
}'
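The cURL call above is an ordinary HTTPS POST with a JSON body, so you can also reproduce it from Python's standard library alone. This sketch only builds the request without sending it (the fallback URL and key are placeholders, not real credentials; it assumes the environment variables from Step 3 when they are set):

```python
import json
import os
import urllib.request

# Placeholder fallbacks let the snippet run even without real credentials.
base_url = os.environ.get("OPENAI_BASE_URL", "https://example.com/v1")
api_key = os.environ.get("OPENAI_API_KEY", "ANYSCALE_API_KEY")

# Same JSON body as the cURL example above.
payload = {
    "model": "meta-llama/Llama-2-70b-chat-hf",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who won the Australian open 2012 final, and how many sets were played?"},
    ],
}

request = urllib.request.Request(
    f"{base_url}/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    },
    method="POST",
)

# urllib.request.urlopen(request) would send it; that call is omitted here
# because it needs a live endpoint and valid credentials.
print(request.method, request.full_url)
```

In practice the openai Python client shown next handles this plumbing for you; the sketch is only meant to show that the endpoint speaks a plain OpenAI-compatible HTTP API.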
To make an API request using the Python library, set your environment variables where your script or notebook is running. Then, run the following example:
import os
from openai import OpenAI
client = OpenAI(
api_key = os.environ['OPENAI_API_KEY'],
base_url = os.environ['OPENAI_BASE_URL']
)
response = client.chat.completions.create(
model="meta-llama/Llama-2-70b-chat-hf", # Replace with your model name.
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Who won the Australian open 2012 final, and how many sets were played?"}
]
)
print(response.choices[0].message)
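The response object wraps JSON in the OpenAI-compatible chat completion schema. A sketch of that shape, and of pulling out just the reply text (the content string here is made up, not a real model output):

```python
# Illustrative shape of a chat completion response in the OpenAI-compatible
# schema; values are placeholders, not real API output.
sample_response = {
    "object": "chat.completion",
    "model": "meta-llama/Llama-2-70b-chat-hf",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "(model answer here)"},
            "finish_reason": "stop",
        }
    ],
}

# With the openai client above, the equivalent access is
# response.choices[0].message.content.
text = sample_response["choices"][0]["message"]["content"]
print(text)
```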
Next steps
- Check out the OpenAI Migration Guide to transition existing applications over from the OpenAI API to Anyscale Private Endpoints.
- Further customize your model to meet your deployment, autoscaling, and text generation needs.
- Use the observability tooling built into the Endpoints Server to monitor the health of deployed models and set up alerts for notable events.