Overview

Check your docs version

This version of the Anyscale docs is deprecated. Go to the latest version for up-to-date information.

Anyscale Private Endpoints allows you to seamlessly serve the most popular open source large language models (LLMs) in your cloud environment without compromising on performance. As a result, you can concentrate on customizing your LLM-powered applications while adhering to the highest security standards.

Model serving

Securely add a conversational AI layer to your applications using open source models. With this tool, you can build responsive chatbots, automate customer support, analyze content, and code with AI assistance, while keeping your data within your own cloud environment for enhanced privacy and compliance.
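As a sketch of what querying a served model can look like, the snippet below builds an OpenAI-style chat completions request. This assumes your private endpoint exposes an OpenAI-compatible API; the `BASE_URL`, `API_KEY`, and `build_chat_request` helper are placeholders for illustration, not part of any Anyscale SDK.

```python
import json

# Placeholder values -- substitute your own private endpoint URL and API key.
BASE_URL = "https://your-endpoint.example.com/v1"  # hypothetical endpoint
API_KEY = "YOUR_API_KEY"

def build_chat_request(model, messages, temperature=0.7):
    """Build the URL, headers, and JSON body for an OpenAI-style
    chat completions request against a private endpoint."""
    url = f"{BASE_URL}/chat/completions"
    headers = {
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "model": model,
        "messages": messages,
        "temperature": temperature,
    })
    return url, headers, body

url, headers, body = build_chat_request(
    "meta-llama/Llama-2-7b-chat-hf",
    [{"role": "user", "content": "Summarize our refund policy."}],
)
# Send the request with any HTTP client, for example urllib.request or requests.
```

Because the request never leaves your HTTP client until you send it, you can point the same payload at whatever base URL your deployment exposes.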

Supported models

Anyscale Private Endpoints supports the most popular open source LLMs.

mistralai/Mistral-7B-Instruct-v0.1

Mistral-7B-Instruct-v0.1 is a pre-trained generative text model available in the Hugging Face Transformers format.

HuggingFaceH4/zephyr-7b-beta

Zephyr-7B-β is a fine-tuned version of mistralai/Mistral-7B-v0.1 that was trained on a mix of publicly available, synthetic datasets using Direct Preference Optimization (DPO).

meta-llama/Llama-2-7b-chat-hf

The Llama-2-Chat 7B model is a pre-trained generative text model fine-tuned for dialog use cases and available in the Hugging Face Transformers format.

meta-llama/Llama-2-13b-chat-hf

The Llama-2-Chat 13B model is a generative text model trained on a diverse mix of publicly available online data, designed for optimal performance in various text tasks.

meta-llama/Llama-2-70b-chat-hf

The Llama-2-Chat 70B model is a high-capacity generative text model trained on a blend of publicly available online data and enhanced with Grouped-Query Attention for improved inference scalability. Available in the Hugging Face Transformers format, it's the largest variant in the Llama 2 series.

codellama/CodeLlama-34b-Instruct-hf

The Code Llama - Instruct 34B model, developed by Meta, is an instruction-tuned transformer model designed for instruction following and safer deployment. Available in the Hugging Face Transformers format, this variant is part of the Code Llama family, specializing in general code synthesis and understanding.

note

Email the support team at endpoints-help@anyscale.com to request the addition of another model to Anyscale Private Endpoints.

Guides

To get a general introduction to Anyscale Private Endpoints, follow these foundational guides:

Examples

For hands-on tutorials that use specific tools and frameworks, see this collection of starter content:

Fine-tuning

By training on your unique datasets, you can tailor a language model to understand and generate text that aligns with your organization's tone, terminology, and objectives. This API simplifies the process of adapting open source models to recognize the nuances of your business's communications, support queries, and technical documents.
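Fine-tuning datasets for chat models are commonly uploaded as JSONL, with one chat conversation per line. The sketch below assumes that messages-per-line JSONL format; the example conversation and the `write_jsonl` helper are illustrative, not a prescribed schema, so check the fine-tuning guide for the exact format your version expects.

```python
import json

# Hypothetical training examples -- real records would come from your
# organization's support logs, documents, and other text sources.
examples = [
    {
        "messages": [
            {"role": "system", "content": "You are a support assistant for Acme Corp."},
            {"role": "user", "content": "How do I reset my password?"},
            {"role": "assistant", "content": "Open Settings > Security and choose 'Reset password'."},
        ]
    },
]

def write_jsonl(records, path):
    """Serialize one JSON object per line, the usual upload format
    for fine-tuning datasets."""
    with open(path, "w") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")

write_jsonl(examples, "train.jsonl")
```

Keeping each example as a complete conversation lets the model learn your tone and terminology from the assistant turns while the system and user turns provide context.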

Supported models

The following models are available for fine-tuning:

mistralai/Mistral-7B-Instruct-v0.1

Mistral-7B-Instruct-v0.1 is a pre-trained generative text model available in the Hugging Face Transformers format. Fine-tuning support added in version 0.4.1.

meta-llama/Llama-2-7b-hf

The Llama 2 7B model is a pre-trained generative language model, versatile for diverse text applications. It's not specifically fine-tuned for dialog, but excels in tasks like text generation, summarization, and translation. Fine-tuning support added in version 0.4.1.

meta-llama/Llama-2-7b-chat-hf

The Llama-2-Chat 7B model is a pre-trained generative text model fine-tuned for dialog use cases and available in the Hugging Face Transformers format.

meta-llama/Llama-2-13b-hf

The Llama 2 13B model is a pre-trained generative language model, versatile for diverse text applications. It's not specifically fine-tuned for dialog, but excels in tasks like text generation, summarization, and translation. Fine-tuning support added in version 0.4.1.

meta-llama/Llama-2-13b-chat-hf

The Llama-2-Chat 13B model is a generative text model trained on a diverse mix of publicly available online data, designed for optimal performance in various text tasks.

meta-llama/Llama-2-70b-hf

The Llama 2 70B model is a pre-trained generative language model, versatile for diverse text applications. It's not specifically fine-tuned for dialog, but excels in tasks like text generation, summarization, and translation. Fine-tuning support added in version 0.4.1.

meta-llama/Llama-2-70b-chat-hf

The Llama-2-Chat 70B model is a high-capacity generative text model trained on a blend of publicly available online data and enhanced with Grouped-Query Attention for improved inference scalability. Available in the Hugging Face Transformers format, it's the largest variant in the Llama 2 series. Fine-tuning support added in version 0.4.1.

codellama/CodeLlama-34b-Instruct-hf

The Code Llama - Instruct 34B model, developed by Meta, is an instruction-tuned transformer model designed for instruction following and safer deployment. Available in the Hugging Face Transformers format, this variant is part of the Code Llama family, specializing in general code synthesis and understanding. Fine-tuning support added in version 0.4.1.

Guides

tip

For additional assistance or feature requests for Anyscale Private Endpoints, contact the support team at endpoints-help@anyscale.com.