Overview
This version of the Anyscale docs is deprecated. Go to the latest version for up to date information.
Anyscale Private Endpoints allows you to seamlessly serve the most popular open source large language models (LLMs) in your cloud environment without compromising on performance. As a result, you can concentrate on customizing your LLM-powered applications while adhering to the highest security standards.
Model serving
Securely add a conversational AI layer to your applications using open source models. With this tool, you can build responsive chatbots, automate customer support, analyze content, and code with AI assistance, while keeping your data within your own cloud environment for enhanced privacy and compliance.
Supported models
Anyscale Private Endpoints supports the most popular open source LLMs.
mistralai/Mistral-7B-Instruct-v0.1
Mistral-7B-Instruct-v0.1
is a pre-trained generative text model available in the Hugging Face Transformers format.
- API name:
mistralai/Mistral-7B-Instruct-v0.1
- Provider: Mistral
- License: Apache 2.0
- More details: Hugging Face model card
HuggingFaceH4/zephyr-7b-beta
Zephyr-7B-β
is a fine-tuned version of mistralai/Mistral-7B-v0.1
that was trained on a mix of publicly available, synthetic datasets using Direct Preference Optimization (DPO).
- API name:
HuggingFaceH4/zephyr-7b-beta
- Provider: Hugging Face
- License: MIT
- More details: Hugging Face model card
meta-llama/Llama-2-7b-chat-hf
The Llama-2-Chat 7B model is a pre-trained generative text model fine-tuned for dialog use cases and available in the Hugging Face Transformers format.
- API name:
meta-llama/Llama-2-7b-chat-hf
- Provider: Meta
- License: Llama 2 Community License
- More details: Hugging Face model card
meta-llama/Llama-2-13b-chat-hf
The Llama-2-Chat 13B model is a generative text model trained on a diverse mix of publicly available online data, designed for optimal performance in various text tasks.
- API name:
meta-llama/Llama-2-13b-chat-hf
- Provider: Meta
- License: Llama 2 Community License
- More details: Hugging Face model card
meta-llama/Llama-2-70b-chat-hf
The Llama-2-Chat 70 billion parameter model is a high-capacity generative text model trained on a blend of publicly available online data and enhanced with Grouped-Query Attention for improved inference scalability. Hosted in the Hugging Face Transformers format, it represents the largest variant within the Llama 2 series.
- API name:
meta-llama/Llama-2-70b-chat-hf
- Provider: Meta
- License: Llama 2 Community License
- More details: Hugging Face model card
codellama/CodeLlama-34b-Instruct-hf
The Code Llama - Instruct 34B model, developed by Meta, is the 34 billion parameter, instruction-tuned transformer model designed for instruction following and safer deployment. Housed in the Hugging Face Transformers format, this variant is part of the Code Llama family, specializing in general code synthesis and understanding.
- API name:
codellama/CodeLlama-34b-Instruct-hf
- Provider: Meta
- License: Llama 2 Community License
- More details: Hugging Face model card
Email the support team at endpoints-help@anyscale.com to request the addition of another model to Anyscale Private Endpoints.
Guides
To get a general introduction to Anyscale Private Endpoints, follow these foundational guides:
- Get started: Introductory steps for Anyscale Private Endpoints setup.
- OpenAI migration guide: Transition models and data from OpenAI to Anyscale.
- Model configuration: Customize and optimize deployed open source models.
- Observability and alerting: Tools for monitoring deployed models and setting up alerts for key metrics.
Examples
For hands-on tutorials that use specific tools and frameworks, see this collection of starter content:
- Prompting: First steps for designing prompts and prompt templates.
- Create a chatbot: Create a simple, streaming chatbot.
- LangChain integration: Setup LangChain with Anyscale for advanced chat agents.
- Arize integration: Implement Arize for model monitoring in Anyscale.
- Weights & Biases integration: Track experiments and tune models with W&B.
- LlamaIndex integration: Enhance your LLMs by building an indexing and querying engine with LlamaIndex.
- Anyscale Endpoint Cookbook: A GitHub repo with even more examples applications built with Anyscale Private Endpoints for common use cases.
Fine-tuning
By training on your unique datasets, you can tailor a language model to understand and generate text that aligns with your organization's tone, terminology, and objectives. This API simplifies the process of adapting open source models to recognize the nuances of your business's communications, support queries, and technical documents.
Supported models
The following models are available for fine-tuning:
mistralai/Mistral-7B-Instruct-v0.1
Mistral-7B-Instruct-v0.1
is a pre-trained generative text model available in the Hugging Face Transformers format. Fine-tuning support added in version 0.4.1.
- API name:
mistralai/Mistral-7B-Instruct-v0.1
- Provider: Mistral
- License: Apache 2.0
- More details: Hugging Face model card
meta-llama/Llama-2-7b-hf
The Llama 2, 7 billion parameter model is a pre-trained generative language model, versatile for diverse text applications. It's not specifically fine-tuned for dialog, but excels in tasks like text generation, summarization, and translation. Fine-tuning support added in version 0.4.1.
- API name:
meta-llama/Llama-2-7b-hf
- Provider: Meta
- License: Llama 2 Community License
- More details: Hugging Face model card
meta-llama/Llama-2-7b-chat-hf
The Llama 2, 7 billion parameter model is a pre-trained generative text model fine-tuned for dialog use cases and available in the Hugging Face Transformers format.
- API name:
meta-llama/Llama-2-7b-chat-hf
- Provider: Meta
- License: Llama 2 Community License
- More details: Hugging Face model card
meta-llama/Llama-2-13b-hf
The Llama 2, 13 billion parameter model is a pre-trained generative language model, versatile for diverse text applications. It's not specifically fine-tuned for dialog, but excels in tasks like text generation, summarization, and translation. Fine-tuning support added in version 0.4.1.
- API name:
meta-llama/Llama-2-13b-hf
- Provider: Meta
- License: Llama 2 Community License
- More details: Hugging Face model card
meta-llama/Llama-2-13b-chat-hf
The Llama 2, 13 billion parameter model is a generative text model trained on a diverse mix of publicly available online data, designed for optimal performance in various text tasks.
- API name:
meta-llama/Llama-2-13b-chat-hf
- Provider: Meta
- License: Llama 2 Community License
- More details: Hugging Face model card
meta-llama/Llama-2-70b-hf
The Llama 2, 70 billion parameter model is a pre-trained generative language model, versatile for diverse text applications. It's not specifically fine-tuned for dialog, but excels in tasks like text generation, summarization, and translation. Fine-tuning support added in version 0.4.1.
- API name:
meta-llama/Llama-2-chat-hf
- Provider: Meta
- License: Llama 2 Community License
- More details: Hugging Face model card
meta-llama/Llama-2-70b-chat-hf
The Llama-2-Chat 70 billion parameter model is a high-capacity generative text model trained on a blend of publicly available online data and enhanced with Grouped-Query Attention for improved inference scalability. Hosted in the Hugging Face Transformers format, it represents the largest variant within the Llama 2 series. Fine-tuning support added in version 0.4.1.
- API name:
meta-llama/Llama-2-70b-chat-hf
- Provider: Meta
- License: Llama 2 Community License
- More details: Hugging Face model card
codellama/CodeLlama-34b-Instruct-hf
The Code Llama - Instruct 34B model, developed by Meta, is the 34 billion parameter, instruction-tuned transformer model designed for instruction following and safer deployment. Housed in the Hugging Face Transformers format, this variant is part of the Code Llama family, specializing in general code synthesis and understanding. Fine-tuning support added in version 0.4.1.
- API name:
codellama/CodeLlama-34b-Instruct-hf
- Provider: Meta
- License: Llama 2 Community License
- More details: Hugging Face model card
Guides
- Get started: How to run a basic fine-tuning job.
For additional assistance or feature requests for Anyscale Private Endpoints, contact the support team at endpoints-help@anyscale.com.