Deploy Anyscale on Azure Kubernetes Service (AKS)

Complete the following steps to configure and deploy a new Anyscale cloud on AKS.

1. Install the Anyscale CLI

pip install -U anyscale
anyscale login # authenticate

2. Authenticate the Azure CLI

az login
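
The remaining steps call several command-line tools (anyscale, az, git, terraform, helm, and kubectl). As an optional pre-flight check, the sketch below reports which tools are missing from your PATH. The function name missing_tools is a hypothetical helper, not part of any of these CLIs.

```shell
# Hypothetical helper: report which of the given commands are not on PATH.
# Returns non-zero if any are missing.
missing_tools() {
  local status=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example, run missing_tools anyscale az git terraform helm kubectl and install anything it lists before you continue.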

3. Use the Anyscale Terraform module to create an AKS cluster

Anyscale provides a Terraform module to deploy an AKS cluster and supporting Azure resources.

Clone the Terraform module and navigate to the AKS example:

git clone https://github.com/anyscale/terraform-kubernetes-anyscale-foundation-modules
cd terraform-kubernetes-anyscale-foundation-modules/examples/azure/aks-new_cluster/

Run the following command to create and populate a Terraform variable file:

cat <<EOF > terraform.tfvars
aks_cluster_name = "<your_aks_cluster_name>"
azure_subscription_id = "<your_subscription_id>"
azure_tenant_id = "<your_tenant_id>"
azure_location = "<your_azure_region>"
EOF
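
Before you apply the configuration, it can help to confirm that no placeholder such as <your_tenant_id> is left in the variable file. The following is a minimal sketch; check_placeholders is a hypothetical helper, not part of Terraform or the Anyscale module.

```shell
# Hypothetical helper: succeeds only if the file exists and contains no
# unreplaced <placeholder> tokens. Matching lines are printed with line numbers.
check_placeholders() {
  local file="$1"
  if [ ! -f "$file" ]; then
    echo "File not found: $file" >&2
    return 1
  fi
  if grep -nE '<[A-Za-z_]+>' "$file"; then
    echo "Replace the placeholders listed above in $file" >&2
    return 1
  fi
  echo "No placeholders left in $file"
}
```

Run check_placeholders terraform.tfvars from the example directory before running terraform plan.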
Note: The Terraform example enables GPU node pools (T4 and A100) by default. To customize or disable GPU pools, set gpu_pool_configs in variables.tf or in your terraform.tfvars (for example, use an empty map {} to disable GPU pools).
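
As a minimal sketch of the empty-map option, the following appends the override to the variable file. It assumes you run it in the example directory after creating terraform.tfvars, and that the file does not already define gpu_pool_configs.

```shell
# Append an empty gpu_pool_configs map to disable the default GPU node pools.
# Assumes terraform.tfvars does not already set gpu_pool_configs.
cat <<EOF >> terraform.tfvars
gpu_pool_configs = {}
EOF
```

If you disable GPU pools this way, you likely don't need the NVIDIA device plugin described in step 4.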

Run the following commands to apply the Terraform configuration. This may take several minutes.

terraform init
terraform plan
terraform apply

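Collect the values that later steps need from your Terraform output (run terraform output to print them again). The commands in this guide reference them as <azure_resource_group_name>, <anyscale_operator_principal_id>, <anyscale_operator_client_id>, <blob_storage_name>, and <storage_account>.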

4. Install additional AKS components

In this step, you connect to your AKS cluster and install Envoy Gateway for externally facing load balancing.

Run the following command to connect your terminal to the AKS cluster:

az aks get-credentials --resource-group <azure_resource_group_name> --name <your_aks_cluster_name> --overwrite-existing

Install Envoy Gateway v1.7.0:

helm install eg oci://docker.io/envoyproxy/gateway-helm \
  --version v1.7.0 \
  --namespace envoy-gateway-system \
  --create-namespace
kubectl wait --for=condition=available deployment/envoy-gateway \
  -n envoy-gateway-system --timeout=120s

Create a file named envoyproxy.yaml with the following contents:

apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: envoy-proxy
  namespace: envoy-gateway-system
spec:
  provider:
    type: Kubernetes
    kubernetes:
      envoyService:
        type: LoadBalancer
        annotations:
          service.beta.kubernetes.io/azure-load-balancer-internal: "false"
          service.beta.kubernetes.io/azure-load-balancer-health-probe-request-path: "/healthz"

Apply the resource:

kubectl apply -f envoyproxy.yaml

Create a file named gatewayclass.yaml with the following contents:

apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: eg
spec:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
  parametersRef:
    group: gateway.envoyproxy.io
    kind: EnvoyProxy
    name: envoy-proxy
    namespace: envoy-gateway-system

Apply the resource:

kubectl apply -f gatewayclass.yaml

Note: Envoy Gateway is the recommended ingress controller for Anyscale on Kubernetes. Other gateway and ingress controllers are also supported. See Ingress and gateway controllers for all supported options.

If you intend to use NVIDIA GPUs in your Anyscale workloads, install the NVIDIA device plugin:

helm repo add nvdp https://nvidia.github.io/k8s-device-plugin
helm upgrade nvdp nvdp/nvidia-device-plugin \
  --namespace nvidia-device-plugin \
  --version 0.17.1 \
  --values values_nvdp.yaml \
  --create-namespace \
  --install

5. Register the Anyscale cloud resources

Run the following command with the values from your Terraform output. Verify all variables are entered correctly.

anyscale cloud register \
  --name <your_cloud_name> \
  --region <your_azure_region> \
  --provider azure \
  --compute-stack k8s \
  --azure-tenant-id <your_tenant_id> \
  --anyscale-operator-iam-identity <anyscale_operator_principal_id> \
  --cloud-storage-bucket-name 'abfss://<blob_storage_name>@<storage_account>.dfs.core.windows.net'
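
The storage argument combines the blob container name and the storage account name into a single abfss:// URI. The sketch below assembles that string in the format used by the register command; abfss_uri is a hypothetical helper and the sample values are placeholders.

```shell
# Hypothetical helper: build the abfss:// URI from a blob container name and
# a storage account name, following the format shown in the register command.
abfss_uri() {
  local container="$1" account="$2"
  printf 'abfss://%s@%s.dfs.core.windows.net\n' "$container" "$account"
}

# Placeholder values for illustration only.
abfss_uri "mycontainer" "mystorageaccount"
# prints: abfss://mycontainer@mystorageaccount.dfs.core.windows.net
```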

Record the cloud resource ID from the command output. You need it in step 6.

6. Install and deploy the Anyscale operator on your AKS cluster

In this step, you create the Gateway, create a values.yaml file, and install the Anyscale operator on your AKS cluster.

Add the Anyscale operator Helm chart

Run the following command to add the Anyscale operator Helm chart:

helm repo add anyscale https://anyscale.github.io/helm-charts
helm repo update anyscale

Create the Gateway

The anyscale-operator namespace must exist before you create the Gateway. Create it with the following command. If the namespace already exists, this command returns an error; that's expected and you can proceed.

kubectl create namespace anyscale-operator

Create a file named gateway.yaml.

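Add the following contents, replacing <cloud-resource-id> in both certificate names with your cloud resource ID from step 5. The certificate names use the hyphenated form of the ID, so write any underscores in the ID as hyphens (Kubernetes resource names can't contain underscores):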

apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: gateway
  namespace: anyscale-operator
spec:
  gatewayClassName: eg
  listeners:
    - name: http
      port: 80
      protocol: HTTP
      allowedRoutes:
        namespaces:
          from: All
    - name: https
      port: 443
      protocol: HTTPS
      hostname: '*.i.anyscaleuserdata.com'
      tls:
        mode: Terminate
        certificateRefs:
          - kind: Secret
            name: anyscale-<cloud-resource-id>-certificate
      allowedRoutes:
        namespaces:
          from: All
    - name: https-session
      port: 443
      protocol: HTTPS
      hostname: '*.s.anyscaleuserdata.com'
      tls:
        mode: Terminate
        certificateRefs:
          - kind: Secret
            name: anyscale-svc-<cloud-resource-id>-certificate
      allowedRoutes:
        namespaces:
          from: All
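
The two certificate Secret names in the manifest embed the cloud resource ID in hyphenated form. The following is a minimal sketch for deriving them, assuming that "hyphenated" means replacing underscores with hyphens. The helper cert_secret_names is hypothetical, and the sample ID is made up.

```shell
# Hypothetical helper: print the two certificate Secret names for a cloud
# resource ID, replacing underscores with hyphens (an assumption about the
# ID format; adjust if your ID differs).
cert_secret_names() {
  local id
  id="$(printf '%s' "$1" | tr '_' '-')"
  printf 'anyscale-%s-certificate\n' "$id"
  printf 'anyscale-svc-%s-certificate\n' "$id"
}

# "cldrsrc_example123" is a made-up ID for illustration.
cert_secret_names "cldrsrc_example123"
# prints:
#   anyscale-cldrsrc-example123-certificate
#   anyscale-svc-cldrsrc-example123-certificate
```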

Apply the Gateway and retrieve its external address:

kubectl apply -f gateway.yaml
kubectl get gateway gateway -n anyscale-operator \
  -o jsonpath='{.status.addresses[0].value}'

Record the Gateway address. You use it as the gateway hostname (or IP address) in values.yaml.

Create a values YAML file

Create a values.yaml file. The following example shows a minimal Azure configuration. Replace the placeholders with the values you recorded in earlier steps.

global:
  cloudDeploymentId: <your_cloud_resource_id>
  controlPlaneURL: https://console.azure.anyscale.com
  cloudProvider: azure
  auth:
    iamIdentity: <anyscale_operator_client_id>
    audience: api://086bc555-6989-4362-ba30-fded273e432b/.default

workloads:
  serviceAccount:
    name: anyscale-operator

networking:
  gateway:
    enabled: true
    name: "gateway"
    namespace: "anyscale-operator"
    apiVersion: "gateway.networking.k8s.io/v1"
    hostname: "<gateway-address>"
    # ip: "<gateway-ip>" # Use this instead of hostname if the LB provides an IP address
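
Whether to set hostname or ip depends on what the load balancer reports as the Gateway address. The sketch below classifies an address with a simple IPv4 pattern. The helper gateway_address_key is hypothetical, the sample addresses are made up, and IPv6 addresses are not handled.

```shell
# Hypothetical helper: print "ip" if the address looks like an IPv4 address,
# otherwise "hostname". The pattern checks shape only, not octet ranges.
gateway_address_key() {
  if printf '%s\n' "$1" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$'; then
    echo "ip"
  else
    echo "hostname"
  fi
}

gateway_address_key "20.1.2.3"        # prints: ip
gateway_address_key "lb.example.com"  # prints: hostname
```

Use the printed key name for the gateway address in values.yaml and comment out the other one.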

To customize the Helm chart with custom patches or additional pod shapes, see Configure the Helm chart for the Anyscale operator.

Install the Anyscale operator on AKS

Run the following command to install the Anyscale operator with Helm using your values.yaml file:

helm upgrade anyscale-operator anyscale/anyscale-operator \
  --namespace anyscale-operator \
  -f values.yaml \
  --create-namespace \
  --wait \
  -i

It may take several minutes for your Anyscale cloud to be ready. You can watch the deployment status with the following command:

kubectl get deployments anyscale-operator -n anyscale-operator -w

7. Verify your Anyscale cloud

After the operator is ready, verify that your cloud is registered and functional:

anyscale cloud verify --name <your_cloud_name>