Recover from a deleted Anyscale operator extension on AKS
This page describes how to recover when the Anyscale operator cluster extension has been deleted from your AKS cluster. You redeploy the extension using an ARM template, reusing your existing managed identity, marketplace plan, and Anyscale cloud record.
This procedure applies when Anyscale on Azure was originally installed through the Azure marketplace, which deploys the operator as a Kubernetes cluster extension. If you installed the operator with Helm directly, reinstall it the same way. See the Anyscale on Azure documentation on Microsoft Learn.
Symptoms
When the operator extension has been deleted, you see one or both of the following signals:
- The Anyscale console shows the cloud as disconnected on the clouds page.
- The cluster extension is no longer present under your AKS cluster.
To confirm the extension is missing, open Settings > Extensions for the cluster in the Azure portal, or run the following command:
az k8s-extension list \
--cluster-name <aks-cluster-name> \
--resource-group <resource-group> \
--cluster-type managedClusters \
--query "[?extensionType=='Anyscale.AKS.Operator']"
If the query returns an empty array, the operator extension is no longer installed on the cluster.
Cause
The Microsoft.KubernetesConfiguration/extensions resource of type Anyscale.AKS.Operator was deleted from the AKS cluster. The surrounding infrastructure remains intact. The user-assigned managed identity, federated identity credential, accepted marketplace plan, and Anyscale cloud record in your tenant are all still present. Recovery requires redeploying only the cluster extension with configuration that matches the original install.
Prerequisites
Before you begin, confirm the following resources from the original Anyscale on Azure deployment are still in place:
- The AKS cluster.
- The user-assigned managed identity configured for workload identity federation with the operator service account.
- The Anyscale cloud resource (Anyscale.Platform/clouds/cloudResources/default) in your tenant.
- An accepted marketplace plan for anyscale-operator-aks.
You also need:
- The Azure CLI (az) installed and authenticated to the subscription that owns the AKS cluster.
- Permission to deploy ARM templates to the AKS cluster's resource group.
Solution
To redeploy the operator extension, complete the following steps:
Step 1: Collect the parameter values
You need three values from your existing Anyscale on Azure deployment:
| Parameter | Description |
|---|---|
| aksClusterResourceId | Full resource ID of the AKS cluster. |
| cloudDeploymentId | The cloudResourceId of the existing Anyscale.Platform/clouds/cloudResources/default resource. |
| iamIdentityClientId | Client ID of the user-assigned managed identity federated with the operator service account. |
If you have the outputs from the original marketplace deployment that installed Anyscale on Azure, pull all three values from there. View the deployment under the Deployments tab of the resource group in the Azure portal, or run the following command:
az deployment group show \
--name <original-deployment-name> \
--resource-group <resource-group> \
--query properties.outputs
If the original deployment outputs aren't available, fetch each value directly.
To get the AKS cluster resource ID, run the following command:
az aks show \
--name <aks-cluster-name> \
--resource-group <resource-group> \
--query id \
--output tsv
To get the managed identity client ID, run the following command:
az identity show \
--name <managed-identity-name> \
--resource-group <resource-group> \
--query clientId \
--output tsv
To get the cloud deployment ID, open your cloud in the Anyscale console and copy the cloud deployment ID. The same value also appears in the cloudResourceId property of the Anyscale.Platform/clouds/cloudResources/default resource in your tenant.
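A pasted value that is truncated or belongs to the wrong resource usually only surfaces as a failed deployment later, so a quick shape check on the collected values can save a round trip. The following is a minimal sketch, not part of the Anyscale tooling: the function names `is_uuid` and `is_aks_cluster_id` are hypothetical, and the patterns only assume that the client ID is a UUID (as the template's parameter description states) and that the cluster ID follows the usual AKS resource ID layout. The format of the cloud deployment ID isn't documented here, so it isn't checked.

```shell
#!/usr/bin/env bash
# Illustrative shape checks for the Step 1 values (helper names are hypothetical).

# Succeeds if $1 looks like a UUID, the format expected for iamIdentityClientId.
is_uuid() {
  local re='^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$'
  [[ "$1" =~ $re ]]
}

# Succeeds if $1 has the shape of a full AKS cluster resource ID.
# Accepts both "resourceGroups" and "resourcegroups", since the casing of that segment can vary.
is_aks_cluster_id() {
  local re='^/subscriptions/[^/]+/resource[Gg]roups/[^/]+/providers/Microsoft\.ContainerService/managedClusters/[^/]+$'
  [[ "$1" =~ $re ]]
}
```

For example, `is_uuid "$CLIENT_ID" || echo "client ID does not look like a UUID"` flags a bad paste before you build the parameters file.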
Step 2: Save the ARM template
Save the following template as operator-recovery.json:
{
"$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
"contentVersion": "1.0.0.0",
"parameters": {
"aksClusterResourceId": {
"type": "string",
"metadata": {
"description": "The full resource ID of the AKS cluster where the operator extension will be (re)installed."
}
},
"cloudDeploymentId": {
"type": "string",
"metadata": {
"description": "The Anyscale cloud deployment ID (cloudResourceId from the existing Anyscale.Platform/clouds/cloudResources/default resource). Find it in the Anyscale console or in the outputs of the original mainTemplate deployment."
}
},
"iamIdentityClientId": {
"type": "string",
"metadata": {
"description": "The client ID (UUID) of the existing user-assigned managed identity used for workload identity federation with the operator service account."
}
},
"extensionResourceName": {
"type": "string",
"defaultValue": "anyscaleoperator",
"metadata": {
"description": "The name for the Kubernetes extension resource. Must match the name of the previously deployed extension to retain configuration continuity."
}
},
"controlPlaneUrl": {
"type": "string",
"defaultValue": "https://console.azure.anyscale.com",
"metadata": {
"description": "The Anyscale control plane URL."
}
},
"authAudience": {
"type": "string",
"defaultValue": "api://086bc555-6989-4362-ba30-fded273e432b/.default",
"metadata": {
"description": "The Azure AD audience for authentication (format: api://GUID/.default)."
}
},
"releaseTrain": {
"type": "string",
"defaultValue": "stable",
"metadata": {
"description": "The release train for the cluster extension (e.g., stable)."
}
},
"serviceAccountName": {
"type": "string",
"defaultValue": "anyscale-operator",
"metadata": {
"description": "The Kubernetes service account name the operator workloads use. Must match the subject in the existing federated identity credential."
}
},
"extensionTypeName": {
"type": "string",
"defaultValue": "Anyscale.AKS.Operator",
"metadata": {
"description": "The cluster extension type name."
}
},
"planName": {
"type": "string",
"defaultValue": "anyscale-operator",
"metadata": {
"description": "The marketplace plan name."
}
},
"planPublisher": {
"type": "string",
"defaultValue": "anyscale1750870039553",
"metadata": {
"description": "The marketplace plan publisher."
}
},
"planProduct": {
"type": "string",
"defaultValue": "anyscale-operator-aks",
"metadata": {
"description": "The marketplace plan product (offer ID)."
}
},
"autoUpgradeMinorVersion": {
"type": "bool",
"defaultValue": true,
"metadata": {
"description": "Whether the cluster extension should auto-upgrade to new minor versions on the release train."
}
},
"aksExtensionDeploymentName": {
"type": "string",
"defaultValue": "[concat('AksExtensionDeployment-', utcNow('yyyyMMddHHmmss'))]",
"metadata": {
"description": "The name for the AKS extension nested deployment."
}
}
},
"variables": {
"aksResourceGroup": "[split(parameters('aksClusterResourceId'), '/')[4]]"
},
"resources": [
{
"type": "Microsoft.Resources/deployments",
"apiVersion": "2025-04-01",
"name": "[parameters('aksExtensionDeploymentName')]",
"resourceGroup": "[variables('aksResourceGroup')]",
"properties": {
"expressionEvaluationOptions": {
"scope": "Inner"
},
"mode": "Incremental",
"parameters": {
"extensionResourceName": {
"value": "[parameters('extensionResourceName')]"
},
"aksClusterResourceId": {
"value": "[parameters('aksClusterResourceId')]"
},
"planName": {
"value": "[parameters('planName')]"
},
"planPublisher": {
"value": "[parameters('planPublisher')]"
},
"planProduct": {
"value": "[parameters('planProduct')]"
},
"extensionTypeName": {
"value": "[parameters('extensionTypeName')]"
},
"releaseTrain": {
"value": "[parameters('releaseTrain')]"
},
"autoUpgradeMinorVersion": {
"value": "[parameters('autoUpgradeMinorVersion')]"
},
"cloudDeploymentId": {
"value": "[parameters('cloudDeploymentId')]"
},
"controlPlaneUrl": {
"value": "[parameters('controlPlaneUrl')]"
},
"iamIdentityClientId": {
"value": "[parameters('iamIdentityClientId')]"
},
"authAudience": {
"value": "[parameters('authAudience')]"
},
"serviceAccountName": {
"value": "[parameters('serviceAccountName')]"
}
},
"template": {
"$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
"contentVersion": "1.0.0.0",
"parameters": {
"extensionResourceName": {
"type": "string"
},
"aksClusterResourceId": {
"type": "string"
},
"planName": {
"type": "string"
},
"planPublisher": {
"type": "string"
},
"planProduct": {
"type": "string"
},
"extensionTypeName": {
"type": "string"
},
"releaseTrain": {
"type": "string"
},
"autoUpgradeMinorVersion": {
"type": "bool"
},
"cloudDeploymentId": {
"type": "string"
},
"controlPlaneUrl": {
"type": "string"
},
"iamIdentityClientId": {
"type": "string"
},
"authAudience": {
"type": "string"
},
"serviceAccountName": {
"type": "string"
}
},
"resources": [
{
"type": "Microsoft.KubernetesConfiguration/extensions",
"apiVersion": "2025-03-01",
"name": "[parameters('extensionResourceName')]",
"scope": "[parameters('aksClusterResourceId')]",
"plan": {
"name": "[parameters('planName')]",
"publisher": "[parameters('planPublisher')]",
"product": "[parameters('planProduct')]"
},
"properties": {
"extensionType": "[parameters('extensionTypeName')]",
"autoUpgradeMinorVersion": "[parameters('autoUpgradeMinorVersion')]",
"releaseTrain": "[parameters('releaseTrain')]",
"configurationSettings": {
"global.cloudDeploymentId": "[parameters('cloudDeploymentId')]",
"global.controlPlaneURL": "[parameters('controlPlaneUrl')]",
"global.auth.iamIdentity": "[parameters('iamIdentityClientId')]",
"global.auth.audience": "[parameters('authAudience')]",
"workloads.serviceAccount.name": "[parameters('serviceAccountName')]"
}
}
}
]
}
}
}
],
"outputs": {
"extensionResourceName": {
"type": "string",
"value": "[parameters('extensionResourceName')]"
},
"aksClusterResourceId": {
"type": "string",
"value": "[parameters('aksClusterResourceId')]"
}
}
}
The template wraps the cluster extension resource in a nested deployment so the deployment can target the AKS cluster's resource group from any scope. The default values for the marketplace plan, control plane URL, auth audience, release train, and extension resource name match what the original marketplace install used. Leave the defaults in place unless you originally installed with non-default values.
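The nested deployment finds its target resource group through the `aksResourceGroup` variable, which splits the cluster resource ID on `/` and takes element 4. The sketch below reproduces that split in bash so you can see which segment the template picks up. The function name `resource_group_from_id` and the resource ID in the usage note are made up for illustration.

```shell
#!/usr/bin/env bash
# Mirrors the template expression split(parameters('aksClusterResourceId'), '/')[4].
# The ID starts with "/", so element 0 is empty:
#   [0]=""  [1]=subscriptions  [2]=<subscription-id>  [3]=resourceGroups  [4]=<resource-group>
resource_group_from_id() {
  local -a parts
  IFS='/' read -r -a parts <<< "$1"
  printf '%s\n' "${parts[4]}"
}
```

Calling `resource_group_from_id "/subscriptions/sub-1/resourceGroups/my-rg/providers/Microsoft.ContainerService/managedClusters/my-aks"` prints `my-rg`. If the value you pass as `aksClusterResourceId` is missing its leading slash or is not a full resource ID, element 4 is something else and the nested deployment targets the wrong group.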
Step 3: Build the parameters file
Save the following as operator-recovery.parameters.json, replacing the three placeholder values with the ones you collected in Step 1:
{
"$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#",
"contentVersion": "1.0.0.0",
"parameters": {
"aksClusterResourceId": {
"value": "<aks-cluster-resource-id>"
},
"cloudDeploymentId": {
"value": "<cloud-deployment-id>"
},
"iamIdentityClientId": {
"value": "<managed-identity-client-id>"
}
}
}
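If you captured the three values in shell variables, you can generate the parameters file instead of editing it by hand. This is a minimal sketch under a few assumptions: the function name `write_parameters_file` is hypothetical, and the values are written into the JSON without escaping, which is only safe because resource IDs and client IDs don't normally contain quotes or backslashes.

```shell
#!/usr/bin/env bash
# Writes a parameters file with the same structure as operator-recovery.parameters.json.
# Arguments: <aks-cluster-resource-id> <cloud-deployment-id> <managed-identity-client-id> <output-file>
# Values are inserted verbatim; they must not contain double quotes or backslashes.
write_parameters_file() {
  local cluster_id="$1" cloud_deployment_id="$2" client_id="$3" out_file="$4"
  cat > "$out_file" <<EOF
{
  "\$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "aksClusterResourceId": { "value": "${cluster_id}" },
    "cloudDeploymentId": { "value": "${cloud_deployment_id}" },
    "iamIdentityClientId": { "value": "${client_id}" }
  }
}
EOF
}
```

A typical call would be `write_parameters_file "$AKS_ID" "$CLOUD_DEPLOYMENT_ID" "$CLIENT_ID" operator-recovery.parameters.json`, where the three variable names are placeholders for whatever you used in Step 1.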
Step 4: Deploy the template
Run the deployment against the resource group that contains the AKS cluster:
az deployment group create \
--resource-group <resource-group> \
--template-file operator-recovery.json \
--parameters operator-recovery.parameters.json
The deployment creates the cluster extension and waits for it to provision. Provisioning typically takes a few minutes.
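If you want to catch template or parameter mistakes before anything is created, the Azure CLI provides `az deployment group validate`, which takes the same arguments as `az deployment group create`. The wrapper below is a sketch rather than a required step: the function name `validate_recovery_deployment` and the `AZ_BIN` override are hypothetical conveniences, and it assumes validation handles the nested deployment the same way the real deployment does.

```shell
#!/usr/bin/env bash
# AZ_BIN lets you substitute another executable for the Azure CLI (for example in a dry run).
AZ_BIN="${AZ_BIN:-az}"

# Validates the recovery template and parameters file against the given resource group
# without creating any resources. Argument: <resource-group>
validate_recovery_deployment() {
  local resource_group="$1"
  "$AZ_BIN" deployment group validate \
    --resource-group "$resource_group" \
    --template-file operator-recovery.json \
    --parameters operator-recovery.parameters.json
}
```

Run `validate_recovery_deployment <resource-group>` from the directory that holds both files. A validation error here points at the template or parameters rather than at the extension provisioning itself.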
Verification
Confirm the recovery by checking all three of the following signals:
- The cluster extension reports a successful provisioning state. Run the following command:
  az k8s-extension show \
  --cluster-name <aks-cluster-name> \
  --resource-group <resource-group> \
  --cluster-type managedClusters \
  --name anyscaleoperator \
  --query provisioningState \
  --output tsv
  The command returns Succeeded.
- The operator pods are running in the anyscale-operator namespace. Run the following command:
  kubectl get pods -n anyscale-operator
  All pods report a Running status.
- The Anyscale console shows the cloud as connected on the clouds page.
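Because provisioning can take a few minutes, the first check may not return Succeeded right away. One way to avoid rerunning it by hand is a small polling helper, sketched below. The function name `wait_for_value` and its argument order are hypothetical, and the attempt count and delay in the usage comment are arbitrary choices, not recommended values.

```shell
#!/usr/bin/env bash
# Polls a command until it prints the expected value, or gives up.
# Arguments: <expected-value> <max-attempts> <delay-seconds> <command> [args...]
# Returns 0 on a match, 1 if the value never appeared.
wait_for_value() {
  local expected="$1" attempts="$2" delay="$3"
  shift 3
  local i actual=""
  for (( i = 1; i <= attempts; i++ )); do
    # A failing command counts as "no value yet" rather than aborting the loop.
    actual="$("$@" 2>/dev/null)" || actual=""
    if [[ "$actual" == "$expected" ]]; then
      return 0
    fi
    if (( i < attempts )); then
      sleep "$delay"
    fi
  done
  echo "gave up after ${attempts} attempts; last value: ${actual:-<none>}" >&2
  return 1
}

# Example (not executed here): poll the extension every 10 seconds, up to 30 times.
# wait_for_value Succeeded 30 10 az k8s-extension show \
#   --cluster-name <aks-cluster-name> --resource-group <resource-group> \
#   --cluster-type managedClusters --name anyscaleoperator \
#   --query provisioningState --output tsv
```

The helper only compares output text, so it treats a state such as Failed the same as "not ready yet" and keeps polling until the attempts run out. Check the last reported value if it gives up.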