Skip to main content

Update an Anyscale service

This page provides an overview of the features Anyscale provides for updating services in production.

Deploy an update to a service

Deploying a service that matches the name of an existing service, triggers an update.

anyscale service deploy --config-file <config-file> -n <service-name>
note

Service names are unique within an Anyscale project. You can have multiple services with the same name in an Anyscale cloud or organization.

Anyscale recommends always identifying your service name, cloud, and project in the service config.

See Configure and deploy Anyscale services.

How does Anyscale update services?

Anyscale updates all services using a zero-downtime rollout strategy by default, gradually transitioning traffic to the new version. The following table defines the concepts Anyscale uses to describe the update process:

ConceptDefinition
RolloutA rollout is the process Anyscale uses to incrementally update a service in production.

During a rollout, your service has two versions of your Ray Serve applications running on different clusters. Anyscale gradually shifts traffic from the old version of your service to the new version.
Primary versionThe primary version is the current version of a service you have in production.

All running services have a primary version.
Canary versionThe canary version is the version of the service you are deploying.

A canary version only exists while a rollout is in progress.

When you trigger an update, the following happens:

  1. Anyscale starts a new cluster to run the canary version.
  2. Anyscale monitors the canary version and waits for all your Ray Serve applications to become healthy.
  3. Anyscale begins to gradually shift traffic from your primary version to the canary version.
  4. Anyscale terminates the primary version once all traffic has migrated to the canary version.
  5. The canary version becomes the primary version.

If the canary version fails to start or becomes unhealthy during the rollout, Anyscale automatically performs a rollback for the service and sends all traffic to the primary version. By default, rollouts that don't complete within 120 minutes timeout and rollback. Contact Anyscale support for help troubleshooting rollouts.

note

When you manually control a service rollout, this overrides default rollout and rollback behavior. See Manually-controlled rollouts.

You can manually rollback a service during a rollout using the anyscale service rollback command.

Query a specific version during a rollout

note

This feature is available by default for Anyscale clouds on Google Cloud. To use this feature with Anyscale clouds on AWS, contact Anyscale support.

During a rollout, Anyscale automatically splits requests to the service between the primary and canary versions. You can route requests to a specific version during a rollout by adding the X-ANYSCALE-VERSION HTTP header. Set the header value to primary to route to the primary version or canary to route to the canary version, as in the following examples:

curl -H "Authorization: Bearer <bearer-token>" -H "X-ANYSCALE-VERSION: primary"  <service-url>

curl -H "Authorization: Bearer <bearer-token>" -H "X-ANYSCALE-VERSION: canary" <service-url>

Update a service with constrained resources

By default, Anyscale fully duplicates your service cluster during a rollout. For large service clusters or services that use difficult to acquire instances such as GPUs, this might not be possible due to high costs or instance availability. To overcome these limitations, use the --max-surge-percent option to specify a percentage of your total service resources that Anyscale can duplicate during an incremental rollout, as in the following syntax example:

anyscale service deploy --config-file /path/to/service-config.yaml --max-surge-percent=25

This option limits the total amount of Ray Serve replicas running across the two clusters as a percentage of their steady-state capacity, reducing the total quantity of hardware needed for the update.

note

You should always enable worker group autoscaling with min_nodes=0 in your compute config when using --max-surge-percentage. This ensures that the canary version only add worker nodes as necessary and your primary version can remove all worker nodes as replicas incrementally migrate.

For example, configuring --max-surge-percent=25 has the following outcome:

  1. Anyscale starts a canary version cluster.
  2. Anyscale deploys 25% of the replicas for all applications and deployments in your service.
    • Anyscale deploys at least one replica for all deployments in your applications.
    • If replicas in your workload require specific compute instances, Anyscale deploys these instances.
  3. Anyscale shifts 25% of the traffic to the canary version.
  4. Anyscale reduces the capacity of the primary version by 25%.

This continues until all traffic has shifted to the canary version and the primary version can terminate, as shown in the following table:

Canary version replicasCanary version trafficPrimary version replicasPrimary version traffic
00100100
250100100
25257575
50257575
50505050
75505050
75752525
100752525
10010000

Anyscale ensures that the percentage of traffic sent to a cluster never exceeds its current capacity and ensures that the total capacity across both clusters never exceeds 100 + max-surge-percent, in this example 125%.

How do constrained rollouts interact with replica scaling configurations?

If you use static replica configurations, Anyscale rounds up when calculating the number of replicas to run in the canary version. For example, if you configure a deployment to run seven replicas at full capacity, then four replicas run at 50% capacity.

note

Until 100% of replicas and traffic migrate, at least one replica for each deployment must run in both your primary and canary version.

The percentages apply to all deployments across all applications in your service. For autoscaling deployments, the percentages adjust the min_replicas and max_replicas bounds, and autoscaling behaves as usual within these adjusted bounds.

For the canary cluster, the percentage also applies to initial_replicas. The adjusted initial_replicas value is the floor for the number of replicas until the rollout finishes.

Rollbacks with constrained rollouts

When a constrained rollout fails and a rollback triggers, Anyscale attempts to return the service to a stable state as quickly as possible.

In a normal incremental rollout, Anyscale doesn't remove replicas from the primary version until all traffic has successfully migrated to the canary version. In a constrained rollout, because Anyscale incrementally removes resources as replicas migrate to the canary version, the primary version might not have sufficient replicas or worker nodes to handle all traffic during a rollback. When a rollback triggers on a constrained rollout, the following rollback behaviors trigger simultaneously:

  • Anyscale routes all traffic to the primary version.
  • Anyscale activates all replicas on the primary version, adding worker nodes as required.
  • Anyscale deactivates all replicas on the canary version and terminates the cluster.

Requests might fail during this rollback, and might fail at a higher percentage if most replicas had already migrated to the canary version. Anyscale recommends configuring retries and timeouts to improve resilience to request failures. See Manage timeouts and retries for Anyscale services.

note

If you run a manual rollback with the anyscale service rollback command, you can specify the max-surge-percent to gradually shift capacity back to the original cluster.

Manually-controlled rollouts

By default, rollouts proceed automatically and shift traffic using a predefined pattern. To manually control the rollout instead, pass the --canary-percent option to anyscale service deploy. The traffic progression pauses at the specified traffic split until you run another command. To complete the rollout, set --canary-percent=100.

note

You can use --canary-percent with --max-surge-percent to manually control a gradual rollout with constrained resources.

When you use manual rollouts, you must also use manual rollbacks. Setting --canary-percent disables automatic rollback behavior.

Update a service in-place

You can override default behavior for incremental rollouts by using the --in-place option. This option replaces your existing service with the new version. When possible, this update reuses your existing cluster and applies updates without needing to acquire new instances from the cloud provider or restart the cluster.

caution

Anyscale recommends against updating production services in-place. Anyscale supports this feature to enable rapid iteration in development.

Ray Serve implements in-place updates through two different methods: lightweight updates and heavyweight updates.

  • Lightweight updates involve changing configuration options that don't affect the replica code such as the number of replicas, autoscaling configuration, or logging configuration. Changing these options doesn't require stopping old replicas and starting new ones.
  • Heavyweight updates involve modifying other aspects of the Ray Serve configuration that affect the replica code. These updates require stopping old replicas and starting new ones.
note

You can't update the cloud, image, or compute config of a service using an in-place update.

You can't specify --canary-percent and --max-surge-percent for in-place updates.

See Ray Serve in-place update for more details.