Upgrade a Service

note

Use of Anyscale Services requires Ray 2.3+.

This guide walks through how to upgrade an Anyscale Service. It discusses:

  • Steps that occur during an upgrade.
  • Settings that can be configured for an upgrade.
  • Best practices when managing an upgrade.

Performing a rollout

To upgrade a service, run the anyscale service rollout -f [UPDATED CONFIG FILE] command. This command triggers an automatic rollout that provisions a new cluster for your new service version. Once the cluster and Serve application become healthy, Anyscale gradually shifts traffic to the new cluster. After all traffic is shifted, Anyscale terminates the old cluster.

If no additional rollout options are set, Anyscale increases the traffic sent to the new service version (known as the canary percent) in a predefined pattern.

On Amazon Web Services (AWS), the canary percent progression is:

0 -> 5 -> 10 -> 20 -> 30 -> 50 -> 75 -> 100

It takes approximately 2 minutes to complete an automatic rollout from the moment the new version becomes healthy.

On Google Cloud Platform (GCP), the canary percent progression is:

0 -> 10 -> 50 -> 100

It takes approximately 8 minutes to complete an automatic rollout from the moment the new version becomes healthy. The time to roll out a service on GCP is extended due to additional health checks made by GCP on the load balancing resources. For additional details, refer to the appendix.

note
  • If at any point the new version becomes unhealthy, the new version will be rolled back and all of the traffic will be routed to the original version.
  • To debug a canary version, we recommend deploying a separate test service or utilizing an Anyscale workspace.
  • Rollouts that don't complete in 120 minutes are automatically rolled back.

Example

This example upgrades a service to show a different message when queried.

First, deploy a running service by following the Deploy your first Service guide.

Update the SERVE_RESPONSE_MESSAGE environment variable in service-v2.yaml, then perform a new rollout using the anyscale service rollout CLI command.

Anyscale uses the name attribute to determine which service to upgrade.

caution

You cannot change the Anyscale Cloud your service runs in once it has been deployed.

  name: service-hello
  cluster_env: default_cluster_env_2.9.0_py310:1
  ray_serve_config:
    applications:
      - name: default
        import_path: serve_hello:entrypoint
        runtime_env:
          working_dir: https://github.com/anyscale/docs_examples/archive/refs/heads/main.zip
          env_vars:
-           SERVE_RESPONSE_MESSAGE: service says hello
+           SERVE_RESPONSE_MESSAGE: service was rolled out using an automatic rollout

Anyscale will roll out a new version of your service. Once the new version is healthy, traffic will incrementally shift to the new version. After the rollout is complete, the previous version will be terminated.

After your rollout is complete, query the service to see the new message.

Handling resource constraints

By default, rollouts start a full copy of the new application on a separate cluster and then gradually shift traffic.

However, some applications may have limited access to hardware, especially those using scarce resources like A100 or H100 GPUs. In these cases, running two full-sized clusters may not be possible.

Setting --max-surge-percent in the anyscale service rollout command limits the total number of replicas running across both clusters. This in turn limits the number of nodes needed for the rollout, allowing it to proceed without using two full-sized clusters. For example, setting --max-surge-percent=50 makes Anyscale:

  • Provision a new cluster
  • Start the new application on the cluster, but disable all replicas
  • Until all traffic is sent to the new cluster:
    • Start 50% of the replicas on the new cluster
    • If necessary, start nodes on the new cluster to run the new replicas
    • Shift 50% of the traffic to the new cluster
    • Stop 50% of the replicas on the old cluster
    • Remove empty nodes from the old cluster
    • Repeat
  • Terminate the old cluster

At any point in time, at most 100% of replicas are active on one cluster and 50% on the other. On the other hand, when max-surge-percent is unspecified, Anyscale keeps 100% of replicas active on both clusters throughout the rollout.

Anyscale ensures that the percent of traffic sent to a cluster never exceeds the percent of replicas active on that cluster. Anyscale also ensures that the total percentage of replicas across both clusters never exceeds 100 + max-surge-percent (such as 150% in the above example).
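The loop and the two guarantees above can be modeled in a few lines of Python. This is a simplified sketch of the behavior described in this section, not Anyscale's implementation: the function name simulate_surge_rollout, the accepted range for the surge value, and the percentage-based bookkeeping are assumptions made for illustration, and the model tracks percentages rather than real replicas or nodes.

```python
def simulate_surge_rollout(max_surge_percent):
    """Model a rollout as (old_replicas_pct, new_replicas_pct, new_traffic_pct) states.

    Hypothetical helper for illustration only; it is not part of any Anyscale API.
    """
    # The accepted range is an assumption of this sketch.
    if not 0 < max_surge_percent <= 100:
        raise ValueError("max_surge_percent must be between 1 and 100")

    old, new, traffic = 100, 0, 0
    states = [(old, new, traffic)]
    while traffic < 100:
        step = min(max_surge_percent, 100 - new)
        new += step       # start replicas (and nodes, if needed) on the new cluster
        states.append((old, new, traffic))
        traffic += step   # shift the same share of traffic to the new cluster
        states.append((old, new, traffic))
        old -= step       # stop the replicas the old cluster no longer needs
        states.append((old, new, traffic))
    return states


states = simulate_surge_rollout(50)
for old, new, traffic in states:
    print(f"old replicas {old:>3}% | new replicas {new:>3}% | traffic to new {traffic:>3}%")
```

In this model with a surge of 50, the combined replica percentage peaks at 150%, and the traffic sent to each cluster never exceeds that cluster's share of active replicas.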

max-surge-percent can be updated during a rollout by running anyscale service rollout again with a different max-surge-percent value, similar to updating a canary-percent.

Important: Set Min worker nodes in the compute config to 0, even if no deployments use Serve autoscaling. This lets the cluster scale the number of nodes as replicas are activated and deactivated.

note
  • Ray Serve rounds up when calculating the number of replicas to run. For example, if a deployment uses 7 replicas, and 50% are activated, then Serve runs 4 replicas.
  • max-surge-percent and canary-percent can be set at the same time.
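The rounding rule in the note can be checked with a short Python sketch. The helper name active_replicas is hypothetical and only reproduces the arithmetic described above; it doesn't call Ray Serve.

```python
def active_replicas(num_replicas: int, percent: int) -> int:
    """Replicas running when `percent` of `num_replicas` are activated, rounding up."""
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-num_replicas * percent // 100)


print(active_replicas(7, 50))  # 4, matching the 7-replica example in the note
```

Because of the round-up, a deployment with a single replica still runs one replica at any percentage above zero in this model.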

Autoscaling

The percentages are applied to all deployments across all applications in the Serve config. For autoscaling deployments, the percentages are applied to the min_replicas and max_replicas bounds, and autoscaling behaves as usual between these adjusted bounds.

For the new cluster, the percentages are also applied to initial_replicas. This adjusted initial_replicas value is the floor for the number of replicas until the rollout is finished. For the old cluster, initial_replicas is ignored.
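As a rough illustration, the following Python sketch applies a percentage to a deployment's autoscaling bounds. The names AutoscalingBounds and adjusted_bounds are hypothetical, and the sketch assumes that the bounds are rounded up in the same way as replica counts, which this guide doesn't state explicitly.

```python
from dataclasses import dataclass
from typing import Optional


def scale_up(value: int, percent: int) -> int:
    """Apply a percentage to a replica count, rounding up (assumed rounding rule)."""
    return -(-value * percent // 100)


@dataclass
class AutoscalingBounds:
    min_replicas: int
    max_replicas: int
    initial_replicas: Optional[int] = None


def adjusted_bounds(bounds: AutoscalingBounds, percent: int, new_cluster: bool) -> AutoscalingBounds:
    """Return the bounds that autoscaling works within at a given percentage."""
    initial = None
    # initial_replicas only applies to the new cluster; the old cluster ignores it.
    if new_cluster and bounds.initial_replicas is not None:
        initial = scale_up(bounds.initial_replicas, percent)
    return AutoscalingBounds(
        min_replicas=scale_up(bounds.min_replicas, percent),
        max_replicas=scale_up(bounds.max_replicas, percent),
        initial_replicas=initial,
    )


configured = AutoscalingBounds(min_replicas=2, max_replicas=7, initial_replicas=4)
print(adjusted_bounds(configured, 50, new_cluster=True))
print(adjusted_bounds(configured, 50, new_cluster=False))
```

Under these assumptions, at 50% the new cluster autoscales between 1 and 4 replicas with a floor of 2 from the adjusted initial_replicas, while the old cluster uses only the adjusted min and max.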

Rollbacks

During rollback, the service still obeys max-surge-percent. The number of replicas and the percent of traffic are gradually shifted back to the original cluster.

The max-surge-percent during a rollback can be updated by setting --max-surge-percent through the anyscale service rollback command.

note

If the new service version becomes unhealthy during a rollout when max-surge-percent is set but canary-percent is not set, Anyscale rolls the service back immediately. 100% of traffic is routed to the old cluster, all replicas are activated on the old cluster, and all replicas are deactivated on the new cluster. These steps happen concurrently, so there may be a brief period when requests fail. This process ensures that the service returns to a stable state as quickly as possible.

Setting canary percent

The anyscale service rollout command offers a --canary-percent flag, which sets the portion of traffic directed to the new version. When this flag is set, Anyscale pauses the rollout after shifting this amount of traffic to the new cluster. You must then manually increase the canary-percent by rerunning the anyscale service rollout command with higher --canary-percent values.

This flag grants full control over the speed of the rollout. As a result, when this flag is set, Anyscale doesn't perform automatic rollback if the new service becomes unhealthy. You must monitor the service and roll back manually, either by setting a lower canary percent or by running the anyscale service rollback command.

note
  • If the canary version becomes UNHEALTHY before reaching a RUNNING state, the service version will not roll out and the service will remain at 0% canary deployment. This safeguard prevents rolling out to an unhealthy service version.
  • Setting the canary percentage to 100 automatically completes the rollout. To prevent completing the rollout, include the --no-auto-complete-rollout flag in the CLI.

We recommend setting --canary-percent for:

  1. A/B testing between different versions
  2. Fine-grained control over the speed of the rollout

Example

This example upgrades a service with the --canary-percent flag.

First, deploy a running service by following the Deploy your first Service guide.

Let's create a new version and manually set the canary percent to 50%. First update the SERVE_RESPONSE_MESSAGE environment variable in service-v2.yaml, then perform a new rollout using the anyscale service rollout CLI command with the flag --canary-percent 50.

Anyscale uses the name attribute to determine which service to upgrade.

caution

You cannot change the Anyscale Cloud your service runs in once it has been deployed.

  name: service-hello
  cluster_env: default_cluster_env_2.9.0_py310:1
  ray_serve_config:
    applications:
      - name: default
        import_path: serve_hello:entrypoint
        runtime_env:
          working_dir: https://github.com/anyscale/docs_examples/archive/refs/heads/main.zip
          env_vars:
-           SERVE_RESPONSE_MESSAGE: service says hello
+           SERVE_RESPONSE_MESSAGE: service was rolled out using a manual rollout

Anyscale will roll out a new version of your service. Once the new version is healthy, 50% of the traffic will shift to the new version.

The new message should appear roughly half of the time when querying your service.

Modify a service without starting a new cluster

Anyscale offers the --rollout-strategy=IN_PLACE option to modify the Ray Serve config of a running service without starting a new cluster. In-place upgrades don't automatically roll back on failure, which makes them riskier than rolling upgrades.

We recommend in-place upgrades when:

  1. Changing the autoscaling for Serve deployments
  2. Changing a deployment's user_config
  3. Iterating on your application in development

You cannot set --max-surge-percent or --canary-percent for in-place upgrades.
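If you script rollouts, a small wrapper can catch this invalid combination before calling the CLI. The sketch below only assembles an argument list from flags named in this guide; the helper build_rollout_command is hypothetical, and it doesn't run the anyscale command or validate anything beyond the rule above.

```python
from typing import List, Optional


def build_rollout_command(
    config_file: str,
    canary_percent: Optional[int] = None,
    max_surge_percent: Optional[int] = None,
    in_place: bool = False,
) -> List[str]:
    """Build the argument list for `anyscale service rollout` (hypothetical helper)."""
    if in_place and (canary_percent is not None or max_surge_percent is not None):
        raise ValueError(
            "--canary-percent and --max-surge-percent can't be set for in-place upgrades"
        )
    command = ["anyscale", "service", "rollout", "-f", config_file]
    if in_place:
        command += ["--rollout-strategy", "IN_PLACE"]
    if canary_percent is not None:
        command += ["--canary-percent", str(canary_percent)]
    if max_surge_percent is not None:
        command.append(f"--max-surge-percent={max_surge_percent}")
    return command


print(" ".join(build_rollout_command("service-v2.yaml", in_place=True)))
```

The returned list can be passed to subprocess.run to execute the rollout. Building a list instead of a single string avoids shell quoting issues with file paths.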

Best practices

If using in-place upgrades to modify your code, we recommend the following best practices:

  1. Validate code changes comprehensively before applying the in-place upgrade
  2. Ensure each Ray Serve deployment runs more than one replica, spread across more than one node, to avoid downtime
  3. If your service becomes unhealthy during an in-place upgrade, use a rolling upgrade to revert to a previous version

caution

You cannot change the cloud, cluster_env, or compute_config of your service with an in-place upgrade. An in-place upgrade fails if any of these values differ from the original Service Version.

Example

This example upgrades a service to show a different message when queried.

First, deploy a running service by following the Deploy your first Service guide.

Update the SERVE_RESPONSE_MESSAGE environment variable in service-v2.yaml, then perform a new rollout using the anyscale service rollout CLI command with the flag --rollout-strategy IN_PLACE.

Anyscale uses the name attribute to determine which service to upgrade.

  name: service-hello
  cluster_env: default_cluster_env_2.9.0_py310:1
  ray_serve_config:
    applications:
      - name: default
        import_path: serve_hello:entrypoint
        runtime_env:
          working_dir: https://github.com/anyscale/docs_examples/archive/refs/heads/main.zip
          env_vars:
-           SERVE_RESPONSE_MESSAGE: service says hello
+           SERVE_RESPONSE_MESSAGE: service was upgraded using an in_place upgrade

Anyscale will immediately update the service to the new version, and the new application will be deployed in the same cluster. After applying the in-place upgrade, the new message will appear when querying the service.

Pause the rollout before completion

By default, once a rollout reaches a canary percent of 100, Anyscale completes the rollout by terminating the old cluster and sending all traffic to the new cluster.

Include the --no-auto-complete-rollout flag in the anyscale service rollout command to stop the rollout from completing. When this flag is set, Anyscale pauses the rollout at canary percent 100 and waits for an anyscale service rollout command without the --no-auto-complete-rollout flag to complete the rollout.

Appendix

Where can I learn more about Ray Serve?

https://docs.ray.io/en/latest/serve/index.html#

How does Ray Serve implement an in-place upgrade?

Ray Serve implements in-place upgrades through two different methods: lightweight upgrades and heavyweight upgrades.

Lightweight upgrades: Lightweight upgrades involve making changes such as modifying the number of replicas, updating the user configuration, or adjusting autoscaling functionality. These upgrades are more robust as they do not require tearing down old replicas.

Heavyweight upgrades: Heavyweight upgrades involve modifying other aspects of the Ray Serve configuration. This process requires creating new replicas while concurrently tearing down the old ones. During a heavyweight upgrade, traffic is evenly distributed between the new and old replicas. It is important to exercise caution when performing heavyweight upgrades, especially in production services.

During a heavyweight upgrade, the old replicas are torn down concurrently with the creation of new replicas until the service is fully upgraded. By default, Ray Serve performs the upgrade in batches, replacing 20% of the total replicas at a time with the new version. Note that this process doesn't guarantee that the number of running replicas is enough to handle the traffic during the upgrade.

Where can I learn more about in-place upgrades?

You can find more in Updating the Serve application in the Ray docs.

How can I disable completing the rollout?

To prevent the completion of the rollout, use the --no-auto-complete-rollout flag in the CLI. When the canary percentage reaches 100%, this flag will prevent the termination of the primary version cluster. This flag can be used for either automatic or manual rollouts.

What happens if the new replicas become unhealthy during an in-place upgrade?

If the new replicas become unhealthy during the upgrade, the Serve controller halts the upgrade. The service becomes unhealthy, but it may still be able to serve traffic if old replicas remain. If the service was deployed with only one replica, it becomes completely unavailable because the healthy replica was removed. The service stays unhealthy until you take remedial action, either by redeploying the old version in place or by rolling back to the previous version.

Why does a rollout on GCP take longer than AWS?

On GCP, it takes 1 minute to update the URL map for a load balancing resource. However, as a safeguard, Anyscale waits 90 seconds before making any following updates. For 4 increments in the rollout progression, the total time required to update the resources is 6 minutes. Additionally, other operations such as health checks contribute another 2 minutes during the rollout.
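The arithmetic behind the 8-minute estimate can be written out as a short Python sketch. The constant names are illustrative, and the values are the ones quoted in the answer above.

```python
WAIT_BETWEEN_UPDATES_S = 90   # safeguard wait before each following URL map update
INCREMENTS = 4                # increments in the GCP rollout progression
OTHER_OPERATIONS_S = 2 * 60   # health checks and other operations

url_map_time_s = INCREMENTS * WAIT_BETWEEN_UPDATES_S
total_s = url_map_time_s + OTHER_OPERATIONS_S

print(f"URL map updates: {url_map_time_s // 60} minutes")  # 6 minutes
print(f"Total rollout:   {total_s // 60} minutes")         # 8 minutes
```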