Update a service
This guide covers how to update an Anyscale Service, including:
- Triggering an update to a service.
- Understanding the steps of the update process.
- Configuring the update behavior.
- Applying best practices when updating production services.
Terminology
- Service: an Anyscale-managed set of Ray Serve apps that runs on one or more clusters.
- Rollout: the process of updating a service from its current (primary) version to a new (canary) version.
- Primary version: the main cluster serving traffic for a service with the current configuration options. There is only a primary version in the absence of an ongoing rollout.
- Canary version: a cluster using the new configuration during a rollout. Anyscale gradually shifts traffic toward the canary version until it reaches 100%, at which point the rollout is complete and the canary version becomes the new primary version.
Update a service
To update a service, use anyscale service deploy. If the service is already running (identified by its name), this command triggers an automatic rollout that does the following:
- Starts a new cluster for the newly deployed service version. The new version is called the canary version.
- Waits for the new cluster and Serve apps to become healthy.
- Gradually shifts traffic from the old cluster, the primary version, to the new cluster, the canary version.
- Completes the rollout and terminates the old cluster.
Manually controlled rollouts
By default, rollouts proceed automatically and shift traffic using a predefined pattern. To manually control the rollout instead, pass the --canary-percent option to anyscale service deploy. The traffic progression pauses at the specified traffic split until you run another command. To complete the rollout, set --canary-percent=100.
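The staged flow described above can be sketched as a small Python helper that builds the successive commands. This is only an illustration, not a prescribed workflow: the 10% and 50% stages and the extra_args parameter (a placeholder for whatever arguments identify your service and its configuration) are assumptions. Only the --canary-percent option comes from this guide.

```python
import shlex


def deploy_command(canary_percent, extra_args=()):
    """Build an `anyscale service deploy` argument list that pauses at canary_percent.

    extra_args is a hypothetical placeholder for the arguments that identify
    your service and its configuration; this guide doesn't define them.
    """
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    return ["anyscale", "service", "deploy", *extra_args,
            f"--canary-percent={canary_percent}"]


# Hypothetical stages: pause at 10% and 50%, then complete the rollout at 100%.
STAGES = (10, 50, 100)

for percent in STAGES:
    # Only print each command here. Run it (for example with subprocess.run)
    # after checking that the canary is healthy at the previous stage.
    print(shlex.join(deploy_command(percent)))
```

Printing instead of executing keeps the sketch safe to run anywhere; each pause between stages is where you would inspect the canary before continuing.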
Handle failures and roll back
If the canary version fails to start or becomes unhealthy during the rollout, Anyscale rolls back the service and sends all traffic to the primary version.
Manually roll back a service during a rollout using the anyscale service rollback command.
Rollouts that don't complete within 120 minutes are automatically rolled back.
Advanced features
Resource-constrained updates
For applications constrained by hardware availability, especially those using high-demand resources like A100 or H100 GPUs, running multiple full-scale clusters may not be feasible.
By default, rollouts start a full copy of the new app on the canary cluster and then gradually shift traffic to it. To limit the resources used during a rollout, pass the --max-surge-percent option to anyscale service deploy. This option limits the total number of Ray Serve replicas running across the two clusters as a percentage of their steady-state capacity, reducing the total quantity of hardware needed for the update.
For example, setting --max-surge-percent=50 does the following:
- Starts the canary version cluster
- Starts the applications on the canary version, but only at 50% capacity
- Shifts 50% of the traffic to the canary version
- Reduces the capacity of the primary version to 50%
- Increases the capacity of the canary version to 100%
- Shifts the remaining 50% of the traffic to the canary version
- Completes the rollout
At any point in time, at most 100% of total capacity is active on one cluster and 50% on the other, totalling 150%. In contrast, if you don't specify --max-surge-percent, Anyscale allocates 100% capacity to both clusters, totalling 200%.
Anyscale ensures that the percentage of traffic sent to a cluster never exceeds its current capacity and ensures that the total capacity across both clusters never exceeds 100 + max-surge-percent.
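The two guarantees above can be checked against the documented 50% example with a short simulation. This is a sketch of the sequence as this guide describes it, not Anyscale's implementation; the Step class, the field names, and the choice to model the terminated primary cluster as 0% capacity are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    description: str
    primary_capacity: int  # percent of steady-state capacity
    canary_capacity: int   # percent of steady-state capacity
    canary_traffic: int    # percent of traffic; the primary receives the rest


MAX_SURGE_PERCENT = 50

# The sequence described in this guide for --max-surge-percent=50.
STEPS = [
    Step("start canary cluster", 100, 0, 0),
    Step("start canary apps at 50% capacity", 100, 50, 0),
    Step("shift 50% of traffic to canary", 100, 50, 50),
    Step("reduce primary capacity to 50%", 50, 50, 50),
    Step("increase canary capacity to 100%", 50, 100, 50),
    Step("shift remaining traffic to canary", 50, 100, 100),
    Step("complete rollout", 0, 100, 100),  # assumption: old cluster counts as 0%
]


def check_invariants(steps, max_surge_percent):
    """Assert both guarantees at every step and return the peak total capacity."""
    for step in steps:
        primary_traffic = 100 - step.canary_traffic
        # Traffic sent to a cluster never exceeds that cluster's capacity.
        assert step.canary_traffic <= step.canary_capacity, step.description
        assert primary_traffic <= step.primary_capacity, step.description
        # Total capacity never exceeds 100 + max-surge-percent.
        total = step.primary_capacity + step.canary_capacity
        assert total <= 100 + max_surge_percent, step.description
    return max(s.primary_capacity + s.canary_capacity for s in steps)


peak = check_invariants(STEPS, MAX_SURGE_PERCENT)
print(f"peak total capacity: {peak}%")
```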
If you use a custom ComputeConfig, make sure to set min_nodes=0 for worker node groups. Otherwise, the cluster can't scale all the way down.
Autoscaling
Ray Serve rounds up when calculating the number of replicas to run. For example, if you configure a deployment to run 7 replicas at full capacity, then at 50% capacity, Ray Serve runs 4 replicas.
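The rounding rule can be reproduced with a one-line calculation. The function name replicas_at_capacity is hypothetical, and the sketch assumes whole-number percentages.

```python
def replicas_at_capacity(full_replicas, capacity_percent):
    """Round up using integer arithmetic, which avoids floating-point surprises."""
    return (full_replicas * capacity_percent + 99) // 100


# The example from this guide: 7 replicas at 50% capacity.
print(replicas_at_capacity(7, 50))
```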
The percentages apply to all deployments across all applications in the Ray Serve config. For autoscaling deployments, the percentages adjust the min_replicas and max_replicas bounds, and autoscaling behaves as usual within these adjusted bounds.
For the new cluster, the percentages are also applied to initial_replicas. This adjusted initial_replicas value is the floor for the number of replicas until the rollout finishes. For the old cluster, Ray Serve ignores initial_replicas.
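As a rough illustration of how the autoscaling bounds change, the following sketch scales a deployment's settings by a capacity percentage. It assumes that the bounds round up in the same way as replica counts, which this guide states only for the number of replicas to run; the function names and the dictionary layout are hypothetical.

```python
def scale(value, percent):
    """Scale a replica count by a whole-number percentage, rounding up."""
    return (value * percent + 99) // 100


def adjust_autoscaling(config, percent, is_canary):
    """Return the autoscaling settings in effect at the given capacity percent.

    Assumption: min_replicas and max_replicas round up like replica counts do.
    """
    adjusted = {
        "min_replicas": scale(config["min_replicas"], percent),
        "max_replicas": scale(config["max_replicas"], percent),
    }
    # Only the new (canary) cluster uses initial_replicas; the old cluster ignores it.
    if is_canary and "initial_replicas" in config:
        adjusted["initial_replicas"] = scale(config["initial_replicas"], percent)
    return adjusted


config = {"min_replicas": 2, "max_replicas": 7, "initial_replicas": 3}
print(adjust_autoscaling(config, 50, is_canary=True))
print(adjust_autoscaling(config, 50, is_canary=False))
```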
Rollbacks
Services adhere to --max-surge-percent during rollbacks. The capacity and percentage of traffic gradually shift back to the original cluster.
To change the surge limit during a rollback, pass --max-surge-percent to the anyscale service rollback command.
If the new service version becomes unhealthy during a rollout and you set --max-surge-percent but not --canary-percent, Anyscale rolls the service back immediately. Anyscale routes 100% of the traffic to the old cluster, activates all replicas on the old cluster, and deactivates all replicas on the new cluster. These steps happen concurrently, so there may be a brief period when requests fail. This process ensures that the service returns to a stable state as quickly as possible.
In-place updates
In-place updates are not recommended for production use. Use them primarily for rapid iteration in development.
By default, updating a service performs a rollout that starts a new cluster and shifts traffic to it. However, this process can be slow due to the time needed to provision new nodes and start the app. During development or when updating Ray Serve configuration options only, it's sometimes preferable to instead update the configuration of the primary version in place.
To perform an in-place update, pass the --in-place option to anyscale service deploy.
This command sends the updated configuration to Ray Serve, which updates the apps on the cluster to match the new configuration.
You can't update the cloud, image, or compute_config of a service using an in-place update.
Additionally, you can't specify --canary-percent or --max-surge-percent for in-place updates.
Update behavior
Ray Serve implements in-place updates through two different methods: lightweight updates and heavyweight updates.
- Lightweight updates involve changing configuration options that don't affect the replica code such as the number of replicas, autoscaling configuration, or logging configuration. Changing these options doesn't require stopping old replicas and starting new ones.
- Heavyweight updates involve modifying other aspects of the Ray Serve configuration that affect the replica code. These updates require stopping old replicas and starting new ones.
See Ray Serve in-place update for more details.
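The distinction between the two update types can be sketched as a comparison of old and new deployment options. This is a simplified illustration rather than Ray Serve's actual logic: the set of lightweight option names below is an assumed, non-exhaustive list based on the examples in this guide, and classify_update is a hypothetical helper.

```python
# Assumed, non-exhaustive set of options that don't affect the replica code.
LIGHTWEIGHT_KEYS = {"num_replicas", "autoscaling_config", "logging_config"}


def classify_update(old, new):
    """Classify a change between two option dictionaries.

    Returns "no-op", "lightweight", or "heavyweight".
    """
    changed = {key for key in old.keys() | new.keys() if old.get(key) != new.get(key)}
    if not changed:
        return "no-op"
    # Any change outside the lightweight set is treated as needing new replicas.
    return "lightweight" if changed <= LIGHTWEIGHT_KEYS else "heavyweight"


# Placeholder values for illustration only.
old = {"num_replicas": 2, "runtime_env": {"working_dir": "v1"}}

print(classify_update(old, {**old, "num_replicas": 4}))
print(classify_update(old, {**old, "runtime_env": {"working_dir": "v2"}}))
```

Treating every unknown option as heavyweight is a deliberately conservative choice: it errs toward expecting replica restarts.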
Querying a specific version during a rollout
During a rollout, requests to the service are split between the primary and canary versions based on the current canary percent.
curl -H "Authorization: Bearer xxxx" https://sample-service.cld.s.anyscaleuserdata.com/
To route to a specific version during a rollout, use the X-ANYSCALE-VERSION HTTP header. Set the header value to primary to route to the primary version or canary to route to the canary version. The X-ANYSCALE-VERSION HTTP header is case sensitive.
This feature is available for all Services created on GCP clouds. To use this feature on AWS clouds, reach out to Anyscale support.
curl -H "Authorization: Bearer xxxx" -H "X-ANYSCALE-VERSION: primary" https://sample-service.cld.s.anyscaleuserdata.com/
curl -H "Authorization: Bearer xxxx" -H "X-ANYSCALE-VERSION: canary" https://sample-service.cld.s.anyscaleuserdata.com/
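The same requests can be made from Python. The sketch below uses the standard library's http.client, which sends header names exactly as written; that matters here because the header is case sensitive. The helper names build_headers and query are hypothetical, and the token and URL are the same placeholders used in the curl examples.

```python
import http.client
from urllib.parse import urlsplit

VERSION_HEADER = "X-ANYSCALE-VERSION"
VALID_VERSIONS = ("primary", "canary")


def build_headers(token, version=None):
    """Return request headers, optionally pinning the request to one version."""
    headers = {"Authorization": f"Bearer {token}"}
    if version is not None:
        if version not in VALID_VERSIONS:
            raise ValueError(f"version must be one of {VALID_VERSIONS}")
        headers[VERSION_HEADER] = version
    return headers


def query(url, token, version=None):
    """Send a GET request to the service and return (status, body)."""
    parts = urlsplit(url)
    conn = http.client.HTTPSConnection(parts.netloc, timeout=30)
    try:
        conn.request("GET", parts.path or "/", headers=build_headers(token, version))
        response = conn.getresponse()
        return response.status, response.read()
    finally:
        conn.close()


# Show the headers for a canary-pinned request without contacting the service.
print(build_headers("xxxx", "canary"))
```

To send a real request, call query with your service URL and token, for example query("https://sample-service.cld.s.anyscaleuserdata.com/", "xxxx", "canary").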