Upgrade a Service
This version of the Anyscale docs is deprecated. Go to the latest version for up to date information.
Use of Anyscale Services requires Ray 2.3+.
This guide walks through how to upgrade an Anyscale Service. It discusses:
- Steps that occur during an upgrade.
- Settings that can be configured for an upgrade.
- Best practices when managing an upgrade.
Performing a rollout
To upgrade a service, run the anyscale service rollout -f [UPDATED CONFIG FILE] command. This command triggers an automatic rollout that provisions a new cluster for your new version. Once the cluster and Serve application become healthy, Anyscale gradually shifts traffic to the new cluster. After all traffic is shifted, Anyscale terminates the old cluster.
If no additional rollout options are set, Anyscale increases the traffic sent to the new version (known as the canary percent) in a predefined pattern.
On Amazon Web Services (AWS), the canary percent progression is:
0 -> 5 -> 10 -> 20 -> 30 -> 50 -> 75 -> 100
It takes approximately 2 minutes to complete an automatic rollout from the moment the new version becomes healthy.
On Google Cloud Platform (GCP), the canary percent progression is:
0 -> 10 -> 50 -> 100
It takes approximately 8 minutes to complete an automatic rollout from the moment the new version becomes healthy. The time to roll out a service on GCP is extended due to additional health checks made by GCP on the load balancing resources. For additional details, refer to the appendix.
- If at any point the new version becomes unhealthy, the new version will be rolled back and all of the traffic will be routed to the original version.
- To debug a canary version, we recommend deploying a separate test service or utilizing an Anyscale workspace.
- Rollouts that don't complete in 120 minutes are automatically rolled back.
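The default behavior described above can be pictured with a small simulation. This is only an illustrative sketch: the name simulate_rollout and the is_healthy callback are hypothetical, and the real rollout logic runs inside Anyscale rather than in your code. The two progressions are copied from this page.

```python
from typing import Callable, List

# Default canary progressions listed on this page.
AWS_PROGRESSION = [0, 5, 10, 20, 30, 50, 75, 100]
GCP_PROGRESSION = [0, 10, 50, 100]


def simulate_rollout(
    progression: List[int], is_healthy: Callable[[int], bool]
) -> List[int]:
    """Return the canary percents visited during a simulated automatic rollout.

    `is_healthy` is a hypothetical callback standing in for Anyscale's health
    checks. If it reports an unhealthy canary, all traffic returns to the
    original version (canary percent 0) and the simulation stops.
    """
    visited = []
    for percent in progression:
        if not is_healthy(percent):
            visited.append(0)  # rollback: route everything to the original version
            return visited
        visited.append(percent)
    return visited
```

Calling simulate_rollout(GCP_PROGRESSION, lambda percent: True) walks the full GCP progression, while a callback that fails partway through ends the list with 0 to represent the rollback.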
Example
This example upgrades a service to show a different message when queried.
First, deploy a running service by following Deploy your first Service.
Update the SERVE_RESPONSE_MESSAGE environment variable in service-v2.yaml, then perform a new rollout using the anyscale service rollout CLI command.
Anyscale uses the name attribute to determine which service to upgrade.
You cannot change the Anyscale Cloud your service runs in once it has been deployed.
CLI:
anyscale service rollout -f service-v2.yaml
(anyscale +3.3s) View the service in the UI at https://console.anyscale.com/services/service_sc2JFXVupTCLy6CKqcYq5rmN.

service-v2.yaml:
name: service-hello
cluster_env: default_cluster_env_2.9.0_py310:1
ray_serve_config:
  applications:
    - name: default
      import_path: serve_hello:entrypoint
      runtime_env:
        working_dir: https://github.com/anyscale/docs_examples/archive/refs/heads/main.zip
        env_vars:
-         SERVE_RESPONSE_MESSAGE: service says hello
+         SERVE_RESPONSE_MESSAGE: service was rolled out using an automatic rollout

hello_world.py:
from fastapi import FastAPI
from ray import serve
import os

# When the env var is updated, users see a new return value.
msg = os.getenv("SERVE_RESPONSE_MESSAGE", "Hello world!")

app = FastAPI()


@serve.deployment(route_prefix="/")
@serve.ingress(app)
class HelloWorld:
    @app.get("/")
    def hello(self):
        return msg

    @app.get("/healthcheck")
    def healthcheck(self):
        return


entrypoint = HelloWorld.bind()
Anyscale will roll out a new version of your service. Once the new version is healthy, traffic will incrementally shift to the new version. After the rollout is complete, the previous version will be terminated.
After your rollout is complete, query the service to see the new message.
Handling resource constraints
By default, rollouts start a full copy of the new application on a separate cluster and then gradually shift traffic.
However, some applications may have limited access to hardware, especially those using scarce resources like A100 or H100 GPUs. In these cases, running two full-sized clusters may not be possible.
Setting --max-surge-percent in the anyscale service rollout command limits the total number of replicas running across both clusters. This in turn limits the number of nodes needed for the rollout, allowing it to proceed without using two full-sized clusters. For example, setting --max-surge-percent=50 makes Anyscale:
- Provision a new cluster
- Start the new application on the cluster, but disable all replicas
- Until all traffic is sent to the new cluster:
  - Start 50% of the replicas on the new cluster
  - If necessary, start nodes on the new cluster to run the new replicas
  - Shift 50% of the traffic to the new cluster
  - Stop 50% of the replicas on the old cluster
  - Remove empty nodes from the old cluster
  - Repeat
- Terminate the old cluster
At any point in time, at most 100% of replicas are active on one cluster and 50% on the other. On the other hand, when max-surge-percent is unspecified, Anyscale keeps 100% of replicas active on both clusters throughout the rollout.
Anyscale ensures that the percent of traffic sent to a cluster never exceeds the percent of replicas active on that cluster. Anyscale also ensures that the total percentage of replicas across both clusters never exceeds 100 + max-surge-percent (such as 150% in the above example).
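The two guarantees above can be checked against a toy model of the rollout loop. The sketch below is an assumption-laden illustration, not Anyscale's implementation: it tracks only percentages, it assumes each iteration moves exactly max-surge-percent (or whatever remains), and the names simulate_max_surge and invariants_hold are hypothetical.

```python
from typing import List, Tuple

# (old cluster replicas %, new cluster replicas %, traffic % sent to new cluster)
State = Tuple[int, int, int]


def simulate_max_surge(max_surge_percent: int) -> List[State]:
    """Record every intermediate state of a simplified max-surge rollout."""
    if not 0 < max_surge_percent <= 100:
        raise ValueError("max_surge_percent must be between 1 and 100")
    old, new, traffic = 100, 0, 0
    states = [(old, new, traffic)]
    while traffic < 100:
        # Assumed step size: the surge percent, capped by what is left to move.
        step = min(max_surge_percent, 100 - new)
        new += step  # start replicas on the new cluster
        states.append((old, new, traffic))
        traffic += step  # shift traffic to the new cluster
        states.append((old, new, traffic))
        old -= step  # stop replicas on the old cluster
        states.append((old, new, traffic))
    return states


def invariants_hold(states: List[State], max_surge_percent: int) -> bool:
    """Check the traffic and total-replica limits described in the text."""
    return all(
        traffic <= new  # new cluster never gets more traffic than replicas
        and (100 - traffic) <= old  # same limit for the old cluster
        and old + new <= 100 + max_surge_percent
        for old, new, traffic in states
    )
```

With max_surge_percent=50 the model peaks at 150% total replicas and finishes with all replicas and all traffic on the new cluster, which matches the example above.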
max-surge-percent can be updated during a rollout by running anyscale service rollout again with a different max-surge-percent value, similar to updating a canary-percent.
Important: Set Min worker nodes in the compute config to 0, even if no deployments use Serve autoscaling. This lets the cluster scale the number of nodes as replicas are activated and deactivated.
- Ray Serve rounds up when calculating the number of replicas to run. For example, if a deployment uses 7 replicas, and 50% are activated, then Serve runs 4 replicas.
- max-surge-percent and canary-percent can be set at the same time.
Autoscaling
The percentages are applied to all deployments across all applications in the Serve config. For autoscaling deployments, the percentages are applied to the min_replicas and max_replicas bounds, and autoscaling behaves as usual between these adjusted bounds.
For the new cluster, the percentages are also applied to initial_replicas. This adjusted initial_replicas value is the floor for the number of replicas until the rollout is finished. For the old cluster, initial_replicas is ignored.
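As a rough illustration of how the adjusted bounds could work out, the sketch below scales min_replicas, max_replicas, and initial_replicas by a percentage. It assumes the same round-up rule this page describes for replica counts also applies to each bound, which the page does not state explicitly. The helper names ceil_percent and adjusted_bounds are hypothetical.

```python
from typing import Dict, Optional


def ceil_percent(count: int, percent: int) -> int:
    """Return `percent` of `count`, rounded up, using integer arithmetic."""
    return (count * percent + 99) // 100


def adjusted_bounds(
    min_replicas: int,
    max_replicas: int,
    initial_replicas: int,
    percent: int,
    new_cluster: bool,
) -> Dict[str, Optional[int]]:
    """Scale autoscaling bounds by the active replica percentage.

    Assumption: every bound is rounded up. On the old cluster,
    initial_replicas is ignored, which is represented here as None.
    """
    return {
        "min_replicas": ceil_percent(min_replicas, percent),
        "max_replicas": ceil_percent(max_replicas, percent),
        "initial_replicas": (
            ceil_percent(initial_replicas, percent) if new_cluster else None
        ),
    }
```

For a deployment with min_replicas=2, max_replicas=7, and initial_replicas=4 at 50%, this gives bounds of 1 and 4 on both clusters, with an initial floor of 2 on the new cluster only.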
Rollbacks
During rollback, the service still obeys max-surge-percent. The number of replicas and the percent of traffic are gradually shifted back to the original cluster.
The max-surge-percent during a rollback can be updated by setting --max-surge-percent through the anyscale service rollback command.
If the new version becomes unhealthy during a rollout when max-surge-percent is set but canary-percent is not set, Anyscale rolls the service back immediately. 100% of traffic is routed to the old cluster, all replicas are activated on the old cluster, and all replicas are deactivated on the new cluster. These steps happen concurrently, so there may be a brief period when requests fail. This process ensures that the service returns to a stable state as quickly as possible.
Setting canary percent
The anyscale service rollout command offers a --canary-percent flag, which sets the portion of traffic directed to the new version. When this flag is set, Anyscale pauses the rollout after shifting this amount of traffic to the new cluster. You must then manually increase the canary-percent by rerunning the anyscale service rollout command with higher --canary-percent values.
This flag grants full control over the speed of the rollout. As a result, when this flag is set, Anyscale doesn't perform automatic rollback if the new service becomes unhealthy. You must monitor the service and roll back manually, either by setting a lower canary percent or by running the anyscale service rollback command.
- If the canary version becomes UNHEALTHY before reaching a RUNNING state, the version will not roll out and the service will remain at 0% canary deployment. This safeguard prevents rolling out to an unhealthy version.
- Setting the canary percentage to 100 automatically completes the rollout. To prevent completing the rollout, include the --no-auto-complete-rollout flag in the CLI.
We recommend setting --canary-percent for:
- A/B testing between different versions
- Fine-grained control over the speed of the rollout
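Because a manual rollout means rerunning the same command with increasing percentages, it can help to generate those commands in one place. The following is a minimal sketch that only builds argument lists. The helper names rollout_command and staged_rollout are hypothetical, and the only CLI flags used are the ones named on this page.

```python
from typing import List, Sequence


def rollout_command(
    config_file: str, canary_percent: int, auto_complete: bool = True
) -> List[str]:
    """Build the argument list for one manual canary step."""
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    cmd = [
        "anyscale", "service", "rollout",
        "-f", config_file,
        "--canary-percent", str(canary_percent),
    ]
    if not auto_complete:
        # Keeps the old cluster alive even when canary_percent reaches 100.
        cmd.append("--no-auto-complete-rollout")
    return cmd


def staged_rollout(config_file: str, percents: Sequence[int]) -> List[List[str]]:
    """Build one command per canary step, in the order given."""
    return [rollout_command(config_file, percent) for percent in percents]
```

Each returned list can be passed to subprocess.run(cmd, check=True). Since automatic rollback is disabled when the canary percent is set, check the health of the service yourself before running the next step.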
Example
This example upgrades a service with the --canary-percent flag.
First, deploy a running service by following Deploy your first Service.
Next, create a new version and manually set the canary percent to 50%.
Update the SERVE_RESPONSE_MESSAGE environment variable in service-v2.yaml, then perform a new rollout using the anyscale service rollout CLI command with the flag --canary-percent 50.
Anyscale uses the name attribute to determine which service to upgrade.
You cannot change the cloud your service runs in.
CLI:
anyscale service rollout -f service-v2.yaml --canary-percent 50
(anyscale +3.3s) View the service in the UI at https://console.anyscale.com/services/service_sc2JFXVupTCLy6CKqcYq5rmN.

service-v2.yaml:
name: service-hello
cluster_env: default_cluster_env_2.9.0_py310:1
ray_serve_config:
  applications:
    - name: default
      import_path: serve_hello:entrypoint
      runtime_env:
        working_dir: https://github.com/anyscale/docs_examples/archive/refs/heads/main.zip
        env_vars:
-         SERVE_RESPONSE_MESSAGE: service says hello
+         SERVE_RESPONSE_MESSAGE: service was rolled out using a manual rollout

hello_world.py:
from fastapi import FastAPI
from ray import serve
import os

# When the env var is updated, users see a new return value.
msg = os.getenv("SERVE_RESPONSE_MESSAGE", "Hello world!")

app = FastAPI()


@serve.deployment(route_prefix="/")
@serve.ingress(app)
class HelloWorld:
    @app.get("/")
    def hello(self):
        return msg

    @app.get("/healthcheck")
    def healthcheck(self):
        return


entrypoint = HelloWorld.bind()
Anyscale will roll out a new version of your service. Once the new version is healthy, 50% of the traffic will shift to the new version.
The new message should appear roughly half of the time when querying your service.
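One way to confirm the split is to sample the service repeatedly and count how often the new message comes back. The sketch below is a hypothetical helper, not part of the Anyscale tooling. It takes a fetch callable so that the request logic (URL, bearer token, response decoding) stays under your control. Expect the measured fraction to hover near 0.5 rather than match it exactly.

```python
from typing import Callable


def canary_fraction(
    fetch: Callable[[], str], new_message: str, samples: int = 100
) -> float:
    """Return the fraction of sampled responses that equal `new_message`.

    `fetch` is a caller-supplied function that queries the service once and
    returns the decoded response message.
    """
    if samples <= 0:
        raise ValueError("samples must be positive")
    hits = sum(1 for _ in range(samples) if fetch() == new_message)
    return hits / samples
```

In practice, fetch would issue an authenticated GET against your service URL and strip any JSON quoting from the response body before returning it.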
Modify a service without starting a new cluster
Anyscale offers the --rollout-strategy=IN_PLACE option to modify the Ray Serve config of a running service without starting a new cluster. In-place upgrades do not automatically roll back in the event of a failure, which makes them riskier than rolling upgrades.
We recommend in-place upgrades when:
- Changing the autoscaling for Serve deployments
- Changing a deployment's user_config
- Iterating on your application in development
You cannot set --max-surge-percent or --canary-percent for in-place upgrades.
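If rollouts are scripted, the restriction above can be caught before the CLI is invoked. This is a small sketch of such a pre-flight check. The function name check_rollout_options is hypothetical, and the check encodes only the rule stated on this page.

```python
from typing import Optional


def check_rollout_options(
    rollout_strategy: Optional[str] = None,
    canary_percent: Optional[int] = None,
    max_surge_percent: Optional[int] = None,
) -> None:
    """Raise ValueError for an option combination this page says is not allowed."""
    if rollout_strategy == "IN_PLACE" and (
        canary_percent is not None or max_surge_percent is not None
    ):
        raise ValueError(
            "--canary-percent and --max-surge-percent cannot be set "
            "for in-place upgrades"
        )
```

Calling check_rollout_options(rollout_strategy="IN_PLACE", canary_percent=50) raises, while combining canary_percent and max_surge_percent without IN_PLACE passes.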
Best practices
If using in-place upgrades to modify your code, we recommend the following best practices:
- Validate code changes comprehensively before applying the in-place upgrade
- Ensure each Ray Serve deployment runs more than one replica across more than one node, to avoid downtime
- If your service becomes unhealthy during an in-place upgrade, use a rolling upgrade to revert to a previous version
You cannot change the cloud, cluster_env, or compute_config of a service with an in-place upgrade. An in-place upgrade will fail if any of these values have been modified from the primary version.
Example
This example upgrades a service to show a different message when queried.
First, deploy a running service by following Deploy your first Service.
Update the SERVE_RESPONSE_MESSAGE environment variable in service-v2.yaml, then perform a new rollout using the anyscale service rollout CLI command with the flag --rollout-strategy IN_PLACE.
Anyscale uses the name attribute to determine which service to upgrade.
CLI:
anyscale service rollout -f service-v2.yaml --rollout-strategy IN_PLACE
(anyscale +3.3s) View the service in the UI at https://console.anyscale.com/services/service_sc2JFXVupTCLy6CKqcYq5rmN.

service-v2.yaml:
name: service-hello
cluster_env: default_cluster_env_2.9.0_py310:1
ray_serve_config:
  applications:
    - name: default
      import_path: serve_hello:entrypoint
      runtime_env:
        working_dir: https://github.com/anyscale/docs_examples/archive/refs/heads/main.zip
        env_vars:
-         SERVE_RESPONSE_MESSAGE: service says hello
+         SERVE_RESPONSE_MESSAGE: service was upgraded using an in_place upgrade

hello_world.py:
from fastapi import FastAPI
from ray import serve
import os

# When the env var is updated, users see a new return value.
msg = os.getenv("SERVE_RESPONSE_MESSAGE", "Hello world!")

app = FastAPI()


@serve.deployment(route_prefix="/")
@serve.ingress(app)
class HelloWorld:
    @app.get("/")
    def hello(self):
        return msg

    @app.get("/healthcheck")
    def healthcheck(self):
        return


entrypoint = HelloWorld.bind()
Anyscale will immediately update the service to the new version, and the new application will be deployed in the same cluster. After applying the in-place upgrade, the new message will appear when querying the service.
Pause the rollout before completion
By default, once a rollout reaches a canary percent of 100, Anyscale completes the rollout by terminating the old cluster and sending all traffic to the new cluster.
Include the --no-auto-complete-rollout flag in the anyscale service rollout command to stop the rollout from completing. When this flag is set, Anyscale pauses the rollout at canary percent 100 and waits for an anyscale service rollout command without the --no-auto-complete-rollout flag to complete the rollout.
Querying a specific version during a rollout
During a rollout, requests to the Service are split between the primary and canary versions based on the current canary percent.
curl -H "Authorization: Bearer xxxx" https://sample-service.cld.s.anyscaleuserdata.com/
To route to a specific version during a rollout, you can use the X-ANYSCALE-VERSION HTTP header. Set the header value to primary to route to the primary version or canary to route to the canary version. Note that the X-ANYSCALE-VERSION HTTP header is case sensitive.
This feature is available for all Services created on GCP clouds. To use this feature on AWS clouds, you must either run cloud update for managed clouds or cloud edit for registered clouds to include a newly added AWS permission. Subsequently, reach out to Anyscale support with your cloud details to enable the feature.
curl -H "Authorization: Bearer xxxx" -H "X-ANYSCALE-VERSION: primary" https://sample-service.cld.s.anyscaleuserdata.com/
curl -H "Authorization: Bearer xxxx" -H "X-ANYSCALE-VERSION: canary" https://sample-service.cld.s.anyscaleuserdata.com/
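The same requests can be made from Python. The sketch below is one possible approach, with hypothetical helper names (version_headers, query_version). It uses http.client because that module sends header names exactly as written, whereas urllib.request.Request normalizes header names with str.capitalize(), which would change the casing of X-ANYSCALE-VERSION. The host and token are placeholders you would replace with your own values.

```python
import http.client
from typing import Dict

VALID_VERSIONS = ("primary", "canary")


def version_headers(token: str, version: str) -> Dict[str, str]:
    """Build request headers that pin a query to one version of the service."""
    if version not in VALID_VERSIONS:
        raise ValueError(f"version must be one of {VALID_VERSIONS}")
    return {
        "Authorization": f"Bearer {token}",
        "X-ANYSCALE-VERSION": version,  # header casing is preserved by http.client
    }


def query_version(host: str, token: str, version: str) -> str:
    """Send GET / to `host` and return the response body as text."""
    conn = http.client.HTTPSConnection(host, timeout=30)
    try:
        conn.request("GET", "/", headers=version_headers(token, version))
        return conn.getresponse().read().decode("utf-8")
    finally:
        conn.close()
```

For example, query_version("sample-service.cld.s.anyscaleuserdata.com", "xxxx", "canary") mirrors the second curl command above.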
Appendix
Where can I learn more about Ray Serve?
https://docs.ray.io/en/latest/serve/index.html#
How does Ray Serve implement an in-place upgrade?
Ray Serve implements in-place upgrades through two different methods: lightweight upgrades and heavyweight upgrades.
Lightweight upgrades: Lightweight upgrades involve making changes such as modifying the number of replicas, updating the user configuration, or adjusting autoscaling functionality. These upgrades are more robust as they do not require tearing down old replicas.
Heavyweight upgrades: Heavyweight upgrades involve modifying other aspects of the Ray Serve configuration. This process requires creating new replicas while concurrently tearing down the old ones. During a heavyweight upgrade, traffic is evenly distributed between the new and old replicas. It is important to exercise caution when performing heavyweight upgrades, especially in production services.
During a heavyweight upgrade, the old replicas are torn down concurrently with the creation of new replicas until the service is fully upgraded. By default, Ray Serve performs the upgrade in batches, replacing 20% of the total replicas at a time with the new version. It is important to note that this process does not guarantee that enough replicas are running to handle the traffic during the upgrade.
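To see why capacity is not guaranteed, consider a worst-case model in which each batch of old replicas stops before its replacements are ready. The sketch below is an illustration under stated assumptions, not Ray Serve's actual algorithm: it assumes the batch size is 20% of the total rounded up, with at least one replica per batch, and the function name min_running_replicas is hypothetical.

```python
def min_running_replicas(total_replicas: int, batch_percent: int = 20) -> int:
    """Return the lowest replica count seen during a batched replacement.

    Worst-case assumption: the new replicas in a batch are not yet serving
    when the old replicas in that batch are torn down.
    """
    if total_replicas <= 0:
        raise ValueError("total_replicas must be positive")
    # Assumed rounding: round the batch size up, and replace at least one replica.
    batch = max(1, (total_replicas * batch_percent + 99) // 100)
    old, new = total_replicas, 0
    lowest = total_replicas
    while old > 0:
        stopped = min(batch, old)
        old -= stopped  # tear down one batch of old replicas
        lowest = min(lowest, old + new)  # replacements are not counted yet
        new += stopped  # replacements become available
    return lowest
```

Under these assumptions a 10-replica deployment can dip to 8 running replicas, and a single-replica deployment can dip to 0, which is why running more than one replica matters for in-place upgrades.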
Where can I learn more about in-place upgrades?
You can find more in Updating the Serve application in the Ray docs.
How can I disable completing the rollout?
To prevent the completion of the rollout, use the --no-auto-complete-rollout flag in the CLI. When the canary percentage reaches 100%, this flag will prevent the termination of the primary version cluster. This flag can be used for either automatic or manual rollouts.
What happens if the new replicas become unhealthy during an in-place upgrade?
If the new replicas become unhealthy during the upgrade, the Serve controller halts the upgrade process. The service becomes unhealthy, but it may still be able to serve traffic if old replicas remain. If the service was deployed with only one replica, it becomes completely unavailable because the healthy replica was removed. The service remains in an unhealthy state until you take remedial action, either by redeploying the old version in place or by rolling back to the previous version.
Why does a rollout on GCP take longer than AWS?
On GCP, it takes 1 minute to update the URL map for a load balancing resource. However, as a safeguard, Anyscale waits 90 seconds before making any following updates. For 4 increments in the rollout progression, the total time required to update the resources is 6 minutes. Additionally, other operations such as health checks contribute another 2 minutes during the rollout.
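The arithmetic behind the 8-minute estimate can be written out as a short calculation. The function name gcp_rollout_minutes is hypothetical. The default values are the figures quoted in the paragraph above.

```python
def gcp_rollout_minutes(
    increments: int = 4, wait_seconds: int = 90, other_minutes: float = 2.0
) -> float:
    """Estimate GCP rollout time from the figures quoted on this page."""
    update_minutes = increments * wait_seconds / 60  # 4 * 90 s = 6 minutes
    return update_minutes + other_minutes  # plus health checks and other work
```

With the defaults this returns 8.0, matching the approximate rollout time stated for GCP.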