Deploy multiple versions of an Anyscale service
Deploy multiple versions of an Anyscale service
Deploy and manage multiple versions of an Anyscale service behind a single endpoint to run A/B experiments or control the rollout of new models and configurations.
This feature is in beta release. See Beta limitations and considerations for current limitations.
What are multi-version services?
Multi-version services allow you to deploy up to 10 versions of a service behind a single endpoint. Each version runs on its own cluster with independent autoscaling, and you control how traffic distributes across versions using percentage weights.
The following are examples of patterns supported by multi-version services:
- A/B testing between different models or application configurations.
- Gradual rollouts with more than two versions active simultaneously.
- Shadowing traffic to new versions for validation.
- Canary testing at scale with fine-grained traffic control.
How multi-version services work
When you deploy a multi-version service, Anyscale does the following:
- Creates a shared endpoint that distributes traffic across all active versions.
- Deploys each version to its own Ray cluster.
- Scales each version independently based on its service configuration.
- Tags logs and metrics with version names for easy filtering and comparison.
You specify traffic percentages for each version, and these percentages must sum to 100. Anyscale automatically distributes incoming requests according to these weights.
Version management concepts
You can deploy up to 10 versions to a service. You specify version names in your service configuration and use these version names to specify traffic allocation and interact with individual versions.
Version names
Each version requires a unique name within the service. Anyscale recommends specifying version names in your service configuration for easier management and tracking.
If you don't specify a version name, Anyscale generates one using the format v-{10 random alphanumeric characters}.
Traffic allocation
Traffic allocation controls what percentage of requests each version receives. You specify traffic percentages using the --versions parameter when deploying or updating a service.
The following rules apply to traffic allocation:
- All traffic percentages must sum to exactly 100.
- Each version must have a traffic percentage greater than 0.
- You can adjust traffic percentages without redeploying service configurations.
When you adjust traffic allocation, the service state shows UPDATING while Anyscale reallocates traffic and scales clusters.
Create a new service with multiple versions
To create a new service with multiple versions, provide a service configuration file for each version and specify traffic percentages using the --versions parameter.
Each service configuration must include the following:
- A
version_namefield with a unique name for the version. - The same
namevalue across all configurations, which specifies the service name. - The same
cloudandprojectvalues across all configurations.
The following example creates a service with four versions:
- CLI
- SDK
anyscale service deploy \
--versions '[{"name":"v1","traffic_percent":30}, {"name":"v2","traffic_percent":30}, {"name":"v3","traffic_percent":40}]' \
-f config_v1.yaml \
-f config_v2.yaml \
-f config_v3.yaml
import anyscale
from anyscale.service.models import ServiceConfig
configs = [
ServiceConfig.from_yaml("config_v1.yaml"),
ServiceConfig.from_yaml("config_v2.yaml"),
ServiceConfig.from_yaml("config_v3.yaml")
]
anyscale.service.deploy(
configs,
versions='[{"name":"v1","traffic_percent":30}, {"name":"v2","traffic_percent":30}, {"name":"v3","traffic_percent":40}]'
)
The following shows an example service configuration file for version v1:
name: test_service
version_name: v1
working_dir: .
applications:
- import_path: main:my_app
deployments:
- name: Model
autoscaling_config:
target_ongoing_requests: 2
min_replicas: 1
max_replicas: 1
downscale_delay_s: 10
metrics_interval_s: 10
look_back_period_s: 10
Each version can have different autoscaling configurations or application settings. The version_name field identifies the version in the --versions parameter.
Add a new version to an existing service
To add a new version to an existing service, provide a configuration file only for the new version. Existing versions maintain their current configurations.
The following example adds version v3 to a service with two existing versions and adjusts traffic across all three versions:
- CLI
- SDK
anyscale service deploy \
--versions '[{"name":"v1","traffic_percent":10}, {"name":"v2","traffic_percent":60}, {"name":"v3","traffic_percent":30}]' \
-f config_v3.yaml
import anyscale
from anyscale.service.models import ServiceConfig
config_v3 = ServiceConfig.from_yaml("config_v3.yaml")
anyscale.service.deploy(
[config_v3],
versions='[{"name":"v1","traffic_percent":10}, {"name":"v2","traffic_percent":60}, {"name":"v3","traffic_percent":30}]'
)
Anyscale deploys the new version and reallocates traffic according to the specified percentages.
Adjust traffic between versions
To adjust traffic percentages between existing versions without changing service configurations, use the --versions parameter without providing configuration files.
The following example adjusts traffic across three existing versions:
- CLI
- SDK
anyscale service deploy \
--versions '[{"name":"v1","traffic_percent":30}, {"name":"v2","traffic_percent":20}, {"name":"v3","traffic_percent":50}]'
import anyscale
anyscale.service.deploy(
[],
versions='[{"name":"v1","traffic_percent":30}, {"name":"v2","traffic_percent":20}, {"name":"v3","traffic_percent":50}]',
name="test_service"
)
The service state shows UPDATING while Anyscale reallocates traffic and scales clusters to match the new percentages.
Remove a version
To remove versions from a service, omit them from the --versions parameter. Anyscale terminates any versions not included in the parameter.
The following example removes versions v2 by specifying only v1 and v3:
- CLI
- SDK
anyscale service deploy \
--versions '[{"name":"v1","traffic_percent":50},{"name":"v3","traffic_percent":50}]'
import anyscale
anyscale.service.deploy(
[],
versions='[{"name":"v1","traffic_percent":50},{"name":"v3","traffic_percent":50}]',
name="test_service"
)
Anyscale terminates the omitted versions after reallocating traffic to the remaining versions.
Query a multi-version service
By default, the service endpoint distributes traffic across all versions according to their traffic percentages. You can also send requests to a specific version for testing or debugging.
To query a specific version, add the X-ANYSCALE-VERSION header to your request. The following example queries version v1 of the target service:
curl -H "Authorization: Bearer <auth-token>" \
-H "X-ANYSCALE-VERSION: v1"
"https://your-service.anyscale.com/"
To allow the service to distribute traffic normally, omit the query parameter:
curl -H "Authorization: Bearer <auth-token>" \
"https://your-service.anyscale.com/"
View service status
Use the anyscale service status command to view the state of a multi-version service and its versions.
- CLI
- SDK
anyscale service status -n test_service
import anyscale
status = anyscale.service.status(name="test_service")
print(status.to_dict())
The status output includes the following information for each version:
- Version ID, which shows the version name you specified.
- Current state, such as
RUNNINGorUPDATING. - Traffic weight percentage.
- Creation timestamp.
The following shows example output:
name: test_service
id: service2_uztqb281sq7kx2tg5v6pxw9ktz
state: RUNNING
query_url: https://test-service.cld-kvedzwag2qa8i5bj.s.anyscale.com
creator: user@example.com
query_auth_token: <auth-token>
versions:
- id: m71fqjx4
name: v1
state: RUNNING
weight: 30
created_at: 2025-10-27 18:57:47.953895+00:00
- id: c5nbqwlq
name: v2
state: RUNNING
weight: 30
created_at: 2025-10-27 18:57:47.953895+00:00
- id: cm4muq7g
name: v3
state: RUNNING
weight: 20
created_at: 2025-10-27 18:57:47.953895+00:00
- id: addcp9dv
name: v4
state: RUNNING
weight: 20
created_at: 2025-10-27 18:57:47.953895+00:00
Metrics and observability
The Serve dashboard automatically tags metrics with version names, allowing you to filter and compare metrics across versions. You can view per-version metrics for QPS, latency, and error rates.
To view metrics by version, access the Serve dashboard for your service and use the version filter to select specific versions for comparison.
Update autoscaling configuration
You can update the autoscaling configuration for deployed service versions without redeploying them. See Enable worker group autoscaling in the compute config.
Beta limitations and considerations
The following limitations apply to multi-version services in beta:
- Configuration updates: You can't update the Ray Serve configuration for deployed versions, including applications and HTTP options. To change these settings, deploy a new version. The exception is the autoscaling configuration, which you can update through the Anyscale console.
- Rollback: The
anyscale service rollbackcommand doesn't work with multi-version services. To fix issues, deploy a new version with the corrected configuration. - Version limit: You can deploy up to 10 versions per service.
Multi-version services provide manual control over multiple service versions behind a single endpoint. This behavior is similar to using the --canary-percent parameter to manage how Anyscale routes traffic to two versions during rollouts. See Manually-controlled rollouts
You can't use the --canary-percent parameter with multi-version services. If you specify both --canary-percent and --versions, Anyscale ignores the --canary-percent parameter.