Thinking in Ray and Anyscale

If you're used to single-node development and new to distributed computing or ML, Anyscale and Ray get you started with a few basic abstractions to keep in mind. This guide walks through the high-level thought process of building a scalable AI or Python app with Ray and Anyscale.

1. Understand the differences between single-node and multi-node apps

When your code runs on machines different from the one you develop on, keep the following differences in mind:

  • Dependency management: you need to define the dependencies that must be available on every machine that runs your code.
  • Cluster shape: you need to define the cluster that runs your code, including the number and the type of machines, storage, network, etc.
  • Access to data and other resources: the entire cluster needs to access the data, container images, secrets, and other resources.
  • Running your code: you need to think about how to run your code in parallel across multiple machines to get the most value out of Ray.
  • Monitoring and debugging: issues can occur on any machine in the cluster, and familiar tools like debuggers may not work as expected.

2. Get familiar with Ray and Ray libraries

Ray and the Ray libraries make it easy to develop distributed apps, or to convert your single-node apps into distributed apps that run on multiple machines. Ray Core provides the core programming primitives for a distributed app. The Ray libraries are built on top of Ray Core and provide higher-level APIs for different workloads, such as data processing, training, inference, and reinforcement learning.
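
As a minimal sketch of the Ray Core primitives, the example below turns a plain Python function into a task that can run on any node in the cluster; the function and inputs are illustrative only.

```python
import ray

# Connect to the Ray cluster. On Anyscale the cluster already exists;
# locally this starts a single-node Ray instance.
ray.init()

# A plain Python function becomes a distributed task with @ray.remote.
@ray.remote
def square(x: int) -> int:
    return x * x

# Launch tasks in parallel across the cluster; each call returns a future.
futures = [square.remote(i) for i in range(100)]

# Gather the results back on the driver.
results = ray.get(futures)
print(sum(results))
```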

See the links below for more information:

  • Ray documentation to get familiar with Ray Core or Ray libraries based on your needs.
  • Examples for how to use Ray for different workloads.

3. Choose your preferred setup and workflows

Anyscale manages the underlying clusters so that you can focus on development. For maximum productivity, follow the steps below to determine your own preferred workflows.

Step 1: Choose your development environment

Option 1: Use the Anyscale workspace development environment

Anyscale Workspaces provides an optimized development environment on top of a Ray cluster.

  • Develop in the cloud IDEs (JupyterLab and VS Code) or connect to your preferred IDE through SSH
  • Iterate on your code, dependencies, and compute requirements directly on a cluster
  • Live debug with Ray dashboard, Ray debugger, etc.

Option 2: Use your own development environment

Develop Ray apps in your own development environment (laptop, remote VMs, etc.) and seamlessly scale them to run on clusters with Anyscale Jobs or Anyscale Services.

  • Run your apps with simple YAML files and the CLI, or with the Python SDK (see the sketch after this list).
  • Reuse a warm cluster or start a new one with fast startup time.
  • Deploy to production with the exact same code. Just point to a different environment.
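
As an illustration, the sketch below submits a hypothetical main.py as an Anyscale Job using the Anyscale Python SDK; the job name and other fields are placeholders, and field names may differ between SDK versions, so check the Anyscale reference docs. The same submission can also be written as a YAML file and passed to the anyscale CLI.

```python
import anyscale
from anyscale.job.models import JobConfig

# Sketch only: name, entrypoint, and working_dir are placeholders.
config = JobConfig(
    name="my-first-job",
    entrypoint="python main.py",  # command to run on the cluster
    working_dir=".",              # local files to upload to the cluster
)

# Submit the job; Anyscale starts (or reuses) a cluster and runs the entrypoint.
anyscale.job.submit(config)
```
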
Step 2: Install dependencies

The following are some tasks you may need to perform for dependency management. See the dependency management guide for how to configure them.

  • Build or pull container images.
  • Load additional dependencies or artifacts, such as a Git repo or model weights, during cluster startup.
  • Iterate on dependencies after a cluster starts (see the sketch below).
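
One common pattern for iterating on Python dependencies without rebuilding the container image is Ray's runtime environments; the packages and environment variable below are purely illustrative.

```python
import ray

# Attach extra pip packages and env vars to everything this driver runs.
# The base container image still provides Ray and system-level dependencies.
ray.init(
    runtime_env={
        "pip": ["pandas==2.2.2", "scikit-learn"],  # illustrative packages
        "env_vars": {"MY_FLAG": "1"},              # hypothetical setting
    }
)
```
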
Step 3: Define cluster shape

A Ray cluster consists of a head node and multiple worker nodes. The head node manages the cluster and schedules the resources requested by your driver script, the main Python file that starts the app. Worker nodes run your Ray apps.

On Anyscale, you define the shape of a Ray cluster in a few ways:

  • Create a compute config upfront and use it directly for your workspaces, jobs, and services.
  • Provide the compute configurations inline when creating or updating workspaces, jobs, and services.
  • Modify the nodes in the cluster sidebar of the workspace UI.

You can turn on the beta feature "auto-select worker nodes" in the compute config. Instead of requiring predefined node groups, this setting lets Anyscale automatically scale instances up and down based on the resource requests from your Ray apps.
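
Those resource requests come from your Ray code. In the sketch below, the cluster needs enough nodes to satisfy the declared CPUs and GPUs; the numbers and workloads are illustrative.

```python
import ray

ray.init()

# Each invocation of this task asks the scheduler for 2 CPUs.
@ray.remote(num_cpus=2)
def preprocess(shard):
    return shard  # placeholder for real preprocessing work

# This actor asks for a GPU; with auto-select worker nodes enabled,
# Anyscale provisions a GPU instance to satisfy the request.
@ray.remote(num_gpus=1)
class Model:
    def predict(self, batch):
        return batch  # placeholder for real inference
```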

Step 4: Access your data and other resources

You might need help from your infra or platform team to configure access to your data and other resources. The following are a few common patterns:

  • Grant the cluster access to your resources with IAM roles or service accounts (see the sketch after this list):

    • object storage like S3, GCS, etc.
    • cloud provider's secret managers
    • other services from your cloud providers
  • (Not recommended) Store credentials in your code, or on a cluster or other storage accessible by the cluster.

    • Users of the Anyscale Hosted cloud may have to use this approach for now, until an Anyscale-provided secret manager is available. If it doesn't work for you, contact Anyscale support to get help with accessing your private resources with IAM roles or service accounts.
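
For example, when the cluster's IAM role or service account grants access to your object storage, your code can read the data directly without embedding any credentials; the bucket path below is hypothetical.

```python
import ray

# The cluster's IAM role / service account authorizes this read; no keys in code.
ds = ray.data.read_parquet("s3://your-bucket/path/to/table/")  # hypothetical path

print(ds.schema())
```
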
Step 5: Get familiar with observability tools

Learn and leverage observability tools such as the Ray dashboard, the Ray debugger, and cluster logs and metrics to monitor and debug your apps.
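
As a small example of how this surfaces in practice, anything your tasks print or log is captured per worker and shown in the Ray dashboard and the Anyscale log viewer, so plain logging is often the first debugging tool. The task below is illustrative only.

```python
import ray

ray.init()

@ray.remote
def step(i: int) -> int:
    # stdout/stderr from tasks is captured per worker and surfaced in
    # the Ray dashboard and the Anyscale log viewer.
    print(f"processing item {i}")
    return i * 2

print(ray.get([step.remote(i) for i in range(3)]))
```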

4. Move your apps to production

Some of the steps below are covered in the guide for admins.

If you want to automate your pipelines:

  • Choose the right Anyscale features for your pipeline, including jobs, job schedules, job queues, and services.
  • Set up Anyscale service accounts for authentication if needed.
  • Integrate with your CI/CD pipelines or other orchestration tools through the Anyscale CLI, SDK, or API (see the sketch below).
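
For example, a CI pipeline might authenticate with a service-account token and submit a job through the Python SDK. This is a sketch under the same assumptions as the earlier job example; the token comes from your CI secrets, and the job fields are placeholders.

```python
import os

import anyscale
from anyscale.job.models import JobConfig

# In CI, expose the service account token as a secret; the Anyscale CLI/SDK
# reads it from the ANYSCALE_CLI_TOKEN environment variable.
assert os.environ.get("ANYSCALE_CLI_TOKEN"), "set the service account token in CI"

# Submit the pipeline's entrypoint as an Anyscale Job (fields are illustrative).
anyscale.job.submit(
    JobConfig(
        name="nightly-batch-inference",       # hypothetical job name
        entrypoint="python run_pipeline.py",  # hypothetical entrypoint
    )
)
```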

If you want to move your apps into a dedicated production environment, follow the steps below:

  • Set up the dedicated production environment, which is usually a different Anyscale Cloud.
  • Make any necessary changes in case things like data locations differ.
  • Modify your code or config files to point to that cloud.