Storage on Anyscale

This page provides an overview of storage on Anyscale.

Storage types

Anyscale provides access to the following types of storage for your workloads:

| Storage type | Description | Persistence | When to use |
| --- | --- | --- | --- |
| Local storage | High-performance block storage attached directly to each node. Access at /mnt/local_storage. | Ephemeral. Data is deleted when the cluster terminates. | Temporary computation, caching, Ray object spilling. See Local storage on Anyscale. |
| Shared storage | Persistent storage accessible across nodes with aliased paths such as /mnt/cluster_storage and /mnt/shared_storage. | Permanent. Data persists until manually deleted. | TensorBoard logs, model checkpoints, shared datasets. See Shared storage on Anyscale. |
| Cloud object storage | Direct access to S3, GCS, or Azure blob storage using cloud-native APIs and URIs. | Permanent. Managed by your cloud provider. | Large datasets (TB+), sensitive data with IAM controls, production artifacts. See Access S3 buckets and Access Google Cloud Storage buckets. |
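
The three storage types can be told apart by path prefix or URI scheme. The following is a minimal sketch, not an Anyscale API: `classify_storage` is a hypothetical helper, the shared path list includes /mnt/user_storage, and the set of URI schemes (s3, gs, abfss) is an assumption based on the artifact storage formats documented on this page.

```python
from urllib.parse import urlparse

# Path prefixes and URI schemes taken from this page; adjust for your cloud.
LOCAL_PREFIX = "/mnt/local_storage"
SHARED_PREFIXES = ("/mnt/cluster_storage", "/mnt/user_storage", "/mnt/shared_storage")
OBJECT_SCHEMES = {"s3", "gs", "abfss"}  # assumption: other schemes may also apply


def _is_under(path: str, prefix: str) -> bool:
    """True when path is the prefix itself or a descendant of it."""
    return path == prefix or path.startswith(prefix + "/")


def classify_storage(path: str) -> str:
    """Return 'cloud object', 'local', 'shared', or 'other' for a path or URI."""
    if urlparse(path).scheme in OBJECT_SCHEMES:
        return "cloud object"
    if _is_under(path, LOCAL_PREFIX):
        return "local"
    if any(_is_under(path, prefix) for prefix in SHARED_PREFIXES):
        return "shared"
    return "other"
```

A check like this can help catch a job that writes long-lived output to ephemeral local storage by mistake.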

In addition to user-accessible storage, Anyscale requires a default system storage container for platform operations. See Configure default system storage.

important

Anyscale workspaces use the Ray working_dir to provide a virtual filesystem that persists between workloads. Anyscale recommends using this filesystem for code and configuration files.

While it's possible to store data files in this location, large amounts of data impact workspace snapshotting, workspace cloning, and custom template creation. See Files in Anyscale workspaces.

Local storage

Local storage provides high-performance block storage for temporary computation and caching. Each node has its own isolated storage at /mnt/local_storage.

Local storage has the following key characteristics:

  • Fast I/O, no network latency.
  • Ephemeral. Data is deleted when the cluster terminates.
  • Not shared across nodes.

On the VM stack, you can configure block storage settings such as volume size and NVMe in your compute configuration. See Local storage on Anyscale.
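
As an illustration of using node-local storage for temporary files, the sketch below creates a per-process scratch directory under /mnt/local_storage. The `scratch_dir` helper and its fallback to the system temp directory are assumptions for illustration, not part of Anyscale.

```python
import os
import tempfile
from pathlib import Path

LOCAL_STORAGE = Path("/mnt/local_storage")


def scratch_dir(name: str, root: Path = LOCAL_STORAGE) -> Path:
    """Create and return a per-process scratch directory under node-local storage.

    Falls back to the system temp directory when `root` does not exist,
    for example when running outside an Anyscale cluster.
    """
    base = root if root.is_dir() else Path(tempfile.gettempdir())
    path = base / f"{name}-{os.getpid()}"
    path.mkdir(parents=True, exist_ok=True)
    return path
```

Because local storage is ephemeral and isolated per node, anything written here should be treated as a cache that can be rebuilt.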

Shared storage

Shared storage provides persistent storage accessible across all nodes in your cluster. Anyscale provides aliased paths at different scopes:

  • /mnt/cluster_storage - Scoped to the current cluster.
  • /mnt/user_storage - Scoped to the current user.
  • /mnt/shared_storage - Scoped to all users in the Anyscale cloud.

For path details and scoping behavior, see Shared storage on Anyscale.
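
The sketch below shows one way to write a small checkpoint file to a shared storage path so that other nodes can read it. The `save_checkpoint` helper, the scope names in `SHARED_ROOTS`, and the JSON file layout are hypothetical. The write-then-rename step reduces the chance that a reader sees a partially written file, but whether the rename is atomic depends on the filesystem backing your shared storage.

```python
import json
from pathlib import Path

# Scope names (hypothetical) mapped to the aliased paths listed above.
SHARED_ROOTS = {
    "cluster": Path("/mnt/cluster_storage"),
    "user": Path("/mnt/user_storage"),
    "cloud": Path("/mnt/shared_storage"),
}


def save_checkpoint(state: dict, run_name: str, step: int, root: Path) -> Path:
    """Write `state` as JSON to <root>/<run_name>/step-<step>.json."""
    run_dir = root / run_name
    run_dir.mkdir(parents=True, exist_ok=True)
    target = run_dir / f"step-{step:06d}.json"
    tmp = target.with_name(target.name + ".tmp")
    tmp.write_text(json.dumps(state))
    tmp.replace(target)  # rename into place after the write completes
    return target
```

For example, `save_checkpoint({"loss": 0.5}, "my-run", 10, SHARED_ROOTS["cluster"])` writes to /mnt/cluster_storage/my-run/step-000010.json.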

warning

All shared storage paths are accessible by all users in your Anyscale cloud. Don't store sensitive data, credentials, or proprietary information in these locations. For secure storage, use cloud object storage with proper IAM controls. See Anyscale cloud IAM mapping.

Shared storage on VMs vs Kubernetes

Shared storage works differently depending on your deployment:

  • VM stack (AWS and Google Cloud): Anyscale automatically configures shared storage using the default system storage bucket. Shared storage is required for all VM deployments.
  • Kubernetes (AKS, EKS, GKE): Shared storage is optional. Configure using PVC and CSI drivers. For an example, see Configure shared storage with Azure blob PVC for AKS.

Cloud object storage

For production data and large datasets, Anyscale recommends connecting directly to cloud object storage:

  • Use separate storage containers for different data classifications, such as development, production, and sensitive or regulated data.
  • Use cloud IAM mapping to implement role-based access. See Anyscale cloud IAM mapping.
  • Stream large datasets directly from object storage rather than copying to local storage.

For cloud-specific configuration, see Access S3 buckets and Access Google Cloud Storage buckets.

Access the default artifact storage path

Every Anyscale cloud has a default object storage bucket for platform operations. Anyscale provides the $ANYSCALE_ARTIFACT_STORAGE environment variable for storing your artifacts while keeping them separate from Anyscale-managed files.

The following environment variables are available:

  • ANYSCALE_ARTIFACT_STORAGE: The URI to a pre-generated path for storing your artifacts.
  • ANYSCALE_CLOUD_STORAGE_BUCKET: The name of the default storage container for the cloud.
  • ANYSCALE_CLOUD_STORAGE_BUCKET_REGION: The region of the default storage container.

The $ANYSCALE_ARTIFACT_STORAGE path format varies by cloud provider:

  • AWS: s3://<bucket_name>/<org_id>/<cloud_id>/artifact_storage/
  • Azure: abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/<org_id>/<cloud_id>/artifact_storage/
  • Google Cloud: gs://<bucket_name>/<org_id>/<cloud_id>/artifact_storage/
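
To make the three formats concrete, the following sketch splits an artifact storage URI into its components. `parse_artifact_storage` and `ArtifactLocation` are hypothetical names, and the code assumes the URI matches one of the formats listed above exactly.

```python
from typing import NamedTuple
from urllib.parse import urlparse


class ArtifactLocation(NamedTuple):
    provider: str
    container: str  # bucket name (AWS, Google Cloud) or container name (Azure)
    org_id: str
    cloud_id: str


_PROVIDERS = {"s3": "AWS", "abfss": "Azure", "gs": "Google Cloud"}


def parse_artifact_storage(uri: str) -> ArtifactLocation:
    """Split an artifact storage URI into provider, container, org ID, and cloud ID."""
    parsed = urlparse(uri)
    if parsed.scheme not in _PROVIDERS:
        raise ValueError(f"unsupported scheme: {parsed.scheme!r}")
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) != 3 or parts[2] != "artifact_storage":
        raise ValueError(f"unexpected artifact storage path: {parsed.path!r}")
    # For abfss, the authority is <container-name>@<storage-account-name>.dfs.core.windows.net
    container = parsed.netloc.split("@", 1)[0]
    return ArtifactLocation(_PROVIDERS[parsed.scheme], container, parts[0], parts[1])
```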

warning

Anyscale scopes permissions on the storage bucket to only provide access to the artifact storage path. Calls to the root of the underlying bucket might be rejected with an ACCESS_DENIED error. Only use paths with the $ANYSCALE_ARTIFACT_STORAGE/ prefix.
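
One way to follow this rule in code is to build every object storage URI from the environment variable and reject relative paths that would escape the prefix. This is a minimal sketch: `artifact_uri` is a hypothetical helper, not an Anyscale API, and the `env` parameter exists only to make it easy to test.

```python
import os
import posixpath
from typing import Mapping


def artifact_uri(*parts: str, env: Mapping[str, str] = os.environ) -> str:
    """Join `parts` onto $ANYSCALE_ARTIFACT_STORAGE, refusing paths that escape it."""
    base = env.get("ANYSCALE_ARTIFACT_STORAGE")
    if not base:
        raise RuntimeError("ANYSCALE_ARTIFACT_STORAGE is not set")
    # Normalize the relative part so that ".." segments are resolved before checking.
    rel = posixpath.normpath(posixpath.join(*parts)) if parts else "."
    if rel in (".", "..") or rel.startswith("/") or rel.startswith("../"):
        raise ValueError(f"path must stay under the artifact storage prefix: {parts!r}")
    return f"{base.rstrip('/')}/{rel}"
```

For example, inside an Anyscale workload `artifact_uri("models", "v1", "model.pt")` returns a URI under the artifact storage prefix that you can pass to a cloud SDK or other tool that accepts object storage URIs.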

Storage configuration for administrators

Administrators configure the following storage settings when deploying Anyscale clouds:

  • Default system storage: An object storage container for platform operations. Required for all deployments.
  • Shared storage: Optional for Kubernetes, automatic for VMs.
  • Block storage defaults: Volume sizes and types for node-local storage.
  • IAM permissions: Access controls for storage resources.

For detailed configuration instructions, see Configure storage for Anyscale.

Storage on Kubernetes

Anyscale clouds deployed on Kubernetes (AKS, EKS, GKE) have different storage characteristics than VM deployments:

  • Local storage: Anyscale uses ephemeral volumes with the disks configured for your machine types. Your Kubernetes administrator controls available storage through instance type configuration and Helm chart settings.
  • Shared storage: Optional. Configure using PVC and CSI drivers to mount cloud object storage. See Configure shared storage with Azure blob PVC for AKS for an example.
  • IAM integration: Varies by cloud provider. See IAM on Anyscale.