Skip to main content

Shared storage on Anyscale

Shared storage on Anyscale

This page provides an overview of shared storage on Anyscale.

What is shared storage?

Shared storage describes persistent storage available on all Anyscale workloads in a given Anyscale cloud resource.

You access shared storage through mount paths. Mount paths have different scoping to allow you to reuse paths across workloads reliably.

These paths appear as regular filesystem directories in your clusters.

important

Anyscale doesn't delete data stored in shared storage when workloads terminate. Discussions of persistence or scoping of shared storage paths pertain only to the aliased paths available to a workload. Anyscale doesn't delete the underlying directories or data, and all users in an Anyscale cloud resource have access to all other shared storage paths on that Anyscale cloud resource.

You're responsible for managing and cleaning up data in shared storage paths. If you use legacy shared storage configured with NFS, failure to monitor and clean up data might lead to a No space left on device error.

How does shared storage work?

Shared storage is a virtual file system mapped back to persistent storage in your cloud provider account. Anyscale recommends configuring shared storage on top of cloud object storage, such as S3, Azure blob storage, or GCS.

VM stack

Anyscale automatically configures shared storage for cloud resources on the VM stack. Shared storage is required for VM deployments. Shared storage paths map to the same object storage container configured for system default storage.

Kubernetes stack

Shared storage is optional for cloud resources on Kubernetes (AKS, EKS, GKE). To configure shared storage, use PVC and CSI drivers to mount a cloud object storage container.

For an example using Azure blob storage and blobfuse2, see Configure shared storage with Azure blob PVC for AKS.

Legacy NFS configurations

Some existing Anyscale clouds use Network File System (NFS) for shared storage. Anyscale doesn't recommend NFS for new deployments due to storage capacity limitations and management overhead.

Storage paths and scoping

Each shared storage path maps to a specific directory within your cloud's default object storage location. Anyscale doesn't manage persistence for the data in these directories or provide isolation guarantees. The following table describes the scope and persistence of each path relative to the workload, user, or cloud:

PathScopePath persistenceUse case
/mnt/cluster_storageCluster IDCluster lifecycleTemporary files shared across nodes, TensorBoard logs, downloaded datasets
/mnt/user_storageUser IDAcross all user's clustersPersonal scripts, configuration files, development artifacts
/mnt/shared_storageCloud IDPermanent (until manually deleted)Model checkpoints, shared datasets, team resources
note

For Anyscale workspaces, /mnt/cluster_storage persists between cluster restarts but isn't cloned when creating new workspaces. For jobs and services, each run creates a new cluster, so /mnt/cluster_storage doesn't persist between runs.

When you trigger a job or service from a workspace, that workload has a different /mnt/cluster_storage than the workspace.

Storage path allocation by workload type

You can use the workload ID to find the directory where the /mnt/cluster_storage path maps for each workload. The following table shows the pattern for this directory for each workload type:

Workload typeObject storage path pattern
Workspacesworkspaces/{WORKSPACE_ID}/cluster_storage
Jobsjob/{JOB_ID}/cluster_storage
Job queuesjob_queue/{JOB_QUEUE_ID}
Servicesservice/{SERVICE_ID}

Security considerations

All files in shared storage locations are readable and writable by all users in an Anyscale cloud. Shared storage is optional on cloud resources configured on Kubernetes.

Anyscale doesn't recommend storing sensitive data, credentials, models, or code in shared storage.