
Storage and file management

Services provide several storage options, each optimized for a different use case. The options differ in speed, level of access, and level of persistence.

Types of storage options

Local storage for a Node

Each node has its own volume and disk that is not shared with other nodes. Learn more about how to configure the local storage.

NVMe support

/mnt/local_storage - SSD storage volumes accessed through the Non-Volatile Memory Express (NVMe) interface. It provides temporary storage in addition to the Node's root disk/volume. NVMe offers higher performance and lower latency, which suits a variety of workloads. On instance types that do not have NVMe, /mnt/local_storage falls back to the root disk/volume.
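
As a minimal sketch of using this path for scratch data, the Python helper below prefers /mnt/local_storage when the directory exists and is writable, and otherwise uses the system temporary directory. The helper name `scratch_dir` and the fallback to `tempfile.gettempdir()` are illustrative assumptions, not part of any Anyscale API.

```python
import os
import tempfile

LOCAL_STORAGE = "/mnt/local_storage"  # path documented above


def scratch_dir(preferred: str = LOCAL_STORAGE) -> str:
    """Return a writable directory for temporary files.

    Hypothetical helper: uses `preferred` when it exists and is writable,
    otherwise falls back to the system temp directory (an assumption for
    environments where the mount is absent, such as a laptop).
    """
    if os.path.isdir(preferred) and os.access(preferred, os.W_OK):
        return preferred
    return tempfile.gettempdir()


if __name__ == "__main__":
    # Create a per-run scratch subdirectory and write a temporary file into it.
    # The subdirectory is removed when the `with` block exits.
    with tempfile.TemporaryDirectory(dir=scratch_dir()) as workdir:
        path = os.path.join(workdir, "shard-0.tmp")
        with open(path, "wb") as f:
            f.write(b"intermediate data")
        print(f"wrote {os.path.getsize(path)} bytes under {workdir}")
```

Because this storage is temporary and local to one node, a pattern like this fits intermediate data only. Anything that must outlive the node belongs in one of the shared or object storage options.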

Storage shared across Nodes

An NFS file system is set up and mounted automatically on Workspace, Job, and Service Clusters. Learn more about how to configure the shared storage.

Cluster storage

/mnt/cluster_storage - an NFS mount (a private directory on EFS) available on every node of the Service's cluster and persisted throughout the lifecycle of the Service. It is well suited to files that must be accessible to the Head node and all the Worker nodes until the Service is terminated.

note

Cluster storage is not cloned when you clone a Service as a Workspace. Currently, cluster storage for Services Clusters is not cleaned up automatically. A cleanup mechanism may be added in the future to free up storage and reduce costs for users.

User storage

/mnt/user_storage - an NFS mount (a private directory on EFS) that is private to the Anyscale user and accessible from every Node of the Workspaces, Jobs, and Services Clusters created by that user. It is well suited to files you need to use with multiple Anyscale features.

Shared storage

/mnt/shared_storage - an NFS mount (a private directory on EFS) accessible to all Anyscale users of the same Anyscale Cloud. It is mounted on every node of all the Workspaces, Jobs, and Services Clusters in that Cloud. It is well suited to model checkpoints and other artifacts that you want to share with your team.

Object Storage (S3 or GCS buckets)

For every Anyscale Cloud, a default object storage bucket is configured during the Cloud deployment. All the Workspaces, Jobs, and Services Clusters within an Anyscale Cloud have permission to read and write to its default bucket.

caution

Anyscale writes system- or user-generated files (for example, log files) to this bucket. Do not delete or edit the Anyscale-managed files. Use $ANYSCALE_ARTIFACT_STORAGE to separate your files from Anyscale-generated files.

Use the following environment variables to access the default bucket:

  • ANYSCALE_CLOUD_STORAGE_BUCKET: the name of the bucket.
  • ANYSCALE_CLOUD_STORAGE_BUCKET_REGION: the region of the bucket.
  • ANYSCALE_ARTIFACT_STORAGE: the URI of the pre-generated folder for storing your artifacts, which keeps them separate from Anyscale-generated files.
    • AWS: s3://<org_id>/<cloud_id>/artifact_storage/
    • GCP: gs://<org_id>/<cloud_id>/artifact_storage/
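
The variables above can be read like any other environment variables. The following Python sketch builds an object URI under the artifact folder. The helper name `artifact_uri` and the example path `checkpoints/run-001/model.pt` are hypothetical, and the sketch only constructs a string; it does not upload anything.

```python
import os


def artifact_uri(relative_path: str) -> str:
    """Join a relative path onto $ANYSCALE_ARTIFACT_STORAGE.

    Hypothetical helper; raises KeyError if the variable is unset,
    for example when run outside an Anyscale cluster.
    """
    base = os.environ["ANYSCALE_ARTIFACT_STORAGE"]
    # Normalize slashes so the result has exactly one separator at the join.
    return base.rstrip("/") + "/" + relative_path.lstrip("/")


if __name__ == "__main__":
    # Use placeholder text so the script also runs where the variables are unset.
    bucket = os.environ.get("ANYSCALE_CLOUD_STORAGE_BUCKET", "<unset>")
    region = os.environ.get("ANYSCALE_CLOUD_STORAGE_BUCKET_REGION", "<unset>")
    print(f"default bucket: {bucket} (region: {region})")
    if "ANYSCALE_ARTIFACT_STORAGE" in os.environ:
        print(artifact_uri("checkpoints/run-001/model.pt"))
```

The resulting URI can then be passed to whichever S3 or GCS client you already use.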

How to choose the storage

The choice depends on expected performance, file sizes, collaboration needs, and security requirements.

  • Do not put large files, such as terabyte-scale datasets, in NFS storage. Use object storage (such as an S3 bucket) for files larger than 10 GB.
  • If you want to share small files across different Workspaces, Jobs, or even Services, user and shared storage are good options.
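
The two guidelines above can be expressed as a small decision helper. This is an illustrative sketch only: the function name `suggest_storage` is hypothetical, and reading "10 GB" as 10 × 10^9 bytes is an assumption.

```python
# Threshold from the guidance above; treating 1 GB as 10**9 bytes is an assumption.
LARGE_FILE_BYTES = 10 * 10**9


def suggest_storage(size_bytes: int, share_with_team: bool = False) -> str:
    """Suggest a storage location based on file size and sharing needs.

    Hypothetical helper that encodes the two rules of thumb in this section.
    """
    if size_bytes > LARGE_FILE_BYTES:
        # Large files belong in object storage, not on NFS.
        return "object storage ($ANYSCALE_ARTIFACT_STORAGE)"
    if share_with_team:
        # Visible to all users of the same Anyscale Cloud.
        return "/mnt/shared_storage"
    # Private to the user, visible across their Workspaces, Jobs, and Services.
    return "/mnt/user_storage"


if __name__ == "__main__":
    print(suggest_storage(2 * 10**12))                      # terabyte-scale dataset
    print(suggest_storage(50 * 10**6, share_with_team=True))  # small team artifact
    print(suggest_storage(50 * 10**6))                      # small personal file
```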

note

NFS storage usually has connection limits, and these limits differ between cloud providers. Check your provider's limits, and contact the support team if you need assistance.