Storage
File storage, covered in the previous section, is optimal for storing your code. However, AI workloads often need access to large amounts of data, whether for training and fine-tuning or as a common location for model checkpoints. The options below cover different use cases.
Types of storage options
NVMe support
/mnt/local_storage: a Non-Volatile Memory Express (NVMe) interface to Solid State Drive (SSD) storage volumes. It provides temporary storage in addition to the node's root disk/volume, with higher performance, lower latency, and scalability across a variety of workloads. For instance types that don't have NVMe, /mnt/local_storage falls back to the root disk/volume. Learn more about how to configure NVMe support.
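For example, you can check the scratch space available on the volume from Python (a minimal sketch; nothing here is Anyscale-specific):

import shutil

# /mnt/local_storage is node-local scratch space: fast, but not shared
# across nodes and not persisted beyond the node's lifetime.
total, used, free = shutil.disk_usage("/mnt/local_storage")
print(f"local scratch: {free / 1e9:.0f} GB free of {total / 1e9:.0f} GB")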
Storage shared across nodes
Anyscale automatically mounts a Network File System (NFS) on Workspace, Job, and Service clusters. If you're using a dedicated cloud, learn more about how to further configure the shared storage.
NFS storage is accessible to all users on your Anyscale Hosted Cloud. Don't store sensitive data or secrets that you don't want other users in your cloud to access.
Cluster storage
/mnt/cluster_storage: a directory on NFS mounted on every node of the Workspace cluster and persisted throughout the lifecycle of the Workspace. It's great for storing files that need to be accessible to the Workspace/Head node and all the Worker nodes. For example:
- TensorBoard logs.
- Common data files that need to be accessed from all Workers using a stable path, as in the sketch after this list.
The cluster storage is not cloned when you clone a Workspace.
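As a minimal sketch of the stable-path pattern, a file written to cluster storage from one node is readable from any other; the file name and contents are illustrative:

import ray

ray.init()

# Written once, from the head node or any worker...
with open("/mnt/cluster_storage/config.txt", "w") as f:
    f.write("learning_rate: 0.01")

@ray.remote
def read_config():
    # ...and visible at the same path from every node in the cluster.
    with open("/mnt/cluster_storage/config.txt") as f:
        return f.read()

print(ray.get(read_config.remote()))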
User storage
/mnt/user_storage: a directory on NFS specific to each Anyscale user and accessible from every node of the Workspace, Job, and Service clusters that the user creates. It's great for storing files you need to use with multiple Anyscale features.
Shared storage
/mnt/shared_storage: a directory on NFS accessible to all Anyscale users of the same Anyscale Hosted Cloud. It's mounted on every node of all the Anyscale clusters in the same cloud. It's great for storing model checkpoints and other artifacts that you want to share with your team.
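For example, a checkpoint written in one user's Workspace can be published for teammates through shared storage (a sketch; the paths and file names are illustrative):

import os
import shutil

# Publish a locally written checkpoint so other users in the same
# Anyscale Hosted Cloud can load it from the same path.
os.makedirs("/mnt/shared_storage/team-models", exist_ok=True)
shutil.copy("/tmp/checkpoint.pt", "/mnt/shared_storage/team-models/checkpoint.pt")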
Object storage (S3 or GCS buckets)
For every Anyscale Hosted Cloud, Anyscale configures a default object storage bucket during the cloud deployment. All Workspace, Job, and Service clusters within an Anyscale Hosted Cloud have permission to read from and write to this default bucket.
Anyscale writes system- and user-generated files (for example, log files) to this bucket. Don't delete or edit Anyscale-managed files; doing so may lead to unexpected data loss. Use $ANYSCALE_ARTIFACT_STORAGE to keep your files separate from Anyscale-generated ones.
Use the following environment variables to access the default bucket:
- ANYSCALE_CLOUD_STORAGE_BUCKET: the name of the bucket.
- ANYSCALE_CLOUD_STORAGE_BUCKET_REGION: the region of the bucket.
- ANYSCALE_ARTIFACT_STORAGE: the URI to the pre-generated folder for storing your artifacts while keeping them separate from Anyscale-generated ones.
  - AWS: s3://<org_id>/<cloud_id>/artifact_storage/
  - GCP: gs://<org_id>/<cloud_id>/artifact_storage/
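For example, you can read these variables from Python to build URIs under your artifact folder (a minimal sketch; the checkpoints/run-1 subpath is illustrative):

import os

# Set automatically on Anyscale clusters.
bucket = os.environ["ANYSCALE_CLOUD_STORAGE_BUCKET"]
region = os.environ["ANYSCALE_CLOUD_STORAGE_BUCKET_REGION"]

# Keep your artifacts under this prefix, separate from Anyscale-managed files.
checkpoint_uri = os.environ["ANYSCALE_ARTIFACT_STORAGE"] + "/checkpoints/run-1"
print(checkpoint_uri)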
Anyscale Hosted Clouds include 100 GB of free storage; if you require more, contact sales.
Local storage for a node (customer-hosted clouds only)
Each node has its own volume and disk, which aren't shared with other nodes. Learn more about how to configure the local storage.
How to choose a storage option
The choice depends on expected performance, file sizes, collaboration needs, security requirements, and so on.
- NFS is generally slower than local storage on the Workspace/Head node.
- DO NOT put large files, such as terabyte-scale datasets, in NFS storage. Use object storage (like an S3 bucket) for large files (more than 10 GB).
- If you want to share small files across different Workspaces, Jobs, or even Services, user and shared storage are good options.
NFS storage usually has connection limits, and different cloud providers may impose different limits. Consult the documentation on limits for more information, or contact the support team if you need assistance.
To increase the capacity of GCP Filestore instances, consult the GCP documentation.
Configure storage for ML workloads
When running ML training or tuning workloads in Workspaces, it may be useful to store your training or tuning results in /mnt/cluster_storage or /mnt/shared_storage by setting the storage_path (example). This allows the results to persist over time and even be shared with other collaborators.
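A minimal sketch with Ray Tune, using a placeholder training function and an illustrative experiment name:

from ray import train, tune

def train_fn(config):
    # Placeholder training loop; report a dummy metric.
    train.report({"loss": 0.1})

tuner = tune.Tuner(
    train_fn,
    run_config=train.RunConfig(
        storage_path="/mnt/cluster_storage",  # or /mnt/shared_storage
        name="my_experiment",
    ),
)
results = tuner.fit()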
Accessing storage
Upload and download files
- The easiest way to transfer small files to and from your Workspace is to use the VS Code Web UI.
- You can also commit files to Git and run git pull from the Workspace.
- For large files, the best practice is to use object storage (for example, Amazon S3 or Google Cloud Storage) and access the data from there.
How to upload data to object storage (Anyscale Hosted Clouds)?
The recommended option is to download your data to your Workspace node and then upload it to the artifact storage.
For example:
$ curl -o /tmp/filename.zip https://example.com/path/to/file.zip
$ aws s3 cp /tmp/filename.zip $ANYSCALE_ARTIFACT_STORAGE/filename.zip
How to access data from a personal/company S3/GCS storage from an Anyscale hosted cloud?
Just as you would on your local laptop, Anyscale recommends generating credentials and using them to either download the data locally or access it programmatically.
For a one-time manual transfer, you can download the data locally and then, as in the preceding example, upload it to object storage or NFS. For example:
# set your credentials
$ export AWS_ACCESS_KEY_ID="<my-key>"
$ export AWS_SECRET_ACCESS_KEY="<my-secret-key>"
$ export AWS_SESSION_TOKEN="<my-token>"
# download your file
$ aws s3 cp s3://my-bucket/filename.zip /tmp/filename.zip
# unset the credentials
$ unset AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_SESSION_TOKEN
# Copy to the Anyscale Hosted Cloud object storage
$ aws s3 cp /tmp/filename.zip $ANYSCALE_ARTIFACT_STORAGE/filename.zip
If your code downloads this data directly from your object storage, you can inject your environment variables using Ray Runtime Environments.
import ray

ray.init(
    runtime_env={
        "env_vars": {
            "AWS_ACCESS_KEY_ID": "<my-key>",
            "AWS_SECRET_ACCESS_KEY": "<my-secret-key>",
            "AWS_SESSION_TOKEN": "<my-token>",
        }
    }
)
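Once the credentials are injected, tasks on any node inherit them. The following sketch assumes boto3 is installed in the cluster environment; my-bucket and the key are illustrative, matching the earlier example:

import boto3
import ray

@ray.remote
def fetch_object(key: str) -> bytes:
    # boto3 picks up the AWS_* credentials that runtime_env injected
    # into this worker process's environment.
    s3 = boto3.client("s3")
    return s3.get_object(Bucket="my-bucket", Key=key)["Body"].read()

data = ray.get(fetch_object.remote("filename.zip"))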