Mount user-managed NFS storage
This page describes how to mount an existing NFS or NFS-compatible file share on Anyscale workloads running on Kubernetes-backed clouds. User-managed NFS is distinct from Anyscale shared storage. For the Anyscale-managed shared storage path, see Shared storage on Anyscale.
When to use user-managed NFS
User-managed NFS is useful when you already have an NFS server or NFS-compatible file share that your organization maintains outside of Anyscale, such as any of the following:
- An on-premises NFS appliance reachable from your Anyscale cloud network.
- A cloud-provider managed file service such as AWS EFS, Google Cloud Filestore, Azure Files with the NFS protocol, or Azure NetApp Files.
- A self-managed NFS server running on a dedicated VM.
For new Anyscale deployments that don't have an existing NFS requirement, Anyscale recommends using cloud object storage for large datasets and Anyscale shared storage for cross-workload collaboration. See Storage on Anyscale for an overview of storage options.
How NFS mounting works
On Kubernetes, Anyscale applies compute configs as a strategic merge patch against the pod specifications that Anyscale generates. You add volumes and volumeMounts entries to the advanced_instance_config.spec section of your compute config, and Anyscale merges them into the Ray pod. See Compute configuration options for Kubernetes for the full compute config reference.
You can mount NFS using either of the following patterns:
- PVC-backed NFS references a PersistentVolumeClaim that your Kubernetes administrator provisions with a Container Storage Interface (CSI) driver. The compute config only references the PVC by name.
- Direct NFS volume uses the Kubernetes nfs volume type to point at an NFS server by hostname and export path. No CSI driver is required.
PVC-backed mounts are Anyscale's recommended pattern because they match how other Kubernetes storage integrations work and keep cloud-provider specifics out of the compute config.
Prerequisites
Before you mount NFS on your Anyscale workloads, make sure the following are in place:
- Your Anyscale cloud is deployed on Kubernetes. See Compute configuration options for Kubernetes.
- Your Kubernetes worker nodes can reach the NFS server over the network on the NFS port, typically TCP 2049.
- For direct NFS mounts, the Kubernetes nodes have NFS client support. Most managed Kubernetes distributions include this by default.
- For PVC-backed mounts, your Kubernetes administrator has created a PersistentVolume and PersistentVolumeClaim in the namespace where Anyscale schedules pods.
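Before launching a workload, you can sanity-check these prerequisites from a machine inside the cluster network. The hostname, namespace, and claim name below are placeholders; substitute your own values:

```shell
# Check that the NFS port is reachable from the cluster network
# (replace nfs.example.internal with your server's hostname or IP).
nc -zv nfs.example.internal 2049

# For PVC-backed mounts, confirm the claim exists and reports a Bound status
# in the namespace where Anyscale schedules pods.
kubectl get pvc nfs-shared -n <anyscale-namespace>
```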
Mount NFS using a PVC
Use this pattern when your Kubernetes administrator manages NFS access through a CSI driver. The cluster administrator creates the PersistentVolume and PersistentVolumeClaim. You reference the PVC by name in your compute config.
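As a sketch of the administrator side, the following manifests define a statically provisioned PersistentVolume and a matching claim named nfs-shared. The driver name follows the AWS EFS CSI driver's conventions as an example; the volume handle, namespace, and capacity are placeholders you replace with values for your own CSI driver and cluster:

```yaml
# Example only: a statically provisioned PV for an NFS-compatible share.
# The csi.driver and volumeHandle values depend on your CSI driver.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-shared-pv
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  csi:
    driver: efs.csi.aws.com
    volumeHandle: <file-system-id>
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-shared
  namespace: <anyscale-namespace>
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""   # empty string disables dynamic provisioning
  volumeName: nfs-shared-pv
  resources:
    requests:
      storage: 100Gi
```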
The following compute config references a PVC and mounts it at /mnt/nfs inside the Ray container. Replace <your-pvc-name> with the name of your PersistentVolumeClaim:
```yaml
compute_config:
  cloud: <your-cloud-name>
  head_node:
    instance_type: 4CPU-16GB
  worker_nodes:
    - instance_type: 4CPU-16GB
      min_nodes: 1
      max_nodes: 10
  advanced_instance_config:
    spec:
      containers:
        - name: ray
          volumeMounts:
            - name: nfs-shared
              mountPath: /mnt/nfs
      volumes:
        - name: nfs-shared
          persistentVolumeClaim:
            claimName: <your-pvc-name>
```
The provider-specific work happens in your Kubernetes cluster's StorageClass and PersistentVolume definitions. The following list points to the relevant cloud-provider CSI drivers:
- AWS EKS with Amazon EFS uses the EFS CSI driver.
- Google Cloud GKE with Filestore uses the Filestore CSI driver.
- Azure AKS with Azure Files NFS uses the Azure Files CSI driver with the NFS protocol, or the Azure NetApp Files CSI driver for NetApp-backed shares.
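As an illustration of that provider-specific work, dynamic provisioning with the EFS CSI driver is configured through a StorageClass similar to the following. The file system ID is a placeholder, and parameter names can vary between driver versions, so check your driver's documentation:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: efs-nfs
provisioner: efs.csi.aws.com
parameters:
  provisioningMode: efs-ap      # provision an EFS access point per claim
  fileSystemId: <file-system-id>
  directoryPerms: "700"
```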
For a complete example of configuring a PVC for AKS, see Configure shared storage with Azure blob PVC for AKS. That page uses BlobFuse for object storage, but the PVC wiring into the compute config follows the same pattern.
Mount NFS directly
Use this pattern when you want to mount an NFS server without creating a PersistentVolume in Kubernetes. You specify the NFS server and export path directly in the compute config.
Direct NFS mounts work when Kubernetes worker nodes can reach the NFS server over the network and the kubelet has NFS client support. If the mount fails at pod startup, the pod stays in a ContainerCreating state and Anyscale can't schedule Ray on it.
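When a pod hangs in ContainerCreating, the kubelet's mount error usually appears in the pod's events. One way to inspect it, with the pod name and namespace as placeholders:

```shell
# Show recent events for the stuck pod, including NFS mount errors.
kubectl describe pod <ray-pod-name> -n <anyscale-namespace> | tail -n 20

# Or filter for mount failures across the namespace.
kubectl get events -n <anyscale-namespace> --field-selector reason=FailedMount
```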
The following compute config mounts an NFS export at /mnt/nfs inside the Ray container:
```yaml
compute_config:
  cloud: <your-cloud-name>
  head_node:
    instance_type: 4CPU-16GB
  worker_nodes:
    - instance_type: 4CPU-16GB
      min_nodes: 1
      max_nodes: 10
  advanced_instance_config:
    spec:
      containers:
        - name: ray
          volumeMounts:
            - name: nfs-shared
              mountPath: /mnt/nfs
      volumes:
        - name: nfs-shared
          nfs:
            server: <nfs-server-hostname-or-ip>
            path: /<export-path>
            readOnly: false
```
Set readOnly: true if your workloads only need to read from the share.
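Once a workload is running, you can confirm the share is mounted from inside the Ray container, for example from a workspace terminal. The mount point below matches the example configs:

```shell
# Verify the NFS mount is present and shows the expected capacity.
df -h /mnt/nfs

# Verify the mount is writable (skip this check for read-only mounts).
touch /mnt/nfs/.write-test && rm /mnt/nfs/.write-test
```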
Requirements and limitations
- Anyscale doesn't manage the lifecycle of user-managed NFS shares. You're responsible for provisioning, backup, access control, and decommissioning.
- NFS performance depends on the server, the network path, and the mount options you configure. Anyscale recommends against using NFS for datasets larger than 10 GB or for workloads that generate high disk I/O. See Storage on Anyscale for guidance on choosing storage.
- Mount failures at pod startup prevent Ray from starting on the affected node.