Version: Canary 🐤

Cloud storage buckets

note

This page applies only to Anyscale-hosted clouds. To access cloud storage buckets from self-hosted clouds, see Accessing S3 buckets and Accessing a GCS bucket.

Access default cloud storage

Anyscale provides each cloud with a private default cloud storage path, exposed through the $ANYSCALE_ARTIFACT_STORAGE environment variable. All nodes launched within your cloud should have read/write access to this path. To copy files from your workspace into this storage, use the standard aws s3 cp command on AWS or gsutil cp on Google Cloud.

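# Create a sample file and upload it to the default cloud storage path (AWS).
echo "hello world" > /tmp/input.txt
aws s3 cp /tmp/input.txt "$ANYSCALE_ARTIFACT_STORAGE/saved.txt"
# On Google Cloud, the equivalent is: gsutil cp /tmp/input.txt "$ANYSCALE_ARTIFACT_STORAGE/saved.txt"
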
warning

Permissions on the cloud storage bucket backing $ANYSCALE_ARTIFACT_STORAGE are scoped to that path only, so requests against the root of the underlying bucket (for example, HeadObject) may be rejected with an ACCESS_DENIED error. Only make calls to paths that start with $ANYSCALE_ARTIFACT_STORAGE/.
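The following Python sketch shows one way to follow this advice in application code: it builds object URIs from the environment variable and rejects keys that would fall outside the prefix. The helper names artifact_uri and is_under_artifact_storage are hypothetical, and the sketch assumes the variable holds an s3:// or gs:// URI.

```python
import os
import posixpath
from typing import Optional

# Assumption: the variable holds a URI such as "s3://bucket/prefix" or
# "gs://bucket/prefix"; the real value differs per cloud.
ARTIFACT_ENV_VAR = "ANYSCALE_ARTIFACT_STORAGE"


def _artifact_base(base: Optional[str]) -> str:
    """Return the artifact prefix without a trailing slash."""
    if base is None:
        base = os.environ[ARTIFACT_ENV_VAR]
    return base.rstrip("/")


def artifact_uri(relative_key: str, base: Optional[str] = None) -> str:
    """Build a URI that always starts with the artifact prefix.

    Raises ValueError for absolute keys and for keys that normalize to a
    location outside the prefix (for example "../other.txt").
    """
    if relative_key.startswith("/"):
        raise ValueError(f"key must be relative: {relative_key!r}")
    # Collapse "./" and "a/../" segments before checking for escapes.
    normalized = posixpath.normpath(relative_key)
    if normalized in (".", "..") or normalized.startswith("../"):
        raise ValueError(f"key is outside the artifact prefix: {relative_key!r}")
    return f"{_artifact_base(base)}/{normalized}"


def is_under_artifact_storage(uri: str, base: Optional[str] = None) -> bool:
    """Return True if the URI starts with the artifact prefix followed by "/"."""
    return uri.startswith(_artifact_base(base) + "/")
```

For example, with the variable set to the made-up value s3://example-bucket/prefix, artifact_uri("checkpoints/epoch-1.pt") returns s3://example-bucket/prefix/checkpoints/epoch-1.pt, which you can pass to any S3 client. Note that normalization also collapses repeated slashes, so "a//b" becomes "a/b".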

Access private cloud storage

To access private cloud storage buckets that aren't managed by Anyscale, configure permissions using the patterns below.

AWS S3

  • Create an IAM user with a policy that grants scoped-down read and/or write permissions to your existing bucket. See the Amazon docs for details.
  • Create long-term access keys (an access key ID and a secret access key) for that user. See the Amazon docs for details.
  • Add the keys as tracked environment variables, using the standard names AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, under the "Dependencies" tab of a Workspace. Anyscale automatically propagates them to all Ray workloads run through the Workspace, including Jobs and Services.
  • Then read from the bucket directly in your Ray application code, for example:
import ray
ds = ray.data.read_parquet("s3://<your-bucket-name>/<path>")
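If a workload fails with S3 permission errors, a quick first check is whether the keys actually reached the process. This minimal sketch assumes the keys are stored under the standard AWS SDK variable names AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY; the function name missing_aws_credentials is hypothetical.

```python
import os
from typing import List, Mapping, Optional

# Standard variable names that AWS SDKs read for long-term access keys.
REQUIRED_AWS_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_aws_credentials(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required AWS variable names that are unset or empty."""
    if env is None:
        env = os.environ
    return [name for name in REQUIRED_AWS_VARS if not env.get(name)]
```

Calling it at the start of a script or inside a Ray task, and raising an error if the returned list is non-empty, surfaces a missing variable before any read is attempted.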

Google Cloud Storage

  • Create a service account with scoped-down read and/or write permissions to your existing bucket (see the Google Cloud docs).
  • Create a key for the service account (see the Google Cloud docs).
  • Download the service account key as a JSON file and copy it into your Workspace's working directory.
  • Under the "Dependencies" tab of the Workspace, add a tracked environment variable named GOOGLE_APPLICATION_CREDENTIALS whose value is the path to the key file (for example, ./google-service-account.json). Anyscale automatically propagates it to all Ray workloads run through the Workspace, including Jobs and Services.
  • Then read from the bucket directly in your Ray application code, for example:
import ray
ds = ray.data.read_parquet("gs://<your-bucket-name>/<path>")
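Similarly, this sketch checks that GOOGLE_APPLICATION_CREDENTIALS points at a readable service account key before any GCS read. The function name service_account_email is hypothetical; the check relies only on service account key files being JSON objects with "type": "service_account" and a client_email field.

```python
import json
import os
from typing import Mapping, Optional


def service_account_email(env: Optional[Mapping[str, str]] = None) -> str:
    """Validate GOOGLE_APPLICATION_CREDENTIALS and return the key's client_email.

    A relative path such as ./google-service-account.json is resolved against
    the current working directory of the process.
    """
    if env is None:
        env = os.environ
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    if not os.path.isfile(path):
        raise FileNotFoundError(f"credential file not found: {os.path.abspath(path)}")
    with open(path, encoding="utf-8") as f:
        key = json.load(f)
    # Service account key files are JSON objects with "type": "service_account".
    if not isinstance(key, dict) or key.get("type") != "service_account":
        raise ValueError(f"{path} is not a service account key file")
    return key["client_email"]
```

The returned email identifies which service account the workload runs as, which helps when comparing it against the bucket's IAM permissions.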

Additional notes

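  • To access private cloud storage buckets concurrently with $ANYSCALE_ARTIFACT_STORAGE, pass the generated access keys or service account credentials directly into your calls to the cloud provider API instead of setting them as process-wide environment variables. Process-wide credentials take precedence over the default credentials that nodes use for $ANYSCALE_ARTIFACT_STORAGE, so setting them globally can break access to it.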
  • For large-scale workloads, you may need to co-locate compute and storage to avoid large data egress costs. To run compute in a particular region or on a particular provider, contact us.
  • If your organization prevents you from creating long-term access keys, contact us.