Cloud storage buckets

note

This page only applies to Anyscale-hosted clouds. For accessing cloud storage buckets with self-hosted clouds, see Accessing S3 buckets and Accessing a GCS bucket.

Access default cloud storage

Anyscale provides a default cloud storage path, private to each cloud, located at $ANYSCALE_ARTIFACT_STORAGE. All nodes launched within your cloud should have read/write access to files at this path. Copy files from your workspace into cloud storage using the standard aws s3 cp or gsutil cp commands, depending on your cloud provider.

echo "hello world" > /tmp/input.txt
aws s3 cp /tmp/input.txt $ANYSCALE_ARTIFACT_STORAGE/saved.txt
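
On GCP-hosted clouds, where $ANYSCALE_ARTIFACT_STORAGE is a gs:// path, the equivalent copy uses gsutil (a sketch of the same operation):

echo "hello world" > /tmp/input.txt
gsutil cp /tmp/input.txt $ANYSCALE_ARTIFACT_STORAGE/saved.txt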
warning

Permissions on the cloud storage bucket backing $ANYSCALE_ARTIFACT_STORAGE are scoped to only provide access to the specified path, so calls made to the root of the underlying bucket (e.g. HeadObject) may be rejected with an ACCESS_DENIED error. Avoid making calls to any paths that are not explicitly prefixed with $ANYSCALE_ARTIFACT_STORAGE/.

Access private cloud storage

To access private cloud storage buckets that aren't managed by Anyscale, configure permissions using the patterns below.

AWS S3

  • Create an IAM role with scoped-down read or write permissions to your existing bucket. See the Amazon docs for details.
  • Create a set of long-term credentials associated with that role. See the Amazon docs for details.
  • Add the credentials as tracked environment variables under the "Dependencies" tab of a Workspace. Anyscale automatically propagates them to all Ray workloads run through the Workspace, including Jobs and Services.
  • Then read from the bucket directly in your Ray application code, for example:
import ray

ds = ray.data.read_parquet("s3://<your-bucket-name>/<path>")
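
As a quick sanity check that the tracked credentials grant access, the following sketch lists a few objects with boto3, which reads the standard AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables automatically. The bucket name is a placeholder, and the check assumes the role grants list permissions.

import boto3

# boto3 picks up AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY from the
# environment, so the tracked variables set above apply automatically.
s3 = boto3.client("s3")
resp = s3.list_objects_v2(Bucket="<your-bucket-name>", MaxKeys=5)
for obj in resp.get("Contents", []):
    print(obj["Key"])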

Google Cloud Storage

  • Create a service account with scoped-down read and/or write permissions to your existing bucket. See the Google Cloud docs for details.
  • Create a set of service account credentials. See the Google Cloud docs for details.
  • Export the service account credentials as a JSON file and copy it into your Workspace's working directory.
  • Add the path to the credentials file as a tracked environment variable named GOOGLE_APPLICATION_CREDENTIALS under the "Dependencies" tab of a Workspace (an example value is ./google-service-account.json). Anyscale automatically propagates it to all Ray workloads run through the Workspace, including Jobs and Services.
  • Then read from the bucket directly in your Ray application code, for example:
import ray

ds = ray.data.read_parquet("gcs://<your-bucket-name>/<path>")
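
As with S3, a quick sanity check can confirm the service account grants access. This sketch lists a few objects with the google-cloud-storage client, which reads the GOOGLE_APPLICATION_CREDENTIALS variable automatically; the bucket name is a placeholder, and the check assumes the service account can list objects.

from google.cloud import storage

# storage.Client() resolves credentials via GOOGLE_APPLICATION_CREDENTIALS,
# so the tracked variable set above applies automatically.
client = storage.Client()
for blob in client.list_blobs("<your-bucket-name>", max_results=5):
    print(blob.name)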

Additional notes

  • To use private cloud storage buckets concurrently with $ANYSCALE_ARTIFACT_STORAGE, pass the generated access keys or service account credentials directly into the call to the cloud provider API instead of setting them as process-wide globals (see the sketch after this list).
  • For large-scale workloads, compute and storage may need to be co-located to avoid large data egress costs. To run compute out of a particular region or provider, contact us.
  • If your organization prevents you from creating long-term access keys, contact us.
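
For the S3 case, one way to pass credentials explicitly is to construct a pyarrow filesystem with the private bucket's keys and hand it to Ray Data, leaving the node's default credentials (which back $ANYSCALE_ARTIFACT_STORAGE) untouched. This is a minimal sketch; the PRIVATE_BUCKET_* variable names are illustrative.

import os

import pyarrow.fs
import ray

# Hypothetical variable names: credentials scoped to the private bucket,
# stored as tracked environment variables rather than the default AWS ones.
fs = pyarrow.fs.S3FileSystem(
    access_key=os.environ["PRIVATE_BUCKET_ACCESS_KEY_ID"],
    secret_key=os.environ["PRIVATE_BUCKET_SECRET_ACCESS_KEY"],
)

# Passing the filesystem keeps these credentials out of process-wide state.
ds = ray.data.read_parquet("<your-bucket-name>/<path>", filesystem=fs)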