Cloud storage buckets
This page only applies to Anyscale-hosted clouds. For accessing cloud storage buckets with self-hosted clouds, see Accessing S3 buckets and Accessing a GCS bucket.
Access default cloud storage
Anyscale provides a default cloud storage path, private to each cloud, located at $ANYSCALE_ARTIFACT_STORAGE. All nodes launched within your cloud have access to read and write files at this path. Copy files from your workspace into cloud storage using standard aws s3 cp and gcloud storage cp commands.
Write to Artifact Storage:

echo "hello world" > /tmp/input.txt
aws s3 cp /tmp/input.txt $ANYSCALE_ARTIFACT_STORAGE/saved.txt

Read from Artifact Storage:

aws s3 cp $ANYSCALE_ARTIFACT_STORAGE/saved.txt /tmp/output.txt
cat /tmp/output.txt
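The same copy pattern can be driven from Python by shelling out to the AWS CLI. A minimal sketch: `artifact_uri` and `s3_cp_command` are illustrative helpers, not Anyscale APIs, and the bucket URI in the example is a placeholder.

```python
import os
import shlex

def artifact_uri(relative_path, env=os.environ):
    """Build a URI under the workspace's artifact storage prefix.

    Assumes ANYSCALE_ARTIFACT_STORAGE is set, as it is on
    Anyscale-hosted clouds; fail early with a clear error otherwise.
    """
    base = env.get("ANYSCALE_ARTIFACT_STORAGE")
    if not base:
        raise RuntimeError("ANYSCALE_ARTIFACT_STORAGE is not set")
    return f"{base.rstrip('/')}/{relative_path.lstrip('/')}"

def s3_cp_command(src, dst):
    """Return the `aws s3 cp` invocation as an argument list,
    suitable for subprocess.run."""
    return ["aws", "s3", "cp", src, dst]

# Placeholder prefix for illustration; on a real workspace, read it
# from os.environ instead.
env = {"ANYSCALE_ARTIFACT_STORAGE": "s3://example-bucket/org/cloud"}
cmd = s3_cp_command("/tmp/input.txt", artifact_uri("saved.txt", env=env))
print(shlex.join(cmd))
```

Pass `cmd` to `subprocess.run` on a node where the AWS CLI is configured.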
Anyscale scopes permissions on the cloud storage bucket backing $ANYSCALE_ARTIFACT_STORAGE to only the specified path, so calls made to the root of the underlying bucket (for example, HeadObject) may be rejected with an ACCESS_DENIED error. Avoid making calls to any paths that aren't explicitly prefixed with $ANYSCALE_ARTIFACT_STORAGE/.
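Since out-of-prefix requests only fail server-side, it can help to validate URIs client-side before making any storage call. A minimal sketch, assuming the helper name (`require_artifact_prefix` is illustrative, not an Anyscale API):

```python
import os

def require_artifact_prefix(uri, env=os.environ):
    """Reject URIs outside $ANYSCALE_ARTIFACT_STORAGE before any API call.

    Requests outside the scoped prefix are rejected server-side with
    ACCESS_DENIED, so failing fast locally gives a clearer error message.
    """
    prefix = env["ANYSCALE_ARTIFACT_STORAGE"].rstrip("/") + "/"
    if not uri.startswith(prefix):
        raise PermissionError(
            f"{uri!r} is outside the scoped prefix {prefix!r}; "
            "the request would be rejected with ACCESS_DENIED"
        )
    return uri
```

Call it on every URI you construct dynamically; URIs built with the shell variable expansion shown above are safe by construction.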
Access private cloud storage
To access private cloud storage buckets that aren't managed by Anyscale, configure permissions using the patterns below.
AWS S3
- Create an IAM role with scoped-down read or write permissions to your existing bucket. See the Amazon IAM docs for details.
- Create a set of long-term credentials associated with that role. See the Amazon docs for details.
- Add the credentials as tracked environment variables under the "Dependencies" tab of a Workspace. Anyscale automatically propagates them to all Ray workloads run through the Workspace, including Jobs and Services.
- Then use the bucket directly in your Ray app code, for example:
ds = ray.data.read_parquet("s3://<your-bucket-name>/<path>")
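Because the read relies on the default AWS credential chain picking up the tracked environment variables, it can be worth checking they reached the process before launching a workload. A minimal sketch (`missing_aws_credentials` is an illustrative helper, not an Anyscale API):

```python
import os

REQUIRED_AWS_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")

def missing_aws_credentials(env=os.environ):
    """Return the names of any unset AWS credential variables.

    Tracked environment variables from the Workspace "Dependencies" tab
    should appear on every node; an empty result means the default
    credential chain (used by boto3 and pyarrow) can pick them up.
    """
    return [name for name in REQUIRED_AWS_VARS if not env.get(name)]
```

Raise or log if the returned list is non-empty, rather than letting the first `read_parquet` call fail deep inside the worker.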
Google Cloud Storage
- Create a service account with scoped-down read or write permissions to your existing bucket. See the Google Cloud docs for details.
- Create a set of service account credentials. See the Google Cloud docs for details.
- Export the service account credentials as a JSON key file, then copy it into the workspace working directory.
- Add the path to the credentials file as a tracked environment variable under the "Dependencies" tab of a Workspace, using the variable name GOOGLE_APPLICATION_CREDENTIALS (an example value is ./google-service-account.json). Anyscale automatically propagates it to all Ray workloads run through the Workspace, including Jobs and Services.
- Then use the bucket directly in your Ray app code, for example:
ds = ray.data.read_parquet("gcs://<your-bucket-name>/<path>")
Additional notes
- To use private cloud storage buckets concurrently with $ANYSCALE_ARTIFACT_STORAGE, pass the generated access keys or service account into the call to the cloud provider API directly, instead of setting them as process-wide globals.
- For large-scale workloads, compute and storage may need to be co-located to avoid large data egress costs. To run compute out of a particular region or provider, contact us.
- If your organization prevents you from creating long-term access keys, contact us.
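The per-call credential pattern from the first note can be sketched as follows. `scoped_s3_filesystem_kwargs` is an illustrative helper, and the commented pyarrow/Ray usage assumes those libraries are installed; the process-wide AWS_* variables stay untouched and keep serving $ANYSCALE_ARTIFACT_STORAGE.

```python
def scoped_s3_filesystem_kwargs(access_key, secret_key, region=None):
    """Keyword arguments for pyarrow.fs.S3FileSystem built from explicit
    credentials, so credentials for a private bucket never become
    process-wide globals."""
    kwargs = {"access_key": access_key, "secret_key": secret_key}
    if region:
        kwargs["region"] = region
    return kwargs

# Hypothetical usage (requires pyarrow and ray):
# from pyarrow import fs
# import ray
# private_fs = fs.S3FileSystem(**scoped_s3_filesystem_kwargs(key, secret))
# ds = ray.data.read_parquet("s3://<your-bucket-name>/<path>", filesystem=private_fs)
```

The same idea applies on GCP: construct a client from an explicit service-account file per call rather than exporting GOOGLE_APPLICATION_CREDENTIALS globally.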