Cloud storage buckets
This page only applies to Anyscale-hosted clouds. For accessing cloud storage buckets with self-hosted clouds, see Accessing S3 buckets and Accessing a GCS bucket.
Access default cloud storage
Anyscale provides a default cloud storage path, private to each cloud, located at $ANYSCALE_ARTIFACT_STORAGE. All nodes launched within your cloud have access to read and write files at this path. Copy files from your workspace into cloud storage using standard aws s3 cp and gcloud storage cp commands.
Write to Artifact Storage:

echo "hello world" > /tmp/input.txt
aws s3 cp /tmp/input.txt $ANYSCALE_ARTIFACT_STORAGE/saved.txt

Read from Artifact Storage:

aws s3 cp $ANYSCALE_ARTIFACT_STORAGE/saved.txt /tmp/output.txt
cat /tmp/output.txt
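The same copy pattern can be driven from Python by shelling out to the AWS CLI. A minimal sketch: `artifact_uri` and `s3_cp_command` are illustrative helpers, not Anyscale APIs, and the bucket URI in the example is a placeholder.

```python
import os
import shlex

def artifact_uri(relative_path, env=os.environ):
    """Build a URI under the workspace's artifact storage prefix.

    Assumes ANYSCALE_ARTIFACT_STORAGE is set, as it is on
    Anyscale-hosted clouds; fail early with a clear error otherwise.
    """
    base = env.get("ANYSCALE_ARTIFACT_STORAGE")
    if not base:
        raise RuntimeError("ANYSCALE_ARTIFACT_STORAGE is not set")
    return f"{base.rstrip('/')}/{relative_path.lstrip('/')}"

def s3_cp_command(src, dst):
    """Return the `aws s3 cp` invocation as an argument list,
    suitable for subprocess.run."""
    return ["aws", "s3", "cp", src, dst]

# Placeholder prefix for illustration; on a real workspace, read it
# from os.environ instead.
env = {"ANYSCALE_ARTIFACT_STORAGE": "s3://example-bucket/org/cloud"}
cmd = s3_cp_command("/tmp/input.txt", artifact_uri("saved.txt", env=env))
print(shlex.join(cmd))
```

Pass `cmd` to `subprocess.run` on a node where the AWS CLI is configured.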
Anyscale scopes permissions on the cloud storage bucket backing $ANYSCALE_ARTIFACT_STORAGE to only the specified path, so calls made to the root of the underlying bucket (for example, HeadObject) may be rejected with an ACCESS_DENIED error. Avoid making calls to any paths that aren't explicitly prefixed with $ANYSCALE_ARTIFACT_STORAGE/.
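Since out-of-prefix requests only fail server-side, it can help to validate URIs client-side before making any storage call. A minimal sketch, assuming the helper name (`require_artifact_prefix` is illustrative, not an Anyscale API):

```python
import os

def require_artifact_prefix(uri, env=os.environ):
    """Reject URIs outside $ANYSCALE_ARTIFACT_STORAGE before any API call.

    Requests outside the scoped prefix are rejected server-side with
    ACCESS_DENIED, so failing fast locally gives a clearer error message.
    """
    prefix = env["ANYSCALE_ARTIFACT_STORAGE"].rstrip("/") + "/"
    if not uri.startswith(prefix):
        raise PermissionError(
            f"{uri!r} is outside the scoped prefix {prefix!r}; "
            "the request would be rejected with ACCESS_DENIED"
        )
    return uri
```

Call it on every URI you construct dynamically; URIs built with the shell variable expansion shown above are safe by construction.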
Access private cloud storage
To access private cloud storage buckets that aren't managed by Anyscale, configure permissions using the patterns below.
AWS S3
- Create an IAM role with scoped-down read or write permissions to your existing bucket. See the Amazon IAM docs for details.
- Create a set of long-term credentials associated with that role. See the Amazon docs for details.
- Add the credentials as tracked environment variables under the "Dependencies" tab of a Workspace. Anyscale automatically propagates them to all Ray workloads run through the Workspace, including Jobs and Services.
- Then use the bucket directly in your Ray app code, for example:
ds = ray.data.read_parquet("s3://<your-bucket-name>/<path>")
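Because the read relies on the default AWS credential chain picking up the tracked environment variables, it can be worth checking they reached the process before launching a workload. A minimal sketch (`missing_aws_credentials` is an illustrative helper, not an Anyscale API):

```python
import os

REQUIRED_AWS_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")

def missing_aws_credentials(env=os.environ):
    """Return the names of any unset AWS credential variables.

    Tracked environment variables from the Workspace "Dependencies" tab
    should appear on every node; an empty result means the default
    credential chain (used by boto3 and pyarrow) can pick them up.
    """
    return [name for name in REQUIRED_AWS_VARS if not env.get(name)]
```

Raise or log if the returned list is non-empty, rather than letting the first `read_parquet` call fail deep inside the worker.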
Google Cloud Storage
- Create a service account with scoped-down read or write permissions to your existing bucket. See the Google Cloud docs for details.
- Create a set of service account credentials. See the Google Cloud docs for details.
- Export the service account credentials as a JSON key file, then copy it into the workspace working directory.
- Add the path to the credentials file as a tracked environment variable under the "Dependencies" tab of a Workspace, using the variable name GOOGLE_APPLICATION_CREDENTIALS (an example value is ./google-service-account.json). Anyscale automatically propagates it to all Ray workloads run through the Workspace, including Jobs and Services.
- Then use the bucket directly in your Ray app code, for example:
ds = ray.data.read_parquet("gcs://<your-bucket-name>/<path>")
Additional notes
- To use private cloud storage buckets concurrently with $ANYSCALE_ARTIFACT_STORAGE, pass the generated access keys or service account into the call to the cloud provider API directly, instead of setting them as process-wide globals.
- For large-scale workloads, compute and storage may need to be co-located to avoid large data egress costs. To run compute out of a particular region or provider, contact us.
- If your organization prevents you from creating long-term access keys, contact us.
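The per-call credential pattern from the first note can be sketched as follows. `scoped_s3_filesystem_kwargs` is an illustrative helper, and the commented pyarrow/Ray usage assumes those libraries are installed; the process-wide AWS_* variables stay untouched and keep serving $ANYSCALE_ARTIFACT_STORAGE.

```python
def scoped_s3_filesystem_kwargs(access_key, secret_key, region=None):
    """Keyword arguments for pyarrow.fs.S3FileSystem built from explicit
    credentials, so credentials for a private bucket never become
    process-wide globals."""
    kwargs = {"access_key": access_key, "secret_key": secret_key}
    if region:
        kwargs["region"] = region
    return kwargs

# Hypothetical usage (requires pyarrow and ray):
# from pyarrow import fs
# import ray
# private_fs = fs.S3FileSystem(**scoped_s3_filesystem_kwargs(key, secret))
# ds = ray.data.read_parquet("s3://<your-bucket-name>/<path>", filesystem=private_fs)
```

The same idea applies on GCP: construct a client from an explicit service-account file per call rather than exporting GOOGLE_APPLICATION_CREDENTIALS globally.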