Accessing a GCS Bucket
This page describes how to both directly interact with a GCS bucket from your Anyscale cluster (running on GCP) and how to configure runtime environments to work with GCS.
Determining the cluster's Service Account
By default, Anyscale clusters run with a cloud-specific Service Account (instructions are here).
If you followed the instructions for running with a custom Service Account, use that Service Account for the rest of this page.
Accessing Google Cloud Storage directly from an Anyscale cluster
To interact with a private Google Cloud Storage bucket, you need both permissions and tooling.
To grant your Service Account (either the Anyscale default Service Account or your own) access to a bucket, follow these instructions, which come from Google. (A command-line alternative is sketched after the list.)
- Go to the Permissions tab of the bucket.
- Click Add.
- Enter the Service Account email as a new principal. If you are using the Anyscale default cloud-specific Service Account, you can find the Service Account email in the Clouds table on the Configurations page, in the column called Provider Identity.
- Select roles to grant to the Service Account. To give full read/write/list access, grant Storage Object Admin and Storage Object Viewer on your bucket.
- Click Save.
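If you prefer to grant the roles from a terminal where gsutil is already authenticated as a bucket administrator, the following sketch does the same thing; SA_EMAIL and my-bucket are placeholders for your Provider Identity (or custom Service Account email) and bucket name:
# Grant the Service Account object admin and object viewer on the bucket.
gsutil iam ch \
  serviceAccount:SA_EMAIL:objectAdmin \
  serviceAccount:SA_EMAIL:objectViewer \
  gs://my-bucket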
Interacting with your bucket
To interact with a bucket from the CLI, install gsutil. Running the following command installs gsutil on a node:
wget -qO- https://dl.google.com/dl/cloudsdk/channels/rapid/downloads/google-cloud-sdk-359.0.0-linux-x86_64.tar.gz | tar xvz
Afterwards, you can use gsutil (for example, to copy a local file to the bucket) as follows:
./google-cloud-sdk/bin/gsutil cp <file> gs://<bucket>
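Copying in the other direction, or listing the bucket, works the same way. A small sketch, where <bucket> and <file> are placeholders:
# Download an object from the bucket into the current directory.
./google-cloud-sdk/bin/gsutil cp gs://<bucket>/<file> .
# List the bucket's contents.
./google-cloud-sdk/bin/gsutil ls gs://<bucket>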
If you install gsutil via pip (as is the case with runtime environments), you may need to add the following to ~/.boto:
[GoogleCompute]
service_account = default
You can create this file by running printf "[GoogleCompute]\nservice_account = default\n" > ~/.boto
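Putting those pieces together on a cluster node, a minimal sketch (with <bucket> as a placeholder) looks like this:
# Install gsutil via pip, point it at the node's default Service Account,
# and list the bucket to confirm access.
pip install gsutil
printf "[GoogleCompute]\nservice_account = default\n" > ~/.boto
gsutil ls gs://<bucket>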
Using a local directory with Anyscale Jobs and Services (on GCP)
With Anyscale Jobs and Services, you can set the working_dir option of the runtime_env to a local directory. Follow the instructions below to set up permissions for accessing your Google Cloud bucket.
Anyscale uploads your local directory to the specified remote storage, and the cluster downloads it before running the job. External storage allows clusters to be restarted with your working_dir after the initial submission.
Instructions
Set up your environment
- Make sure you have gcloud installed. On a Mac, you can run brew install --cask google-cloud-sdk to install it; otherwise, follow the instructions in the link.
- Authenticate your computer with Google to allow uploading to your GCS bucket by running gcloud auth application-default login in your local terminal. A browser opens and prompts you to sign in with your Google account. (You can verify the credentials with the check after this list.)
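As a quick sanity check (not part of the original steps), confirm that application-default credentials are in place:
# Prints an access token if the login step above saved credentials;
# fails with an error message otherwise.
gcloud auth application-default print-access-token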
Configure permissions
- Use an existing bucket on Google Cloud or create a new one. Your bucket can live in any Google Cloud project.
- Configure gcloud to use the same project as your bucket by running gcloud config set project <PROJECT_ID>.
- Follow the directions above to give your Anyscale cluster permission to access your GCS bucket. You can use the Provider Identity found in the Clouds table on the Configurations page, or your own Service Account. (You can verify the binding with the check after this list.)
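To double-check that the binding took effect, you can inspect the bucket's IAM policy from your terminal; my-bucket is a placeholder:
# Print the bucket's IAM policy; the Provider Identity should appear with
# roles/storage.objectAdmin and roles/storage.objectViewer.
gsutil iam get gs://my-bucket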
Run a job or service
- You can upload to your bucket by specifying an upload_path and a local working_dir in the runtime_env of your Anyscale Job. You can find the upload_path for your Google Storage bucket by navigating to Configuration and finding the row called gsutil URI. It should look something like gs://my-bucket. The runtime_env portion of your YAML should look similar to the following:
runtime_env:
working_dir: "."
upload_path: "gs://my-test-bucket"
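For context, a complete but hypothetical Job config using this runtime_env might look like the following; the name and entrypoint values are placeholders, not from this page, so adjust them to your own project:
# job.yaml -- hypothetical example; only the runtime_env keys come from this page.
name: gcs-working-dir-job
entrypoint: python my_script.py
runtime_env:
  working_dir: "."
  upload_path: "gs://my-test-bucket"
Assuming the Anyscale CLI is installed and authenticated, you would then submit it with anyscale job submit job.yaml.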