Accessing a GCS Bucket
This page describes how to interact directly with a GCS bucket from your Anyscale cluster (running on GCP) and how to configure runtime environments to work with GCS.
Accessing Google Cloud Storage Directly from an Anyscale Cluster
To interact with a private Google Cloud Storage Bucket you need both permissions and tooling.
To grant your Service Account (either the Anyscale default Service Account or your own) access to a bucket, follow these instructions (adapted from Google's documentation):
- Go to the "Permissions" tab of the bucket.
- Click "Add".
- Type the Service Account email as a "New principal". If you are using the Anyscale default cloud-specific Service Account, you can find the Service Account email in the Clouds table on the Configurations page, in the "Provider Identity" column.
- Select the roles to grant to the Service Account. To give full read/write/list access, grant the "Storage Object Admin" and "Storage Object Viewer" roles on your bucket.
- Click "Save".
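If you prefer the command line, the same grant can be sketched with gsutil's iam command (the service account email and bucket name below are placeholders; replace them with your own values):

```shell
# Grant the service account object-admin and object-viewer access to the bucket.
# The email and bucket name are placeholders.
gsutil iam ch \
  serviceAccount:my-sa@my-project.iam.gserviceaccount.com:roles/storage.objectAdmin \
  serviceAccount:my-sa@my-project.iam.gserviceaccount.com:roles/storage.objectViewer \
  gs://my-bucket
```

Both bindings can be added in a single `gsutil iam ch` invocation, as shown above.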
Interacting with your bucket
To interact with a bucket from the CLI, install gsutil. The following command installs gsutil on a node:
wget -qO- https://dl.google.com/dl/cloudsdk/channels/rapid/downloads/google-cloud-sdk-359.0.0-linux-x86_64.tar.gz | tar xvz
Afterwards, you can use gsutil (e.g. to copy a local file to the bucket) as follows:
./google-cloud-sdk/bin/gsutil cp <file> gs://<bucket>
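A few other common gsutil operations, assuming the same install location and a bucket your Service Account can access:

```shell
# List the contents of the bucket.
./google-cloud-sdk/bin/gsutil ls gs://<bucket>

# Copy a file from the bucket back to the local node.
./google-cloud-sdk/bin/gsutil cp gs://<bucket>/<file> .
```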
If you install pip (as is the case with runtime_environments), you may need to add the following to your ~/.boto file:
[GoogleCompute]
service_account = default
You can create this file by running:
printf "[GoogleCompute]\nservice_account = default\n" > ~/.boto
Using Local Directory with Anyscale Jobs and Services (on GCP)
With Anyscale Jobs and Services, you can set the working_dir option of the runtime_env to a local directory. Follow the instructions below to set up permissions for accessing your Google Cloud bucket.
Anyscale uploads your local directory to the specified remote storage, and the cluster downloads it before running the job. External storage allows clusters to be restarted with your working_dir after the initial submission.
Set up your environment
- Make sure you have gcloud installed. If you have a Mac, you can run brew install --cask google-cloud-sdk to install it; otherwise follow the instructions in the link.
- Authenticate your computer with Google to allow uploading to your GCS bucket by running gcloud auth application-default login in your local terminal. A browser window will open and prompt you to sign in with your Google account.
- Use an existing bucket on Google Cloud or create a new bucket. Your bucket can live in any Google Cloud project.
- Configure gcloud to use the same project as your bucket by running
gcloud config set project <PROJECT_ID>
- Follow the directions from above to give your Anyscale Cluster permission to access your GCS bucket. You can use the Provider Identity found in the Clouds table on the Configurations page or your own Service Account.
Run a Job or Service
- You can now upload to your bucket by specifying an upload_path and a local working_dir in the runtime_env of your Anyscale Job. You can find the upload_path for your Google Storage bucket by navigating to Configuration and finding the row called gsutil URI. It should look something like gs://my-bucket. The runtime_env portion of your yaml should look similar to below:
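A minimal sketch of the runtime_env section (the bucket name and directory below are placeholders; substitute your own gsutil URI and local path):

```yaml
runtime_env:
  # Local directory to upload with the job.
  working_dir: "."
  # Destination in your GCS bucket (placeholder value).
  upload_path: "gs://my-bucket/job-working-dir"
```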