
Exporting logs with Vector

Check your docs version

These docs are for the new Anyscale design. If you started using Anyscale before April 2024, use Version 1.0.0 of the docs. If you're transitioning to Anyscale Preview, see the guide for how to migrate.

Anyscale integrates with Vector, a monitoring tool that supports custom querying, filtering, and alerting for logs and metrics.

Step 0: Requirements

  • Vector is a tool for building observability pipelines. It enables you to collect, transform, and route data to and from a variety of sources. In this guide, we define Vector configuration files to scrape Ray logs and metrics and export them to your desired location.
  • SupervisorD is a process control system. In this guide, we use SupervisorD to run the Vector process when the cluster starts.

Step 1: Write a Vector configuration file

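A Vector configuration file defines a directed graph of components: sources that collect data, transforms that modify it, and sinks that send it to a destination. Each transform and sink lists the components it reads from in its inputs field.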
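To make the graph structure concrete, here is a minimal sketch that writes a three-component pipeline to a temporary file and lists its edges. The component names (demo_in, demo_tag, demo_out) and the file location are hypothetical and are not part of the Anyscale configuration; the validation step only runs if Vector is already installed.

```shell
# Write a minimal pipeline to a temporary .yaml file.
# Component names (demo_in, demo_tag, demo_out) are hypothetical.
DEMO_DIR="$(mktemp -d)"
DEMO_CONFIG="$DEMO_DIR/demo.yaml"
cat > "$DEMO_CONFIG" <<'EOF'
sources:
  demo_in:
    type: stdin
transforms:
  demo_tag:
    type: remap
    inputs: ["demo_in"]
    source: |-
      .origin = "demo"
sinks:
  demo_out:
    type: console
    inputs: ["demo_tag"]
    encoding:
      codec: json
EOF

# Each "inputs" entry is an edge in the graph: demo_in -> demo_tag -> demo_out.
grep -n 'inputs:' "$DEMO_CONFIG"

# If Vector is installed, ask it to check the configuration.
if command -v vector >/dev/null 2>&1; then
  vector validate "$DEMO_CONFIG"
fi
```

The sample configuration in this guide follows the same shape, with file and prometheus_scrape sources in place of stdin.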

We recommend using an Anyscale Workspace to write and test a configuration file. Open VS Code and create a vector.yaml file. Paste in the following sample configurations:

Source/transform configuration

vector.yaml
sources:
  raw_ray_logs:
    type: file
    fingerprint:
      ignored_header_bytes: 0
      strategy: device_and_inode
    include:
      - /tmp/ray/*/logs/**/job-driver-*.*
      - /tmp/ray/*/logs/**/runtime_env_setup-*.*
      - /tmp/ray/*/logs/**/worker-*.out
      - /tmp/ray/*/logs/**/worker-*.err
      - /tmp/ray/*/logs/**/serve/*.*
    exclude:
      # The session_latest directory is a symlink to an actual session directory,
      # so we intentionally exclude it here so Vector doesn't ingest duplicates.
      - /tmp/ray/session_latest/logs/**/*.*
  raw_ray_metrics:
    type: prometheus_scrape
    endpoints:
      - "${ANYSCALE_RAY_METRICS_ENDPOINT}"
    instance_tag: ScrapeTarget
    scrape_interval_secs: 15

# These transforms add useful attributes to your log files. To use other environment variables,
# see https://docs.anyscale.com/1.0.0/reference/environment-variables/ for all available options.
transforms:
  ray_logs:
    type: remap
    inputs: ["raw_ray_logs"]
    source: |-
      .cluster_id = "${ANYSCALE_CLUSTER_ID}"
      .instance_id = "${ANYSCALE_INSTANCE_ID}"
      .node_ip = "${ANYSCALE_NODE_IP}"
  ray_metrics:
    type: remap
    inputs: ["raw_ray_metrics"]
    source: |-
      .tags.cluster_id = "${ANYSCALE_CLUSTER_ID}"
      .tags.instance_id = "${ANYSCALE_INSTANCE_ID}"
      .tags.node_ip = "${ANYSCALE_NODE_IP}"
      .tags = compact(.tags, recursive: true)
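The exclude entry for session_latest is easier to see with a small experiment. The sketch below builds a throwaway directory tree (the paths and the session_2024 name are hypothetical) and shows that a wildcard over the session directory reaches the same log file through both the real directory and the symlink. It only illustrates the glob behavior in the shell; it does not run Vector.

```shell
# Build a throwaway tree that mimics /tmp/ray: one real session directory
# plus a session_latest symlink that points at it. Names are hypothetical.
demo_root="$(mktemp -d)"
mkdir -p "$demo_root/session_2024/logs"
echo "hello" > "$demo_root/session_2024/logs/job-driver-demo.log"
ln -s "$demo_root/session_2024" "$demo_root/session_latest"

# A wildcard over the session directory matches the file through both paths.
match_count=0
for path in "$demo_root"/*/logs/job-driver-*.*; do
  echo "matched: $path"
  match_count=$((match_count + 1))
done
echo "paths matched: $match_count"

# Both paths refer to the same underlying file (same device and inode).
if [ "$demo_root/session_2024/logs/job-driver-demo.log" -ef \
     "$demo_root/session_latest/logs/job-driver-demo.log" ]; then
  echo "both paths point at the same file"
fi
```

Excluding the symlinked path leaves one path per log file for the include patterns to pick up.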

Sink configuration

A sink is a destination for your observability data. Choose from AWS CloudWatch, GCP Cloud Monitoring, or Datadog and add it to your vector.yaml.

warning

This section applies only to customer-hosted clouds. If you use an Anyscale-hosted cloud and want to ship logs to CloudWatch, contact preview-help@anyscale.com for more information.

Writing to AWS CloudWatch requires additional permissions on the cluster's IAM role. You can add them in the AWS IAM Console using the policy below. Replace YOUR_ACCOUNT_ID with your AWS account ID.

IAM CloudWatch Policy
{
  "Statement": [
    {
      "Action": "cloudwatch:PutMetricData",
      "Effect": "Allow",
      "Resource": "*",
      "Sid": "CloudwatchMetricsWrite"
    },
    {
      "Action": ["logs:DescribeLogStreams", "logs:DescribeLogGroups"],
      "Effect": "Allow",
      "Resource": "*",
      "Sid": "CloudwatchLogsRead"
    },
    {
      "Action": "logs:PutLogEvents",
      "Effect": "Allow",
      "Resource": "arn:aws:logs:*:YOUR_ACCOUNT_ID:log-group:/anyscale*:*",
      "Sid": "CloudwatchLogsEventsWrite"
    },
    {
      "Action": ["logs:CreateLogStream", "logs:CreateLogGroup"],
      "Effect": "Allow",
      "Resource": "arn:aws:logs:*:YOUR_ACCOUNT_ID:log-group:/anyscale*",
      "Sid": "CloudwatchLogsWrite"
    }
  ],
  "Version": "2012-10-17"
}

Once the IAM Role has been updated, update vector.yaml to include a sink section as follows:

vector.yaml
sinks:
  cloudwatch_logs:
    region: us-west-2
    encoding:
      codec: json
    group_name: "/anyscale/"
    inputs: ["ray_logs"]
    # One of ANYSCALE_PRODJOB_ID / ANYSCALE_SERVICE_ID will be set for jobs / services.
    stream_name: "${ANYSCALE_PRODJOB_ID}${ANYSCALE_SERVICE_ID}/${ANYSCALE_SESSION_ID}"
    type: aws_cloudwatch_logs
  cloudwatch_metrics:
    region: us-west-2
    default_namespace: anyscale
    inputs: ["ray_metrics"]
    type: aws_cloudwatch_metrics
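The stream_name template concatenates the job ID and the service ID because only one of them is expected to be set on a given cluster. The sketch below reproduces that string composition in the shell with made-up IDs. It assumes the variable that is not set resolves to an empty string; how your Vector version treats unset variables may differ, so treat this as an illustration of the intended result rather than of Vector's interpolation rules.

```shell
# Hypothetical values for illustration; real IDs are set by Anyscale at runtime.
ANYSCALE_SESSION_ID="session_demo456"

# Case 1: a job. The service ID is empty (assumption: unset resolves to "").
ANYSCALE_PRODJOB_ID="prodjob_demo123"
ANYSCALE_SERVICE_ID=""
job_stream_name="${ANYSCALE_PRODJOB_ID}${ANYSCALE_SERVICE_ID}/${ANYSCALE_SESSION_ID}"
echo "job stream:     $job_stream_name"

# Case 2: a service. The job ID is empty instead.
ANYSCALE_PRODJOB_ID=""
ANYSCALE_SERVICE_ID="service_demo789"
service_stream_name="${ANYSCALE_PRODJOB_ID}${ANYSCALE_SERVICE_ID}/${ANYSCALE_SESSION_ID}"
echo "service stream: $service_stream_name"
```

In both cases the stream name has the form ID/SESSION_ID, so logs from each job or service session land in their own stream inside the /anyscale/ log group.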

Step 2: Test the configuration file

With the Vector configuration file saved as vector.yaml, run the following commands:

# Install Vector.
sudo apt-get install curl -y
curl --proto '=https' --tlsv1.2 -sSfL https://sh.vector.dev | bash
source /home/ray/.profile

# Create a state directory for Vector & make it accessible.
sudo mkdir -p /var/lib/vector/
sudo chmod 777 /var/lib/vector/

# Run Vector
vector --config vector.yaml

# In a new terminal tab, generate fake log content.
mkdir -p /tmp/ray/session_fake/logs/
for i in {1..5000}; do echo "Log Line $i" >> /tmp/ray/session_fake/logs/job-driver-fake.log && echo "Wrote line $i" && sleep 1; done

# Watch Vector's output for warnings and errors. If you don't see any, check your sink
# destination (for example, CloudWatch) to confirm that logs and metrics are arriving.
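Before starting Vector, it can help to check the configuration file first. The sketch below wraps Vector's validate subcommand in a small helper; the function name check_vector_config and its return codes are hypothetical, chosen only for this example.

```shell
# Hypothetical helper: validate a Vector config before starting the process.
# Returns 2 if the file is missing and 3 if the vector binary is not on PATH;
# otherwise returns whatever "vector validate" returns.
check_vector_config() {
  config_path="$1"
  if [ ! -f "$config_path" ]; then
    echo "config not found: $config_path" >&2
    return 2
  fi
  if ! command -v vector >/dev/null 2>&1; then
    echo "vector is not on PATH; did you source /home/ray/.profile?" >&2
    return 3
  fi
  vector validate "$config_path"
}

# Example usage from the directory that holds vector.yaml.
check_vector_config vector.yaml || echo "fix the issue above before running Vector"
```

If the check passes, start Vector with vector --config vector.yaml as shown in the commands for this step.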

Step 3: Move to production

To move to production, create a SupervisorD configuration file that starts the Vector process when the cluster starts and restarts it if it fails. Create a file named supervisord.conf in the same workspace you used previously.

supervisord.conf
[program:vector]
user=ray
command=bash --login -c -i "sudo -E /home/ray/.vector/bin/vector --config=/etc/vector/vector.yaml"
autostart=true
autorestart=true
startsecs=0
startretries=50
stdout_logfile=/tmp/ray/vector.log
redirect_stderr=true
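Once a cluster is running with this configuration, a quick way to confirm that Vector started is to ask SupervisorD for the program's status and read the log file named in stdout_logfile. The helper below is a sketch: the function name show_vector_status is hypothetical, and depending on how SupervisorD is set up on the node, supervisorctl may need sudo or an explicit configuration path.

```shell
# Hypothetical helper: report whether the "vector" program is running and
# show the tail of its log. The default log path matches stdout_logfile.
show_vector_status() {
  log_file="${1:-/tmp/ray/vector.log}"

  if command -v supervisorctl >/dev/null 2>&1; then
    # "vector" is the program name from the [program:vector] section.
    supervisorctl status vector || true
  else
    echo "supervisorctl not found on this machine"
  fi

  if [ -f "$log_file" ]; then
    tail -n 20 "$log_file"
  else
    echo "no log file at $log_file yet"
  fi
}

show_vector_status
```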

Then, follow the instructions below to package both of these configuration files into a Ray container image.

  1. On your laptop (or wherever you build your Docker image), change into the directory that contains your Dockerfile.
  2. Copy vector.yaml from your workspace into this directory.
  3. Copy supervisord.conf from your workspace into this directory.
  4. Add the following lines to your Dockerfile.
# Install Vector.
RUN curl --proto '=https' --tlsv1.2 -sSfL https://sh.vector.dev | bash -s -- -y

# Write the Vector config.
RUN sudo mkdir -p /etc/vector/
RUN sudo chmod 777 /etc/vector/
COPY vector.yaml /etc/vector/vector.yaml

# Write the SupervisorD config.
RUN sudo mkdir -p /etc/supervisor/customer.conf.d/
RUN sudo chmod 777 /etc/supervisor/customer.conf.d/
COPY supervisord.conf /etc/supervisor/customer.conf.d/vector.conf
  5. Build and push your Docker image, and start an Anyscale Job or Service using that image.
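Because the COPY instructions expect vector.yaml and supervisord.conf next to the Dockerfile, a short check of the build directory can catch a missing file before the build starts. The helper below is a sketch; the function name check_build_context is hypothetical.

```shell
# Hypothetical helper: confirm that the build directory holds the three files
# the Dockerfile steps above rely on. Returns 0 if all are present, 1 otherwise.
check_build_context() {
  context_dir="${1:-.}"
  missing=0
  for name in Dockerfile vector.yaml supervisord.conf; do
    if [ ! -f "$context_dir/$name" ]; then
      echo "missing: $context_dir/$name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example usage from the directory that contains your Dockerfile.
if check_build_context .; then
  echo "build context looks complete"
fi
```

If the check passes, build and push the image with your usual Docker workflow.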