Structured logging

Anyscale enables you to configure the Python logging library to output logs in a structured JSON format. This setup standardizes log entries, making them easier to parse, search, and filter.

Prerequisites

Use Anyscale Runtime version 2.30 or higher (version 2.32 or higher is recommended).

note

When using Anyscale runtime version 2.36 or higher, logs automatically output with JSON formatting and include additional metadata. If you have set up log exporting with Vector, make sure your transformations are compatible with JSON logs.
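For example, a transformation that previously treated each line as plain text now receives a JSON object per line. The following is a minimal Python sketch, not a Vector configuration, of handling a line that may or may not be JSON; field names such as levelname and message match the examples later on this page:

import json

def parse_log_line(line: str) -> dict:
    """Parse one exported log line, accepting both JSON and plain-text records."""
    try:
        # JSON-encoded records carry fields such as "levelname" and "message".
        return json.loads(line)
    except json.JSONDecodeError:
        # Fall back for plain-text (TEXT-encoded) lines.
        return {"message": line.rstrip("\n")}

print(parse_log_line('{"levelname": "INFO", "message": "A Ray task"}'))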

API (alpha)

Method 1: Configure structured logging with ray.init

ray.init(
    log_to_driver=False,
    logging_config=ray.LoggingConfig(encoding="JSON", log_level="INFO")
)

You can configure the following parameters:

  • encoding: The encoding format for the logs. The default is TEXT for plain text logs. The other option is JSON for structured logs. In both TEXT and JSON encoding formats, the logs include Ray-specific fields such as job_id, worker_id, node_id, actor_id, and task_id, if available.

  • log_level: The log level for the driver process. The default is INFO. Available log levels are defined in the Python logging library.

When you set up logging_config in ray.init, it configures the root loggers for the driver process, Ray actors, and Ray tasks.
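
For example, to keep the plain-text encoding but lower the threshold to DEBUG, a minimal sketch using the parameters described above:

import ray

# TEXT keeps plain-text formatting; DEBUG lowers the logging threshold.
# Ray-specific fields such as job_id and task_id are still attached when available.
ray.init(
    log_to_driver=False,
    logging_config=ray.LoggingConfig(encoding="TEXT", log_level="DEBUG"),
)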

note

The log_to_driver parameter is set to False to disable log redirection to the driver process, because the redirected logs include prefixes that make them unparsable as JSON.

Method 2: Configure structured logging with an environment variable (Anyscale Runtime 2.32 or higher)

You can configure the RAY_LOGGING_CONFIG_ENCODING environment variable to set the encoding format for the logs. You can set the value to TEXT or JSON. Note that you must set the environment variable before importing Ray.

import os
os.environ["RAY_LOGGING_CONFIG_ENCODING"] = "JSON"

import ray
import logging

ray.init(log_to_driver=False)

# Use the root logger to print log messages.
logger = logging.getLogger()
logger.info("This log message is emitted in JSON format.")

Example

The following example configures the LoggingConfig to output logs in a structured JSON format and sets the log level to INFO. It then logs messages with the root loggers in the driver process, Ray tasks, and Ray actors. The logs include Ray-specific fields such as job_id, worker_id, node_id, actor_id, and task_id.

import ray
import logging

ray.init(
    log_to_driver=False,
    logging_config=ray.LoggingConfig(encoding="JSON", log_level="INFO")
)

def init_logger():
    """Get the root logger"""
    return logging.getLogger()

logger = logging.getLogger()
logger.info("Driver process")

@ray.remote
def f():
    logger = init_logger()
    logger.info("A Ray task")

@ray.remote
class actor:
    def print_message(self):
        logger = init_logger()
        logger.info("A Ray actor")

task_obj_ref = f.remote()
ray.get(task_obj_ref)

actor_instance = actor.remote()
ray.get(actor_instance.print_message.remote())

"""
{"asctime": "2024-07-15 19:06:06,469", "levelname": "INFO", "message": "Driver process", "filename": "test.py", "lineno": 12, "job_id": "03000000", "worker_id": "03000000ffffffffffffffffffffffffffffffffffffffffffffffff", "node_id": "824f9d7c6a82a0faf42b91f07b42667df0831034a713f04f28ba84b9"}
(f pid=4871) {"asctime": "2024-07-15 19:06:07,435", "levelname": "INFO", "message": "A Ray task", "filename": "test.py", "lineno": 17, "job_id": "03000000", "worker_id": "f8f84d811683e5d9e03744a4386b26a5cd6f6ca09fc5cdc8e1dbe5a3", "node_id": "824f9d7c6a82a0faf42b91f07b42667df0831034a713f04f28ba84b9", "task_id": "fa31b89f94899135ffffffffffffffffffffffff03000000"}
(actor pid=4939) {"asctime": "2024-07-15 19:06:08,700", "levelname": "INFO", "message": "A Ray actor", "filename": "test.py", "lineno": 23, "job_id": "03000000", "worker_id": "51d62f87e3867cdcad9aecd7b431068ea433b3104c8cc4ed1db6eef7", "node_id": "824f9d7c6a82a0faf42b91f07b42667df0831034a713f04f28ba84b9", "actor_id": "4a03b12afe5598a00eadcf9503000000", "task_id": "0ab01f2d6283d7194a03b12afe5598a00eadcf9503000000"}
"""

You can also add extra fields to the log entries by using the extra parameter of the logger.info method.

import ray
import logging

ray.init(
    log_to_driver=False,
    logging_config=ray.LoggingConfig(encoding="JSON", log_level="INFO")
)

logger = logging.getLogger()
logger.info("Driver process with extra fields", extra={"username": "anyscale"})

# The log entry includes the extra field "username" with the value "anyscale".

# {"asctime": "2024-07-17 21:57:50,891", "levelname": "INFO", "message": "Driver process with extra fields", "filename": "test.py", "lineno": 9, "username": "anyscale", "job_id": "04000000", "worker_id": "04000000ffffffffffffffffffffffffffffffffffffffffffffffff", "node_id": "76cdbaa32b3938587dcfa278201b8cef2d20377c80ec2e92430737ae"}

Switch to the Logs tab in the Anyscale Workspace and select Show details for the log entry with the message "A Ray actor". You can then see the structured log entry in JSON format:

{
  "timestamp": "2024-07-15T19:06:09.326Z",
  "timestampNs": 1721070369326343200,
  "payload": {
    "actor_id": "4a03b12afe5598a00eadcf9503000000",
    "job_id": "03000000",
    "levelname": "INFO",
    "message": "A Ray actor",
    "node_id": "824f9d7c6a82a0faf42b91f07b42667df0831034a713f04f28ba84b9",
    "task_id": "0ab01f2d6283d7194a03b12afe5598a00eadcf9503000000",
    "worker_id": "51d62f87e3867cdcad9aecd7b431068ea433b3104c8cc4ed1db6eef7",
    ...
  }
}

Log to driver

By default, Ray worker logs are redirected to the driver process (see Redirecting Worker logs to the Driver). However, this redirection isn't scalable and can cause performance issues when running on large clusters. It also duplicates log lines across the driver and worker logs. Starting with Ray 2.36.0, you can set the RAY_LOG_TO_STDERR environment variable to disable this redirection for the entire node: set RAY_LOG_TO_STDERR=1. Alternatively, pass log_to_driver=False in your ray.init() call to disable it for a single driver.
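
A minimal sketch of both options described above (in practice, RAY_LOG_TO_STDERR is typically set in the cluster or job environment rather than in the driver script):

import os

# Option 1 (node-wide, Ray 2.36.0+): disable redirection for the entire node.
# Shown here for illustration; set this in the cluster's environment variables
# so it applies before Ray processes start.
os.environ["RAY_LOG_TO_STDERR"] = "1"

import ray

# Option 2 (per driver): disable redirection for this driver only.
ray.init(log_to_driver=False)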

For clouds that have log ingestion enabled through the anyscale cloud config update CLI, Anyscale automatically sets RAY_LOG_TO_STDERR to disable log redirection.