Structured logging
Anyscale lets you configure the Python logging library to output logs in a structured JSON format. This standardizes log entries, making them easier to parse, search, and filter.
Prerequisites
Use Anyscale Runtime version 2.30 or higher (version 2.32 or higher is recommended).
When using Anyscale Runtime version 2.36 or higher, logs are automatically output in JSON format and include additional metadata. If you export logs with Vector, make sure your transformations are compatible with JSON logs.
API (alpha)
Method 1: Configure structured logging with ray.init
ray.init(
    log_to_driver=False,
    logging_config=ray.LoggingConfig(encoding="JSON", log_level="INFO")
)
Users can configure the following parameters:
- encoding: The encoding format for the logs. The default is TEXT for plain text logs. The other option is JSON for structured logs. In both TEXT and JSON encoding formats, the logs include Ray-specific fields such as job_id, worker_id, node_id, actor_id, and task_id, if available.
- log_level: The log level for the driver process. The default is INFO. Available log levels are defined in the Python logging library.
When you set up logging_config in ray.init, it configures the root loggers for the driver process, Ray actors, and Ray tasks.
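To make the idea of a JSON-encoded root logger concrete, the following sketch uses only the Python standard library. It is not how Ray implements LoggingConfig. The JsonFormatter class, the configure_root_logger helper, and the chosen field names are assumptions picked to resemble the output shown on this page, and the Ray-specific fields such as job_id are left out.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Hypothetical formatter: renders each record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "asctime": self.formatTime(record),
            "levelname": record.levelname,
            "message": record.getMessage(),
            "filename": record.filename,
            "lineno": record.lineno,
        }
        return json.dumps(entry)


def configure_root_logger(stream, level: str = "INFO") -> logging.Logger:
    """Attach a JSON handler to the root logger and set its level."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    root = logging.getLogger()
    root.addHandler(handler)
    root.setLevel(level)
    return root


# Write to an in-memory buffer so the entry can be parsed back.
buffer = io.StringIO()
root_logger = configure_root_logger(buffer)
root_logger.info("Driver process")

parsed = json.loads(buffer.getvalue())
print(parsed["levelname"], parsed["message"])
```

Because the handler sits on the root logger, any logger obtained with logging.getLogger() in the same process inherits the JSON format, which mirrors the behavior described above for a single process.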
The log_to_driver parameter is set to False to stop worker logs from being redirected to the driver process. Redirected logs include prefixes, which make the log lines impossible to parse as JSON.
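The effect of the prefix can be reproduced without Ray. The sketch below uses sample lines modeled on the example output on this page, with the fields shortened. The PREFIX pattern and the parse_log_line helper are assumptions for illustration, not part of Ray or Anyscale.

```python
import json
import re

# Sample lines modeled on the example output on this page (fields shortened).
clean_line = '{"levelname": "INFO", "message": "A Ray task", "job_id": "03000000"}'
prefixed_line = "(f pid=4871) " + clean_line

# Assumed prefix shape: "(<name> pid=<digits>...) " at the start of the line.
PREFIX = re.compile(r"^\([^)]*pid=\d+[^)]*\)\s*")


def parse_log_line(line: str) -> dict:
    """Parse a JSON log line, tolerating a leading redirection prefix."""
    return json.loads(PREFIX.sub("", line, count=1))


# A prefixed line is not valid JSON on its own.
try:
    json.loads(prefixed_line)
    prefixed_is_valid_json = True
except json.JSONDecodeError:
    prefixed_is_valid_json = False

print(prefixed_is_valid_json)
print(parse_log_line(prefixed_line)["message"])
```

Stripping prefixes after the fact is a workaround. Disabling the redirection with log_to_driver=False avoids the problem at the source.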
Method 2: Configure structured logging with an environment variable (Anyscale Runtime 2.32 or higher)
Set the RAY_LOGGING_CONFIG_ENCODING environment variable to choose the encoding format for the logs. Valid values are TEXT and JSON.
The environment variable must be set before import ray.
import os
os.environ["RAY_LOGGING_CONFIG_ENCODING"] = "JSON"
import ray
import logging
ray.init(log_to_driver=False)
# Use the root logger to print log messages.
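Because the ordering matters, a small guard can catch the case where the variable is set too late. The following is a minimal sketch. The set_ray_log_encoding helper is hypothetical, and it only checks whether the ray module has already been imported in the current process.

```python
import os
import sys


def set_ray_log_encoding(encoding: str = "JSON") -> None:
    """Hypothetical helper: set the encoding variable, refusing if Ray is already imported."""
    if encoding not in ("TEXT", "JSON"):
        raise ValueError(f"encoding must be TEXT or JSON, got {encoding!r}")
    if "ray" in sys.modules:
        # Setting the variable now would come after `import ray`.
        raise RuntimeError("Set RAY_LOGGING_CONFIG_ENCODING before importing ray")
    os.environ["RAY_LOGGING_CONFIG_ENCODING"] = encoding


set_ray_log_encoding("JSON")
print(os.environ["RAY_LOGGING_CONFIG_ENCODING"])
```

Call the helper at the top of the entry-point script, before any module that imports ray is loaded.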
Example
The following example configures the LoggingConfig to output logs in a structured JSON format and sets the log level to INFO.
It then logs messages with the root loggers in the driver process, Ray tasks, and Ray actors.
The logs include Ray-specific fields such as job_id, worker_id, node_id, actor_id, and task_id.
import ray
import logging

ray.init(
    log_to_driver=False,
    logging_config=ray.LoggingConfig(encoding="JSON", log_level="INFO")
)

def init_logger():
    """Get the root logger"""
    return logging.getLogger()

logger = logging.getLogger()
logger.info("Driver process")

@ray.remote
def f():
    logger = init_logger()
    logger.info("A Ray task")

@ray.remote
class actor:
    def print_message(self):
        logger = init_logger()
        logger.info("A Ray actor")

task_obj_ref = f.remote()
ray.get(task_obj_ref)
actor_instance = actor.remote()
ray.get(actor_instance.print_message.remote())

"""
{"asctime": "2024-07-15 19:06:06,469", "levelname": "INFO", "message": "Driver process", "filename": "test.py", "lineno": 12, "job_id": "03000000", "worker_id": "03000000ffffffffffffffffffffffffffffffffffffffffffffffff", "node_id": "824f9d7c6a82a0faf42b91f07b42667df0831034a713f04f28ba84b9"}
(f pid=4871) {"asctime": "2024-07-15 19:06:07,435", "levelname": "INFO", "message": "A Ray task", "filename": "test.py", "lineno": 17, "job_id": "03000000", "worker_id": "f8f84d811683e5d9e03744a4386b26a5cd6f6ca09fc5cdc8e1dbe5a3", "node_id": "824f9d7c6a82a0faf42b91f07b42667df0831034a713f04f28ba84b9", "task_id": "fa31b89f94899135ffffffffffffffffffffffff03000000"}
(actor pid=4939) {"asctime": "2024-07-15 19:06:08,700", "levelname": "INFO", "message": "A Ray actor", "filename": "test.py", "lineno": 23, "job_id": "03000000", "worker_id": "51d62f87e3867cdcad9aecd7b431068ea433b3104c8cc4ed1db6eef7", "node_id": "824f9d7c6a82a0faf42b91f07b42667df0831034a713f04f28ba84b9", "actor_id": "4a03b12afe5598a00eadcf9503000000", "task_id": "0ab01f2d6283d7194a03b12afe5598a00eadcf9503000000"}
"""
You can also add extra fields to the log entries by using the extra parameter in the logger.info method.
import ray
import logging
ray.init(
    log_to_driver=False,
    logging_config=ray.LoggingConfig(encoding="JSON", log_level="INFO")
)
logger = logging.getLogger()
logger.info("Driver process with extra fields", extra={"username": "anyscale"})
# The log entry includes the extra field "username" with the value "anyscale".
# {"asctime": "2024-07-17 21:57:50,891", "levelname": "INFO", "message": "Driver process with extra fields", "filename": "test.py", "lineno": 9, "username": "anyscale", "job_id": "04000000", "worker_id": "04000000ffffffffffffffffffffffffffffffffffffffffffffffff", "node_id": "76cdbaa32b3938587dcfa278201b8cef2d20377c80ec2e92430737ae"}
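The extra parameter is a feature of the Python logging library itself: each key becomes an attribute on the LogRecord, which a formatter can then serialize. The sketch below shows this with the standard library only. The CaptureHandler class and the logger name extra_fields_demo are hypothetical and exist just to inspect the record.

```python
import logging


class CaptureHandler(logging.Handler):
    """Hypothetical handler that keeps records in memory for inspection."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)


# Use a dedicated logger so the demo does not touch the root logger.
demo_logger = logging.getLogger("extra_fields_demo")
demo_logger.setLevel(logging.INFO)
demo_logger.propagate = False
capture = CaptureHandler()
demo_logger.addHandler(capture)

demo_logger.info("Driver process with extra fields", extra={"username": "anyscale"})

# Each key in `extra` is now an attribute of the record.
record = capture.records[0]
print(record.username)
print(record.getMessage())
```

Keys in extra must not clash with built-in record attributes such as message or asctime. The logging library raises a KeyError in that case.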
To inspect an entry from the first example, switch to the Logs tab in the Anyscale Workspace and select Show details for the log with the message "A Ray actor". The structured log entry appears in JSON format:
{
  "timestamp": "2024-07-15T19:06:09.326Z",
  "timestampNs": 1721070369326343200,
  "payload": {
    "actor_id": "4a03b12afe5598a00eadcf9503000000",
    "job_id": "03000000",
    "levelname": "INFO",
    "message": "A Ray actor",
    "node_id": "824f9d7c6a82a0faf42b91f07b42667df0831034a713f04f28ba84b9",
    "task_id": "0ab01f2d6283d7194a03b12afe5598a00eadcf9503000000",
    "worker_id": "51d62f87e3867cdcad9aecd7b431068ea433b3104c8cc4ed1db6eef7",
    ...
  }
}
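Entries in this shape are straightforward to filter in code. The following sketch assumes an entry has been saved as JSON text and includes only a subset of the payload fields shown above. The matches helper is hypothetical and is not an Anyscale API.

```python
import json

# Entry modeled on the example on this page; only a subset of payload fields is included.
entry_json = """
{
  "timestamp": "2024-07-15T19:06:09.326Z",
  "payload": {
    "actor_id": "4a03b12afe5598a00eadcf9503000000",
    "job_id": "03000000",
    "levelname": "INFO",
    "message": "A Ray actor"
  }
}
"""


def matches(entry: dict, **fields: str) -> bool:
    """Return True if every given payload field equals the expected value."""
    payload = entry.get("payload", {})
    return all(payload.get(key) == value for key, value in fields.items())


entry = json.loads(entry_json)
print(matches(entry, job_id="03000000", levelname="INFO"))
print(matches(entry, levelname="ERROR"))
```

A field that is absent from the payload never matches, so filtering on task_id against this trimmed sample returns False.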
Log to driver
By default, Ray worker logs are redirected to the driver process (see Redirecting Worker logs to the Driver).
However, this redirection doesn't scale and can cause performance issues on large clusters. It also duplicates logs, which then appear in both the driver and the worker logs.
Since Ray 2.36.0, you can use the RAY_LOG_TO_STDERR environment variable to disable the redirection for the entire node by setting RAY_LOG_TO_STDERR=1. Alternatively, pass log_to_driver=False in your ray.init() call.
For clouds that have enabled log ingestion through the anyscale cloud config update CLI command, Anyscale automatically sets RAY_LOG_TO_STDERR to disable the log redirection.