
Ecosystem Integrations

Data scientists and engineers have an array of third-party tools and libraries that help in their daily work. This page covers integrating some of the most common ones: Weights & Biases, MLflow, and Datadog.

Each of these tools has a different pattern for integrating with Anyscale and Ray. The details below should assist you in developing your own integrations.

Fundamentally there are two integration types:

  • Code-level Integrations in which you'll integrate with a particular tool by modifying your code
  • Service-level Integrations in which you'll integrate with a particular tool by setting some configuration that will automatically log information.

Code-level integrations

Most of the code that data scientists and ML engineers use comes from third-party libraries that are imported and leveraged from within the Python application. Many integrations with third-party tools are no different. With an API token in hand, all it takes for most integrations is to:

  • set your token in a runtime environment variable
  • include the third party integration as a dependency
  • use logging statements or other integrations (see the sketch after this list)
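
A hypothetical code-level integration following those three steps might look like the sketch below. The package name (some_tracking_lib) and the token variable (TRACKING_API_KEY) are placeholders rather than a real library; substitute the values your tool's documentation specifies.

import ray

ray.init(
    "anyscale://my_cluster",
    runtime_env={
        # include the third-party integration as a dependency (placeholder package name)
        "pip": ["some_tracking_lib"],
        # set your token in a runtime environment variable (placeholder variable name)
        "env_vars": {"TRACKING_API_KEY": "YOUR_API_KEY"},
        "working_dir": ".",
    },
)

# Inside your tasks or trainables, call the library's logging functions as usual.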

Weights and Biases

Weights and Biases is a suite of experiment tracking tools for machine learning practitioners, and its integration with Anyscale is at the code level. Although wandb code can be called directly from your Anyscale cluster, the W&B integration with Ray is treated as a first-class integration in Anyscale with additional features.

Ray currently provides two lightweight integrations with W&B:

  • WandbLoggerCallback is used to automatically log metrics reported to Ray Tune or Train to W&B
  • setup_wandb initializes a W&B session in a Ray Tune or Train method so wandb.log() can be used as normal to log metrics (see the sketch after this list)
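
Here is a minimal sketch of the setup_wandb pattern inside a Ray Tune trainable. The project name is a placeholder, and the keyword arguments passed through to wandb.init may differ depending on your Ray version.

from ray import tune
from ray.air.integrations.wandb import setup_wandb

def train_fn(config):
    # Initialize a W&B session for this trial ("my-project" is a placeholder).
    wandb_run = setup_wandb(config, project="my-project")
    for i in range(10):
        loss = config["a"] + config["b"]
        wandb_run.log({"loss": loss})  # standard wandb-style logging
        tune.report(loss=loss)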

To simplify the experience of using the Ray integration with W&B in Anyscale, the following features are added to the Anyscale integration with W&B. These all apply automatically if either WandbLoggerCallback or setup_wandb is used in Ray application code running in a workspace, Anyscale production job, or Ray job running on Anyscale.

  • Default values for the W&B project and group are populated based on the type of Anyscale execution (Ray job, production job, workspace) that is used, if these fields are not already set through code or environment variables (see the sketch after this list). This allows for an easy conceptual mapping and organization of W&B runs in the context of the Anyscale execution.
  • Link from the Anyscale execution detail page to the W&B resource, for easier navigation between Anyscale execution details and W&B metrics (Integration in Anyscale UI).
  • Link back to the Anyscale execution detail page from the W&B run config. This is especially useful for viewing logs related to your W&B run in a single file/viewer, even if the execution occurred over multiple nodes across machines (Integration in W&B UI).
  • If the Anyscale cloud has access to read from the cloud provider's secrets manager, support for fetching the W&B API key from the secrets manager based on the user and cloud of the code execution.
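
As an illustration of the first item, the default project and group can be overridden by passing them explicitly to the callback; the names below are placeholders.

from ray.air.integrations.wandb import WandbLoggerCallback

# Override the Anyscale-populated defaults; project and group names are placeholders.
callback = WandbLoggerCallback(project="my-project", group="my-group")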

Example with Ray Job run on Anyscale

import ray
import os
from ray import tune
from ray.air.integrations.wandb import WandbLoggerCallback
import wandb

def train_fn(config):
    for i in range(10):
        loss = config["a"] + config["b"]
        tune.report(loss=loss, done=True)

tune.run(
    train_fn,
    config={
        # define search space here
        "a": tune.choice([1, 2, 3]),
        "b": tune.choice([4, 5, 6]),
    },
    callbacks=[WandbLoggerCallback()],
)

In an Anyscale cluster/workspace that has wandb installed, run wandb login and paste in your API key. Then submit the above code as a Ray job:

RAY_ADDRESS=anyscale://my_cluster ray job submit --working-dir . -- python wandb_demo.py

Methods of specifying API key

The W&B API key can be specified through the following methods for code running in workspaces, Anyscale jobs, and Ray jobs running on Anyscale:

  • Running wandb login on the cluster before executing code. This only works if WandbLoggerCallback is being used.
  • Specifying WANDB_API_KEY in the environment where code is being executed.
  • Providing the API key as an argument to WandbLoggerCallback or setup_wandb
  • Providing an API key file as an argument to WandbLoggerCallback or setup_wandb. If using setup_wandb, make sure the API key file is synced to all worker nodes (e.g., through runtime environments). See the sketch after this list.
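
As an illustration of the last two options, the sketch below passes the key (or a key file) directly to the callback. The key value and file path are placeholders; setup_wandb accepts the same api_key and api_key_file arguments in the open source integration.

import os

from ray.air.integrations.wandb import WandbLoggerCallback

# Read the key from the environment where the code runs, then pass it explicitly.
callback = WandbLoggerCallback(api_key=os.environ.get("WANDB_API_KEY"))

# Or point the callback at a key file; the path is a placeholder.
# callback = WandbLoggerCallback(api_key_file="/path/to/wandb_api_key.txt")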

API key in Secrets Manager

A more secure approach to specifying the API key is to store it in the secrets manager of the cloud provider associated with your Anyscale cloud, following our naming convention.

Alternatively, WANDB_API_KEY_NAME can be specified as an environment variable to use a different naming convention. If the API key is not provided through any of the methods above and the cluster has access to the secrets manager, the W&B integration on Anyscale automatically fetches the appropriate secret for the current user and cloud, per our naming convention or the value of WANDB_API_KEY_NAME.
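
For example, a custom secret name could be supplied through a runtime environment when starting the application; the secret name below is a placeholder for whatever you created in your cloud provider's secrets manager.

import ray

ray.init(
    "anyscale://my_cluster",
    runtime_env={
        # Placeholder secret name; use the name of the secret you created.
        "env_vars": {"WANDB_API_KEY_NAME": "my-wandb-api-key-secret"},
    },
)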

Please work with your Anyscale Admin and Solution Architect to set up the correct policies to allow your cluster to access the secrets manager.

Environment Dependencies: Any version of Ray compatible with the open source integration and any Anyscale version can be used when running the open source integration on an Anyscale cluster. However, to get the additional features specific to the Anyscale integration, please use nightly Ray and anyscale>=0.5.74 in your Anyscale cluster. The correct dependencies are installed in the Anyscale Ray nightly base images (e.g., anyscale/ray:nightly-py310).

MLflow and Anyscale

MLflow provides management of machine learning models, experiment metrics, and logs. Including calls to MLflow in your code works much like the Weights and Biases integration.

In order for the MLflow client library to log metrics and register models with MLflow, provide one or more environment variables to Anyscale.

If you have created your own MLflow server in your cloud account, then you can configure your Anyscale applications to track to it. Here's a ray.init() call that initializes an environment for tracking to MLflow:

ray.init("anyscale://integrations",
runtime_env={"pip":["mlflow"],
"env_vars":{"MLFLOW_TRACKING_URI":'YOUR_MLFLOW_TRACKING_URI'},
"excludes":["tests", "yello*"],
"working_dir"="."})

MLflow Hosted by Databricks

If you have a Databricks account, then include the hostname, token, and experiment name from Databricks, and MLflow will log to your Databricks instance. For example:

ray.init(
    "anyscale://integrations",
    project_dir=".",
    runtime_env={
        "pip": ["mlflow"],
        "env_vars": {
            "MLFLOW_TRACKING_URI": "YOUR_MLFLOW_TRACKING_URI",
            "DATABRICKS_HOST": "http://databricks....",
            "DATABRICKS_TOKEN": "YOURDATABRICKSTOKEN",
            "MLFLOW_EXPERIMENT_NAME": "/Users/xxx@yyy.com/first-experiment",
        },
        "excludes": ["tests", "yello*"],
        "working_dir": ".",
    },
)

Here's an example of a task that logs some parameters and metrics to Databricks's MLflow:

import mlflow
import ray

@ray.remote
def logging_task():
    with mlflow.start_run():
        alpha = "ALPHA"
        l1_ratio = "L1"
        rmse = 0.211
        r2 = 0.122
        mae = 30
        mlflow.log_param("alpha", alpha)
        mlflow.log_param("l1_ratio", l1_ratio)
        mlflow.log_metric("rmse", rmse)
        mlflow.log_metric("r2", r2)
        mlflow.log_metric("mae", mae)
    return "Done"

print(ray.get(logging_task.remote()))

Service-level Integrations

Datadog

Datadog is a popular platform for general application monitoring and analytics.

To use Datadog, the image backing your cluster nodes must have the Datadog agent installed. Fortunately, Datadog provides a very stable installation method: copy the recommended installation command into the "post-build commands" of a Cluster Environment, and then use that environment when launching clusters.

Integration using an Agent

Add this to the "Debian" section of your Cluster Environment:

curl

And add this to your "post-build commands," substituting your API key:

DD_AGENT_MAJOR_VERSION=7 DD_INSTALL_ONLY=true DD_API_KEY={YOUR_API_KEY_HERE} DD_SITE="datadoghq.com" bash -c "$(curl -L https://s3.amazonaws.com/dd-agent/scripts/install_script.sh)"
echo "sudo service datadog-agent start" >> ~/.bashrc

The first line ensures that the Datadog Agent is available on each node that Ray provisions. The second line appends a command that starts the agent to the .bashrc file, which runs when the cluster launches.

Once the agent is installed and running, and depending on your Datadog plan, you'll see system metrics and logs flowing to Datadog from servers when they are running.