
Read and Write Data from Snowflake

This version of the Anyscale docs is deprecated. Go to the latest version for up-to-date information.

Ray Datasets are the standard way to load and exchange data in Ray applications. This guide shows you how to:

  1. Configure the credentials required to connect to Snowflake
  2. Read data from a Snowflake table into a Ray Dataset
  3. Write data from a Ray Dataset to a Snowflake table

Who can use this feature

The Ray Snowflake APIs are only available to Anyscale customers.

If you want to access this feature, contact the Anyscale team.

Before you begin

The Ray Snowflake APIs depend on the Snowflake Connector for Python. To install it, specify the following PyPI package in your cluster environment:

snowflake-connector-python

To learn more about installing dependencies into your environment, read the Anyscale environment documentation.
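
To confirm that the connector is actually installed in the environment your code runs in, you can look up the package metadata. This is a minimal sketch using only the Python standard library; connector_version is a hypothetical helper name, not part of any Anyscale or Snowflake API.

```python
from importlib import metadata
from typing import Optional


def connector_version() -> Optional[str]:
    """Return the installed snowflake-connector-python version, or None if it is missing."""
    try:
        # Look up the distribution by its PyPI name, not its import name (snowflake.connector).
        return metadata.version("snowflake-connector-python")
    except metadata.PackageNotFoundError:
        return None
```

A None result on a cluster node suggests the package isn't in that node's environment yet.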

Connecting to Snowflake

To connect to Snowflake, create a SnowflakeDatasource and specify:

  • user: The username you use to log into Snowflake.
  • password: The password you use to log into Snowflake.
  • account: Your account identifier, in the form <orgname>-<account-name> (your organization name and account name joined by a hyphen).

from ray_extensions.data import SnowflakeDatasource

datasource = SnowflakeDatasource(
    user=...,
    account="ABCDEFG-ABC12345",
    password=...,
)
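
Rather than hard-coding a password in source, you might read the three values from environment variables. The sketch below assumes hypothetical variable names (SNOWFLAKE_USER, SNOWFLAKE_PASSWORD, SNOWFLAKE_ACCOUNT) that SnowflakeDatasource does not read on its own, and it applies only a loose sanity check to the <orgname>-<account-name> shape of the account identifier, not Snowflake's complete naming rules.

```python
import os
import re
from typing import Mapping, Optional

# Loose check for <orgname>-<account-name>: a letter, then letters/digits, one hyphen,
# then a letter followed by letters/digits/underscores. Not Snowflake's full validation.
_ACCOUNT_PATTERN = re.compile(r"^[A-Za-z][A-Za-z0-9]*-[A-Za-z][A-Za-z0-9_]*$")

# Hypothetical environment variable names chosen for this example.
_REQUIRED = ("SNOWFLAKE_USER", "SNOWFLAKE_PASSWORD", "SNOWFLAKE_ACCOUNT")


def load_snowflake_credentials(env: Optional[Mapping[str, str]] = None) -> dict:
    """Build user/password/account keyword arguments from environment variables."""
    env = os.environ if env is None else env
    missing = [name for name in _REQUIRED if not env.get(name)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")

    account = env["SNOWFLAKE_ACCOUNT"]
    if not _ACCOUNT_PATTERN.match(account):
        raise ValueError(f"Account {account!r} doesn't look like <orgname>-<account-name>")

    return {
        "user": env["SNOWFLAKE_USER"],
        "password": env["SNOWFLAKE_PASSWORD"],
        "account": account,
    }
```

Because the returned keys match the user, password, and account parameters shown above, you could then write SnowflakeDatasource(**load_snowflake_credentials()).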

To learn more about configuring Snowflake credentials, read Connecting to Snowflake.
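
If reads or writes fail with authentication errors, it can help to test the same credentials directly with the Snowflake Connector for Python, independent of Ray. This sketch uses the connector's snowflake.connector.connect function and a trivial SELECT CURRENT_VERSION() query; check_snowflake_connection is a hypothetical helper, and its connect parameter exists only so the function can be exercised without a live account.

```python
from typing import Callable, Optional


def check_snowflake_connection(
    user: str,
    password: str,
    account: str,
    connect: Optional[Callable] = None,
) -> str:
    """Open a connection, run a trivial query, and return the Snowflake version string."""
    if connect is None:
        # Imported lazily so this module loads even where the connector isn't installed.
        import snowflake.connector

        connect = snowflake.connector.connect

    conn = connect(user=user, password=password, account=account)
    try:
        cur = conn.cursor()
        try:
            cur.execute("SELECT CURRENT_VERSION()")
            (version,) = cur.fetchone()
            return version
        finally:
            cur.close()
    finally:
        # Always release the connection, even if the query fails.
        conn.close()
```

With valid credentials, check_snowflake_connection(user=..., password=..., account="ABCDEFG-ABC12345") returns a version string; otherwise the connector raises its own error, which usually points at the misconfigured value.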

Reading data from Snowflake

To read a Snowflake table into a Ray Dataset, pass a SnowflakeDatasource to ray.data.read_datasource, along with the SQL query to run in the sql parameter.

import ray
from ray_extensions.data import SnowflakeDatasource

datasource = SnowflakeDatasource(
    user=...,
    account="ABCDEFG-ABC12345",
    password=...,
)
ds = ray.data.read_datasource(
    datasource,
    sql="SELECT * FROM SNOWFLAKE_SAMPLE_DATA.TPCDS_SF100TCL.CUSTOMER",
)
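
Because the sql parameter takes a query string, one way to keep an exploratory read small is to select only the columns you need and add a LIMIT, assuming the datasource passes the query through to Snowflake unchanged. The helper below, build_select, is a hypothetical convenience for composing such a query; it inserts identifiers verbatim without quoting or escaping, so use it only with trusted table and column names, never user input.

```python
from typing import Optional, Sequence


def build_select(
    table: str,
    columns: Optional[Sequence[str]] = None,
    limit: Optional[int] = None,
) -> str:
    """Compose a SELECT statement; identifiers are inserted verbatim (no escaping)."""
    column_list = ", ".join(columns) if columns else "*"
    query = f"SELECT {column_list} FROM {table}"
    if limit is not None:
        if limit < 1:
            raise ValueError("limit must be a positive integer")
        query += f" LIMIT {int(limit)}"
    return query


# Columns from the TPC-DS CUSTOMER table used in the example above.
sql = build_select(
    "SNOWFLAKE_SAMPLE_DATA.TPCDS_SF100TCL.CUSTOMER",
    columns=["C_CUSTOMER_ID", "C_FIRST_NAME", "C_LAST_NAME"],
    limit=1000,
)
```

You would then pass the result as ray.data.read_datasource(datasource, sql=sql), exactly as in the example above.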

Writing data to Snowflake

To write a Ray Dataset to a Snowflake table, call the dataset's write_datasource method with a SnowflakeDatasource, passing the fully qualified name of the target table in the table parameter.

import ray
from ray_extensions.data import SnowflakeDatasource

datasource = SnowflakeDatasource(
    user=...,
    account="ABCDEFG-ABC12345",
    password=...,
)
dataset = ray.data.from_items([
    {"title": "Monty Python and the Holy Grail", "year": 1975, "score": 8.2},
    {"title": "And Now for Something Completely Different", "year": 1971, "score": 7.5},
])
dataset.write_datasource(datasource, table="MY_DATABASE.MY_SCHEMA.MOVIES")
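
Each dictionary passed to ray.data.from_items becomes a row, and its keys become column names. Before writing, a quick consistency check can catch rows whose keys differ from the first row's, so every row lines up with the same set of table columns. The helper below, check_consistent_columns, is a hypothetical sketch; how SnowflakeDatasource maps these columns onto the target table (or whether it creates the table) isn't covered here.

```python
from typing import Any, Dict, List, Sequence


def check_consistent_columns(rows: Sequence[Dict[str, Any]]) -> List[str]:
    """Return the shared column names, or raise if any row's keys differ from the first row's."""
    if not rows:
        raise ValueError("No rows to write")
    expected = list(rows[0].keys())
    for index, row in enumerate(rows[1:], start=1):
        if set(row) != set(expected):
            raise ValueError(
                f"Row {index} has columns {sorted(row)}, expected {sorted(expected)}"
            )
    return expected


movies = [
    {"title": "Monty Python and the Holy Grail", "year": 1975, "score": 8.2},
    {"title": "And Now for Something Completely Different", "year": 1971, "score": 7.5},
]
columns = check_consistent_columns(movies)
```

If the check passes, ray.data.from_items(movies) and write_datasource proceed as in the example above.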

Next steps

To train a model with data stored in Snowflake, visit our Ray Examples library.