Read and Write Data from Snowflake
This version of the Anyscale docs is deprecated. Go to the latest version for up-to-date information.
Ray Datasets are the standard way to load and exchange data in Ray applications. This guide shows you how to:
- Configure credentials required to connect to Snowflake
- Read data from a Snowflake table into a Ray Dataset
- Write data in a Ray Dataset to a Snowflake table
Who can use this feature
The Ray Snowflake APIs are only available to Anyscale customers.
If you want to access this feature, contact the Anyscale team.
Before you begin
The Ray Snowflake APIs depend on the Snowflake Connector for Python. To install it, specify the following PyPI package in your cluster environment:
snowflake-connector-python
To learn more about installing dependencies into your environment, read Anyscale environment.
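Before creating a datasource, it can help to confirm that the connector is importable in the running environment. The following is a minimal sketch; the helper name has_snowflake_connector is hypothetical, and it relies only on the fact that the snowflake-connector-python package is imported as snowflake.connector.

```python
import importlib.util


def has_snowflake_connector() -> bool:
    """Return True if the Snowflake Connector for Python can be imported.

    has_snowflake_connector is a hypothetical helper, not part of any
    Ray or Anyscale API.
    """
    try:
        # find_spec imports the parent package ("snowflake") first and raises
        # ModuleNotFoundError when that package is not installed.
        return importlib.util.find_spec("snowflake.connector") is not None
    except ModuleNotFoundError:
        return False


if not has_snowflake_connector():
    print("snowflake-connector-python is missing from this environment")
```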
Connecting to Snowflake
To connect to Snowflake, create a SnowflakeDatasource and specify:
- user: The username you use to log into Snowflake.
- password: The password you use to log into Snowflake.
- account: Your account identifier. Format the identifier with a hyphen, like <orgname>-<account-name>.
from ray_extensions.data import SnowflakeDatasource

# Replace each ... placeholder with your own Snowflake credentials.
datasource = SnowflakeDatasource(
    user=...,
    account="ABCDEFG-ABC12345",
    password=...,
)
To learn more about configuring Snowflake credentials, read Connecting to Snowflake.
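One way to avoid hard-coding credentials in source files is to read them from environment variables. The sketch below is illustrative only: the variable names SNOWFLAKE_USER, SNOWFLAKE_PASSWORD, and SNOWFLAKE_ACCOUNT and the helper snowflake_credentials_from_env are assumptions, not names that Anyscale or Snowflake define for this API.

```python
import os
from typing import Dict, Mapping

# Hypothetical environment variable names, keyed by the SnowflakeDatasource
# argument they fill in. Rename them to match your own secret management.
_ENV_VARS = {
    "user": "SNOWFLAKE_USER",
    "password": "SNOWFLAKE_PASSWORD",
    "account": "SNOWFLAKE_ACCOUNT",
}


def snowflake_credentials_from_env(
    env: Mapping[str, str] = os.environ,
) -> Dict[str, str]:
    """Collect the user, password, and account arguments from the environment."""
    missing = [var for var in _ENV_VARS.values() if not env.get(var)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {arg: env[var] for arg, var in _ENV_VARS.items()}


# Usage on a cluster where ray_extensions is installed:
# from ray_extensions.data import SnowflakeDatasource
# datasource = SnowflakeDatasource(**snowflake_credentials_from_env())
```

Failing early with the list of missing variables tends to be easier to debug than a connection error raised later by the connector.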
Reading data from Snowflake
To read a Snowflake table into a Ray Dataset, pass a SnowflakeDatasource to ray.data.read_datasource.
import ray
from ray_extensions.data import SnowflakeDatasource

datasource = SnowflakeDatasource(
    user=...,
    account="ABCDEFG-ABC12345",
    password=...,
)

# Run the query in Snowflake and load the result into a Ray Dataset.
ds = ray.data.read_datasource(
    datasource,
    sql="SELECT * FROM SNOWFLAKE_SAMPLE_DATA.TPCDS_SF100TCL.CUSTOMER",
)
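Because the read is driven by a SQL string, you can narrow the query so that Snowflake returns only the columns and rows you need. The following sketch builds such a string; build_select is a hypothetical helper, and the column names are assumed to exist in the sample CUSTOMER table. Identifiers are interpolated without escaping, so pass only trusted values.

```python
from typing import Optional, Sequence


def build_select(
    table: str,
    columns: Sequence[str] = ("*",),
    limit: Optional[int] = None,
) -> str:
    """Build a simple SELECT statement (hypothetical helper, no escaping)."""
    if limit is not None and limit < 0:
        raise ValueError("limit must be non-negative")
    query = f"SELECT {', '.join(columns)} FROM {table}"
    if limit is not None:
        query += f" LIMIT {limit}"
    return query


# Column names below are assumptions about the sample table's schema.
sql = build_select(
    "SNOWFLAKE_SAMPLE_DATA.TPCDS_SF100TCL.CUSTOMER",
    columns=["C_CUSTOMER_SK", "C_FIRST_NAME", "C_LAST_NAME"],
    limit=1000,
)

# The resulting string can then be passed as the sql argument:
# ds = ray.data.read_datasource(datasource, sql=sql)
```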
Writing data to Snowflake
To write data in a Ray Dataset to a Snowflake table, call the dataset's write_datasource method with a SnowflakeDatasource and the name of the target table.
import ray
from ray_extensions.data import SnowflakeDatasource

datasource = SnowflakeDatasource(
    user=...,
    account="ABCDEFG-ABC12345",
    password=...,
)

# Build a small in-memory dataset; each dict is one row.
dataset = ray.data.from_items([
    {"title": "Monty Python and the Holy Grail", "year": 1975, "score": 8.2},
    {"title": "And Now for Something Completely Different", "year": 1971, "score": 7.5},
])

# Write the rows to the target table, named as <database>.<schema>.<table>.
dataset.write_datasource(datasource, table="MY_DATABASE.MY_SCHEMA.MOVIES")
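Before writing, you may want to check that every row carries the same set of columns, since a stray or missing key is easier to catch locally than in a failed write. This is a minimal sketch under the assumption that the target table expects exactly the columns listed; check_rows is a hypothetical helper, and how the datasource maps column names and types to Snowflake is not covered here.

```python
from typing import Any, Iterable, Mapping, Sequence


def check_rows(
    rows: Iterable[Mapping[str, Any]],
    expected_columns: Sequence[str],
) -> None:
    """Raise ValueError if any row's keys differ from expected_columns."""
    expected = set(expected_columns)
    for index, row in enumerate(rows):
        keys = set(row)
        if keys != expected:
            missing = sorted(expected - keys)
            extra = sorted(keys - expected)
            raise ValueError(
                f"Row {index}: missing columns {missing}, unexpected columns {extra}"
            )


rows = [
    {"title": "Monty Python and the Holy Grail", "year": 1975, "score": 8.2},
    {"title": "And Now for Something Completely Different", "year": 1971, "score": 7.5},
]

# Passes silently because both rows have exactly these three keys.
check_rows(rows, ["title", "year", "score"])
```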
Next steps
To train a model with data stored in Snowflake, visit our Ray Examples library.