Ray Data Snowflake API
caution
These features require proprietary Ray 2.33 or later.
ray.data.read_snowflake
ray.data.read_snowflake(
    sql: str,
    connection_parameters: Dict[str, Any],
    *,
    parallelism: int = -1,
    ray_remote_args: Dict[str, Any] = None,
    concurrency: Optional[int] = None,
    override_num_blocks: Optional[int] = None,
)
Read data from a Snowflake data set.
Examples
import ray

connection_parameters = dict(
    user=...,
    account="ABCDEFG-ABC12345",
    password=...,
    database="SNOWFLAKE_SAMPLE_DATA",
    schema="TPCDS_SF100TCL",
)
ds = ray.data.read_snowflake("SELECT * FROM CUSTOMERS", connection_parameters)
Parameters
sql -- The SQL query to execute.
connection_parameters -- Keyword arguments to pass to snowflake.connector.connect. To view supported parameters, read https://docs.snowflake.com/developer-guide/python-connector/python-connector-api#functions.
parallelism -- This argument is deprecated. Use the override_num_blocks argument instead.
ray_remote_args -- kwargs passed to ray.remote in the read tasks.
concurrency -- The maximum number of Ray tasks to run concurrently. This doesn't change the total number of tasks run or the total number of output blocks. By default, concurrency is decided dynamically based on the available resources.
override_num_blocks -- Override the number of output blocks from all read tasks. By default, the number of output blocks is decided dynamically based on input data size and available resources. You shouldn't manually set this value in most cases.
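As a sketch of how these parameters fit together, connection parameters are often assembled from environment variables rather than hard-coded. The environment variable names and the helper below are illustrative assumptions, not a Ray or Snowflake convention:

```python
import os


def connection_parameters_from_env() -> dict:
    """Build the keyword arguments for snowflake.connector.connect
    from environment variables. The variable names below are an
    assumption for this sketch, not a Snowflake convention."""
    return dict(
        user=os.environ["SNOWFLAKE_USER"],
        password=os.environ["SNOWFLAKE_PASSWORD"],
        account=os.environ["SNOWFLAKE_ACCOUNT"],
        # Fall back to the sample data set used in the example above.
        database=os.environ.get("SNOWFLAKE_DATABASE", "SNOWFLAKE_SAMPLE_DATA"),
        schema=os.environ.get("SNOWFLAKE_SCHEMA", "TPCDS_SF100TCL"),
    )


# With the parameters in hand, the read can be capped at, for example,
# 4 concurrent tasks while requesting 16 output blocks (commented out
# because it needs live Snowflake credentials):
# ds = ray.data.read_snowflake(
#     "SELECT * FROM CUSTOMERS",
#     connection_parameters_from_env(),
#     concurrency=4,
#     override_num_blocks=16,
# )
```
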
Returns
A Dataset containing the data from the Snowflake data set.
PublicAPI (beta): This API is in beta and may change before becoming stable.
ray.data.Dataset.write_snowflake
Dataset.write_snowflake(
    table: str,
    connection_parameters: Dict[str, Any],
    *,
    ray_remote_args: Dict[str, Any] = None,
    concurrency: Optional[int] = None,
)
Write this Dataset to a Snowflake table.
Examples
import ray

connection_parameters = dict(
    user=...,
    account="ABCDEFG-ABC12345",
    password=...,
    database="SNOWFLAKE_SAMPLE_DATA",
    schema="TPCDS_SF100TCL",
)
ds = ray.data.read_parquet("s3://anonymous@ray-example-data/iris.parquet")
ds.write_snowflake("MY_DATABASE.MY_SCHEMA.IRIS", connection_parameters)
Parameters
table -- The name of the table to write to.
connection_parameters -- Keyword arguments to pass to snowflake.connector.connect. To view supported parameters, read https://docs.snowflake.com/developer-guide/python-connector/python-connector-api#functions.
ray_remote_args -- kwargs passed to ray.remote in the write tasks.
concurrency -- The maximum number of Ray tasks to run concurrently. This doesn't change the total number of tasks run. By default, concurrency is decided dynamically based on the available resources.
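As the example above shows, the table argument takes a fully qualified DATABASE.SCHEMA.TABLE name. A minimal sketch of building that name from its parts; the helper, its upper-casing, and its validation rule are assumptions for illustration, not part of the API:

```python
def qualified_table_name(database: str, schema: str, table: str) -> str:
    """Join identifiers into the DATABASE.SCHEMA.TABLE form shown in
    the write_snowflake example. Upper-casing and the rejection of
    embedded dots are illustrative assumptions."""
    parts = [database, schema, table]
    for part in parts:
        if not part or "." in part:
            raise ValueError(f"invalid identifier: {part!r}")
    return ".".join(part.upper() for part in parts)


# Usage against the example above (commented out because it needs
# live Snowflake credentials):
# ds.write_snowflake(
#     qualified_table_name("my_database", "my_schema", "iris"),
#     connection_parameters,
#     concurrency=2,
# )
```
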
PublicAPI (beta): This API is in beta and may change before becoming stable.