Databricks API
info
If you want to access this feature, contact the Anyscale team.
DatabricksDatasource
DatabricksDatasource(
server_hostname: str,
http_path: str,
access_token: str,
catalog: Optional[str] = None,
schema: Optional[str] = None,
)
A Datasource that reads and writes to Databricks.
Parameters
server_hostname: The server hostname for the cluster or SQL warehouse.
http_path: The HTTP path of the cluster or SQL warehouse.
access_token: Your Databricks personal access token for the workspace of the cluster or SQL warehouse.
catalog: Initial catalog to use for the connection. Defaults to None.
schema: Initial schema to use for the connection. Defaults to None.
For detailed instructions on acquiring Databricks connection parameters, read Get started in the Databricks SQL Connector documentation.
Examples
from ray_extensions.data import DatabricksDatasource
datasource = DatabricksDatasource(
server_hostname="dbc-a1b2345c-d6e7.cloud.databricks.com",
http_path="/sql/1.0/warehouses/a1b234c567d8e9fa",
access_token="dbapi...",
)
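In practice, avoid hard-coding the personal access token. As a minimal sketch, you could pull the connection parameters from environment variables instead (the variable names below are illustrative, not required by the API):
import os
from ray_extensions.data import DatabricksDatasource

# Illustrative environment variable names; set them however your deployment manages secrets.
datasource = DatabricksDatasource(
    server_hostname=os.environ["DATABRICKS_SERVER_HOSTNAME"],
    http_path=os.environ["DATABRICKS_HTTP_PATH"],
    access_token=os.environ["DATABRICKS_TOKEN"],
)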
ray.data.read_datasource
ray.data.read_datasource(
datasource: DatabricksDatasource,
*,
sql: str
) -> Dataset
Read data from a Databricks table into a Ray Dataset.
Parameters
datasource: A DatabricksDatasource.
sql: The SQL query you want to execute.
Returns
A Ray Dataset that contains the query result set.
Examples
import ray
from ray_extensions.data import DatabricksDatasource
datasource = DatabricksDatasource(
server_hostname="dbc-a1b2345c-d6e7.cloud.databricks.com",
http_path="/sql/1.0/warehouses/a1b234c567d8e9fa",
access_token="dbapi...",
)
ds = ray.data.read_datasource(
datasource,
sql="SELECT * FROM samples.tpch.supplier"
)
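The return value is a standard Ray Dataset, so you can inspect the query results with the usual Ray Data operations. A minimal sketch, assuming the query above succeeded:
# Peek at a few rows, the inferred schema, and the row count.
print(ds.take(3))
print(ds.schema())
print(ds.count())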
Dataset.write_datasource
Dataset.write_datasource(
datasource: DatabricksDatasource,
*,
table: str,
stage_uri: str
) -> None
Write data in a Ray Dataset to a Databricks table.
info
Your Databricks cluster or warehouse needs to read the bucket specified by stage_uri. To configure access to the bucket, read Configure S3 access with instance profiles.
Parameters
datasource: A DatabricksDatasource.
table: The table you want to write to.
stage_uri: The URI of an S3 bucket where Ray can temporarily stage files.
Examples
import ray
from ray_extensions.data import DatabricksDatasource
datasource = DatabricksDatasource(
server_hostname="dbc-a1b2345c-d6e7.cloud.databricks.com",
http_path="/sql/1.0/warehouses/a1b234c567d8e9fa",
access_token="dbapi...",
)
ds = ray.data.from_items([
{"title": "Monty Python and the Holy Grail", "year": 1975, "score": 8.2},
{"title": "And Now for Something Completely Different", "year": 1971, "score" 7.5},
])
ds.write_datasource(
datasource,
table="my_catalog.my_schema.movies",
stage_uri="s3://ray-staging-bucket"
)
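To verify the write, you can read the table back with ray.data.read_datasource. A minimal sketch, assuming the movies table created in the example above:
# Read back the rows that were just written.
ds_check = ray.data.read_datasource(
    datasource,
    sql="SELECT title, year, score FROM my_catalog.my_schema.movies"
)
print(ds_check.take_all())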