Databricks API

info

If you want to access this feature, contact the Anyscale team.


DatabricksDatasource

DatabricksDatasource(
    server_hostname: str,
    http_path: str,
    access_token: str,
    catalog: Optional[str] = None,
    schema: Optional[str] = None,
)

A Datasource that reads and writes to Databricks.

Parameters

  • server_hostname: The server hostname for the cluster or SQL warehouse.
  • http_path: The HTTP path of the cluster or SQL warehouse.
  • access_token: A Databricks personal access token for the workspace that contains the cluster or SQL warehouse.
  • catalog: Initial catalog to use for the connection. Defaults to None.
  • schema: Initial schema to use for the connection. Defaults to None.

For detailed instructions on acquiring Databricks connection parameters, read Get started in the Databricks SQL Connector documentation.
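Avoid hardcoding the access token in source code. One approach is to read the connection parameters from environment variables; the sketch below does this with arbitrary variable names (they are not part of this API, so pick whatever fits your deployment):

```python
import os


def databricks_params_from_env() -> dict:
    """Collect DatabricksDatasource keyword arguments from the environment.

    The environment-variable names here are hypothetical, not defined by
    the API.
    """
    params = {
        "server_hostname": os.environ["DATABRICKS_SERVER_HOSTNAME"],
        "http_path": os.environ["DATABRICKS_HTTP_PATH"],
        "access_token": os.environ["DATABRICKS_TOKEN"],
    }
    # catalog and schema are optional; only pass them when they're set.
    for key, var in [("catalog", "DATABRICKS_CATALOG"), ("schema", "DATABRICKS_SCHEMA")]:
        value = os.environ.get(var)
        if value is not None:
            params[key] = value
    return params


# datasource = DatabricksDatasource(**databricks_params_from_env())
```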

Examples

from ray_extensions.data import DatabricksDatasource

datasource = DatabricksDatasource(
    server_hostname="dbc-a1b2345c-d6e7.cloud.databricks.com",
    http_path="/sql/1.0/warehouses/a1b234c567d8e9fa",
    access_token="dbapi...",
)

ray.data.read_datasource

ray.data.read_datasource(
    datasource: DatabricksDatasource,
    *,
    sql: str
) -> Dataset

Read data from a Databricks table into a Ray Dataset.

Parameters

  • datasource: A DatabricksDatasource.
  • sql: The SQL query to execute against the connection.
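The sql argument is a plain query string. If you build it from table-name parts, a small helper like the hypothetical one below (not part of this API) can assemble a fully qualified name with Databricks-style backtick quoting, which escapes embedded backticks by doubling them:

```python
from typing import Optional


def select_all(catalog: str, schema: str, table: str, limit: Optional[int] = None) -> str:
    """Build a SELECT over a fully qualified Databricks table name.

    Each identifier is backtick-quoted so names with special characters
    still parse; embedded backticks are escaped by doubling.
    """
    quoted = ".".join(f"`{part.replace('`', '``')}`" for part in (catalog, schema, table))
    sql = f"SELECT * FROM {quoted}"
    if limit is not None:
        sql += f" LIMIT {int(limit)}"
    return sql
```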

Returns

A Ray Dataset that contains the query result set.

Examples

import ray
from ray_extensions.data import DatabricksDatasource

datasource = DatabricksDatasource(
    server_hostname="dbc-a1b2345c-d6e7.cloud.databricks.com",
    http_path="/sql/1.0/warehouses/a1b234c567d8e9fa",
    access_token="dbapi...",
)
ds = ray.data.read_datasource(
    datasource,
    sql="SELECT * FROM samples.tpch.supplier",
)

Dataset.write_datasource

Dataset.write_datasource(
    datasource: DatabricksDatasource,
    *,
    table: str,
    stage_uri: str
) -> None

Write data in a Ray Dataset to a Databricks table.

info

Your Databricks cluster or SQL warehouse needs read access to the bucket specified by stage_uri. To configure access to the bucket, read Configure S3 access with instance profiles.

Parameters

  • datasource: A DatabricksDatasource.
  • table: The table you want to write to.
  • stage_uri: The URI of an S3 bucket where Ray can temporarily stage files.
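The reference doesn't specify how malformed arguments are reported, so it can help to fail fast on the client. The sketch below is a hypothetical pre-flight check (not part of this API); it assumes you follow the pattern in the example of passing a fully qualified catalog.schema.table name and an s3:// staging URI:

```python
from typing import Tuple


def check_write_args(table: str, stage_uri: str) -> Tuple[str, str]:
    """Validate write_datasource arguments before submitting a write job.

    Assumes a fully qualified three-part table name and an S3 staging URI.
    """
    parts = table.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(
            f"table must be fully qualified as catalog.schema.table, got {table!r}"
        )
    if not stage_uri.startswith("s3://"):
        raise ValueError(f"stage_uri must be an s3:// URI, got {stage_uri!r}")
    return table, stage_uri
```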

Examples

import ray
from ray_extensions.data import DatabricksDatasource

datasource = DatabricksDatasource(
    server_hostname="dbc-a1b2345c-d6e7.cloud.databricks.com",
    http_path="/sql/1.0/warehouses/a1b234c567d8e9fa",
    access_token="dbapi...",
)
ds = ray.data.from_items([
    {"title": "Monty Python and the Holy Grail", "year": 1975, "score": 8.2},
    {"title": "And Now for Something Completely Different", "year": 1971, "score": 7.5},
])
ds.write_datasource(
    datasource,
    table="my_catalog.my_schema.movies",
    stage_uri="s3://ray-staging-bucket",
)