Arroyo can write data to a Delta Lake table using the delta sink. This sink shares most of its code with the File System Sink and therefore supports all of the same configuration options.

However, it has the following constraints:

  • The format option must be set to parquet, as Delta Lake only supports Parquet files.
  • Delta can only be used as a sink, not a source.
  • Partitioning is not integrated with the delta log.

Example query

CREATE TABLE delta_sink (
    id INTEGER,
    name STRING,
    age INTEGER
) WITH (
    'connector' = 'delta',
    'path' = 's3://my_bucket/my_table',
    'format' = 'parquet',
    'filename.strategy' = 'uuid'
);

INSERT INTO delta_sink SELECT id, name, age FROM my_source;

Commit Behavior

Arroyo commits to Delta Lake tables using the same two-phase commit protocol as the File System Sink. The files to be written are staged and then committed after a checkpoint has been reliably taken. Idempotence is ensured by validating the table version before appending. If the version differs from the expected one, Arroyo checks all intervening versions to ensure that the data was not already added.
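
The version check described above can be illustrated with a small sketch. This is a toy in-memory model, not Arroyo's implementation: the `DeltaLog` class, the `commit_idempotently` function, and the use of file paths to detect already-added data are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DeltaLog:
    """Hypothetical in-memory stand-in for a Delta transaction log."""

    # versions[i] holds the file paths added by commit i.
    versions: list[set[str]] = field(default_factory=list)

    @property
    def latest_version(self) -> int:
        # -1 means the table has no commits yet.
        return len(self.versions) - 1

    def append(self, files: set[str]) -> int:
        self.versions.append(set(files))
        return self.latest_version


def commit_idempotently(log: DeltaLog, expected_version: int, files: set[str]) -> int:
    """Add `files` unless an earlier attempt already added them.

    Returns the table version that contains the files.
    """
    if log.latest_version != expected_version:
        # The table moved since the expected version was recorded, so
        # inspect every intervening version for the staged files.
        for version in range(expected_version + 1, log.latest_version + 1):
            if files & log.versions[version]:
                return version  # already committed; do not append again
    return log.append(files)


# First attempt commits the staged files as version 0.
log = DeltaLog()
staged = {"part-0001.parquet", "part-0002.parquet"}
first = commit_idempotently(log, expected_version=-1, files=staged)

# A replay after a failure uses the same expected version and files,
# finds them in version 0, and adds nothing.
second = commit_idempotently(log, expected_version=-1, files=staged)
```

In this sketch a replayed commit returns the version written by the first attempt instead of appending the same files twice, while a commit whose files are not found in any intervening version is appended as a new version.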