SQL Reference

Arroyo pipelines are written in SQL. The dialect is based on Apache DataFusion — a well-supported SQL engine built on Apache Arrow — and extended with streaming-specific constructs like windows, watermarks, and event-time joins.

If you’ve used other SQL engines (Postgres, DuckDB, Spark SQL, Flink SQL), most of what you already know carries over. The sections below cover the pieces that are either Arroyo-specific or particularly important for streaming.

  • SQL Data Types — the primitive and complex types supported by Arroyo, and how they map to underlying Rust types.
  • SELECT Statements — the basic query syntax, including projections, filtering, UNNEST, and subqueries.
  • DDL Statements — CREATE TABLE for connection tables, CREATE VIEW, and the WITH options used to configure connectors.
  • Streaming Windows — tumbling, sliding, and session windows for time-bucketed aggregations.
  • Joins — stream-stream joins (windowed and interval), lookup joins against external systems, and the semantics of each.
  • Updating Tables — working with update streams (CDC-style data), including Debezium-formatted inputs and outputs.
  • Scalar Functions — math, string, JSON, time, array, struct, regex, hashing, and more.
  • Aggregate Functions — sum, count, avg, array_agg, and other functions usable in GROUP BY queries.
  • Window Functions — OVER-clause analytical functions like row_number, rank, and lag/lead.

For user-defined functions in Rust or Python, see the UDF docs.
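
To give a feel for how the pieces listed above fit together, here is a minimal sketch of a pipeline that declares a connection table and aggregates it over a tumbling window. The table name, columns, topic, and broker address are hypothetical placeholders, and the exact WITH options depend on the connector, so treat the DDL Statements and Streaming Windows pages as the authority on syntax.

```sql
-- Hypothetical Kafka-backed source table; every name and option value
-- here is illustrative, not taken from a real deployment.
CREATE TABLE events (
  user_id TEXT,
  amount FLOAT
) WITH (
  connector = 'kafka',
  bootstrap_servers = 'localhost:9092',
  topic = 'events',
  type = 'source',
  format = 'json'
);

-- Count events and sum amounts per user in one-minute tumbling windows.
SELECT
  user_id,
  count(*) AS event_count,
  sum(amount) AS total_amount,
  tumble(interval '1 minute') AS window
FROM events
GROUP BY window, user_id;
```

The CREATE TABLE statement is ordinary DDL with connector configuration in the WITH clause, and the windowing function in the GROUP BY is the streaming-specific part; the rest is standard SQL.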