SQL Reference
Arroyo pipelines are written in SQL. The dialect is based on Apache DataFusion — a well-supported SQL engine built on Apache Arrow — and extended with streaming-specific constructs like windows, watermarks, and event-time joins.
If you’ve used other SQL engines (Postgres, DuckDB, Spark SQL, Flink SQL), most of what you already know carries over. The sections below cover the pieces that are either Arroyo-specific or particularly important for streaming.
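As a rough illustration of how ordinary SQL combines with Arroyo's streaming extensions, the sketch below declares a Kafka-backed source table and counts events per user in one-minute tumbling windows. The table name `events`, its column, the broker address, and the topic name are hypothetical. The connector options are an assumption to check against the DDL and connector docs for your Arroyo version.

```sql
-- Hypothetical source table reading JSON records from a Kafka topic.
-- The connector options below are assumptions; check the DDL docs.
CREATE TABLE events (
    user_id TEXT
) WITH (
    connector = 'kafka',
    bootstrap_servers = 'localhost:9092',
    topic = 'events',
    type = 'source',
    format = 'json'
);

-- Standard SELECT / GROUP BY syntax plus Arroyo's tumble() window:
-- counts events per user in non-overlapping one-minute windows.
SELECT
    tumble(interval '1 minute') AS window,
    user_id,
    count(*) AS event_count
FROM events
GROUP BY window, user_id;
```

Apart from `tumble()` and the connector `WITH` options, the query is plain SQL that would look familiar in most engines. The sliding and session windows covered under Streaming Windows follow the same pattern.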
Core reference
- SQL Data Types — the primitive and complex types supported by Arroyo, and how they map to underlying Rust types.
- SELECT Statements — the basic query syntax, including projections, filtering, UNNEST, and subqueries.
- DDL Statements — CREATE TABLE for connection tables, CREATE VIEW, and the WITH options used to configure connectors.
Streaming
- Streaming Windows — tumbling, sliding, and session windows for time-bucketed aggregations.
- Joins — stream-stream joins (windowed and interval), lookup joins against external systems, and the semantics of each.
- Updating Tables — working with update streams (CDC-style data), including Debezium-formatted inputs and outputs.
Functions
- Scalar Functions — math, string, JSON, time, array, struct, regex, hashing, and more.
- Aggregate Functions — sum, count, avg, array_agg, and other functions usable in GROUP BY queries.
- Window Functions — OVER-clause analytical functions like row_number, rank, and lag/lead.
For user-defined functions in Rust or Python, see the UDF docs.