Arroyo SQL supports User-Defined Aggregate Functions, allowing you to extend the system’s capabilities with custom aggregation logic written in Rust. UDAFs are deployed and managed in the same way as scalar UDFs; see the UDF docs for details.

Defining a UDAF

UDAFs are defined as functions that take some number of Vec<T> as an argument, where T is any valid SQL data type. Note that all arguments must be Vec—you can’t mix-and-match vector and scalar parameters.

Example

Here’s a simple UDAF for calculating the median of a set of integers:

use arroyo_sql_plugin::udf;

#[udf]
fn my_median(mut args: Vec<i64>) -> Option<f64> {
    if args.is_empty() {
        return None;
    }

    args.sort();

    let mid = args.len() / 2;
    if args.len() % 2 == 0 {
        Some((args[mid] + args[mid - 1]) as f64 / 2.0)
    } else {
        Some(args[mid] as f64)
    }
}

Once defined, a UDAF can be used within a SQL query like the built-in aggregate functions:

SELECT
    my_median(bid.price),
    bid.auction,
    SESSION(interval '10 second') as session
FROM nexmark
GROUP BY 2,3

Nullability

If a UDAF argument type is an Option (e.g., Vec<Option<i64>>), the UDAF will be invoked with all inputs, including NULLs. Otherwise, only non-null arguments will be passed down to the UDAF. Unlike UDFs, UDAFs will always be invoked, although possibly with empty input vectors.