User-defined Aggregate Functions (UDAFs)
Arroyo SQL supports User-Defined Aggregate Functions, allowing you to extend the system’s capabilities with custom aggregation logic written in Rust. UDAFs can be registered via the SQL API call or the Web UI.
How to Define a UDAF
In Rust, UDAFs are defined as functions that take some number of Vec<T>
as an argument, where T
is any valid SQL data type.
Example
Here’s a simple UDAF for calculating the median of a set of integers:
fn my_median(mut args: Vec<i64>) -> Option<f64> {
if args.is_empty() {
return None;
}
args.sort();
let mid = args.len() / 2;
if args.len() % 2 == 0 {
Some((args[mid] + args[mid - 1]) as f64 / 2.0)
} else {
Some(args[mid] as f64)
}
}
You can then use this UDAF within an SQL query:
SELECT my_median(bid.price), bid.auction, session(interval '10 second') as session
FROM nexmark
GROUP BY 2,3
Nullability
If a UDAF argument type is an Option
(e.g., Vec<Option<i64>>
), the UDAF will be invoked with all inputs, including NULL
s.
Otherwise, only non-null arguments will be passed down to the UDAF.
Unlike UDFs, UDAFs will always be invoked, although possibly with empty input vectors.
Debugging
If a UDAF encounters a panic while, diagnostic information is logged to the worker logs. Note that these logs are not currently accessible via the Web UI. If the UDAF fails to compile this will show up in the compilation service.
Future Work
UDAF support is still in its early stages. Here are some features we plan to add in the future:
- Support for ORDER BY clauses within the UDAF SQL expression.
- Support for UDAFs in non-windowed aggregations.
- Support for incrementally computed UDAFs.