Aggregate Functions
Supported Aggregate Functions in Arroyo SQL
Arroyo’s Aggregate function implementations are based on Apache DataFusion and these docs are derived from the DataFusion function reference.
General
avg
Returns the average of numeric values in the specified column.
avg(expression)
Arguments
 expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.
Aliases
mean
bit_and
Computes the bitwise AND of all nonnull input values.
bit_and(expression)
Arguments
 expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.
bit_or
Computes the bitwise OR of all nonnull input values.
bit_or(expression)
Arguments
 expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.
bit_xor
Computes the bitwise exclusive OR of all nonnull input values.
bit_xor(expression)
Arguments
 expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.
bool_and
Returns true if all nonnull input values are true, otherwise false.
bool_and(expression)
Arguments
 expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.
bool_or
Returns true if any nonnull input value is true, otherwise false.
bool_or(expression)
Arguments
 expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.
count
Returns the number of rows in the specified column.
Count includes null values in the total count.
To exclude null values from the total count, include <column> IS NOT NULL
in the WHERE
clause.
count(expression)
Arguments
 expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.
max
Returns the maximum value in the specified column.
max(expression)
Arguments
 expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.
mean
Alias of avg.
median
Returns the median value in the specified column.
median(expression)
Arguments
 expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.
min
Returns the minimum value in the specified column.
min(expression)
Arguments
 expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.
sum
Returns the sum of all values in the specified column.
sum(expression)
Arguments
 expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.
array_agg
Returns an array created from the expression elements. If ordering requirement is given, elements are inserted in the order of required ordering.
array_agg(expression [ORDER BY expression])
Arguments
 expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.
first_value
Returns the first element in an aggregation group according to the requested ordering. If no ordering is given, returns an arbitrary element from the group.
first_value(expression [ORDER BY expression])
Arguments
 expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.
last_value
Returns the last element in an aggregation group according to the requested ordering. If no ordering is given, returns an arbitrary element from the group.
last_value(expression [ORDER BY expression])
Arguments
 expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.
Statistical
corr
Returns the coefficient of correlation between two numeric values.
corr(expression1, expression2)
Arguments
 expression1: First expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.
 expression2: Second expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.
covar
Returns the covariance of a set of number pairs.
covar(expression1, expression2)
Arguments
 expression1: First expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.
 expression2: Second expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.
covar_pop
Returns the population covariance of a set of number pairs.
covar_pop(expression1, expression2)
Arguments
 expression1: First expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.
 expression2: Second expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.
covar_samp
Returns the sample covariance of a set of number pairs.
covar_samp(expression1, expression2)
Arguments
 expression1: First expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.
 expression2: Second expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.
stddev
Returns the standard deviation of a set of numbers.
stddev(expression)
Arguments
 expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.
stddev_pop
Returns the population standard deviation of a set of numbers.
stddev_pop(expression)
Arguments
 expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.
stddev_samp
Returns the sample standard deviation of a set of numbers.
stddev_samp(expression)
Arguments
 expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.
var
Returns the statistical variance of a set of numbers.
var(expression)
Arguments
 expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.
var_pop
Returns the statistical population variance of a set of numbers.
var_pop(expression)
Arguments
 expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.
var_samp
Returns the statistical sample variance of a set of numbers.
var_samp(expression)
Arguments
 expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.
regr_slope
Returns the slope of the linear regression line for nonnull pairs in aggregate columns. Given input column Y and X: regr_slope(Y, X) returns the slope (k in Y = k*X + b) using minimal RSS fitting.
regr_slope(expression1, expression2)
Arguments
 expression_y: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.
 expression_x: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.
regr_avgx
Computes the average of the independent variable (input) expression_x
for the nonnull paired data points.
regr_avgx(expression_y, expression_x)
Arguments
 expression_y: Dependent variable. Can be a constant, column, or function, and any combination of arithmetic operators.
 expression_x: Independent variable. Can be a constant, column, or function, and any combination of arithmetic operators.
regr_avgy
Computes the average of the dependent variable (output) expression_y
for the nonnull paired data points.
regr_avgy(expression_y, expression_x)
Arguments
 expression_y: Dependent variable. Can be a constant, column, or function, and any combination of arithmetic operators.
 expression_x: Independent variable. Can be a constant, column, or function, and any combination of arithmetic operators.
regr_count
Counts the number of nonnull paired data points.
regr_count(expression_y, expression_x)
Arguments
 expression_y: Dependent variable. Can be a constant, column, or function, and any combination of arithmetic operators.
 expression_x: Independent variable. Can be a constant, column, or function, and any combination of arithmetic operators.
regr_intercept
Computes the yintercept of the linear regression line. For the equation (y = kx + b), this function returns b
.
regr_intercept(expression_y, expression_x)
Arguments
 expression_y: Dependent variable. Can be a constant, column, or function, and any combination of arithmetic operators.
 expression_x: Independent variable. Can be a constant, column, or function, and any combination of arithmetic operators.
regr_r2
Computes the square of the correlation coefficient between the independent and dependent variables.
regr_r2(expression_y, expression_x)
Arguments
 expression_y: Dependent variable. Can be a constant, column, or function, and any combination of arithmetic operators.
 expression_x: Independent variable. Can be a constant, column, or function, and any combination of arithmetic operators.
regr_sxx
Computes the sum of squares of the independent variable.
regr_sxx(expression_y, expression_x)
Arguments
 expression_y: Dependent variable. Can be a constant, column, or function, and any combination of arithmetic operators.
 expression_x: Independent variable. Can be a constant, column, or function, and any combination of arithmetic operators.
regr_syy
Computes the sum of squares of the dependent variable.
regr_syy(expression_y, expression_x)
Arguments
 expression_y: Dependent variable. Can be a constant, column, or function, and any combination of arithmetic operators.
 expression_x: Independent variable. Can be a constant, column, or function, and any combination of arithmetic operators.
regr_sxy
Computes the sum of products of paired data points.
regr_sxy(expression_y, expression_x)
Arguments
 expression_y: Dependent variable. Can be a constant, column, or function, and any combination of arithmetic operators.
 expression_x: Independent variable. Can be a constant, column, or function, and any combination of arithmetic operators.
Approximate
approx_distinct
Returns the approximate number of distinct input values calculated using the HyperLogLog algorithm.
approx_distinct(expression)
Arguments
 expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.
approx_median
Returns the approximate median (50th percentile) of input values.
It is an alias of approx_percentile_cont(x, 0.5)
.
approx_median(expression)
Arguments
 expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.
approx_percentile_cont
Returns the approximate percentile of input values using the tdigest algorithm.
approx_percentile_cont(expression, percentile, centroids)
Arguments

expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.

percentile: Percentile to compute. Must be a float value between 0 and 1 (inclusive).

centroids: Number of centroids to use in the tdigest algorithm. Default is 100.
If there are this number or fewer unique values, you can expect an exact result. A higher number of centroids results in a more accurate approximation, but requires more memory to compute.
approx_percentile_cont_with_weight
Returns the weighted approximate percentile of input values using the tdigest algorithm.
approx_percentile_cont_with_weight(expression, weight, percentile)
Arguments
 expression: Expression to operate on. Can be a constant, column, or function, and any combination of arithmetic operators.
 weight: Expression to use as weight. Can be a constant, column, or function, and any combination of arithmetic operators.
 percentile: Percentile to compute. Must be a float value between 0 and 1 (inclusive).