Arroyo’s Scalar function support mainly mirrors that of DataFusion, which in turn uses the semantics of the equivalent Postgres functions.

Math Functions

All single-argument math functions return DOUBLE values and are cast to f64 before being called.

abs

Returns the absolute value of a number.

SELECT abs(-3) FROM source;
3

acos

Returns the arc cosine of a number.

SELECT acos(0.5) FROM source;
1.0471975511965976

acosh

Returns the inverse hyperbolic cosine of a number.

SELECT acosh(1) FROM source;
0.0

asin

Returns the arc sine of a number.

SELECT asin(0.5) FROM source;
0.5235987755982988

asinh

Returns the inverse hyperbolic sine of a number.

SELECT asinh(0.5) FROM source;
0.48121182505960347

atan

Returns the arc tangent of a number.

SELECT atan(0.5) FROM source;
0.46364760900080615

ceil

Returns the smallest integer greater than, or equal to, a number.

SELECT ceil(0.5) FROM source;
1.0

cos

Returns the cosine of a number.

SELECT cos(0.5) FROM source;
0.8775825618903728

cosh

Returns the hyperbolic cosine of a number.

SELECT cosh(0.5) FROM source;
1.1276259652063807

floor

Returns the largest integer less than, or equal to, a number.

SELECT floor(0.5) FROM source;
0.0

exp

Returns e raised to the power of a number.

SELECT exp(0.5) FROM source;
1.6487212707001282

ln

Returns the natural logarithm of a number.

SELECT ln(0.5) FROM source;
-0.6931471805599453

log2

Returns the base 2 logarithm of a number.

SELECT log2(0.5) FROM source;
-1.0

signum

Returns the sign of a number.

SELECT signum(0.5) FROM source;
1.0

sin

Returns the sine of a number.

SELECT sin(0.5) FROM source;
0.479425538604203

sinh

Returns the hyperbolic sine of a number.

SELECT sinh(0.5) FROM source;
0.5210953054937474

String Functions

ascii

Returns the numeric value of the first character of a string.

SELECT ascii('hello world') FROM source;
104

bit_length

Returns the number of bits in a string.

SELECT bit_length('hello world') FROM source;
88

btrim

Removes the longest string consisting only of characters in the second argument (a space by default) from the start and end of string.

SELECT btrim(' hello world ') FROM source;
'hello world'
SELECT btrim('hello world', 'hed') FROM source;
'llo worl'

char_length

Returns the number of characters in a string.

SELECT char_length('hello world') FROM source;
11

chr

Returns the character with the given code.

SELECT chr(104) FROM source;
'h'

concat

Concatenates any number of strings.

SELECT concat('hello', ' ', 'world') FROM source;
'hello world'

concat_ws

Concatenates using the rest of the arguments using first argument as the separator.

SELECT concat_ws('-', 'hello', 'world') FROM source;
'hello-world'

initcap

Converts the first letter of each word to upper case and the rest to lower case.

SELECT initcap('heLLo woRld') FROM source;
'Hello World'

left

Returns the first n characters in the string.

SELECT left('hello world', 5) FROM source;
'hello'

length

Returns the number of characters in a string.


SELECT length('hello world') FROM source;
11

lpad

Pads the string to the left with the given string until the string is n characters long.

SELECT lpad('hello', 10, 'x') FROM source;
'xxxxxhello'

lower

Converts the string to lower case.

SELECT lower('HELLO WORLD') FROM source;
'hello world'

ltrim

Removes the all characters in the second string from the first string. Defaults to space.

SELECT ltrim('hello world', 'hed') FROM source;
'llo world'
SELECT ltrim(' hello world ') FROM source;
'hello world '

octet_length

Returns the number of bytes in a string.

SELECT octet_length('hello world') FROM source;
11

repeat

Repeats the string n times.

SELECT repeat('hello', 3) FROM source;
'hellohellohello'

replace

Updates the first argument by replacing all instances of the second argument with the third.

SELECT replace('hello world', 'hello', 'goodbye') FROM source;
'goodbye world'

reverse

Reverses the string.

SELECT reverse('dlrow elloh') FROM source;
'hello world'

Returns the last n characters in the string.

SELECT right('hello world', 5) FROM source;
'world'

rpad

Pads the string to the right with the given string until the string is n characters long.

SELECT rpad('hello', 10, 'x') FROM source;
'helloxxxxx'

rtrim

Removes the all characters in the second string from the first string. Defaults to space.

SELECT rtrim('hello world', 'dlo') FROM source;
'hello wor'
SELECT rtrim(' hello world ') FROM source;
' hello world'

split_part

Returns the nth part of the string split by the given delimiter.

SELECT split_part('hello world', ' ', 2) FROM source;
'world'

strpos

Returns the position of the first occurrence of the second argument in the first argument.

SELECT strpos('hello world', 'world') FROM source;
7

substr

Returns the substring of the first argument starting at the second argument with the given length.

SELECT substr('hello world', 7, 5) FROM source;
'world'

Hash Functions

Postgres-equivalent hash functions.

md5

Returns the MD5 hash of the input string as a hex string

SELECT md5('Hello, World') FROM source;
82bb413746aee42f89dea2b59614f9ef

sha224

Returns the SHA224 hash of the input string.

SELECT sha224('hello world') FROM source;
'\x2f05477fc24bb4faefd86517156dafdecec45b8ad3cf2522a563582b'

sha256

Returns the SHA256 hash of the input string.

SELECT sha256('hello world') FROM source;
'\xb94d27b9934d3e08a52e52d7da7dabfac484efe37a5380ee9088f7ace2efcde9'

sha384

Returns the SHA384 hash of the input string.

SELECT sha384('hello world') FROM source;
'\x59e1748777448c69de6b800d7a33bbfb9ff1b463e44354c3553bcdb9c666fa901b765c6b\
  8cdcd1f6e5f39bfabec89b2d'

sha512

Returns the SHA512 hash of the input string.

SELECT sha512('hello world') FROM source;
'\x309ecc489c12d6eb4cc40f50c902f2b4d0ed77ee511a7c7a9bcd3ca86d4cd86f\
  989dd35bc5ff499670da34255b45b0cfd830e81f605dcf7dc5542e93ae9cd76f'

Date/Time Functions

date_part

Returns the part of the date/time as a number.

SELECT date_part('year', '2019-01-01') FROM source;
2019

Supported date parts are:

year, month, week, day, hour, minute, second, millisecond, microsecond, nanosecond, dow, doy

date_trunc

Truncates the date/time to the given precision.

SELECT date_trunc('day', '2019-01-01 12:34:56') FROM source;
'2019-01-01 00:00:00'

Supported precisions are:

year, month, week, day, hour, minute, second

to_timestamp_seconds

Converts the given string to a timestamp with seconds.

SELECT to_timestamp_seconds('2019-01-01 12:34:56') FROM source;
'2019-01-01 12:34:56'

to_timestamp_millis

Converts the given string to a timestamp with milliseconds.

SELECT to_timestamp_millis('2019-01-01 12:34:56.789') FROM source;
'2019-01-01 12:34:56.789'

to_timestamp_micros

Converts the given string to a timestamp with microseconds.

SELECT to_timestamp_micros('2019-01-01 12:34:56.789012') FROM source;
'2019-01-01 12:34:56.789012'

to_timestamp

Converts the given string to a timestamp with nanosecond precision

SELECT to_timestamp('2019-01-01 12:34:56') FROM source;
'2019-01-01T12:34:56.000000000Z'

from_unixtime

Converts an integer to RFC3339 timestamp format. Input is parsed as nanoseconds since the UNIX Epoch.

SELECT from_unixtime(1546303496000000000) FROM source;
'2018-12-31T16:44:56.000000000Z'

JSON Functions

The JSON functions provide basic json parsing functions using JsonPath an evolving standard for querying JSON objects.

extract_json

Returns the JSON elements in the first argument that match the JsonPath in the second argument. The returned value is an array of json strings.

SELECT extract_json('{"a": 1, "b": 2, "c": 3}', '$.a') FROM source;
['1']

extract_json_string

Returns an unescaped String for the first item matching the JsonPath, if it is a string.

SELECT extract_json_string('{"a": "a", "b": 2, "c": 3}', '$.a') FROM source;
'a'

Regex Functions

Arroyo supports several regex functions. Regex functions are implemented using the regex crate and support the syntax documented there.

For more complex regex operations, consider using the regex library directly via a Rust UDF.

regexp_match

Returns capture groups (parenthesized expressions) of the first match of a regular expression on the string as an array. If no capture groups are present, the entire match is returned.

SELECT regexp_match('https://www.example.com/page/path', '(https?)://(^[/]+)/(.*)') FROM source;
['https', 'www.example.com', 'page/path']

regexp_replace

Replaces all matches of the regex in the string with the replacement string.

SELECT regexp_replace('this is a   sentence', '\s+', '_') FROM source;
'this_is_a_sentence'