Scalar Functions
Arroyo’s Scalar function support mainly mirrors that of DataFusion, which in turn uses the semantics of the equivalent Postgres functions.
Math Functions
All single-argument math functions return DOUBLE
values and are cast to f64 before being called.
abs
Returns the absolute value of a number.
SELECT abs(-3) FROM source;
3
acos
Returns the arc cosine of a number.
SELECT acos(0.5) FROM source;
1.0471975511965976
acosh
Returns the inverse hyperbolic cosine of a number.
SELECT acosh(1) FROM source;
0.0
asin
Returns the arc sine of a number.
SELECT asin(0.5) FROM source;
0.5235987755982988
asinh
Returns the inverse hyperbolic sine of a number.
SELECT asinh(0.5) FROM source;
0.48121182505960347
atan
Returns the arc tangent of a number.
SELECT atan(0.5) FROM source;
0.46364760900080615
ceil
Returns the smallest integer greater than, or equal to, a number.
SELECT ceil(0.5) FROM source;
1.0
cos
Returns the cosine of a number.
SELECT cos(0.5) FROM source;
0.8775825618903728
cosh
Returns the hyperbolic cosine of a number.
SELECT cosh(0.5) FROM source;
1.1276259652063807
floor
Returns the largest integer less than, or equal to, a number.
SELECT floor(0.5) FROM source;
0.0
exp
Returns e raised to the power of a number.
SELECT exp(0.5) FROM source;
1.6487212707001282
ln
Returns the natural logarithm of a number.
SELECT ln(0.5) FROM source;
-0.6931471805599453
log2
Returns the base 2 logarithm of a number.
SELECT log2(0.5) FROM source;
-1.0
signum
Returns the sign of a number.
SELECT signum(0.5) FROM source;
1.0
sin
Returns the sine of a number.
SELECT sin(0.5) FROM source;
0.479425538604203
sinh
Returns the hyperbolic sine of a number.
SELECT sinh(0.5) FROM source;
0.5210953054937474
String Functions
ascii
Returns the numeric value of the first character of a string.
SELECT ascii('hello world') FROM source;
104
bit_length
Returns the number of bits in a string.
SELECT bit_length('hello world') FROM source;
88
btrim
Removes the longest string consisting only of characters in the second argument (a space by default) from the start and end of string.
SELECT btrim(' hello world ') FROM source;
'hello world'
SELECT btrim('hello world', 'hed') FROM source;
'llo worl'
char_length
Returns the number of characters in a string.
SELECT char_length('hello world') FROM source;
11
chr
Returns the character with the given code.
SELECT chr(104) FROM source;
'h'
concat
Concatenates any number of strings.
SELECT concat('hello', ' ', 'world') FROM source;
'hello world'
concat_ws
Concatenates using the rest of the arguments using first argument as the separator.
SELECT concat_ws('-', 'hello', 'world') FROM source;
'hello-world'
initcap
Converts the first letter of each word to upper case and the rest to lower case.
SELECT initcap('heLLo woRld') FROM source;
'Hello World'
left
Returns the first n characters in the string.
SELECT left('hello world', 5) FROM source;
'hello'
length
Returns the number of characters in a string.
SELECT length('hello world') FROM source;
11
lpad
Pads the string to the left with the given string until the string is n characters long.
SELECT lpad('hello', 10, 'x') FROM source;
'xxxxxhello'
lower
Converts the string to lower case.
SELECT lower('HELLO WORLD') FROM source;
'hello world'
ltrim
Removes the all characters in the second string from the first string. Defaults to space.
SELECT ltrim('hello world', 'hed') FROM source;
'llo world'
SELECT ltrim(' hello world ') FROM source;
'hello world '
octet_length
Returns the number of bytes in a string.
SELECT octet_length('hello world') FROM source;
11
repeat
Repeats the string n times.
SELECT repeat('hello', 3) FROM source;
'hellohellohello'
replace
Updates the first argument by replacing all instances of the second argument with the third.
SELECT replace('hello world', 'hello', 'goodbye') FROM source;
'goodbye world'
reverse
Reverses the string.
SELECT reverse('dlrow elloh') FROM source;
'hello world'
right
Returns the last n characters in the string.
SELECT right('hello world', 5) FROM source;
'world'
rpad
Pads the string to the right with the given string until the string is n characters long.
SELECT rpad('hello', 10, 'x') FROM source;
'helloxxxxx'
rtrim
Removes the all characters in the second string from the first string. Defaults to space.
SELECT rtrim('hello world', 'dlo') FROM source;
'hello wor'
SELECT rtrim(' hello world ') FROM source;
' hello world'
split_part
Returns the nth part of the string split by the given delimiter.
SELECT split_part('hello world', ' ', 2) FROM source;
'world'
strpos
Returns the position of the first occurrence of the second argument in the first argument.
SELECT strpos('hello world', 'world') FROM source;
7
substr
Returns the substring of the first argument starting at the second argument with the given length.
SELECT substr('hello world', 7, 5) FROM source;
'world'
Hash Functions
Postgres-equivalent hash functions.
md5
Returns the MD5 hash of the input string as a hex string
SELECT md5('Hello, World') FROM source;
82bb413746aee42f89dea2b59614f9ef
sha224
Returns the SHA224 hash of the input string.
SELECT sha224('hello world') FROM source;
'\x2f05477fc24bb4faefd86517156dafdecec45b8ad3cf2522a563582b'
sha256
Returns the SHA256 hash of the input string.
SELECT sha256('hello world') FROM source;
'\xb94d27b9934d3e08a52e52d7da7dabfac484efe37a5380ee9088f7ace2efcde9'
sha384
Returns the SHA384 hash of the input string.
SELECT sha384('hello world') FROM source;
'\x59e1748777448c69de6b800d7a33bbfb9ff1b463e44354c3553bcdb9c666fa901b765c6b\
8cdcd1f6e5f39bfabec89b2d'
sha512
Returns the SHA512 hash of the input string.
SELECT sha512('hello world') FROM source;
'\x309ecc489c12d6eb4cc40f50c902f2b4d0ed77ee511a7c7a9bcd3ca86d4cd86f\
989dd35bc5ff499670da34255b45b0cfd830e81f605dcf7dc5542e93ae9cd76f'
Date/Time Functions
date_part
Returns the part of the date/time as a number.
SELECT date_part('year', '2019-01-01') FROM source;
2019
Supported date parts are:
year
, month
, week
, day
, hour
, minute
, second
, millisecond
, microsecond
, nanosecond
, dow
, doy
date_trunc
Truncates the date/time to the given precision.
SELECT date_trunc('day', '2019-01-01 12:34:56') FROM source;
'2019-01-01 00:00:00'
Supported precisions are:
year
, month
, week
, day
, hour
, minute
, second
to_timestamp_seconds
Converts the given string to a timestamp with seconds.
SELECT to_timestamp_seconds('2019-01-01 12:34:56') FROM source;
'2019-01-01 12:34:56'
to_timestamp_millis
Converts the given string to a timestamp with milliseconds.
SELECT to_timestamp_millis('2019-01-01 12:34:56.789') FROM source;
'2019-01-01 12:34:56.789'
to_timestamp_micros
Converts the given string to a timestamp with microseconds.
SELECT to_timestamp_micros('2019-01-01 12:34:56.789012') FROM source;
'2019-01-01 12:34:56.789012'
to_timestamp
Converts the given string to a timestamp with nanosecond precision
SELECT to_timestamp('2019-01-01 12:34:56') FROM source;
'2019-01-01T12:34:56.000000000Z'
from_unixtime
Converts an integer to RFC3339 timestamp format. Input is parsed as nanoseconds since the UNIX Epoch.
SELECT from_unixtime(1546303496000000000) FROM source;
'2018-12-31T16:44:56.000000000Z'
JSON Functions
The JSON functions provide basic json parsing functions using JsonPath an evolving standard for querying JSON objects.
extract_json
Returns the JSON elements in the first argument that match the JsonPath in the second argument. The returned value is an array of json strings.
SELECT extract_json('{"a": 1, "b": 2, "c": 3}', '$.a') FROM source;
['1']
extract_json_string
Returns an unescaped String for the first item matching the JsonPath, if it is a string.
SELECT extract_json_string('{"a": "a", "b": 2, "c": 3}', '$.a') FROM source;
'a'
Regex Functions
Arroyo supports several regex functions. Regex functions are implemented using the regex crate and support the syntax documented there.
For more complex regex operations, consider using the regex library directly via a Rust UDF.
regexp_match
Returns capture groups (parenthesized expressions) of the first match of a regular expression on the string as an array. If no capture groups are present, the entire match is returned.
SELECT regexp_match('https://www.example.com/page/path', '(https?)://(^[/]+)/(.*)') FROM source;
['https', 'www.example.com', 'page/path']
regexp_replace
Replaces all matches of the regex in the string with the replacement string.
SELECT regexp_replace('this is a sentence', '\s+', '_') FROM source;
'this_is_a_sentence'