String functions
Scalar functions for manipulating strings
Arroyo’s scalar function implementations are based on Apache DataFusion, and these docs are derived from the DataFusion function reference.
ascii
Returns the ASCII value of the first character in a string.
Arguments
- str: String expression to operate on. Can be a constant, column, or function, and any combination of string operators.
Related functions: chr
bit_length
Returns the bit length of a string.
Arguments
- str: String expression to operate on. Can be a constant, column, or function, and any combination of string operators.
Related functions: length, octet_length
btrim
Trims the specified trim string from the start and end of a string. If no trim string is provided, all whitespace is removed from the start and end of the input string.
Arguments
- str: String expression to operate on. Can be a constant, column, or function, and any combination of string operators.
- trim_str: String expression to trim from the beginning and end of the input string. Can be a constant, column, or function, and any combination of string operators. Default is whitespace characters.
Related functions: ltrim, rtrim
Aliases
- trim
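As a rough model of btrim and its relatives ltrim and rtrim, the Python sketch below assumes PostgreSQL-style semantics, in which trim_str is treated as a set of characters to strip rather than as an exact substring. That assumption is not stated in the text above, so it is worth checking against the engine before relying on it. The Python function names simply mirror the SQL names.

```python
from typing import Optional

def btrim(s: str, trim_str: Optional[str] = None) -> str:
    # None models the default case: strip whitespace from both ends.
    # Assumption: trim_str acts as a set of characters, not a substring.
    return s.strip(trim_str)

def ltrim(s: str, trim_str: Optional[str] = None) -> str:
    # Strip only from the start of the string.
    return s.lstrip(trim_str)

def rtrim(s: str, trim_str: Optional[str] = None) -> str:
    # Strip only from the end of the string.
    return s.rstrip(trim_str)

print(btrim("  hi  "))        # whitespace removed from both ends
print(btrim("__hi__", "_"))   # underscores removed from both ends
print(ltrim("__hi__", "_"))   # underscores removed from the start only
```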
char_length
Alias of length.
character_length
Alias of length.
concat
Concatenates multiple strings together.
Arguments
- str: String expression to concatenate. Can be a constant, column, or function, and any combination of string operators.
- str_n: Subsequent string column or literal string to concatenate.
Related functions: concat_ws
concat_ws
Concatenates multiple strings together with a specified separator.
Arguments
- separator: Separator to insert between concatenated strings.
- str: String expression to concatenate. Can be a constant, column, or function, and any combination of string operators.
- str_n: Subsequent string column or literal string to concatenate.
Related functions: concat
chr
Returns the character with the specified ASCII or Unicode code value.
Arguments
- expression: Expression containing the ASCII or Unicode code value to operate on. Can be a constant, column, or function, and any combination of arithmetic or string operators.
Related functions: ascii
ends_with
Tests if a string ends with a substring.
Arguments
- str: String expression to test. Can be a constant, column, or function, and any combination of string operators.
- substr: Substring to test for.
initcap
Capitalizes the first character in each word in the input string. Words are delimited by non-alphanumeric characters.
Arguments
- str: String expression to operate on. Can be a constant, column, or function, and any combination of string operators.
Related functions: lower, upper
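The word-splitting rule can be modeled with a short Python sketch. It makes two assumptions that the description above does not confirm: only ASCII letters and digits count as alphanumeric, and the remaining characters of each word are lowercased (as PostgreSQL's initcap does).

```python
import re

def initcap(s: str) -> str:
    # Each maximal run of alphanumeric characters is treated as one word.
    # Assumption: the rest of each word is lowercased.
    def capitalize_word(match: re.Match) -> str:
        word = match.group(0)
        return word[0].upper() + word[1:].lower()

    return re.sub(r"[A-Za-z0-9]+", capitalize_word, s)

print(initcap("apache dataFusion-rocks"))
```

Because the hyphen is non-alphanumeric, "dataFusion-rocks" is handled as two separate words in this model.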
instr
Alias of strpos.
Arguments
- str: String expression to operate on. Can be a constant, column, or function, and any combination of string operators.
- substr: Substring expression to search for. Can be a constant, column, or function, and any combination of string operators.
left
Returns a specified number of characters from the left side of a string.
Arguments
- str: String expression to operate on. Can be a constant, column, or function, and any combination of string operators.
- n: Number of characters to return.
Related functions: right
length
Returns the number of characters in a string.
Arguments
- str: String expression to operate on. Can be a constant, column, or function, and any combination of string operators.
Aliases
- char_length
- character_length
Related functions: bit_length, octet_length
lower
Converts a string to lower-case.
Arguments
- str: String expression to operate on. Can be a constant, column, or function, and any combination of string operators.
Related functions: initcap, upper
lpad
Pads the left side of a string with another string to a specified string length.
Arguments
- str: String expression to operate on. Can be a constant, column, or function, and any combination of string operators.
- n: String length to pad to.
- padding_str: String expression to pad with. Can be a constant, column, or function, and any combination of string operators. Default is a space.
Related functions: rpad
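A minimal Python model of left-padding follows. It assumes PostgreSQL-style behavior for two cases the description above leaves open: an input longer than n is truncated to n characters, and an empty padding string leaves the input unchanged. rpad would be the mirror image, appending the padding instead of prepending it.

```python
def lpad(s: str, n: int, padding_str: str = " ") -> str:
    if n <= len(s):
        # Assumption: longer inputs are cut down to n characters.
        return s[:max(n, 0)]
    if not padding_str:
        # Assumption: nothing to pad with, so return the input as is.
        return s
    missing = n - len(s)
    # Repeat the padding enough times, then cut it to the exact width needed.
    repeated = padding_str * (missing // len(padding_str) + 1)
    return repeated[:missing] + s

print(lpad("7", 3, "0"))
print(lpad("abc", 6, "xy"))
```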
ltrim
Trims the specified trim string from the beginning of a string. If no trim string is provided, all whitespace is removed from the start of the input string.
Arguments
- str: String expression to operate on. Can be a constant, column, or function, and any combination of string operators.
- trim_str: String expression to trim from the beginning of the input string. Can be a constant, column, or function, and any combination of string operators. Default is whitespace characters.
Related functions: btrim, rtrim
octet_length
Returns the length of a string in bytes.
Arguments
- str: String expression to operate on. Can be a constant, column, or function, and any combination of string operators.
Related functions: bit_length, length
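The difference between length, octet_length, and bit_length only shows up with non-ASCII text. The Python sketch below models the three measurements, assuming strings are encoded as UTF-8 (the text above does not state the encoding). The helper name string_sizes is hypothetical.

```python
def string_sizes(s: str) -> dict:
    """Model length, octet_length, and bit_length for a UTF-8 string."""
    encoded = s.encode("utf-8")  # assumption: UTF-8 encoding
    return {
        "length": len(s),                # number of characters
        "octet_length": len(encoded),    # number of bytes
        "bit_length": len(encoded) * 8,  # number of bits
    }

# "caf\u00e9" is "cafe" with an accented final letter, which takes 2 bytes in UTF-8.
print(string_sizes("caf\u00e9"))
```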
repeat
Returns a string with an input string repeated a specified number of times.
Arguments
- str: String expression to repeat. Can be a constant, column, or function, and any combination of string operators.
- n: Number of times to repeat the input string.
replace
Replaces all occurrences of a specified substring in a string with a new substring.
Arguments
- str: String expression to operate on. Can be a constant, column, or function, and any combination of string operators.
- substr: Substring expression to replace in the input string. Can be a constant, column, or function, and any combination of string operators.
- replacement: Replacement substring expression. Can be a constant, column, or function, and any combination of string operators.
reverse
Reverses the character order of a string.
Arguments
- str: String expression to operate on. Can be a constant, column, or function, and any combination of string operators.
right
Returns a specified number of characters from the right side of a string.
Arguments
- str: String expression to operate on. Can be a constant, column, or function, and any combination of string operators.
- n: Number of characters to return.
Related functions: left
rpad
Pads the right side of a string with another string to a specified string length.
Arguments
- str: String expression to operate on. Can be a constant, column, or function, and any combination of string operators.
- n: String length to pad to.
- padding_str: String expression to pad with. Can be a constant, column, or function, and any combination of string operators. Default is a space.
Related functions: lpad
rtrim
Trims the specified trim string from the end of a string. If no trim string is provided, all whitespace is removed from the end of the input string.
Arguments
- str: String expression to operate on. Can be a constant, column, or function, and any combination of string operators.
- trim_str: String expression to trim from the end of the input string. Can be a constant, column, or function, and any combination of string operators. Default is whitespace characters.
Related functions: btrim, ltrim
split_part
Splits a string based on a specified delimiter and returns the substring in the specified position.
Arguments
- str: String expression to split. Can be a constant, column, or function, and any combination of string operators.
- delimiter: String or character to split on.
- pos: Position of the part to return.
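The sketch below models split_part in Python. It assumes that pos is 1-based, that an out-of-range pos yields an empty string, and that an empty delimiter leaves the input as a single part; none of these details are spelled out above.

```python
def split_part(s: str, delimiter: str, pos: int) -> str:
    # Assumption: an empty delimiter means the input is a single part.
    parts = [s] if delimiter == "" else s.split(delimiter)
    # Assumption: positions are 1-based.
    if 1 <= pos <= len(parts):
        return parts[pos - 1]
    # Assumption: out-of-range positions give an empty string.
    return ""

print(split_part("a.b.c", ".", 2))
```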
starts_with
Tests if a string starts with a substring.
Arguments
- str: String expression to test. Can be a constant, column, or function, and any combination of string operators.
- substr: Substring to test for.
strpos
Returns the starting position of a specified substring in a string. Positions begin at 1. If the substring does not exist in the string, the function returns 0.
Arguments
- str: String expression to operate on. Can be a constant, column, or function, and any combination of string operators.
- substr: Substring expression to search for. Can be a constant, column, or function, and any combination of string operators.
Aliases
- instr
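The 1-based result and the 0 returned for a missing substring map directly onto Python's str.find, which is 0-based and returns -1 when nothing is found. This is only a model of the behavior described above, not the engine's implementation.

```python
def strpos(s: str, substr: str) -> int:
    # str.find gives a 0-based index, or -1 when substr is absent,
    # so adding 1 yields a 1-based position, or 0 for "not found".
    return s.find(substr) + 1

print(strpos("datafusion", "fus"))
print(strpos("datafusion", "xyz"))
```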
substr
Extracts a substring of a specified number of characters from a specific starting position in a string.
Arguments
- str: String expression to operate on. Can be a constant, column, or function, and any combination of string operators.
- start_pos: Character position to start the substring at. The first character in the string has a position of 1.
- length: Number of characters to extract. If not specified, returns the rest of the string after the start position.
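A short Python model of the 1-based start position and optional length follows. Its handling of a start position below 1 (the window is clipped to the start of the string, as in PostgreSQL) is an assumption, since the text above does not cover that case.

```python
from typing import Optional

def substr(s: str, start_pos: int, length: Optional[int] = None) -> str:
    begin = start_pos - 1  # convert the 1-based position to a 0-based index
    if length is None:
        # No length given: return the rest of the string.
        return s[max(begin, 0):]
    end = begin + length
    # Assumption: positions before the start of the string are clipped.
    return s[max(begin, 0):max(end, 0)]

print(substr("datafusion", 5, 3))
print(substr("datafusion", 5))
```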
translate
Translates characters in a string to specified translation characters.
Arguments
- str: String expression to operate on. Can be a constant, column, or function, and any combination of string operators.
- chars: Characters to translate.
- translation: Translation characters. Translation characters replace only characters at the same position in the chars string.
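The position-by-position mapping can be modeled with Python's str.translate. The sketch assumes that a character in chars with no counterpart in translation is removed from the result, and that the first mapping wins when chars contains duplicates; both follow PostgreSQL's behavior and are not confirmed by the text above.

```python
def translate(s: str, chars: str, translation: str) -> str:
    table = {}
    for i, ch in enumerate(chars):
        if ord(ch) in table:
            continue  # assumption: the first mapping for a character wins
        # Assumption: characters without a counterpart are deleted (None).
        table[ord(ch)] = translation[i] if i < len(translation) else None
    return s.translate(table)

print(translate("twice", "wic", "her"))
print(translate("12345", "143", "ax"))
```

In the second call, "3" has no matching translation character, so it is dropped in this model.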
to_hex
Converts an integer to a hexadecimal string.
Arguments
- int: Integer expression to convert. Can be a constant, column, or function, and any combination of arithmetic operators.
trim
Alias of btrim.
upper
Converts a string to upper-case.
Arguments
- str: String expression to operate on. Can be a constant, column, or function, and any combination of string operators.
Related functions: initcap, lower
uuid
Returns a UUID v4 string value that is unique per row.
overlay
Returns a copy of the string in which count characters, starting at the specified position, are replaced by another string.
For example, overlay('Txxxxas' placing 'hom' from 2 for 4) → Thomas
Arguments
- str: String expression to operate on.
- substr: The string that replaces part of str.
- pos: The position in str at which replacement starts.
- count: The number of characters in str to replace, counting from pos. If not specified, the length of substr is used.
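The replacement rule can be sketched in Python with slicing. The model assumes pos is 1-based and at least 1, which matches the 'Txxxxas' example above.

```python
from typing import Optional

def overlay(s: str, substr: str, pos: int, count: Optional[int] = None) -> str:
    if count is None:
        # Default from the argument description: use the length of substr.
        count = len(substr)
    start = pos - 1  # convert the 1-based position to a 0-based index
    # Keep everything before pos, insert substr, then skip count characters.
    return s[:start] + substr + s[start + count:]

print(overlay("Txxxxas", "hom", 2, 4))  # Thomas
print(overlay("Txxxxas", "hom", 2))     # Thomxas (count defaults to 3)
```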
levenshtein
Returns the Levenshtein distance between the two given strings.
For example, levenshtein('kitten', 'sitting') = 3
Arguments
- str1: String expression to compute Levenshtein distance with str2.
- str2: String expression to compute Levenshtein distance with str1.
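For reference, the Levenshtein distance is the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into the other. The sketch below is a standard dynamic-programming formulation in Python, offered as an illustration of the metric rather than as the engine's implementation.

```python
def levenshtein(str1: str, str2: str) -> int:
    # previous[j] holds the distance between the processed prefix of str1
    # and the first j characters of str2.
    previous = list(range(len(str2) + 1))
    for i, c1 in enumerate(str1, start=1):
        current = [i]  # distance from str1[:i] to the empty string
        for j, c2 in enumerate(str2, start=1):
            cost = 0 if c1 == c2 else 1
            current.append(min(
                previous[j] + 1,         # delete c1
                current[j - 1] + 1,      # insert c2
                previous[j - 1] + cost,  # substitute c1 with c2 (or match)
            ))
        previous = current
    return previous[-1]

print(levenshtein("kitten", "sitting"))  # 3
```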
substr_index
Returns the substring from str before count occurrences of the delimiter delim.
If count is positive, everything to the left of the final delimiter (counting from the left) is returned.
If count is negative, everything to the right of the final delimiter (counting from the right) is returned.
For example, substr_index('www.apache.org', '.', 1) = www and substr_index('www.apache.org', '.', -1) = org
Arguments
- str: String expression to operate on.
- delim: The delimiter string to search for in str.
- count: The number of times to search for the delimiter. Can be either positive or negative.
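The positive and negative count rules can be modeled in Python by splitting on the delimiter and rejoining a slice of the parts. Returning an empty string for a count of 0 or an empty delimiter is an assumption borrowed from MySQL's SUBSTRING_INDEX; the text above does not cover those cases.

```python
def substr_index(s: str, delim: str, count: int) -> str:
    if count == 0 or delim == "":
        return ""  # assumption: degenerate inputs give an empty string
    parts = s.split(delim)
    if count > 0:
        # Keep everything to the left of the count-th delimiter from the left.
        return delim.join(parts[:count])
    # Keep everything to the right of the |count|-th delimiter from the right.
    return delim.join(parts[count:])

print(substr_index("www.apache.org", ".", 1))   # www
print(substr_index("www.apache.org", ".", -1))  # org
print(substr_index("www.apache.org", ".", 2))   # www.apache
```

When the delimiter occurs fewer than count times, the slices cover every part, so this model returns the whole input string.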
find_in_set
Returns the position of str, a value in the range of 1 to N, within the string list strlist consisting of N substrings.
For example, find_in_set('b', 'a,b,c,d') = 2
Arguments
- str: String expression to find in strlist.
- strlist: The string list to search. A string list is a string composed of substrings separated by comma (,) characters.
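A small Python model of the lookup follows. Returning 0 when str is not in the list is an assumption based on MySQL's FIND_IN_SET; the description above only covers the case where a match exists.

```python
def find_in_set(s: str, strlist: str) -> int:
    items = strlist.split(",")
    try:
        # list.index is 0-based, so add 1 for a 1-based position.
        return items.index(s) + 1
    except ValueError:
        return 0  # assumption: 0 signals "not found"

print(find_in_set("b", "a,b,c,d"))  # 2
```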