Regex functions
Scalar functions for regular expressions
Arroyo’s Scalar function implementations are based on Apache DataFusion and these docs are derived from the DataFusion function reference.
Arroyo uses a PCRE-like regular expression syntax (minus support for several features including look-around and backreferences).
regexp_like
Returns true if a regular expression has at least one match in a string, false otherwise.
Arguments
- str: String expression to operate on. Can be a constant, column, or function, and any combination of string operators.
- regexp: Regular expression to test against the string expression. Can be a constant, column, or function.
- flags: Optional regular expression flags that control the behavior of the
regular expression. The following flags are supported:
- i: case-insensitive: letters match both upper and lower case
- m: multi-line mode: ^ and $ match begin/end of line
- s: allow . to match \n
- R: enables CRLF mode: when multi-line mode is enabled, \r\n is used
- U: swap the meaning of x* and x*?
Example
Additional examples can be found here
regexp_match
Returns a list of regular expression matches in a string.
Arguments
- str: String expression to operate on. Can be a constant, column, or function, and any combination of string operators.
- regexp: Regular expression to match against. Can be a constant, column, or function.
- flags: Optional regular expression flags that control the behavior of the
regular expression. The following flags are supported:
- i: case-insensitive: letters match both upper and lower case
- m: multi-line mode: ^ and $ match begin/end of line
- s: allow . to match \n
- R: enables CRLF mode: when multi-line mode is enabled, \r\n is used
- U: swap the meaning of x* and x*?
Example
Additional examples can be found here
regexp_replace
Replaces substrings in a string that match a regular expression.
Arguments
- str: String expression to operate on. Can be a constant, column, or function, and any combination of string operators.
- regexp: Regular expression to match against. Can be a constant, column, or function.
- replacement: Replacement string expression. Can be a constant, column, or function, and any combination of string operators.
- flags: Optional regular expression flags that control the behavior of the
regular expression. The following flags are supported:
- g: (global) Search globally and don’t return after the first match
- i: case-insensitive: letters match both upper and lower case
- m: multi-line mode: ^ and $ match begin/end of line
- s: allow . to match \n
- R: enables CRLF mode: when multi-line mode is enabled, \r\n is used
- U: swap the meaning of x* and x*?
Example
Additional examples can be found here
position
Returns the position of substr
in origstr
(counting from 1). If substr
does
not appear in origstr
, return 0.
Arguments
- substr: The pattern string.
- origstr: The model string.