Regex functions

Arroyo’s Scalar function implementations are based on Apache DataFusion and these docs are derived from the DataFusion function reference. Arroyo uses a PCRE-like regular expression syntax (minus support for several features including look-around and backreferences).

`regexp_like`

Returns true if a regular expression has at least one match in a string, false otherwise.

regexp_like(str, regexp[, flags])

Arguments

str: String expression to operate on. Can be a constant, column, or function, and any combination of string operators.
regexp: Regular expression to test against the string expression. Can be a constant, column, or function.
flags: Optional regular expression flags that control the behavior of the regular expression. The following flags are supported:
- i: case-insensitive: letters match both upper and lower case
- m: multi-line mode: ^ and $ match begin/end of line
- s: allow . to match \n
- R: enables CRLF mode: when multi-line mode is enabled, \r\n is used
- U: swap the meaning of x* and x*?

Example

select regexp_like('Köln', '[a-zA-Z]ö[a-zA-Z]{2}');
+--------------------------------------------------------+
| regexp_like(Utf8("Köln"),Utf8("[a-zA-Z]ö[a-zA-Z]{2}")) |
+--------------------------------------------------------+
| true                                                   |
+--------------------------------------------------------+
SELECT regexp_like('aBc', '(b|d)', 'i');
+--------------------------------------------------+
| regexp_like(Utf8("aBc"),Utf8("(b|d)"),Utf8("i")) |
+--------------------------------------------------+
| true                                             |
+--------------------------------------------------+

Additional examples can be found here

`regexp_match`

Returns a list of regular expression matches in a string.

regexp_match(str, regexp[, flags])

Arguments

str: String expression to operate on. Can be a constant, column, or function, and any combination of string operators.
regexp: Regular expression to match against. Can be a constant, column, or function.
flags: Optional regular expression flags that control the behavior of the regular expression. The following flags are supported:
- i: case-insensitive: letters match both upper and lower case
- m: multi-line mode: ^ and $ match begin/end of line
- s: allow . to match \n
- R: enables CRLF mode: when multi-line mode is enabled, \r\n is used
- U: swap the meaning of x* and x*?

Example

select regexp_match('Köln', '[a-zA-Z]ö[a-zA-Z]{2}');
+---------------------------------------------------------+
| regexp_match(Utf8("Köln"),Utf8("[a-zA-Z]ö[a-zA-Z]{2}")) |
+---------------------------------------------------------+
| [Köln]                                                  |
+---------------------------------------------------------+
SELECT regexp_match('aBc', '(b|d)', 'i');
+---------------------------------------------------+
| regexp_match(Utf8("aBc"),Utf8("(b|d)"),Utf8("i")) |
+---------------------------------------------------+
| [B]                                               |
+---------------------------------------------------+

Additional examples can be found here

`regexp_replace`

Replaces substrings in a string that match a regular expression.

regexp_replace(str, regexp, replacement[, flags])

Arguments

str: String expression to operate on. Can be a constant, column, or function, and any combination of string operators.
regexp: Regular expression to match against. Can be a constant, column, or function.
replacement: Replacement string expression. Can be a constant, column, or function, and any combination of string operators.
flags: Optional regular expression flags that control the behavior of the regular expression. The following flags are supported:
- g: (global) Search globally and don’t return after the first match
- i: case-insensitive: letters match both upper and lower case
- m: multi-line mode: ^ and $ match begin/end of line
- s: allow . to match \n
- R: enables CRLF mode: when multi-line mode is enabled, \r\n is used
- U: swap the meaning of x* and x*?

Example

SELECT regexp_replace('foobarbaz', 'b(..)', 'X\\1Y', 'g');
+------------------------------------------------------------------------+
| regexp_replace(Utf8("foobarbaz"),Utf8("b(..)"),Utf8("X\1Y"),Utf8("g")) |
+------------------------------------------------------------------------+
| fooXarYXazY                                                            |
+------------------------------------------------------------------------+
SELECT regexp_replace('aBc', '(b|d)', 'Ab\\1a', 'i');
+-------------------------------------------------------------------+
| regexp_replace(Utf8("aBc"),Utf8("(b|d)"),Utf8("Ab\1a"),Utf8("i")) |
+-------------------------------------------------------------------+
| aAbBac                                                            |
+-------------------------------------------------------------------+

Additional examples can be found here

`position`

Returns the position of substr in origstr (counting from 1). If substr does not appear in origstr, return 0.

position(substr in origstr)

Arguments

substr: The pattern string.
origstr: The model string.

Home

Tutorial

Sources and Sinks

SQL Reference

User-Defined Functions

Deployment

Arroyo Development

`regexp_like`

`regexp_match`

`regexp_replace`

`position`

Home

Tutorial

Sources and Sinks

SQL Reference

User-Defined Functions

Deployment

Arroyo Development

​regexp_like

​regexp_match

​regexp_replace

​position

`regexp_like`

`regexp_match`

`regexp_replace`

`position`