Write UDFs in Python
Python UDFs are written as Python functions marked with the `@udf` decorator.
Python UDFs are available in Arroyo 0.12.0 and later. Currently only scalar UDFs
are supported in Python, although support for UDAFs and async UDFs is planned for
future releases.
Here’s an example of a simple UDF that squares an integer:
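A minimal sketch of such a UDF follows. It assumes the decorator is imported from a module named `arroyo_udf`; the fallback no-op decorator is a stand-in so the snippet also runs outside Arroyo, and the function name `my_square` is illustrative.

```python
try:
    # Assumption: inside Arroyo, the decorator is provided by the arroyo_udf module.
    from arroyo_udf import udf
except ImportError:
    # Stand-in for running outside Arroyo: returns the function unchanged.
    def udf(fn):
        return fn


@udf
def my_square(x: int) -> int:
    """Return the square of an integer."""
    return x * x
```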
A Python UDF is a function decorated with `@udf`. The function must be a valid Python function, and the parameters and
return type must be Python types that have a SQL mapping. For the full list of
supported types, see the SQL data types.
In order to determine the types of the UDF parameters and return value, Arroyo
expects Python type hints.
Note that because Python does not have as many numeric types as SQL,
multiple SQL types may map to the same Python type. For example, `INT` and
`BIGINT` both map to `int`, and `FLOAT` and `DOUBLE` both map to `float`.
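To illustrate what type hints give the engine to work with, the sketch below reads a function's hints with the standard-library `typing.get_type_hints` and looks up which SQL types map to each Python type. The `SQL_TYPES_FOR` table and the `describe_signature` helper are hypothetical and cover only the four SQL types named above; this is not taken from Arroyo's implementation.

```python
from typing import get_type_hints

# Python type -> SQL types that map to it (only the pairs named in the text).
SQL_TYPES_FOR = {
    int: ["INT", "BIGINT"],
    float: ["FLOAT", "DOUBLE"],
}


def describe_signature(fn):
    """Return (SQL candidates per parameter, SQL candidates for the return type)."""
    hints = get_type_hints(fn)
    ret = hints.pop("return")
    params = {name: SQL_TYPES_FOR[tp] for name, tp in hints.items()}
    return params, SQL_TYPES_FOR[ret]


def scale(x: int, factor: float) -> float:
    return x * factor


params, ret = describe_signature(scale)
print(params)  # {'x': ['INT', 'BIGINT'], 'factor': ['FLOAT', 'DOUBLE']}
print(ret)     # ['FLOAT', 'DOUBLE']
```

Because the mapping is many-to-one, the hint `int` alone cannot distinguish `INT` from `BIGINT`; the sketch simply lists both candidates.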
If a parameter is an `Optional` type (for example `Optional[int]`), then the UDF will
be called with all inputs, even if they are `NULL`. If the parameter is not an
`Optional` type (for example `int`), then the UDF will only be called with non-`NULL`
inputs.
Similarly, if the return type is an `Optional` type, then the output type is
nullable; otherwise it is non-nullable.
In table form:
Input | Parameter type | Return type | Called on | Output nullability |
---|---|---|---|---|
Nullable | T | T | Non-null values | Nullable |
Nullable | Optional[T] | T | All values | Non-null |
Nullable | T | Optional[T] | Non-null values | Nullable |
Nullable | Optional[T] | Optional[T] | All values | Nullable |
Non-null | T | T | All values | Non-null |
Non-null | Optional[T] | T | All values | Non-null |
Non-null | T | Optional[T] | All values | Nullable |
Non-null | Optional[T] | Optional[T] | All values | Nullable |
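The calling rules in the table can be emulated in plain Python. This is a sketch under stated assumptions: `call_like_engine` and `accepts_none` are hypothetical helpers, not Arroyo APIs, and the sketch assumes that a skipped call yields `NULL` (`None`) as the result.

```python
from typing import Optional, Union, get_args, get_origin, get_type_hints


def accepts_none(tp) -> bool:
    """True if the type hint is Optional[...] (a Union that includes None)."""
    return get_origin(tp) is Union and type(None) in get_args(tp)


def call_like_engine(fn, *args):
    """Hypothetical emulation: skip the call when a non-Optional parameter is NULL.

    Assumes every parameter of fn carries a type hint.
    """
    hints = get_type_hints(fn)
    hints.pop("return", None)
    for tp, value in zip(hints.values(), args):
        if value is None and not accepts_none(tp):
            return None  # NULL in, NULL out; fn is never invoked
    return fn(*args)


def plus_one(x: int) -> int:
    return x + 1


def or_zero(x: Optional[int]) -> int:
    return 0 if x is None else x


print(call_like_engine(plus_one, 41))    # 42
print(call_like_engine(plus_one, None))  # None
print(call_like_engine(or_zero, None))   # 0
```

`plus_one` never sees `None`, so its body needs no null check, while `or_zero` opts in to handling `NULL` itself and can therefore promise a non-null result.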
The `Optional` type is from the Python `typing` module and must be
imported before use, like this:
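A minimal sketch, again assuming the decorator comes from a module named `arroyo_udf` (with a no-op stand-in so the snippet runs outside Arroyo); the function `safe_reciprocal` is illustrative.

```python
from typing import Optional

try:
    # Assumption: inside Arroyo, the decorator is provided by the arroyo_udf module.
    from arroyo_udf import udf
except ImportError:
    # Stand-in for running outside Arroyo: returns the function unchanged.
    def udf(fn):
        return fn


@udf
def safe_reciprocal(x: Optional[float]) -> Optional[float]:
    """Return 1/x, or None (SQL NULL) when x is NULL or zero."""
    if x is None or x == 0:
        return None
    return 1.0 / x
```

Both the parameter and the return type are `Optional`, so this function is called on all inputs and its output is nullable.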