@udf decorator.
Python UDFs are available in Arroyo 0.12.0 and later. Currently only scalar UDFs
are supported in Python, although support for UDAFs and async UDFs is planned for
future releases.
Here’s an example of a simple UDF that squares an integer:
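A minimal sketch of such a UDF follows. It assumes the decorator is importable from a module named arroyo_udf. The fallback decorator is a hypothetical stand-in, not part of Arroyo, so that the snippet also runs in a plain Python interpreter.

```python
try:
    from arroyo_udf import udf  # assumed import path for the decorator
except ImportError:
    def udf(func):
        """Hypothetical no-op stand-in so the sketch runs outside Arroyo."""
        return func


@udf
def my_square(x: int) -> int:
    # The type hints on the parameter and return value define the UDF signature.
    return x * x
```

The function name (my_square here, an illustrative choice) is the name that would be used to call the UDF from SQL.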
UDF signature
The signature of the UDF (the function name, parameters, and return type) is determined automatically from the definition of the function annotated with @udf. The function must be a valid Python function, and the parameters and return type must be Python types that have a SQL mapping. For the full list of supported types, see the SQL data types documentation.
In order to determine the types of the UDF parameters and return value, Arroyo
expects Python type hints.
Note that because Python does not have as many numeric types as SQL, multiple SQL types may map to the same Python type. For example, INT and BIGINT both map to int, and FLOAT and DOUBLE both map to float.
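To make the many-to-one mapping concrete, the sketch below reads a function's type hints with the standard typing.get_type_hints and lists which of the four SQL types named above are compatible with each hint. The SQL_TO_PYTHON table and the compatible_sql_types helper are illustrative names, not part of Arroyo, and the table covers only those four types.

```python
from typing import get_type_hints

# SQL-to-Python mapping for the four numeric types named above (not the full list).
SQL_TO_PYTHON = {"INT": int, "BIGINT": int, "FLOAT": float, "DOUBLE": float}


def compatible_sql_types(func):
    """Map each hinted name (parameters plus 'return') to the SQL types that share its Python type."""
    hints = get_type_hints(func)
    return {
        name: sorted(sql for sql, py in SQL_TO_PYTHON.items() if py is hint)
        for name, hint in hints.items()
    }


def scale(count: int, factor: float) -> float:
    return count * factor


print(compatible_sql_types(scale))
# {'count': ['BIGINT', 'INT'], 'factor': ['DOUBLE', 'FLOAT'], 'return': ['DOUBLE', 'FLOAT']}
```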
Nullability
SQL values are generally allowed to be null. How null values interact with your UDF is controlled by the type hints on its parameters and return value. If a parameter is an Optional type (for example Optional[int]), then the UDF will be called with all inputs, even if they are NULL. If the parameter is not an Optional type (for example int), then the UDF will only be called with non-NULL inputs.
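Similarly, if the return type is an Optional type, then the output type is nullable. Otherwise it is non-nullable, with one exception: when a nullable input is passed to a non-Optional parameter, the UDF is skipped for NULL inputs, so the output is still nullable.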
In table form:
Input | Parameter type | Return type | Called on | Output nullability |
---|---|---|---|---|
Nullable | T | T | Non-null values | Nullable |
Nullable | Optional[T] | T | All values | Non-null |
Nullable | T | Optional[T] | Non-null values | Nullable |
Nullable | Optional[T] | Optional[T] | All values | Nullable |
Non-null | T | T | All values | Non-null |
Non-null | Optional[T] | T | All values | Non-null |
Non-null | T | Optional[T] | All values | Nullable |
Non-null | Optional[T] | Optional[T] | All values | Nullable |
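The "Called on" column can be modeled in plain Python. The sketch below is a hypothetical illustration of the rule, not Arroyo's implementation: call_with_sql_nulls and accepts_null are made-up helper names, and SQL NULL is represented as None.

```python
from typing import Optional, get_args, get_type_hints


def accepts_null(hint) -> bool:
    """True if the hint's type arguments include NoneType, as they do for Optional[T]."""
    return type(None) in get_args(hint)


def call_with_sql_nulls(func, *args):
    """Skip the call and yield NULL (None) when a NULL reaches a non-Optional parameter."""
    hints = get_type_hints(func)
    hints.pop("return", None)  # only parameter hints matter for this decision
    for hint, value in zip(hints.values(), args):
        if value is None and not accepts_null(hint):
            return None  # func is never invoked for this row
    return func(*args)


def double(x: int) -> int:
    return 2 * x


def or_zero(x: Optional[int]) -> int:
    return 0 if x is None else x


print(call_with_sql_nulls(double, 4))      # 8
print(call_with_sql_nulls(double, None))   # None (double is not called)
print(call_with_sql_nulls(or_zero, None))  # 0 (or_zero sees the NULL)
```

This matches the first two rows of the table: with a plain T parameter a nullable input can still produce NULL, while an Optional[T] parameter with a T return type always produces a value.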
The Optional type is from the Python typing module and must be imported before use, like this:
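A minimal sketch of the import together with a UDF that uses it follows. It assumes the decorator is importable from a module named arroyo_udf. The fallback decorator and the safe_divide function are hypothetical, included only so the snippet runs on its own.

```python
from typing import Optional

try:
    from arroyo_udf import udf  # assumed import path for the decorator
except ImportError:
    def udf(func):
        """Hypothetical no-op stand-in so the sketch runs outside Arroyo."""
        return func


@udf
def safe_divide(a: float, b: Optional[float]) -> Optional[float]:
    # b is Optional, so a NULL divisor arrives as None instead of skipping the call.
    # The Optional return type lets the function emit NULL for undefined results.
    if b is None or b == 0.0:
        return None
    return a / b
```

Because a is hinted as a plain float, the rules above imply the function is not called for rows where a is NULL.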