Use user-defined functions to parse custom formats
`select *` query.

The `host.docker.internal` hostname is a special hostname that Docker for Mac sets up to allow containers to access services running on the host machine. If you’re running Arroyo directly on your machine or in Docker on Linux, you should use `localhost` instead.

If you see `Error while reading from EventSource`, that means Arroyo was not able to connect to the SSE server. Make sure the server is running and that you have the correct hostname and port.

The `dependencies` comment at the top is used to specify the Rust crates (libraries) that are used by the UDF, specified in the same format as a `Cargo.toml` file.

The UDF returns `Option<String>`, which allows us to filter out invalid records that fail parsing. If you instead wanted to fail processing on invalid records, you could return a `String` and use `unwrap()` instead of `?` to handle errors.

The UDF is named `parse_log`. Once you edit the name, the UI will automatically update to reflect the new name.

`OnceLock` is used to ensure that the regex is only compiled once, and then reused on subsequent calls. This is very important for performance, as the cost of compiling the regex is much higher than the cost of running it. This one optimization can improve performance by 10x or more.

We use `unwrap()`, which is the panicking form of error handling in Rust, in various places in the UDF. However, in each case we know that the operation cannot fail. For example, we know that if the regex successfully matched, then the capture group must exist, so we can safely call `unwrap()` on it. Similarly, the regex has already validated that the status and bytes fields are valid integers, so we can safely call `unwrap()` on the string-to-integer conversions.

The `json!` macro from the `serde_json` library is a very convenient way to construct JSON in Rust, allowing you to use a syntax that looks very similar to the output JSON.

The `extract_json_string` function is used to extract the timestamp from the JSON string.

`parse_log` call in the `event_time` column.
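
Two of the ideas in the notes above, caching expensive setup in a `OnceLock` and returning `Option<String>` so that invalid records can be filtered out, can be sketched using only the Rust standard library. This is a minimal illustration and not the tutorial's actual UDF: the `METHOD STATUS BYTES` line format, the `methods` helper, and the `INIT_COUNT` counter are assumptions made for the example, and the list of HTTP methods stands in for the compiled regex.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

// Hypothetical counter, present only to show that the setup runs once.
static INIT_COUNT: AtomicUsize = AtomicUsize::new(0);

// Stand-in for an expensive value such as a compiled regex.
// The closure passed to get_or_init runs on the first call only;
// later calls return the cached reference.
fn methods() -> &'static [&'static str] {
    static METHODS: OnceLock<Vec<&'static str>> = OnceLock::new();
    METHODS.get_or_init(|| {
        INIT_COUNT.fetch_add(1, Ordering::SeqCst);
        vec!["GET", "POST", "PUT", "DELETE", "HEAD"]
    })
}

// Parses a simplified "METHOD STATUS BYTES" line (an assumed format).
// Each `?` returns None early, so a malformed record yields None
// instead of panicking.
fn parse_log(line: &str) -> Option<String> {
    let mut parts = line.split_whitespace();

    let method = parts.next()?;
    if !methods().iter().any(|m| *m == method) {
        return None;
    }

    let status: u16 = parts.next()?.parse().ok()?;
    let bytes: u64 = parts.next()?.parse().ok()?;

    // All three fields are known to be JSON-safe here, so plain
    // formatting is enough; arbitrary strings would need escaping.
    Some(format!(
        "{{\"method\":\"{method}\",\"status\":{status},\"bytes\":{bytes}}}"
    ))
}
```

Calling `parse_log("GET 200 512")` yields `Some` with a small JSON object, while a line such as `"GET abc 512"` yields `None`. Swapping the `?` operators for `unwrap()` and returning `String` would instead make the function panic on the malformed line, which is the trade-off the note on `Option<String>` describes.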