Connect Arroyo to external systems
Name | Source | Sink | Lookup |
---|---|---|---|
Blackhole | No | Yes | No |
Confluent Cloud | Yes | Yes | No |
Filesystem | Yes | Yes | No |
Fluvio | Yes | Yes | No |
Impulse | Yes | No | No |
Kafka | Yes | Yes | No |
Kinesis | Yes | Yes | No |
MySQL | Yes | Yes | No |
MQTT | Yes | Yes | No |
NATS | Yes | Yes | No |
Nexmark | Yes | No | No |
Polling HTTP | Yes | No | No |
Postgres | Yes | Yes | No |
RabbitMQ | Yes | No | No |
Redis | No | Yes | Yes |
Redpanda | Yes | Yes | No |
Server-Sent Events | Yes | No | No |
Webhook | No | Yes | No |
WebSocket | Yes | No | No |
CREATE TABLE
statements in SQL, as described
in the DDL docs.
A particular connection table is either a source (meaning it reads data from
an external system) or a sink (meaning it writes data). Some connectors support
only one of these, while others support both.
See the individual connector docs for details on on their capabilities and how
to configure them in Arroyo.
fieldname
is a TIMESTAMP field on the table, and watermark_expr
is a SQL expression
over the table’s fields that produces a timestamp which will generate the watermark. For example:
ts
field, and a watermark with a fixed
5-second delay (i.e., we’ll consider data late once it’s 5 seconds older than the most recently
received data).
connection_profile
option in the WITH
clause.
Alternately, SQL users can manually specify the various options via the WITH clause.
format
option
when creating a SQL table.
See the format docs for more details on how to configure each format.
none
, which means that the source will emit a
single record for each message it reads.
Framing is configured via the framing
option when creating a SQL table. Currently
only newline
is supported, which splits the input on newlines.
You may also set framing.newline.max_length
to a number of bytes to limit the
maximum length of a single record. Records that exceed this length will be truncated.
For example:
bad_data
option allows you to configure how Arroyo handles records that fail
to parse.
fail
: fail the job (default)drop
: drop the record and continue processingMETADATA FROM key
syntax:
metadata
function; in this example we use
the offset_id
and partition
fields exposed by the Kafka connector.
See the individual connector docs for supported metadata fields.
idle_micros
options, or disabled
by setting it to -1
.
A special case of idleness occurs when there are more Arroyo source tasks than partitions
(for example, a Kafka topic with 4 partitions read by 8 Arroyo tasks). This means that
some tasks will never receive data, and so will never advance their watermarks. This can
occur as well for non-partitioned sources like WebSocket, where only a single task is
able to read data.
To address this, source tasks with no work assigned to them will mark themselves as idle
immediately.
{{ }}
) to indicate a substitution.
For example, in the Web UI: