DDL Statements
In addition to its SELECT capabilities Arroyo supports several Data Description Language (DDL) capabilities.
CREATE TABLE
Section titled “CREATE TABLE”Arroyo’s CREATE TABLE statements come in three flavors: Create Table As, Memory Table and Connection Table.
CREATE TABLE AS
Section titled “CREATE TABLE AS”This command creates a table from the query included in it, e.g.
CREATE TABLE orders AS SELECT customer_id, order_id FROM orders;The schema is inferred from the query and other queries within the same context can select from the new table.
CREATE TABLE (In-Memory)
Section titled “CREATE TABLE (In-Memory)”CREATE TABLE statements without any connection info are presumed to be in
memory. It can be written to within the same query context and then read from. A
memory table may be used in only one INSERT INTO statement, but can then be
selected from multiple times. For example, you could create an orders table
with a statement like
CREATE TABLE orders (customer_id INT, order_id INT);CREATE VIEW is simply an alias for creating a memory table.
CREATE TABLE (Connection)
Section titled “CREATE TABLE (Connection)”Connection tables allow Arroyo to read and write to external systems like Kafka
clusters. Connection tables may be used as sources or sinks depending on
the type of connection. For details on all of the supported connectors, see the
Connectors docs. Connection tables can be created via the
Connections tabs of the Web UI, or directly in SQL via the CREATE TABLE
statement.
Connection tables are created via special CREATE TABLE statements that include
a WITH clause. The WITH clause specifies the connector, the
format that the data is encoded with, and various other
options that are specific to the connector, as documented on the individual
connector pages. The general form of the statement is:
CREATE TABLE <table name> ( [<field name> <field type>,]* [WATERMARK FOR <field_name> [AS <expression>]]) WITH ( connector = '<connector name>', format = '<format name>', [format options] [connector options])where connector is one of the connectors documented here and
format is one of of the supported formats.
For example, to create a Kafka source for the topic order_topic:
CREATE TABLE orders ( customer_id INT, order_id INT, date_string TEXT) WITH ( connector = 'kafka', format = 'json', type = 'source', bootstrap_servers = 'localhost:9092', topic = 'order_topic');For full details on how to create connection tables, see the connector docs.
Schema inference
Section titled “Schema inference”When creating a connection table, you can specify the schema explicitly by listing fields in the CREATE TABLE statement, or you can let Arroyo infer the schema from how it’s used. This is mostly relevant for sinks, where the schema can be inferred from the query that writes to the table.
For example, a Kafka sink could be created like this
CREATE TABLE results WITH ( connector = 'kafka', format = 'json', type = 'sink', bootstrap_servers = 'localhost:9092', topic = 'results');and written to like this
INSERT INTO resultsSELECT customer_id as customer_id, count(*) as countFROM ordersGROUP BY customer_id, TUMBLE(INTERVAL '1' HOUR);That will result in records like this being written to the results topic
{"customer_id": 1, "count": 10}{"customer_id": 2, "count": 5}Schema-inferred sinks can also be created in the Web UI by selecting the “infer schema” option when creating the table.
Note that when relying on schema inference, the column names will be determined
by the query, so you will generally want to alias them using as to ensure they
are what you expect.
Options
Section titled “Options”Connection tables allow you to configure a number of options that specify and
modify the behavior. They are specified via the WITH clause, with an unquoted
key and a single or double-quoted value. The following options are supported
across all connections. Specific connections have their own options. To see all
of the supported options, refer to the Connector docs.
| Option | required | Description |
|---|---|---|
connector | yes | The name of the connector to use. |
format | if connector does not have a built-in format | The format of the data to be deserialized. |
idle_micros | no | The number of microseconds to wait before considering a source idle. Defaults to 30000000 (30 seconds). Set to -1 to disable source idleness. |
Virtual Fields
Section titled “Virtual Fields”Virtual fields can be created within the CREATE TABLE statement. These are
done using the GENERATED ALWAYS AS (expression) syntax. expression must be a
valid Arroyo SQL expression that only depends on non-virtual fields within the
table. For example
CREATE TABLE events ( id TEXT, event_type TEXT, user_id TEXT, key TEXT GENERATED ALWAYS AS (concat(id, '-', user_id))) WITH ( ...)INSERT INTO
Section titled “INSERT INTO”Arroyo supports INSERT INTO statements for both memory and connection tables. In
line with standard SQL the insertion will happen column-wise, attempting
coercion to the SQL types. For example, if you have a memory table orders with
columns customer_id and order_id you could insert into it with a statement
like
INSERT INTO orders SELECT customer, order FROM source_table;If the table is a connection table this will result in a sink, otherwise it will be a memory table that can then be read from.