SELECT Statements
Select statements are used to manipulate data in Arroyo. The general form of that statement is:
[WITH with_query [, ...]]SELECT select_expr [, ...]FROM from_item[JOIN join_item [, ...]][WHERE condition][GROUP BY grouping_element [, ...]][HAVING condition]WITH clause
Section titled “WITH clause”The with clauses allow you to give names to subquery which you can then reference. The syntax for a with clause is:
WITH query_name AS (subquery) [,...]For example, using the nexmark source, you can create datasets for bids and price and then join.
WITH bids AS (SELECT bid.auction AS auction, bid.price AS price FROM nexmark where bid is not null),auctions AS (SELECT auction.id AS id FROM nexmark where auction is not null)SELECT * FROM bids bids JOIN auctions auctions ON bids.auction = auctions.id;SELECT clause
Section titled “SELECT clause”The select cause is a comma-separated list of expressions, with an optional alias.
Column names must be unique.
SELECT select_expr [, ...]FROM clause
Section titled “FROM clause”The FROM clause specifies the primary source of data.
It will be either a table name or subquery.
The table name can be either a saved source,
a table created in the WITH clause or a table created via CREATE TABLE and inserted into.
Tables can be given aliases, but will default to their name as the alias for things like joins.
FROM from_itemJOIN clause
Section titled “JOIN clause”The JOIN clause allows you to join multiple tables together.
See the join documentation for more details.
WHERE clause
Section titled “WHERE clause”The WHERE clause allows you to filter the data with a boolean condition.
This predicate is applied to the incoming rows, so cannot include conditions on the resulting columns.
WHERE conditionGROUP BY clause
Section titled “GROUP BY clause”The GROUP BY clause is used to compute aggregates over some set of fields.
All GROUP BY queries will implicitly include a time window,
and if the input doesn’t already have a time window,
it should be specified as one of the grouping fields.
For example,
SELECT count(*) AS bids, count(distinct auction_id) AS distinct_auctions, tumble(interval '1 minute') AS windowFROM BIDS GROUP BY 3HAVING clause
Section titled “HAVING clause”The HAVING clause allows filtering on the result of aggregations (as opposed to WHERE, which filters on the inputs to the aggregation). For example:
SELECT user_id, count(*) as countFROM eventsWHERE event_type = 'pageview'GROUP BY user_idHAVING count > 5;This query counts pageview events by user, then returns those users with more than 5 pageviews.
UNNEST operator
Section titled “UNNEST operator”The UNNEST operator allows you to unnest arrays into multiple rows. This can be used
as a normal scalar function with the following restrictions:
- It may only appear in the
SELECTclause - Only one array may be unnested per select statement For example,
SELECT UNNEST(make_array(1, 2, 3)) as xFROM BIDS;which will produce the following output:
+---+| x |+---+| 1 || 2 || 3 |+---+