Tumbling Windows
Tumbling windows are consecutive, non-overlapping windows of a fixed size. Usually that size will be some human-friendly time unit like minutes or hours, but that isn’t a requirement. Whereas in normal SQL you might group by a date_trunc call, in streaming SQL you’d use a tumbling window.
In Arroyo, windowing is enabled via special UDFs, in this case TUMBLE().
For example, to get the number of distinct auction IDs across bids for each minute, you’d write a query that groups by a one-minute TUMBLE() window.
Records produced by a TUMBLE() window have a timestamp equal to the end of the window minus 1 nanosecond.
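The tumbling-window semantics described above can be modeled in a few lines of Python. This is a minimal sketch for illustration, not Arroyo’s implementation. It assumes events arrive as hypothetical (timestamp_ns, auction_id) pairs with integer nanosecond timestamps. Each event is assigned to the single window whose start is its timestamp rounded down to a multiple of the width, and each result is stamped with the window end minus 1 nanosecond.

```python
from collections import defaultdict

NANOS_PER_SECOND = 1_000_000_000


def tumbling_distinct_counts(events, width_ns):
    """Count distinct auction IDs per tumbling window (illustrative sketch).

    events: iterable of (timestamp_ns, auction_id) pairs (hypothetical shape).
    Returns a sorted list of (window_start, window_end, output_ts, count).
    """
    auctions_by_window = defaultdict(set)
    for ts, auction_id in events:
        # Each event falls into exactly one window: the one whose start is
        # the timestamp rounded down to a multiple of the width.
        start = ts - ts % width_ns
        auctions_by_window[start].add(auction_id)

    results = []
    for start in sorted(auctions_by_window):
        end = start + width_ns
        # Output records are stamped with the window end minus 1 nanosecond.
        results.append((start, end, end - 1, len(auctions_by_window[start])))
    return results
```

With a one-minute width, bids at 5 s, 30 s, and 59 s all land in the first window, and a bid at 61 s lands in the second. This is the same bucketing that a date_trunc grouping performs in batch SQL.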
Sliding Windows
Sliding windows are an extension of tumbling windows, with the addition of a “slide”. A sliding window is defined by two time durations: a width for each window, and a slide that designates the time between the starts of consecutive windows. Typically the slide is less than the width, so consecutive windows overlap and each record falls into more than one window. A tumbling window is the special case where the slide equals the width. A sliding window can be used to provide a view of data over some lookback time (the width), updated with some frequency (the slide). In Arroyo, the HOP() function is used to create sliding windows.
It takes two arguments: the first is the slide and the second is the window width.
For example, to get the number of distinct auction IDs across bids for the previous minute, updated every second, you’d write a query that groups by a HOP() window with a slide of one second and a width of one minute.
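The following minimal Python sketch of HOP-style window assignment shows why each record ends up in several sliding windows. It is illustrative only. The function names are hypothetical, timestamps are plain integers (for example, nanoseconds), and window starts are assumed to be aligned to multiples of the slide, which may not match Arroyo’s exact alignment. With a one-second slide and a one-minute width, every event falls into 60 windows.

```python
from collections import defaultdict


def hop_window_starts(ts, slide, width):
    """Return the start of every sliding window that contains timestamp ts.

    Assumes (for illustration) that window starts are multiples of the slide
    and that each window covers [start, start + width).
    """
    starts = []
    start = ts - ts % slide  # latest window starting at or before ts
    while start + width > ts:  # walk back while the window still covers ts
        starts.append(start)
        start -= slide
    return sorted(starts)


def hop_distinct_counts(events, slide, width):
    """Count distinct auction IDs per sliding window (hypothetical helper)."""
    auctions_by_window = defaultdict(set)
    for ts, auction_id in events:
        # Unlike a tumbling window, one event can contribute to many windows.
        for start in hop_window_starts(ts, slide, width):
            auctions_by_window[start].add(auction_id)
    return {start: len(ids) for start, ids in sorted(auctions_by_window.items())}
```

If the slide is larger than the width, an event that falls between two windows is assigned to none of them. This is one reason the slide is usually the smaller of the two durations.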
Session Windows
Session windows are variable-width windows that are defined by gaps in activity. For example, a session window with a gap size of 30 minutes, defined on a stream of user clicks, would aggregate all clicks that occur within 30 minutes of each other. Once there has been a 30-minute gap for a given user, the session window closes, and a new one is opened with that user’s next click. In Arroyo, the SESSION() function is used to create session windows.
For example, to get the number of distinct auction IDs across bids for each session, you’d write a query that groups by a SESSION() window with the desired gap size.
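Session windows can’t be computed by rounding timestamps, because their boundaries depend on the data itself. The Python sketch below groups events per key and starts a new session whenever the time since that key’s previous event reaches the gap size. It makes three assumptions: events are hypothetical (timestamp, key, value) tuples; the input is bounded and sorted in memory rather than streamed; and a session ends at its last event plus the gap. Arroyo’s exact boundary conventions may differ.

```python
from collections import defaultdict


def session_windows(events, gap):
    """Group events into per-key session windows (illustrative sketch).

    events: iterable of (timestamp, key, value) tuples (hypothetical shape).
    A session stays open while consecutive events for the same key are less
    than `gap` apart; a gap of `gap` or more closes it.
    Returns (key, session_start, session_end, distinct_value_count) tuples,
    where session_end is the last event's timestamp plus the gap (an
    assumed convention).
    """
    by_key = defaultdict(list)
    for ts, key, value in events:
        by_key[key].append((ts, value))

    sessions = []
    for key in sorted(by_key):
        points = sorted(by_key[key], key=lambda p: p[0])  # order by timestamp
        start = last = points[0][0]
        values = {points[0][1]}
        for ts, value in points[1:]:
            if ts - last >= gap:
                # Inactivity gap reached: close this session, open a new one.
                sessions.append((key, start, last + gap, len(values)))
                start, values = ts, set()
            values.add(value)
            last = ts
        sessions.append((key, start, last + gap, len(values)))
    return sessions
```

For a user whose clicks arrive at minutes 0, 10, 20, and 60 with a 30-minute gap, the first three clicks form one session. The click at minute 60 opens a second session, because 40 minutes passed with no activity.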