How Arroyo works
source [0] and source [1] represent two subtasks of the source task (and similarly for the join).
Each subtask may be scheduled independently. If two communicating subtasks are scheduled on the same worker, data
flows between them through in-memory queues; otherwise it flows through Arroyo’s network stack, which forms
a logical MxN connection between the communicating subtasks.
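
The transport choice described above can be illustrated with a small sketch. This is not Arroyo's actual code: the `plan_links` function, the worker names, and the `"in-memory"` / `"network"` labels are hypothetical and exist only to show how each of the MxN upstream/downstream pairs might be assigned a transport based on where its two subtasks are placed.

```python
from itertools import product


def plan_links(upstream, downstream, placement):
    """Pick a transport for every upstream/downstream subtask pair.

    `placement` maps each subtask to the worker it was scheduled on.
    Pairs that share a worker use an in-memory queue; all others use
    the network. (Hypothetical helper, for illustration only.)
    """
    links = {}
    for src, dst in product(upstream, downstream):
        same_worker = placement[src] == placement[dst]
        links[(src, dst)] = "in-memory" if same_worker else "network"
    return links


# Assumed placement: two source subtasks and two join subtasks on two workers.
placement = {
    ("source", 0): "worker-a",
    ("source", 1): "worker-b",
    ("join", 0): "worker-a",
    ("join", 1): "worker-b",
}
sources = [("source", 0), ("source", 1)]
joins = [("join", 0), ("join", 1)]

links = plan_links(sources, joins, placement)
for (src, dst), transport in links.items():
    print(f"{src[0]} [{src[1]}] -> {dst[0]} [{dst[1]}]: {transport}")
```

With M = 2 source subtasks and N = 2 join subtasks, the sketch produces M x N = 4 logical links; under this assumed placement, the two pairs that share a worker use in-memory queues and the other two cross the network.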