
Tutorials

These tutorials walk through building real pipelines with Arroyo, from your first query to integrating with external systems and writing custom logic.

If you’re just getting started, work through them in order. Otherwise, jump to whichever one matches what you’re trying to build.

Build a simple pipeline against a generated data source. This is the best place to start if you've never used Arroyo before: you'll learn how to create sources, run queries, and inspect results in the Web UI.
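As a preview of what that first tutorial covers, a pipeline over a generated source can be defined entirely in SQL. This is a sketch, not the tutorial's exact code: the `nexmark` connector is one of Arroyo's built-in demo sources, but the option names and query here are illustrative, so follow the tutorial for the precise definitions.

```sql
-- A generated demo source (connector options are illustrative)
CREATE TABLE nexmark WITH (
  connector = 'nexmark',
  event_rate = '10'
);

-- Count bid events in 10-second tumbling windows
SELECT count(*) AS bids
FROM nexmark
WHERE bid IS NOT NULL
GROUP BY tumble(interval '10 seconds');
```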

Connect Arroyo to a Kafka cluster, create sources from topics, and process real streaming data. Covers setting up a local broker, generating sample data, and defining Kafka-backed tables in SQL.
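A Kafka-backed table pairs a column schema with connector options in the `WITH` clause. The broker address, topic name, and columns below are placeholders for whatever your cluster provides; treat this as a sketch of the shape, not the tutorial's exact table.

```sql
-- A Kafka source table (broker, topic, and schema are illustrative)
CREATE TABLE orders (
  customer_id INT,
  amount DOUBLE
) WITH (
  connector = 'kafka',
  bootstrap_servers = 'localhost:9092',
  topic = 'orders',
  type = 'source',
  format = 'json'
);

-- A windowed aggregation over the stream
SELECT customer_id, sum(amount) AS total
FROM orders
GROUP BY customer_id, tumble(interval '1 minute');
```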

Use Debezium to stream changes out of an operational database (Postgres or MySQL) and process them with Arroyo. A practical introduction to change data capture (CDC) as a source for streaming analytics.
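With Debezium writing change events to Kafka, the CDC source looks much like a regular Kafka table but declares a Debezium format so Arroyo interprets inserts, updates, and deletes. The topic naming (`server.schema.table`) follows Debezium's convention; the server name and columns here are assumptions for illustration.

```sql
-- A CDC source reading Debezium change events from Kafka
-- (topic name and columns are illustrative)
CREATE TABLE customers (
  id INT PRIMARY KEY,
  name TEXT
) WITH (
  connector = 'kafka',
  bootstrap_servers = 'localhost:9092',
  topic = 'dbserver1.public.customers',
  type = 'source',
  format = 'debezium_json'
);
```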

Build a real-time trending-hashtags pipeline over the public Mastodon firehose. Demonstrates Server-Sent Event sources, JSON parsing, windowed aggregations, and unnest operations on a real-world data stream.
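The shape of that pipeline is roughly the following sketch: an SSE source delivering raw JSON, a JSON-path extraction of each post's hashtags, an `unnest` to get one row per tag, and a windowed count. The endpoint URL and the extraction expression are assumptions here; the tutorial walks through the real ones.

```sql
-- An SSE source over a Mastodon streaming endpoint
-- (URL and options are illustrative)
CREATE TABLE mastodon (
  value TEXT
) WITH (
  connector = 'sse',
  endpoint = 'https://mastodon.example/api/v1/streaming/public',
  events = 'update',
  format = 'raw_string'
);

-- One row per hashtag, counted per 5-minute window
SELECT tag, count(*) AS posts
FROM (
  SELECT unnest(extract_json(value, '$.tags[*].name')) AS tag
  FROM mastodon
)
GROUP BY tag, tumble(interval '5 minutes');
```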

Extend Arroyo with user-defined functions to parse custom data formats. A great follow-on once you’re comfortable with basic pipelines and want to handle data that isn’t JSON, Avro, or Parquet.
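Once registered, a UDF is called from SQL like any built-in function. In this sketch, `parse_temperature` and the `sensor_events` table are hypothetical names standing in for whatever custom format the UDF (written in Rust in the tutorial) decodes:

```sql
-- parse_temperature is a hypothetical user-defined function
-- that decodes a reading from a custom wire format
SELECT avg(parse_temperature(value)) AS avg_temp
FROM sensor_events
GROUP BY tumble(interval '30 seconds');
```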