Overview
While the single-node Arroyo cluster is useful for testing and development, it is not suitable for production. This page describes how to run a production-ready distributed Arroyo cluster using Arroyo’s built-in scheduler or Kubernetes.
Before attempting to run a cluster, you should familiarize yourself with the Arroyo architecture. We are also happy to support users rolling out their own clusters, so please reach out to us at support@arroyo.systems or on Discord with any questions.
Common Setup
Database
The Arroyo control plane relies on a database to store its configuration and metadata (like the set of existing tables, the pipelines that are meant to be running, etc.) and to power the API and Web UI. As of 0.11, two databases are supported: SQLite and Postgres.
SQLite is recommended for local use and single-node deployments, while Postgres should be used for scaled-out production deployments on Kubernetes.
See the database configuration options.
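As an illustration, a Postgres setup might look like the following in your Arroyo config file. The exact key names and connection values below are assumptions for the sketch; consult the database configuration options for the authoritative schema.

```toml
# Select Postgres as the control-plane database (SQLite is the default).
[database]
type = "postgres"

# Connection settings for the Postgres instance; adjust for your deployment.
# Host, credentials, and database name here are placeholders.
[database.postgres]
host = "postgres.example.internal"
port = 5432
database-name = "arroyo"
user = "arroyo"
password = "arroyo"
```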
Storage
You will need a place to store pipeline artifacts (binaries) and checkpoint
data. This storage needs to be accessible to all nodes in your cluster,
including the Arroyo control plane and pipeline workers. Arroyo supports
several storage backends, including S3 and S3-compatible object stores, R2,
GCS, ABS, and the local filesystem. For local testing, a filesystem that's
mounted on all nodes is sufficient, but for production you will likely want to
use an object store like S3 or GCS. We also support S3-compatible object
stores like MinIO and LocalStack; custom endpoints can be set via the s3::
URL prefix or the AWS_ENDPOINT_URL environment variable.
The storage backend is configured by the following config properties:
- checkpoint-url (env var: ARROYO__CHECKPOINT_URL) configures where checkpoints are written; for high availability this should be an object store, but it may be a local directory for testing and development
- compiler.artifact-url (env var: ARROYO__COMPILER__ARTIFACT_URL) controls where compiled UDF libraries are stored
The values for these variables are URLs that specify the storage location. We support a number of ways of specifying these, for example:
- s3://my-bucket/key/path
- s3::https://my-custom-s3:1234/my-bucket/key/path
- r2://accountid@my-bucket/my-path
- https://s3.us-east-1.amazonaws.com/my-bucket
- file:///my/local/filesystem
- /my/local/filesystem
- gs://my-gcs-bucket
- abfs://container@account.dfs.core.windows.net/path
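For example, when deploying with environment variables, the two storage settings from above might be configured like this (the bucket names and the MinIO endpoint are placeholders):

```shell
# Checkpoints go to an object store so all nodes can reach them
export ARROYO__CHECKPOINT_URL="s3://my-bucket/checkpoints"

# Compiled UDF libraries are stored alongside them
export ARROYO__COMPILER__ARTIFACT_URL="s3://my-bucket/artifacts"

# Optional: point at an S3-compatible endpoint such as MinIO (placeholder URL)
export AWS_ENDPOINT_URL="http://minio.example.internal:9000"
```

The same values can equally be set in the config file via checkpoint-url and compiler.artifact-url.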
Reverse Proxy Configuration
If you’re running the Arroyo Web UI behind a reverse proxy on a non-root path (e.g., https://myapp.com/arroyo/
instead of https://arroyo.myapp.com/), you’ll need to configure your reverse proxy to pass the base path
to Arroyo.
The Web UI determines its base path from the X-Arroyo-Basename header. Your reverse proxy should set this
header to the base path where Arroyo is mounted.
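As a sketch, an nginx server block mounting Arroyo under /arroyo/ might look like the following. The upstream address and port are assumptions for illustration; substitute the host and port where your Arroyo API actually listens.

```nginx
location /arroyo/ {
    # Tell the Web UI which base path it is mounted at
    proxy_set_header X-Arroyo-Basename /arroyo;

    # Forward to the Arroyo API/Web UI server (address is a placeholder)
    proxy_pass http://127.0.0.1:5115/;
}
```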
IPv6 Support
Arroyo supports IPv6 networking for all service-to-service communication. Workers will automatically
detect and use IPv6 addresses when the configured bind-address is an IPv6 address.
To enable IPv6, set the bind address to an IPv6 address in your configuration:
```toml
[controller]
bind-address = '::0'

[api]
bind-address = '::0'

[admin]
bind-address = '::0'

[compiler]
bind-address = '::0'
```

Arroyo will automatically handle IPv6 addresses in worker-to-worker communication and service discovery.