While the single-node Arroyo cluster is useful for testing and development, it is not suitable for production. This page describes how to run a production-ready distributed Arroyo cluster using Arroyo’s built-in scheduler or Kubernetes. Before attempting to run a cluster, you should familiarize yourself with the Arroyo architecture. We are also happy to support users rolling out their own clusters, so please reach out to us at [email protected] or on Discord with any questions.

Common Setup

Database

The Arroyo control plane relies on a database to store its configuration and metadata (like the set of existing tables, the pipelines that are meant to be running, etc.) and to power the API and Web UI. As of 0.11, two databases are supported: SQLite and Postgres. SQLite is recommended for local use and single-node deployments, while Postgres should be used for scaled-out production deployments on Kubernetes. See the database configuration options.
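For example, a Kubernetes deployment backed by Postgres might be configured as in the following sketch (the host, user, and password values here are placeholders, and the key names should be checked against the database configuration options; each key can also be set via the corresponding ARROYO__DATABASE__* environment variable):
[database]
type = "postgres"

[database.postgres]
host = "postgres.example.com"
port = 5432
database-name = "arroyo"
user = "arroyo"
password = "arroyo"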

Storage

You will need a place to store pipeline artifacts (binaries) and checkpoint data. This storage must be accessible to all nodes in your cluster, including the Arroyo control plane and the pipeline workers. Arroyo supports several storage backends, including S3 and S3-compatible object stores, R2, GCS, ABS, and the local filesystem. For local testing, a filesystem that's mounted on all nodes is sufficient, but for production you will likely want to use an object store like S3 or GCS. We also support S3-compatible object stores like MinIO and Localstack; custom endpoints can be set via the s3:: prefix or the AWS_ENDPOINT_URL environment variable. The storage backend is configured by the following config properties:
  • checkpoint-url (env var: ARROYO__CHECKPOINT_URL) configures where checkpoints are written; for high availability this should be an object store, but it may be a local directory for testing and development
  • compiler.artifact-url (env var: ARROYO__COMPILER__ARTIFACT_URL) controls where compiled UDF libraries are stored
The values for these properties are URLs that specify the storage location, and may be written in several forms, for example (a complete config example follows this list):
  • s3://my-bucket/key/path
  • s3::https://my-custom-s3:1234/my-bucket/key/path
  • r2://accountid@my-bucket/my-path
  • https://s3.us-east-1.amazonaws.com/my-bucket
  • file:///my/local/filesystem
  • /my/local/filesystem
  • gs://my-gcs-bucket
  • abfs://my-container@my-account.dfs.core.windows.net/path
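Putting this together, a production cluster that keeps both checkpoints and compiled artifacts in S3 might use a config like the following sketch (the bucket name and key paths are placeholders; equivalently, set the ARROYO__CHECKPOINT_URL and ARROYO__COMPILER__ARTIFACT_URL environment variables shown above):
# Checkpoints are written under this URL
checkpoint-url = "s3://my-bucket/checkpoints"

# Compiled UDF libraries are stored under this URL
[compiler]
artifact-url = "s3://my-bucket/artifacts"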

Reverse Proxy Configuration

If you’re running the Arroyo Web UI behind a reverse proxy on a non-root path (e.g., https://myapp.com/arroyo/ instead of https://arroyo.myapp.com/), you’ll need to configure your reverse proxy to pass the base path to Arroyo. The Web UI determines its base path from the X-Arroyo-Basename header. Your reverse proxy should set this header to the base path where Arroyo is mounted.
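With nginx, for instance, this might look like the following sketch (the upstream address, port, and /arroyo mount path are illustrative assumptions; adjust them to where your Arroyo API is actually listening and mounted):
location /arroyo/ {
    # Forward requests to the Arroyo API/Web UI (address and port are placeholders)
    proxy_pass http://localhost:5115/;
    # Tell the Web UI which base path it is mounted under
    proxy_set_header X-Arroyo-Basename /arroyo;
}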

IPv6 Support

Arroyo supports IPv6 networking for all service-to-service communication. Workers will automatically detect and use IPv6 addresses when the configured bind-address is an IPv6 address. To enable IPv6, set the bind address to an IPv6 address in your configuration:
# '::0' is the IPv6 unspecified (wildcard) address
[controller]
bind-address = '::0'

[api]
bind-address = '::0'

[admin]
bind-address = '::0'

[compiler]
bind-address = '::0'
Arroyo will automatically handle IPv6 addresses in worker-to-worker communication and service discovery.