Common Setup
Database
The Arroyo control plane relies on a database to store its configuration and metadata (like the set of existing tables, the pipelines that are meant to be running, etc.) and to power the API and Web UI. As of 0.11, two databases are supported: Sqlite and Postgres. Sqlite is recommended for local use and single-node deployments, while postgres should be used for scaled-out production deployments on Kubernetes. See the database configuration options.Storage
You will need a place to store pipeline artifacts (binaries) and checkpoint data. This needs to be accessible to all nodes in your cluster, including Arroyo the control plane and pipeline workers. Arroyo supports several storage backends, including S3, GCS, ABS, and local filesystem. For local testing, a filesystem that’s3 mounted on all nodes is sufficient, but for production you will likely want to use an object store like S3 or GCS. We also support S3-compatible object stores like MinIO and Localstack; endpoints can be set via thes3::
prefix
or the AWS_ENDPOINT_URL
environment variable.
The storage backend is configured by the following config properties:
checkpoint-url
(env var:ARROYO__CHECKPOINT_URL
) configures where checkpoints are written; for high-availability this should be an object store, but may be a local directory for testing and developingcompiler.artifact-url
(env var:ARROYO__COMPILER__ARTIFACT_URL
) controls where compiled UDF libs are stored
s3://my-bucket/key/path
s3::https://my-custom-s3:1234/my-bucket/key/path
https://s3.us-east-1.amazonaws.com/my-bucket
file:///my/local/filesystem
/my/local/filesystem
gs://my-gcs-bucket