Getting started
Setting up Arroyo locally
Arroyo ships as a single, self-contained binary for Linux and MacoS or as a docker container. It can be run in two modes: as a persistent multi-job session cluster, or a single-job pipeline cluster. When run locally or in Docker, the default sqlite database is used to store configuration data.
Starting a cluster
Locally
The easiest way to try out Arroyo is to run it locally. Currently Linux and MacOS are well supported.
For MacOS, we provide a Homebrew tap that can be used to install Arroyo:
brew install arroyosystems/tap/arroyo
For MacOS and Linux, you can also install the binary with the following script:
curl -LsSf https://arroyo.dev/install.sh | sh
Alternatively, you can download the binary for your OS and architecture from the releases page.
Once you’ve installed Arroyo, you can run it with the arroyo
command:
$ arroyo --help
Usage: arroyo [OPTIONS] <COMMAND>
Commands:
run Run a query as a local pipeline cluster
api Starts an Arroyo API server
controller Starts an Arroyo Controller
cluster Starts a complete Arroyo cluster
worker Starts an Arroyo worker
compiler Starts an Arroyo compiler server
node Starts an Arroyo node server
migrate Runs database migrations on the configured Postgres database
help Print this message or the help of the given subcommand(s)
Options:
-c, --config <CONFIG> Path to an Arroyo config file, in TOML or YAML
format
--config-dir <CONFIG_DIR> Directory in which to look for configuration files
-h, --help Print help
-V, --version Print version
A local cluster can be started with
$ arroyo cluster
2024-07-01T22:58:29.316336Z INFO arroyo_server_common: Starting cluster admin server on 0.0.0.0:8119
2024-07-01T22:58:29.339237Z INFO arroyo_api: Starting API server on 0.0.0.0:5115
2024-07-01T22:58:29.342200Z INFO arroyo_controller: Using process scheduler
2024-07-01T22:58:29.348490Z INFO arroyo_controller: Starting arroyo-controller on 0.0.0.0:5116
2024-07-01T22:58:29.364186Z INFO arroyo_compiler_service: Starting compiler service at 0.0.0.0:5117
Then, open the Web UI in your browser at http://localhost:5115.
With Docker
Arroyo can also run in Docker. Note that by default, a docker cluster will not persist the set of pipelines and tables.
$ docker run -p 5115:5115 ghcr.io/arroyosystems/arroyo:latest
If you get an error like
Error starting userland proxy: listen tcp4 0.0.0.0:5115:
bind: address already in use.
Then you have another service running on that port. Either stop that service, or rebind to a different port with
-p 5215:5115
for example.
Then, open the Web UI at http://localhost:5115.
Running a single pipeline
In addition to the multi-tenant session cluster mode, Arroyo can also be configured to run a single pipeline
via the CLI as a pipeline cluster via the arroyo run
subcommand:
$ arroyo run --help
Run a query as a local pipeline cluster
Usage: arroyo run [OPTIONS] [QUERY]
Arguments:
[QUERY] The query to run [default: -]
Options:
-n, --name <NAME> Name for this pipeline
-s, --state-dir <STATE_DIR> Directory or URL where checkpoints and metadata
will be written and restored from
-p, --parallelism <PARALLELISM> Number of parallel subtasks to run [default: 1]
-f, --force Force the pipeline to start even if the state
file does not match the query
-h, --help Print help
By default, arroyo run
will read a SQL query from STDIN, or the query can be provided
as an argument.
See the pipeline cluster docs for more details.
Next steps
Now you’re ready to create your first pipeline! Continue on to the tutorial to learn how.