Arroyo ships as a single, self-contained binary for Linux and macOS, or as a Docker container. It can be run in two
modes: as a persistent multi-job session cluster, or as a single-job pipeline cluster. When run locally or in Docker,
the default SQLite database is used to store configuration data.
The easiest way to try out Arroyo is to run it locally. Currently, Linux and macOS are well supported.
For macOS, we provide a Homebrew tap that can be used to install Arroyo:
brew install arroyosystems/tap/arroyo
For macOS and Linux, you can also install the binary with the following script:
curl -LsSf https://arroyo.dev/install.sh | sh
Alternatively, you can download the binary for your OS and architecture from the
releases page.
Once you've installed Arroyo, you can run it with the arroyo command:
$ arroyo --help
Usage: arroyo [OPTIONS] <COMMAND>

Commands:
  run         Run a query as a local pipeline cluster
  api         Starts an Arroyo API server
  controller  Starts an Arroyo Controller
  cluster     Starts a complete Arroyo cluster
  worker      Starts an Arroyo worker
  compiler    Starts an Arroyo compiler server
  node        Starts an Arroyo node server
  migrate     Runs database migrations on the configured Postgres database
  help        Print this message or the help of the given subcommand(s)

Options:
  -c, --config <CONFIG>          Path to an Arroyo config file, in TOML or YAML format
      --config-dir <CONFIG_DIR>  Directory in which to look for configuration files
  -h, --help                     Print help
  -V, --version                  Print version
A local cluster can be started with the arroyo cluster subcommand:
$ arroyo cluster
2024-07-01T22:58:29.316336Z INFO arroyo_server_common: Starting cluster admin server on 0.0.0.0:8119
2024-07-01T22:58:29.339237Z INFO arroyo_api: Starting API server on 0.0.0.0:5115
2024-07-01T22:58:29.342200Z INFO arroyo_controller: Using process scheduler
2024-07-01T22:58:29.348490Z INFO arroyo_controller: Starting arroyo-controller on 0.0.0.0:5116
2024-07-01T22:58:29.364186Z INFO arroyo_compiler_service: Starting compiler service at 0.0.0.0:5117
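When scripting against a freshly started cluster, it can help to wait until the API server is accepting connections. The following is a minimal sketch, not part of Arroyo itself: the `wait_for_api` helper is a hypothetical name, the port comes from the startup log shown here, and the sketch assumes the API server answers plain HTTP requests on that port.

```shell
#!/bin/sh
# Hypothetical helper: poll a URL until it accepts connections or attempts run out.
# Usage: wait_for_api [URL] [ATTEMPTS]
wait_for_api() {
    # Default port is the API server port from the startup log;
    # that it serves plain HTTP at the root path is an assumption.
    url="${1:-http://localhost:5115}"
    attempts="${2:-30}"
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # Without -f, curl exits 0 on any HTTP response, even an error status,
        # so success here only means that something is listening.
        if curl -s -o /dev/null --max-time 2 "$url"; then
            echo "API server is reachable at $url"
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    echo "API server did not respond at $url" >&2
    return 1
}
```

With the cluster running in another terminal, calling `wait_for_api` with no arguments polls the default address for up to roughly 30 seconds.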
In addition to the multi-tenant session cluster mode, Arroyo can also run a single pipeline
as a pipeline cluster, using the arroyo run subcommand:
$ arroyo run --help
Run a query as a local pipeline cluster

Usage: arroyo run [OPTIONS] [QUERY]

Arguments:
  [QUERY]  The query to run [default: -]

Options:
  -n, --name <NAME>                Name for this pipeline
  -s, --state-dir <STATE_DIR>      Directory or URL where checkpoints and metadata will be written and restored from
  -p, --parallelism <PARALLELISM>  Number of parallel subtasks to run [default: 1]
  -f, --force                      Force the pipeline to start even if the state file does not match the query
  -h, --help                       Print help
By default, arroyo run reads a SQL query from STDIN; alternatively, the query can be provided
as an argument.
See the pipeline cluster docs for more details.
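As an illustration of the STDIN form, the sketch below wraps `arroyo run` in a small function. The flags are the ones listed in the help output; the function name `run_pipeline`, the file name `query.sql`, the pipeline name, and the state directory are hypothetical placeholders, and you would supply your own SQL in the query file.

```shell
#!/bin/sh
# Hypothetical file holding the pipeline's SQL; supply your own query.
QUERY_FILE="${QUERY_FILE:-query.sql}"
# Hypothetical local directory for checkpoints and metadata.
STATE_DIR="${STATE_DIR:-./arroyo-state}"

run_pipeline() {
    # Fail early with a clear message if the binary is not installed.
    if ! command -v arroyo >/dev/null 2>&1; then
        echo "arroyo not found on PATH" >&2
        return 127
    fi
    if [ ! -f "$QUERY_FILE" ]; then
        echo "query file not found: $QUERY_FILE" >&2
        return 1
    fi
    # No QUERY argument is given, so the query is read from STDIN.
    arroyo run \
        --name demo-pipeline \
        --parallelism 2 \
        --state-dir "$STATE_DIR" \
        < "$QUERY_FILE"
}
```

Calling `run_pipeline` then starts a single-job pipeline cluster with two parallel subtasks and writes checkpoints under the chosen state directory.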