Deploying to Nomad
Note: Nomad support is deprecated as of 0.6
If you’re interested in running Arroyo on Nomad, please reach out to us at support@arroyo.systems or on our Discord.
Arroyo supports Nomad as both a scheduler (for running Arroyo pipeline tasks) and as as a deploy target for the Arroyo control plane.
Before starting this guide, follow the common setup steps in the deployment overview guide.
This guide assumes a working Nomad cluster. It has been tested with Nomad >= 1.4, but should work with 1.3 as well. See the Nomad documentation for more information.
Note that all of the components of Arroyo (controller, compiler, and workers) need to be able to access S3. You will need to ensure that the Nomad cluster has access to the S3 bucket you will be using.
Install nomad pack
For ease of installation, we distribute a nomad pack that can be used to install Arroyo on Nomad. To use the pack, you will first need to install nomad-pack. Follow the documentation here.
Once nomad-pack
is available on your machine you are ready to proceed.
Add the Arroyo registry
The Arroyo pack is available in the Arroyo registry
To add the registry, run the following command:
$ nomad-pack registry add arroyo \
https://github.com/ArroyoSystems/arroyo-nomad-pack.git
Configuring the pack
There are a number of variables that can be configured to customize the Arroyo deployment:
Variable | Description |
---|---|
job_name | The name of Nomad job for the Arroyo cluster |
region | The region where jobs will be deployed |
datacenters | A list of datacenters in the region which are eligible for task placement |
prometheus_endpoint | Endpoint for prometheus with protocol, required for job metrics (for example http://prometheus.service:9090 ) |
prometheus_auth | Basic authentication for prometheus if required |
postgres_host | Host of your postgres database |
postgres_port | Port of your postgres database |
postgres_db | Name of your postgres database |
postgres_user | User of your postgres database |
postgres_password | Password of your postgres database |
s3_bucket | S3 bucket to store checkpoints and pipeline artifacts |
s3_region | Region for the s3 bucket |
nomad_api | Nomad API endpoint with protocol (for example http://nomad.service:4646 ) |
compiler_resources | Controls the CPU and memory to use for the compiler; at least 2 GB of memory is required |
controller_resources | The resources for the controller and API |
Of these, at least the postgres configuration and the s3 bucket configuration are required.
Deploying the Arroyo pack
Now we’re ready to actually deploy our Arroyo cluster! Here’s an example command line:
$ nomad-pack run arroyo --registry=arroyo \
--var arroyo.postgres_db=arroyo \
--var arroyo.postgres_host=postgres-host.cluster \
--var arroyo.postgres_user=arroyodb \
--var arroyo.postgres_password=arroyodb \
--var arroyo.datacenters='["us-east-1"]' \
--var arroyo.s3_bucket=arroyo-prod \
--var arroyo.prometheus_endpoint="http://prometheus.cluster:9090"
You will need to adjust the variables as appropriate for your environment.
Accesing the Arroyo API
Once the pack has been deployed, you can access the Arroyo UI by visiting the address of the api-http
service. By
default, this has a dynamic port.
To find the endpoint and port, run the following command:
$ nomad service info api-http
Visit the address in your browser to access the Arroyo UI.
Having trouble?
Reach out to us at support@arroyo.systems or on our Discord if you have any questions or issues.