Introduction

Arroyo is a distributed stream processing engine written in Rust, designed to efficiently perform stateful computations on streams of data. Unlike traditional batch processors, streaming engines can operate on both bounded and unbounded sources, emitting results as soon as they are available.

In short: Arroyo lets you ask complex questions of high-volume real-time data with sub-second results.

Arroyo can be self-hosted, or used via the Arroyo Cloud service managed by Arroyo Systems.

Features

Pipelines defined in SQL, with support for complex analytical queries
Scales up to millions of events per second
Stateful operations like windows and joins
State checkpointing for fault-tolerance and pipeline recovery
Event time processing with watermark support
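Windows and event-time processing are easiest to understand in a small model. The following is a minimal sketch in Rust, the language Arroyo is written in, of an event-time tumbling window driven by a watermark. It is not Arroyo's implementation or API; in Arroyo you would express this kind of pipeline in SQL. The `Event` and `TumblingWindow` types and the watermark strategy (largest timestamp seen minus a fixed out-of-orderness bound) are hypothetical and chosen only for illustration. The map of open windows is the kind of operator state that checkpointing would need to persist.

```rust
use std::collections::BTreeMap;

/// Hypothetical input record: an event-time timestamp (ms) and a value to sum.
#[derive(Debug, Clone, Copy)]
struct Event {
    timestamp_ms: u64,
    value: u64,
}

/// Sums values per fixed-size (tumbling) event-time window and emits a window
/// only once the watermark has passed its end. Illustrative, not Arroyo's API.
struct TumblingWindow {
    width_ms: u64,
    max_out_of_orderness_ms: u64,
    watermark_ms: u64,
    /// Operator state: window start -> running sum for windows not yet emitted.
    open: BTreeMap<u64, u64>,
}

impl TumblingWindow {
    fn new(width_ms: u64, max_out_of_orderness_ms: u64) -> Self {
        Self { width_ms, max_out_of_orderness_ms, watermark_ms: 0, open: BTreeMap::new() }
    }

    /// Adds one event and returns the (window_start, sum) pairs it caused to close.
    fn process(&mut self, event: Event) -> Vec<(u64, u64)> {
        let start = event.timestamp_ms - event.timestamp_ms % self.width_ms;
        // Late data: the watermark already passed this window's end, so drop the event.
        if start + self.width_ms <= self.watermark_ms {
            return Vec::new();
        }
        *self.open.entry(start).or_insert(0) += event.value;

        // Assumed watermark strategy: max event time seen minus a fixed bound.
        let candidate = event.timestamp_ms.saturating_sub(self.max_out_of_orderness_ms);
        self.watermark_ms = self.watermark_ms.max(candidate);
        let watermark = self.watermark_ms;
        self.emit_until(watermark)
    }

    /// For a bounded source: once input ends, every remaining window can close.
    fn flush(&mut self) -> Vec<(u64, u64)> {
        self.emit_until(u64::MAX)
    }

    /// Removes and returns, in start order, every window whose end is <= `watermark`.
    fn emit_until(&mut self, watermark: u64) -> Vec<(u64, u64)> {
        let mut closed = Vec::new();
        while let Some((&start, &sum)) = self.open.first_key_value() {
            if start + self.width_ms > watermark {
                break;
            }
            self.open.remove(&start);
            closed.push((start, sum));
        }
        closed
    }
}

fn main() {
    // 10-second windows; events may arrive up to 2 seconds out of order.
    let mut windows = TumblingWindow::new(10_000, 2_000);
    let input = [(1_000, 1), (4_000, 2), (11_000, 3), (9_000, 4), (13_000, 5), (25_000, 6), (5_000, 100)];
    for (timestamp_ms, value) in input {
        for (start, sum) in windows.process(Event { timestamp_ms, value }) {
            println!("window [{start}, {}) sum = {sum}", start + 10_000);
        }
    }
    for (start, sum) in windows.flush() {
        println!("window [{start}, {}) sum = {sum} (flushed)", start + 10_000);
    }
}
```

With 10-second windows and a 2-second bound, the window [0, 10000) is emitted only when the event at 13,000 ms moves the watermark to 11,000 ms, and the final event at 5,000 ms is dropped as late because the watermark has already reached 23,000 ms. Calling `flush` models the end of a bounded source, where all remaining windows can be emitted.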

Use cases

Some example use cases include:

Detecting fraud and security incidents
Real-time product and business analytics
Real-time ingestion into your data warehouse or data lake
Real-time ML feature generation

Why Arroyo

There are already a number of streaming engines, including Apache Flink, Spark Streaming, and Kafka Streams. Why create a new one?

Serverless operations: Arroyo pipelines are designed to run in modern cloud environments, supporting seamless scaling, recovery, and rescheduling.
High-performance SQL: SQL is a first-class concern, with consistently excellent performance.
Designed for everyone: Arroyo cleanly separates the pipeline APIs from its internal implementation. You don’t need to be a streaming expert to build real-time data pipelines.

Getting Started

Arroyo ships as a single binary, which can be easily installed locally or run in a container.

brew install arroyosystems/tap/arroyo

See the getting started guide to get up and running with a local Arroyo deployment.

In production

Arroyo supports several deployment targets for production use, including native support for Kubernetes. See the deployment docs for more information.

License

Arroyo is fully open-source under the Apache 2.0 license.

Support

Commercial support

Commercial support is offered by Arroyo Systems, the creators of Arroyo. Reach out to support@arroyo.systems to get in touch.

Community support

Community support is offered via the Arroyo Discord, where the Arroyo development team and community actively help users get started and solve their problems with Arroyo.

Telemetry

By default, Arroyo collects limited and anonymous usage data to help us understand how the system is being used and to help prioritize future development.

You can opt out of telemetry by setting the environment variable DISABLE_TELEMETRY=true when running Arroyo services.
