Arroyo is a cloud-native stream processing engine that lets users run SQL queries against streaming data. Built for real-time transformation, filtering, aggregation, and joining, it delivers sub-second query results and is designed to scale from zero to millions of events per second without a dedicated operations team. It ships as a single, compact binary and can be deployed in a range of environments, including Docker and Kubernetes.
Arroyo's architecture is optimized for modern cloud environments and is intended to stay reliable and performant in elastic setups where resources come and go. It handles workloads of all sizes, from small applications that need minimal resources to large deployments processing tens of millions of events per second. Written in Rust and built on the Arrow in-memory analytics format, Arroyo is reported to outperform similar systems by up to 5x.
The platform supports more than 300 SQL window, aggregate, and scalar functions. It provides exactly-once processing semantics, so events are neither duplicated nor dropped. Users can manage their streaming pipelines through a web UI or a REST API, and connect to data sources and sinks such as Kafka, Redis, and MySQL.
Arroyo is designed to be user-friendly for data scientists and engineers, enabling them to build real-time applications, models, and dashboards without requiring a dedicated team of streaming experts. Its capabilities include processing data using sliding, tumbling, and session windows, supporting various join types, and extending SQL with user-defined functions written in Rust, with Python support forthcoming.
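To make the three window types concrete, here is a minimal Python sketch of how events are commonly assigned to tumbling, sliding, and session windows in stream processing. It illustrates the general concepts only and is not Arroyo's implementation or API; the function names, integer timestamps, and half-open [start, end) window convention are all assumptions made for this example.

```python
from collections import defaultdict


def tumbling_window(ts, width):
    """Return the single [start, end) window of fixed width that contains ts."""
    start = ts - ts % width
    return (start, start + width)


def sliding_windows(ts, width, slide):
    """Return every [start, end) window of the given width, advancing by
    `slide`, that contains ts. Windows overlap when slide < width."""
    windows = []
    start = ts - ts % slide  # latest window start at or before ts
    while start > ts - width:  # window still covers ts
        windows.append((start, start + width))
        start -= slide
    return sorted(windows)


def session_windows(timestamps, gap):
    """Group timestamps into sessions; a new session begins once at least
    `gap` time units pass with no events."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] < gap:
            sessions[-1].append(ts)
        else:
            sessions.append([ts])
    return sessions


def count_per_tumbling_window(timestamps, width):
    """Count events per tumbling window, similar in spirit to a windowed COUNT."""
    counts = defaultdict(int)
    for ts in timestamps:
        counts[tumbling_window(ts, width)] += 1
    return dict(counts)


if __name__ == "__main__":
    events = [1, 2, 3, 10, 11, 30]  # hypothetical event timestamps
    print(count_per_tumbling_window(events, 10))
    print(sliding_windows(12, width=10, slide=5))
    print(session_windows(events, gap=5))
```

In this sketch a tumbling window places each event in exactly one bucket, a sliding window can place the same event in several overlapping buckets, and a session window has no fixed size because its boundaries depend on gaps in activity.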
The tool also offers comprehensive documentation and a supportive community, making it accessible for both development and production use. Recent updates include pipeline clusters for lightweight job execution and an improved UI for pipeline management.
Pricing
Arroyo Enterprise is aimed at organizations that want to build real-time pipelines without a dedicated streaming engineering team. Key features include role-based access control, integration with enterprise identity providers for single sign-on, secret management, a highly available control plane, pipeline autoscaling, and expert support from Arroyo's creators. Arroyo is also developing a hosted offering, Arroyo Cloud, for cost-efficient streaming applications. Across these offerings, the platform is optimized for SQL, designed for cloud environments, and built to keep both small and large deployments operationally simple.