Berlin Buzzwords 2024

S3 as the state store for stream processing systems
2024-06-10, Maschinenhaus

S3 is cheap, but S3 is slow. Stream processing systems want to maintain internal state at low cost, but cannot tolerate high latency. How can we build a stream processing system on top of S3? I'll tell you what we learned over the last three years.


S3 is cheap, but S3 is slow. Stream processing systems want to maintain internal state at low cost, but cannot tolerate high latency. Could we build a stream processing system on top of S3?

We started building RisingWave, a stream processing system, from scratch three years ago. It uses S3 as the state store and separates storage from compute to reduce costs. While the idea looks appealing, implementing it and supporting production workloads is full of challenges. In this talk, I will share the lessons we've learned from building a production-ready system that achieves 1) efficient execution, 2) fast failure recovery, and 3) dynamic scaling, all at a reasonably low cost.

Yingjun Wu is the founder of RisingWave Labs (https://www.risingwave.com/), a database company developing RisingWave, a distributed SQL database for stream processing. Before founding the company, Yingjun was a software engineer on the Redshift team at Amazon Web Services and a researcher in the Database group at IBM Almaden Research Center. Yingjun received his PhD from the National University of Singapore and was a visiting PhD student at Carnegie Mellon University. He has been working in the field of stream processing and database systems for over a decade.