Joining Dozens of Data Streams in Distributed Stream Processing Systems
2023-06-19, Palais Atelier

This talk will explore the techniques and best practices for joining dozens of data streams, focusing on different joining mechanisms, such as binary joins and delta joins, and their pros and cons.


As real-time data processing becomes increasingly essential, organizations face the challenge of efficiently joining and correlating data from multiple streams in distributed stream processing systems to gain valuable insights. This talk will explore the techniques and best practices for joining dozens of data streams, focusing on different joining mechanisms, such as binary joins and delta joins, and their pros and cons. Attendees will gain an understanding of various stream join techniques, learn how to optimize performance in distributed environments, and apply lessons from industry experience. Furthermore, the talk will discuss how a decoupled compute-storage architecture can reduce join costs. This knowledge will enable participants to harness the full potential of their data and build efficient distributed stream processing solutions for their organizations.

See also: Slides (2.3 MB)

Yingjun Wu is the founder of RisingWave Labs, the company developing RisingWave, a distributed SQL database for stream processing. Before running the company, Yingjun was a software engineer on the Redshift team at Amazon Web Services and a researcher in the Database group at IBM Almaden Research Center. He received his PhD from the National University of Singapore and was a visiting PhD student at Carnegie Mellon University. He has been working in the field of stream processing and database systems for over a decade.