2024-06-10 –, Kesselhaus
Flink's SQL engine is the workhorse behind many event processing platforms. We take a deep look into the internals and take the stack apart! Optimizer phases, streaming primitives, watermarks, CDC, and upsert keys. Let me give you a feeling for the power of a simple streaming SQL query.
Apache Flink aims to make stream processing easy and accessible for everyone. It should come as no surprise that a high level of abstraction puts more load on the core. Flink's SQL engine is the workhorse behind many on-prem and managed SQL platforms. Yet very few users know what is really going on under the hood when submitting a SQL query.
In this talk, we take a deep look into the internals of Flink SQL. Let's take the stack apart! We start with some SQL text and go all the way down to Flink's streaming primitives. I will go through the individual optimizer phases. You will learn how event-time operations are tracked when declaring a watermark, how state is managed when using different kinds of joins, and how changelog modes and upsert keys travel through topology when reading from a Change Data Capture connector.
After this talk, you may not be able to write an optimizer rule, but you should, at least, get a feeling for the power of a simple streaming SQL query.
Timo Walther is a Principal Software Engineer at Confluent and a long-time member of Apache Flink’s management committee. He studied Computer Science at TU Berlin and was part of the Database Group there - the origins of Apache Flink. He worked as a software engineer at DataArtisans and led the SQL team at Ververica. He was a Co-Founder of Immerok which was acquired by Confluent in 2023. In Flink, he is working on various topics in the Table & SQL ecosystem to make stream processing accessible for everyone.