Berlin Buzzwords 2026

Apache Spark Declarative Pipelines in Action
2026-06-08, Palais Atelier

Learn Spark 4.1's brand-new Declarative Pipelines, a paradigm shift replacing imperative code with simple declarations. We'll build a real-time data pipeline together, processing streaming ADS-B flight data from tens of thousands of aircraft overhead.


Spark Declarative Pipelines: Building Data Workflows with Spark 4.1's Game-Changing Feature

Apache Spark 4.1 introduces Spark Declarative Pipelines (SDP), a paradigm shift that transforms how data engineers design and maintain complex data workflows. This hands-on session provides a comprehensive introduction to SDP, demonstrating how declarative configuration can replace traditional imperative Spark code for common data pipeline patterns.

I will present a live example using an open-source PySpark data source I built with the OpenSky founders from Oxford and ETH Zurich. In just a few lines of code, you'll create a continuous data pipeline with streaming tables that ingest real ADS-B flight data from aircraft overhead, from tiny Cessnas to massive Airbus A380s. There is no complex "glue code" for incremental ingestion: you define what your pipeline should do, and Spark figures out how to do it.

Using streaming tables and materialized views, we'll layer on AI-powered analytics, turning natural-language questions like "Show me flights above 30,000 feet over California" into instant SQL queries against live, crowdsourced IoT data. I'll demonstrate in a forever-free cloud environment where every attendee can replicate the example hands-on. Attendees will leave with the practical knowledge to start experimenting with SDP immediately, plus best practices for modernizing their pipeline development.


Tags (legacy): Data Science, Stream, Operations
Level: Intermediate

I bring DevEx into products, tech into marketing, and storytelling into demos at Databricks. I have presented at tier-1 conferences on every continent except Antarctica, and I build and deliver hands-on workshops for around ten thousand customers per year.

I use AI tools to create compelling technical content for developer-focused marketing campaigns, from voice-activated data queries with Databricks Genie to AI-generated demo content with synthetic speech.

I'm a published author with a Ph.D. in Computer Science (summa cum laude, TU Munich) and over 25 years of expertise in data & AI, cloud computing, and scientific research. My recognitions include Cloud Technologist of the Year (Oracle) and Developer Champion.

At AWS, I kickstarted developer relations in Central EMEA and tripled the size of the team. I have presented at Devoxx, JavaOne, re:Invent, KubeCon, Oracle World, and Data + AI Summit.

I believe it’s the combination of compelling storytelling and deep technical understanding that allows me to simplify complex concepts and create tech demos that truly resonate with audiences.