Berlin Buzzwords 2025

Siphon: Modern Data Stack with SF-CH & Iceberg
2025-06-17, Maschinenhaus

Tired of waiting for batch jobs? See how we transformed our data pipeline using Apache Iceberg to stream quality data into Snowflake and ClickHouse simultaneously. Learn about our battle-tested architecture, the performance gains we achieved, and how we maintain data consistency across two analytics engines.


Ever wondered how to stream data reliably to multiple warehouses without compromising data quality? We'll show you how Siphon uses Apache Iceberg's time travel and ACID guarantees to keep data consistent across Snowflake and ClickHouse. We'll walk through our journey from batch to streaming, covering architecture evolution, data quality frameworks, and performance optimizations. We'll share our battle-tested patterns for handling schema evolution, managing data contracts, and implementing quality gates. Learn how we achieved sub-minute latency while preventing bad data from corrupting our warehouses. Perfect for data engineers and architects looking to modernize their data infrastructure with proven, real-world solutions.


Tags:

Data Science, Stream, Scale, Operations

Level:

Intermediate

A Staff Data Engineer with over 15 years of experience building enterprise data products. Currently pioneering the development of Siphon, a real-time data streaming product that enables reliable data delivery to Snowflake and ClickHouse using Apache Iceberg. Specializes in transforming traditional data pipelines into scalable data products, with an emphasis on reliability, observability, and user experience.
Their product engineering journey includes developing self-service data platforms, automated data quality frameworks, and real-time analytics solutions using Snowplow, Monte Carlo, and cloud-native technologies. They've successfully led the productization of data infrastructure across GCP and AWS, implementing infrastructure-as-code practices with Terraform and continuous delivery pipelines.
Passionate about building data products that deliver immediate business value, they focus on creating intuitive, reliable solutions that empower organizations to make data-driven decisions with confidence. Their product-first approach combines technical expertise with user-centric design to deliver data platforms that scale.