Berlin Buzzwords 2025

Melting Icebergs: Direct access to Kafka Data via Iceberg
2025-06-16, Maschinenhaus

Data in organizations is traditionally split between operational and analytical estates. Join us for an account of our journey combining Apache Kafka and Apache Iceberg to create a solution that addresses both estates with one data source.


An organization's data has traditionally been split between the operational estate, used for daily business operations, and the analytical estate, used for after-the-fact analysis and reporting. Today the journey from one side to the other is a long and tortuous one. But does it have to be?

In the modern data stack, Apache Kafka is the de facto standard operational platform, and Apache Iceberg has emerged as the champion of table formats for powering analytical applications. Can we leverage the best of Iceberg and Kafka to create a solution greater than the sum of its parts?

Yes, you can, and we did!

This isn't a typical story of connectors, ELT, and separate data stores. We've developed an advanced projection of Kafka data in an Iceberg-compatible format, allowing direct access from warehouses and analytical tools.

In this talk, we'll cover:

  • Presenting Kafka data to Iceberg processors without moving or transforming it upfront: no hidden ETL!
  • Integrating Kafka's ecosystem into Iceberg, leveraging Schema Registry, consumer groups, and more.
  • Meeting Iceberg's performance and cost-reduction expectations while sourcing data directly from Kafka.

Expect a technical deep dive into the protocols, formats, and services we used, all while staying true to our core principles:

  • Kafka as the single source of truth—no separate stores.
  • Analytical processors shouldn't need Kafka-specific adjustments.
  • Operational performance must remain uncompromised.
  • Kafka's mature ecosystem features, like ACLs and quotas, should be reused, not reinvented.

Join us for a thrilling account of the highs and lows of merging two data giants and stay tuned for the surprise twist at the end!


Tags:

Stream, Store, Stories

Level:

Intermediate

A long-time enthusiast of Kafka and all things data integration, Tom has more than 10 years' experience (5+ with Kafka) in innovative and efficient ways to store, query, and move data. Currently working at Streambased, Tom is building multi-tenant, on-prem, and cloud Kafka services that attack common Kafka pain points and break down barriers to starting your data journey.

Roman is a Principal Software Engineer at Streambased. His experience includes designing and building business-critical event streaming applications and distributed systems in the financial and technology sectors.