Berlin Buzzwords 2025

Navigating Data Lineage in Real-Time Architectures
2025-06-17, Maschinenhaus

As real-time data processing becomes increasingly critical across industries, the need for reliable, transparent, and actionable data lineage strategies has never been greater. Understanding where your data comes from, how it flows through your systems, and how it is transformed along the way is essential for maintaining trust and ensuring compliance.


In this session, we'll dive into effective data lineage strategies for real-time systems. Data lineage is no longer just about compliance and auditing; it is now a core capability that helps teams debug, optimize, and govern their real-time data pipelines. As systems become more distributed and event-driven, traditional static lineage approaches fail to capture the dynamic, evolving nature of streaming data.

Key takeaways from this talk will include:

  • Understanding the challenges and limitations of data lineage in real-time, distributed systems.
  • The similarities and differences between OpenTelemetry and OpenLineage.
  • How to implement modern data lineage strategies that provide real-time visibility and traceability into streaming data flows.
  • Leveraging automated tools to map complex data transformations and improve data quality.
  • Ensuring that your data lineage strategy aligns with compliance, governance, and performance goals.

Whether you are building real-time applications or managing large-scale data pipelines, this session will equip you with the insights and tools to track and trace data across your systems with ease and confidence, allowing you to unlock the full potential of real-time analytics.


Tags:

Stream, Scale

Level:

Intermediate

Omri is co-founder and CTO of drift.dev, and a seasoned leader and speaker who is passionate about making advanced stream processing accessible to developers and data professionals.
With a deep background in distributed systems and data engineering, Omri has been instrumental in building scalable, high-performance data platforms that empower teams to harness the power of real-time data.