Column-level lineage is coming to the rescue
06-20, 16:00–16:40 (Europe/Berlin), Kesselhaus

How are the columns containing sensitive data used across the data ecosystem? What input columns were used to produce a given report field? OpenLineage can answer those questions automatically.


OpenLineage is a rapidly growing standard for metadata and lineage collection. Column-level lineage, developed recently, is one of the features most anticipated by its community. In this talk, we:
* show the foundations of column lineage within the OpenLineage standard,
* provide a real-life demo of how it is automatically extracted from Spark jobs,
* describe and demo column lineage extraction from SQL queries,
* show how the lineage can be consumed on the Marquez backend.

The demos focus on the practical aspects of column-level lineage that matter to data practitioners all over the world.
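
To make the two opening questions concrete, here is a minimal Python sketch, not the material shown in the talk. It assumes lineage metadata shaped like the OpenLineage column-lineage dataset facet (output field, then `inputFields` entries with `namespace`, `name`, and `field`). The dataset names, column names, and the helper functions `inputs_of` and `usages_of` are hypothetical and exist only for illustration.

```python
# Hypothetical facet for an output dataset "reports.customer_summary".
# The nesting (fields -> inputFields -> namespace/name/field) is an
# assumption based on the OpenLineage column-lineage dataset facet.
report_facet = {
    "fields": {
        "customer_email_hash": {
            "inputFields": [
                {"namespace": "warehouse", "name": "crm.customers", "field": "email"},
            ]
        },
        "total_spent": {
            "inputFields": [
                {"namespace": "warehouse", "name": "sales.orders", "field": "amount"},
                {"namespace": "warehouse", "name": "sales.orders", "field": "discount"},
            ]
        },
    }
}


def inputs_of(facet, output_field):
    """Return (dataset, column) pairs that feed one output column."""
    entry = facet["fields"].get(output_field, {})
    return [(f["name"], f["field"]) for f in entry.get("inputFields", [])]


def usages_of(facets_by_dataset, dataset, column):
    """Return (output dataset, output column) pairs that read one input column."""
    hits = []
    for out_dataset, facet in facets_by_dataset.items():
        for out_field, entry in facet["fields"].items():
            for f in entry.get("inputFields", []):
                if f["name"] == dataset and f["field"] == column:
                    hits.append((out_dataset, out_field))
    return hits


facets = {"reports.customer_summary": report_facet}

# "What input columns were used to produce a given report field?"
upstream = inputs_of(report_facet, "total_spent")

# "How is a column containing sensitive data used downstream?"
downstream = usages_of(facets, "crm.customers", "email")
```

In this sketch, `upstream` lists the two `sales.orders` columns behind `total_spent`, and `downstream` shows that the sensitive `email` column ends up in `customer_email_hash`. A real deployment would read such facets from a lineage backend rather than from a hard-coded dictionary.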

Pawel (@pawel-big-lebowski on GitHub) is an OpenLineage contributor. As a data practitioner with a decade of experience, he focuses on converting data processing logs and metrics into meaningful observability insights.

Maciej is a software engineer at GetInData and an OpenLineage committer. He loves contributing to open source projects and playing with cats.