Berlin Buzzwords 2026

Why Choose One: Multi-Engine Analytics with Apache Wayang
2026-06-08 , Palais Atelier

Choosing the best engine for each data task sounds right, but in modern data stacks doing so requires expertise and effort. Apache Wayang, a recently graduated Apache Top-Level Project (TLP), addresses this by decoupling logical dataflows from execution engines. From big data platforms to SQL and ML engines, Wayang enables cross-platform execution optimized for performance.


Modern analytics pipelines frequently span databases, big data engines, and machine learning frameworks. Connecting these systems manually leads to complex orchestration, high data movement costs, and platform-specific rewrites. This challenge also appears in agent-driven workflows, where different steps of a task naturally map to different engines.

Apache Wayang is a recently graduated Apache Top-Level Project that provides a unified data analytics framework for cross-platform execution. Pipelines are expressed with platform-independent operators using Java, Scala, Python, or SQL APIs. A cross-platform optimizer then maps operators to execution backends such as Spark, Flink, JDBC databases, and ML systems, and produces execution plans that may span multiple engines. The optimizer models operator and data movement costs and supports runtime re-optimization when estimates are wrong. In practice, this lets developers write a pipeline once and run it efficiently across multiple engines without hard-wiring platform choices.
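To make the idea of cost-based platform selection concrete, here is a minimal Python sketch. It is a toy illustration and does not use Wayang's API or its actual optimizer: the operator names, the platform names, the cost numbers, and the `choose_platforms` function are all hypothetical. It assumes a linear pipeline, a fixed per-operator cost on each platform, and a flat cost for moving data whenever two consecutive operators run on different platforms.

```python
from typing import Dict, List, Tuple

# Hypothetical cost estimates (arbitrary units) for each operator on each
# platform. An operator with no entry for a platform cannot run there.
OP_COSTS: List[Tuple[str, Dict[str, float]]] = [
    ("filter", {"jdbc": 1.0, "spark": 4.0}),
    ("join", {"jdbc": 8.0, "spark": 3.0}),
    ("train", {"spark": 5.0}),
]

# Hypothetical flat cost of moving data between two different platforms.
MOVE_COST = 2.0


def choose_platforms(
    ops: List[Tuple[str, Dict[str, float]]], move_cost: float
) -> Tuple[float, List[str]]:
    """Return (total cost, platform per operator) of the cheapest plan.

    Dynamic programming over the pipeline: for every platform that can run
    the current operator, keep only the cheapest plan that ends there.
    """
    _, first_costs = ops[0]
    best = {p: (c, [p]) for p, c in first_costs.items()}
    for _, costs in ops[1:]:
        nxt = {}
        for platform, op_cost in costs.items():
            candidates = [
                (
                    prev_cost
                    + (0.0 if prev_platform == platform else move_cost)
                    + op_cost,
                    path + [platform],
                )
                for prev_platform, (prev_cost, path) in best.items()
            ]
            nxt[platform] = min(candidates, key=lambda t: t[0])
        best = nxt
    return min(best.values(), key=lambda t: t[0])


if __name__ == "__main__":
    cost, plan = choose_platforms(OP_COSTS, MOVE_COST)
    print(cost, plan)
```

With these made-up numbers, the cheapest plan runs the filter in the database and the join and training step on Spark. If the movement cost is raised far enough, the same function keeps the whole pipeline on a single engine, which is the trade-off a cross-platform optimizer has to weigh.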

The talk is technical and system-focused, aimed at practitioners working with heterogeneous data stacks. It has three parts:

  1. Motivation (10-15min)
    Why single-engine execution is often not enough. Concrete ETL, ML, and agent-based workflows that require multiple systems and create optimization and integration challenges.

  2. System architecture and optimizer (20-25min)
    Wayang’s platform-agnostic plans, operator mappings, cross-platform data movement handling, and stage-based execution model. How the cost-based optimizer inflates plans, evaluates alternatives, and selects mixed-engine execution strategies. Brief coverage of SQL, ML, and multi-language UDF support.

  3. Project history, status, and next steps (5-10min)
    From multi-year cross-platform analytics research to Apache and recent Top-Level Project graduation. Extensibility for new platforms and current work on improved cost models and optimizer enhancements.

Attendees will gain a practical understanding of how cross-platform analytics can be executed efficiently and how to design pipelines that are not locked to a single processing engine.


Level: Intermediate

Zoi Kaoudi is an Associate Professor in the Computer Science Department at the IT University of Copenhagen (ITU) and the VP/PMC Chair of Apache Wayang. Her current research focuses on (i) leveraging machine learning techniques for data-intensive systems, (ii) improving the performance and ease of use of machine learning systems, and (iii) advancing knowledge graph embeddings with ontologies and logical reasoning. Before joining ITU, she held positions in various places around the world. She worked as a Senior Researcher at the Technical University of Berlin, as a Scientist at the Qatar Computing Research Institute (QCRI), as a visiting researcher at IMIS-Athena Research Center, and as a postdoctoral researcher at Inria Saclay. She received her Ph.D. from the National and Kapodistrian University of Athens in 2011. She has co-authored articles in both the database and ML communities and served as a member of the Program Committee for several international database conferences. She received the best demonstration award at ICDE 2022 for her work on training data generation for learning-based query optimization.

Haralampos (Harry) Gavriilidis is a data systems researcher and incoming postdoctoral researcher at ICSI / UC Berkeley, focusing on cross-platform and federated query processing, optimizer design, and execution across heterogeneous engines. He recently earned his PhD from TU Berlin, where he built systems for decentralized federated query processing and adaptive cross-system data transfer. His work has appeared at leading data management conferences such as SIGMOD, VLDB, and ICDE.

Beyond papers and prototypes, Harry enjoys explaining complex data systems to real audiences. He has spoken at practitioner conferences such as PGConf and won a science slam competition for making database research stage-friendly. He is also an active community volunteer, supporting events like Berlin Buzzwords, FOSS Backstage, and Flink Forward. Before academia, he worked as a full-stack web developer.