Berlin Buzzwords 2026

Floe: Policy-Based Table Maintenance for Apache Iceberg
2026-06-08, Palais Atelier

Iceberg maintenance procedures work. Orchestrating them across hundreds of tables is the problem. Floe is an open-source system that treats maintenance as policy: glob patterns, schedules, and health-driven triggers that gate operations on real table metrics. It supports seven catalogs and executes via Spark or Trino.


Every Iceberg table needs maintenance, but catalogs don't execute and engines don't orchestrate. Teams end up with scripts that become DAGs that become technical debt. Nobody knows which tables are healthy, which are overdue, or what ran last.

Floe is an open-source, policy-based maintenance system for Iceberg. Define rules with glob patterns, schedules, and health-driven triggers that gate operations based on real table metrics: small file percentage, snapshot count, delete file ratio, and partition skew. Priority resolves conflicts when patterns overlap. A maintenance debt score ranks tables by urgency so the most critical work runs first within your resource budget.
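To make the policy model concrete, here is a minimal Python sketch of the ideas described above: glob matching, priority-based conflict resolution, health-gated triggers, and a debt score used to rank work within a budget. It is not Floe's actual configuration format or API. Every name in it (Policy, TableHealth, resolve_policy, debt_score, plan), the threshold fields, and the scoring formula are hypothetical, chosen only for illustration, and it covers just two of the four metrics.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import List, Optional


@dataclass(frozen=True)
class Policy:
    # All fields are illustrative assumptions, not Floe's real schema.
    pattern: str               # glob over fully qualified table names
    priority: int              # higher value wins when patterns overlap
    max_small_file_pct: float  # trigger threshold for small file percentage
    max_snapshots: int         # trigger threshold for snapshot count


@dataclass(frozen=True)
class TableHealth:
    name: str
    small_file_pct: float
    snapshot_count: int


def resolve_policy(table_name: str, policies: List[Policy]) -> Optional[Policy]:
    """Return the highest-priority policy whose glob matches, or None."""
    matches = [p for p in policies if fnmatchcase(table_name, p.pattern)]
    return max(matches, key=lambda p: p.priority, default=None)


def debt_score(health: TableHealth, policy: Policy) -> float:
    """Sum of how far each metric exceeds its threshold; 0.0 means healthy."""
    small = max(0.0, health.small_file_pct / policy.max_small_file_pct - 1.0)
    snaps = max(0.0, health.snapshot_count / policy.max_snapshots - 1.0)
    return small + snaps


def plan(tables: List[TableHealth], policies: List[Policy], budget: int) -> List[str]:
    """Pick up to `budget` tables, most indebted first."""
    scored = []
    for health in tables:
        policy = resolve_policy(health.name, policies)
        if policy is None:
            continue  # no policy covers this table
        score = debt_score(health, policy)
        if score > 0:  # health gate: skip tables within thresholds
            scored.append((score, health.name))
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in scored[:budget]]


policies = [
    Policy("analytics.*", priority=1, max_small_file_pct=20.0, max_snapshots=100),
    Policy("analytics.events_*", priority=10, max_small_file_pct=10.0, max_snapshots=50),
]
tables = [
    TableHealth("analytics.events_clicks", small_file_pct=30.0, snapshot_count=40),
    TableHealth("analytics.users", small_file_pct=25.0, snapshot_count=90),
    TableHealth("analytics.orders", small_file_pct=5.0, snapshot_count=10),
    TableHealth("raw.logs", small_file_pct=90.0, snapshot_count=900),
]
print(plan(tables, policies, budget=1))
```

In this sketch, analytics.events_clicks matches both patterns and takes the stricter, higher-priority policy, so it ranks first. analytics.orders is within its thresholds and is skipped, and raw.logs is ignored because no policy covers it.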

Floe connects to REST, Polaris, Lakekeeper, Gravitino, DataHub, Hive Metastore, and Nessie catalogs, then delegates execution to Spark or Trino. A built-in dashboard shows table health trends, operation history, and policy coverage.

This talk covers the policy model, health-driven maintenance planning, and a live demo.


Level: Intermediate

Neelesh Salian builds data platforms. He has led lakehouse and distributed systems work at Datavant, Stitch Fix, dbt Labs, Salesforce, and Cloudera, with a focus on Spark, streaming, and Apache Iceberg in production. He created Floe to solve a problem he kept encountering: orchestrating Iceberg table maintenance at scale.