Kaldb: serverless Lucene at petabyte scale
06-19, 11:10–11:50 (Europe/Berlin), Kesselhaus

In this talk, we share our experiences, best practices, and lessons learned in designing and operating a serverless Lucene serving system at PB scale.


Running petabyte-scale columnar stores has become a routine operation in today's data-driven world. However, operating a petabyte-scale search system is still challenging. Enter Kaldb, an open-source, serverless Lucene serving system designed specifically for petabyte-scale Lucene workloads. We've designed Kaldb to automate and reduce operational toil without sacrificing performance or reliability.

But designing a serverless Lucene system at this scale poses several unique challenges, such as ensuring durability of data, modifying replication and caching protocols for high availability, serving high-fanout reads, managing ephemeral nodes, and more.

In this talk, we'll delve into the details of how our redesigned Kaldb system overcomes these challenges. We've separated durability of the data from storage, separated compute from storage, modified replication algorithms to handle ephemeral nodes, used Kafka as a write-ahead log, and developed a novel query execution layer to handle high-fanout queries. Our implementation not only reduces operational toil but also adds several self-healing properties to the system. We're proud to say that Kaldb currently runs on Kubernetes at petabyte scale with improved reliability and performance.

Join us in this talk to learn more about how Kaldb can help you overcome the challenges of running a petabyte-scale Lucene serving system. We'll share our experiences, best practices, and lessons learned in designing and operating a serverless Lucene serving system at this scale, and provide practical insights and techniques that you can use to optimize your own search systems.

Suman Karumuri is a Principal Software Engineer and the tech lead for Observability at Airbnb. As an expert in distributed tracing, Suman has been a tech lead of Zipkin and a co-author of the OpenTracing standard, a Linux Foundation project under the CNCF. With extensive experience, Suman has spent years building and operating petabyte-scale log search, distributed tracing, and metrics systems at notable companies like Slack, Pinterest, Twitter, and Amazon. In his leisure time, Suman enjoys engaging in board games, exploring the outdoors through hiking, and spending quality time with his children.